While working on the PyGuile, I identified the following design issues.
- The data type trees of Scheme and Python do not have an 1:1 correspondence.
- Do we want to convert a Scheme list into a Python Tuple or a Python List?
- How about an alist (associative list) – should be a Python List of 2-tuples or a Python Dict?
- And in the other direction – do we want to convert a Python string into a Scheme string, symbol or keyword?
- API for adding plugins which convert between Guile and Python representations of useful data types (such as file handles, images or Berkeley sockets).
- How do we want to pass large data structures – convert them immediately, or employ lazy conversion (convert an element only when it is requested)? If we employ lazy conversion, how do we implement the associated bookkeeping? See more about this below.
- How do we deal with the different garbage collection regimes of Guile and Python? In particular, how do we make SCM objects owned by Python objects known to the Guile garbage collector?
- How will we support Unicode? Bear in mind that we want to minimize manipulations of long text strings.
- How to allow each scripting language to seamlessly invoke functions in the other scripting language?
The problem of lack of 1:1 correspondence will be dealt with as follows.
A standard conversion convention, which will work for the overwhelming majority of cases, will be employed. Functions, which have special needs, will have their argument conversions specified by means of a suitable tree-structured template.
When passing a data structure (or object) created in language A to language B, the following cases can happen:
- Opaque pointer – B only passes it around. A performs all processing and B just holds the pointer for future reference.
- B accesses a single element (or small number of elements) in the data structure.
- B loops over all elements of the data structure.
- B needs arbitrary access to several elements of the data structure (example: image processing).
Those cases can be dealt with as follows:
- Case 1 can be handled by wrapping a language A pointer by a language B object, which carries opaque data around.
- Cases 2,3 can be dealt by means of custom data access procedures (such as Python’s __getitem__()). An element will be converted only when it is actually requested. Elements in nested data structures can be dealt with as in case 1.
- Case 4 can be handled by implementing a mechanism for plugging in and registering custom conversion functions for specific data types.
In practice, the most tough design issue, which I identified so far, is the management of the SCM objects owned by Python objects.
When a SCM object is assigned to an attribute of a Python object, some registration mechanism needs to
be invoked so that the SCM object can be reclaimed by the Guile garbage collector if the Python object goes out of scope. The registration mechanism needs also to take care of marking the SCM objects while they are owned by a living Python object.