There is no 1:1 mapping between Scheme and Python data types. As a consequence, there are several cases, in which PyGuile has to guess how would the user like to have the arguments and result of a Python function converted. Instead of guessing, we would like to empower the user to be explicit about the kinds of conversion which he wants.
The following is a census of ambiguous data conversion cases, which I identified.
- Scheme pair ->
- Python 2-Tuple
- Python 2-List
- Scheme list ->
- Python Tuple
- Python List
- Nested tree of pairs (2-Tuples or 2-Lists)
- Python 2-Tuple or 2-List ->
- Scheme pair
- Scheme list
- Scheme rational (if the Pythonic data structure consists of two integer values)
- Scheme alist (association list) ->
- Python Dict
- Python Tuple/List of 2-Tuples
- Python string ->
- Scheme string
- Scheme symbol
- Scheme keyword
Additional considrations:
- Case sensitivity of symbols and keywords
- String representation of keywords in Guile has leading dash – to retain or remove it in the Python side of affairs?
- Python 1-character string ->
- Scheme char
- Scheme string
Additional considration: utf-8 encoded glyph is a sequence of few characters.
- Python int ->
- Scheme int
- Scheme bignum
- Scheme char
- Python None -> One of several possible values: ‘(), #f, SCM_EOL, ‘*None* or another custom Scheme value.
- Python (),[],{} ->
- Scheme ‘()
- SCM_EOL
- Custom Scheme value
- Scheme ‘() ->
- Python ()
- Python []
- Python {}
- SCM_EOL ->
- Python (),[],{}
- Python None
- Custom Python value
- Scheme rational ->
- Python Float
- 2-Tuple of Python Ints
- Scheme exact/inexact flag in numerical values – if and how to represent it in the Python side of the application?
- Giant data structures with sparse access needs – lazy vs. eager conversion
- Exception objects
- Objects of certain classes (vectors, ports, functions, images, etc.)
There is also the separate issue of string encoding/decoding, with which we deal by mandating that anything passing between Scheme and Python has to be utf-8 encoded.
One of the goals of PyGuile is to make it efficient to invoke Python library functions from Guile. Therefore, efficiency of conversion of function arguments and results is critical.
When there are no user hints, the following inefficiencies occur:
- PyGuile has to make a default (and possibly sub-optimal) choice when encountering one of the above ambiguous cases. Then the script using the data has to reformat it to match the data format to its actual needs.
- PyGuile has to identify the data type of each datum. The present implementation does not go into the internal representation of Guile (SCM) and Python (PyObject) objects, therefore PyGuile has to test for various data types one by one, until one of them matches the argument.
- Sometimes a Python procedure needs to do no processing on one of its arguments. The argument’s value needs only to be passed around as a pointer, or to be inserted into the right place in a result data structure. In such a case, it is desirable to use the most efficient conversion possible i.e. wrap/unwrap opaque objects. This is a generalization of the case of giant data structures with sparse access needs.
Therefore, when performance is critical, hints from the user would help not only to disembiguate the conversion process but also to speed it up.
The user hints will be implemented as follows.
With each function (Python function invoked from Guile, or Guile function invoked from Python) we associate two (possibly degenerate) signatures. One signature will contain the hints for converting the function’s arguments. The second signature will hint how to convert the function’s result. The signatures are Scheme lists, whose leaf nodes are symbols denoting conversion functions.
Chris Jester-Young, in his answer to my question in Stackoverflow, proposed the following function for traversing two corresponding tree structures, and applying the functions in one of them to data in the other one.
(define (map-traversing func data) (if (list? func) (map map-traversing func data) (func data)))
Using it requires unquoting. Example:
(map-traversing `((,car ,cdr) ,cadr) '(((aa . ab) (bb . bc)) (cc cd . ce)))
Our implementation will differ from the above in details, as the signatures’ leaf nodes do not denote proper Scheme functions.