Every project, including software development projects, needs an identity. It needs a definition of its boundaries. It has to be clear about what is inside it and what is outside of it.
Without such a definition, the project would try to be too many things to too many people, and as a result, its products would not be really useful to anyone.
A project’s identity makes it easier and faster to make design and trade-off decisions.
Given the above trite introduction, and given that there is a list of goals for the PyGuile project, a list of non-goals is needed as well. Here it is:
- Theoretical academic purity – attempt to convert every data type from Guile to Python and vice versa, and to support the whole range of values assumed by each data type.
- Ability to mix code snippets from both Scheme and Python in the same source code file.
- Invocation of machine language libraries (static or DLLs) – for this purpose, there are already existing tools (SWIG and PerlXS).
- Framework for making it easy to add support for interoperation with yet another scripting language.
There are also some low-priority goals, over which I do not plan to shed tears if they prove impossible to achieve without significant effort:
- Tail recursion support
While working on PyGuile, I identified the following design issues.
- The data type trees of Scheme and Python do not have a 1:1 correspondence.
- Do we want to convert a Scheme list into a Python Tuple or a Python List?
- How about an alist (association list) – should it be a Python List of 2-tuples or a Python Dict?
- And in the other direction – do we want to convert a Python string into a Scheme string, symbol or keyword?
- API for adding plugins which convert between Guile and Python representations of useful data types (such as file handles, images or Berkeley sockets).
- How do we want to pass large data structures – convert them immediately, or employ lazy conversion (convert an element only when it is requested)? If we employ lazy conversion, how do we implement the associated bookkeeping? See more about this below.
- How do we deal with the different garbage collection regimes of Guile and Python? In particular, how do we make SCM objects owned by Python objects known to the Guile garbage collector?
- How will we support Unicode? Bear in mind that we want to minimize manipulations of long text strings.
- How to allow each scripting language to seamlessly invoke functions in the other scripting language?
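The plugin API mentioned in the list above, and the string versus symbol question, could be approached along the following lines. This is only a sketch in pure Python: Scheme values are modelled as hypothetical `(tag, payload)` tuples, and `register_converter`, `to_scheme` and `Symbol` are names invented for illustration, not part of PyGuile.

```python
# Hypothetical model: a Scheme value is represented as a (tag, payload) tuple.
_converters = {}  # maps a Python type to its conversion function


def register_converter(py_type, func):
    """Register func as the converter for instances of py_type."""
    _converters[py_type] = func


def to_scheme(value):
    """Convert value using the most specific registered converter."""
    # Walking the MRO lets a converter for a base class serve subclasses,
    # while a converter for the subclass itself takes precedence.
    for cls in type(value).__mro__:
        if cls in _converters:
            return _converters[cls](value)
    raise TypeError("no converter registered for %s" % type(value).__name__)


class Symbol(str):
    """Marker type: a Python string that should become a Scheme symbol."""


register_converter(str, lambda s: ("string", str(s)))
register_converter(Symbol, lambda s: ("symbol", str(s)))
register_converter(int, lambda n: ("integer", n))
register_converter(list, lambda xs: ("list", [to_scheme(x) for x in xs]))
register_converter(tuple, lambda xs: ("list", [to_scheme(x) for x in xs]))
```

With this arrangement a plain Python string becomes a Scheme string by default, and the caller opts into a symbol by wrapping the value in the marker type. Converters for file handles or images would be registered the same way.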
The problem of lack of 1:1 correspondence will be dealt with as follows.
A standard conversion convention, which will work for the overwhelming majority of cases, will be employed. Functions with special needs will have their argument conversions specified by means of a suitable tree-structured template.
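To make the idea of a tree-structured template more concrete, here is a minimal sketch. It assumes a Scheme list is modelled as a Python list and a Scheme pair as a 2-tuple. The template node names (`"atom"`, `"list"`, `"tuple"`, `"dict"`) and the `convert` function are invented for illustration and are not the actual PyGuile template format.

```python
def convert(value, template=None):
    """Convert a modelled Scheme value, optionally guided by a template tree."""
    if template is None:
        # Default convention: lists become Python lists, everything else
        # passes through unchanged.
        if isinstance(value, list):
            return [convert(v) for v in value]
        return value
    kind = template[0]
    if kind == "atom":
        return value
    if kind == "list":
        # ("list", element_template)
        return [convert(v, template[1]) for v in value]
    if kind == "tuple":
        # ("tuple", template_0, template_1, ...): one sub-template per position.
        return tuple(convert(v, t) for v, t in zip(value, template[1:]))
    if kind == "dict":
        # ("dict", key_template, value_template): the value is an alist of pairs.
        return {convert(k, template[1]): convert(v, template[2])
                for k, v in value}
    raise ValueError("unknown template node: %r" % (kind,))


# An alist whose values are two-element lists.
alist = [("a", [1, 2]), ("b", [3, 4])]
as_default = convert(alist)
as_dict = convert(alist, ("dict", ("atom",), ("tuple", ("atom",), ("atom",))))
```

Without a template the alist stays a list of pairs. With the template it becomes a Dict whose values are tuples, which is the kind of per-function override the convention above calls for.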
When passing a data structure (or object) created in language A to language B, the following cases can happen:
- Case 1: Opaque pointer – B only passes it around. A performs all processing and B just holds the pointer for future reference.
- Case 2: B accesses a single element (or small number of elements) in the data structure.
- Case 3: B loops over all elements of the data structure.
- Case 4: B needs arbitrary access to several elements of the data structure (example: image processing).
Those cases can be dealt with as follows:
- Case 1 can be handled by wrapping a language A pointer by a language B object, which carries opaque data around.
- Cases 2 and 3 can be dealt with by means of custom data access procedures (such as Python’s __getitem__()). An element will be converted only when it is actually requested. Elements in nested data structures can be dealt with as in case 1.
- Case 4 can be handled by implementing a mechanism for plugging in and registering custom conversion functions for specific data types.
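The lazy conversion described for the single-element and looping cases can be sketched in pure Python as follows. The foreign data structure is modelled as an ordinary Python list, and `LazySequence` with its `conversions` counter is a hypothetical name introduced only for this illustration.

```python
class LazySequence:
    """Wrap a foreign sequence and convert an element only when requested."""

    def __init__(self, foreign, convert):
        self._foreign = foreign    # stand-in for the language A data structure
        self._convert = convert    # per-element conversion function
        self._cache = {}           # index -> already converted element
        self.conversions = 0       # bookkeeping, for demonstration only

    def __len__(self):
        return len(self._foreign)

    def __getitem__(self, index):
        if not isinstance(index, int):
            raise TypeError("index must be an integer")
        if index < 0:
            index += len(self._foreign)
        if not 0 <= index < len(self._foreign):
            raise IndexError(index)
        if index not in self._cache:
            item = self._foreign[index]
            if isinstance(item, list):
                # Nested structure: wrap it instead of converting it,
                # in the spirit of the opaque pointer case.
                self._cache[index] = LazySequence(item, self._convert)
            else:
                self._cache[index] = self._convert(item)
                self.conversions += 1
        return self._cache[index]
```

Because the class defines `__getitem__` and raises `IndexError` past the end, a plain `for` loop over a `LazySequence` also works, and converts the elements one at a time as the loop reaches them.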
In practice, the toughest design issue I have identified so far is the management of the SCM objects owned by Python objects.
When an SCM object is assigned to an attribute of a Python object, some registration mechanism needs to be invoked so that the SCM object can be reclaimed by the Guile garbage collector once the Python object goes out of scope. The registration mechanism also needs to take care of marking the SCM objects while they are owned by a living Python object.
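The bookkeeping side of such a registration mechanism can be modelled in pure Python. In the sketch below, `ScmRegistry` is a hypothetical stand-in for the set of SCM objects that the Guile garbage collector must treat as live, and any Python object plays the role of an SCM handle. In a real C extension the protect and unprotect steps would presumably map onto Guile's `scm_gc_protect_object` and `scm_gc_unprotect_object`, but that mapping is an assumption, not something taken from the PyGuile sources.

```python
import weakref


class ScmRegistry:
    """Track which (modelled) SCM handles are owned by living Python objects."""

    def __init__(self):
        self._protected = {}  # id(handle) -> [handle, owner count]

    def protect(self, owner, scm_handle):
        """Keep scm_handle alive for as long as owner is alive."""
        key = id(scm_handle)
        entry = self._protected.setdefault(key, [scm_handle, 0])
        entry[1] += 1
        # The finalizer runs when owner is garbage collected. It refers
        # only to the registry and the key, never to owner itself.
        weakref.finalize(owner, self._unprotect, key)

    def _unprotect(self, key):
        entry = self._protected[key]
        entry[1] -= 1
        if entry[1] == 0:
            # No Python owner is left: the handle may now be reclaimed.
            del self._protected[key]

    def is_protected(self, scm_handle):
        return id(scm_handle) in self._protected


class PyOwner:
    """Hypothetical Python object that holds an SCM handle as an attribute."""

    def __init__(self, registry, scm_handle):
        self.scm = scm_handle
        registry.protect(self, scm_handle)
```

The count per handle matters because several Python objects may own the same SCM object. It stays protected until the last owner is gone.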
For a long time I have dreamt of invoking Python libraries from scripts written in Scheme. The reason for this is to be able to enjoy the fantastically rich control structures possible in Scheme, yet use familiar libraries to accomplish useful actions, some of which are unavailable in SLIB and other Scheme libraries.
Now at last I am working on realizing this dream. The Scheme implementation being used is version 1.6 of Guile and the Guile extension being developed embeds a Python 2.4 interpreter. In the future, more recent versions of Guile and Python will be used.
The goals of the project are:
- Make it easy to invoke Python libraries from Guile.
- The integration between Python and Guile is to be seamless.
- The architecture of the implementation shall enable optimizations for efficient runtime behavior.
To accomplish those goals, it is necessary to:
- Convert primitive Scheme data types (integers, reals, Booleans, strings, lists) into the corresponding Python data types, and vice versa.
- Be able to invoke functions defined in one language from the other language. This has to be bidirectional in order to support callbacks.
- Be able to pass around pointers to objects (as opaque values) and invoke methods over them.
- Have efficient transfer of control and data between both languages.
- Deal with different garbage collection conventions in both environments.
- Be able to optimize code for a particular pair of language runtime systems.
- Nice to have: support for recursion, especially tail recursion.
- Nice to have: thread-safety.
It is envisioned that the software developed in this project will be part of a larger system, which will allow more scripting languages to interoperate with Guile and with each other.
There is another project – Schemepy – which embeds a Scheme interpreter in Python scripts. This project has a different focus: it essentially allows Scheme to be used for those parts of a project in which its strengths are especially important.
One day I found myself in need of Python code that retrieves Unicode data from Microsoft SQL Server tables. The code needs to run on a PC with MS-Windows XP.
The dbi and odbc modules, which I used in the past, failed miserably at this task: they forced the Unicode data to be converted into string data, using the ascii encoder.
So, I had to look for other Python modules. My findings from evaluating the relevant Python modules are summarized below.
- dbi, odbc from pywin32
- Package: pywin32-210.win32-py2.5.exe, available from Python for Windows Extensions.
- Textual data is passed as strings, rather than as Unicode.
- Parameters in SQL queries are marked by ‘?’.
- Dates/times are retrieved as instances of the dbi.dbiDate class (essentially, a wrapped long int).
- I was not successful in using the win32com based code, which worked for Arik Baratz. According to him, this code uses the Microsoft ActiveX Data Objects 2.8 Library. It requires the modified version 209.1 of pywin32, which comes with version 188.8.131.52 of the ActiveState Python distribution. This modified version adds to the win32com class an extra member – client.
You need to add the following line sometime after the import win32com:
To actually start working, use win32com.client.Dispatch() to establish a connection to the SQL Server.
- pyodbc
- Package: pyodbc-2.0.39.win32-py2.5.exe, available from pyodbc – A Python DB API module for ODBC
- Textual data is passed as Unicode.
- Parameters in SQL queries are marked by ‘?’.
- Dates/times are retrieved as instances of the datetime.datetime class.
The Python module chosen is pyodbc.
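The ‘?’ parameter markers and the Unicode round trip can be tried without an SQL Server at hand. The sketch below deliberately swaps in the standard library's sqlite3 module, which follows the same DB API conventions and also uses ‘?’ as its parameter marker. It is not pyodbc code. With pyodbc the connection would instead come from `pyodbc.connect()` with an ODBC connection string, and the date column would already arrive as a `datetime.datetime` instance rather than needing to be parsed. The table and function names are made up for the example.

```python
import sqlite3
from datetime import datetime


def make_demo_db():
    """Create an in-memory database with one Unicode row (stand-in for SQL Server)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE notes (id INTEGER, body TEXT, created TEXT)")
    # Values are passed separately from the SQL text, one per '?' marker.
    conn.execute(
        "INSERT INTO notes VALUES (?, ?, ?)",
        (1, "שלום", datetime(2007, 1, 2, 3, 4, 5).isoformat()),
    )
    return conn


def fetch_note(conn, note_id):
    """Return (body, created) for the given id, or None if there is no such row."""
    cur = conn.cursor()
    cur.execute("SELECT body, created FROM notes WHERE id = ?", (note_id,))
    row = cur.fetchone()
    if row is None:
        return None
    body, created = row
    # sqlite3 hands back the ISO text that was stored, so parse it here.
    return body, datetime.fromisoformat(created)
```

Calling `fetch_note(make_demo_db(), 1)` returns the text as a Unicode string together with a `datetime` value, which is the behaviour that made pyodbc the preferred module in the comparison above.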