Numeric to numpy transition at SHFJ
or array technical challenges ;-)
Here I post a few thoughts on how we can make a good transition to numpy, summarizing what the issues are for each library we intend to use: AIMS, fff, rpy (?), pygsl (?).
AIMS
Aims C volumes store their data in a (contiguous?) data area which can be exposed as a Python array. So far (Numeric), the array is wrapped by the arraydata() Python method, which calls some hand-written code containing a line such as PyArray_FromDimsAndData( 4, &dims[0], PyArray_SHORT, (char *) &sipCpp->at(0)); The C volume still owns the data. Another trick is that the Volume Python wrapper constructor sets a weakref on the Python array. (The Numeric array still adds its normal reference to the volume wrapper when created.) That weak reference lets the volume know about the array at the Python level. The weakref defines a callback which removes itself when the last Numeric wrapper gets destroyed, and it avoids a cyclic reference (we didn't know Python deals with those correctly ;) The Numeric to numpy transition should require some adjustments in the C wrapping code: for instance, numpy deprecates PyArray_FromDimsAndData in favour of PyArray_SimpleNewFromData, which takes npy_intp dimensions and the NPY_SHORT type code.
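The ownership scheme described above can be sketched in pure Python with numpy. This is only an illustration under stated assumptions, not the AIMS wrapping code: the Volume class below is a hypothetical stand-in, a bytearray plays the role of the C data area, and the weakref callback is left out for brevity.

```python
import weakref
import numpy as np


class Volume:
    """Hypothetical stand-in for the wrapped C volume; it owns the data."""

    def __init__(self, dims):
        self.dims = tuple(dims)
        n_voxels = int(np.prod(self.dims))
        # Two bytes per voxel, matching the 16-bit (short) voxels of the text.
        self._data = bytearray(2 * n_voxels)
        self._array_ref = None  # weak reference to the exposed array, if any

    def arraydata(self):
        """Return an array viewing the volume's buffer, without copying."""
        arr = self._array_ref() if self._array_ref is not None else None
        if arr is None:
            arr = np.frombuffer(self._data, dtype=np.int16).reshape(self.dims)
            # Weak reference: the volume can find the array again,
            # but does not keep it alive.
            self._array_ref = weakref.ref(arr)
        return arr


vol = Volume((4, 3, 2, 1))
a = vol.arraydata()
a[1, 2, 0, 0] = 42               # writes straight into the volume's buffer
same = vol.arraydata() is a      # the volume hands back the same array
raw = np.frombuffer(vol._data, dtype=np.int16)  # another view on the buffer
```

In this sketch the array keeps the buffer alive through its base attribute, while the volume only remembers the array weakly, which mirrors the "C side owns the data, Python side holds a normal reference" arrangement.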
fff
fff uses a lot of arrays, for its fff_matrix and fff_vector types. So far, the bindings use Numeric.
- No allocation is ever made inside the C part, i.e. arrays are owned by Python.
(This might cause problems sometimes, so we will make it possible to allocate arrays in the C part and return them as new Python objects owning the data. So far, we have never needed C-owned objects.)
- Our fff functions assume contiguous arrays, forcing a conversion at each call.
This might change, since using strides doesn't look like such a big deal.
- Several C objects/structs are bound so far.
Not really good, since most of them are high-level objects. We'll probably move most of them to Python only.
- Bindings are generated by sip (a C++/Python binding generator),
which might cause subtle problems with the malloc/new and free/delete duality. Since we plan to drop most bound types except arrays, we might just write plain hand-made wrappers. In progress. The new API will use numpy.
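To make the contiguity point concrete, here is a small numpy sketch of what "forcing conversion" costs and what a stride-aware loop would look like. It does not use fff; strided_sum is a hypothetical helper that mimics a C loop walking a flat buffer.

```python
import numpy as np

m = np.arange(12, dtype=np.float64).reshape(3, 4)

row = m[1]       # contiguous: consecutive elements are 8 bytes apart
col = m[:, 1]    # strided: consecutive elements are 4 * 8 = 32 bytes apart

row_is_contiguous = bool(row.flags['C_CONTIGUOUS'])
col_is_contiguous = bool(col.flags['C_CONTIGUOUS'])

# Forcing contiguity costs nothing for `row` but copies `col`.
row_copied = not np.shares_memory(np.ascontiguousarray(row), m)
col_copied = not np.shares_memory(np.ascontiguousarray(col), m)


def strided_sum(buf, n, stride):
    """Sum n items of a flat buffer, stepping `stride` items each time."""
    total = 0.0
    for i in range(n):
        total += buf[i * stride]
    return total


# Walk the column in place: start at item 1, step by the stride in items.
step = col.strides[0] // col.itemsize
total = strided_sum(m.ravel()[1:], col.shape[0], step)
```

Accepting a stride argument in the C functions would avoid the copy made for col, at the price of one extra multiplication per access.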
rpy
The R Python bindings allow calling R statistical functions on Python types. They can optionally be compiled with Numeric support. From the source code, it seems that all data, whether Python sequences or Numeric arrays, are first converted to a Python sequence, then copied into a corresponding R structure. (It thus seems that a Numeric array is copied twice, probably for historical reasons.) This is not very efficient, but nowhere near a showstopper. Since nothing in the binding really makes use of Numeric low-level features, converting from Numeric to numpy means virtually no work.
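The conversion path described above can be mimicked as follows. This is a sketch only and does not call rpy: to_r_like_vector is a hypothetical function, and array('d') merely stands in for the R structure. It shows why the array library hardly matters when everything goes through the sequence protocol.

```python
from array import array
import numpy as np


def to_r_like_vector(obj):
    """Hypothetical conversion: any sequence -> Python list -> copied buffer."""
    seq = list(obj)           # first copy: a generic Python sequence
    return array('d', seq)    # second copy: stand-in for the R structure


from_list = to_r_like_vector([1.0, 2.5, 4.0])
from_numpy = to_r_like_vector(np.array([1.0, 2.5, 4.0]))
```

Both calls take the same route and produce the same result; the numpy array is simply copied twice on the way.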
See:
PyGSL
Those are bindings for the GNU Scientific Library. The GSL itself mostly does not support strides on arrays. The matrix data type maintains a stride between rows (i.e. the step from one line to the next) but none between consecutive elements of a row. Surprisingly, the vector type does support such an element stride, but probably only as a means to index inside a matrix. The PyGSL bindings apparently manage to convert the stride information when possible, but complain otherwise.
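The constraint can be expressed as a check on numpy strides. The sketch below is not PyGSL code: gsl_matrix_layout and gsl_vector_layout are hypothetical helpers, and they assume the GSL layout where a matrix has a row step (called tda in GSL) with unit step inside a row, while a vector has a single element stride.

```python
import numpy as np


def gsl_matrix_layout(a):
    """Return (size1, size2, tda) if `a` fits a GSL-style matrix, else None."""
    if a.ndim != 2:
        return None
    row_step, elem_step = a.strides
    if elem_step != a.itemsize:
        return None  # elements of a row must be adjacent in memory
    if row_step % a.itemsize != 0 or row_step < a.shape[1] * a.itemsize:
        return None  # row step must be a whole, large enough item count
    return a.shape[0], a.shape[1], row_step // a.itemsize


def gsl_vector_layout(v):
    """Return (size, stride) if `v` fits a GSL-style vector, else None."""
    if v.ndim != 1 or v.strides[0] <= 0 or v.strides[0] % v.itemsize != 0:
        return None
    return v.shape[0], v.strides[0] // v.itemsize


m = np.arange(20, dtype=np.float64).reshape(4, 5)
block = gsl_matrix_layout(m[:, 1:4])      # column block: only the row step differs
skipping = gsl_matrix_layout(m[:, ::2])   # every other column: not representable
transposed = gsl_matrix_layout(m.T)       # transpose view: not representable
column = gsl_vector_layout(m[:, 1])       # one column seen as a strided vector
```

Views for which the helper returns None are the ones that would need a contiguous copy before being handed to the GSL.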
