Mlabwrap API Discussion
Background
The goal is an Mlabwrap API that works as expected for all matlab types. This does not necessarily mean duplicating matlab's API. See MlabProxyObjects for a discussion of the specific problems with the current Mlabwrap API that prompted this discussion. The proposed solution of converting all matlab types to recarrays has been mostly abandoned, for the reasons that follow. The alternative involves MlabProxyObjects: python objects that proxy the corresponding matlab objects.
Problems with No Proxying
- Efficiency issues: requires copying all data back and forth between python and matlab on every mlabwrap call.
- Complicated matlab types: A matlab object can contain all sorts of things, some of which one is very unlikely to be able to marshal (convert to an isomorphic python type, e.g. a recarray) satisfactorily, e.g. file, socket or function handles, lambdas, references to other objects.
Move to Ctypes
We should attempt a move to ctypes and then rethink the fundamental design decisions, including what should be proxied, and when and how the conversion ought to work. If a ctypes port works out, it will be much less painful to explore the various options, and I think some experimentation will be necessary to get it right.
Possibility of Ndarray Subclass
Can we subclass ndarray, or duck-type it, such that, whenever you want to do something ndarray-like, a copy arises, but otherwise, everything is proxied from the matlab workspace? Never copy by default, but implement C API of array such that numpy.array and numpy.asarray return a copy. Is there any way of sharing memory between python and matlab with mmap?
I don't think this will work that well because I think you need to supply the shape (and data, unless you're happy with random garbage, which will of course still eat memory) when you create an ndarray subclass instance.
Well, I was wondering if we could subclass it in C, overriding references to the actual data such that it fetches and sets in the matlab workspace. Thinking about this, I see that it will get hairy until you get down to immutable types, because all other data returned will have to maintain its attachment to the matlab variable.
Basically I think data sharing between matlab and python is impossible, at least without unreasonable effort -- matlab's C interface is just too impoverished/braindamaged (e.g. you can't control destruction of the data of a matlab array, although you can pass something in).
Some API Issues
There might be no way to hide all the semantic differences between python and matlab. However, this is not a necessity. It should only be required that mlabwrap does what is expected in all circumstances, and this may differ from matlab. Here are some issues (We should come up with examples for each of these points):
Copy-on-write (matlab) vs Aliasing (python)
Slicing means copy-on-write (matlab) vs aliasing (python). One also doesn't want to get too clever, because it's impossible to know whether some object in matlab space has been mutated (so we can't cache proxy->ndarray conversions), and implementing copy-on-write in python is highly unattractive.
Agreed.
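A possible example for this point, as a sketch using only the standard library rather than numpy or mlabwrap. A memoryview slice shares storage with its source in the same way a basic ndarray slice does, while the explicit copy emulates what matlab code observes after b = a(1:2); b(1) = 99. The variable names are made up for illustration.

```python
# Python-style aliasing: a memoryview slice shares storage with its source,
# much as a basic ndarray slice is a view onto the original array.
data = bytearray([1, 2, 3, 4])
alias = memoryview(data)[0:2]
alias[0] = 99                      # writes through to `data`
aliased_result = list(data)

# matlab-style value semantics, emulated with an explicit copy: the slice
# is independent, so the source is unchanged after the write.
source = bytearray([1, 2, 3, 4])
copied = bytearray(source[0:2])
copied[0] = 99
copied_result = list(source)

print(aliased_result)  # [99, 2, 3, 4]
print(copied_result)   # [1, 2, 3, 4]
```

A proxy that converts to an ndarray on demand lands on the copy side of this split, so writes to the converted array would not reach the matlab workspace unless they are pushed back explicitly.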
Differences in Indexing Semantics
Matlab's indexing is all-at-once/non-recursive
Matlab's indexing is all-at-once/non-recursive (for lack of a better term), i.e. with thing(1:3).subthing{1}.subsubthing it's thing that gets to see *all* the subscripts and getattrs directly, whereas in python thing only gets to see [1:3] and returns a new object, which then sees getattr(self, 'subthing'), etc. Apart from efficiency considerations, this is another reason for a proxying scheme -- piecewise marshalling can't work in all cases (although it will for structs).
If subsref has been overridden this is also a problem for proxying, right? Or at least it makes proxying complicated. Using your example in matlab: thing(1:3).subthing{1}.subsubthing. In this case thing's (the matlab object's) subsref gets the following indices and attribute names directly: 1:3, subthing, 1, subsubthing. subsref can return anything it wants to in this situation (since it's been overridden), e.g. it could return some strange function of its arguments like the string "1:3subthing1subsubthing". In python the recursive fashion of indexing and attribute reference would not allow the proxy's __getattr__ to see all the indices and attribute names. In a proxy object __getattr__ could (and would) be overridden, but it would not be able to get all the arguments it needs to duplicate the matlab behavior.
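One conceivable way around this, offered only as a sketch: have each indexing or attribute step return a new proxy that records the step instead of resolving it, so the whole chain can be handed to matlab's subsref in one call later. LazyRef and collected_subscripts are hypothetical names, not part of mlabwrap, and brace indexing ({1}) is left out because python has no syntax for it.

```python
class LazyRef:
    """Hypothetical proxy that records a subscript chain without resolving it."""

    def __init__(self, name, chain=()):
        self._name = name
        self._chain = tuple(chain)

    def __getitem__(self, index):
        # Record a paren-style subscript and return a longer, still lazy, proxy.
        return LazyRef(self._name, self._chain + (("()", index),))

    def __getattr__(self, field):
        # Only reached for names not found normally; treat them as field access.
        if field.startswith("_"):
            raise AttributeError(field)
        return LazyRef(self._name, self._chain + ((".", field),))


def collected_subscripts(ref):
    """Return the variable name and the full chain a single subsref would need."""
    return ref._name, list(ref._chain)


name, chain = collected_subscripts(LazyRef("thing")[0:3].subthing[0].subsubthing)
print(name, chain)
```

The cost of this design is that something has to trigger the actual evaluation (an explicit call, or a conversion), since every intermediate expression is just another proxy.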
No stepping, ellipsis or newaxis in matlab
No stepping, ellipsis or newaxis in matlab. In python the dimensionality of an object is intrinsic; in matlab it's partly a function of the number of indices (so we can't make some matlab array look completely like an ndarray without converting it). There is also bizarre cell and struct indexing and assignment (see DEAL).
Maybe some of these can be worked out in mlabraw and the subclass of ndarray (i.e. in C). However, I'm not sure I totally understand what you mean. Please help clarify with an example.
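One reading of the dimensionality point, sketched in plain python: for a 2x3 matlab matrix A, the expressions A(4), A(2,3) and A(2,3,1) are all valid, because a single subscript is a column-major linear index and trailing subscripts of 1 are accepted. An ndarray of shape (2, 3) has no such flexibility. matlab_get below is a hypothetical emulation on a list of rows, not an mlabwrap function.

```python
def matlab_get(matrix, *subs):
    """Emulate matlab scalar indexing (1-based) on a 2-D list of rows."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    if len(subs) == 1:
        # One subscript: linear index, counted down the columns (column-major).
        col, row = divmod(subs[0] - 1, n_rows)
    else:
        # Two or more subscripts: any trailing ones must be 1 (singleton dims).
        row, col = subs[0] - 1, subs[1] - 1
        if any(s != 1 for s in subs[2:]):
            raise IndexError("index exceeds array dimensions")
    if not (0 <= row < n_rows and 0 <= col < n_cols):
        raise IndexError("index exceeds array dimensions")
    return matrix[row][col]


A = [[1, 2, 3],
     [4, 5, 6]]

print(matlab_get(A, 4))        # linear index: 4th element in column order
print(matlab_get(A, 2, 3))     # row 2, column 3
print(matlab_get(A, 2, 3, 1))  # same element, extra singleton subscript
```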
No end and 1-based indexing in python
No end and 1-based indexing in python. I think the best option is to fiddle mlab.x[-1] to x(end) etc. but it's worth noting that this is not completely general: if someone builds a class that uses negative subscripts in matlab with a different meaning from end-n, then this won't work (one could still use mlab.subsref though).
Would it be alright to ask the user to call idx = mlab.end(A) for this sort of indexing?
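A sketch of what fiddling mlab.x[-1] into x(end) could look like, assuming the proxy builds the matlab subscript as text. to_matlab_subscript is a hypothetical helper, and steps are deliberately not handled here.

```python
def to_matlab_subscript(index):
    """Render a python int or step-free slice as matlab subscript text."""

    def position(i):
        # 0-based position -> 1-based; negative values count back from `end`.
        if i >= 0:
            return str(i + 1)
        return "end" if i == -1 else "end-%d" % (-i - 1)

    if isinstance(index, slice):
        if index.step not in (None, 1):
            raise ValueError("steps are not handled in this sketch")
        first = "1" if index.start is None else position(index.start)
        if index.stop is None:
            last = "end"
        elif index.stop >= 0:
            # An exclusive 0-based stop equals an inclusive 1-based stop.
            last = str(index.stop)
        else:
            last = "end-%d" % -index.stop
        return "%s:%s" % (first, last)
    return position(index)


print(to_matlab_subscript(-1))               # end
print(to_matlab_subscript(slice(None, -1)))  # 1:end-1
print(to_matlab_subscript(slice(1, 3)))      # 2:3
```

As noted above, this only matches matlab classes whose subsref gives end its usual meaning. Python also silently clamps out-of-range slices where matlab raises an error, so the translation is not exact at the edges.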
Problems with Duck-typed Proxies
(Badly written) python code might choke on duck-typed ndarray-like proxies
True. For example: code that depends on the result of isinstance(duckObject, ndarray). Are there other examples that are important? Instead of duck-typing, can we actually subclass ndarray? What are some of the difficulties with this?
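The isinstance failure mode can be shown without numpy. In this sketch the standard library's array.array stands in for ndarray, and DuckArray is a hypothetical duck-typed proxy; both function names are made up for illustration.

```python
from array import array


class DuckArray:
    """Hypothetical proxy: supports len() and indexing but is not an array."""

    def __init__(self, values):
        self._values = list(values)

    def __len__(self):
        return len(self._values)

    def __getitem__(self, i):
        return self._values[i]


def strict_total(values):
    # Type-checking style that rejects duck-typed stand-ins.
    if not isinstance(values, array):
        raise TypeError("expected array, got %s" % type(values).__name__)
    return sum(values)


def tolerant_total(values):
    # Relies only on iteration, so any sequence-like object works.
    return sum(values)


duck = DuckArray([1, 2, 3])
print(tolerant_total(duck))
try:
    strict_total(duck)
except TypeError as exc:
    print("rejected:", exc)
```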
Python and Matlab Numeric Types
As I wrote before, there is no 1:1 correspondence between matlab and python types; often several plausible candidates exist for any one type, so any scheme will need to make some tradeoffs based on anticipated usage patterns.
It is not clear that this is a major problem. It seems that matlab numeric types all have corresponding python numeric types, and that python numeric types can be cast to the smallest matlab numeric type without data loss (or with the least data loss when a lossless cast is impossible). Please give an example.
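As a possible example of the "several plausible candidates" point: a plain python int such as 200 fits exactly into many matlab numeric classes, including double, which is what matlab itself uses for an unadorned literal. The sketch below simply lists the candidates; exact_matlab_classes is a hypothetical helper, and the choice among its results is the tradeoff under discussion.

```python
# Value ranges of matlab's built-in integer classes.
INTEGER_RANGES = [
    ("int8", -2**7, 2**7 - 1), ("uint8", 0, 2**8 - 1),
    ("int16", -2**15, 2**15 - 1), ("uint16", 0, 2**16 - 1),
    ("int32", -2**31, 2**31 - 1), ("uint32", 0, 2**32 - 1),
    ("int64", -2**63, 2**63 - 1), ("uint64", 0, 2**64 - 1),
]


def fits_double(n):
    """True if the python int survives a round trip through a 64-bit float."""
    try:
        return float(n) == n
    except OverflowError:
        return False


def exact_matlab_classes(n):
    """List every matlab numeric class that can hold the python int n exactly."""
    names = [name for name, low, high in INTEGER_RANGES if low <= n <= high]
    if fits_double(n):
        names.append("double")
    return names


print(exact_matlab_classes(200))
print(exact_matlab_classes(-1))
print(exact_matlab_classes(2**53 + 1))
```

Picking the smallest class (uint8 for 200) saves memory but changes arithmetic on the matlab side, since matlab integer classes saturate rather than promote; picking double matches what a matlab user would have typed.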
Matlab Operator Overloading
Operator overloading and mixed python-object/proxy-object addition, subscripting etc. are also likely to have some room for pitfalls.
Not sure what this is. Maybe an example could help clarify.
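One small example of such a pitfall, as a sketch: when a plain python object is on the left of the operator, the proxy is only consulted through its reflected method, so a proxy that defines __add__ alone breaks as soon as the operands are swapped. Both classes are hypothetical, and the strings merely show the call that would be sent to matlab (plus is matlab's function form of +).

```python
class Proxy:
    """Hypothetical proxy defining both forward and reflected addition."""

    def __init__(self, name):
        self.name = name

    def __add__(self, other):
        # proxy + other
        return "plus(%s, %r)" % (self.name, other)

    def __radd__(self, other):
        # other + proxy, reached after `other` declines the operation
        return "plus(%r, %s)" % (other, self.name)


class ForwardOnlyProxy:
    """Hypothetical proxy that forgot the reflected method."""

    def __init__(self, name):
        self.name = name

    def __add__(self, other):
        return "plus(%s, %r)" % (self.name, other)


print(Proxy("a") + 1)
print(1 + Proxy("a"))
try:
    1 + ForwardOnlyProxy("a")
except TypeError as exc:
    print("fails:", exc)
```

Operand order also has to be preserved in the reflected case, because matlab classes may overload plus non-commutatively.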
