'''Note:''' The MaskedArray branch has been merged into the numpy trunk.

For a high-level overview of nan and masked array support, see also the [http://new.scipy.org/Wiki/FAQ#head-fff4d6fce7528974185715153cfbc1a191dcb915 SciPy FAQ].

MaskedArray is a numpy ndarray look-alike that allows one to keep track of missing values. MaskedArray is implemented in the [source:trunk/numpy/core/ma.py numpy.core.ma] module and also in the maskedarray module in the sandbox (see below). The numpy ma module was originally written for Numeric by Paul Dubois and adapted for numpy by Travis Oliphant and (mainly) Paul Dubois.

----
= Migration Guide =

 1. a.mask and getmask(a) will now return the ma.nomask constant, which is defined as an array boolean scalar False_. If your code relies on an explicit "m is None" check, it should be changed to "m is nomask". In many cases this check will now be redundant because nomask provides the full array interface. For example, "m is None or not sometrue(m)" can now be written as "not m.any()".

= Missing features (work in progress) =

Some current features of numpy are not yet implemented for ma, either because they were introduced to numpy only recently (e.g. {{{ndim}}}?), or because they were never adapted to ma in the first place (e.g. the {{{mlab}}} package). As Paul Dubois noted, it does not make sense to extend the handling of missing values to all numpy features (a typical example would be the FFT package). However, ma is still invaluable in many cases, and it is unfortunate that its use is currently somewhat limited.

A non-exhaustive list of missing features is presented below. The features are grouped by the kind of problem they pose, together with naive suggestions for solving them. More features will be added as I run into them.

=== Case 1 ===

The function would work correctly with masked arrays if it called {{{ma.asarray}}} instead of {{{numeric.asarray}}} (which is what it currently calls).
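The effect described in Case 1 can be reproduced with a minimal sketch against present-day NumPy. It assumes that {{{np.asarray}}} and {{{np.ma.asarray}}} are the current counterparts of {{{numeric.asarray}}} and {{{ma.asarray}}}; the sample values are made up for illustration.

```python
import numpy as np

# A masked array whose middle element is missing (illustrative values).
x = np.ma.array([1.0, 2.0, 3.0], mask=[False, True, False])

# The plain converter returns a base ndarray, so the mask is dropped.
plain = np.asarray(x)

# The ma converter keeps the MaskedArray type and therefore the mask.
kept = np.ma.asarray(x)

print(type(plain).__name__)                # ndarray
print(type(kept).__name__)                 # MaskedArray
print(np.ma.getmaskarray(kept).tolist())   # [False, True, False]
```

Any function that converts its input with the plain converter therefore sees the underlying data of masked elements as if they were valid.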
A fix could be to add a {{{mask=False_}}} property by default to any {{{ndarray}}}, and get rid of the {{{MaskedArray}}} class? A second possibility would be to check in {{{numeric.asarray}}} whether the argument is already a (masked) array.

 * '''{{{diff}}}'''

=== Case 2 ===

The function can be applied to the data part alone once missing values are adequately filled. If needed, the masked version is obtained easily by applying the initial mask to the result. A {{{use_missing}}} option could be introduced to either take missing values into account (the output would be masked) or discard them (default option?).

 * '''{{{ndim}}}''': The masked array could inherit {{{ndim}}} from its {{{data}}} part. Implemented in changeset:2185.
 * '''{{{std, var}}}''': An example implementation of the function (not the method) {{{std}}} is given [attachment:ma_examples.py there]. A suggestion for the method implementation is presented [attachment:ma-200603280900.patch in this patch].
 * '''{{{trace}}}''': Fill with 0 if {{{use_missing}}} is False. Please check the [attachment:ma-20060321.patch attached patch] ([attachment:ma-20060321.patch?format=raw download]). Implemented in r2267.
 * '''{{{cumprod, cumsum}}}''':
   * {{{use_missing=True}}}: The output is masked for indices [''i''...''N''], where ''i'' is the index of the first missing value and ''N'' is the number of data points (including missing ones).
   * {{{use_missing=False}}}: Fill the missing values with 1 for {{{cumprod}}} or 0 for {{{cumsum}}}.
   A simple implementation of {{{cumprod}}} and {{{cumsum}}}, without the questionable {{{use_missing}}} flag, is suggested [attachment:ma-200603280900.patch in this patch].
 * '''{{{clip}}}''': Please check the [attachment:ma-20060321.patch attached patch] ([attachment:ma-20060321.patch?format=raw download]). Implemented in r2267.

=== Case 3 ===

The function must be applied to both the data part and the mask. I assume this is the case for most of the functions in '''{{{shape_base}}}''' and '''{{{index_trick}}}'''.
As an illustration, a quick and dirty adaptation of the concatenator {{{r_}}} could be:
{{{
#!python
# Collect the data parts and the masks separately, then rebuild a masked array.
mar_ = lambda seq: ma.array(data=[s.data for s in seq], mask=[s.mask for s in seq])
}}}

 * '''{{{swapaxes, squeeze}}}''': {{{swapaxes}}} is implemented in the [attachment:ma-20060321.patch attached patch] ([attachment:ma-20060321.patch?format=raw download]). Implemented in r2267.

=== Case 4 ===

The trickiest case, where missing values must remain masked during the process.

 * '''{{{median}}}''': The two functions in [attachment:ma_examples.py this attachment] (seem to) work well for 1D and 2D arrays. The problem gets more complex for higher dimensions.

----
= Ufuncs and Masked Arrays =

In changeset:1835, Travis added an {{{__array_wrap__}}} hook to the {{{MaskedArray}}} class. This was done in an attempt to fix mixed arithmetic. Unfortunately, there is not enough information within the {{{__array_wrap__}}} hook to correctly generate the mask:
{{{
>>> from numpy import *
>>> print ma.array([1])/ma.array([0])
[--]
>>> print array([1])/ma.array([0])
[0]
}}}

In order to fix this situation, more information has to be passed to the {{{__array_wrap__}}} hook by the ufunc. Sasha proposes to change the {{{__array_wrap__}}} signature from
{{{
def __array_wrap__(self, arr)
}}}
to
{{{
def __array_wrap__(self, arr, context)
}}}
and to make ufuncs pass a tuple context=(func, args, i), where func is the ufunc itself, args is the tuple of ufunc arguments and i is the index of self in the args tuple. Strictly speaking, i is not necessary, but it is available in the ufunc and may prove helpful in the future. The extended {{{__array_wrap__}}} is implemented in changeset:1898.

Once ufuncs can handle the case of mixed arguments to binary operations, it is tempting to get rid of the ma wrappers to ufuncs altogether and implement the ma logic entirely in the {{{__array_wrap__}}} and {{{__array__}}} hooks.
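The way the context tuple helps {{{__array_wrap__}}} can be illustrated with a toy, NumPy-free sketch. Everything in it is hypothetical: {{{ToyMasked}}} and {{{toy_divide}}} are made-up stand-ins for {{{MaskedArray}}} and a ufunc, and the masking rule for division by zero is an assumption chosen to reproduce the {{{[--]}}} result above. It is not the code from changeset:1898.

```python
class ToyMasked:
    """Hypothetical stand-in for MaskedArray: parallel lists of data and mask."""

    def __init__(self, data, mask=None):
        self.data = list(data)
        self.mask = [False] * len(self.data) if mask is None else list(mask)

    def __array_wrap__(self, arr, context=None):
        # Without context only our own mask is known, so an invalid
        # result such as 1/0 would be left unmasked.
        mask = list(self.mask)
        if context is not None:
            func, args, i = context
            # Combine the masks of every masked argument.
            for arg in args:
                for k, flag in enumerate(getattr(arg, "mask", [])):
                    mask[k] = mask[k] or flag
            # Knowing the function lets us mask results outside its domain.
            if func is toy_divide:
                divisor = getattr(args[1], "data", args[1])
                for k, value in enumerate(divisor):
                    mask[k] = mask[k] or value == 0
        return ToyMasked(arr, mask)

    def __repr__(self):
        cells = ["--" if m else str(d) for d, m in zip(self.data, self.mask)]
        return "[" + " ".join(cells) + "]"


def toy_divide(a, b):
    """Toy 'ufunc': elementwise division that wraps its result via the hook."""
    a_data = getattr(a, "data", a)
    b_data = getattr(b, "data", b)
    # Use a placeholder where the division is undefined.
    raw = [x / y if y != 0 else 0 for x, y in zip(a_data, b_data)]
    args = (a, b)
    # Hand the raw result to the first argument that defines the hook,
    # together with the context tuple (func, args, i).
    for i, arg in enumerate(args):
        if hasattr(arg, "__array_wrap__"):
            return arg.__array_wrap__(raw, (toy_divide, args, i))
    return raw


result = toy_divide([1], ToyMasked([0]))
print(result)  # [--]
```

Calling the hook without a context, as in {{{ToyMasked([1]).__array_wrap__([0])}}}, yields the unmasked {{{[0]}}}, which mirrors the incorrect mixed-arithmetic result shown earlier.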
Unfortunately, the {{{__array__}}} hook suffers from the same problem: before passing data to a ufunc, an ma array needs to replace masked values with something safe for the given operation. In order to do this, more information is needed than is passed to the {{{__array__}}} hook. Sasha proposes to make a change to the {{{__array__}}} hook similar to the one for the {{{__array_wrap__}}} hook above. The new signature will be
{{{
def __array__(self, dtype=None, context=None)
}}}
The ufuncs will pass to {{{__array__}}} a tuple context=(func, args, i), where func is the ufunc itself, args is the tuple of ufunc arguments and i is the index of self in the args tuple. For backward compatibility, ufuncs will allow a two-argument {{{__array__}}}, and classes that take advantage of context will define {{{__array__}}} with a default value for context so that it can be called with one or two arguments as well. Implemented in changeset:1929.

----
= Remaining Issues =

Some of the same issues that were resolved in numpy need to be revisited for ma. (See the [http://www.scipy.org/wikis/numdesign Numeric3.0 Design Document].)

 * What does single element indexing return? Scalars or rank-0 arrays?[[BR]] An additional complication is that the single element may be masked.[[BR]][[BR]]
 * What should single element indexing return for an unmasked element?[[BR]][[BR]]
 * What should single element indexing return for a masked element?[[BR]][[BR]] As of changeset:1882, the answer is ma.masked. The singleton ma.masked is defined in [source:tags/0.9.2/numpy/core/ma.py ma.py] as follows:
{{{
#!python
masked = MaskedArray([0], int, mask=[1])[0:0]
masked = masked[0:0]
}}}
This changed from MA, where masked was defined as a rank-0 array. This definition leads to some surprising properties:
{{{
>>> from numpy.core.ma import *
>>> x = array([1,2,3.0], mask=[0,1,0])
>>> x[1].shape
(0,)
}}}
At the same time
{{{
>>> x[0].shape
()
}}}
This can easily be fixed by changing the definition of "masked" back to a rank-0 array.
(Done in changeset:1888)
{{{
>>> x[1].dtype
}}}
At the same time
{{{
>>> x[0].dtype
}}}
Unlike the first problem, this one cannot be easily fixed without giving up the ability to check for missing values using
{{{
>>> x[1] is masked
True
>>> x[0] is masked
False
}}}
It is tempting to eliminate the special case and just use x[i].mask.all() and x[i].mask.any(), constructs that have a clear meaning for any number of elements. The downside of changing the return value of x[i] for masked elements is that "x[i] is masked" will silently break in a dangerous way: it will always be false. It may be safer to also change the name "masked" to, say, "missing" and educate users that "x[i] is masked" should be changed to x[i].mask.any(), x[i].mask.all() or even just x[i].mask as appropriate, and that "x[i] = masked" should be changed to "x[i] = missing".

 * Can arrays be used as truth values directly?

----
= An alternative implementation of MaskedArray =

[This page has been moved to the [http://projects.scipy.org/scipy/numpy/wiki/MaskedArrayAlternative MaskedArrayAlternative] page.]