| Version 8 (modified by sasha, 7 years ago) |
|---|
For a high-level overview of NaN and masked array support, see also the SciPy FAQ.
MaskedArray is a numpy ndarray look-alike that allows one to keep track of missing values.
MaskedArray is implemented in the [source:trunk/numpy/core/ma.py numpy.core.ma] module. The numpy ma module was originally written for Numeric by Paul Dubois and adapted for numpy by Travis Oliphant and (mainly) Paul Dubois.
Migration Guide
describe migration from Numeric.MA to numpy.ma here
Ufuncs and Masked Arrays
In changeset:1835, Travis added an __array_wrap__ hook to the MaskedArray class. This was done in an attempt to fix mixed arithmetic between plain and masked arrays. Unfortunately, there is not enough information within the __array_wrap__ hook to correctly generate the mask:
>>> from numpy import *
>>> print ma.array([1])/ma.array([0])
[--]
>>> print array([1])/ma.array([0])
[0]
In order to fix this situation, more information has to be passed to the __array_wrap__ hook by the ufunc. Sasha proposes to change the __array_wrap__ signature from
def __array_wrap__(self, arr)
to
def __array_wrap__(self, arr, context)
and make ufuncs pass a tuple context=(func, args, i), where func is the ufunc itself, args is the tuple of ufunc arguments and i is the index of self in the args tuple. Strictly speaking, i is not necessary, but it is available in ufunc and may prove to be helpful in the future.
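The proposed protocol can be illustrated with a small pure-Python sketch. It does not use numpy: ToyMasked and toy_divide are hypothetical stand-ins for MaskedArray and a ufunc, invented here only to show how a context tuple (func, args, i) gives the hook enough information to build the mask for the mixed case above.

```python
class ToyMasked:
    """Hypothetical stand-in for MaskedArray: a list of values plus a mask."""

    def __init__(self, data, mask=None):
        self.data = list(data)
        self.mask = list(mask) if mask is not None else [False] * len(self.data)

    def __array_wrap__(self, arr, context=None):
        if context is None:
            # Old-style call: nothing is known about the operation,
            # so the best we can do is reuse our own mask.
            return ToyMasked(arr, self.mask)
        func, args, i = context
        # Start from the union of the masks of all masked inputs.
        mask = [False] * len(arr)
        for a in args:
            if isinstance(a, ToyMasked):
                mask = [m or am for m, am in zip(mask, a.mask)]
        # Knowing which function ran lets us mask its invalid results.
        if func is toy_divide:
            divisor = values(args[1])
            mask = [m or d == 0 for m, d in zip(mask, divisor)]
        return ToyMasked(arr, mask)


def values(obj):
    """Return the raw values of a ToyMasked or of a plain sequence."""
    return obj.data if isinstance(obj, ToyMasked) else list(obj)


def toy_divide(x, y):
    """Toy 'ufunc': divide elementwise, then call the wrap hook with context."""
    raw = [a / b if b != 0 else 0 for a, b in zip(values(x), values(y))]
    args = (x, y)
    for i, a in enumerate(args):
        if hasattr(a, "__array_wrap__"):
            return a.__array_wrap__(raw, (toy_divide, args, i))
    return raw


result = toy_divide([1], ToyMasked([0]))
print(result.mask)
```

In this sketch the plain-list numerator no longer hides the division by zero, because the hook can inspect the divisor through args.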
Once ufuncs can handle the case of mixed arguments to binary operations, it is tempting to get rid of the ma wrappers to ufuncs altogether and implement the ma logic entirely in the __array_wrap__ and __array__ hooks. Unfortunately, the __array__ hook suffers from the same problem: before passing data to a ufunc, an ma array needs to replace masked values with something safe for the given operation. In order to do this, more information is needed than is passed to the __array__ hook. Sasha proposes to make a similar change to the __array__ hook as for the __array_wrap__ hook above.
The new signature will be
def __array__(self, dtype=None, context=None)
The ufuncs will pass to __array__ a tuple context=(func, args, i), where func is the ufunc itself, args is the tuple of ufunc arguments and i is the index of self in the args tuple. For backward compatibility, ufuncs will allow a two-argument __array__, and classes that take advantage of context will define __array__ with a default value for context so that it can still be called with one or two arguments.
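A minimal pure-Python sketch of the same idea for __array__ follows. Again, ToyMasked and toy_divide are hypothetical stand-ins rather than numpy objects, and returning a plain list from __array__ is a simplification. The choice of fill values (1 for a masked divisor, 0 otherwise) is an assumption made for illustration.

```python
class ToyMasked:
    """Hypothetical stand-in for MaskedArray: values plus a boolean mask."""

    def __init__(self, data, mask):
        self.data = list(data)
        self.mask = list(mask)

    def __array__(self, dtype=None, context=None):
        # Default fill when nothing is known about the operation.
        fill = 0
        if context is not None:
            func, args, i = context
            # As the divisor (position 1) of a division, 1 is a safe fill.
            if func is toy_divide and i == 1:
                fill = 1
        out = [fill if m else v for v, m in zip(self.data, self.mask)]
        return [dtype(v) for v in out] if dtype is not None else out


def toy_divide(x, y):
    """Toy 'ufunc': convert inputs via __array__ with context, then divide."""
    args = (x, y)
    plain = []
    for i, a in enumerate(args):
        if hasattr(a, "__array__"):
            plain.append(a.__array__(None, (toy_divide, args, i)))
        else:
            plain.append(list(a))
    return [p / q for p, q in zip(*plain)]


y = ToyMasked([0, 2], [True, False])
print(toy_divide([1, 4], y))
```

Because context has a default value, the same method still works when called without it, which is the backward-compatibility property described above.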
Remaining Issues
Some of the same issues that were resolved in numpy need to be revisited for ma. (See Numeric3.0 Design Document)
- What does single element indexing return? Scalars or rank-0 arrays?
An additional complication is that the single element may be masked.
- What should a single element indexing return for an unmasked element?
- What should a single element indexing return for a masked element?
As of changeset:1882, the answer is ma.masked. The singleton ma.masked is defined in [source:tags/0.9.2/numpy/core/ma.py ma.py] as follows:
masked = MaskedArray([0], int, mask=[1])[0:0]
masked = masked[0:0]
This changed from MA, where masked was defined as a rank-0 array. This definition leads to some surprising properties:
>>> from numpy.core.ma import *
>>> x = array([1,2,3.0], mask=[0,1,0])
>>> x[1].shape
(0,)
At the same time
>>> x[0].shape
()
This can easily be fixed by changing the definition of "masked" back to a rank-0 array. (Done in changeset:1888)
The dtype is affected as well:
>>> x[1].dtype
<type 'int64_arrtype'>
At the same time
>>> x[0].dtype
<type 'float64_arrtype'>
Unlike the first problem, this one cannot be easily fixed without giving up the ability to check for missing values using
>>> x[1] is masked
True
>>> x[0] is masked
False
It is tempting to eliminate the special case and just use x[i].mask.all() and x[i].mask.any(), the constructs that have clear meaning for any number of elements. The downside of changing the return value of x[i] for masked elements is that "x[i] is masked" will silently break in a dangerous way - it will always be false.
It may be safer to also change the name "masked" to say "missing" and educate users that x[i] is masked should be changed to x[i].mask.any(), x[i].mask.all() or even just x[i].mask as appropriate and x[i] = masked should be changed to x[i] = missing.
- Can arrays be used as truth values directly?
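The risk with the identity check discussed above can be shown with a toy sketch. ToyElement, getitem_singleton and getitem_fresh are hypothetical names that model only the two possible return conventions for x[i], not the real ma implementation.

```python
class ToyElement:
    """Hypothetical result of single element indexing: a value and a mask flag."""

    def __init__(self, value, mask):
        self.value = value
        self.mask = mask


# Current convention: one shared singleton stands for every masked element.
masked = ToyElement(0, True)


def getitem_singleton(values, mask, i):
    """Return the shared singleton for masked elements."""
    return masked if mask[i] else ToyElement(values[i], False)


def getitem_fresh(values, mask, i):
    """Alternative convention: always return a new element carrying its mask."""
    return ToyElement(values[i], mask[i])


values, mask = [1, 2, 3.0], [False, True, False]
print(getitem_singleton(values, mask, 1) is masked)
print(getitem_fresh(values, mask, 1) is masked)
print(getitem_fresh(values, mask, 1).mask)
```

Under the second convention the identity test is false even for a masked element, while the mask attribute still carries the correct answer.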
Attachments
- ma_examples.py (2.3 KB), added by Pierre GM 7 years ago. Ideas of implementation of std & median for masked arrays
- ma-20060321.patch (3.2 KB), added by pgmdevlist@… 7 years ago. Suggestions for a 'ma' patch
- ma-200603280900.patch (3.1 KB), added by Pierre GM 7 years ago. New patch for MA
- testnewma.py (5.7 KB), added by Pierre GM 7 years ago. New test suite for MA patch
