'''Note: the new implementation of MaskedArray is now available in the SciPy sandbox.'''

As a regular user of MaskedArray, I (Pierre G.F. Gerard-Marchant) became increasingly frustrated when subclassing masked arrays (although I can only blame my own inexperience). I needed a class of arrays that could store some additional information along with the numerical values while still allowing for missing data (picture a series of dates stored along with measurements). I started to implement such a class, but quickly realized that the additional information disappeared whenever these arrays were processed: adding a constant to one of them, for example, erased its dates. I ended up writing the equivalent of {{{numpy.core.ma}}} for my particular class, ufuncs included. Everything went fine until I needed to subclass my new class, when more problems showed up: some attributes of the new subclass were lost during processing. I identified the culprit as MaskedArray, which returned masked ndarrays where I expected masked arrays of my own class. I was preparing to rewrite {{{numpy.core.ma}}} when I forced myself to learn how to subclass ndarrays. As I became more familiar with the {{{__new__}}} and {{{__array_finalize__}}} methods, I started to wonder why masked arrays were plain objects rather than ndarrays, and whether subclassing would not be more convenient if they behaved like regular ndarrays.
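To illustrate the {{{__new__}}}/{{{__array_finalize__}}} mechanism mentioned above, here is a minimal sketch following the pattern of the NumPy subclassing guide, assuming a recent NumPy. {{{DatedArray}}} and its {{{dates}}} attribute are hypothetical names standing in for the "dates along with measurements" case; the point is that the extra information survives both adding a constant and slicing.

```python
import numpy as np


class DatedArray(np.ndarray):
    """Hypothetical ndarray subclass that carries a `dates` attribute."""

    def __new__(cls, values, dates=None):
        # View-cast the input to this subclass, then attach the extra info.
        obj = np.asarray(values).view(cls)
        obj.dates = dates
        return obj

    def __array_finalize__(self, obj):
        # Called whenever a new DatedArray is created: view casting,
        # slicing (new-from-template) and ufunc results. Copy `dates`
        # from the source array so the information is not lost.
        if obj is None:
            return
        self.dates = getattr(obj, "dates", None)


x = DatedArray([1.0, 2.0, 3.0], dates=["2007-01", "2007-02", "2007-03"])
y = x + 10   # ufunc result: still a DatedArray, dates kept
z = x[1:]    # slice: still a DatedArray, dates kept
print(type(y).__name__, y.dates)
print(type(z).__name__, z.dates)
```

This simple version does not slice {{{dates}}} along with the values: {{{z.dates}}} is still the full list.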

The new {{{maskedarray}}} is what I eventually came up with. The main differences from the original {{{numpy.core.ma}}} package are that {{{MaskedArray}}} is now a subclass of {{{ndarray}}} and that the {{{_data}}} part can be any subclass of {{{ndarray}}} (this should work in most cases, though some tweaking might be required here and there). Apart from a few issues listed below, the new {{{MaskedArray}}} class reproduces the behavior of the old one. Initially the maskedarray implementation was marginally slower than numpy.ma in some areas, but work is underway to speed it up, and the expectation is that it can be made substantially faster than the present numpy.ma.
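A quick check of the two structural points above, assuming a recent NumPy whose {{{numpy.ma}}} module descends from this package (the sandbox version may differ in details). {{{Tagged}}} is a hypothetical, deliberately empty ndarray subclass used only as the data part.

```python
import numpy as np


class Tagged(np.ndarray):
    """Hypothetical, deliberately trivial ndarray subclass."""


base = np.arange(5).view(Tagged)
m = np.ma.masked_array(base, mask=[0, 0, 1, 0, 0])

print(isinstance(m, np.ndarray))   # a MaskedArray is itself an ndarray
print(type(m.data).__name__)       # the data part keeps its subclass
print(m.count())                   # number of unmasked values
```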

I also attach a unittest suite, modeled after the standard numpy one, along with some testing utilities. The old {{{test_ma}}} suite can also be run against the new package, but it fails in some places; see below.

Note that if the subclass has special methods or attributes, they are not propagated to the masked version. Propagating them would require modifying the {{{__getattribute__}}} method (first trying {{{ndarray.__getattribute__}}}, then falling back to {{{self._data.__getattribute__}}} if the first lookup raises an exception), which slows things down considerably.
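The lookup order described above can be sketched in plain Python. {{{DelegatingWrapper}}} and {{{DatedSeries}}} are hypothetical classes: the wrapper's own attributes stand in for {{{ndarray.__getattribute__}}}, and the fallback goes to the wrapped {{{_data}}} object.

```python
class DelegatingWrapper:
    """Hypothetical wrapper mimicking the lookup order described above."""

    def __init__(self, data):
        self._data = data

    def __getattribute__(self, name):
        try:
            # 1. Normal lookup on the wrapper itself
            #    (stand-in for ndarray.__getattribute__).
            return object.__getattribute__(self, name)
        except AttributeError:
            # 2. Fallback: look the name up on the wrapped data.
            data = object.__getattribute__(self, "_data")
            return getattr(data, name)


class DatedSeries:
    """Hypothetical data class with an extra attribute and method."""

    def __init__(self, dates):
        self.dates = dates

    def first_date(self):
        return self.dates[0]


w = DelegatingWrapper(DatedSeries(["2007-01", "2007-02"]))
print(w.first_date())   # not defined on the wrapper, found on w._data
```

Because {{{__getattribute__}}} is overridden in Python, every attribute access, including those that succeed immediately, goes through this method, which is where the slowdown comes from. Python's {{{__getattr__}}} hook, which is called only after normal lookup fails, is a lighter variant of the same idea.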

=== Main differences ===
 * The {{{_data}}} part of the masked array can be any subclass of ndarray (but not recarray; see below).
 * {{{fill_value}}} is now a property, not a function.
 * In most cases, the mask is forced to {{{nomask}}} when no value is actually masked. A notable exception is a masked array (with no masked values) that has just been unpickled.
 * I got rid of the {{{share_mask}}} flag; I never understood its purpose.
 * {{{put}}}, {{{putmask}}} and {{{take}}} now mimic the corresponding ndarray methods, to avoid unpleasant surprises. Moreover, {{{put}}} and {{{putmask}}} both update the mask when needed.
 * If {{{a}}} is a masked array, {{{bool(a)}}} raises a {{{ValueError}}}, as it does with ndarrays.
 * Likewise, comparing two masked arrays yields a masked array, not a boolean.
 * {{{filled(a)}}} returns an array of the same subclass as {{{a._data}}}, and no test is performed on whether it is contiguous.
 * The mask is always printed, even when it is {{{nomask}}}, which makes it easier (for me at least) to remember that a masked array is being used.
 * {{{cumsum}}} works as if the {{{_data}}} array were filled with 0. The mask is preserved, but not updated.
 * {{{cumprod}}} works as if the {{{_data}}} array were filled with 1. The mask is preserved, but not updated.
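Several of the differences listed above can be observed directly, assuming a recent NumPy whose {{{numpy.ma}}} descends from this package; the sandbox version described here may differ in details.

```python
import numpy as np

a = np.ma.array([1, 2, 3, 4, 5], mask=[0, 0, 1, 0, 0])

# fill_value is a property, not a function call.
print(a.fill_value)

# bool() on a multi-element masked array raises ValueError, as for ndarrays.
try:
    bool(a)
    bool_raises = False
except ValueError:
    bool_raises = True
print("bool(a) raises ValueError:", bool_raises)

# Comparisons return a masked array, not a single boolean.
eq = a == a
print(type(eq).__name__, eq.mask.tolist())

# filled() replaces masked entries with the given value.
print(a.filled(0).tolist())

# cumsum/cumprod treat masked entries as 0/1; the original mask is kept.
print(a.cumsum().compressed().tolist(), a.cumprod().compressed().tolist())

# put() unmasks the position it writes to; with nothing left masked,
# the mask collapses back to nomask.
b = a.copy()
b.put(2, 30)
print(b.tolist(), np.ma.getmask(b) is np.ma.nomask)
```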

=== New features ===
 * The {{{mr_}}} function mimics {{{r_}}} for masked arrays.
 * The {{{anom}}} method returns the anomalies (deviations from the average).
 * The {{{stdu}}} and {{{varu}}} methods return unbiased estimates of the standard deviation and variance, respectively.
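A sketch of these features, again assuming a recent NumPy's {{{numpy.ma}}}. {{{stdu}}} and {{{varu}}} are not assumed to exist there; instead the sketch passes {{{ddof=1}}} to {{{std}}} and {{{var}}}, which divides by n-1 and gives the same kind of unbiased estimates.

```python
import numpy as np

# mr_ concatenates like r_ but keeps the masks of its inputs.
x = np.ma.array([1.0, 2.0], mask=[0, 1])
y = np.ma.mr_[x, [3.0, 4.0]]
print(y.mask.tolist())

# anom() returns deviations from the mean of the unmasked values.
z = np.ma.array([1.0, 2.0, 3.0, 99.0], mask=[0, 0, 0, 1])
print(z.anom().compressed().tolist())

# Unbiased (n-1) estimates over the unmasked values.
print(z.std(ddof=1), z.var(ddof=1))
```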

=== Using the new package with numpy.core.ma ===
I tried to make sure that the new package can understand old masked arrays. Unfortunately, the converse does not hold: the old package does not understand new masked arrays.
For example:
{{{
>>> import numpy.core.ma as old_ma
>>> import maskedarray as new_ma
>>> x = old_ma.array([1,2,3,4,5], mask=[0,0,1,0,0])
>>> x
array(data =
 [ 1 2 999999 4 5],
 mask =
 [False False True False False],
 fill_value=999999)
>>> y = new_ma.array([1,2,3,4,5], mask=[0,0,1,0,0])
>>> y
array(data = [1 2 -- 4 5],
 mask = [False False True False False],
 fill_value=999999)
>>> x==y
array(data =
 [True True True True True],
 mask =
 [False False True False False],
 fill_value=?)
>>> old_ma.getmask(x) == new_ma.getmask(x)
array([True, True, True, True, True], dtype=bool)
>>> old_ma.getmask(y) == new_ma.getmask(y)
array([True, True, False, True, True], dtype=bool)
>>> old_ma.getmask(y)
False
}}}

=== Using maskedarray with matplotlib ===
By default, matplotlib still uses numpy.ma, but there is an rcParams setting you can use to select maskedarray instead. In the matplotlibrc file you will find:

{{{
#maskedarray : False # True to use external maskedarray module
# instead of numpy.ma; this is a temporary
# setting for testing maskedarray.
}}}

Uncomment this line and set it to True to use maskedarray everywhere. Alternatively, you can test a script with maskedarray by passing a command-line option, e.g.:

{{{
python simple_plot.py --maskedarray
}}}
[This page has been moved to the [http://projects.scipy.org/scipy/numpy/wiki/MaskedArrayAlternative MaskedArrayAlternative] page.]