Changes between Version 20 and Version 21 of MaskedArray

Timestamp:
10/16/06 01:11:32 (7 years ago)
Author:
pierregm
  • MaskedArray

    v20 v21  
    149149 * Can arrays be used as truth values directly? 
    150150 
     151---- 
     152= An alternative implementation of MaskedArray = 
     153 
     154As a regular user of MaskedArray, I became increasingly frustrated with the subclassing of masked arrays (even if I can only blame my inexperience). I needed to develop a class of arrays that could store some additional information along with numerical values, while keeping the possibility for missing data (picture storing a series of dates along with measurements). I started to implement such a class, but then quickly realized that any additional information disappeared when processing these subarrays (for example, adding a constant value to a subarray would erase its dates). I ended up writing the equivalent of numpy.core.ma for my particular class, ufuncs included. Everything went fine until I needed to subclass my new class, when more problems showed up: some attributes of the new subclass were lost during processing. I identified the culprit as MaskedArray, which returns masked ndarrays when I expected masked arrays of my class. I was preparing myself to rewrite numpy.core.ma when I forced myself to learn how to subclass ndarrays. As I became more familiar with the {{{__new__}}} and {{{__array_finalize__}}} methods, I started to wonder why masked arrays were objects, and not ndarrays, and whether it wouldn't be more convenient for subclassing if they did behave like regular ndarrays. 
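The subclassing mechanism mentioned above can be sketched as follows. This is only a minimal illustration of `__new__` and `__array_finalize__`, not code from the attached package: the class name `DatedArray` and its `dates` attribute are hypothetical, chosen to match the "dates along with measurements" example.

```python
import numpy as np


class DatedArray(np.ndarray):
    """Hypothetical ndarray subclass carrying a `dates` attribute."""

    def __new__(cls, values, dates=None):
        # View-cast the input data to this class, then attach the extra info.
        obj = np.asarray(values).view(cls)
        obj.dates = dates
        return obj

    def __array_finalize__(self, obj):
        # Called on explicit construction (obj is None), on view casting,
        # and when a new array is created from a template (slices, ufuncs).
        if obj is None:
            return
        # Copy the extra attribute from the array we were derived from.
        self.dates = getattr(obj, "dates", None)


series = DatedArray([1.0, 2.0, 3.0],
                    dates=["2006-10-14", "2006-10-15", "2006-10-16"])
shifted = series + 10.0  # the result is still a DatedArray with its dates
```

Note that this sketch copies `dates` as-is: a slice such as `series[1:]` keeps the full list, so a real class would also need to index the extra information.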
     155 
     156The attachment is what I eventually came up with. The main differences with the initial {{{numpy.core.ma}}} package are that {{{MaskedArray}}} is now a subclass of {{{ndarray}}} and that the {{{_data}}} section can now be any subclass of {{{ndarray}}} (well, it should work in most cases; some tweaking might be required here and there). Apart from a couple of issues listed below, the behavior of the new {{{MaskedArray}}} class reproduces the old one. It is quite likely to be significantly slower, though: I was more interested in a clear organization than in performance, so I tended to use wrappers liberally. I'm sure we can improve that rather easily. Note that I didn't try to time any methods. 
     157I also attach a unittest suite (here), modeled after the standard numpy one, along with some utilities for testing (here). The old {{{test_ma}}} can also be run with the new package, but it does fail in some places, see below. 
     158 
     159=== Main differences === 
     160  * {{{fill_value}}} is now a property, not a function. 
     161  * in the majority of cases, the mask is forced to {{{nomask}}} when no value is actually masked. A notable exception is when a masked array (with no masked values) has just been unpickled. 
     162  * I got rid of the {{{share_mask}}} flag, as I never understood its purpose. 
     163  * {{{put}}}, {{{putmask}}} and {{{take}}} now mimic the ndarray methods, to avoid unpleasant surprises. Moreover, {{{put}}} and {{{putmask}}} both update the mask when needed. 
     164  * if {{{a}}} is a masked array, {{{bool(a)}}} raises a {{{ValueError}}}, as it does with ndarrays. 
     165  * in the same way, the comparison of two masked arrays is a masked array, not a boolean. 
     166  * {{{filled(a)}}} returns an array of the same subclass as {{{a._data}}}, and no test is performed on whether it is contiguous or not. 
     167  * the mask is always printed, even if it's {{{nomask}}}, which makes it easier (for me at least) to remember that a masked array is used. 
     168  * {{{cumsum}}} works as if the {{{_data}}} array was filled with 0. The mask is preserved, but not updated. 
     169  * {{{cumprod}}} works as if the {{{_data}}} array was filled with 1. The mask is preserved, but not updated. 
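Some of the points above can be checked interactively. The sketch below assumes a current NumPy, where `numpy.ma` also implements `MaskedArray` as an `ndarray` subclass; it stands in for the attached package, which may behave differently in its details.

```python
import numpy as np
import numpy.ma as ma

a = ma.array([1, 2, 3, 4], mask=[0, 0, 1, 0])
b = ma.array([1, 2, 3, 5], mask=[0, 0, 0, 0])

# bool() of a multi-element masked array raises, as it does for ndarrays.
try:
    bool(a)
    truth_error = None
except ValueError as exc:
    truth_error = exc

# Comparing two masked arrays yields a masked array, not a single boolean.
eq = (a == b)

# cumsum/cumprod treat masked entries as 0 and 1 respectively, and the
# result stays masked at the same positions as the input.
running_sum = a.cumsum()
running_prod = a.cumprod()

print(running_sum.filled(-1))   # masked slot shown as -1
print(running_prod.filled(-1))
```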
     170 
     171=== New features === 
     172  * the {{{mr_}}} function mimics {{{r_}}} for masked arrays. 
     173  * the {{{anom}}} method returns the anomalies (deviations from the average). 
     174  * the {{{stdu}}} and {{{varu}}} methods return unbiased estimates of the standard deviation and variance, respectively. 
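To illustrate what these features compute, here is a rough sketch using a current NumPy's `numpy.ma` as a stand-in. It is an assumption that the attached package matches it: `mr_` and `anom` exist in `numpy.ma`, but `stdu` and `varu` do not, so the unbiased estimates are obtained with `ddof=1` instead.

```python
import numpy as np
import numpy.ma as ma

x = ma.array([1, 2, 3, 4, 5], mask=[0, 0, 1, 0, 0])

# Anomalies: deviations from the mean of the unmasked values (1, 2, 4, 5).
anomalies = x.anom()

# Unbiased estimates divide by (n - 1), with n the number of unmasked
# values; ddof=1 plays the role described for varu and stdu.
unbiased_var = x.var(ddof=1)
unbiased_std = x.std(ddof=1)

# mr_ concatenates masked arrays and scalars, like r_ does for ndarrays.
joined = ma.mr_[x, 6, 7]
```

With four unmasked values whose mean is 3, the squared deviations sum to 10, so the unbiased variance is 10/3.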
     175 
     176=== Using the new package with numpy.core.ma === 
     177  I tried to make sure that the new package can understand old masked arrays. Unfortunately, the reverse does not hold: {{{numpy.core.ma}}} does not recognize the new masked arrays. 
     178For example: 
     179{{{ 
     180>>> import numpy.core.ma as old_ma 
     181>>> import maskedarray as new_ma 
     182>>> x = old_ma.array([1,2,3,4,5], mask=[0,0,1,0,0]) 
     183>>> x 
     184array(data = 
     185 [     1      2 999999      4      5], 
     186      mask = 
     187 [False False True False False], 
     188      fill_value=999999) 
     189>>> y = new_ma.array([1,2,3,4,5], mask=[0,0,1,0,0]) 
     190>>> y 
     191array(data = [1 2 -- 4 5], 
     192      mask = [False False True False False], 
     193      fill_value=999999) 
     194>>> x==y 
     195array(data = 
     196 [True True True True True], 
     197      mask = 
     198 [False False True False False], 
     199      fill_value=?) 
     200>>> old_ma.getmask(x) == new_ma.getmask(x) 
     201array([True, True, True, True, True], dtype=bool) 
     202>>> old_ma.getmask(y) == new_ma.getmask(y) 
     203array([True, True, False, True, True], dtype=bool) 
     204>>> old_ma.getmask(y) 
     205False 
     206}}} 
     207  A basic consequence is that {{{matplotlib}}} will not recognize new masked arrays as such. The file {{{matplotlib/numerix/ma/__init__.py}}} must be modified to call the new package instead of {{{numpy.core.ma}}}. 
     208 
     209   
     210Please note that it's still a work in progress (even if it seems to work quite OK when I use it). Suggestions, comments, improvements and general feedback are more than welcome! 