Changes between Version 33 and Version 34 of MaskedArray

Timestamp:
08/25/07 21:31:20 (6 years ago)
Author:
pierregm
Comment:

--

  • MaskedArray

    v33 v34  
    152 152 = An alternative implementation of MaskedArray = 
    153 153 
    154 '''Note: the new implementation of MaskedArray is now available in the scipy sandbox. ''' 
    155  
    156 As a regular user of MaskedArray, I (Pierre G.F. Gerard-Marchant) became increasingly frustrated with the subclassing of masked arrays (even if I can only blame my inexperience). I needed to develop a class of arrays that could store some additional information along with numerical values, while keeping the possibility for missing data (picture storing a series of dates along with measurements). I started to implement such a class, but then quickly realized that any additional information disappeared when processing these subarrays (for example, adding a constant value to a subarray would erase its dates). I ended up writing the equivalent of {{{numpy.core.ma}}} for my particular class, ufuncs included. Everything went fine until I needed to subclass my new class, when more problems showed up: some attributes of the new subclass were lost during processing. I identified the culprit as MaskedArray, which returns masked ndarrays when I expected masked arrays of my class. I was preparing myself to rewrite numpy.core.ma when I forced myself to learn how to subclass ndarrays. As I became more familiar with the {{{__new__}}} and {{{__array_finalize__}}} methods, I started to wonder why masked arrays were objects, and not ndarrays, and whether it wouldn't be more convenient for subclassing if they did behave like regular ndarrays. 
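The attribute loss described above can be illustrated with a minimal sketch of an `ndarray` subclass. The class name `Dated` and its `dates` attribute are hypothetical, chosen only to match the "dates along with measurements" picture; this is not code from the package. It shows how `__new__` and `__array_finalize__` can carry extra information through an arithmetic operation.

```python
import numpy as np


class Dated(np.ndarray):
    """Hypothetical ndarray subclass that carries a `dates` attribute."""

    def __new__(cls, values, dates=None):
        # View the input as an instance of this class, then attach the extra info.
        obj = np.asarray(values).view(cls)
        obj.dates = dates
        return obj

    def __array_finalize__(self, obj):
        # Called for views, slices and ufunc results; copy the extra info
        # from the array this one was derived from, if it has any.
        if obj is None:
            return
        self.dates = getattr(obj, "dates", None)


series = Dated([1.0, 2.0, 3.0], dates=["2007-01", "2007-02", "2007-03"])
shifted = series + 10  # adding a constant keeps both the class and the dates
```

Without the `__array_finalize__` hook, `shifted` would still be a `Dated` instance but would have no `dates` attribute.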
    157  
    158 The new {{{maskedarray}}} is what I eventually came up with. The main differences with the initial {{{numpy.core.ma}}} package are that {{{MaskedArray}}} is now a subclass of {{{ndarray}}} and that the {{{_data}}} section can now be any subclass of {{{ndarray}}} (well, it should work in most cases; some tweaking might be required here and there). Apart from a couple of issues listed below, the behavior of the new {{{MaskedArray}}} class reproduces the old one. Initially the maskedarray implementation was marginally slower than numpy.ma in some areas, but work is underway to speed it up; the expectation is that it can be made substantially faster than the present numpy.ma. 
    159   
    160 I also attach a unittest suite, modeled after the standard numpy one, along with some utilities for testing. The old {{{test_ma}}} can also be run with the new package but it does fail in some places, see below. 
    161  
    162 Note that if the subclass has some special methods and attributes, they are not propagated to the masked version: this would require a modification of the {{{__getattribute__}}} method (first trying {{{ndarray.__getattribute__}}}, then trying {{{self._data.__getattribute__}}} if an exception is raised in the first place), which really slows things down.  
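The fallback lookup mentioned above might look roughly like the following sketch. Both classes (`Tagged`, `Delegating`) and the `units` attribute are hypothetical illustrations, not part of the package; the point is only to show the "try `ndarray.__getattribute__` first, then ask `_data`" order, which adds a try/except to every attribute access.

```python
import numpy as np


class Tagged(np.ndarray):
    """Hypothetical data subclass with one extra attribute, `units`."""

    def __new__(cls, values, units=None):
        obj = np.asarray(values).view(cls)
        obj.units = units
        return obj


class Delegating(np.ndarray):
    """Hypothetical wrapper that falls back to the attributes of `_data`."""

    def __new__(cls, data):
        obj = np.asarray(data).view(cls)
        obj._data = data  # keep the original object, subclass included
        return obj

    def __array_finalize__(self, obj):
        self._data = getattr(obj, "_data", None)

    def __getattribute__(self, name):
        try:
            # First attempt: the regular ndarray lookup.
            return np.ndarray.__getattribute__(self, name)
        except AttributeError:
            # Second attempt: delegate to the wrapped data object.
            data = np.ndarray.__getattribute__(self, "_data")
            return getattr(data, name)


wrapped = Delegating(Tagged([1, 2, 3], units="mm"))
units = wrapped.units  # not defined on Delegating, found on wrapped._data
```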
    163  
    164 === Main differences === 
    165   * The {{{_data}}} part of the masked array can be any subclass of ndarray (but not recarray, cf below). 
    166   * {{{fill_value}}} is now a property, not a function. 
    167   * in the majority of cases, the mask is forced to {{{nomask}}} when no value is actually masked. A notable exception is when a masked array (with no masked values) has just been unpickled. 
    168   * I got rid of the {{{share_mask}}} flag, as I never understood its purpose.  
    169   * {{{put}}}, {{{putmask}}} and {{{take}}} now mimic the ndarray methods, to avoid unpleasant surprises. Moreover, {{{put}}} and {{{putmask}}} both update the mask when needed. 
    170   * if {{{a}}} is a masked array, {{{bool(a)}}} raises a {{{ValueError}}}, as it does with ndarrays. 
    171   * in the same way, the comparison of two masked arrays is a masked array, not a boolean 
    172   * {{{filled(a)}}} returns an array of the same subclass as {{{a._data}}}, and no test is performed on whether it is contiguous or not. 
    173   * the mask is always printed, even if it's {{{nomask}}}, which makes it easier (for me at least) to remember that a masked array is used. 
    174   * {{{cumsum}}} works as if the {{{_data}}} array was filled with 0. The mask is preserved, but not updated. 
    175   * {{{cumprod}}} works as if the {{{_data}}} array was filled with 1. The mask is preserved, but not updated. 
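A few of the differences listed above can be tried out in a short session. This sketch assumes a present-day `numpy.ma`, which descends from this implementation; it was not run against the sandbox `maskedarray` package, so small details may differ there.

```python
import numpy as np
import numpy.ma as ma

a = ma.array([1, 2, 3, 4], mask=[0, 1, 0, 0])

# fill_value is read and written as an attribute, not called as a function.
a.fill_value = -1
filled = a.filled()  # plain ndarray, masked entry replaced by fill_value

# bool() on a multi-element masked array raises, as it does for ndarrays.
try:
    bool(a)
    truth_error = None
except ValueError as exc:
    truth_error = exc

# Comparing masked arrays gives a masked array, not a single boolean.
comparison = (a == a)

# cumsum/cumprod treat masked entries as 0/1 and keep the input mask.
running_sum = a.cumsum()
running_prod = a.cumprod()
```

Here `running_sum` holds 1, masked, 4, 8 and `running_prod` holds 1, masked, 3, 12: the masked position stays masked, and the later entries are computed as if it were 0 or 1 respectively.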
    176  
    177 === New features === 
    178   * the {{{mr_}}} function mimics {{{r_}}} for masked arrays. 
    179   * the {{{anom}}} method returns the anomalies (deviations from the average) 
    180   * the {{{stdu}}} and {{{varu}}} return unbiased estimates of the standard deviation and variance, respectively. 
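As a rough illustration of these features, the sketch below again assumes a present-day `numpy.ma` rather than the sandbox package. `mr_` and `anom` are used as named above; `stdu` and `varu` are not called here, and `var(ddof=1)` / `std(ddof=1)` are swapped in as an assumed equivalent for the unbiased estimates.

```python
import numpy.ma as ma

# mr_ concatenates like r_, carrying the mask of its masked inputs along.
left = ma.array([1.0, 2.0], mask=[0, 1])
joined = ma.mr_[left, 3.0, 10.0]

# anom() returns the deviations from the mean of the unmasked values.
x = ma.array([1.0, 2.0, 3.0, 100.0], mask=[0, 0, 0, 1])
anomalies = x.anom()

# Unbiased estimates: divide by (number of unmasked values - 1).
unbiased_var = float(x.var(ddof=1))
unbiased_std = float(x.std(ddof=1))
```

The masked value 100.0 takes no part in the mean, the variance or the standard deviation.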
    181  
    182 === Using the new package with numpy.core.ma === 
    183   I tried to make sure that the new package can understand old masked arrays. Unfortunately, the reverse does not hold: {{{numpy.core.ma}}} does not understand the new masked arrays. 
    184 For example: 
    185 {{{ 
    186 >>> import numpy.core.ma as old_ma 
    187 >>> import maskedarray as new_ma 
    188 >>> x = old_ma.array([1,2,3,4,5], mask=[0,0,1,0,0]) 
    189 >>> x 
    190 array(data = 
    191  [     1      2 999999      4      5], 
    192       mask = 
    193  [False False True False False], 
    194       fill_value=999999) 
    195 >>> y = new_ma.array([1,2,3,4,5], mask=[0,0,1,0,0]) 
    196 >>> y 
    197 array(data = [1 2 -- 4 5], 
    198       mask = [False False True False False], 
    199       fill_value=999999) 
    200 >>> x==y 
    201 array(data = 
    202  [True True True True True], 
    203       mask = 
    204  [False False True False False], 
    205       fill_value=?) 
    206 >>> old_ma.getmask(x) == new_ma.getmask(x) 
    207 array([True, True, True, True, True], dtype=bool) 
    208 >>> old_ma.getmask(y) == new_ma.getmask(y) 
    209 array([True, True, False, True, True], dtype=bool) 
    210 >>> old_ma.getmask(y) 
    211 False 
    212 }}} 
    213    
    214 === Using maskedarray with matplotlib === 
    215 By default matplotlib still uses numpy.ma, but there is an rcParams setting that you can use to select maskedarray instead.  In the matplotlibrc file you will find: 
    216  
    217 {{{ 
    218 #maskedarray : False       # True to use external maskedarray module 
    219                            # instead of numpy.ma; this is a temporary 
    220                            # setting for testing maskedarray. 
    221 }}} 
    222  
    223 Uncomment and set to True to select maskedarray everywhere.  Alternatively, you can test a script with maskedarray by using a command-line option, e.g.: 
    224  
    225 {{{ 
    226 python simple_plot.py --maskedarray 
    227 }}} 
     154[This page has been moved to the [http://projects.scipy.org/scipy/numpy/wiki/MaskedArrayAlternative MaskedArrayAlternative] page.] 
    228 155 
    229 156 
    230 === Revision notes === 
    231   * 01/23/2007 : The package has been moved to the SciPy sandbox, and is regularly updated: please check out your SVN version! 
    232   * 10/28/2006 : Updated {{{put}}}, deleted {{{putmask}}} to match numpy 1.0 
    233  
    234 === Masked records === 
    235   Like {{{numpy.core.ma}}}, the {{{ndarray}}}-based implementation of {{{MaskedArray}}} is limited when working with records: you can mask any record of the array, but not a field in a record. If you need this feature, you may want to give {{{mrecords}}} a try (available in the {{{maskedarray}}} directory in the scipy sandbox). This module defines a new class, {{{MaskedRecord}}}. An instance of this class accepts a {{{recarray}}} as data, and uses two masks. The {{{fieldmask}}} has as many entries as there are records in the array, each entry with the same fields as a record but of boolean type, indicating whether each field is masked or not. A record entry is flagged as masked in the {{{mask}}} array if all of its fields are masked. A few examples in the file should give you an idea of what can be done. Note that {{{mrecords}}} is still experimental... 
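The two-level masking idea (one flag per field, plus one flag per record that is set only when every field is masked) can be sketched with structured arrays in a present-day `numpy.ma`. This is an assumption-laden illustration: it does not use the sandbox `mrecords` module or its `MaskedRecord` class, and the field names `day` and `value` are made up for the example.

```python
import numpy.ma as ma

# Hypothetical record layout: an integer day and a float measurement.
records = ma.array(
    [(1, 0.5), (2, 1.5), (3, 2.5)],
    dtype=[("day", int), ("value", float)],
    mask=[(False, True), (True, True), (False, False)],
)

# Per-field mask: same fields as the records, but boolean.
field_mask = records.mask

# Selecting one field gives an ordinary masked array for that column.
value_column = records["value"]

# Per-record mask: True only where all fields of the record are masked.
record_mask = records.recordmask
```

Only the second record is flagged in `record_mask`, since the first record still has an unmasked `day` field.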
    236  
    237  
    238 Please note that it's still a work in progress (even if it seems to work OK when I use it). Suggestions, comments, improvements and general feedback are more than welcome! Finally, I'd like to thank Paul, Travis and Sasha for the original masked array package: without you, I would never have started this (it might be argued that I shouldn't have anyway, but that's another story...) 
    239  
    240  
    241  
    242