Changes between Version 1 and Version 2 of ProperNanHandling

Timestamp:
09/24/08 12:52:52 (5 years ago)
Author:
peridot

  • ProperNanHandling

    v1 v2  
    24 24 are considered for changes:
    25 25
    26         - max and min
     26         - max and min, amax and amin, the maximum and minimum ufuncs
    27 27         - argmax and argmin
    28 28         - sort and argsort
     
    50 50
    51 51 This has caused problems to numerous people [1, 2]. In particular, when NaN are
    52 caused by some invalid floating point operation (0/0, sqrt(-1)), this means NaN
     52 caused by some invalid floating point operation (0/0, sqrt(-1.)), this means NaN
    53 53 are silenced *and* give a not meaningful answer.
    54 54
     
    73 73           (note that this is not consistent with max/min behavior). AFAICS,
    74 74           there is no other choice possible.
     75
     76 Alternatives to using NaN
     77 =========================
     78
     79 NaNs that appear during floating-point calculations can be avoided by using seterr(invalid='raise'), which has the effect of raising an exception whenever one occurs. However, they were put into the IEEE standard precisely because this is often not convenient behaviour.
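
As a minimal sketch of the seterr approach described above (assuming a reasonably recent NumPy; `safe_sqrt` is a hypothetical helper, not part of numpy), the `errstate` context manager applies the same setting as `seterr(invalid='raise')` but only for the duration of a block:

```python
import numpy as np

def safe_sqrt(values):
    """Hypothetical helper: sqrt that raises instead of producing NaN."""
    # errstate applies seterr-style settings only inside the with block.
    with np.errstate(invalid='raise'):
        return np.sqrt(np.asarray(values, dtype=float))

# With invalid='raise', sqrt of a negative number raises FloatingPointError.
try:
    safe_sqrt([4.0, -1.0])
    raised = False
except FloatingPointError:
    raised = True

# For comparison: with the check silenced, a NaN is produced quietly.
with np.errstate(invalid='ignore'):
    silent = np.sqrt(np.array([4.0, -1.0]))
```

Here `raised` ends up true, while `silent` holds a NaN in its second element. The exception only covers NaNs created by an invalid operation; NaNs already present in the input pass through untouched.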
     80 
     81 Using NaNs as invalid values is standard idiom in MATLAB and R, but numpy has an additional tool that neither of those packages has: masked arrays. Masked arrays allow users to explicitly flag invalid values, and all functions that act on masked arrays have sensible handling of invalid values. Masked arrays also allow users to mark invalid elements in arrays with types (for example integers) that do not have a NaN.
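
The masked-array alternative can be illustrated with a short sketch. The sentinel value `-999` and the sample data are assumptions made up for this example; the `numpy.ma` calls themselves are existing numpy functions:

```python
import numpy as np

# Integer data has no NaN, so a sentinel (-999, chosen arbitrarily here)
# marks the invalid entry before it is masked.
data = np.array([3, -999, 7, 1])
masked = np.ma.masked_equal(data, -999)

largest = masked.max()       # reduction skips the masked element
n_valid = masked.count()     # number of unmasked elements

# np.ma.sort places masked entries at the end by default;
# compressed() then drops them, leaving only the valid sorted values.
sorted_valid = np.ma.sort(masked).compressed()
```

The reductions and the sort never see the flagged element, which is the behaviour the paragraph above contrasts with NaN-based marking.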
    75 82
    76 83 Cost of handling NaN
     
    97 104 be significant when no Nan is in the arrays.
    98 105
     106 There can also be a substantial cost to calculating with NaNs: even operations like addition will, on some common CPUs, have to fall back to software floating point. This means, in particular, that calculating with arrays containing many NaNs can be quite slow.
     107
    99 108 Solutions
    100 109 =========
    101 110
    102 111 Several solutions are possible. Each of them, as well as their consequences are
    103 discussed below
     112 discussed below.
    104 113
    105 114 Raising an error
     
    112 121 ~~~~~~~~~~~~~~
    113 122
    114 NaN input would be check for relevant dtypes in relevant PyArray functions.
     123 NaN input would be checked for relevant dtypes in relevant PyArray functions.
    115 124 Another solution would be to deal with NaN comparison in the core C loops, and
    116 125 signal an error if a NaN is detected.
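
The first option (checking the input up front for dtypes that can hold NaN) might look roughly like the following Python-level sketch. The proposal places this check in the PyArray C functions; `checked_max` and the choice of `ValueError` are hypothetical and only illustrate the idea:

```python
import numpy as np

def checked_max(a):
    """Hypothetical wrapper: max that refuses NaN input."""
    a = np.asarray(a)
    # Only floating and complex dtypes can hold NaN, so integer and
    # boolean arrays skip the scan entirely.
    if np.issubdtype(a.dtype, np.inexact) and np.isnan(a).any():
        raise ValueError("input contains NaN")
    return a.max()

int_result = checked_max([1, 5, 2])        # integer dtype: no scan needed

try:
    checked_max([1.0, np.nan, 3.0])
    nan_rejected = False
except ValueError:
    nan_rejected = True
```

Note that the up-front check scans floating-point input once before the reduction, which is the extra pass whose cost the proposal discusses.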
     
    158 167 although those cases were not tested.
    159 168
     169 Supporting general invalid data operations
     170 ------------------------------------------
     171
     172 Implementation
     173 ~~~~~~~~~~~~~~
     174
     175 The masked array implementation of all the operations in question could be duplicated. This would involve both providing additional keyword arguments to control handling of invalid values (for example, should sorts leave NaNs in place, sort them to the end, or sort them to the beginning) and changing the defaults to one of the above schemes.
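
To make the keyword-argument idea concrete, here is a sketch of the three sort behaviours mentioned above. The function name `sort_with_nans` and the keyword `nan_position` are hypothetical, invented for illustration; they are not an existing or proposed numpy signature:

```python
import numpy as np

def sort_with_nans(a, nan_position="last"):
    """Sort a 1-D float array; nan_position is 'last', 'first' or 'keep'."""
    a = np.asarray(a, dtype=float)
    nan_mask = np.isnan(a)
    valid_sorted = np.sort(a[~nan_mask])      # sort only the non-NaN values
    nans = np.full(int(nan_mask.sum()), np.nan)
    if nan_position == "last":
        return np.concatenate([valid_sorted, nans])
    if nan_position == "first":
        return np.concatenate([nans, valid_sorted])
    if nan_position == "keep":
        # NaNs stay at their original indices; the other slots are
        # filled with the sorted valid values in order.
        out = a.copy()
        out[~nan_mask] = valid_sorted
        return out
    raise ValueError("nan_position must be 'last', 'first' or 'keep'")

x = [3.0, np.nan, 1.0, 2.0]
at_end = sort_with_nans(x, "last")
at_start = sort_with_nans(x, "first")
in_place = sort_with_nans(x, "keep")
```

For the sample input, the valid values come out as 1, 2, 3 in every mode; only the position of the NaN differs.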
     176
     177 API Breakage
     178 ~~~~~~~~~~~~
     179
     180 Redefining the default behaviour breaks the API as above. Adding keyword arguments should not produce additional API breakage.
     181
     182 Cost
     183 ~~~~
     184
     185 In addition to the cost required by the two preceding approaches, this requires numpy developers to keep numpy's NaN handling consistent with its handling of masked arrays. Since masked arrays are a less stable part of numpy, maintaining this code duplication may require substantial effort.
     186
    160 187 References
    161 188 ==========