Changes between Version 1 and Version 2 of ProperNanHandling
- Timestamp: 09/24/08 12:52:52 (5 years ago)
ProperNanHandling
 v1  v2
 24  24   are considered for changes:
 25  25
 26     - - max and min
     26 + - max and min, amax and amin, the maximum and minimum ufuncs
 27  27   - argmax and argmin
 28  28   - sort and argsort
  …   …
 50  50
 51  51   This has caused problems to numerous people [1, 2]. In particular, when NaN are
 52     - caused by some invalid floating point operation (0/0, sqrt(-1 )), this means NaN
     52 + caused by some invalid floating point operation (0/0, sqrt(-1.)), this means NaN
 53  53   are silenced *and* give a not meaningful answer.
 54  54
  …   …
 73  73   (note that this is not consistent with max/min behavior). AFAICS,
 74  74   there is no other choice possible.
     75 +
     76 + Alternatives to using NaN
     77 + =========================
     78 +
     79 + NaNs that appear during floating-point calculations can be avoided by using seterr(invalid='raise'), which has the effect of raising an exception whenever one occurs. However, they were put into the IEEE standard precisely because this is often not convenient behaviour.
     80 +
     81 + Using NaNs as invalid values is standard idiom in MATLAB and R, but numpy has an additional tool that neither of those packages has: masked arrays. Masked arrays allow users to explicitly flag invalid values, and all functions that act on masked arrays have sensible handling of invalid values. Masked arrays also allow users to mark invalid elements in arrays with types (for example integers) that do not have a NaN.
 75  82
 76  83   Cost of handling NaN
  …   …
 97 104   be significant when no Nan is in the arrays.
 98 105
    106 + There can also be a substantial cost to calculating with NaNs: even operations like addition will, on some common CPUs, have to fall back to software floating point. This means, in particular, that calculating with arrays containing many NaNs can be quite slow.
    107 +
 99 108   Solutions
100 109   =========
101 110
102 111   Several solutions are possible. Each of them, as well as their consequences are
103     - discussed below
    112 + discussed below.
104 113
105 114   Raising an error
  …   …
112 121   ~~~~~~~~~~~~~~
113 122
114     - NaN input would be check for relevant dtypes in relevant PyArray functions.
    123 + NaN input would be checked for relevant dtypes in relevant PyArray functions.
115 124   Another solution would be to deal with NaN comparison in the core C loops, and
116 125   signal an error if a NaN is detected.
  …   …
158 167   although those cases were not tested.
159 168
    169 + Supporting general invalid data operations
    170 + ------------------------------------------
    171 +
    172 + Implementation
    173 + ~~~~~~~~~~~~~~
    174 +
    175 + The masked array implementation of all the operations in question could be duplicated. This would involve both providing additional keyword arguments to control handling of invalid values (for example, should sorts leave NaNs in place, sort them to the end, or sort them to the beginning) and changing the defaults to one of the above schemes.
    176 +
    177 + API Breakage
    178 + ~~~~~~~~~~~~
    179 +
    180 + Redefining the default behaviour breaks the API as above. Adding keyword arguments should not produce additional API breakage.
    181 +
    182 + Cost
    183 + ~~~~
    184 +
    185 + In addition to the cost required by the two preceding approaches, this requires numpy developers to keep numpy's NaN handling consistent with its handling of masked arrays. Since masked arrays are a less stable part of numpy, maintaining this code duplication may require substantial effort.
    186 +
160 187   References
161 188   ==========
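
The "Alternatives to using NaN" text added in this revision names two tools: seterr(invalid='raise') and masked arrays. The following is only a minimal sketch of both, written against current numpy rather than the 2008 release this page describes. The helper names sqrt_strict and max_ignoring_invalid are hypothetical and exist only for illustration; np.errstate is used here as the context-manager form of seterr, and np.ma.masked_invalid as one way to build the mask. Both choices are assumptions, not something the wiki page prescribes.

```python
import numpy as np


def sqrt_strict(values):
    """Hypothetical helper: square root that raises instead of returning NaN."""
    # errstate applies the same setting as seterr(invalid='raise'),
    # but restores the previous error handling when the block exits.
    with np.errstate(invalid="raise"):
        return np.sqrt(np.asarray(values, dtype=float))


def max_ignoring_invalid(values):
    """Hypothetical helper: maximum over the valid entries of an array."""
    # masked_invalid flags NaN and infinite entries as invalid,
    # so the reduction below only sees the remaining values.
    masked = np.ma.masked_invalid(np.asarray(values, dtype=float))
    return masked.max()


# Invalid operation: sqrt(-1.) raises instead of silently yielding NaN.
try:
    sqrt_strict([4.0, -1.0])
    raised = False
except FloatingPointError:
    raised = True

# NaN as an invalid-value marker: the masked array skips it explicitly.
valid_max = max_ignoring_invalid([1.0, np.nan, 3.0])
```

With the first helper the invalid operation surfaces as a FloatingPointError at the point where it happens; with the second the NaN is treated as missing data and the reduction is computed over the remaining entries.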
