[Numpy-tickets] [NumPy] #717: numpy.loadtxt fails when missing values are present
NumPy
numpy-tickets@scipy....
Thu Apr 3 21:25:35 CDT 2008
#717: numpy.loadtxt fails when missing values are present
--------------------------+-------------------------------------------------
Reporter: lesserwhirls | Owner: somebody
Type: enhancement | Status: new
Priority: normal | Milestone: 1.0.5
Component: numpy.core | Version: none
Severity: normal | Keywords: loadtxt missing values
--------------------------+-------------------------------------------------
== Problem ==
numpy.loadtxt fails when missing values are present. For example, assume
your data file is well behaved:
{{{
val1,val2,val3,val4,val5\n
}}}
loadtxt works great for this example (no surprise). Now, if your data
file is not 'well behaved' and contains missing values (here, an empty third field):
{{{
val1,val2,,val4,val5\n
}}}
loadtxt fails.
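The failure can be illustrated without NumPy. This is only a sketch, and it assumes that each field ends up being passed to a float()-like converter; the sample values are made up for illustration.

```python
# Sketch of the failure mode, assuming each field is converted with float().
line = "1.0,2.0,,4.0,5.0\n"          # made-up sample row with an empty third field
fields = line.strip().split(",")     # ['1.0', '2.0', '', '4.0', '5.0']

try:
    row = [float(f) for f in fields]
except ValueError as exc:
    # float('') raises ValueError, so the whole row cannot be converted.
    row = None
    error = str(exc)
```

After this runs, row is None because the empty string cannot be converted to a float.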
----
== Solution ==
1) Add a keyword argument fill to the function definition:
{{{
def loadtxt(...,fill=-999):
}}}
2) Add the following after the line "vals = line.split(delimiter)"
(line 713 in core/numeric.py, numpy 1.0.4):
{{{
# Replace empty fields with the fill value (compare with ==, not identity)
vals = [(z, fill)[z == ''] for z in vals]
}}}
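The substitution step can be tried in isolation. The helper below is hypothetical (split_with_fill is not part of NumPy) and assumes float conversion of every field. It compares with == rather than identity, since identity checks on strings depend on interning.

```python
def split_with_fill(line, delimiter=",", fill=-999):
    """Hypothetical helper: split one line and substitute fill for empty fields."""
    vals = line.strip().split(delimiter)
    # Same substitution as proposed in the ticket, using equality instead of identity.
    vals = [(z, fill)[z == ''] for z in vals]
    # Assumed conversion step: every field is turned into a float.
    return [float(v) for v in vals]


row = split_with_fill("1.0,2.0,,4.0,5.0\n")
print(row)  # [1.0, 2.0, -999.0, 4.0, 5.0]
```

Fields that contain only whitespace are not treated as missing by this sketch; only fields that are exactly empty are replaced.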
----
== Performance ==
Load an 18,000-line ASCII dataset with 22 float variables on each line,
skipping the first column (it's a time stamp).
Timings using %timeit in IPython:
Reading an ASCII file with no missing values using the current version of
loadtxt:[[BR]]
***10 loops, best of 3: 703 ms per loop
Reading an ASCII file with no missing values using the proposed changes to
loadtxt:[[BR]]
***10 loops, best of 3: 801 ms per loop
The changes do create a ''slight'' performance hit (about 98 ms, or roughly
14%, in the timings above) for those who use loadtxt to read in well-behaved
ASCII data. If this is an issue, could a loadtxt2 function be added?
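The overhead of the extra pass over each line can be estimated with a rough stand-alone sketch. It uses synthetic data (18,000 lines of 22 identical values, no time stamp column) and two hypothetical parsing functions rather than loadtxt itself, so the absolute numbers will not match the timings above.

```python
import timeit

# Synthetic stand-in for the dataset: 18,000 lines, 22 float fields each.
lines = [",".join(["1.5"] * 22) + "\n" for _ in range(18000)]


def parse_plain(lines, delimiter=","):
    """Hypothetical baseline: split and convert, no missing-value handling."""
    return [[float(v) for v in line.strip().split(delimiter)] for line in lines]


def parse_with_fill(lines, delimiter=",", fill=-999):
    """Hypothetical variant with the proposed fill substitution added."""
    rows = []
    for line in lines:
        vals = line.strip().split(delimiter)
        vals = [(z, fill)[z == ''] for z in vals]  # extra pass over the fields
        rows.append([float(v) for v in vals])
    return rows


# Best of 3 single runs for each variant.
t_plain = min(timeit.repeat(lambda: parse_plain(lines), number=1, repeat=3))
t_fill = min(timeit.repeat(lambda: parse_with_fill(lines), number=1, repeat=3))
print("plain: %.1f ms, with fill: %.1f ms" % (t_plain * 1e3, t_fill * 1e3))
```

Both functions return the same rows when no values are missing, so any timing difference comes from the added list comprehension.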
--
Ticket URL: <http://scipy.org/scipy/numpy/ticket/717>
NumPy <http://projects.scipy.org/scipy/numpy>
The fundamental package needed for scientific computing with Python.