| Version 6 (modified by sasha, 7 years ago) |
|---|
Areas of NumPy that have come up as possible slow points:
- PyArray_EnsureArray. It seems to be slow for Python ints, which came up as a possible culprit for some pow() slowness. PyArray_EnsureArray could special-case Python scalars for a speed-up.
- The default (digits=0) case of around is 10x slower than (x+0.5).astype(int).astype(float). Resolved: changeset:2151 implements a fast rint function and adds a round method to ndarray that is about as fast in the digits=0 case.
- As of changeset:2173, x.fill(1) is 2x slower than x += 1. Setting memory to a constant value should be faster than incrementing it in place.
> python -m timeit -s "from numpy import zeros; x = zeros(10000,'b')" "x.fill(1)"
10000 loops, best of 3: 69.5 usec per loop
> python -m timeit -s "from numpy import zeros; x = zeros(10000,'h')" "x.fill(1)"
10000 loops, best of 3: 66.1 usec per loop
> python -m timeit -s "from numpy import zeros; x = zeros(10000,'i')" "x.fill(1)"
10000 loops, best of 3: 66.3 usec per loop
> python -m timeit -s "from numpy import zeros; x = zeros(10000,'d')" "x.fill(1)"
10000 loops, best of 3: 73.2 usec per loop
> python -m timeit -s "from numpy import zeros; x = zeros(10000,'b')" "x += 1"
10000 loops, best of 3: 58 usec per loop
> python -m timeit -s "from numpy import zeros; x = zeros(10000,'h')" "x += 1"
10000 loops, best of 3: 33.7 usec per loop
> python -m timeit -s "from numpy import zeros; x = zeros(10000,'i')" "x += 1"
10000 loops, best of 3: 33.6 usec per loop
> python -m timeit -s "from numpy import zeros; x = zeros(10000,'d')" "x += 1"
10000 loops, best of 3: 36.9 usec per loop
The attached patch results in the following timings:
> python -m timeit -s "from numpy import zeros; x = zeros(10000,'b')" "x.fill(1)"
100000 loops, best of 3: 4.55 usec per loop
> python -m timeit -s "from numpy import zeros; x = zeros(10000,'h')" "x.fill(1)"
100000 loops, best of 3: 12 usec per loop
> python -m timeit -s "from numpy import zeros; x = zeros(10000,'i')" "x.fill(1)"
100000 loops, best of 3: 12.4 usec per loop
> python -m timeit -s "from numpy import zeros; x = zeros(10000,'d')" "x.fill(1)"
100000 loops, best of 3: 13 usec per loop
Note the more than 10x improvement in the 'b' case.
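The command-line timings above can also be reproduced from a script. The following is a minimal sketch, assuming a current NumPy install; the helper names `time_statement` and `compare_fill_and_iadd` are made up for illustration and are not part of NumPy or of the attached patch. Absolute numbers will differ from those recorded above, since they depend on the machine and the NumPy version.

```python
import timeit

DTYPES = ["b", "h", "i", "d"]  # the same type codes used in the timings above
SIZE = 10000


def time_statement(stmt, dtype, number=10000, repeat=3):
    """Return the best per-loop time in microseconds for `stmt`.

    The array `x` is created in the timeit setup string, mirroring the
    `-s` option of `python -m timeit`, so it is rebuilt for every repeat.
    """
    setup = f"from numpy import zeros; x = zeros({SIZE}, {dtype!r})"
    timer = timeit.Timer(stmt, setup=setup)
    best = min(timer.repeat(repeat=repeat, number=number))
    return best / number * 1e6


def compare_fill_and_iadd(number=10000, repeat=3):
    """Time x.fill(1) and x += 1 for each dtype; returns {dtype: (fill_us, iadd_us)}."""
    results = {}
    for dtype in DTYPES:
        fill_us = time_statement("x.fill(1)", dtype, number, repeat)
        iadd_us = time_statement("x += 1", dtype, number, repeat)
        results[dtype] = (fill_us, iadd_us)
    return results


if __name__ == "__main__":
    for dtype, (fill_us, iadd_us) in compare_fill_and_iadd().items():
        print(f"{dtype!r}: fill {fill_us:.2f} usec, += {iadd_us:.2f} usec")
```

Taking the minimum over the repeats follows the "best of 3" convention that `python -m timeit` reports.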
Attachments
- fast-fill-patch.txt (3.6 KB), added by sasha 7 years ago: fast fill patch
