| Version 4 (modified by sasha, 7 years ago) |
|---|
```
> python -m timeit -s "from numpy import zeros; x = zeros((2,500),'f')" "x.sum(0)"
10000 loops, best of 3: 23.1 usec per loop
> python -m timeit -s "from numpy import zeros; x = zeros((2,500),'f')" "x[0]+x[1]"
100000 loops, best of 3: 4.88 usec per loop
```
As Travis explained, some overhead is expected because x.sum(0) calls the inner loop (DOUBLE_add) 500 times, once per output element, whereas x[0]+x[1] calls it only once. In Numeric 24.1, however, the overhead was much smaller:
```
> python -m timeit -s "from Numeric import zeros,sum; x = zeros((2,500),'f')" "sum(x)"
100000 loops, best of 3: 6.05 usec per loop
> python -m timeit -s "from Numeric import zeros,sum; x = zeros((2,500),'f')" "x[0]+x[1]"
100000 loops, best of 3: 3.63 usec per loop
```
Interestingly, reducing along the last axis (across columns) shows a similar overhead in Numeric:
```
> python -m timeit -s "from Numeric import zeros,sum; x = zeros((500,2),'f')" "sum(x,1)"
10000 loops, best of 3: 26.8 usec per loop
```
Numeric 24.1 inner loop (ufuncobject.c:613-627):
```c
/* This is the inner loop to actually do the computation. */
loop=-1;
while(1) {
    while (loop < n_loops-2) {
        loop++;
        loop_i[loop] = 0;
        for(i=0; i<self->nin+self->nout; i++) { resets[loop][i] = pointers[i]; }
    }
    function(pointers, loop_n+(n_loops-1), steps[n_loops-1], data);
    while (loop >= 0 && !(++loop_i[loop] < loop_n[loop]) && loop >= 0) loop--;
    if (loop < 0) break;
    for(i=0; i<self->nin+self->nout; i++) { pointers[i] = resets[loop][i] + steps[loop][i]*loop_i[loop]; }
}
```
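To make the control flow of the Numeric loop above easier to follow, here is a hypothetical pure-Python transcription. It is a sketch, not Numeric code: flat lists stand in for raw buffers, indices for pointers, strides are counted in elements rather than bytes, and numeric_loop and elementwise_add are illustrative names. It keeps the odometer structure of the original: every dimension except the last is stepped by the outer while loops, and the last one is handed to a single function call, so the number of calls equals the product of the outer extents.

```python
# Hypothetical pure-Python transcription of the Numeric 24.1 loop above.
# Flat lists stand in for buffers, indices for pointers, strides are in elements.


def numeric_loop(function, loop_n, steps, start):
    """Call function(pointers, n, inner_steps) once per outer index tuple.

    loop_n[d]:   extent of dimension d; the last extent is passed as n
    steps[d][k]: stride of operand k along dimension d
    start[k]:    starting index of operand k
    """
    n_loops = len(loop_n)
    pointers = list(start)
    resets = [list(start) for _ in range(n_loops)]
    loop_i = [0] * n_loops
    loop = -1
    while True:
        # Descend to the second-innermost dimension, remembering where each
        # operand pointer was on entry so it can be rewound later.
        while loop < n_loops - 2:
            loop += 1
            loop_i[loop] = 0
            resets[loop] = list(pointers)
        # The whole innermost dimension is handled by a single call.
        function(pointers, loop_n[-1], steps[-1])
        # Odometer carry: advance the deepest outer index that has room left.
        while loop >= 0:
            loop_i[loop] += 1
            if loop_i[loop] < loop_n[loop]:
                break
            loop -= 1
        if loop < 0:
            break
        pointers = [r + s * loop_i[loop] for r, s in zip(resets[loop], steps[loop])]


def elementwise_add(a, b, out, shape):
    """out = a + b for row-major 2-D lists; returns the number of inner calls."""
    calls = 0

    def add(ptrs, n, strides):
        nonlocal calls
        calls += 1
        (pa, pb, po), (sa, sb, so) = ptrs, strides
        for i in range(n):
            out[po + i * so] = a[pa + i * sa] + b[pb + i * sb]

    rows, cols = shape
    numeric_loop(add, [rows, cols], [[cols] * 3, [1] * 3], [0, 0, 0])
    return calls


for shape in [(2, 500), (500, 2)]:
    size = shape[0] * shape[1]
    a, b, out = [1.0] * size, [2.0] * size, [0.0] * size
    print(shape, elementwise_add(a, b, out, shape), out == [3.0] * size)
# (2, 500) 2 True
# (500, 2) 500 True
```

Both shapes do the same 1000 additions; only the number of inner-loop calls changes, because a short innermost dimension leaves more work to the outer odometer.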
Numpy r2208 inner loop (ufuncobject.c:1968-1985):
```c
while(loop->index < loop->size) {
    /* Copy first element to output */
    if (loop->obj)
        Py_INCREF(*((PyObject **)loop->it->dataptr));
    memmove(loop->bufptr[1], loop->it->dataptr,
            loop->outsize);
    /* Adjust input pointer */
    loop->bufptr[0] = loop->it->dataptr+loop->steps[0];
    loop->function((char **)loop->bufptr,
                   &(loop->N),
                   loop->steps, loop->funcdata);
    UFUNC_CHECK_ERROR(loop);
    PyArray_ITER_NEXT(loop->it)
    loop->bufptr[1] += loop->outsize;
    loop->bufptr[2] = loop->bufptr[1];
    loop->index++;
}
```
