PossibleOptimizationAreas/ReduceDiscussion

Version 4 (modified by sasha, 7 years ago)


Mailing list thread

> python -m timeit -s "from numpy import zeros; x = zeros((2,500),'f')" "x.sum(0)"
10000 loops, best of 3: 23.1 usec per loop
> python -m timeit -s "from numpy import zeros; x = zeros((2,500),'f')" "x[0]+x[1]"
100000 loops, best of 3: 4.88 usec per loop

As Travis explained, some overhead is expected: x.sum(0) calls DOUBLE_add 500 times (once per output element), whereas x[0]+x[1] calls it only once. In Numeric 24.1, however, the overhead was much smaller:

> python -m timeit -s "from Numeric import zeros,sum; x = zeros((2,500),'f')" "sum(x)"
100000 loops, best of 3: 6.05 usec per loop
> python -m timeit -s "from Numeric import zeros,sum; x = zeros((2,500),'f')" "x[0]+x[1]"
100000 loops, best of 3: 3.63 usec per loop

Interestingly, Numeric shows a similar overhead when reducing columns, i.e. summing a (500,2) array along axis 1:

> python -m timeit -s "from Numeric import zeros,sum; x = zeros((500,2),'f')" "sum(x,1)"
10000 loops, best of 3: 26.8 usec per loop

Numeric 24.1 inner loop (ufuncobject.c:613-627):

    /* This is the inner loop to actually do the computation. */
    loop=-1;
    while(1) {
        while (loop < n_loops-2) {
            loop++;
            loop_i[loop] = 0;
            for(i=0; i<self->nin+self->nout; i++) { resets[loop][i] = pointers[i]; }
        }

        function(pointers, loop_n+(n_loops-1), steps[n_loops-1], data);

        while (loop >= 0 && !(++loop_i[loop] < loop_n[loop]) && loop >= 0) loop--;
        if (loop < 0) break;
        for(i=0; i<self->nin+self->nout; i++) { pointers[i] = resets[loop][i] + steps[loop][i]*loop_i[loop]; }
    }
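The control flow of this loop is easier to follow in a small model. The sketch below is a Python transcription of the C above, under a few assumptions: byte offsets starting at 0 stand in for the `pointers` array, and `inner` is a hypothetical callback in place of the ufunc's `function`. What it shows is that every call of the inner function covers the entire last dimension (`loop_n[n_loops-1]` elements), while the outer dimensions are advanced like an odometer.

```python
from typing import Callable, List

# inner(pointers, n, steps): hypothetical stand-in for the ufunc's `function`.
InnerLoop = Callable[[List[int], int, List[int]], None]


def numeric_style_loop(loop_n: List[int], steps: List[List[int]],
                       nargs: int, inner: InnerLoop) -> None:
    """Python model of the Numeric 24.1 odometer loop.

    loop_n[d]   -- size of dimension d; the last one is handled by `inner`.
    steps[d][i] -- byte stride of operand i along dimension d.
    nargs       -- number of operands (self->nin + self->nout in the C code).
    Operand offsets start at 0 instead of real data pointers.
    """
    n_loops = len(loop_n)
    pointers = [0] * nargs
    loop_i = [0] * n_loops
    resets = [[0] * nargs for _ in range(n_loops)]
    loop = -1
    while True:
        # Descend to the innermost outer dimension, remembering each start offset.
        while loop < n_loops - 2:
            loop += 1
            loop_i[loop] = 0
            resets[loop] = list(pointers)
        # One call sweeps the whole last dimension.
        inner(list(pointers), loop_n[n_loops - 1], steps[n_loops - 1])
        # Advance the odometer, carrying into outer dimensions when one wraps.
        while loop >= 0:
            loop_i[loop] += 1
            if loop_i[loop] < loop_n[loop]:
                break
            loop -= 1
        if loop < 0:
            break
        pointers = [resets[loop][i] + steps[loop][i] * loop_i[loop]
                    for i in range(nargs)]


# A binary ufunc on three C-contiguous float32 operands of shape (3, 4):
# 16-byte row stride, 4-byte element stride.
calls = []
numeric_style_loop([3, 4], [[16, 16, 16], [4, 4, 4]], 3,
                   lambda p, n, s: calls.append((p[0], n)))
print(calls)  # [(0, 4), (16, 4), (32, 4)]
```

For a (3, 4) operation the inner function is called three times, each time over four elements, so the number of calls depends only on the outer dimensions.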

NumPy r2208 reduce inner loop (ufuncobject.c:1968-1985):

    while(loop->index < loop->size) {
        /* Copy first element to output */
        if (loop->obj)
            Py_INCREF(*((PyObject **)loop->it->dataptr));
        memmove(loop->bufptr[1], loop->it->dataptr,
                loop->outsize);
        /* Adjust input pointer */
        loop->bufptr[0] = loop->it->dataptr+loop->steps[0];
        loop->function((char **)loop->bufptr,
                       &(loop->N),
                       loop->steps, loop->funcdata);
        UFUNC_CHECK_ERROR(loop);

        PyArray_ITER_NEXT(loop->it)
        loop->bufptr[1] += loop->outsize;
        loop->bufptr[2] = loop->bufptr[1];
        loop->index++;
    }
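To make the call-count argument concrete, here is a pure-Python sketch that reduces a C-contiguous 2-D array (stored as a flat list) in two ways. `reduce_per_output` follows the r2208 loop above: for each output element it copies the first input element, then makes one inner-loop call over the remaining `shape[axis] - 1` elements (our reading of `loop->N`). `reduce_per_slice` is a hypothetical restructuring, not code from either NumPy or Numeric: it keeps the reduced axis in the outer loop so that each inner-loop call sweeps every output element, the way x[0]+x[1] does. `add_inner_loop` is a made-up stand-in for a strided add loop such as DOUBLE_add.

```python
from typing import List, Tuple


def add_inner_loop(src: List[float], src_start: int, src_step: int,
                   out: List[float], out_start: int, out_step: int,
                   n: int) -> None:
    """Hypothetical strided add loop: out[o] += src[s] for n element pairs."""
    s, o = src_start, out_start
    for _ in range(n):
        out[o] = out[o] + src[s]
        s += src_step
        o += out_step


def reduce_per_output(src: List[float], shape: Tuple[int, int],
                      axis: int) -> Tuple[List[float], int]:
    """Model of the r2208 strategy: one inner-loop call per output element."""
    _, cols = shape
    strides = (cols, 1)                  # C-contiguous element strides
    other = 1 - axis
    n_out = shape[other]
    out = [0.0] * n_out
    calls = 0
    for j in range(n_out):               # PyArray_ITER_NEXT walks the outputs
        first = j * strides[other]
        out[j] = src[first]              # "Copy first element to output"
        # Output stride 0: all shape[axis] - 1 adds accumulate into out[j].
        add_inner_loop(src, first + strides[axis], strides[axis],
                       out, j, 0, shape[axis] - 1)
        calls += 1
    return out, calls


def reduce_per_slice(src: List[float], shape: Tuple[int, int],
                     axis: int) -> Tuple[List[float], int]:
    """Hypothetical alternative: reduced axis outside, outputs swept inside."""
    _, cols = shape
    strides = (cols, 1)
    other = 1 - axis
    n_out = shape[other]
    # Start from the first slice along `axis`.
    out = [src[j * strides[other]] for j in range(n_out)]
    calls = 0
    for k in range(1, shape[axis]):
        # One call adds slice k into every output element.
        add_inner_loop(src, k * strides[axis], strides[other],
                       out, 0, 1, n_out)
        calls += 1
    return out, calls


src = [float(v) for v in range(1000)]
for shape, axis in [((2, 500), 0), ((500, 2), 1)]:
    a, calls_a = reduce_per_output(src, shape, axis)
    b, calls_b = reduce_per_slice(src, shape, axis)
    assert a == b
    print(shape, axis, calls_a, calls_b)
# (2, 500) 0 500 1
# (500, 2) 1 500 1
```

In both the x.sum(0) case and the (500,2) axis-1 case, the per-output strategy makes 500 inner-loop calls where the alternative makes one. This is consistent with the timings above, although the model ignores other costs such as iterator setup and error checking.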