Ticket #1 (closed defect: fixed)

Opened 5 years ago

Last modified 3 years ago

fftw3 - poor performance for complex arrrays

Reported by: arnd.baecker@… Owned by: cookedm
Priority: normal Milestone:
Component: scipy.fftpack Version:
Keywords: fftw3 fft performance Cc:

Description

fft via scipy is much slower when using fftw3 instead of fftw2 (though fftw3 is typically much faster than fftw2, eg. using benchfft - http://www.fftw.org/benchfft/)

This was also reported before by Darren Dale. See http://www.physik.tu-dresden.de/~baecker/tmp/fftw/ a graphical illustration + scripts of the result.

This is what Pearu go on a Opteron box:

# Use fftw-2.1.3:
pearu@opt:~/svn/scipy/Lib/fftpack$ FFTW3=None python setup.py build
pearu@opt:~/svn/scipy/Lib/fftpack$ python tests/test_basic.py -l 10
   Found 23 tests for __main__

                  Fast Fourier Transform
=================================================
       |    real input     |   complex input
-------------------------------------------------
  size |  scipy  | Numeric |  scipy  | Numeric
-------------------------------------------------
   100 |    0.07 |    0.07 |    0.07 |    0.08  (secs for 7000 calls)
  1000 |    0.07 |    0.11 |    0.09 |    0.11  (secs for 2000 calls)
   256 |    0.13 |    0.16 |    0.15 |    0.15  (secs for 10000 calls)
   512 |    0.19 |    0.28 |    0.22 |    0.29  (secs for 10000 calls)
  1024 |    0.03 |    0.06 |    0.04 |    0.06  (secs for 1000 calls)
  2048 |    0.06 |    0.10 |    0.09 |    0.10  (secs for 1000 calls)
  4096 |    0.06 |    0.15 |    0.09 |    0.16  (secs for 500 calls)
  8192 |    0.15 |    0.68 |    0.37 |    0.70  (secs for 500 calls)
...
----------------------------------------------------------------------
Ran 23 tests in 26.286s

# Use fftw-3.0.1:
pearu@opt:~/svn/scipy/Lib/fftpack$ FFTW2=None python setup.py build
pearu@opt:~/svn/scipy/Lib/fftpack$ python tests/test_basic.py -l 10
   Found 23 tests for __main__

                  Fast Fourier Transform
=================================================
       |    real input     |   complex input
-------------------------------------------------
  size |  scipy  | Numeric |  scipy  | Numeric
-------------------------------------------------
   100 |    0.07 |    0.08 |    0.43 |    0.09  (secs for 7000 calls)
  1000 |    0.07 |    0.12 |    0.61 |    0.12  (secs for 2000 calls)
   256 |    0.15 |    0.16 |    0.99 |    0.16  (secs for 10000 calls)
   512 |    0.22 |    0.29 |    1.53 |    0.29  (secs for 10000 calls)
  1024 |    0.04 |    0.06 |    0.26 |    0.06  (secs for 1000 calls)
  2048 |    0.06 |    0.10 |    0.48 |    0.10  (secs for 1000 calls)
  4096 |    0.06 |    0.15 |    0.48 |    0.16  (secs for 500 calls)
  8192 |    0.15 |    0.68 |    1.11 |    0.69  (secs for 500 calls)
....
----------------------------------------------------------------------
Ran 23 tests in 38.188s

Attachments

fftw3slow.patch (2.3 KB) - added by david cournapeau 4 years ago.
patch to make fft much more efficient when compiled with fftw3

Change History

Changed 5 years ago by pearu

  • owner changed from somebody to pearu

Changed 4 years ago by rkern

  • component changed from component1 to scipy.fftpack

Changed 4 years ago by david cournapeau

patch to make fft much more efficient when compiled with fftw3

Changed 4 years ago by david cournapeau

I think this is comming from the fact that the plan is computed for each fft call (call to fftw_plan_dft_1d) in the function zfft in zfft.c.

I adapted the file to use the caching system for fftw3, but this is not a great solution since I need to copy the input inside a cached buffer, and copy it out to the output. Still, this is much faster than the current situation I think. I attached a patch; compiles and run the tests fine, with acceptable speed

Changed 4 years ago by cookedm

  • owner changed from pearu to cookedm
  • status changed from new to assigned

I've applied your patch in r2211. I'm going to leave this open, as I think this is just a stopgap solution: we need better fftw3 support.

Changed 3 years ago by cdavid

I agree this is a hack, and not a long term solution. The long term solution is to rewrwite fftpack, because the code, as it is, is unmanageable (many #ifdef in the C code)

Changed 3 years ago by cdavid

I think I fixed the problem in r3217. I dont't think we can do much better without more infrastructure (api to save/load/change plans, and 16 bytes alignement for numpy arrays).

But now, we are faster than matlab for complex data :)

Changed 3 years ago by cdavid

  • status changed from assigned to closed
  • resolution set to fixed

I am closing this ticket, because the specific issue of the ticket is solved now (no big difference between fftw2 and fftw3). See #474 for further improvements to fftw3 wrappers.

Note: See TracTickets for help on using tickets.