I’ve spent many hours attempting to install numpy and scipy from source on both Linux at work and Mac OS X at home. I could never get the big speedups possible from good numerical libraries like BLAS and ATLAS. Enthought’s one-click-install Python distribution turned out to be the perfect solution.

I highly recommend checking out the Enthought Python Distribution (EPD): http://www.enthought.com/products/epd.php. The free license is just fine for most everything, and the academic license can get you the full version. I did try to register for the academic license, but I found it was a bit tough to actually cash in on… you need to use a special link they send via email, which I didn’t see until I had already installed the free version by mistake.

Super easy install: it just takes some computer time, not careful human babysitting. And the best part is it actually works right out of the box!

For example, here’s a simple benchmark for numerical computing: compute the matrix product $X^T X$, where $X$ is a matrix with LOTS of rows (specifically, half a million rows by 64 columns).

    import time

    import numpy as np

    # Half a million rows by 64 columns
    X = np.random.randn(500000, 64)

    tstart = time.time()
    np.dot(X.T, X)
    print('%.2f sec' % (time.time() - tstart))

The product’s result is just a 64-by-64 matrix, but it takes a while to do, and proper use of numerical libraries can make a HUGE difference.

| Configuration | Time |
| --- | --- |
| Linux 64 self-built numpy without ATLAS | 40.13 sec |
| Linux x86 self-built numpy with ATLAS | 1.92 sec |
| Linux x86 numpy installed by CS dept. | 1.34 sec |
| Linux x86 Enthought | 0.75 sec |

Holy cow! EPD’s execution is almost twice as fast as the ATLAS-optimized numpy maintained by the Brown CS technical staff. This is most likely due to its efficient use of Intel’s Math Kernel Library (MKL). Even a self-built ATLAS implementation cannot seem to compete with this.
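If you want to check which numerical libraries a given numpy install was built against, numpy can report its own build configuration. Here is a minimal sketch, assuming numpy is importable from the interpreter you are testing. The exact layout of the output differs between numpy versions, so treat it as something to read by eye rather than parse.

```python
import numpy as np

# Which numpy is this interpreter actually picking up?
print(np.__version__)
print(np.__file__)

# Print the build configuration, including the BLAS/LAPACK sections.
# Look for names such as MKL or ATLAS in the output; the format varies
# between numpy versions.
np.show_config()
```

Running this once with each Python install (self-built, system, EPD) is a quick way to see whether a slow timing comes from a numpy that found no optimized BLAS at all.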

### Comparison to Matlab

For matrix multiplication, Matlab appears to be much faster even than Enthought. Running the following similar test takes only 0.30 seconds, which is roughly a 2.5x improvement on the EPD (0.75 sec).

    X = randn( 500000, 64 );
    tic;
    R = X'*X;
    fprintf( '%.2f sec\n', toc );
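To put the timings in this post on a common scale, you can convert each one into an approximate throughput. The sketch below assumes the product is computed as a plain dense matrix multiply, costing about 2 × rows × columns × columns floating-point operations, and ignores any savings a library might get from the symmetry of the result. The helper names `gram_flops` and `gflops` are made up for this illustration.

```python
def gram_flops(n_rows, n_cols):
    """Approximate floating-point operations for a dense X.T * X product."""
    # Each of the n_cols * n_cols outputs needs about n_rows multiplies
    # and n_rows adds. Symmetry of the result is ignored (assumption).
    return 2 * n_rows * n_cols * n_cols


def gflops(seconds, n_rows=500000, n_cols=64):
    """Convert a wall-clock time into billions of operations per second."""
    return gram_flops(n_rows, n_cols) / seconds / 1e9


# Timings copied from this post, in seconds.
timings_sec = [
    ("self-built numpy without ATLAS", 40.13),
    ("self-built numpy with ATLAS", 1.92),
    ("numpy installed by CS dept.", 1.34),
    ("Enthought (EPD)", 0.75),
    ("Matlab", 0.30),
]

for name, sec in timings_sec:
    print("%-32s %6.2f GFLOP/s" % (name, gflops(sec)))
```

The same operation count applies to every row, so the ratios between configurations are just the ratios of the timings; the throughput figure is mainly useful for comparing against what your CPU should be able to sustain.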

Please let me know if you have ideas about how to speed up numpy even more! I do like Matlab a lot for its rock solid core libraries, but I eventually want to move to a more open-source software ecosystem.

### Can I still use other 3rd party libraries?

Yes! Once installed, just use the typical Python `easy_install` executable located in the `bin/` directory of the EPD install.

I recommend using this to install `pip`, a much better package manager for Python than `easy_install` (since it supports upgrading and uninstalling out of the box). Then use pip to install any of the other great Python modules out there (scikit-learn, pandas, milk, etc.).

Just do this at any bash terminal, setting `EPDROOT` to your install directory.

    # EPDROOT must be set to your EPD install directory
    "$EPDROOT/bin/easy_install" pip
    "$EPDROOT/bin/pip" install scikit-learn
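After installing, it is worth confirming that the new packages are visible to the EPD interpreter and not to some other Python on your path. Here is a small sketch using only the standard library, written for Python 3 (`importlib.util.find_spec` is not available in Python 2). The helper `missing_modules` is a made-up name for this example. Note that scikit-learn is imported as `sklearn`.

```python
import importlib.util
import sys


def missing_modules(names):
    """Return the top-level module names that this interpreter cannot find."""
    # find_spec looks a module up without importing it, so this stays fast.
    return [name for name in names if importlib.util.find_spec(name) is None]


# Show which interpreter is running, then check a few import names.
print(sys.executable)
wanted = ["sklearn", "pandas"]
print("missing:", missing_modules(wanted))
```

Run it with the `python` executable from the EPD `bin/` directory; an empty list means the installs landed in the right place.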
