Feb 04

Use Enthought for scientific Python

I’ve spent many hours attempting to install numpy and scipy from source, both on Linux at work and on Mac OS X at home, and I could never get the big speedups that optimized linear algebra libraries (BLAS implementations such as ATLAS) make possible. Enthought’s one-click-install Python distribution turned out to be the perfect solution.

I highly recommend checking out the Enthought Python Distribution (EPD): http://www.enthought.com/products/epd.php. The free license is fine for almost everything, and an academic license gets you the full version. I did try to register for the academic license, but it was a bit tough to actually cash in on: you need to use a special link they send via email, which I didn’t see until I had already installed the FREE version by mistake.

Super easy install: it just takes some computer time, not careful human babysitting. And the best part is that it actually works right out of the box!

For example, here’s a simple benchmark for numerical computing: compute the matrix product X^T X (X transpose times X), where X is a matrix with LOTS of rows (specifically, half a million rows by 64 columns).

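import numpy as np
import time

# Half a million rows by 64 columns of standard-normal random numbers.
X = np.random.randn(500000, 64)

tstart = time.time()
R = np.dot(X.T, X)  # 64-by-64 result

# print() with a single argument works in both Python 2 and Python 3.
print('%.2f sec' % (time.time() - tstart))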
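A single time.time() measurement can be noisy, and the very first call may include one-time start-up costs. A slightly more careful variant of the same benchmark warms up once and then reports the best of several runs. This is a sketch that assumes Python 3 (for time.perf_counter); the helper name best_time and the choice of 3 repeats are my own, not part of numpy.

```python
import time

import numpy as np


def best_time(func, repeats=3):
    """Call func() `repeats` times and return the fastest wall-clock time in seconds."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        best = min(best, time.perf_counter() - start)
    return best


# Same shape as the benchmark above: half a million rows by 64 columns.
X = np.random.randn(500000, 64)

# Warm-up call, so one-time costs (e.g. starting BLAS worker threads) are not timed.
R = np.dot(X.T, X)

print("%.2f sec (best of 3)" % best_time(lambda: np.dot(X.T, X)))
```

Python’s standard timeit module (timeit.repeat) offers the same best-of-N approach if you prefer not to write the loop yourself.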

The result is just a 64-by-64 matrix, but computing it takes a while, and proper use of numerical libraries can make a HUGE difference.

Setup                                          Time
Linux 64, self-built numpy without ATLAS  40.13 sec
Linux x86, self-built numpy with ATLAS     1.92 sec
Linux x86, numpy installed by CS dept.     1.34 sec
Linux x86, Enthought (EPD)                 0.75 sec
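To put these timings in perspective, they can be converted into an approximate throughput. This sketch assumes the standard operation count for a dense matrix product: each of the d×d entries of X^T X is a length-n dot product (n multiplies plus roughly n adds), so the whole product costs about 2nd² floating-point operations. The timings are the ones from the table above; nothing new is measured here.

```python
# Rows and columns of X in the benchmark above.
n, d = 500000, 64

# d*d output entries, each a length-n dot product: about 2*n*d^2 flops in total.
flops = 2 * n * d * d

# Wall-clock times from the table above, in seconds.
timings = {
    "self-built numpy without ATLAS": 40.13,
    "self-built numpy with ATLAS": 1.92,
    "numpy installed by CS dept.": 1.34,
    "Enthought": 0.75,
}

# Convert each time into billions of floating-point operations per second.
gflops = {name: flops / t / 1e9 for name, t in timings.items()}
for name, rate in gflops.items():
    print("%-32s %6.2f GFLOP/s" % (name, rate))
```

A library that exploits the symmetry of X^T X can skip roughly half of this work, so treat these as nominal rates useful for comparing builds against each other, not as exact hardware throughput.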

Holy cow! EPD’s execution is almost twice as fast (0.75 vs. 1.34 sec) as the ATLAS-optimized numpy maintained by the Brown CS technical staff. I suspect this is due to its efficient use of Intel’s Math Kernel Library (MKL). Even my self-built ATLAS numpy can’t seem to compete with this.
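Whether a given numpy actually uses MKL (or ATLAS, or something else) can be checked directly, since numpy can report the BLAS/LAPACK libraries it was built against. A minimal sketch; the layout of the report differs between numpy versions, so the library names in the comment are only examples of what to look for.

```python
import numpy as np

# Print numpy's build configuration, including the BLAS/LAPACK libraries it
# was linked against. Look for names such as "mkl", "openblas", or "atlas".
np.show_config()
```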

Comparison to Matlab

For matrix multiplication, Matlab appears to be faster still. Running the following equivalent test takes only 0.30 seconds, roughly a 2.5x improvement over EPD’s 0.75 seconds.

X = randn( 500000, 64 );
tic;  % start the timer that toc reads below
R = X'*X;
fprintf( '%.2f sec\n', toc );

Please let me know if you have ideas about how to speed up numpy even more! I do like Matlab a lot for its rock-solid core libraries, but I eventually want to move to a more open-source software ecosystem.

Can I still use other 3rd party libraries?

Yes! Once EPD is installed, just use the standard easy_install executable located in the bin/ directory of the EPD install.

I recommend using it to install pip, a much better package manager for Python than easy_install (pip can upgrade and uninstall packages out of the box). Then use pip to install any of the other great Python modules out there (scikit-learn, pandas, milk, etc.).

Just do this at any bash terminal, setting EPDROOT to your install directory:

EPDROOT=/path/to/epd   # replace with your EPD install directory
$EPDROOT/bin/easy_install pip
$EPDROOT/bin/pip install scikit-learn
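Since the point is to install packages into EPD’s own Python rather than a system Python, it can be worth confirming where they are imported from. The sketch below should be run with EPD’s python executable from the same bin/ directory; the helper name module_location and the script name check_install.py are hypothetical, and it only uses importlib.import_module, which exists in both Python 2.7 and Python 3.

```python
import importlib
import sys


def module_location(name):
    """Return the file a module is imported from, or None if it cannot be
    imported or has no file (as with built-in modules)."""
    try:
        module = importlib.import_module(name)
    except ImportError:
        return None
    return getattr(module, "__file__", None)


# Hypothetical usage: <EPD install dir>/bin/python check_install.py
print("Interpreter: %s" % sys.executable)
for name in ("numpy", "sklearn"):  # scikit-learn is imported as "sklearn"
    print("%-8s -> %s" % (name, module_location(name)))
```

If the printed paths point inside the EPD install directory, pip put the packages in the right place.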
