C++ >>> Python

Armitage

Banned
Feb 23, 2001
8,086
0
0
So this guy at work has these data files. Several thousand of them, each about 150MB and containing about 3000 floating-point parameters. I wrote him a quick script to parse the files and generate statistics (count, min, max, avg, stdev) for each of the parameters. An easy task.
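For readers following along, here is a rough sketch of that kind of per-parameter statistics pass. It is not the script from this post: the file layout (one `name value` pair per line) and the names `RunningStats` and `summarize` are assumptions made up for illustration. It uses Welford's online update so each value is seen once and nothing is stored per sample.

```python
import math
from collections import defaultdict


class RunningStats:
    """Tracks count, min, max, mean, and sample stdev in one pass (Welford's update)."""

    def __init__(self):
        self.count = 0
        self.min = math.inf
        self.max = -math.inf
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def add(self, x):
        self.count += 1
        if x < self.min:
            self.min = x
        if x > self.max:
            self.max = x
        delta = x - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (x - self.mean)

    @property
    def stdev(self):
        # Sample standard deviation; defined as 0.0 for fewer than two values.
        if self.count < 2:
            return 0.0
        return math.sqrt(self._m2 / (self.count - 1))


def summarize(lines):
    """Hypothetical parser: assumes each line is '<parameter name> <float value>'."""
    stats = defaultdict(RunningStats)
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        name, value = parts
        stats[name].add(float(value))
    return stats


if __name__ == "__main__":
    sample = ["alt 1.0", "alt 2.0", "alt 3.0", "vel 5.0"]
    for name, s in summarize(sample).items():
        print(name, s.count, s.min, s.max, s.mean, s.stdev)
```

Passing an open file object as `lines` would stream a real file line by line, assuming the real format can be split the same way.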

I wrote the first version in Python, and the performance is abysmal. Takes between 2000 and 3000 seconds to process a file. Given the amount of data this guy has (and more arriving soon) this was clearly not acceptable. I spent some time mucking around with the Python, but didn't make any significant improvement.

Next step was rewriting it in C++. It is, as much as possible, a line-by-line translation of the Python. Had to add some stuff of course ... variable declarations etc. Used the STL map where I used the Python map, etc. And I did some of the FP calcs in long double instead of double for the stddev & avg calcs.

The C++ version takes about 17 seconds to process the same files!

I have to find some time to take another look at the Python code and figure out what's smoking it so badly. But in general, I've found that Python seems to suck wind pretty badly when handling large files & large in-memory data sets. Memory on this machine isn't a problem ... 2GB & dual Xeons.
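One way to find out where the time goes is the standard library profiler. A minimal sketch, assuming a current Python 3; `parse_values` is a hypothetical stand-in for the real parsing routine and `profile_call` is just a helper name invented here.

```python
import cProfile
import io
import pstats


def parse_values(lines):
    """Hypothetical stand-in for the slow parsing routine."""
    return [float(line) for line in lines]


def profile_call(func, *args):
    """Run func under cProfile and return (result, text report)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # Sort by cumulative time and show only the ten most expensive entries.
    stats.sort_stats("cumulative").print_stats(10)
    return result, buffer.getvalue()


if __name__ == "__main__":
    result, report = profile_call(parse_values, ["1.5", "2.5", "4.0"])
    print(report)
```

Running the real script on a small slice of one file this way should show whether the time is going into string splitting, float conversion, dictionary updates, or something else.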

Anybody else have any experience like this?
 

AFB

Lifer
Jan 10, 2004
10,718
3
0
Well, since you say you did it line by line, are you sure it is doing the exact same thing as the other one? Not skipping a bunch or something like that. I agree that some languages are slower. Heck, I notice a difference in the speed of a "Hello world" program from Java to C++, but that is mostly the JVM starting up.
 

singh

Golden Member
Jul 5, 2001
1,449
0
0
Though I don't know much about Python, I would guess that the performance is so bad due to inadequate buffering of data. Python is written in C++ though, isn't it?
 

Armitage

Banned
Feb 23, 2001
8,086
0
0
Originally posted by: amdfanboy
Well, since you say you did it line by line, are you sure it is doing the exact same thing as the other one? Not skipping a bunch or something like that. I agree that some languages are slower. Heck, I notice a difference in the speed of a "Hello world" program from Java to C++, but that is mostly the JVM starting up.

Yeah, I can diff the results files and find no significant difference.
 

Nothinman

Elite Member
Sep 14, 2001
30,672
0
0
Python is written in C++ though, isn't it?

Doesn't look like it.

ldd /usr/bin/python2.3
libpthread.so.0 => /lib/tls/libpthread.so.0 (0x3aad0000)
libdl.so.2 => /lib/tls/libdl.so.2 (0x3aadf000)
libutil.so.1 => /lib/tls/libutil.so.1 (0x3aae3000)
libm.so.6 => /lib/tls/libm.so.6 (0x3aae6000)
libc.so.6 => /lib/tls/libc.so.6 (0x3ab09000)
/lib/ld-linux.so.2 => /lib/ld-linux.so.2 (0x3aaab000)
 

Barnaby W. Füi

Elite Member
Aug 14, 2001
12,343
0
0
Python's written in C.

Armitage, can you post the source? I might regret asking, if it ends up being long :p, but I am curious to see it.
 

Nothinman

Elite Member
Sep 14, 2001
30,672
0
0
Chalk it up to the difference between compiled and interpreted programs.

Hardly. The only interpretation that happens with languages like Python and Perl these days is at startup; once the code is interpreted, it's compiled into memory and run from there.
 

Barnaby W. Füi

Elite Member
Aug 14, 2001
12,343
0
0
Originally posted by: Nothinman
Chalk it up to the difference between compiled and interpreted programs.

Hardly. The only interpretation that happens with languages like Python and Perl these days is at startup; once the code is interpreted, it's compiled into memory and run from there.

And once a .pyc file is auto-generated, it doesn't even need to do that. :D
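To make the bytecode point concrete, here is a small sketch, assuming a current CPython 3 rather than the 2.3 build shown earlier in the thread. It compiles a snippet to a code object, runs it, lists the bytecode instructions, and asks where the cached `.pyc` for a hypothetical `stats.py` would live. Worth noting that the bytecode is still executed by CPython's interpreter loop, not turned into native machine code.

```python
import dis
import importlib.util

# A tiny loop, similar in spirit to accumulating a sum over parsed values.
source = "total = 0.0\nfor x in (1.5, 2.5):\n    total += x\n"

# Source text is compiled once into a code object holding bytecode.
code = compile(source, "<example>", "exec")

# Executing the code object runs that bytecode in the interpreter loop.
namespace = {}
exec(code, namespace)

# The instruction names that the interpreter loop steps through.
instructions = [ins.opname for ins in dis.get_instructions(code)]

# Where the import system would cache bytecode for a module file
# ("stats.py" is a made-up name for illustration).
pyc_path = importlib.util.cache_from_source("stats.py")

if __name__ == "__main__":
    print(namespace["total"])
    print(len(instructions), "bytecode instructions")
    print(pyc_path)
```

The `.pyc` cache only saves the compile step at import time; it does not change how fast the loop body itself runs.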