Help: C++ program performance differences in OS X and Linux

Discussion in 'Programming' started by AluminumStudios, Dec 18, 2012.

  1. AluminumStudios

    AluminumStudios Senior member

    Joined:
    Sep 7, 2001
    Messages:
    628
    Likes Received:
    0
    Hello.

    I'm a hobbyist programmer and have stumbled across a curious problem that I hope someone can give me some insight on.

    I wrote an app in C++ using g++ on my OS X 10.6 MacBook. It's command-line only. It reads some numbers from a small text file, does a lot of number crunching using two threads (one per core), and uses a lot of memory (dozens to hundreds of MB, depending on the run), to which it makes many short writes in a very random pattern. The program doesn't access the hard drive or network until it dumps out a TIFF file at the very end.

    I've tweaked it and it performs well on my old 2.16 GHz Core2Duo MacBook.

    I compiled the same code with the same g++ options on a newer HP laptop running 64-bit Fedora 17, which has a faster Core2Duo CPU and more (and faster) RAM than the MacBook, but it executes at about 1/4 the speed of my slower MacBook. (There is nothing running in the background in either case to affect performance.)

    I compile with: "g++ -o program.o program.cc -march=core2 -O3 -ltiff -lm -lpthread"
    Then execute with: ./program.o input.txt

    I've tried a number of variations on the Linux machine, such as skipping -march=core2 and -O3 (or trying -O1 and -O2). These have influenced run time on the Linux machine by up to 20%, but in the best case it still runs at 1/4 the speed of the same code on the slower MacBook. In one test case the MacBook completed the run in 1'19" while the Linux machine took 6' (same source code, same parameter file).

    The Linux system is a 2.4 GHz Core2Duo with 1066 MHz FSB & RAM and the MacBook is a 2.16 GHz Core2Duo with 667 MHz FSB & RAM.

    I know GHz doesn't always say much, but I would expect at least comparable performance, so I think something is wrong for the faster system to perform so much more slowly. I've tried both Fedora 17 as well as Ubuntu with similar (slower) results.

    Any thoughts or suggestions? I'm not a Linux guru so there may be some factor I overlooked or didn't set optimally.

    Thanks in advance!

    -William Milberry
     
  2. postmortemIA

    postmortemIA Diamond Member

    Joined:
    Jul 11, 2006
    Messages:
    7,510
    Likes Received:
    2
    I see that you are using Tiff library. Is that what number crunching is about? Are versions of the library the same on both OSes?

    Can you confirm that Linux is using MP kernel and that both CPUs are being utilized in parallel?
     
  3. AluminumStudios

    AluminumStudios Senior member

    Thanks for the reply.

    Libtiff is only used at the end to output an image after a lot of calculation. It represents an extreme minority of the execution time.

    Both cores are being utilized to 100% and this can be seen under System Monitor.

    The Linux machine has 4 gigs of RAM and the Mac only has 2, my test case is only using a few dozen megs, so there is no swapping going on.

    My program does a lot of looping and calculations, repeatedly rewriting the values in two arrays of several thousand doubles and two arrays of several thousand ints while making frequent short and random writes to another much larger memory buffer and finally using this data to output a .tif as the very last step. It's the calculation/memory intensive portion which is a majority of the run time and where the Linux machine is trailing the Mac significantly and I'm not sure why ...

    Regards,

    -Will
     
  4. degibson

    degibson Golden Member

    Joined:
    Mar 21, 2008
    Messages:
    1,389
    Likes Received:
    0
    You should collect a CPU profile, preferably on both platforms.
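
    Short of a full sampling profile, a coarse first pass is to time each stage by hand and compare the numbers between the two machines. The following is only a minimal sketch of that idea, not a replacement for a profiler: `ScopedTimer` and `crunch` are hypothetical names invented for illustration, and `crunch` is a stand-in workload rather than anything from the program discussed here.

```cpp
#include <cassert>
#include <chrono>
#include <cstdio>
#include <string>
#include <utility>

// Hypothetical helper: reports how long the enclosing scope took.
class ScopedTimer {
public:
    explicit ScopedTimer(std::string label)
        : label_(std::move(label)),
          start_(std::chrono::steady_clock::now()) {}

    // Seconds elapsed since construction.
    double elapsed_seconds() const {
        const std::chrono::duration<double> d =
            std::chrono::steady_clock::now() - start_;
        return d.count();
    }

    // Print the label and elapsed time to stderr when the scope ends.
    ~ScopedTimer() {
        std::fprintf(stderr, "%s: %.3f s\n", label_.c_str(), elapsed_seconds());
    }

private:
    std::string label_;
    std::chrono::steady_clock::time_point start_;
};

// Stand-in workload (hypothetical): partial sum of the harmonic series.
double crunch(int n) {
    double acc = 0.0;
    for (int i = 1; i <= n; ++i) acc += 1.0 / i;
    return acc;
}
```

    Wrapping each stage in its own scope, for example `{ ScopedTimer t("fp stage"); crunch(100000000); }`, prints one line per stage, which makes it easy to see which stage differs between platforms.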
     
  5. AluminumStudios

    AluminumStudios Senior member

    On OS X under Activity Monitor I can choose "sample process" and see what percentage of time what functions are taking and it's been very useful. Is this what you mean by CPU profile?

    I'm not sure how to do it on Linux, any info. you could point me to would be greatly appreciated.
     
  6. degibson

    degibson Golden Member

  7. Net

    Net Golden Member

    Joined:
    Aug 30, 2003
    Messages:
    1,587
    Likes Received:
    0
    I've heard that the Mac's architecture makes better use of memory. You might want to look into that to see if that is the case. And what are the hard drive specs?
     
  8. mv2devnull

    mv2devnull Senior member

    Joined:
    Apr 13, 2010
    Messages:
    939
    Likes Received:
    0
    Although it hardly explains the issue[1], OS X apparently may have multiple compilers:
    https://trac.macports.org/wiki/UsingTheRightCompiler


    [1]Assuming that the "g++" is that of GNU Compiler Collection version 4.2, it should do no better than the GCC 4.7 of Fedora 17.
     
  9. AluminumStudios

    AluminumStudios Senior member

    Thanks for the responses.

    @degibson - I saw that but posted here hoping for a variety of ideas and to hear personal preferences on the tools for the job.

    As for the other questions, I don't think the hard drive is particularly relevant as it's not accessed while the program is running on either system. I have enough free ram such that there is no swapping and the tiff output is very quick at the end.

    I'm using gcc 4.2.1 on the Mac (Apple supplied), and gcc 4.7.2 on Fedora 17. I also tried whatever version was included with Ubuntu 11.04 when I was experimenting with Ubuntu (I used the older Ubuntu 11.04 on purpose for unrelated reasons.) In each case, the slightly faster system running Linux has performed at around 1/4 the speed on the same code with the same or similar compilation settings.

    While the Mac may perhaps be a little better tuned than the HP laptop I'm running Linux on, I can't imagine that resulting in such a performance difference. The investigation goes on ...
     
  10. Schmide

    Schmide Diamond Member

    Joined:
    Mar 7, 2002
    Messages:
    5,016
    Likes Received:
    5
    Could you check the cache on the processors? My bet is the Mac is a T7400 with 4mb L2 cache and the Linux box is an E4600 with 2mb L2 cache.

    If this is the case it may be beneficial to run the Linux box with one thread.
     
  11. Fayd

    Fayd Diamond Member

    Joined:
    Jun 28, 2001
    Messages:
    7,978
    Likes Received:
    0
    if that were the case, that's the biggest difference from cache i've ever seen.

    most of the time gains are less than 2%. you're suggesting a 400% gain.
     
  12. postmortemIA

    postmortemIA Diamond Member

    Interesting problem, profiler should help. But really, we're shooting in the dark without code. You might have to post your (proprietary) code to get the answer.
    Netbeans IDE comes with C/C++ profiling support for Linux.

    Are both apps 64-bit or 32-bit?
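
    One quick way to answer the 32-bit versus 64-bit question, and to spot data types whose size differs between the two builds, is to print a few `sizeof` values from a binary built with the same options on each machine. A minimal sketch; the helper names `pointer_bits` and `report_type_sizes` are hypothetical.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <iostream>

// Hypothetical helper: width of a pointer in bits for this build.
inline std::size_t pointer_bits() { return sizeof(void*) * CHAR_BIT; }

// Print the sizes of types that commonly differ between 32-bit and
// 64-bit builds (or between platforms), one per line.
inline void report_type_sizes(std::ostream& os) {
    os << "pointer: " << pointer_bits() << " bits\n"
       << "int:     " << sizeof(int) << " bytes\n"
       << "long:    " << sizeof(long) << " bytes\n"
       << "size_t:  " << sizeof(std::size_t) << " bytes\n"
       << "double:  " << sizeof(double) << " bytes\n"
       << "bool:    " << sizeof(bool) << " bytes\n";
}
```

    Calling `report_type_sizes(std::cout)` on both systems and comparing the output shows at a glance whether the two builds agree on pointer and integer widths.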
     
  13. degibson

    degibson Golden Member

    Clarification added to original quote, in [ ]
    I can vouch for pprof, i.e., it's my go-to tool for this sort of mystery. :)
     
  14. AluminumStudios

    AluminumStudios Senior member

    @Schmide - the Linux machine has 2.4 Ghz Core2Duo P8600 with 3mb cache and the Macbook has a 2.16 Ghz Core2Duo T7400 with 4mb cache. Cache affects performance on memory heavy, number crunching apps (which this is), but I wouldn't expect this big of a difference between those two.

    In the past I've had bugs and unexpectedly slow functions dominate execution time, which I spotted using "Sample Process" under Activity Monitor on the Mac. I tried the Google performance analyzer (http://goog-perftools.sourceforge.net/doc/cpu_profiler.html) as per the previous suggestion, but it didn't seem to reveal anything overly unusual. The functions I saw in the results seemed to occupy a reasonably expected percentage of execution time. It's not like a bug in one function was occupying a disproportionate amount of time. So the mystery continues ...

    Any additional insights anyone might have are appreciated.
     
  15. masteryoda34

    masteryoda34 Golden Member

    Joined:
    Dec 17, 2007
    Messages:
    1,399
    Likes Received:
    0
    Amount of cache and the way the OS handles memory cannot explain a 75% performance reduction.

    I am going to guess that the compiler on OS X is automatically vectorizing your code to use SSE instructions while the Linux compiler is not. You said your code involves a large amount of number crunching and is parallelizable, which leads me to believe it is indeed a candidate to use SSE. SSE instructions would perform 4 operations per instruction cycle vs 1. This also corresponds to your 1/4 performance level. Without using SSE, your CPU would still show 100% use. Basically, nothing you said would disprove my theory and the theory fits well. That being said, it's mostly a guess on my part. I would start by looking for options related to SSE/SIMD/vectorizing for g++ and try compiling with new switches.
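
    For reference, the kind of loop an auto-vectorizer can usually handle looks like the sketch below: independent iterations over contiguous arrays with no branches. The function name `scale_and_add` is hypothetical and not taken from the program in this thread. A 128-bit SSE register holds two doubles or four floats, so the per-instruction gain depends on the element type. GCC enables `-ftree-vectorize` at `-O3`, and newer GCC releases can report which loops were vectorized with `-fopt-info-vec`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Computes y[i] = a * x[i] + y[i] for every index both vectors share.
// Each iteration is independent of the others, which is the property
// an auto-vectorizer looks for.
void scale_and_add(double a, const std::vector<double>& x, std::vector<double>& y) {
    const std::size_t n = x.size() < y.size() ? x.size() : y.size();
    const double* xp = x.data();
    double* yp = y.data();
    for (std::size_t i = 0; i < n; ++i) {
        yp[i] = a * xp[i] + yp[i];
    }
}
```

    Frequent short writes at random locations in a large buffer, as described earlier in the thread, generally do not fit this pattern, so whether vectorization applies depends on how much of the run time is spent in loops of this shape.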
     
  16. Ancalagon44

    Ancalagon44 Platinum Member

    Joined:
    Feb 17, 2010
    Messages:
    2,642
    Likes Received:
    4
    Is it mostly floating point or integer arithmetic?
     
  17. AluminumStudios

    AluminumStudios Senior member

    @Ancalagon44 - It does both actually. It does a lot of FP calculations with doubles in one stage, then int calculations in another. It constantly flips back and forth between these two stages.

    I appreciate everyone's input, the Google performance tool will be helpful. Right now though I'm suspecting a bug or something else going on because I just noticed a small anomaly in the way the program executes. It periodically outputs a text status update which is fine on the Mac but one tiny detail is off when I run it on Linux. So I may have an issue with a data type or some library being different. I'll have to investigate it further, but it seems like it may be more than the simple performance issue I believed it was at first.
     
  18. AluminumStudios

    AluminumStudios Senior member

    I believe I've solved my problem, which is quite a different problem from what I thought it was!

    My program reads some initial parameters from a text file. Some of them are of type bool and were read like this:

    Code:
    bool sourcediscrimination;
    getline(infile, line);
    position = line.find("="); 
    parameter = line.substr (position+1);
    if (parameter=="true")
         sourcediscrimination=true;
    if (parameter=="false")
          sourcediscrimination=false;
    
    This worked fine on OS X. If I "cout << sourcediscrimination;" on OS X it shows "0" for false or "1" for true. On Linux however I saw "55" or other strange numbers showing up for true when I tried cout to see what the values of bool variables were.

    Some things that were written as "false" in my text parameter files were somehow being read as true, and this changed the behavior of my program, leading to both the speed issue that I was initially confused about and the later anomalies in its text output that I noticed.

    As a quick fix I changed the text "true" and "false" to 1 and 0 in my parameter file and now read it with "sourcediscrimination=atoi(parameter.c_str());". I know this probably isn't the optimal way for me to parse a text file, but I'll improve on that someday; I just want it to run for now.

    Anyway, mystery solved. Thank you for your replies.
     
  19. mv2devnull

    mv2devnull Senior member

    That code sample has an issue: what happens if 'parameter' contains neither "true" nor "false"?

    'atoi()' may work, but I would consider std::istringstream
    http://www.cplusplus.com/reference/sstream/istringstream/istringstream/
    Code:
    bool sourcediscrimination;
    getline(infile, line);
    size_t position = line.find("=");
    if ( position != std::string::npos ) {
      std::string parameter = line.substr (position+1);
      std::istringstream istr( parameter );
      istr >> sourcediscrimination;
    } else {
      // No "=" found, so no boolean in this line
      sourcediscrimination = false;
    }
    The 'istr >> sourcediscrimination;' may still be nitpicky about unexpected values, but it at least skips whitespace by default.
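
    To illustrate the "nitpicky" part: by default, `operator>>` into a `bool` accepts only `0` or `1`; the text forms `true` and `false` are accepted only when `std::boolalpha` is set on the stream. The following is a minimal sketch, where `read_bool_param` is a hypothetical helper that tries the numeric form first, then the text form, and reports failure instead of leaving the caller guessing.

```cpp
#include <cassert>
#include <ios>
#include <sstream>
#include <string>

// Hypothetical helper: parse the value after '=' in a "name=value" line.
// Returns true on success and stores the parsed value in 'out'.
// Returns false and leaves 'out' untouched when there is no '=' or the
// value is not one of 0, 1, true, false.
bool read_bool_param(const std::string& line, bool& out) {
    const std::string::size_type position = line.find('=');
    if (position == std::string::npos) return false;
    const std::string parameter = line.substr(position + 1);

    bool value = false;

    // First attempt: numeric form (0 or 1).
    std::istringstream numeric(parameter);
    if (numeric >> value) { out = value; return true; }

    // Second attempt: text form (true or false).
    std::istringstream text(parameter);
    if (text >> std::boolalpha >> value) { out = value; return true; }

    return false;
}
```

    The caller can then decide what an unrecognized value means, for example by printing an error and exiting rather than silently continuing with a default.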
     
  20. Cerb

    Cerb Elite Member

    Joined:
    Aug 26, 2000
    Messages:
    17,409
    Likes Received:
    0
    That's bad. In the future, please never ever ever do that. Decide what counts as true, and always set the variable to either true or false. For instance:
    Code:
    bool sourcediscrimination;
    getline(infile, line);
    position = line.find("="); 
    parameter = line.substr (position+1);
    if (parameter=="true")
         sourcediscrimination=true;
    else
         sourcediscrimination=false;
    Always assume external inputs are bad, until they prove otherwise.
     
  21. AluminumStudios

    AluminumStudios Senior member

    I know the way my parameter file is handled isn't ideal; I've just been focusing my efforts on getting the number crunching portion of it to run (^_^;). I'm the only one who uses it, so I've been careful about keeping my parameter file in order, but that is definitely on the to-fix list!
     
  22. Merad

    Merad Platinum Member

    Joined:
    May 31, 2010
    Messages:
    2,470
    Likes Received:
    1
    In addition to comments already made, this is why you should always initialize variables.

    I imagine what happened is that for some reason your file wasn't parsed correctly, so due to your use of if/if rather than if/else, the bool was never set. Unless the chunk of memory where the variable was allocated just happened to be zeroed, you'd probably end up with it being true (it's compiler specific, but generally 0 is false and nonzero is true).

    This may not seem like a big deal since bool can only be true or false and presumably, both are valid for your program, but with other variable types it could lead to some very random and hard to find bugs.
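
    The point about initialization can be made concrete by giving the "never set" case an explicit representation instead of leaving it to whatever happens to be in memory. Here is a minimal sketch, assuming C++17 for `std::optional`; `parse_flag` and its trimming rules are hypothetical. It strips surrounding whitespace, including a stray carriage return, which is one possible (but unconfirmed) reason a comparison against "true" or "false" might fail on another platform.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical helper: returns true or false for recognized text,
// or std::nullopt when the text is neither "true" nor "false".
std::optional<bool> parse_flag(const std::string& text) {
    const char* ws = " \t\r\n";

    // Trim leading and trailing whitespace before comparing.
    const std::string::size_type first = text.find_first_not_of(ws);
    if (first == std::string::npos) return std::nullopt;  // empty or all whitespace
    const std::string::size_type last = text.find_last_not_of(ws);
    const std::string trimmed = text.substr(first, last - first + 1);

    if (trimmed == "true")  return true;
    if (trimmed == "false") return false;
    return std::nullopt;  // unrecognized: the caller must decide what to do
}
```

    A caller can write `bool sourcediscrimination = parse_flag(parameter).value_or(false);` to get a well-defined default, or treat an empty result as a configuration error.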