Old 12-18-2012, 05:23 PM   #1
AluminumStudios
Senior Member
 
Join Date: Sep 2001
Posts: 628
Help: C++ program performance differences in OS X and Linux

Hello.

I'm a hobbyist programmer and have stumbled across a curious problem that I hope someone can give me some insight on.

I wrote an app in C++ using g++ on my OS X 10.6 MacBook. It's command-line only. It reads some numbers from a small TXT file, does a lot of number crunching using two threads (one per core), and uses a lot of memory (anywhere from dozens to hundreds of MB), to which it makes many frequent short writes in a very random pattern. The program doesn't access the hard drive or network until it dumps out a TIFF file at the very end.

I've tweaked it and it performs well on my old 2.16 GHz Core2Duo MacBook.

I compiled the same code with the same g++ options on a newer HP laptop running 64-bit Fedora 17, which has a faster Core2Duo CPU and more (and faster) RAM than the MacBook, but it executes at about 1/4 the speed of my slower MacBook. (There is nothing running in the background in either case to affect performance.)

I compile with: "g++ -O3 -march=core2 -o program.o program.cc -ltiff -lm -lpthread"
Then execute with: ./program.o input.txt

I've tried a number of variations on the Linux machine, such as skipping -march=core2 and -O3 (or trying -O1 and -O2). These have influenced run time on the Linux machine by up to 20%, but in the best case it is still running at 1/4 the speed of the same code on the slower MacBook. In one test case the MacBook completed the run in 1'19" while the Linux machine took 6' (same source code, same parameter file).

The Linux system is a 2.4 GHz Core2Duo with 1066 MHz FSB & RAM and the MacBook is a 2.16 GHz Core2Duo with 667 MHz FSB & RAM.

I know GHz doesn't always say much, but I would expect at least comparable performance, so I think something is wrong for the faster system to perform so much more slowly. I've tried both Fedora 17 as well as Ubuntu with similar (slower) results.

Any thoughts or suggestions? I'm not a Linux guru so there may be some factor I overlooked or didn't set optimally.

Thanks in advance!

-William Milberry
Old 12-18-2012, 06:26 PM   #2
postmortemIA
Diamond Member
 
Join Date: Jul 2006
Location: Midwest USA
Posts: 6,500

I see that you are using the TIFF library. Is that what the number crunching is about? Are the versions of the library the same on both OSes?

Can you confirm that Linux is using an SMP kernel and that both CPUs are being utilized in parallel?
__________________
D1. Win7 x64 i7-3770 on Z77, HD7850, 2707WFP, 840, X-Fi D2. Win7 x64 E8400 on P35
L1. OSX 10.9 rMBP 13 L2. Vista x86 E1505
M. Galaxy S4

Old 12-18-2012, 06:56 PM   #3
AluminumStudios
Senior Member
 
Join Date: Sep 2001
Posts: 628

Thanks for the reply.

Libtiff is only used at the end to output an image after a lot of calculation. It represents an extreme minority of the execution time.

Both cores are being utilized to 100% and this can be seen under System Monitor.

The Linux machine has 4 gigs of RAM and the Mac only has 2. My test case is only using a few dozen megs, so there is no swapping going on.

My program does a lot of looping and calculations, repeatedly rewriting the values in two arrays of several thousand doubles and two arrays of several thousand ints while making frequent short and random writes to another much larger memory buffer and finally using this data to output a .tif as the very last step. It's the calculation/memory intensive portion which is a majority of the run time and where the Linux machine is trailing the Mac significantly and I'm not sure why ...

Regards,

-Will
Old 12-18-2012, 07:21 PM   #4
degibson
Golden Member
 
Join Date: Mar 2008
Posts: 1,389

You should collect a CPU profile, preferably on both platforms.
Old 12-18-2012, 07:35 PM   #5
AluminumStudios
Senior Member
 
Join Date: Sep 2001
Posts: 628

On OS X under Activity Monitor I can choose "sample process" and see what percentage of time what functions are taking and it's been very useful. Is this what you mean by CPU profile?

I'm not sure how to do it on Linux. Any info you could point me to would be greatly appreciated.
Old 12-18-2012, 10:55 PM   #6
degibson
Golden Member
 
Join Date: Mar 2008
Posts: 1,389

http://goog-perftools.sourceforge.ne..._profiler.html
(first hit for "CPU profile")
Old 12-19-2012, 10:30 AM   #7
Net
Golden Member
 
Join Date: Aug 2003
Location: California
Posts: 1,526

I've heard that the Mac's architecture makes better use of memory. You might want to look into that to see if that is the case. And what are the hard drive specs?
Old 12-19-2012, 02:07 PM   #8
mv2devnull
Senior Member
 
Join Date: Apr 2010
Posts: 785

Although it hardly explains the issue[1], OS X apparently may have multiple compilers:
https://trac.macports.org/wiki/UsingTheRightCompiler


[1]Assuming that the "g++" is that of GNU Compiler Collection version 4.2, it should do no better than the GCC 4.7 of Fedora 17.
Old 12-19-2012, 02:21 PM   #9
AluminumStudios
Senior Member
 
Join Date: Sep 2001
Posts: 628

Thanks for the responses.

@degibson - I saw that but posted here hoping for a variety of ideas and to hear personal preferences on the tools for the job.

As for the other questions, I don't think the hard drive is particularly relevant as it's not accessed while the program is running on either system. I have enough free ram such that there is no swapping and the tiff output is very quick at the end.

I'm using gcc 4.2.1 on the Mac (Apple supplied), and gcc 4.7.2 on Fedora 17. I also tried whatever version was included with Ubuntu 11.04 when I was experimenting with Ubuntu (I used the older Ubuntu 11.04 on purpose for unrelated reasons.) In each case, the slightly faster system running Linux has performed at around 1/4 the speed on the same code with the same or similar compilation settings.

While the Mac may perhaps be a little better tuned than the HP laptop I'm running Linux on, I can't imagine it resulting in such a performance difference. The investigation goes on ...
Old 12-19-2012, 03:10 PM   #10
Schmide
Diamond Member
 
Join Date: Mar 2002
Posts: 4,252

Could you check the cache on the processors? My bet is the Mac is a T7400 with 4mb L2 cache and the Linux box is an E4600 with 2mb L2 cache.

If this is the case it may be beneficial to run the Linux box with one thread.
__________________
All errors are undocumented features waiting to be discovered.
Old 12-19-2012, 05:01 PM   #11
Fayd
Diamond Member
 
Join Date: Jun 2001
Location: Around
Posts: 7,652

Quote:
Originally Posted by Schmide View Post
Could you check the cache on the processors? My bet is the Mac is a T7400 with 4mb L2 cache and the Linux box is an E4600 with 2mb L2 cache.

If this is the case it may be beneficial to run the Linux box with one thread.
If that were the case, that's the biggest difference from cache I've ever seen.

Most of the time gains are less than 2%. You're suggesting a 400% gain.
__________________
Hold me closer Tony Danza
Count the Headlights on the Highway
Lay me down in sheets of linen
You've had a busy day today.
Old 12-19-2012, 05:49 PM   #12
postmortemIA
Diamond Member
 
Join Date: Jul 2006
Location: Midwest USA
Posts: 6,500

Interesting problem; a profiler should help. But really, we're shooting in the dark without code. You might have to post your (proprietary) code to get the answer.
Netbeans IDE comes with C/C++ profiling support for Linux.

Are both apps 64-bit or 32-bit?
Old 12-19-2012, 07:14 PM   #13
degibson
Golden Member
 
Join Date: Mar 2008
Posts: 1,389

Clarification added to original quote, in [ ]
Quote:
Originally Posted by AluminumStudios View Post
@degibson - I saw that [pprof] but posted here hoping for a variety of ideas and to hear personal preferences on the tools for the job.
I can vouch for pprof, i.e., it's my go-to tool for this sort of mystery.
Old 12-19-2012, 11:36 PM   #14
AluminumStudios
Senior Member
 
Join Date: Sep 2001
Posts: 628

@Schmide - the Linux machine has a 2.4 GHz Core2Duo P8600 with 3 MB cache and the MacBook has a 2.16 GHz Core2Duo T7400 with 4 MB cache. Cache affects performance on memory-heavy, number-crunching apps (which this is), but I wouldn't expect this big of a difference between those two.

In the past I've had bugs and unexpectedly slow functions dominate execution time, which I spotted using "Sample Process" under Activity Monitor on the Mac. I tried the Google performance analyzer (http://goog-perftools.sourceforge.ne..._profiler.html) as per the previous suggestion, but it didn't seem to reveal anything overly unusual. The functions I saw in the results seemed to occupy a reasonably expected percentage of execution time. It's not like a bug in one function was occupying a disproportionate amount of time. So the mystery continues ...

Any additional insights anyone might have are appreciated.
Old 12-19-2012, 11:45 PM   #15
masteryoda34
Golden Member
 
Join Date: Dec 2007
Posts: 1,381

Amount of cache and the way the OS handles memory cannot explain a 75% performance reduction.

I am going to guess that the compiler on OS X is automatically vectorizing your code to use SSE instructions while the Linux compiler is not. You said your code involves a large amount of number crunching and is parallelizable, which leads me to believe it is indeed a candidate to use SSE. SSE instructions can perform up to 4 operations per instruction vs. 1 (four single-precision floats fit in a 128-bit register, though only two doubles). This also roughly corresponds to your 1/4 performance level. Without using SSE, your CPU would still show 100% use. Basically, nothing you said would disprove my theory and the theory fits well. That being said, it's mostly a guess on my part. I would start by looking for options related to SSE/SIMD/vectorizing for g++ and try compiling with new switches.
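
For anyone wanting to test this theory, here is a minimal sketch of the sort of loop GCC's auto-vectorizer can handle. It is not taken from the program in this thread; the function name `scale_add` and the shape of the loop are assumptions made purely for illustration. In GCC 4.3 and later, -O3 turns on -ftree-vectorize, and newer GCC releases can report which loops were vectorized via -fopt-info-vec.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical hot loop (not from the original program): y[i] += a * x[i].
// Independent iterations over contiguous doubles are the pattern the
// auto-vectorizer looks for.
//
// Possible build line on a recent GCC (file name is a placeholder):
//   g++ -O3 -march=core2 -fopt-info-vec example.cc
void scale_add(double a, const std::vector<double>& x, std::vector<double>& y) {
    const std::size_t n = x.size() < y.size() ? x.size() : y.size();
    for (std::size_t i = 0; i < n; ++i) {
        y[i] += a * x[i];
    }
}
```

If the report shows the hot loops vectorized on one machine but not the other, that would support the theory; if they are vectorized on neither, the cause is probably elsewhere.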
Old 12-20-2012, 12:04 AM   #16
Ancalagon44
Platinum Member
 
Join Date: Feb 2010
Posts: 2,160

Is it mostly floating point or integer arithmetic?
__________________
Paul Atreides, Rand al'Thor and Luke Skywalker walk into a bar...

...and proceed to beat up Shinji Ikari for being a whiny little bitch.
Old 12-20-2012, 03:43 PM   #17
AluminumStudios
Senior Member
 
Join Date: Sep 2001
Posts: 628

@Ancalagon44 - It does both actually. It does a lot of FP calculations with doubles in one stage, then int calculations in another. It constantly flips back and forth between these two stages.

I appreciate everyone's input, the Google performance tool will be helpful. Right now though I'm suspecting a bug or something else going on because I just noticed a small anomaly in the way the program executes. It periodically outputs a text status update which is fine on the Mac but one tiny detail is off when I run it on Linux. So I may have an issue with a data type or some library being different. I'll have to investigate it further, but it seems like it may be more than the simple performance issue I believed it was at first.
Old 12-21-2012, 01:54 AM   #18
AluminumStudios
Senior Member
 
Join Date: Sep 2001
Posts: 628
Problem solved!

I believe I've solved my problem, which is quite a different problem from what I thought it was!

My program reads some initial parameters from a text file. Some of them are of type bool and were read like this:

Code:
bool sourcediscrimination;
getline(infile, line);
position = line.find("="); 
parameter = line.substr (position+1);
if (parameter=="true")
     sourcediscrimination=true;
if (parameter=="false")
      sourcediscrimination=false;
This worked fine on OS X. If I "cout << sourcediscrimination;" on OS X it shows "0" for false or "1" for true. On Linux however I saw "55" or other strange numbers showing up for true when I tried cout to see what the values of bool variables were.

Some things that were written as "false" in my text parameter files were somehow being read as true, and this changed the behavior of my program, leading to both the speed issue that I was initially confused about and the later anomalies in its text output that I noticed.

As a quick fix I changed the text "true" and "false" to 1 and 0 in my parameter file and now read it with "sourcediscrimination=atoi(parameter.c_str());". I know this probably isn't the optimal way for me to parse a text file, but I'll improve on that someday; I just want it to run for now.

Anyway, mystery solved. Thank you for your replies.
Old 12-21-2012, 03:11 PM   #19
mv2devnull
Senior Member
 
Join Date: Apr 2010
Posts: 785

Quote:
Originally Posted by AluminumStudios View Post
Code:
bool sourcediscrimination;
getline(infile, line);
position = line.find("="); 
parameter = line.substr (position+1);
if (parameter=="true")
     sourcediscrimination=true;
if (parameter=="false")
      sourcediscrimination=false;
As a quick fix I changed the text "true" and "false" to 1 and 0 in my parameter file and now read it with "sourcediscrimination=atoi(parameter.c_str());". I know this probably isn't the optimal way for me to parse a text file, but I'll improve on that someday; I just want it to run for now.
That code sample has an issue: what happens if 'parameter' contains neither "true" nor "false"?

'atoi()' may work, but I would consider std::istringstream
http://www.cplusplus.com/reference/s...istringstream/
Code:
bool sourcediscrimination;
getline(infile, line);
size_t position = line.find("=");
if ( position != std::string::npos ) {
  std::string parameter = line.substr (position+1);
  std::istringstream istr( parameter );
  istr >> sourcediscrimination;
} else {
  // No "=" found, so no boolean in this line
  sourcediscrimination = false;
}
The 'istr >> sourcediscrimination;' may still be nitpicky about unexpected values, but it at least skips whitespace by default.
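
One caveat worth spelling out: by default, extracting into a bool only accepts 0 or 1, which matches the changed parameter file. To read the words "true" and "false" again, the stream needs std::boolalpha. A minimal sketch, assuming the value has already been split off after the "=" (the helper name `read_bool` and its fallback argument are hypothetical, not part of any library):

```cpp
#include <ios>
#include <sstream>
#include <string>

// Parse the words "true"/"false" from the text after '='.
// Leading whitespace is skipped by the stream. If extraction fails,
// `fallback` is returned, so the result is never left unset.
bool read_bool(const std::string& parameter, bool fallback) {
    std::istringstream istr(parameter);
    bool value = fallback;
    if (!(istr >> std::boolalpha >> value)) {
        return fallback;
    }
    return value;
}
```

Something like `sourcediscrimination = read_bool(parameter, false);` would then replace the pair of if statements.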
Old 12-21-2012, 04:05 PM   #20
Cerb
Elite Member
 
Join Date: Aug 2000
Posts: 15,599

That's bad. In the future, please never ever ever do that. Decide what counts as true, and always set the variable to either true or false. For instance:
Code:
bool sourcediscrimination;
getline(infile, line);
position = line.find("="); 
parameter = line.substr (position+1);
if (parameter=="true")
     sourcediscrimination=true;
else
     sourcediscrimination=false;
Always assume external inputs are bad, until they prove otherwise.
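
If you would rather hear about a bad parameter file than silently fall back to false, a stricter variant is possible. This is only a sketch under a couple of assumptions: `trim` and `parse_bool_strict` are made-up helper names, and stray whitespace or a Windows-style "\r" at the end of the line is treated as harmless.

```cpp
#include <stdexcept>
#include <string>

// Strip leading/trailing spaces, tabs, and CR/LF (e.g. from CRLF files).
std::string trim(const std::string& s) {
    const char* ws = " \t\r\n";
    const std::string::size_type first = s.find_first_not_of(ws);
    if (first == std::string::npos) {
        return "";
    }
    const std::string::size_type last = s.find_last_not_of(ws);
    return s.substr(first, last - first + 1);
}

// Accept exactly "true" or "false" after '='; reject everything else loudly.
bool parse_bool_strict(const std::string& line) {
    const std::string::size_type position = line.find('=');
    if (position == std::string::npos) {
        throw std::invalid_argument("no '=' in line: " + line);
    }
    const std::string parameter = trim(line.substr(position + 1));
    if (parameter == "true") {
        return true;
    }
    if (parameter == "false") {
        return false;
    }
    throw std::invalid_argument("expected true or false, got: " + parameter);
}
```

The trade-off is that a typo in the parameter file now stops the run with an exception instead of quietly changing the program's behavior.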
__________________
Quote:
Originally Posted by Crono View Post
I'm 90% certain the hipster movement was started by aliens from another galaxy who have an exaggerated interpretation of earth culture(s).
Old 12-22-2012, 12:10 AM   #21
AluminumStudios
Senior Member
 
Join Date: Sep 2001
Posts: 628

I know the way my parameter file is handled isn't ideal; I've just been focusing my efforts on getting the number crunching portion of it to run. I'm the only one who uses it, so I've been careful about keeping my parameter file in order, but that is definitely on the to-fix list!
Old 12-22-2012, 12:37 PM   #22
Merad
Golden Member
 
Join Date: May 2010
Location: NC USA
Posts: 1,923

Quote:
This worked fine on OS X. If I "cout << sourcediscrimination;" on OS X it shows "0" for false or "1" for true. On Linux however I saw "55" or other strange numbers showing up for true when I tried cout to see what the values of bool variables were.

Some things that were written as "false" in my text parameter files were somehow being read as true, and this changed the behavior of my program, leading to both the speed issue that I was initially confused about and the later anomalies in its text output that I noticed.
In addition to comments already made, this is why you should always initialize variables.

I imagine what happened is that for some reason your file wasn't parsed correctly, so due to your use of if/if rather than if/else, the bool was never set. Unless the chunk of memory where the variable was allocated just happened to be zeroed, you'd probably end up with it being true (strictly speaking, reading an uninitialized bool is undefined behavior, but in practice 0 acts as false and nonzero garbage as true).

This may not seem like a big deal since bool can only be true or false and presumably, both are valid for your program, but with other variable types it could lead to some very random and hard to find bugs.
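
To make the failure mode concrete, here is a small sketch. It assumes (this is a guess; the thread never confirms it) that the value read from the file carried something extra, such as a trailing "\r". To avoid actually reading an uninitialized variable, the starting value is passed in explicitly; `assign_if_if` and `assign_if_else` are hypothetical names.

```cpp
#include <string>

// Mirrors the original if/if logic: when `parameter` matches neither
// string, `flag` keeps whatever value it started with.
bool assign_if_if(const std::string& parameter, bool flag) {
    if (parameter == "true")
        flag = true;
    if (parameter == "false")
        flag = false;
    return flag;
}

// if/else always assigns, so the starting value cannot leak through.
bool assign_if_else(const std::string& parameter) {
    bool flag;
    if (parameter == "true")
        flag = true;
    else
        flag = false;
    return flag;
}
```

With a starting value of true standing in for stack garbage, `assign_if_if("false\r", true)` comes back true even though the file said false, while `assign_if_else("false\r")` comes back false.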

Tags
g++, linux, os x, performance
