how to measure my computer in FLOPS?

trOver

Golden Member
Aug 18, 2006
1,417
0
0
is there any way to measure how many FLOPS a common desktop PC can do? is there a program with a conversion?
 

BrownTown

Diamond Member
Dec 1, 2005
5,314
1
0
yes, there are programs to do that. Also, it depends on whether you mean in theory or in real life; those are two completely different numbers.
 

futuristicmonkey

Golden Member
Feb 29, 2004
1,031
0
76
Originally posted by: Special K
Originally posted by: CTho9305
For AMD CPUs, it's 3*MHz.

Why?

I bet he's thinking IPC (instructions per clock cycle) -- but then it'd be 3,000,000 per MHz :confused:.

Thing is, that's probably for the newer variants. The old Socket 462 chips (Athlon XPs) I *think* had 9 IPC.
 

CTho9305

Elite Member
Jul 26, 2000
9,214
1
81
Originally posted by: futuristicmonkey
Originally posted by: Special K
Originally posted by: CTho9305
For AMD CPUs, it's 3*MHz.

Why?

I bet he's thinking IPC (instructions per clock cycle) -- but then it'd be 3,000,000 per MHz :confused:.

I meant 3*Hz :). 1GHz => 3 GFLOPS.

Thing is, that's probably for the newer variants. The old Socket 462 chips (Athlon XPs) I *think* had 9 IPC.

No. Since K7 (i.e. the original Slot-A Athlons), AMD processors have been able to do 3 floating point operations per cycle (an add, a multiply, and a miscellaneous op). Hmm... I wonder if that only counts as 2 FLOPs. I'm not sure if any of the ops that use the fstore pipe do useful work. You're right that you could execute 9 total ops in one cycle (3 FP, 3 agen, 3 integer), but you can't sustain that - I don't think the reorder buffer can retire that many in a cycle (I think it can only retire 3). Of those 9 ops, only 3 can be floating point.

Appendices A and B of the K7 optimization guide are a very good read.

Slide 10 here says 4 FLOPs per cycle for Barcelona.
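The theoretical-peak arithmetic in this post is easy to sketch: peak FLOPS is just clock rate times FLOPs per cycle times core count. A rough Python sketch, using the per-cycle figures quoted in the thread (3 for K7/K8, 4 for Barcelona); the Barcelona clock and core count in the example are made-up illustration values, not specs:

```python
# Theoretical peak = clock (GHz) * FLOPs per cycle * cores.
# Per-cycle figures are the thread's numbers: K7/K8 ~3, Barcelona ~4.

def peak_gflops(clock_ghz, flops_per_cycle, cores=1):
    """Theoretical peak throughput in GFLOPS."""
    return clock_ghz * flops_per_cycle * cores

print(peak_gflops(1.0, 3))      # 1 GHz K7 -> 3.0 GFLOPS, as stated above
print(peak_gflops(2.3, 4, 4))   # hypothetical 2.3 GHz quad-core Barcelona
```

Real code won't sustain this; it's an upper bound set by the FP units alone.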
 

CycloWizard

Lifer
Sep 10, 2001
12,348
1
81
Originally posted by: BrownTown
yes, there are programs to do that. Also, it depends on whether you mean in theory or in real life; those are two completely different numbers.
Can you point one out? I'd be interested to see what this old beast is capable of...
 

CTho9305

Elite Member
Jul 26, 2000
9,214
1
81
Originally posted by: BrownTown
however, in real-world code you'd be doing well to get an IPC of 1.

You can write pretty high-throughput code for scientific-type apps.
 

CTho9305

Elite Member
Jul 26, 2000
9,214
1
81
Originally posted by: CycloWizard
Originally posted by: BrownTown
yes, there are programs to do that. Also, it depends on whether you mean in theory or in real life; those are two completely different numbers.
Can you point one out? I'd be interested to see what this old beast is capable of...

burnK7 should get pretty close to the theoretical maximum - edit the source code to print out the performance.

The problem here is that FLOPS isn't really meaningful on its own - the CPU can get arbitrarily close to the theoretical limit if you optimize the code right and keep it in the cache. In that situation, the rest of the system would be irrelevant. On the other hand, you could have an application that doesn't even fit in RAM, and your FLOPS would be limited by the speed you can read data from your HD. Basically, any K7 or K8 CPU (i.e. Athlon, Duron, Sempron, Turion, Opteron) at a given clock speed will have the same FLOPS limit regardless of what system it's in or how old or new the rest of the hardware is. The newer ones might support more interesting operations (e.g. newer SSE instructions), but that doesn't actually change the FLOPS rating directly.
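To see the measured-vs-theoretical gap for yourself, you can time a tight loop of floating point operations on data that never leaves registers or cache. A minimal Python sketch (pure Python, so it lands orders of magnitude below hardware peak - which is exactly the point: the number you measure depends entirely on the code doing the measuring):

```python
import time

def measured_mflops(n=1_000_000):
    """Time n iterations of a multiply-add on cache-resident scalars
    and report achieved MFLOPS. In pure Python, interpreter overhead
    dominates, so the result is far below the CPU's theoretical peak."""
    a, b, c = 1.0000001, 0.9999999, 0.0
    start = time.perf_counter()
    for _ in range(n):
        c = c * a + b          # 2 floating point ops per iteration
    elapsed = time.perf_counter() - start
    return (2 * n / elapsed) / 1e6, c

mflops, _ = measured_mflops()
print(f"achieved: {mflops:.1f} MFLOPS")
```

Compiled code doing the same thing (burnK7, Linpack) gets vastly closer to peak because it keeps the FP pipes fed every cycle.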
 

pm

Elite Member Mobile Devices
Jan 25, 2000
7,419
22
81
One benchmark that has been traditionally used to measure FLOPS is Linpack. Another common FLOPS benchmark is Whetstone.

More reading:
The Wikipedia "FLOPS" entry: http://en.wikipedia.org/wiki/FLOPS
The Wikipedia Whetstone entry: http://en.wikipedia.org/wiki/Whetstone_%28benchmark%29
The Wikipedia Linpack entry: http://en.wikipedia.org/wiki/Linpack

I thought the entry for FLOPS was pretty poor. If I have time I might try to add more to it... it starts off OK, but there's not a lot of meat to it...


Benchmarks:
(win) Whetstone is included as part of the Win64.zip suite: http://freespace.virgin.net/roy.longbottom/win64.htm
(win) Sisoft Sandra (the arithmetic benchmark measures FLOPS):
http://www.sisoftware.co.uk/
(win & linux) Intel's Linpack benchmark (scroll down to the download section): http://www3.intel.com/cd/software/products/asmo-na/eng/266857.htm
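For the curious, the core of what Linpack does is simple enough to sketch: solve a dense linear system Ax=b and credit the canonical 2/3*n^3 + 2*n^2 operation count against the wall-clock time. A toy pure-Python version (not the real benchmark - the Intel and Netlib binaries linked above use tuned BLAS and will report far higher numbers):

```python
import random, time

def linpack_style_mflops(n=100):
    """Tiny Linpack-flavored sketch: Gaussian elimination with partial
    pivoting on a random n x n system, crediting the standard
    2/3*n^3 + 2*n^2 flop count. Pure Python, so results are far below
    what real Linpack builds report."""
    a = [[random.random() for _ in range(n)] for _ in range(n)]
    b = [random.random() for _ in range(n)]
    start = time.perf_counter()
    # forward elimination with partial pivoting
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(a[i][k]))
        a[k], a[p] = a[p], a[k]
        b[k], b[p] = b[p], b[k]
        for i in range(k + 1, n):
            m = a[i][k] / a[k][k]
            for j in range(k, n):
                a[i][j] -= m * a[k][j]
            b[i] -= m * b[k]
    # back substitution
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = b[i] - sum(a[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / a[i][i]
    elapsed = time.perf_counter() - start
    flops = (2 / 3) * n**3 + 2 * n**2
    return flops / elapsed / 1e6

print(f"{linpack_style_mflops():.1f} MFLOPS (pure-Python Gaussian elimination)")
```

n=100 is the classic Linpack 100 problem size; the real benchmark also checks the residual of the computed solution, which this sketch skips.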
 

CycloWizard

Lifer
Sep 10, 2001
12,348
1
81
Originally posted by: CTho9305
burnK7 should get pretty close to the theoretical maximum - edit the source code to print out the performance.

The problem here is that FLOPS isn't really meaningful on its own - the CPU can get arbitrarily close to the theoretical limit if you optimize the code right and keep it in the cache. In that situation, the rest of the system would be irrelevant. On the other hand, you could have an application that doesn't even fit in RAM, and your FLOPS would be limited by the speed you can read data from your HD. Basically, any K7 or K8 CPU (i.e. Athlon, Duron, Sempron, Turion, Opteron) at a given clock speed will have the same FLOPS limit regardless of what system it's in or how old or new the rest of the hardware is. The newer ones might support more interesting operations (e.g. newer SSE instructions), but that doesn't actually change the FLOPS rating directly.
Interesting. I guess I never thought about it that much, but I suppose there can't be one simple metric for performance since it depends on so many parameters. I'll just have to see if I can cut down the ~hour of CPU time my MATLAB simulations take to run on this old thing by buying a new computer some day. :D
 

BrownTown

Diamond Member
Dec 1, 2005
5,314
1
0
Originally posted by: CycloWizard
Interesting. I guess I never thought about it that much, but I suppose there can't be one simple metric for performance since it depends on so many parameters. I'll just have to see if I can cut down the ~hour of CPU time my MATLAB simulations take to run on this old thing by buying a new computer some day. :D

I dunno how much you know about processor benchmarking, but if I'm going to look at the performance of a new processor, I can list at least 20 different benchmarks that would HAVE to be performed before I could even think about talking intelligently about it. Different types of code have HUGE differences in how they run on different architectures and in what they are sensitive to. It's important to note that at any given time most of the resources on your chip are sitting around doing absolutely nothing: waiting for information from the cache or memory, waiting for the results of previous instructions, or just plain waiting for an instruction of the type they handle.

 

CycloWizard

Lifer
Sep 10, 2001
12,348
1
81
Originally posted by: BrownTown
I dunno how much you know about processor benchmarking, but if I'm going to look at the performance of a new processor, I can list at least 20 different benchmarks that would HAVE to be performed before I could even think about talking intelligently about it. Different types of code have HUGE differences in how they run on different architectures and in what they are sensitive to. It's important to note that at any given time most of the resources on your chip are sitting around doing absolutely nothing: waiting for information from the cache or memory, waiting for the results of previous instructions, or just plain waiting for an instruction of the type they handle.
Well, I'm not a computer/electrical engineer. I'm just a dork who does lots and lots of simulations/optimization routines on my home computer when I'm away in the lab doing experiments. I've been debating upgrading for a while before I finish my experiments and start doing the simulation stuff more full-time to save myself a lot of wasted time.

The most taxing simulation I do is solving a system of 4 nonlinear coupled PDEs on a large finite difference mesh. It wouldn't be so bad, but I take the solution (which depends on certain parameters, of course), then apply an optimization algorithm to fit it to data I've collected, so it solves the large system of equations many times. Since finite differences require a certain ratio of time step to spatial step for stability and accuracy, I have to run about 10^6 time nodes to get 20 spatial nodes. Really, I need about 100 spatial nodes, and the time nodes needed scale as the square of the number of spatial nodes, so... it can take days to solve on my P4 2.4 GHz (OC'ed from 1.6 :D) with 768 MB PC2700 RAM. The current scale of the problem is not larger than the available memory, but getting the desired accuracy would require significantly more RAM. I'm looking at upgrading to a new system with possibly 4 GB, especially if I can get my advisor to pay for it. :p I'm just wondering how much time I could shave off the simulation by upgrading to a new processor/RAM/whatever. Maybe this should be its own thread...
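The scaling described above is worth spelling out, because it's brutal: if time nodes grow as the square of spatial nodes, total work grows roughly as the cube. A quick Python sketch using the poster's own baseline (20 spatial nodes needing ~10^6 time nodes; the work proxy of time nodes times spatial nodes is my assumption about one explicit update per node per step):

```python
def fd_workload(spatial_nodes, base_spatial=20, base_time=1_000_000):
    """Estimate finite-difference workload when the stability constraint
    ties the time-node count to the square of the spatial resolution.
    Baseline (20 spatial nodes -> 1e6 time nodes) is from the post."""
    time_nodes = base_time * (spatial_nodes / base_spatial) ** 2
    total_updates = time_nodes * spatial_nodes  # rough work proxy
    return time_nodes, total_updates

t20, w20 = fd_workload(20)
t100, w100 = fd_workload(100)
print(t100 / t20)   # 5x the spatial nodes -> 25x the time nodes
print(w100 / w20)   # -> ~125x the total updates
```

So going from 20 to 100 spatial nodes means roughly 125x the arithmetic, which is why a faster CPU alone may not be enough - an implicit or adaptive scheme that relaxes the time-step constraint could matter more than hardware.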