Help me understand how much power is in a modern computer

Leros

Lifer
Jul 11, 2004
21,867
7
81
I come from an EE background. I've been programming firmware for embedded microprocessors in assembly and C, working on systems in the 8MHz - 24MHz range. Since it's firmware, my code is the only thing running on the processor; there is no OS. This makes it really easy to understand the performance of my software, because I can literally count cycles: "This routine looks like it takes about 500 cycles, it needs to run every 20,000 cycles, we're good!"

A couple of years ago, I got a 72MHz embedded board and I felt like I had unlimited processing power. So, if that gives you an idea.

Now, times have changed. I'm writing Java code which runs on the Java Virtual Machine. I'm running on a computer with 4 cores at 3GHz, with a multitasking OS. I have absolutely no idea how much power this equates to.

To give an example: I was writing a loop that iterated over 1 million integers and did about 10 lines of conditional logic for each. I was very concerned about the performance since, in my mind, this was an absolutely huge loop. I asked somebody at work about it and they said something like "it's only a million integers, it's incredibly fast, don't worry about it". Turns out they're absolutely right. The impact on my program's performance is nearly negligible.
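A minimal sketch of the kind of loop described above (the conditional logic here is made up for illustration; the OP's actual logic isn't shown in the thread):

```java
import java.util.Random;

public class MillionLoop {
    // Stand-in for the ~10 lines of conditional logic per element.
    // Returns {evenSum, oddCount} for the given data.
    static long[] scan(int[] data) {
        long evenSum = 0, oddCount = 0;
        for (int v : data) {
            if (v % 2 == 0) {
                evenSum += v;
            } else {
                oddCount++;
            }
        }
        return new long[]{evenSum, oddCount};
    }

    public static void main(String[] args) {
        // Build 1 million pseudo-random integers.
        int[] data = new int[1_000_000];
        Random rng = new Random(42);
        for (int i = 0; i < data.length; i++) {
            data[i] = rng.nextInt(1000);
        }

        long start = System.nanoTime();
        long[] result = scan(data);
        long ms = (System.nanoTime() - start) / 1_000_000;

        System.out.println("evenSum=" + result[0] + " oddCount=" + result[1]);
        System.out.println("took ~" + ms + " ms"); // typically a few ms at most on a modern desktop
    }
}
```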

So to break down my lack of understanding:
- I cannot translate lines of code into CPU cycles, because I'm writing Java.
- Even if I knew the cycle count for the JVM, I don't know how that translates into real cycles on the underlying CPU.
- Even if I knew the real cycle count, I don't understand what it would actually mean, since I'm on a multitasking OS.

Can anybody offer some insight? I don't need hugely detailed explanations. Ultimately I only need to understand basic performance (e.g. can I loop over 1 million integers without slowing down the program?).
 

Crusty

Lifer
Sep 30, 2001
12,684
2
81
You should be able to find published numbers for the millions of instructions per second your CPU can do. That will give you the best sense of how fast it really is. If you look at some ASM code that does something similar to your loop above, it's pretty easy to see why it's so fast.

With gcc you can use -S to compile the code only as far as assembly and then look at it in a text editor. It's probably easier to understand if you turn off optimizations.
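For Java, which the OP is using, the analogous inspection tool is `javap -c`, which disassembles a compiled class to JVM bytecode (class and method names below are mine, for illustration):

```java
// Compile and disassemble with:
//   javac SumLoop.java
//   javap -c SumLoop
// The bytecode for sumTo is a short sequence of loads, adds,
// an iinc, and a conditional branch, and is easy to follow.
public class SumLoop {
    static int sumTo(int n) {
        int sum = 0;
        for (int i = 1; i <= n; i++) {
            sum += i;
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(sumTo(10)); // prints 55
    }
}
```

Keep in mind the bytecode is still one step removed from machine code; the JIT decides what the CPU ultimately executes.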
 

Leros

Lifer
Jul 11, 2004
21,867
7
81
You should be able to find published numbers for the millions of instructions per second your CPU can do. That will give you the best sense of how fast it really is. If you look at some ASM code that does something similar to your loop above, it's pretty easy to see why it's so fast.

With gcc you can use -S to compile the code only as far as assembly and then look at it in a text editor. It's probably easier to understand if you turn off optimizations.

I'm sure I can figure out the number of cycles a C or C++ program takes and gain some understanding there. At least a maximum performance benchmark for myself.

How do I go about understanding the loss of performance due to Java and the multitasking OS?
 

Cogman

Lifer
Sep 19, 2000
10,286
147
106
I'm sure I can figure out the number of cycles a C or C++ program takes and gain some understanding there. At least a maximum performance benchmark for myself.
Really, you can't. Modern CPUs have LOADS of tricks that embedded systems most likely don't have: pipelining, out-of-order execution, etc. It really is pretty hard to predict how many instructions per second a given CPU will do, because the structure of the code has a lot to do with it.

This is also, btw, what makes optimizing for a modern CPU somewhat difficult. You can treat the CPU like an in-order processor and MAYBE gain some performance that way. You can attempt to interleave ALU and FPU instructions (which is probably the most successful strategy when applicable).

If you are using an x86 processor with C++, you can read the RDTSC counter, which roughly gives the number of clock cycles that have passed.
http://en.wikipedia.org/wiki/Time_Stamp_Counter
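There's no direct RDTSC access from Java, but `System.nanoTime()` gives a high-resolution timer that serves a similar purpose for coarse measurements; a minimal sketch (the workload is arbitrary):

```java
public class TimerDemo {
    // Arbitrary busy-work so there is something to time.
    static long busyWork(int n) {
        long acc = 0;
        for (int i = 0; i < n; i++) {
            acc += i * 31L;
        }
        return acc;
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        long result = busyWork(1_000_000);
        long elapsed = System.nanoTime() - start;
        System.out.println("result=" + result);
        // Wall-clock nanoseconds, not cycles: includes OS scheduling and JIT noise.
        System.out.println("elapsed ns=" + elapsed);
    }
}
```

Unlike a cycle counter on a bare-metal board, this measures wall-clock time, so repeated runs will vary.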

How do I go about understanding the loss of performance due to Java and the multitasking OS?
It is sort of hard to do. For simple things like for loops, you can essentially assume the code compiles down to the same stuff you would see had you written it in C++. But for more advanced things like classes and object-oriented features, you pretty much can't make any assumptions about what the code will eventually compile down to.

With Java, you basically do what you need to do and switch to a new algorithm if that isn't fast enough. It is nearly impossible to improve performance just by changing how you write your code (whereas in C++ it is very possible to do this).
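One consequence of the JIT behavior described above: the same Java method typically speeds up after a warm-up period, once the JVM compiles it from bytecode to native code. A sketch that makes this visible (iteration counts are arbitrary):

```java
public class WarmupDemo {
    static long hot(int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) {
            sum += i % 7;
        }
        return sum;
    }

    public static void main(String[] args) {
        // Time the same method several times; later runs are usually
        // faster once the JIT has compiled hot() to native code.
        for (int run = 0; run < 5; run++) {
            long start = System.nanoTime();
            hot(5_000_000);
            long micros = (System.nanoTime() - start) / 1_000;
            System.out.println("run " + run + ": " + micros + " us");
        }
    }
}
```

This is also why naive one-shot timings of Java code can be misleading: the first run may be interpreted while later runs are compiled.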
 

Cogman

Lifer
Sep 19, 2000
10,286
147
106
When using gcc you can use -S to only compile the code to assembly and then look at it in a text editor. Probably easier to understand if you turn off optimizations.

With GCC, no optimization actually produces some pretty unreadable code :D. The easiest to read, in my opinion, is -O2. -O3 is hard to read because that's the level where the compiler starts adding a lot of extra code.
 

degibson

Golden Member
Mar 21, 2008
1,389
0
0
Modern CPUs are much more powerful by comparison to what you have been using, OP. So much so that some of the abstractions you came to love in the embedded world aren't going to hold in the big leagues. In particular, modern high-performance x86 cores are full of invisible and difficult-to-predict performance optimizations, such that intuition about 'procedure x takes Y cycles' can start to break down.

E.g., your 'new' Java-based platform:
-> Runs on an out-of-order processor, which dynamically re-orders individual instructions to run when they're ready to run, regardless of whether earlier instructions are completed;
-> Runs on a superscalar processor, which will execute several independent instructions concurrently;
-> Runs on a voltage- and frequency-scaled processor (in all likelihood), that will vary its clock rate according to its load and power envelope;
-> Runs on a processor that dynamically predicts branch outcomes well in advance of their actual execution;
-> Has a 2- or maybe 3-level caching hierarchy; a cache hit will take about 3 "cycles"*, a cache miss will take hundreds;
-> Probably has other thread contexts (colloquially, "cores"), which can interfere with performance by eating memory bandwidth;
-> Has virtual memory, which will invisibly (and sometimes, catastrophically) alter your memory access patterns;
-> Runs on a JVM, which may, at any time, choose to interpret your code, compile your code, re-compile your code, etc., as the JVM sees fit;
-> Runs on a timesharing OS, which may, at any time, choose to deschedule or relocate your code;
-> And more, but I'm a little rushed this morning :)

* Many of the above features break down the abstraction of 'event X takes Y cycles'. E.g., because two independent instructions will often execute concurrently, their execution times overlap. Hence "cycles" in quotations.

Most importantly, performance depends on the code -- not only on the number of instructions. Theoretically, a modern x86 can execute 2-3 instructions per cycle, but in practice it almost never does, because of inter-instruction dependences, cache misses, and/or structural hazards.

But don't despair! There are a lot of great tools out there for analyzing performance in these settings. Just Google for them. It's just that they don't often use 'cycles' as their metric.
 

tatteredpotato

Diamond Member
Jul 23, 2006
3,934
0
76
Your question about the performance of a JVM is impossible to answer precisely. The JVM has many different implementations, and the type of implementation can change performance significantly.

EDIT: But typically, yes, you can loop over 1M integers. A simple loop is just O(n), which is fast.
 

Ken g6

Programming Moderator, Elite Member
Moderator
Dec 11, 1999
16,705
4,664
75
Since most of this has been gone over at length, I'll mainly direct you to further reading: Jon Bentley's Programming Pearls (not to be confused with Larry Wall's Programming Perl. ;)) A "back-of-the-envelope" calculation should tell you that 1M integers on a multi-GHz processor isn't going to be a problem. Assume 1 cycle per instruction worst-case, a 10x slowdown for Java worst-case (that's a pretty bad JVM), and 100 instructions in your loop, and it still takes less than 1/3 second! There's plenty of other useful stuff in there as well, like big-O notation examples.
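The back-of-the-envelope arithmetic above, written out (all constants are the post's worst-case assumptions, not measurements):

```java
public class Envelope {
    public static void main(String[] args) {
        double iterations = 1_000_000;   // loop count from the OP
        double instrPerIteration = 100;  // generous guess for ~10 lines of logic
        double jvmSlowdown = 10;         // worst-case interpreter penalty
        double cyclesPerInstr = 1;       // worst-case, ignoring superscalar wins
        double clockHz = 3e9;            // 3 GHz

        double seconds = iterations * instrPerIteration * jvmSlowdown
                * cyclesPerInstr / clockHz;
        System.out.println(seconds); // 1e9 cycles / 3e9 Hz = about 1/3 second
    }
}
```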

For C/C++ performance, look no further than Agner Fog's software optimization resources.
 

Schmide

Diamond Member
Mar 7, 2002
5,745
1,036
126
Up until now you've been resident in memory, at a level where memory, cache, and CPU are all close. On a modern system, the cap on processing power starts at the disk and memory. When you go from 30-100 MB/s (disk) to 7-20 GB/s (RAM) to 20-400 GB/s (cache), doing 10-20 operations per datum is basically free relative to the data load/store times, even with the JVM overhead.

That is, until you get into really complex operations with sparse memory access. Then forget about it (the IPC, that is).
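A sketch of the effect described above: the same arithmetic over the same array, once in cache-friendly sequential order and once in a large-stride order that defeats the cache (array size and stride are arbitrary choices of mine):

```java
public class AccessPattern {
    // Visits every element exactly once; stride controls the order.
    static long sum(int[] a, int stride) {
        long total = 0;
        for (int offset = 0; offset < stride; offset++) {
            for (int i = offset; i < a.length; i += stride) {
                total += a[i];
            }
        }
        return total;
    }

    public static void main(String[] args) {
        int[] data = new int[1 << 24]; // ~16M ints (64 MB), larger than typical caches
        java.util.Arrays.fill(data, 1);

        // Same work, same result; only the memory access pattern differs.
        for (int stride : new int[]{1, 4096}) {
            long start = System.nanoTime();
            long total = sum(data, stride);
            long ms = (System.nanoTime() - start) / 1_000_000;
            System.out.println("stride " + stride + ": sum=" + total + " in " + ms + " ms");
        }
    }
}
```

Both passes compute the identical sum, but the strided pass usually takes several times longer because nearly every access misses cache.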