Which is more important: L2 or L3 cache?

legocitytruck

Senior member
Jan 13, 2009
For a machine that will be used primarily for rendering/drafting and not gaming, which is more important, L2 or L3 cache, and why?
 

vj8usa

Senior member
Dec 19, 2005
I'm afraid I'm not knowledgeable enough to give you a direct answer (though I'm guessing L2 would be a lot more important than L3, since my understanding is that the [slower] L3 is mainly used when L2 is full in most apps), but I'm curious: why are you focusing on cache? You can use benchmarks to gauge performance across different CPUs. For instance, the best CPUs out there for rendering (to the best of my knowledge) are the i7s, and they all have the same amount of L2/L3 cache.

If you're just asking out of curiosity and not to make a purchasing decision, then disregard the above.
 

Idontcare

Elite Member
Oct 10, 1999
(this is intentionally a very simplistic view of computer architecture)

Performance is basically the product of the completed instructions per clock (IPC) and the clock speed (GHz).

Performance = IPC x GHz

Completed IPC is a function of many, many things, but one of those things is the efficiency of the underlying memory (cache) subsystem. The L1, L2, and L3 caches, as well as the RAM and even the hard drive, matter when it comes to IPC.
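
To put some numbers on that formula (the IPC and clock figures below are purely made up for illustration, not measurements of any real chip), here is a quick sketch in Python:

def performance(ipc, clock_ghz):
    # Billions of completed instructions per second: IPC x clock speed.
    return ipc * clock_ghz

# CPU A: higher clock, but a weaker cache subsystem drags achieved IPC down.
cpu_a = performance(ipc=1.2, clock_ghz=3.2)   # 3.84
# CPU B: lower clock, but its cache hierarchy sustains a higher achieved IPC.
cpu_b = performance(ipc=1.6, clock_ghz=2.8)   # 4.48

print(f"CPU A: {cpu_a:.2f}   CPU B: {cpu_b:.2f}")  # B wins despite the lower clock

The point is just that cache behaviour never shows up in the formula directly; it shows up indirectly, through the IPC you actually achieve.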

Every architecture has a peak theoretical IPC based on the hypothetical scenario where the L1 cache always has exactly what it needs (no L1$ misses)...but this peak IPC is rarely reached in real-world applications because L1$ size is quite small.

This is where the L2$ and L3$ come in. They are successively larger (L3$ > L2$ > L1$) and slower (L3$ < L2$ < L1$) in an intentional hierarchy designed to maximize practical/actual IPC when running real-world applications.
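
As a very rough sketch of why that hierarchy matters (every latency and hit rate below is a made-up illustrative number, not a figure for any real CPU), you can estimate the average cost in cycles of a memory access as it falls through L1 -> L2 -> L3 -> RAM:

# Average memory access time (AMAT) through a three-level cache hierarchy.
# All latencies (in CPU cycles) and hit rates are hypothetical.

def amat(l1_hit, l2_hit, l3_hit, l1_lat=4, l2_lat=12, l3_lat=40, mem_lat=200):
    # Expected cycles per access, falling through L1 -> L2 -> L3 -> RAM.
    return (l1_lat
            + (1 - l1_hit) * (l2_lat
            + (1 - l2_hit) * (l3_lat
            + (1 - l3_hit) * mem_lat)))

# Workload whose hot data mostly fits in a big L2 (no L3 at all).
big_l2_no_l3 = amat(l1_hit=0.95, l2_hit=0.97, l3_hit=0.0, l3_lat=0)

# Smaller L2 backed by a large L3: more L2 misses, but the L3 catches most of them.
small_l2_big_l3 = amat(l1_hit=0.95, l2_hit=0.85, l3_hit=0.90)

print(f"Big L2, no L3:       {big_l2_no_l3:.2f} cycles/access")
print(f"Small L2 + large L3: {small_l2_big_l3:.2f} cycles/access")

The lower the average access cost, the less the pipeline stalls and the closer the achieved IPC gets to the theoretical peak. Notice that with these invented numbers the two configurations come out very close, which is exactly why the "L2 vs L3" question has no general answer; it depends entirely on the hit rates your application produces.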

It is impossible for anyone to tell you, legocitytruck, whether L2$ or L3$ matters more for keeping real-world IPC from degrading due to excessive misses unless you tell us the specific application...and even then we would mostly still be guessing unless your usage pattern was well characterized.

Generally, L2$ and L3$ sizes and latencies are modeled in simulation during the design phase of a CPU's architecture, across a variety of applications and usage patterns, so that the resulting cache hierarchy represents a tradeoff between production cost and CPU performance (IPC). Only in rare instances do we get the chance to evaluate those design tradeoffs as they impact real-world applications of interest to us (the recent AMD Phenom II X2 versus Athlon II X2 is a practical example of this; not that you'd buy either one for rendering apps, because they are dual-core, but they still help answer the question from a purely computer-science point of view).
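
In the same spirit as those design-phase simulations, here is a toy sketch (a synthetic, skewed access pattern, nothing like a real workload trace) of how hit rate grows with cache size; this is the kind of curve that gets weighed against die area and cost:

import random
from collections import OrderedDict

# Toy LRU cache simulator over a synthetic access pattern.
# Sizes and the shape of the trace are invented purely for illustration.

def hit_rate(trace, cache_lines):
    cache = OrderedDict()   # ordered dict doubles as an LRU: oldest entry first
    hits = 0
    for addr in trace:
        if addr in cache:
            hits += 1
            cache.move_to_end(addr)        # refresh recency on a hit
        else:
            cache[addr] = True
            if len(cache) > cache_lines:   # evict the least recently used line
                cache.popitem(last=False)
    return hits / len(trace)

random.seed(0)
# Skewed trace: 80% of accesses touch a small hot set, the rest wander widely.
trace = [random.randint(0, 99) if random.random() < 0.8 else random.randint(0, 9999)
         for _ in range(100_000)]

for lines in (64, 256, 1024, 4096):
    print(f"{lines:5d} lines -> hit rate {hit_rate(trace, lines):.2%}")

Once the cache comfortably covers the hot set, extra capacity buys very little, which is why "more cache" is not automatically better and why the answer is workload-dependent.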

In reality, as vj8usa speaks to, the end user is typically just interested in performance, not in where that performance comes from (whether it is coming from L2$ hits or from the L3$ backing up L2$ misses, etc.).

The performance (which takes clock speed into account) is what you pay for, so finding out how your specific app performs is really the question to seek an answer to.

edit: grammar/spelling fixed, kinda.
 

rogue1979

Diamond Member
Mar 14, 2001
Originally posted by: Idontcare
The performance (which takes clock speed into account) is what you pay for, so finding out how your specific app performs is really the question to seek an answer to.

Not trying to insult you, but that did absolutely nothing to answer the OP's question:confused:

I certainly don't know the answer, but there has to be somebody who has benchmarked rendering and drafting programs across several different CPU architectures. It should be possible to give us a general idea of what performs best in those specific types of applications.

 

daw123

Platinum Member
Aug 30, 2008
Originally posted by: rogue1979
Not trying to insult you, but that did absolutely nothing to answer the OP's question:confused:

I certainly don't know the answer, but there has to be somebody who has benchmarked rendering and drafting programs across several different CPU architectures. It should be possible to give us a general idea of what performs best in those specific types of applications.

I thought IDC answered the OP's question very well.

The OP wants to know which is more important (and why): L2 or L3 cache for rendering/drafting. The first part of IDC's post was a general explanation of how CPU performance is related to L1/L2/L3 cache and CPU speed. IDC then went on to explain that in order to answer the OP's question, we need to know the specific applications the OP will be using. Even then it will be a guess on our part as to whether more L2 or L3 will benefit the OP, because we are not privy to the results of the simulations run by the CPU manufacturer.

Edit: You are correct that benchmarks using chips with different cache sizes in the same system will help answer the OP, but there could be scenarios where the surrounding components (MB, RAM, etc.) 'skew' the results, i.e. it may be difficult to isolate the benchmark performance of the chip from the rest of the system.
 

legocitytruck

Senior member
Jan 13, 2009
In the price range in which I am looking, there are two options:

1.) Intel Core 2 Duo, with a large amount of L2 cache and no L3 cache
2.) AMD Phenom II, with less L2 cache but a large L3 cache

I didn't know which was the better route.

 

Idontcare

Elite Member
Oct 10, 1999
Originally posted by: daw123
I thought IDC answered the OP's question very well.

The OP wants to know which is more important (and why): L2 or L3 cache for rendering/drafting. The first part of IDC's post was a general explanation of how CPU performance is related to L1/L2/L3 cache and CPU speed. IDC then went on to explain that in order to answer the OP's question, we need to know the specific applications the OP will be using. Even then it will be a guess on our part as to whether more L2 or L3 will benefit the OP, because we are not privy to the results of the simulations run by the CPU manufacturer.

Edit: You are correct that benchmarks using chips with different cache sizes in the same system will help answer the OP, but there could be scenarios where the surrounding components (MB, RAM, etc.) 'skew' the results, i.e. it may be difficult to isolate the benchmark performance of the chip from the rest of the system.

Thanks for explaining that, very nicely summarized.