Hey all, newbie here (3rd-year ECE).
So I've been wondering why AMD reduced the L1I cache. I have a few possibilities in mind, but I'd love to be corrected.
1. The µop cache isn't big enough relative to a large L1I, so it (the µop cache) gets filled and instructions just sit around. [Does AMD have that thing Intel has, where there are two delivery paths into execution, a faster one from the µop cache and one from the L1I through the decoders?] That's assuming the decoders are fast enough. As evidence, they did double the µop cache.
2. If the decoders aren't fast enough, instructions could be sitting in the L1I waiting to be decoded while taking up area that's needed elsewhere (say, for more µop cache)?
So my questions:
1. If 1 is true, wouldn't it be better for instructions to wait in L1I to save the L2 fetch time? I guess it would be, but the trade-off (a smaller µop cache) isn't worth it given power/silicon constraints.
2. The decoders should be really fast, since they have to fill the µop cache quickly while still dealing with branches/long instructions and such. Does that mean the BPU should have the lowest latency, then the decoders, then the cache?
3. Is the reorder buffer before the back-end basically there to 'hide' front-end latency? [Or back-end execution latency, so the front end wouldn't stall?]
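To put rough numbers on question 1, here's a quick weighted-average sketch. The latencies are made-up illustrative values, not actual Zen figures; the point is just that a slightly lower L1I hit rate costs little on average, which can be a fair price if the freed area buys a bigger µop cache:

```python
# Back-of-envelope average instruction-fetch latency.
# Latencies below are assumptions for illustration, not vendor specs.
L1I_HIT = 4    # cycles on an L1I hit (assumed)
L2_HIT = 14    # cycles when the fetch misses L1I but hits L2 (assumed)

def avg_fetch_latency(l1i_hit_rate):
    """Average fetch latency for a given L1I hit rate,
    assuming every L1I miss is served from L2."""
    return l1i_hit_rate * L1I_HIT + (1 - l1i_hit_rate) * L2_HIT

# Dropping the hit rate from 98% to 95% only moves the average a little.
print(avg_fetch_latency(0.98))
print(avg_fetch_latency(0.95))
```

With these numbers, 98% vs. 95% hit rate is an average of about 4.2 vs. 4.5 cycles, so the halved L1I doesn't hurt much as long as L2 catches the misses.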
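For question 2, the same kind of sketch shows why the µop-cache hit rate matters more than raw decoder speed once the two paths have different widths. The widths here are illustrative assumptions (a wider µop-cache path vs. a narrower legacy-decode path), not real AMD numbers:

```python
# Effective front-end delivery (uops/cycle) as a mix of the two paths.
# Both widths are assumptions for illustration, not vendor specs.
OP_CACHE_WIDTH = 8   # uops/cycle from the uop cache path (assumed)
DECODE_WIDTH = 4     # uops/cycle from the L1I + decoder path (assumed)

def effective_width(op_cache_hit_rate):
    """Average uops/cycle delivered, weighted by how often
    fetches are served from the uop cache."""
    return (op_cache_hit_rate * OP_CACHE_WIDTH
            + (1 - op_cache_hit_rate) * DECODE_WIDTH)

print(effective_width(0.5))
print(effective_width(0.9))
```

Under these assumptions, pushing the µop-cache hit rate up (e.g. by doubling its capacity) raises sustained delivery even if the decoders stay the same width, which fits the "shrink L1I, grow µop cache" trade.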
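And for question 3, here's a toy simulation of a decoupling buffer between a bursty front end and a steady back end. It's my own sketch with made-up rates, not a model of any real core, but it shows the general idea: a deeper buffer lets the back end ride out front-end stall cycles instead of starving.

```python
import random

def starvation_rate(buffer_cap, cycles=10000, seed=0):
    """Toy pipeline model: the front end delivers 6 uops on a good
    cycle and 0 on a stall cycle (20% of cycles, at random); the back
    end consumes up to 4 uops/cycle from the buffer. Returns the
    fraction of cycles the back end had nothing to execute."""
    rng = random.Random(seed)
    buffered = 0
    starved = 0
    for _ in range(cycles):
        if rng.random() > 0.2:                      # good cycle: front end delivers
            buffered = min(buffer_cap, buffered + 6)
        if buffered == 0:                           # nothing for the back end
            starved += 1
        buffered = max(0, buffered - 4)             # back end consumes
    return starved / cycles

# A tiny buffer starves on nearly every front-end stall;
# a deep one absorbs most of them.
print(starvation_rate(buffer_cap=4))
print(starvation_rate(buffer_cap=64))
```

The surplus on good cycles (6 delivered vs. 4 consumed) piles up in the deep buffer and covers the stall cycles, which is the "hide front-end latency" effect; a ROB-style buffer ahead of execution plays the same decoupling role.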
Cheers.