I'm certain someone has already come up with this, but what the heck.
There has been a lot of talk about the branch misprediction penalty being one of the main problems for Bulldozer's performance.
A CPU's branch predictor tries to predict which branch is going to be taken, so it loads the most probable one into one pipeline/core.
Why doesn't it load both possible branches into 2 different pipelines/cores, then use the data from the pipeline/core that turned out to be the right one, and flush the other?
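The closest software analogue I can think of is predication: compute both sides of the condition up front and select one result with pure arithmetic instead of a branch, which is basically the same "load both lines" trick on a tiny scale. A minimal sketch in C, just for illustration (the function names are made up), assuming the condition is hard to predict:

```c
#include <stdint.h>

/* Branchy version: a hard-to-predict condition can cost a
 * pipeline flush on every mispredict. */
int32_t select_branchy(int32_t cond, int32_t a, int32_t b) {
    if (cond)
        return a;
    return b;
}

/* "Both paths" version: both operands are already computed, and
 * -(cond != 0) is all-ones when cond is true, all-zeros otherwise,
 * so the selection is pure arithmetic with no branch to mispredict. */
int32_t select_branchless(int32_t cond, int32_t a, int32_t b) {
    int32_t mask = -(int32_t)(cond != 0);
    return (a & mask) | (b & ~mask);
}
```

The tradeoff is the same as in the hardware idea: you always pay for both paths, but you never pay the flush.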
Imagine you work in a vehicle factory and you get an order saying you need to make a car or a truck, but you are not sure which one. You could load 1 assembly line with parts for the car or the truck, whichever you find more probable. But you could also load 2 assembly lines, one with car parts and the other with truck parts, and when the final decision comes on which vehicle is to be made, you just finish the product on the line you need and remove the parts from the other assembly line.
This solution would need more power, since it would load 2 or more cores with only one thread, but it would negate the branch misprediction penalty that any CPU has, since there would be no missed branches, and would thus improve single-thread performance. It would improve execution of any thread that has branch instructions, as long as there are enough resources. Imagine running a 2-3 thread application on an 8-core CPU loaded to 100%.
It would increase performance, but it would use more power for the same task. Maybe the user could choose between a branch-predict mode, which would be power-optimized, and a load-2-cores mode, which would be performance-optimized.
Lord knows AMD could use more single-threaded performance, and their 8-core CPUs have more than enough resources to execute the task. Those cores are not even 50% used most of the time.
My question is: why isn't this used in modern CPU architectures? Or is it? Can it be implemented in software?