Why isn't Intel going back to one core?

Anarchist420

Diamond Member
Feb 13, 2010
8,645
0
76
www.facebook.com
Wouldn't one core (with just one vector FP unit that is 32768 bits wide, or something like that) be radically more parallel?

Aren't partially idle cores that constantly have to shift speeds a waste?

Will we ever go back to one-core processors with simply one wide, fat vector unit?

I don't think we can for compatibility reasons, but I don't know for sure.

Perhaps AMD never really invented anything completely brand new with long-term use? Since dual cores have kind of held things back in the long term, and x64 is an extension of x86, I was thinking perhaps AMD never invented anything out of the box.

And I realize that I could never invent anything outside the box, and I'm not criticizing AMD for not being able to either.
 

Exophase

Diamond Member
Apr 19, 2012
4,439
9
81
No, this is spectacularly off the mark. Even GPUs deal in what are effectively several cores before you go down to the big fat vectors, and there are real reasons why big fat GPUs haven't taken over all of our processing work.

Threads need thread-level parallelism to work; vectors need data-level parallelism, which is a lot more stringent. A really crude way of saying it is that with threads you can work on a diverse set of complex tasks independently, while with vectors you have to be working on the same task in lock-step over several iterations of data. There are fundamental limitations to what sorts of problems can use either. And in a lot of ways threading is just easier, not that it's all-around easy. You can also consider running multiple completely separate programs at the same time, and if you think heavy multitasking isn't a real thing, consider a virtualized system.
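To make that distinction concrete, here's a minimal C++ sketch (mine, not Exophase's; the tasks are made up purely for illustration). The loop is data-level parallelism that a compiler can map onto SIMD lanes; the threads are diverse tasks that can never be folded into one wide vector instruction.

Code:
#include <cstdio>
#include <thread>
#include <vector>

// Data-level parallelism: the same operation applied in lock-step across
// many elements. Every iteration does identical, independent work, so a
// compiler can map this loop onto SIMD lanes.
void scale(std::vector<float>& a, float k) {
    for (float& x : a) x *= k;
}

int main() {
    std::vector<float> a(1 << 20, 1.0f);

    // Thread-level parallelism: diverse, unrelated tasks running
    // concurrently. These cannot be expressed as one wide vector operation.
    std::thread worker([&] { scale(a, 2.0f); });              // numeric task
    std::thread logger([] { std::puts("doing unrelated I/O..."); });

    worker.join();
    logger.join();
    std::printf("a[0] = %g\n", a[0]);
    return 0;
}

(Compile with something like g++ -O2 -pthread; this is purely a sketch of the concept, not anyone's actual workload.)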

Even then, your proposed 32768-bit vector size is crazy big, well beyond practical. If you look at the history of vector extensions in processors you'll notice that they tend to track L1 interface width more or less; e.g. Haswell brings 256-bit integer SIMD and widens the L1 interface to 256 bits per port (load + load + store). You don't want to try to make the interface (and thus the vector) wider than a cache line, currently 64 bytes, so that puts a practical limit at 512 bits. Few useful algorithms spend most of their time purely in vector registers without at least having to touch something like the L1 cache, so you need bandwidth that can keep up.
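Just to put numbers on that (my own back-of-envelope arithmetic, not from the post): a 32768-bit register is 4096 bytes, i.e. 64 whole cache lines that would have to move per load or store, versus exactly one 64-byte line for a 512-bit register.

Code:
#include <cstdio>

int main() {
    const int cache_line_bytes = 64;      // typical x86 cache line today
    const int proposed_bits    = 32768;   // the OP's hypothetical vector width
    const int avx512_bits      = 512;     // the practical ceiling argued above

    std::printf("32768-bit register = %d bytes = %d cache lines per access\n",
                proposed_bits / 8, (proposed_bits / 8) / cache_line_bytes);
    std::printf("  512-bit register = %d bytes = %d cache line per access\n",
                avx512_bits / 8, (avx512_bits / 8) / cache_line_bytes);
    return 0;
}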

We might see the cache line size go up to 128 bytes at some point, but understand that increasing the cache line size is detrimental to random-access performance and increases the size of internal buffers, latencies, and other bad stuff. You definitely don't want 4KB cache lines.
 

hawtdawg

Golden Member
Jun 4, 2005
1,223
7
81
That was pretty interesting, Exophase. I have to admit, I did not expect to learn much when I clicked on this thread.
 

Cogman

Lifer
Sep 19, 2000
10,286
147
106
Exophase hit the nail on the head. But just to add: the vast majority of software doesn't depend on number-crunching performance. So what if you can add five billion numbers together in a single clock cycle? That doesn't mean anything if you spend most of your time checking whether the user has permission to talk to the database.
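A toy illustration of that point (my own sketch; the 20 ms "database" latency is invented purely for illustration): the arithmetic finishes in a blink, while one blocking round trip dwarfs it.

Code:
#include <chrono>
#include <cstdio>
#include <thread>

// Stand-in for a remote permission check: dominated by latency, not math.
// The 20 ms figure is made up just to illustrate the point.
bool user_has_permission() {
    std::this_thread::sleep_for(std::chrono::milliseconds(20));
    return true;
}

int main() {
    using clock = std::chrono::steady_clock;

    auto t0 = clock::now();
    long long sum = 0;
    for (int i = 0; i < 5000000; ++i) sum += i;   // the "number crunching"
    auto t1 = clock::now();

    bool ok = user_has_permission();              // the "talking to the database"
    auto t2 = clock::now();

    auto us = [](auto d) {
        return (long long)std::chrono::duration_cast<std::chrono::microseconds>(d).count();
    };
    std::printf("crunching: %lld us, permission check: %lld us (ok=%d, sum=%lld)\n",
                us(t1 - t0), us(t2 - t1), (int)ok, sum);
    return 0;
}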
 

2is

Diamond Member
Apr 8, 2012
4,281
131
106
I think anyone who agrees with Exophase automatically gains between 10 and 20 IQ points.
 

Sheep221

Golden Member
Oct 28, 2012
1,843
27
81
Wouldn't one core (with just one vector FP unit that is 32768 bits wide, or something like that) be radically more parallel?

Aren't partially idle cores that constantly have to shift speeds a waste?

Will we ever go back to one-core processors with simply one wide, fat vector unit?

I don't think we can for compatibility reasons, but I don't know for sure.

Perhaps AMD never really invented anything completely brand new with long-term use? Since dual cores have kind of held things back in the long term, and x64 is an extension of x86, I was thinking perhaps AMD never invented anything out of the box.

And I realize that I could never invent anything outside the box, and I'm not criticizing AMD for not being able to either.
This is not correct.

AMD invented quite a lot of things that are part of most computers today: they introduced 64-bit CPUs, they integrated the memory controller into the CPU,
and they were also the first to release desktop dual-core CPUs, among other things.

As for your single-core preference, there is no real reason to go single-core anymore, since a multi-core CPU uses its TDP far more effectively than a single-core CPU would.
 

SOFTengCOMPelec

Platinum Member
May 9, 2013
2,417
75
91
Interesting 'out of the box' idea presented by the OP.
But I think it would end up being like a giant, GPU-like CPU.
This would make it VERY difficult to program for, and relatively unsuitable for many types of tasks.

It might be implementable by using a very deep pipeline (I'm NO expert on pipelines, so I may be giving bad info here). A very deep pipeline might allow amazingly high CPU clock frequencies, but it would be VERY inflexible with respect to things like branches, with terrible pipeline stall delays and other very undesirable effects.

Given AVX2 and its future improvements, existing Intel desktop CPUs could be considered as already being, and/or heading towards being, 256/512/1024 bits of CPU width (depending on one's definition of exactly what that means).
Before other posters disagree with my mention of 1024 bits: I AM including potential future unreleased CPUs, which MAY go to bigger sizes, i.e. AVX2++ and later versions (which I have already seen mentioned in at least one place on the internet).
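For a concrete sense of what that 256-bit width means, here's a small intrinsics sketch (my own, with made-up data; compile with -mavx or /arch:AVX). One instruction operates on 8 floats at once; the core count doesn't change, only the vector unit's width does.

Code:
#include <immintrin.h>   // AVX intrinsics
#include <cstdio>

int main() {
    alignas(32) float a[8] = {1, 2, 3, 4, 5, 6, 7, 8};
    alignas(32) float b[8];

    __m256 v = _mm256_load_ps(a);                   // load 8 floats at once
    v = _mm256_mul_ps(v, _mm256_set1_ps(2.0f));     // multiply all 8 lanes by 2
    _mm256_store_ps(b, v);                          // store 8 results at once

    for (float x : b) std::printf("%g ", x);
    std::printf("\n");
    return 0;
}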

I don't think this idea will catch on, because it would probably lead to big power consumption issues, and it would probably cope very poorly with heavily serial, single-task software (of which there is plenty around).

--------------------------------------------------

N.B. This is a joke comment (so I don't get irate forumers throwing stuff at me).

Apparently a research team in South Korea is quietly working on just such a project.
There is very little information about it on the internet, but I have a couple of good papers on it.

It uses techniques which give the upcoming DDR4 (which they are using) a huge (by today's standards) number of parallel data connections, along with a relatively ultra-high-bandwidth serial burst mode, resulting in amazing overall data rates.

Although it is considered a "single core" processor, architecturally it is split into 6 major symmetrical sections, each having 7 separate ALUs (I don't know the bit size of each sub-module, but it might be 1024 bits).
Each can perform up to 10 decodes per "long cycle".

Tecc = Total equivalent core count
Tecc = 6 (Modules) x 7 (ALUs) x 10 (decoders) = 42 x 10 = 420.
A = a
N=new
Arch = Architecture
I.S.T. = Integrated Symmetric Threads.


A.N.Arch.I.S.T.420 = Anarchist420 (for short).

This architecture may have to be abandoned because it uses far too much power, and is way too difficult to program for.
 

_Rick_

Diamond Member
Apr 20, 2012
3,989
74
91
AMD invented quite a lot of things that are part of most computers today: they introduced 64-bit CPUs, they integrated the memory controller into the CPU

DEC Alpha and Transmeta Crusoe would like to have a word with you about that.

As for the OP: wasn't IA-64 vs. AMD64 (and Intel's biggest failure to date) pretty much exactly what the OP describes? At a smaller scale, but VLIW failed so spectacularly -- not necessarily on technical merits -- that nobody has really asked that question since.

And these days the front-end of a CPU is so damn wide, and the inside so heterogeneous, precisely because you need to be very fast in many situations, not just excel at one thing.

Similarly, the future of Intel vs. AMD will likely be decided in a battle that AMD already lost once before, against NVIDIA, when they missed the CUDA train. Giving developers the programming interface to exploit all the custom logic in the chip, and having your approach become the standard, is how you win that war.

A final note: Dual cores were the result of die space getting cheap and frequency getting expensive. This is still the case, but we see more and more custom logic instead of extra cores, and I think that will be the way of the future. It also gels quite well with Intel's 2-for-1 rule, or whatever they used to measure performance increase versus power consumption. Better to have some "dark die" than something super wide that constantly eats power for no reason.
 

SunnyD

Belgian Waffler
Jan 2, 2001
32,675
146
106
www.neftastic.com
but VLIW failed so spectacularly -- not necessarily due to technical merits -- that nobody really asked that question since.

Uh, wut? Many facets of VLIW live on quite happily today in pretty much every modern processor that is commercially available. I would say it's pretty much the farthest thing from "failing so spectacularly" that you can get.
 

Exophase

Diamond Member
Apr 19, 2012
4,439
9
81
Uh, wut? Many facets of VLIW live on quite happily today in pretty much every modern processor that is commercially available. I would say it's pretty much the farthest thing from "failing so spectacularly" that you can get.

What facets of VLIW are you thinking of? I can't think of a single thing in most modern CPUs that can be traced specifically to VLIW.

VLIW has had success in pretty domain-specific areas like DSPs and GPUs but it's never taken off in general purpose CPUs. It also isn't analogous to SIMD, and there are some cases where both VLIW and SIMD are used simultaneously.
 

Hulk

Diamond Member
Oct 9, 1999
5,428
4,166
136
It's been said (written) many times here.

There is only so much instruction-level parallelism that can be extracted from even the most cleverly written code. If you are waiting on the result of one instruction before you can process another, then you can go as wide as you want and you'll still be waiting. This is of course why Intel came up with Hyper-Threading: might as well run another thread while you are waiting, so you can use that "wideness."
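A small sketch of that limit (mine, not Hulk's; a real compiler may already do this transformation for you). The first loop is one long dependency chain, so extra execution ports sit idle; splitting the work into independent chains is what actually lets a wide core keep more adders busy.

Code:
#include <cstdio>

// One serial dependency chain: every add waits for the previous result,
// so no amount of extra "wideness" helps.
double dependent_sum(const double* x, int n) {
    double s = 0.0;
    for (int i = 0; i < n; ++i)
        s += x[i];
    return s;
}

// The same reduction split into four independent chains: the core can now
// keep several adders in flight because the chains don't wait on each other.
double independent_sum(const double* x, int n) {
    double s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    int i = 0;
    for (; i + 3 < n; i += 4) {
        s0 += x[i]; s1 += x[i + 1]; s2 += x[i + 2]; s3 += x[i + 3];
    }
    for (; i < n; ++i) s0 += x[i];
    return (s0 + s1) + (s2 + s3);
}

int main() {
    double x[16];
    for (int i = 0; i < 16; ++i) x[i] = i;
    std::printf("%g %g\n", dependent_sum(x, 16), independent_sum(x, 16));
    return 0;
}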
 

SunnyD

Belgian Waffler
Jan 2, 2001
32,675
146
106
www.neftastic.com
What facets of VLIW are you thinking of? I can't think of a single thing in most modern CPUs that can be traced specifically to VLIW.

VLIW has had success in pretty domain-specific areas like DSPs and GPUs but it's never taken off in general purpose CPUs. It also isn't analogous to SIMD, and there are some cases where both VLIW and SIMD are used simultaneously.

If I remember right, weren't the AMD Athlon chips the first to move from a pure CISC approach to a more RISC-like micro-op core that shuffled the CISC instructions around in order to make the pipelines more efficient, using VLIW-style techniques? Not pure VLIW, but still drawing from the concept.
 

jhu

Lifer
Oct 10, 1999
11,918
9
81
If I remember right, weren't the AMD Athlon chips the first to move from a pure CISC approach to a more RISC-like micro-op core that shuffled the CISC instructions around in order to make the pipelines more efficient, using VLIW-style techniques? Not pure VLIW, but still drawing from the concept.

No. NexGen did that first, back in 1994, with their Nx586. Also, that's a translation of CISC to RISC-like operations, not VLIW. The CISC (x86) -> VLIW approach was Transmeta, with their Crusoe and Efficeon processors back in the early 2000s.
 

Exophase

Diamond Member
Apr 19, 2012
4,439
9
81
If I remember right, weren't the AMD Athlon chips the first to move from a pure CISC approach to a more RISC-like micro-op core that shuffled the CISC instructions around in order to make the pipelines more efficient, using VLIW-style techniques? Not pure VLIW, but still drawing from the concept.

RISC and VLIW have nothing to do with each other, and nothing about the way a modern x86 processor represents instructions internally has anything to do with VLIW. I recommend you spend some time reading Wikipedia or something on VLIW.