Creating SMP board for non-SMP CPUs


tynopik

Diamond Member
Aug 10, 2004
I don't really know what I'm talking about, hence my post here for clarification.

This is my current understanding:

Traditionally the answer is simply that if the CPU doesn't support SMP (such as an i7-2600K), then it's not possible to put more than one in a system.

But what the SMP-enabled CPUs (like the Xeon E5-2600 series) offer is the interconnects and logic for 'glueless' SMP, so the chipset doesn't need to do anything special, which keeps the chipset cheap.

BUT what if you added 'glue' to the chipset? You see this sort of custom interconnect logic in supercomputers and high-end servers, so it's clearly possible.

How difficult would it be, and (more importantly) how much would it cost, to bring a similar capability to the enthusiast market?

Obviously it wouldn't be as efficient as the 'native' solution, but for some applications a 'ghetto' workstation with four i7-2600Ks at roughly $1300 total would be FAR more cost effective. To get comparable power with official SMP CPUs you would need two E5-2687Ws at $1900 a pop, or $3800 total.
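Back-of-the-envelope version of that math in C, just to show where the numbers come from (the prices are the street prices quoted above and the core counts are 4 per i7-2600K and 8 per E5-2687W, so treat it as a rough sketch, not a quote):

Code:
/* Rough cost-per-core math using the figures quoted in this post.
   Prices and the "comparable power" assumption are from the thread,
   not official numbers. */
#include <stdio.h>

int main(void) {
    double ghetto_total = 4 * 325.0;   /* four i7-2600Ks, ~$325 each -> ~$1300 */
    double xeon_total   = 2 * 1900.0;  /* two E5-2687Ws at $1900 a pop */

    int ghetto_cores = 4 * 4;          /* 4 chips x 4 cores */
    int xeon_cores   = 2 * 8;          /* 2 chips x 8 cores */

    printf("ghetto box: $%.0f for %d cores (~$%.0f/core)\n",
           ghetto_total, ghetto_cores, ghetto_total / ghetto_cores);
    printf("dual Xeon:  $%.0f for %d cores (~$%.0f/core)\n",
           xeon_total, xeon_cores, xeon_total / xeon_cores);
    return 0;
}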
 

Modelworks

Lifer
Feb 22, 2007
What you are talking about is cluster computing: a group of processors working together on a common task. Whether they are all on the same board or in different cases, the concept is the same. You can't just put the extra CPU on a board and have the software work; the software has to be aware that each CPU is separate and split the task among them through some sort of scheduling.
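To make that concrete, here is a minimal sketch of the kind of explicit work-splitting I mean. Whether the workers are threads on one box or processes on separate nodes, the program itself has to chop up the job; the pthreads version below is just the simplest way to show it and has nothing to do with any particular board:

Code:
/* The program itself splits the task into chunks and hands one chunk
   to each worker. Generic illustration only. */
#include <pthread.h>
#include <stdio.h>

#define N        1000000
#define WORKERS  4

static double data[N];
static double partial[WORKERS];

static void *worker(void *arg) {
    long id = (long)arg;
    long chunk = N / WORKERS;
    long start = id * chunk;
    long end   = (id == WORKERS - 1) ? N : start + chunk;
    double sum = 0.0;
    for (long i = start; i < end; i++)
        sum += data[i];
    partial[id] = sum;          /* each worker writes only its own slot */
    return NULL;
}

int main(void) {
    pthread_t t[WORKERS];
    for (long i = 0; i < N; i++) data[i] = 1.0;

    for (long i = 0; i < WORKERS; i++)
        pthread_create(&t[i], NULL, worker, (void *)i);

    double total = 0.0;
    for (long i = 0; i < WORKERS; i++) {
        pthread_join(t[i], NULL);
        total += partial[i];
    }
    printf("sum = %f\n", total);   /* should print sum = 1000000.000000 */
    return 0;
}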
 

tynopik

Diamond Member
Aug 10, 2004
What you are talking about is cluster computing: a group of processors working together on a common task. Whether they are all on the same board or in different cases, the concept is the same. You can't just put the extra CPU on a board and have the software work; the software has to be aware that each CPU is separate and split the task among them through some sort of scheduling.

No, I'm talking about actually gluing them together into an SMP system

Even if you ultimately had to create your own fake '16-core' cpu that would then manage farming out the threads to actual processors, there should be some way to do it.
 

Modelworks

Lifer
Feb 22, 2007
No, I'm talking about actually gluing them together into an SMP system

Even if you ultimately had to create your own fake '16-core' cpu that would then manage farming out the threads to actual processors, there should be some way to do it.


It is easily doable but you still have to have some sort of scheduler in the programming to utilize the cores. There isn't a one-size-fits-all piece of logic that can do that for all tasks. You can use an SoC to do the scheduling, but it wouldn't be optimal for every task, and by the time you integrate the SoC into the bus system your cost advantage is gone. Multiple single-core CPUs on the same board were done years ago with the Pentium 1, but it loses its cost effectiveness when you start scaling up.
Here is a board I once used that does it:
http://www.tyan.com/archive/products/html/tiger133.html


In your cost estimate you neglect the cost of the extra board layers and the beefier power supply components needed to power it all.
 

tynopik

Diamond Member
Aug 10, 2004
It is easily doable but you still have to have some sort of scheduler in the programming to utilize the cores.

Once you got the OS (Windows/Linux) to recognize the cores, wouldn't it take care of that?
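Something like this is what I'm picturing (a Linux-flavoured sketch; sysconf and sched_getcpu are glibc/Linux calls, and I'm assuming the kernel already enumerates all the cores). The program never decides which core does what; it just spawns threads and the OS scheduler spreads them out:

Code:
/* Sketch: ask the OS how many cores it sees, spawn that many threads,
   and let the kernel scheduler place them. Linux/glibc specific. */
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdio.h>
#include <unistd.h>

static void *report(void *arg) {
    (void)arg;
    /* Each thread just asks which core the scheduler put it on. */
    printf("thread running on CPU %d\n", sched_getcpu());
    return NULL;
}

int main(void) {
    long n = sysconf(_SC_NPROCESSORS_ONLN);  /* cores the OS knows about */
    pthread_t t[64];
    if (n < 1)  n = 1;
    if (n > 64) n = 64;

    for (long i = 0; i < n; i++)
        pthread_create(&t[i], NULL, report, NULL);
    for (long i = 0; i < n; i++)
        pthread_join(t[i], NULL);
    return 0;
}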

In your cost estimate you neglect the cost of the extra board layers and the beefier power supply components needed to power it all.

Yes, I was thinking that but forgot to explicitly write it down

With the monstrous prices that high-end Intel CPUs are going for, you can charge quite a premium for the board and still come out ahead.
 

Mark R

Diamond Member
Oct 9, 1999
I don't really know what I'm talking about, hence my post here for clarification.

This is my current understanding:

Traditionally the answer is simply that if the CPU doesn't support SMP (such as an i7-2600K), then it's not possible to put more than one in a system.

It's a bit more than just interconnections and external glue. There are internal functions that SMP-enabled CPUs have, whose job is to ensure that all the CPUs operate in a consistent fashion.

For example, if a CPU requests data from memory, it will first check its internal caches, and if the data isn't there, request it from RAM.

In an SMP system, there has to be a way for a CPU which updates data in RAM to broadcast that fact to the other CPUs, so that they know the data in their caches is outdated and not to be trusted.

On the old FSB-type CPUs this was easy: all the CPUs connected to the same bus, and when a CPU detected another CPU transmitting data into RAM via the bus, its cache could see the new data going past and capture it. As this was relatively simple, it was included as a core feature of most CPUs that had SMP versions, with the result that you could run things like Celeron CPUs in SMP.
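A toy sketch of that snooping behaviour, just to illustrate the idea; it isn't anything like a real coherency protocol (no MESI states, no RAM model):

Code:
/* Every cache watches writes on the shared bus. If it holds a copy of
   that address, it captures the new data going past. Illustration only. */
#include <stdio.h>
#include <string.h>

#define NCPUS  2
#define LINES  4

struct cache_line { unsigned addr; int valid; int data; };
static struct cache_line cache[NCPUS][LINES];

/* One CPU drives a write onto the bus; every other cache snoops it. */
static void bus_write(int writer, unsigned addr, int data) {
    for (int cpu = 0; cpu < NCPUS; cpu++) {
        if (cpu == writer) continue;
        for (int l = 0; l < LINES; l++) {
            if (cache[cpu][l].valid && cache[cpu][l].addr == addr) {
                cache[cpu][l].data = data;   /* capture the new value */
                printf("CPU%d snooped write to 0x%x, updated its copy\n",
                       cpu, addr);
            }
        }
    }
    /* ...the write also goes on to RAM here... */
}

int main(void) {
    memset(cache, 0, sizeof cache);
    cache[1][0] = (struct cache_line){ .addr = 0x1000, .valid = 1, .data = 5 };
    bus_write(0, 0x1000, 7);   /* CPU0 writes; CPU1's stale copy gets fixed */
    printf("CPU1 now sees %d\n", cache[1][0].data);
    return 0;
}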

Things are a bit trickier with modern CPUs, which don't have an FSB and integrate multiple core features. The 2500K can only communicate with RAM connected directly to its own RAM pins. A Xeon E5, however, can communicate with RAM on its RAM pins, or it can forward a request via its interconnect bus to another CPU, and that CPU will access the RAM on its behalf.
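Sketch of that routing decision, with made-up address ranges and a stand-in function for the interconnect request (so purely illustrative):

Code:
/* Each socket owns a slice of the address space; anything outside its
   slice is forwarded over the socket-to-socket link to the owning CPU. */
#include <stdio.h>

#define SLICE  0x40000000u   /* pretend each socket owns 1 GB */

static int read_remote(int owner, unsigned long addr) {
    /* stands in for a request to the other socket's memory controller */
    printf("forwarding 0x%lx to socket %d over the interconnect\n", addr, owner);
    return 0;
}

static int read_mem(int this_socket, unsigned long addr) {
    int owner = (int)(addr / SLICE);
    if (owner == this_socket) {
        printf("0x%lx is local, use own memory controller\n", addr);
        return 0;
    }
    return read_remote(owner, addr);
}

int main(void) {
    read_mem(0, 0x00001000);   /* local to socket 0 */
    read_mem(0, 0x50000000);   /* owned by socket 1, goes over the link */
    return 0;
}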

Even if you could somehow build a "north bridge" which emulated RAM, then how would it pass synchronisation messages between CPUs that are not designed to accept such messages?
 

tynopik

Diamond Member
Aug 10, 2004
Even if you could somehow build a "north bridge" which emulated RAM, then how would it pass synchronisation messages between CPUs that are not designed to accept such messages?

Don't they support other devices that write directly to RAM? (DMA = direct memory access right?)

So couldn't it just use whatever method it uses to synchronize with hardware devices that write to memory?
 

GammaLaser

Member
May 31, 2011
Support for I/O devices is typically done using methods like marking those memory regions as uncacheable. This would be a horrible idea to use for shared memory between two CPUs, though.

You might be able to build a system based around message passing; the problem is that legacy software written for coherent systems will not work.

Like Mark says, the big issue is that the client CPUs lack any interconnect which can be used to maintain cache coherency.
 

tynopik

Diamond Member
Aug 10, 2004
Support for I/O devices is typically done using methods like marking those memory regions as uncacheable.

Can that be done on the fly? So only regions that have been changed by one CPU are marked uncacheable on the other, and after long enough without that block being modified it's marked cacheable again? Considering that CPUs normally aren't accessing the exact same memory, if you can keep these 'uncacheable' areas reasonably small and short-lived, the performance impact shouldn't be too horrendous.

I guess you would have to disable write caching on the CPUs though, which would hurt.
 

GammaLaser

Member
May 31, 2011
Cache type can be set as a page attribute on x86, so it would have a minimum of 4KB granularity. Privileged code can change this on the fly, although I am skeptical that it can be done while maintaining correctness and still getting acceptable performance.
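Rough sketch of what that granularity means in practice, using a made-up per-page table rather than real PAT/MTRR manipulation; note how an 8-byte shared value drags a whole 4KB page along with it:

Code:
/* Toy per-page "cacheable" table, not real x86 page-attribute code.
   Marking even one shared byte uncacheable takes the whole page with it. */
#include <stdio.h>

#define PAGE_SHIFT  12          /* 4KB pages */
#define NPAGES      1024
static int cacheable[NPAGES];   /* 1 = cacheable, 0 = uncacheable */

static void mark_uncacheable(unsigned long addr, unsigned long len) {
    unsigned long first = addr >> PAGE_SHIFT;
    unsigned long last  = (addr + len - 1) >> PAGE_SHIFT;
    for (unsigned long p = first; p <= last && p < NPAGES; p++)
        cacheable[p] = 0;
}

int main(void) {
    for (int i = 0; i < NPAGES; i++) cacheable[i] = 1;

    /* the other CPU wrote 8 bytes at 0x3004; this CPU must stop caching them */
    mark_uncacheable(0x3004, 8);

    printf("page 3 cacheable? %d (collateral damage: the other 4088 bytes)\n",
           cacheable[3]);
    return 0;
}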

The other issue is atomic memory operations. Some instructions require the ability to do a 'read-modify-write' of a single memory location as a single step, and other processors should not be able to utilize that memory location while the atomic operation is in progress. This is important for implementing locks and other synchronization mechanisms because if multiple CPUs attempt to acquire a lock, only one of them should be able to get it, even if they try to acquire it at the same time.
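For reference, this is the kind of thing those read-modify-write instructions are for: a plain C11 spinlock built on compare-and-swap (generic textbook pattern, nothing specific to the glue idea being discussed):

Code:
/* A spinlock where only one CPU can win the lock even if several try at
   once, because the compare-and-swap is a single atomic step. */
#include <stdatomic.h>
#include <stdio.h>

static atomic_int lock = 0;   /* 0 = free, 1 = held */

static void acquire(void) {
    int expected = 0;
    /* compare-and-swap: succeeds for exactly one contender at a time */
    while (!atomic_compare_exchange_weak(&lock, &expected, 1))
        expected = 0;          /* lost the race, reset and spin again */
}

static void release(void) {
    atomic_store(&lock, 0);
}

int main(void) {
    acquire();
    printf("got the lock\n");   /* critical section */
    release();
    return 0;
}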

In our case, we do not know whether a read issued to memory is part of an atomic access that will later be followed by a write, in which case the other processors' copies should be invalidated on the read, and they must also be blocked from using the memory location until the write completes.

So I am also doubtful that one can properly implement atomic accesses in this case, when there is no coherent communication between the sockets.
 

tynopik

Diamond Member
Aug 10, 2004
So I am also doubtful that one can properly implement atomic accesses in this case, when there is no coherent communication between the sockets.

It sounds like there needs to be a software component too. Something that sits above the OS and intercepts certain calls. Not something that virtualizes the OS, but something that runs like a rootkit, or like that XP WPA crack that intercepted calls to ID the motherboard/BIOS.
 

SecurityTheatre

Senior member
Aug 14, 2011
It sounds like there needs to be a software component too. Something that sits above the OS and intercepts certain calls. Not something that virtualizes the OS, but something that runs like a rootkit, or like that XP WPA crack that intercepted calls to ID the motherboard/BIOS.

In traditional SMP systems, the north-bridge and CPUs handled all of the cache and memory coherency issues internally. It really can be handled purely in hardware, but it generally requires support within the CPU and north-bridge, both of which are on-die now.

Even SMP-supporting chipsets back in the day could do nothing if the CPU didn't also have the SMP functions, which are non-trivial to implement.

It has to come from both sides, which is why it's so tricky. It MIGHT be possible to hack something together, but you're going to introduce latency and seriously decrease (if not eliminate entirely) caching efficiency, possibly to the point where you're actually slower when you're done than when you started, except for very specialized multi-threaded, low-bandwidth applications like a small-set Linpack run or something.

Then again, I haven't done any CPU design in over 10 years, so I may be outdated.
 