
multi processor systems?

CountZero

Golden Member
I'm looking for information on the design of multiprocessor systems. I'm interested in systems akin to multiprocessor mobos, not widely distributed systems. I've found some basic information but nothing remotely in depth with regards to what's been tried.

For instance, one problem with a shared memory system would be caching information that was then modified by another processor. How has this problem been solved?

For the curious I am thinking of putting together a multiprocessor system with some cpus in our lab but I know next to nothing about what it takes to make such a system.

Anything that would at least point me in the right direction would be a big help, as I don't seem to be having any luck whatsoever.

Thanks!
 
For the curious I am thinking of putting together a multiprocessor system with some cpus in our lab but I know next to nothing about what it takes to make such a system.
Assuming you're talking about anything from an original Pentium or newer, it's just "plug the CPUs in, turn it on". You don't have to worry about cache coherence - the hardware handles it automagically. If you're still interested in what goes on behind the scenes, you could search google for "cache coherence site:.edu" (without quotes) and look at lecture slides - they tend to have a lot of good info.

The very very short version of how cache coherence is done:
There are two main techniques in use (these are extremely simplified explanations).
1. "Snooping". All CPUs share a bus. Whenever a CPU loads a memory location into its cache, it tells the other CPUs. This way, other CPUs can make sure they aren't doing something with that same location that will cause a problem. Most snoop-based implementations nowadays use a protocol with "MOESI" states (another term to google if you want more detail).
2. Directories. There is a directory somewhere. Whenever a CPU needs a memory location, it asks the directory where to get it. The directory could say "CPU x already has it - ask that CPU for the data", or "nobody is currently using that location" and the CPU will fetch the data from memory and tell the directory that it now is using that location.
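
To make the snooping idea a bit more concrete, here's a rough toy model in Python. It's only a sketch: it uses three states (Modified, Shared, Invalid) instead of the full MOESI set, the `Bus` and `Cache` classes are names I made up for illustration, and real hardware does all of this in silicon rather than software.

```python
# Toy snooping-bus model with three states (MSI), a simplification of MOESI.
# All class and method names are hypothetical, chosen for illustration only.

class Bus:
    def __init__(self, memory):
        self.memory = memory      # dict: address -> value
        self.caches = []

    def attach(self, cache):
        self.caches.append(cache)

    def broadcast(self, sender, kind, addr):
        # Every other cache "snoops" the request and reacts to it.
        for cache in self.caches:
            if cache is not sender:
                cache.snoop(kind, addr)


class Cache:
    def __init__(self, bus):
        self.bus = bus
        self.lines = {}           # address -> [state, value]; state is "M" or "S"
        bus.attach(self)

    def read(self, addr):
        if addr not in self.lines:                  # miss (Invalid state)
            self.bus.broadcast(self, "read", addr)  # a Modified holder writes back
            self.lines[addr] = ["S", self.bus.memory[addr]]
        return self.lines[addr][1]

    def write(self, addr, value):
        # Announce the write so every other copy is invalidated first.
        self.bus.broadcast(self, "write", addr)
        self.lines[addr] = ["M", value]

    def snoop(self, kind, addr):
        line = self.lines.get(addr)
        if line is None:
            return
        if line[0] == "M":
            self.bus.memory[addr] = line[1]         # write dirty data back
        if kind == "write":
            del self.lines[addr]                    # M or S -> Invalid
        else:
            line[0] = "S"                           # M -> S on a remote read


memory = {0x10: 1}
bus = Bus(memory)
cpu0, cpu1 = Cache(bus), Cache(bus)

cpu0.read(0x10)          # cpu0 caches the value 1 (Shared)
cpu1.write(0x10, 42)     # cpu0's copy is invalidated; cpu1 holds it Modified
print(cpu0.read(0x10))   # cpu1 writes back, cpu0 re-reads: prints 42
```

The point of the example is the last three lines: without the broadcast on write, cpu0 would keep returning its old copy of the data.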

Snooping is usually used for smallish systems (<32-64 CPUs) because you need a shared bus, and you run into bandwidth limits when a lot of CPUs access memory at the same time... plus, more CPUs on a bus means the bus is physically longer, which limits its speed. Directory-based systems are usually larger systems, and they don't necessarily have a shared bus.
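
The directory scheme can be sketched as a toy model too. This is a heavily simplified illustration, not how any particular machine does it: the `Directory` and `CPU` classes are hypothetical, only one CPU may hold an address at a time, and modified data is never written back to memory.

```python
# Toy directory: tracks which CPU, if any, currently holds each address.
# Hypothetical names; a single holder per address keeps the sketch short.

class Directory:
    def __init__(self, memory):
        self.memory = memory      # dict: address -> value
        self.owner = {}           # address -> CPU currently holding it

    def request(self, requester, addr):
        holder = self.owner.get(addr)
        if holder is None:
            value = self.memory[addr]       # "nobody is using that location"
            source = "memory"
        else:
            value = holder.give_up(addr)    # "CPU x already has it, ask that CPU"
            source = holder.name
        self.owner[addr] = requester        # record the new holder
        return value, source


class CPU:
    def __init__(self, name, directory):
        self.name = name
        self.directory = directory
        self.cache = {}

    def load(self, addr):
        if addr in self.cache:
            return self.cache[addr], "own cache"
        value, source = self.directory.request(self, addr)
        self.cache[addr] = value
        return value, source

    def store(self, addr, value):
        self.load(addr)           # become the holder first
        self.cache[addr] = value

    def give_up(self, addr):
        # Hand the current value to the requester and drop our copy.
        return self.cache.pop(addr)


directory = Directory({0x20: 7})
a, b = CPU("A", directory), CPU("B", directory)

first = a.load(0x20)      # nobody holds it, so it comes from memory
a.store(0x20, 8)          # A holds the only copy and changes it
second = b.load(0x20)     # the directory points B at A
print(first, second)      # prints (7, 'memory') (8, 'A')
```

Notice there is no broadcast anywhere: each request goes to the directory and at most one other CPU, which is why this style doesn't need a shared bus.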

If you're interested in more detail on one or both, I'd be happy to explain in more detail.

There's a whole other class of multiprocessor machines that don't use shared memory - message passing machines. I'm assuming you're not interested in those.
 
was there ever really much of a problem with cache coherence in single processor systems? i know there were issues with performance on cache misses, but i hadn't heard of a cache miss being a complete miss.. at least not since the 10 page faults i got per day when i was using win98, which obviously was not caused by microsoft's genius idea to integrate a web-browser into the user's shell and consequently "increase" system security.

like ctho said.. the main problem encountered in these kinds of systems is cache coherence. in a fully coherent system, memory should look exactly the same regardless of which cpu is looking at it. another thing you'd have to make sure of is maintaining that state. one way you can reduce the likelihood of losing it is by synchronizing operations on the shared components for the nodes. it's the same idea behind thread synchronization.. but it might not be the best, easiest, or the fastest solution.
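
for anyone who hasn't seen thread synchronization before, here's a minimal sketch in Python of what that idea looks like in software. it's an analogy only: the lock stops threads from stepping on each other's updates to a shared counter, which is a different layer from what the cache hardware does. the `worker` function and the iteration counts are made up for the example.

```python
import threading

counter = 0                 # the shared component
lock = threading.Lock()     # guards every update to counter


def worker(iterations):
    global counter
    for _ in range(iterations):
        # Only one thread at a time may do the read-modify-write.
        with lock:
            counter += 1


threads = [threading.Thread(target=worker, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # prints 40000, since no update can be lost while the lock is held
```

the cost is the same one mentioned above: every update waits its turn, so it's safe but not necessarily fast.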

the easiest, cheapest, and most effective mechanism that you could utilize, if possible, would probably be disabling the cpu from caching data.. i don't know if the instruction cache would get disabled along with it though. if it does, then 1 proc with cache could end up performing better than 4 without.
 
Originally posted by: dmens
This is the standard text used in many courses, including the one I went through. Pretty good book.

http://www.amazon.com/gp/product/155860...nce&n=283155&%5Fencoding=UTF8&v=glance

Shared memory is king.

We used that book too. It's really wordy though, so it can be slow reading (not slow as in pages-per-minute, slow as in information-per-minute).

was there ever really much of a problem with cache coherence in single processor systems?
Not really. The only thing you have to watch out for is a device doing a DMA transfer (to memory) - if you cached those locations, you need to invalidate those entries. You could also just do the DMA to an uncacheable memory area.
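
Here's a toy illustration of that DMA problem in Python. It assumes a cache that is not told about the device's writes, and the `CachedMemory` class and its methods are made up for the example, so treat it as a sketch of the idea rather than how any real driver or chipset behaves.

```python
# Toy model of a single CPU cache in front of RAM, plus a DMA device.
# Assumption: the cache does not see writes that the device makes to RAM.

class CachedMemory:
    def __init__(self, size):
        self.ram = [0] * size
        self.cache = {}           # address -> cached value

    def cpu_read(self, addr):
        if addr not in self.cache:
            self.cache[addr] = self.ram[addr]   # fill on a miss
        return self.cache[addr]

    def dma_write(self, addr, data):
        # The device writes straight to RAM, bypassing the cache.
        for offset, value in enumerate(data):
            self.ram[addr + offset] = value

    def invalidate(self, addr, length):
        # Drop any cached copies so the next read goes back to RAM.
        for a in range(addr, addr + length):
            self.cache.pop(a, None)


mem = CachedMemory(16)
mem.cpu_read(4)               # CPU caches the old value, 0
mem.dma_write(4, [99])        # device overwrites RAM behind the cache's back
stale = mem.cpu_read(4)       # still the cached 0
mem.invalidate(4, 1)
fresh = mem.cpu_read(4)       # re-fetched from RAM
print(stale, fresh)           # prints 0 99
```

Doing the DMA to an uncacheable area corresponds to never putting those addresses in `cache` in the first place.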
 
Thanks for all the great responses!

I am definitely interested in the behind the scenes information. When you say pentium or newer do it automagically I take it you mean the cpu itself has the SM system built in?

I read about the snooping system and I think I have the general idea. Haven't read about the directory system but I'm more interested in smaller number of cpus anyways.

I am interested in message passing machines but mostly as a contrast to SM machines as my understanding is that generally SM is superior.
 
Originally posted by: CountZero
When you say pentium or newer do it automagically I take it you mean the cpu itself has the SM system built in?
Yes. Most recent processors have it built in.
 