Let's Design A Super Modern Computing Platform

GWestphal · Nov 21, 2013

Just a thought project, how to develop a super modern computer platform. Focus on the best or comparable open standards.

Hardware:+++++++++++++++++++++++++++++++++++++++++++

PSU: 12V/24V, deliver 12V/24V only to the board and buck it down to the appropriate voltages for peripherals, allowing for fewer, smaller wires for an more easily routeable connector.

Motherboard: mostly VRMs and sockets and slots since most function is on CPU/GPU/APU, but it will have the WiFi/BlueTooth/LTE chip, reversible lighting like connectors for SATA Express that carry power, power comes from the mobo instead of the PSU, just one cable per device that does signal/power,

Buses: HyperTransport 4.0 (though lets discuss the pros and cons of DMI 2.0, QPI, and PCI-Express)

GPU: The ideal would be to somehow combine the pros of nvidia and amd here. Maybe only use double precision. I'm not a GPU expert so if someone could enlighten me why single precision is primarily used that would be helpful.

Connectivity: Thunderbolt 2, USB 3.1, WiFi AC, Bluetooth 4.0, 10GB ethernet, mini lightning style connectors.

Audio: No audio specific connectors, have speaker/receivers transfer sound over USB.

CPU: based on ARMv8, 16 core, out of order, superscalar execution, mostly 64 bit registers but a few 128, 256 and 512 bit registers for special purpose, shared large 128 MB cache (some set aside for audio buffer, video buffer), on die GPU with a couple stream processors for audio processing that shares the cache, on die Southbridge and Northbridge, H.264 encode/decode ASIC, accelerometer/gyro, certain cores for running kernel/servers that can use cache for hardware accelerated message passing between servers, having a built-in FPGA would be nice for adding custom hardware acceleration/flexibility.

RAM: 2 DIMMs for SRAM Tier one storage, 8 DIMMs for GDDR5 DRAM (new connector style with reduced pins, perhaps vertically oriented so they take up less room on the board.)

Storage: SATA Express, SSDs, lets assume price is no option so no spinning disks are allowed here.

Software:+++++++++++++++++++++++++++++++++++++++++++++++++

Kernel: Exo or Micro Kernel

APIs: OpenGL 4.4, OpenCL 2.0, OpenAL, H.264, WebGL,

Languages: for machine level (C++ or C/RPython), Python, HTML5 <--- for basic kernel/OS, userland would have support for others

Filesystems: ZFS

Ben90 · Nov 21, 2013

Mine would take everything you said and double it.

silverpig · Nov 21, 2013

You're not designing anything, you're just listing specs.

What do you mean "languages"? Is that list exclusive? It won't run java? That thing won't be able to run the internet.

velis · Nov 21, 2013

Wow, this is a pretty ambiguous spec for a computer.
You list HW support only needed for handheld devices (e.g. gyro, accelerometer), on the other 24V and desktop-side transports and stuff.
So, which is it?

Not to say that one spec can't work for both, my current phone is more powerful than my one-before-previous gaming PC build was, but desktop-side components tend to take up space.

Proceeding to 12/24V: most of chips today are running on voltages 3.3V and below. Your PSU spec would just make the MB components heat up considerably while converting all those voltages. Additionally, it doesn't really matter if a peripheral sucks power from MB or from PSU, the cable has to be placed somewhere. The PSU is just a much better place to do all those voltage conversions and can do them a just as (if not more) efficiently as MB. If you are bothered by length of cables, maybe a better, more central-to-the-case position can be found for the PSU.

Yout MB design effectively transfers PSU functionality to the MB with addition that the MB would be a "convenient" router for bus traces for expansion slots. By this logic it wouldn't matter much if you just place the bus connectors to the PSU PCB, but that has some issues since the PCS has some currents hoing through it which might be bad for bus digital signals.

No comment about busses, pick something that's up-to-date and it's fine.
Connectivity: why limit yourself to a few technologies only? Will this design not support expansion slots / 3rd party add-in cards? Plus, you're listing competing technologies there, pick one and be done with it.

CPU: you are listing specs for a modern ARM SOC used today, but you up number of cores by a generation or so and add some caches from current x86 designs - also increased by a generation or so. This is not my idea of a high performance PC of today.

SRAM on a DIMM is not faster than DRAM on DIMM. Also, current CPU caches and prefetchers pretty much eliminate the RAM bottlenecks. Your huge on-die cache should be more than sufficient for 99.99% (pulled out of a dark place) hit rate.

No comment about storage either. I'd still like at least add-in cards to allow for my 8x3TB array which I won't soon see in the form of SSDs.

Software:
So lacking I really shouldn't comment on this, but I'll try to play nice:
Microkernels have their advantages as do monolithic ones. Since you place a lot of emphasis on multimedia APIs, you probably want a slightly fatter kernel than the leanest possible one.

Multimedia and HPC APIs aren't the only APIs needed for a general-purpouse PC to run. I won't even begin to list things that are required BEFORE the listed ones.

Languages: if you have a CPU, you can effectively have them all, why limit yourself so?

ZFS is not the GODs FS implementation. It has advantages as well as drawbacks. I would like my OS to give me option to choose the best FS for my needs.

Effectively what you're asking for in the software section is Windows / Linux with the special bits that bother you programmed in. I'm cool with that, but you should consider that an OS is a lot larger that what you list.

I suggest you buy yourself a current flagship phone. It comes pretty darn close to what you list here and does it more efficiently. Since you want Python, I'd suggest an Android flagship since that's about the only mobile OS that can run Python. I'm afraid it won't be 16 core though.

sm625 · Nov 21, 2013

You have to have a much faster storage subsystem. See A split NAND/DRAM bus. Using FDDR, you can share the same exact bus for DDR4 and NAND, with NAND and DDR dies stacked and multiplexed into the same package. And when NVRAM becomes cost effective, you simply replace that package with the NVRAM. We all know NVRAM is coming but computers today arent even ready for it at all. The operating system needs to be completely redesigned to avoid copying data from storage, because storage will be just as fast as main memory. iOS will probably be the first consumer operating system that makes use of truly unified memory architecture, but something like that should be seen first in a supercomputer memory architecture.

SecurityTheatre · Nov 21, 2013

Runs for 10 hours on battery, fits in my pocket.

...

GWestphal · Nov 21, 2013

That list was just a jumping off point. Those are just the first bare level thoughts. The point isn't to make it compatible with legacy, it's to make the most modern, legacy free computer that by default uses modern and open standards. Think of it as EPCOT, and the world of tomorrow. Showing how streamlined, efficient and open something could be.

If we're talking about something that would be commercially viable, then you'd need support for many additional languages etc.

GWestphal · Nov 21, 2013

velis said:
Wow, this is a pretty ambiguous spec for a computer.
You list HW support only needed for handheld devices (e.g. gyro, accelerometer), on the other 24V and desktop-side transports and stuff.
So, which is it?

Yes for a desktop the gyro or accelerometer wouldn't be useful but for laptops and tablets it most definitely would be.

Not to say that one spec can't work for both, my current phone is more powerful than my one-before-previous gaming PC build was, but desktop-side components tend to take up space.

Proceeding to 12/24V: most of chips today are running on voltages 3.3V and below. Your PSU spec would just make the MB components heat up considerably while converting all those voltages. Additionally, it doesn't really matter if a peripheral sucks power from MB or from PSU, the cable has to be placed somewhere. The PSU is just a much better place to do all those voltage conversions and can do them a just as (if not more) efficiently as MB. If you are bothered by length of cables, maybe a better, more central-to-the-case position can be found for the PSU.

The advantage here is that getting power from the motherboard enables the use of one cable solutions for both data and power, while powering from the PSU requires two.

The voltages here aren't set in stone, (likewise the voltage used in chips are often chosen because that's what is available, we use 12V and 5V because that's what the ATX stand power supply provides, if different voltages were provided, chips would be made to use those) but you have the buck down voltages somewhere, does it really matter if you do it on the PSU or the MOBO? The only difference is where you need the buck down converter. The system as a whole will still generate the same thermal waste. Plus if you deliver one higher voltage to the system you can use fewer thinner wiring.

Yout MB design effectively transfers PSU functionality to the MB with addition that the MB would be a "convenient" router for bus traces for expansion slots. By this logic it wouldn't matter much if you just place the bus connectors to the PSU PCB, but that has some issues since the PCS has some currents hoing through it which might be bad for bus digital signals.

Not sure what you were trying to say about buses here, but think that as more and more features are offloaded on to CPU/CPU/APU, the motherboard is really serving less and less of a purpose, so why not utilize some of that now empty space for PSU duties and enable fewer cables etc.

No comment about busses, pick something that's up-to-date and it's fine.
Connectivity: why limit yourself to a few technologies only? Will this design not support expansion slots / 3rd party add-in cards? Plus, you're listing competing technologies there, pick one and be done with it.

The point was to discuss the choices. But HyperTranport is open which is nice. Not sure if one is technically superior in someway that might matter. HyperTransport supports HTX for expansion slots.

CPU: you are listing specs for a modern ARM SOC used today, but you up number of cores by a generation or so and add some caches from current x86 designs - also increased by a generation or so. This is not my idea of a high performance PC of today.

I personally think ARM is a better CPU architecture than x86. It is much easier to make it low power if needed and is much simpler to fabricate. The key here would be designing and integration of the CPU/CPU/APU with how the kernel will run in this fictive system. Having many cores that have a shared super fast cache lends itself perfectly to a microkernel type design with servers and message passing.

SRAM on a DIMM is not faster than DRAM on DIMM. Also, current CPU caches and prefetchers pretty much eliminate the RAM bottlenecks. Your huge on-die cache should be more than sufficient for 99.99% (pulled out of a dark place) hit rate.

I don't know much about off chip implementations of SRAM, but it still might be valuable from a power standpoint, as it does not need to be refreshed, which should make it substantially lower power. But I'm certainly not a memory expert. I'm just trying to piece together what I have some vague knowledge on and want feedback from some others here that may know more about a particular topic.

No comment about storage either. I'd still like at least add-in cards to allow for my 8x3TB array which I won't soon see in the form of SSDs. That RAID function could be handled by ZFS with a dedicated core on the CPU for filesystem I/O. That would make a built in RAID for you.

Software:
So lacking I really shouldn't comment on this, but I'll try to play nice:
Microkernels have their advantages as do monolithic ones. Since you place a lot of emphasis on multimedia APIs, you probably want a slightly fatter kernel than the leanest possible one.

Again this is just a basic framework I'm laying out, it's not inclusive or exclusive.

Multimedia and HPC APIs aren't the only APIs needed for a general-purpouse PC to run. I won't even begin to list things that are required BEFORE the listed ones.

Yes of course there are many subsystems I don't have listed that are required.

Languages: if you have a CPU, you can effectively have them all, why limit yourself so?

That's just the basic support which the OS would be written in. Other languages would be supported in a viable, competitive type of product. You need something you can bootstrap with, C++ but I suppose you could use CPython or RPython for the cases where you need bare metal code. My choice of python is because of the readability, shorter length of code, and relatively well thought out nature of the language, with addition to having high performance options available.

ZFS is not the GODs FS implementation. It has advantages as well as drawbacks. I would like my OS to give me option to choose the best FS for my needs.

ZFS is easily the best currently available file system by a gigantic margin. ReFS and BTRFS are certainly big leaps forward, but they are not even close to feature parity with what ZFS can provide. The downside of course is that Oracle acquired it. With a little hardware acceleration from a dedicated core on the CPU, you'd have built-in better than RAID, bulletproof functionality.

Effectively what you're asking for in the software section is Windows / Linux with the special bits that bother you programmed in. I'm cool with that, but you should consider that an OS is a lot larger that what you list.

I suggest you buy yourself a current flagship phone. It comes pretty darn close to what you list here and does it more efficiently. Since you want Python, I'd suggest an Android flagship since that's about the only mobile OS that can run Python. I'm afraid it won't be 16 core though.

See Above

SecurityTheatre · Nov 21, 2013

GWestphal said:
See Above

Very hard to read without breaking up the quote....

Is this intended to be practical?

If so, there are tradeoffs to consider.

You can decide to go massively parallel on the CPU, or you can go long pipelines, with exotic branch prediction, speculative execution, OOP, massive cache, integrated GPU, etc, but you can't practically do both, so pick one.

Second, doing power conversion and power distribution on the Motherboard directly impacts the potential performance of chips. The noise from the power signals, as well as the heat from the converters would impact your headroom on the CPU and other components. This is a big reason why they're separated.

This "dream vs reality" is why I posted about 10 hour batteries above. While you're dreaming, might as well make it the size of an iPhone.

TuxDave · Nov 21, 2013

Ben90 said:
Mine would take everything you said and double it.

I double THAT. Ha!

Edit: I guess in Highly Technical I should to be productive. Take your homogeneous CPU and GPU instead make it heterogeneous with accelerators.

GWestphal · Nov 21, 2013

Would electrical noise be a problem is the buckdowns were on one side of the mobo and the cpu on the otherside also not sharing any traces outside of the correct input voltages from the buckdowns?

Bucking down on the mobo shouldn't change the thermal envelope by much should it? I mean the PSU you're going from 115AC to 12, 5, 3.3 VDC vs doing 115VAC to 12VDC to 1,2,5V, bucking down at those ranges should be pretty darn efficient, so would it really be a thermal concern? I would imagine if nothing else toss a heat sink on it.

I also envision the combined socket with CPU/GPU being a bit larger as well to accommodate 16 cpu cores @ 2.5 GHz and a full discrete GPU (~2,000 cores). Obviously will need some good hardware power management for powering up and down cpus and gpus, power islands and all that. Being on a 14nm process with 3D transistors will help some too I would imagine.

GWestphal · Nov 21, 2013

@Tux, what do you mean? isn't an on die GPU by definition heterogeneous? What do you mean by heterogeneous with accelerators that is different from what is outlined, maybe I'm missing some nuance here.

TuxDave · Nov 21, 2013

GWestphal said:
@Tux, what do you mean? isn't an on die GPU by definition heterogeneous? What do you mean by heterogeneous with accelerators that is different from what is outlined, maybe I'm missing some nuance here.

lol, ok I missed that in your original dream machine. You're still missing a big.little configuration for your general purpose part. Unless we're just assuming everything consumes roughly little to no power.

GWestphal · Nov 21, 2013

big.little might work, have the cores dedicated to certain kernel functions be little cores. Though for a desktop I'm not sure the big.little would be all that useful unless it significantly reduced complexity or power consumption.

SecurityTheatre · Nov 22, 2013

GWestphal said:
Would electrical noise be a problem is the buckdowns were on one side of the mobo and the cpu on the otherside also not sharing any traces outside of the correct input voltages from the buckdowns?

Bucking down on the mobo shouldn't change the thermal envelope by much should it? I mean the PSU you're going from 115AC to 12, 5, 3.3 VDC vs doing 115VAC to 12VDC to 1,2,5V, bucking down at those ranges should be pretty darn efficient, so would it really be a thermal concern? I would imagine if nothing else toss a heat sink on it.

I don't know enough about the PCB layup to answer these with confidence. I just know that in two of the four PCBs I've designed in the last 20 years were completely jacked up by poor power regulation and brushless motor attached to the power lines feeding junk back in the power bus and causing all sorts of interference issues.

I also envision the combined socket with CPU/GPU being a bit larger as well to accommodate 16 cpu cores @ 2.5 GHz and a full discrete GPU (~2,000 cores). Obviously will need some good hardware power management for powering up and down cpus and gpus, power islands and all that. Being on a 14nm process with 3D transistors will help some too I would imagine.

The size of the socket has really nothing to do with the feasibility of putting down bigger chips. The die size is the primary cost factor. The die size directly affects both number of dies per wafer (cost) and the yield (cost) as bigger chips have a much lower yield.

Thermals issues affect maximum clock speeds. Many chips, given better cooling, could be made higher speed with more aggressive pipelining, or just simply higher voltage, but the cost to latency and more importantly, TDP, are high.

Basically, with air cooling, you have a pretty hard limit of something like 125W of thermal energy, unless you want to anchor 30 pounds of copper in a wind tunnel that sounds like a jet engine. Sure, advances in process and even things like 3d transistors give a little more leeway and maybe a modern architecture would be practical at 16 cores in a few years, but the limitations of electron tunneling and a few other issues have really become a serious problem.

I do like the idea of moving to an array of special purpose parallel units (like GPUs) combined with 2-4 single purpose multipurpose exotic processing units. I think this is the way to go. Ultimately, the GPU will be seen to be almost like a coprocessor and will begin to become more and more critical to normal usage. That sort of massive, but speciality computation is good stuff and I think my dream CPU architecture would use something like 2-4 "fancy" cores like a modern i7, and then have an integrated "coprocessor" that was designed to have, say 100 matrix calculators and a few dozen of a couple other types (similar to a GPU, but with some other capabilities).

Then, you can use the GPU for graphics in games or rendering and the physics engine can use the dedicated coprocessor GPU-thing, or even for things like raytracing or other similarly parallalizable operations.

I'm also not a fan of the idea of flattening RAM and storage. The tiered system of L1, L2, RAM, disk introduces all sorts of complexity, but that tradeoff is worth it, both in cost and just raw speed capability.

Nonvolatile systems will always be slower than volatile ones, I think, because of the simple physics of non volatile devices and the relative simplicity of volatlie ones.

In addition, by reducing the distance between components (L1 within the pipeline, L2 at the perimeter of the chip, RAM in very nearby sockets, storage in some discreet unit that can be picked up and moved around... you increase speed and decrease cost.

The time it takes to travel down a cable to the drive (even at the speed of light, assuming perfect signalling and perfect processing) is slower than the speed of modern internal caches.

At 5Ghz, light only travels 30mm roundtrip per cycle. You CANNOT have something operating at that speed further than 30mm away (more practically, given real materials, about 5mm away), ruling out some people's stated dream (not here, obviously) of CPU-speed external storage transmission.

Anwyay, I'm just rambling now. :-D Where was I?

meh.

bryanl · Nov 22, 2013

Large 8000 x 8000 display to eliminate the need to scroll across screens, yet easily fits in a pocket.

Includes a search engine that actually works.

Edible.

NTMBK · Nov 22, 2013

It needs to have the solution.

SecurityTheatre · Nov 22, 2013

bryanl said:
Large 8000 x 8000 display to eliminate the need to scroll across screens, yet easily fits in a pocket.

Includes a search engine that actually works.

Edible.

:awe:

I bow to our new edible overlords.

GWestphal · Nov 22, 2013

Let's stay on topic, if you you don't have anything constructive to add, then go to another thread folks, this isn't off topic, no nef'ing allowed here.

bryanl · Nov 22, 2013

GWestphal said:
Let's stay on topic, if you you don't have anything constructive to add, then go to another thread folks, this isn't off topic, no nef'ing allowed here.

I was completely serious regarding 2 of the 3 points. Screens are still far inferior in resolution to paper, making it necessary to display characters much larger to see the same level of detail, but that requires scrolling through the page rather than seeing it in its entirety. Search engines are still much inferior to expert humans.

Zodiark1593 · Nov 23, 2013

GWestphal said:
Let's stay on topic, if you you don't have anything constructive to add, then go to another thread folks, this isn't off topic, no nef'ing allowed here.

My issue with ARM, none of their consumer-solutions even come close to a Core i7 in both single threaded and multi-threaded performance, let alone when you factor in overclocking. That and lack of slot for GPU upgrades. Granted, my needs are vastly different than 98% or consumers out there, which brings me to my next point.

What exactly is the goal of your "computing platform"? Are you targeting the average user, high performance user, business user, etc? The CPU spec you're looking at says one thing (low power, good enough for average joe), but the gpu spec and power consumption says another (hardcore gamer). And that's not even getting into form factor, which seems to be missing entirely and is a rather important detail to any Compute Platform standard.

Red Squirrel · Nov 23, 2013

My idea of an open platform is a mainboard that is literally just a very high pin count BUS board, typically they would be 19" long and a bit less than 2U high. They'd be maybe like 8" long A typical machine would be a 2U short case where the front is all slots and the back is the mainboard, and there would be some fans in the back of the mainboard. Now the cards would consist of processor, ram, hard drives as well as other peripherals. You could mix and match anything you want but you'd need processor and ram minimum to get a machine that can POST... Obviously a video card too to see stuff, but you'd be able to remove it if you want. Everything would also be hot swap. Now some stuff, less so like the cpu and ram but there could be a standard software instruction set to handle it by putting a component offline first. Ex: a multi cpu system you could take one cpu offline, then pull the card out to put a bigger one in. Same with ram. Cards would typically have at minimum a LED indicator light to indicate if it's online or not. Others, like hard drive would have activity LED and failure LED, on top of the standard online one.

You can add as many processor cards or ram cards as you want. Everything would just work together. So if you are building a supercomputer you perhaps want to fill every slot with processors. Processor cards would have sockets for the actual cpu chips. Some would be built in, some would not. Small form factor machines would perhaps have more stuff on a single board. Not all of them would be 19" wide. perhaps some smaller ones would only be 3-4 slots. They would vary. The board would also have a bus bar for various DC voltages but everything would typically use -48vdc. Typically 1 card would be a PSU that takes AC and converts to -48v, but in data centre/telco setups you'd just feed the DC directly through a pass through card that the DC cables connect to. Smater ones would perhaps have fuses/voltage regulators to ensure the voltage is right.

Lot of boards would also be designed in a way where you can have cards on the front and back, ex: in server environments you'd have a couple PSU cards on the back, perhaps the processor and ram cards, and the front would be all hard drives, but it's up to the installer to decide the layout as it would not matter what slot you put what in. Some cards would perhaps take two slots because of physical size or amount of bandwidth it needs. Basically everything would speak a common language on the bus and work together.

It would also be able to take multiple boards and link them together as one, via either a copper card with ribbon cable or more expensive fibre card that simply extends the bus over. You could add multiple of these cards so you have redundant links. So if you take 20 boards and fill them with processors and link them, it would be one system, with that many processors.

The language used, and the specs would be open, and manufacturers would have to make their cards compatible with it. It would be their choice if they want to make their card hardware open or not but it would be encouraged.

Some really high end boards would perhaps use fibre between each card, for an extremely fast bus speed. The physical locations of cards would also slightly alter performance. For example, you'd typically want the processor and ram cards together.

This would also minimize the amount of cables, thus make airflow and case designs very simple and compact, since all you have is cards. This is optimized for rackmount, but would work for desktop/towers too, just with smaller boards that have less slots. A compact computer system with 3 slots would be the size of a dictionary.

A motherboard would not necessarily have to be slots that are lined up either. Laptops could also use this standard, the "chassis" would have a board specifically for it, and there would be 2-3 slots on both sides, front, and back. The side would perhaps have the processor and ram, back could have various peripheral cards like USB. There would be great availability of cards that work in any system. USB cards, PS2 cards, external SATA cards etc...

My idea was actually inspired by 70's tech, the DMS:

What I described is not how the DMS works though. Some stuff is modular but not as modular as I said.

GWestphal · Nov 23, 2013

I had a similar idea once buy I think single chip solutions are the future. The distances and multiple slot interconnects would be pretty killer to that design I think. It would be modular but not as performant.

the goal here was to make a legacy free open standards platform that also fixes some over the overlooked details like cabling etc. I'm an enthusiast so I designed this to meet my needs, but it could scale up and down a lot. But form factor is more of a mass market concern.

well an i7 has 25+years being a desktop processor whereas ARM has almost no desktop usage. obviously you're comparing 5w mobile to 85w desktop and the desktop has a pipeline twice the length. Increase pipeline and power in arm and my guess is you could see similar performance. But at this point having many lower power but fast enough cores that are dedicated to hardware acelleration seems like it would be a better design, to me at least.

Red Squirrel · Nov 23, 2013

Actually it could be that way too, in higher performance systems you could have a card where you put your own chips, and there would be a chip standard too. I never really considered the distances, I guess when it comes to today's clock speeds and performance distances actually does matter, so chances are what you'd end up with is typically 1 card would be the "system" (what is today a motherboard) and then the other slots would be expansion for other stuff like hard drives and so on.

NTMBK · Nov 24, 2013

If you want to start splitting a server into multiple rackable units, you might be interested in some stuff Intel are working on: http://semiaccurate.com/2013/04/09/...e-architecture-and-rack-disaggregation-plans/ It's a pretty interesting view of the future.

Let's Design A Super Modern Computing Platform

Golden Member

Platinum Member

Lifer

Senior member

Diamond Member

Senior member

Golden Member

Golden Member

Senior member

Lifer

Golden Member

Golden Member

Lifer

Golden Member

Senior member

Golden Member

Lifer

Senior member

Golden Member

Golden Member

Platinum Member

No Lifer

Golden Member

No Lifer

Lifer