Very thorough, but concise, description of the Cray X1 supercomputer

THE CRAY X1 DESIGN

The overall design goal of the X1 is to combine the historic high bandwidth of vector supercomputers with the efficient scaling of MPPs. This translates into several specific design elements, including:

* New custom processor architecture - The system is designed around custom multi-piped vector processors, with 12.8 Gflops peak performance per processor (25.6 Gflops for 32-bit computations).

** Processors use a new Instruction Set Architecture (ISA) that is partially based on the MIPS ISA plus many additional instructions to support vector processing, special instructions and other enhancements (e.g. fixed instruction size, more registers, masked vector operations, large integer vector support, 32-bit data, and cache control).

** It carries over the multi-streaming concept introduced in the SV1. In addition, the processor design incorporates superscalar processing, integrated vector caches, and a decoupled microarchitecture that allows the processors to better tolerate memory latencies.

** Processors are configured with 8 vector pipes.

* Balanced high bandwidth memory systems - The system is organized into four-processor nodes, each of which contains 128 Rambus memory channels, for a local bandwidth of 200 GB/s. The nodes are connected by 16 parallel networks, providing 25 GB/s of point-to-point bandwidth. In maximum configurations, the network scales to over 4 TB/s of global bandwidth.

* Scalability - The Cray X1 is designed to provide high performance at both small and larger processor counts. The system scales to 1,000s of processors. The scalable address translation mechanisms and communication protocols were carried forward from the Cray T3E design.
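
As a quick sanity check on the bandwidth figures in the design list above, here is a minimal Python sketch that divides the quoted aggregates by the quoted counts. It assumes the bandwidth is spread evenly across channels, processors, and networks, which the article does not actually state; the helper name `per_share` is made up for illustration.

```python
# Figures quoted in the design description above.
CHANNELS_PER_NODE = 128        # Rambus memory channels per node
NODE_LOCAL_BW_GBS = 200.0      # local memory bandwidth per node, GB/s
PROCESSORS_PER_NODE = 4        # four-processor nodes
PARALLEL_NETWORKS = 16         # parallel networks connecting the nodes
POINT_TO_POINT_BW_GBS = 25.0   # point-to-point bandwidth, GB/s


def per_share(total, parts):
    """Split an aggregate figure evenly across `parts`.

    ASSUMPTION: an even split. The article only gives the aggregates.
    """
    if parts <= 0:
        raise ValueError("parts must be positive")
    return total / parts


# Implied per-unit figures (derived here, not quoted in the article).
bw_per_channel = per_share(NODE_LOCAL_BW_GBS, CHANNELS_PER_NODE)
bw_per_processor = per_share(NODE_LOCAL_BW_GBS, PROCESSORS_PER_NODE)
bw_per_network = per_share(POINT_TO_POINT_BW_GBS, PARALLEL_NETWORKS)

print(f"per memory channel: {bw_per_channel:.4f} GB/s")
print(f"per processor:      {bw_per_processor:.1f} GB/s")
print(f"per network:        {bw_per_network:.4f} GB/s")
```

The per-processor share comes out close to the 51 GB/s peak local memory bandwidth quoted in the highlights, so the 200 GB/s node figure looks like a rounded total.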

CRAY X1 HIGHLIGHTS

The highlights of the new Cray X1 include:

* Scaling from 4 to over 4,000 processors

* Each processor is rated at 12.8 peak GFLOPS. The processors are constructed of four sets of scalar/vector units to create an MSP (Multi-Streaming Processor).

The processor chips run at 800 MHz for the vector units and 400 MHz for the scalar units.

** Providing 3.2 scalar GOPS and 12.8 vector GFLOPS per MSP processor (25.6 GFLOPS in 32-bit mode)

* High bandwidth, low latency memory system design:

** Processor bandwidth to cache is 76 GB/s (50 GB/s for loads and 26 GB/s for stores)

** Peak bandwidth to local main memory is 51 GB/s per processor (38 GB/s sustained). Global interconnect main memory bandwidth is 102 GB/s per four processor/memory node board.

** I/O bandwidth is 4.8 GB/s per 4-processor node board and up to 75 GB/s per cabinet. Up to one I/O channel per processor. Each I/O channel is 1.2 GB/s full duplex, and is globally accessible by all processors in the machine.

** The latency to global memory is in the microsecond range in the largest configurations. Typical latency across a 512-processor system (128 nodes) is around one microsecond.

* U.S. list pricing begins at $2.5 million
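
The peak ratings in the highlights can be reproduced from the quoted clock rates and unit counts. The sketch below is a back-of-the-envelope check, not Cray's own derivation. It assumes two floating-point results per vector pipe per clock and two scalar operations per scalar unit per clock; neither figure is in the article, and both were picked because they reproduce the quoted 12.8 GFLOPS and 3.2 GOPS. All function names are illustrative.

```python
# Figures quoted in the article.
VECTOR_CLOCK_GHZ = 0.8       # vector units run at 800 MHz
SCALAR_CLOCK_GHZ = 0.4       # scalar units run at 400 MHz
UNITS_PER_MSP = 4            # four scalar/vector unit sets per MSP
VECTOR_PIPES_PER_MSP = 8     # "Processors are configured with 8 vector pipes"
PROCESSORS_PER_NODE = 4      # four-processor node boards
IO_CHANNEL_GBS = 1.2         # GB/s per I/O channel, up to one channel per processor

# ASSUMPTIONS (not stated in the article), chosen to match the quoted ratings.
FLOPS_PER_PIPE_PER_CLOCK = 2
SCALAR_OPS_PER_UNIT_PER_CLOCK = 2


def msp_vector_gflops(bits=64):
    """Peak vector GFLOPS per MSP; the article quotes double the rate for 32-bit."""
    if bits not in (32, 64):
        raise ValueError("bits must be 32 or 64")
    base = VECTOR_PIPES_PER_MSP * FLOPS_PER_PIPE_PER_CLOCK * VECTOR_CLOCK_GHZ
    return base * 2 if bits == 32 else base


def msp_scalar_gops():
    """Peak scalar GOPS per MSP under the issue-rate assumption above."""
    return UNITS_PER_MSP * SCALAR_OPS_PER_UNIT_PER_CLOCK * SCALAR_CLOCK_GHZ


def node_io_gbs():
    """I/O bandwidth of a node board with one channel per processor."""
    return PROCESSORS_PER_NODE * IO_CHANNEL_GBS


def nodes_for(processors):
    """Number of four-processor nodes needed (rounded up)."""
    return -(-processors // PROCESSORS_PER_NODE)


def system_peak_tflops(processors):
    """Aggregate 64-bit peak in TFLOPS for a given processor count."""
    return processors * msp_vector_gflops() / 1000.0


if __name__ == "__main__":
    print("vector GFLOPS per MSP (64-bit):", round(msp_vector_gflops(), 3))
    print("vector GFLOPS per MSP (32-bit):", round(msp_vector_gflops(32), 3))
    print("scalar GOPS per MSP:", round(msp_scalar_gops(), 3))
    print("I/O GB/s per node board:", round(node_io_gbs(), 3))
    print("nodes in a 512-processor system:", nodes_for(512))
    print("peak TFLOPS at 4096 processors:", round(system_peak_tflops(4096), 2))
```

With those assumptions the per-MSP ratings, the 4.8 GB/s per node board, and the 128-node count for 512 processors all line up with the quoted numbers. The 4096-processor peak is an extrapolation, not a figure from the article.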

http://www.supercomputingonline.com/article.php?sid=3049
 
Originally posted by: Vespasian
Originally posted by: RaynorWolfcastle
* U.S. list pricing begins at $2.5 million

I'll take 2 please... bah, make that 4, I like round numbers
If I remember correctly, a 4096-processor X1 is somewhere around $250 million.

Excellent, I've got a couple of billion dollar bills between the cushions in my sofa, this should put them to good use.
 