cbn
Lifer
- Mar 27, 2009
I read the Nvidia paper a few months ago, and IIRC, they were applying this to compute, not graphics, and a major focus of the research was limiting inter-module communication.
So you still have the memory-duplication issue for graphics textures with this approach. Each GPU tile has its own memory controller, and they all likely need much the same textures. So either you duplicate the textures in each chip's memory pool (wasting memory), or you treat it as one big pool but with a lot more latency and huge contention issues, given the huge appetite for texture memory.
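A quick back-of-the-envelope sketch of that trade-off (illustrative numbers only, not from the paper): if each tile must hold its own copy of the texture working set, capacity doesn't scale with tile count.

```python
# Hedged sketch: effective texture capacity when each GPU tile
# duplicates the working set vs. treating all pools as one shared space.
# Numbers are illustrative assumptions, not figures from the paper.

def effective_capacity_gb(pool_per_tile_gb: float, tiles: int,
                          duplicated: bool = True) -> float:
    """Usable capacity for a texture working set across GPU tiles."""
    if duplicated:
        # Every tile stores its own copy, so total usable capacity
        # is just one tile's pool, no matter how many tiles you add.
        return pool_per_tile_gb
    # One globally shared pool: capacity scales with tile count,
    # at the cost of cross-tile latency and bandwidth contention.
    return pool_per_tile_gb * tiles

# Four tiles with 8 GB each (hypothetical):
print(effective_capacity_gb(8, 4, duplicated=True))   # 8.0 GB usable
print(effective_capacity_gb(8, 4, duplicated=False))  # 32.0 GB usable
```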
I am not convinced that having a SYS + I/O chip removes the need for any SW involvement either. If that were the case, why couldn't a standard GPU be built with a slightly more robust SYS + I/O section, switchable between master/slave, so that on a dual-GPU card one chip's SYS + I/O could run the GPU portion of both chips? Instead, dual-GPU cards always ended up requiring CF/SLI software and were just as problematic as dual cards.
Anyway, they certainly wouldn't bother doing this in a CPU + GPU package; there would be no need/point for so much GPU power that multiple GPU dies were required, and they would have a hard time supplying the power and cooling such a beast would need.
Here is the paper:
http://research.nvidia.com/sites/default/files/publications/ISCA_2017_MCMGPU.pdf
3.1 MCM-GPU Organization
In this paper we propose the MCM-GPU as a collection of GPMs that share resources and are presented to software and programmers as a single monolithic GPU. Pooled hardware resources and shared I/O are concentrated in a shared on-package module (the SYS + I/O module shown in Figure 1). The goal for this MCM-GPU is to provide the same performance characteristics as a single (unmanufacturable) monolithic die. By doing so, the operating system and programmers are isolated from the fact that a single logical GPU may now be several GPMs working in conjunction. There are two key advantages to this organization. First, it enables resource sharing of underutilized structures within a single GPU and eliminates hardware replication among GPMs. Second, applications will be able to transparently leverage bigger and more capable GPUs, without any additional programming effort.
Figure 3 shows the high-level diagram of this 4-GPM MCM-GPU. Such an MCM-GPU is expected to be equipped with 3TB/s of total DRAM bandwidth and 16MB of total L2 cache. All DRAM partitions provide a globally shared memory address space across all GPMs.
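Splitting the paper's aggregate figures evenly across the four GPMs (an even split is my assumption; the paper only gives the totals here) works out to:

```python
# Per-GPM share of the 4-GPM MCM-GPU's aggregate resources,
# assuming an even split across modules (assumption, not stated here).
gpms = 4
total_dram_bw_gbs = 3000   # 3 TB/s total DRAM bandwidth (from the paper)
total_l2_mb = 16           # 16 MB total L2 cache (from the paper)

per_gpm_bw_gbs = total_dram_bw_gbs / gpms   # 750 GB/s per GPM
per_gpm_l2_mb = total_l2_mb / gpms          # 4 MB per GPM
print(per_gpm_bw_gbs, per_gpm_l2_mb)        # 750.0 4.0
```

Per module, that is roughly in the ballpark of a large monolithic GPU of the era, which is the point of the design.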
Contrast this with...
2.2 Multi-GPU Alternative
An alternative approach is to stop scaling single GPU performance, and increase application performance via board- and system-level integration, by connecting multiple maximally sized monolithic GPUs into a multi-GPU system. While conceptually simple, multi-GPU systems present a set of critical challenges. For instance, work distribution across GPUs cannot be done easily and transparently and requires significant programmer expertise.
and this.....
Alternatively, on-package GPMs could be organized as multiple fully functional and autonomous GPUs with very high speed interconnects. However, we do not propose this approach due to its drawbacks and inefficient use of resources.
So Nvidia certainly intends for their MCM-GPU to be presented transparently to software as one big GPU.