So the AMD Trinity Devastator will have 512 MB dedicated RAM


wahdangun

Golden Member
Feb 3, 2011
1,007
148
106
Yes it is; even the embedded GPU has the exact same spec, even though it comes in an MCM config.

And with the current chip I don't think Trinity will have stacked memory (maybe Bobcat can).
 

opethfan

Junior Member
Nov 7, 2011
16
0
0
Would it make more sense for AMD to just introduce a socket with support for quad channel DDR3 memory, at 2166MHz? That'd certainly reduce memory bottlenecks for Fusion chips.

On a related note, would having quad 2166 DDR3 provide a big enough performance improvement to allow BD to run with a reduced L3 cache?
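
For a rough sense of scale, here is a quick back-of-the-envelope sketch of theoretical peak bandwidth. It assumes the "2166MHz" figure means 2166 million transfers per second on standard 64-bit channels; the dual-channel DDR3-1866 comparison point is my own assumption, picked only for illustration, and real-world throughput would be lower than these peaks.

```python
def peak_bandwidth_gbs(channels: int, mega_transfers_per_sec: float,
                       bus_width_bits: int = 64) -> float:
    """Theoretical peak bandwidth in GB/s (1 GB = 10**9 bytes)."""
    bytes_per_transfer = bus_width_bits // 8  # 64-bit channel moves 8 bytes per transfer
    return channels * bytes_per_transfer * mega_transfers_per_sec * 1e6 / 1e9


# Hypothetical quad-channel setup from the post above
quad_2166 = peak_bandwidth_gbs(channels=4, mega_transfers_per_sec=2166)

# Assumed comparison point: dual-channel DDR3-1866
dual_1866 = peak_bandwidth_gbs(channels=2, mega_transfers_per_sec=1866)

print(f"quad-channel @ 2166 MT/s: {quad_2166:.1f} GB/s")
print(f"dual-channel @ 1866 MT/s: {dual_1866:.1f} GB/s")
print(f"ratio: {quad_2166 / dual_1866:.2f}x")
```

Under those assumptions the quad-channel setup comes out a bit over twice the dual-channel peak, which says nothing yet about latency or how much L3 it could replace.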
 

drizek

Golden Member
Jul 7, 2005
1,410
0
71
Anand seems to think that AMD can lop the L3 cache off of Zambezi completely without having much of an effect on desktop performance.
 
May 11, 2008
20,138
1,149
126
the eDRAM performs pixel functions?

According to Wikipedia it does:

http://en.wikipedia.org/wiki/Xenos_(graphics_chip)

500 MHz 10 MiB daughter embedded DRAM (@256GB/s) framebuffer on 90 nm process[citation needed].
NEC designed eDRAM die includes additional logic (192 parallel pixel processors) for color, alpha compositing, Z/stencil buffering, and anti-aliasing called “Intelligent Memory”, giving developers 4-sample anti-aliasing at very little performance cost.
105 million transistors [2]
8 Render Output units
Maximum pixel fillrate: 16 gigasamples per second fillrate using 4X multisample anti aliasing (MSAA), or 32 gigasamples using Z-only operation; 4 gigapixels per second without MSAA (8 ROPs × 500 MHz)
Maximum Z sample rate: 8 gigasamples per second (2 Z samples × 8 ROPs × 500 MHz), 32 gigasamples per second using 4X anti aliasing (2 Z samples × 8 ROPs × 4X AA × 500 MHz)[1]
Maximum anti-aliasing sample rate: 16 gigasamples per second (4 AA samples × 8 ROPs × 500 MHz)[1]
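
The rates in that list are just products of the ROP count, the clock and the per-ROP sample counts, so they are easy to check. A small sketch using only the numbers quoted above (the variable names are mine):

```python
ROPS = 8                 # render output units
CLOCK_HZ = 500e6         # 500 MHz
MSAA_SAMPLES = 4         # 4X multisample anti-aliasing
Z_SAMPLES_PER_ROP = 2    # Z samples per ROP per clock


def giga(rate: float) -> float:
    """Convert a per-second rate into giga-units per second."""
    return rate / 1e9


pixel_fillrate = ROPS * CLOCK_HZ                              # no MSAA
aa_sample_rate = MSAA_SAMPLES * ROPS * CLOCK_HZ               # colour samples with 4X MSAA
z_rate = Z_SAMPLES_PER_ROP * ROPS * CLOCK_HZ                  # Z samples, no AA
z_rate_4x = Z_SAMPLES_PER_ROP * ROPS * MSAA_SAMPLES * CLOCK_HZ  # Z samples with 4X AA

print(f"pixel fillrate:   {giga(pixel_fillrate):.0f} Gpixels/s")
print(f"AA sample rate:   {giga(aa_sample_rate):.0f} Gsamples/s")
print(f"Z rate (no AA):   {giga(z_rate):.0f} Gsamples/s")
print(f"Z rate (4X AA):   {giga(z_rate_4x):.0f} Gsamples/s")
```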

From Beyond3D:

http://www.beyond3d.com/content/articles/4/3

The one key area of bandwidth, that has caused a fair quantity of controversy in its inclusion of specifications, is that of bandwidth available from the ROPS to the eDRAM, which stands at 256GB/s. The eDRAM is always going to be the primary location for any of the bandwidth intensive frame buffer operations and so it is specifically designed to remove the frame buffer memory bandwidth bottleneck - additionally, Z and colour access patterns tend not to be particularly optimal for traditional DRAM controllers where there are frequent read/write penalties, so by placing all of these operations in the eDRAM daughter die, aside from the system calls, this leaves the system memory bus free for texture and vertex data fetches which are both read only and are therefore highly efficient. Of course, with 10MB of frame buffer space available this isn't sufficient to fit the entire frame buffer in with 4x FSAA enabled at High Definition resolutions and we'll cover how this is handled later in the article.
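
To see why 10MB is not enough with 4x FSAA at HD resolutions, here is a rough footprint estimate. It assumes a 1280x720 render target with 4 bytes of colour and 4 bytes of Z/stencil per sample; those byte counts and the resolution are my assumptions, not figures from the article, and the tile count is a simple ceiling division that ignores any alignment rules.

```python
import math

EDRAM_BYTES = 10 * 2**20        # 10 MiB of eDRAM
BYTES_COLOUR_PER_SAMPLE = 4     # assumption: 32-bit colour
BYTES_Z_PER_SAMPLE = 4          # assumption: 32-bit Z/stencil


def framebuffer_bytes(width: int, height: int, samples: int) -> int:
    """Colour + Z footprint of a multisampled render target."""
    per_sample = BYTES_COLOUR_PER_SAMPLE + BYTES_Z_PER_SAMPLE
    return width * height * samples * per_sample


def tiles_needed(width: int, height: int, samples: int) -> int:
    """Minimum number of eDRAM-sized tiles needed to cover the frame."""
    return math.ceil(framebuffer_bytes(width, height, samples) / EDRAM_BYTES)


for samples in (1, 2, 4):
    size_mib = framebuffer_bytes(1280, 720, samples) / 2**20
    print(f"{samples}x: {size_mib:.2f} MiB, tiles needed: {tiles_needed(1280, 720, samples)}")
```

Under these assumptions a 720p frame without AA fits in one pass, while the 4x case needs to be split up, which matches the article's point that the full frame buffer does not fit.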

Despite references to 192 processing elements in the ROPs within the eDRAM, we can actually resolve that to 8 pixel writes per cycle, as well as having the capability to double the Z rate when there are no colour operations. However, as the ROPs have been targeted to provide 4x Multi-Sampling FSAA at no penalty this equates to a total capability of 32 colour samples or 64 Z and stencil operations per cycle.

Most PC graphics processors have to balance their output with the available bandwidth and as such their ROP units usually only cater for 2 Multi-Samples per pixel in a single cycle, and the Z output doesn't double with the number of Multi-Samples being produced either. Z and colour compression techniques are also employed in order to get close to the output capabilities with the bandwidth available. ATI's calculations lead to a colour and z bandwidth demand of around 26-134GB/s at 8 pixels with 4x Multi-Sampling AA enabled at High Definition TV resolutions. The lower end of that bandwidth figure is derived from having 4:1 colour and Z compression, however the lossless compression techniques are only optimal when there are no triangle edges intersecting a pixel, but with the presumed high geometry detail within next generation console titles the opportunities for achieving this compression ratio across the entire frame will be reduced. So, with 256GB/s of bandwidth available in the eDRAM frame buffer there should always be sufficient bandwidth for achieving 8 pixels per clock with 4x Multi-Sampling FSAA enabled and as such this also means that Xenos does not need any lossless compression routines for Z or colour when writing to the eDRAM frame buffer.

So, as far as the operation is concerned, once pixel data has come through the shader array and is ready to be processed into colour values in memory the Z data of the pixel is matched with the correct colour data coming out of the shaders. Xenos supports an "Alpha to Mask" feature, which allows for the use of Multi-Sampling for sort-independent translucency. All of this processing is performed on the parent die and the pixels are then transferred to the daughter die in the form of source colour per pixel and loss-less compressed Z, per 2x2 pixel quad. The interconnect bandwidth between the parent and daughter die is only an eighth of the eDRAM bandwidth because the source colour data value is common to all samples of a pixel here, and the Z is compressed. Once on the daughter die the pixels are unpacked to their Multi-Sample level and each sample is driven through their Z and Alpha computations and the final data is stored on the eDRAM until either the entire frame or current tile (we'll cover this in more detail later) being rendered is finished.
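
The 256GB/s and "an eighth" figures can be sanity-checked with a little arithmetic. This is only one plausible way to arrive at 256GB/s: the 4-byte colour, 4-byte Z and the read-plus-write factor are my assumptions, not something the article spells out. The pixel, sample and clock numbers come from the quoted text.

```python
CLOCK_HZ = 500e6           # 500 MHz, from the spec list
PIXELS_PER_CLOCK = 8       # 8 pixel writes per cycle
SAMPLES_PER_PIXEL = 4      # 4x multi-sampling

BYTES_COLOUR = 4           # assumption: 32-bit colour per sample
BYTES_Z = 4                # assumption: 32-bit Z/stencil per sample
ACCESSES_PER_SAMPLE = 2    # assumption: one read plus one write (blend / Z test)

colour_samples_per_clock = PIXELS_PER_CLOCK * SAMPLES_PER_PIXEL
z_ops_per_clock = 2 * colour_samples_per_clock   # Z rate doubles with no colour ops

bytes_per_clock = (colour_samples_per_clock
                   * (BYTES_COLOUR + BYTES_Z)
                   * ACCESSES_PER_SAMPLE)
edram_bandwidth = bytes_per_clock * CLOCK_HZ      # bytes per second

# The parent-to-daughter link is described as an eighth of the eDRAM bandwidth
interconnect_bandwidth = edram_bandwidth / 8

print(f"colour samples per clock: {colour_samples_per_clock}")
print(f"Z/stencil ops per clock:  {z_ops_per_clock}")
print(f"eDRAM bandwidth:          {edram_bandwidth / 1e9:.0f} GB/s")
print(f"interconnect bandwidth:   {interconnect_bandwidth / 1e9:.0f} GB/s")
```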

When the frame or tile has finished rendering, the colour data will then be resolved on the daughter die, with the Multi-Samples being blended down to their pixel level. The resolved buffer information is then passed back from the daughter die to the parent which then outputs to system RAM such that, when all the tiles are finished, this can then be outputted to the display device. Although the resolved colour data has to be stored in system RAM, which uses some bandwidth during the transfer, the efficiency of the write as the resolved data comes out of the daughter die to be written to system RAM is very high. This high efficiency is due to the fact that it is dealing with a significant quantity of non-fragmented data and the bus isn't as busy with lots of other bandwidth consuming, high frequency and inefficient frame buffer read / write / modify operations for the back buffer. This helps in alleviating the fact that the parent die is also handling system memory requests. Also note that data can be written to the eDRAM at the same time as it is being cleared from the previous data that resided there, meaning there should be little to no wait when removing the previous data from the eDRAM (We've heard comments from developers familiar with both designs that this element of Xenos bears similarities to the "Flipper" design for Nintendo's GameCube, a part that was originally designed by ArtX, who of course were subsequently purchased by ATI, however ATI are keen to point out that while there may be apparent similarities the designs are entirely independent as there are distinct virtual and physical barriers between the groups working on the various console developments, past and present, and no members of the Flipper architecture team were involved in Xenos's development).

As all the sampling units for frame buffer operations are multiplied to work optimally with 4x FSAA this is actually the maximum mode available. Although the developer can choose to use 2x or no FSAA, there are no FSAA levels available higher than 4x. The sampling pattern is not programmable but fixed, although it does use a sample pattern that doesn't have any of the sample points intersecting one or another on either the vertical or horizontal axis. Although we don't know the exact sample pattern shape, we suspect it will be similar to that seen on other sparse sampled / jittered / rotated grid FSAA mechanisms we've seen over the past few years, such as this.

The ROP's can handle several different formats, including a special FP10 mode. FP10 is a floating point precision mode in the format of 10-10-10-2 (bits for Red, Green, Blue, Alpha). The 10 bit colour storage has a 3 bit exponent and 7 bit mantissa, with an available range of -32.0 to 32.0. Whilst this mode does have some limitations it can offer HDR effects but at the same cost in performance and size as standard 32-bit (8-8-8-8) integer formats which will probably result in this format being used quite frequently on XBOX 360 titles. Other formats such as INT16 and FP16 are also available, but they obviously have space implications. Like the resolution of the MSAA samples, there is a conversion step to change the front buffer format to a displayable 8-8-8-8 format when moving the completed frame buffer portion from the eDRAM memory out to system RAM.
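
The "same cost in size" point is easy to see from the bit counts. The sketch below compares per-pixel sizes and shows raw packing of a 10-10-10-2 pixel into one 32-bit word. Treating INT16 and FP16 as four-channel formats, and the bit order used by `pack_10_10_10_2`, are my assumptions for illustration; the real hardware layout may differ, and the function packs raw field values without interpreting the 3-bit exponent / 7-bit mantissa encoding.

```python
# Bits per channel (R, G, B, A) for each format
FORMATS = {
    "8-8-8-8 integer": (8, 8, 8, 8),
    "FP10 (10-10-10-2)": (10, 10, 10, 2),
    "INT16 x4 (assumed)": (16, 16, 16, 16),
    "FP16 x4 (assumed)": (16, 16, 16, 16),
}


def bits_per_pixel(channels: tuple) -> int:
    """Total storage per pixel for a given channel layout."""
    return sum(channels)


def pack_10_10_10_2(r: int, g: int, b: int, a: int) -> int:
    """Pack raw field values into 32 bits (hypothetical order: R lowest, A highest)."""
    if not (0 <= r < 1024 and 0 <= g < 1024 and 0 <= b < 1024 and 0 <= a < 4):
        raise ValueError("field value out of range for 10-10-10-2")
    return r | (g << 10) | (b << 20) | (a << 30)


for name, channels in FORMATS.items():
    print(f"{name}: {bits_per_pixel(channels)} bits per pixel")

# Each 10-bit colour field holds a 3-bit exponent plus a 7-bit mantissa
assert 3 + 7 == 10
```

So FP10 sits at the same 32 bits per pixel as 8-8-8-8, while the 16-bit formats double the footprint under this assumption.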

The ROP's are fully orthogonal so Multi-Sampling can operate with all pixel formats supported.
 

Tuna-Fish

Golden Member
Mar 4, 2011
1,474
1,964
136
Are those type of memory chips (MCM configuration) capable of being engineered with the same wide bus as the stacked on interposer memory?

Or is the bus narrower with MCM?

Stacked Memory (running lower voltages) with wide bus on interposer vs MCM memory (with better cooling) using whatever bus it uses :)

Can anyone explain?

Interposers can (theoretically) provide truly staggering bus widths. Having tens of thousands of traces is a-ok on an interposer. MCM is essentially made with the same tech as the board of a GPU, so it has the same kind of constraints. Given the target market of Fusion products, this will however likely not be an issue.
 

sm625

Diamond Member
May 6, 2011
8,172
137
106
I have so little faith in AMD that I actually think intel could deliver a knock out blow right now. Imagine a 32 EU intel gpu with a 512 bit SI memory bus and 4-8 GB of stacked RAM. It would end up being faster than any amd apu simply by virtue of its high memory bandwidth and low latency.
 

Olikan

Platinum Member
Sep 23, 2011
2,023
275
126
I have so little faith in AMD that I actually think intel could deliver a knock out blow right now. Imagine a 32 EU intel gpu with a 512 bit SI memory bus and 4-8 GB of stacked RAM. It would end up being faster than any amd apu simply by virtue of its high memory bandwidth and low latency.

but they just don't want to do that, because they are lazy right?

Intel is just now learning to make GPUs, deal with it :twisted:
 

micrometers

Diamond Member
Nov 14, 2010
3,473
0
0
Here's what I'm wondering...

Could the AMD trinity devastator play old games okay?

Like, I want to play Homeworld. I love that game! But it's like DX6 or DX7. Anyone installed old games on a Llano machine?
 

podspi

Golden Member
Jan 11, 2011
1,982
102
106
I have so little faith in AMD that I actually think intel could deliver a knock out blow right now. Imagine a 32 EU intel gpu with a 512 bit SI memory bus and 4-8 GB of stacked RAM. It would end up being faster than any amd apu simply by virtue of its high memory bandwidth and low latency.

But can they do it at the margins they're comfortable at? I (personally) don't think so. At least not yet.


Here's what I'm wondering...

Could the AMD trinity devastator play old games okay?

Like, I want to play Homeworld. I love that game! But it's like DX6 or DX7. Anyone installed old games on a Llano machine?

Isn't it going to be VLIW4? Do newer Radeon boards have issues playing older games? My 5650 has never had any trouble playing relatively old games (ST Armada, Armada II, Elite Force, Freelancer).
 

wahdangun

Golden Member
Feb 3, 2011
1,007
148
106
Here's what I'm wondering...

Could the AMD trinity devastator play old games okay?

Like, I want to play Homeworld. I love that game! But it's like DX6 or DX7. Anyone installed old games on a Llano machine?

Yes, I have tried playing really old games on it (Earthworm Jim, NFS, CnC); they run just fine, with no problems whatsoever.