Hello Tegra X1: So long Tegra K1, we hardly knew ye

Bateluer

Lifer
Jun 23, 2001
27,730
8
0
http://www.anandtech.com/show/8811/nvidia-tegra-x1-preview

This brings us to Tegra X1, NVIDIA’s latest SoC which was announced today at CES. While largely a continuation of the strategy begun with Tegra K1, there’s still a lot of ground to cover. One of the bigger surprises is the CPU configuration. While Tegra K1 had a version with NVIDIA’s custom Denver core for the CPU, NVIDIA has elected to use ARM’s Cortex A57 and A53 in the Tegra X1. The A57 CPU cluster has 2MB of L2 cache shared across the four cores, with 48KB/32KB L1s (I$+D$) per core. The A53 cluster has 512KB of L2 cache shared by all four cores and 32KB/32KB L1s (I$+D$) per core. NVIDIA representatives stated that this was done for time to market reasons.

Surprised we don't have a thread on this yet. Looks like a decent improvement over the K1... but weren't the only K1-based devices the Nexus 9 and Nvidia's own Shield tablets? 20nm part though.
 

Bateluer

Lifer
Jun 23, 2001
27,730
8
0
Yeah, TSMC has company in the 20nm club.

The articles I've skimmed on this don't mention where the X1 is being made, but I think TSMC is a pretty safe bet. GloFo signed a deal with Samsung for 16/20nm nodes, and I haven't heard anything about Nvidia licensing with Samsung or Intel.
 

Graze

Senior member
Nov 27, 2012
468
1
0
The K1 is the most impressive chip outside of Apple A8X on the market now.
If the X1 can make it to market in time, it would be great for Nvidia.
 

Kneedragger

Golden Member
Feb 18, 2013
1,192
45
91
I had high hopes for the K1 with XBMC/Kodi, but since there was never any VDPAU support under Linux, I forgot about it. Hopefully Nvidia changes their mind with this one...
 

TreVader

Platinum Member
Oct 28, 2013
2,057
2
0
K1 Denver was so buggy as to be useless. That is why they are going with A57/A53, because the K1 in the Shield Tablet had a fraction of the issues that were shown in the Nexus 9.



I don't know how anybody could expect this to be impressive. I'm sure nvidia will do their usual ARM treatment and BS a bunch of benchmarks to fool people into buying it.
 

Commodus

Diamond Member
Oct 9, 2004
9,210
6,809
136
K1 Denver was so buggy as to be useless. That is why they are going with A57/A53, because the K1 in the Shield Tablet had a fraction of the issues that were shown in the Nexus 9.

I don't know how anybody could expect this to be impressive. I'm sure nvidia will do their usual ARM treatment and BS a bunch of benchmarks to fool people into buying it.

I'm hopeful, but skeptical. NVIDIA has a tendency to talk big about chips that take forever to show up and don't quite live up to the hype. And of course, like most companies, it cherry-picks benchmarks. Jen-Hsun Huang made it look like even the Tegra K1 was faster than the A8X in the iPad Air 2, let alone the X1... well, not quite. If you looked at more than the three benchmarks NVIDIA showed at CES, you knew the iPad had the superior chip.

Not to mention that it's hard to brag about the Tegra X1 given that it's not shipping in anything yet. Winning on paper means jack squat if there's nothing to show for it. How many months will it take before there's an X1-based tablet? Is it even efficient enough for a phone?
 

TreVader

Platinum Member
Oct 28, 2013
2,057
2
0
I'm hopeful, but skeptical. NVIDIA has a tendency to talk big about chips that take forever to show up and don't quite live up to the hype. And of course, like most companies, it cherry-picks benchmarks. Jen-Hsun Huang made it look like even the Tegra K1 was faster than the A8X in the iPad Air 2, let alone the X1... well, not quite. If you looked at more than the three benchmarks NVIDIA showed at CES, you knew the iPad had the superior chip.



Not to mention that it's hard to brag about the Tegra X1 given that it's not shipping in anything yet. Winning on paper means jack squat if there's nothing to show for it. How many months will it take before there's an X1-based tablet? Is it even efficient enough for a phone?

Do you know what's funny? When I wrote this, I didn't even know about the X1 benchmarks.
 

poofyhairguy

Lifer
Nov 20, 2005
14,612
318
126
The K1 is the most impressive chip outside of Apple A8X on the market now.

I would call it the biggest letdown in mobile in years.

I was hoping the K1 would be the AMD64 to the world of mobile. Instead it is smoke and mirrors.
 

Graze

Senior member
Nov 27, 2012
468
1
0
I would call it the biggest letdown in mobile in years.

I was hoping the K1 would be the AMD64 to the world of mobile. Instead it is smoke and mirrors.


Dafuq, you forgot about the Tegra 2 and 3? Those were the real letdowns.


The K1 might have been a letdown to you, but when it came to GPU might, it bested everything from Qualcomm and was toe to toe with Apple's offering.
 

poofyhairguy

Lifer
Nov 20, 2005
14,612
318
126
Dafuq, you forgot about the Tegra 2 and 3? Those were the real letdowns.

Agreed. The Tegra 2 lacking Neon was fail, and the Tegra 3's heat profile and GPU was fail. Tegra is fail, which sucks because I am a huge Nvidia fanboy.

I always thought part of that was because Nvidia had to use off-the-shelf parts, and that as soon as they made their own core a la Apple it would be better. I mean, Samsung's SoCs have sucked before because of generic ARM problems, so it's not just Nvidia.

The problem is they didn't make their core right. Instead of putting a damn i3 under a microscope and stealing Intel's best IPC secrets, they got cute with optimizations that don't work in many cases. They spent years designing that core, and I would rather have the generic ARM 64-bit one lol.

The K1 might have been a letdown to you, but when it came to GPU might, it bested everything from Qualcomm and was toe to toe with Apple's offering.

That would be great, but seeing as how only a small fraction of the Android market will have GPU power anywhere near that for a while, it will end up being wasted. I mean, sure, the few games Nvidia pays for as demos will work and look great, like every Tegra generation, but broader developer interest will be limited, especially compared to the aforementioned A8X. Heck, it is hard to find many Android games that really max out the Galaxy S4 GPU, let alone this monster Nvidia made. By the time we actually see the majority of high-end Android games assuming that level of power (because your average S7 or whatever is twice as powerful), the 2GB of RAM might be the thing holding the K1 back.
 

sweenish

Diamond Member
May 21, 2013
3,656
60
91
I'll be curious to see how Maxwell in the X1 compares to the GPU in the Snapdragon 810.
 

sontin

Diamond Member
Sep 12, 2011
3,273
149
106
I'll be curious to see how Maxwell in the X1 compares to the GPU in the Snapdragon 810.

The 810 dev platform gets 24 FPS in the Manhattan benchmark of GFXBench.
The TX1 dev platform achieves 63 FPS.
 

t1gran

Member
Jun 15, 2014
34
1
71
Can someone explain this?
However, the company’s PR technologists decided to compare FP16 performance of Tegra X1 to FP64 performance of ASCI Red supercomputer, the world’s first 1TFLOPS supercomputer. The problem is that while FP16 is enough for certain graphics applications today (still, loads of apps use full FP32 precision), it is definitely not enough for any kind of high-performance computing applications. So, comparing FP16 performance to FP64 performance is clearly an apples to oranges kind of comparison. But an important thing is that Maxwell architecture was not designed for supercomputers, its FP64 rate is about 1/32 of its FP32 rate. So, if 256 SPs inside the Tegra X1 can offer 512GFLOPS at FP32, then its FP64 rate is 16GFLOPS.
If "Tegra X1 can offer 512 GFLOPS at FP32", and at the same it's performance in FP16 is 1024 GFLOPS, then FP32 = 2 x FP16. So why in Maxwell architecture "its FP64 rate is about 1/32 of its FP32 rate" and not 1/2? Does it really depend on architecture, or it's a mathematical rule?
 

sontin

Diamond Member
Sep 12, 2011
3,273
149
106
Tegra X1 has 256 FP32 units and 8 FP64 units. One FP32 unit can process 2 FP16 operations.

So Tegra X1 is able to achieve 1024 GFLOPS with FP16, 512 GFLOPS with FP32 and 16 GFLOPS with FP64 operations.
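
To make that arithmetic explicit, here is a minimal sketch of the peak-throughput math, assuming a ~1 GHz GPU clock (which is what the quoted GFLOPS figures imply) and counting a fused multiply-add as 2 FLOPs:

```python
# Peak-throughput math for the Tegra X1 GPU as described above.
# Assumptions: ~1.0 GHz GPU clock (implied by the 512 GFLOPS FP32 figure),
# an FMA counted as 2 FLOPs, and each FP32 unit issuing 2 packed FP16 ops.

GPU_CLOCK_GHZ = 1.0    # assumed clock
FP32_UNITS = 256       # CUDA cores
FP64_UNITS = 8
FLOPS_PER_FMA = 2      # one fused multiply-add = 2 floating-point operations

fp32_gflops = FP32_UNITS * FLOPS_PER_FMA * GPU_CLOCK_GHZ   # 512
fp16_gflops = fp32_gflops * 2                               # 1024 (2x FP16 per FP32 unit)
fp64_gflops = FP64_UNITS * FLOPS_PER_FMA * GPU_CLOCK_GHZ    # 16

print(f"FP16 {fp16_gflops:.0f} / FP32 {fp32_gflops:.0f} / FP64 {fp64_gflops:.0f} GFLOPS")
```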
 

t1gran

Member
Jun 15, 2014
34
1
71
Thank you! So there are no dedicated FP16 units in the X1? What about the FP64 ALUs - can they process FP32 operations? In that case the actual FP32 throughput would be 256 FP32 x 2 flops + 8 FP64 x 2 x 2 flops = 544 flops/clock.

Is there any official source for FPxx configuration in X1?
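
Just to spell out where that 544 figure comes from, here is the same arithmetic with the hypothetical added in (purely illustrative; I haven't seen any source saying the FP64 units can also issue FP32 work):

```python
# Hypothetical only: FP32 throughput per clock *if* the 8 FP64 units could also
# each issue 2 FP32 FMAs per cycle. Nothing official confirms this behaviour.
FP32_UNITS = 256
FP64_UNITS = 8
FLOPS_PER_FMA = 2

baseline_per_clock = FP32_UNITS * FLOPS_PER_FMA                               # 512 flops/clock
hypothetical_per_clock = baseline_per_clock + FP64_UNITS * 2 * FLOPS_PER_FMA  # 544 flops/clock

print(baseline_per_clock, hypothetical_per_clock)
```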
 

Roland00Address

Platinum Member
Dec 17, 2008
2,196
260
126
So, question: do we have a clue about the die size of a Cortex-A57 (20nm) vs. the Tegra K1's Denver (28nm)? Do you get that rough 50-ish percent die shrinkage? Would a shrunk Denver be a similar size to an A57?

-----

This is a theory

I ask because there is a little engineering idiom where you can only have two:

(the classic fast / good / cheap triangle: pick any two)


Fast = Making sure you do not get a delay in time to market, because time to market is everything in silicon (even Intel is learning this lesson with mobile, and they are for the most part a well-oiled machine when it comes to time to market)

Cheap = Die Space

Good = Best performance on the market, a true performance leader, which in mobile boils down to performance per watt

Choose Fast and Good and you are just going to be throwing die area at the problem and thus eating away your margins. Often you can get better performance per watt by having more die area, since higher frequencies almost always mean higher voltages. Intel can do this on laptops, desktops, and servers, as can Apple with the iPhone and iPad, because their devices have very high ASPs, so you do not have to be as stingy with die area.

Choose Fast and Cheap and you have to use mostly synthesizable, pre-designed cores. The chip will not be good, only average (in CPU), but the only way you can win is by showing up to the fight, to borrow a phrase Anand used in one of my favorite articles (tech biographical pieces are interesting).

Choose Good and Cheap and you may get a market leader with the best performance per die area and performance per watt, but it will take a lot of time to get to market, because this type of thing takes man-hours. Taking forever is the opposite of Fast.

Fast and Cheap also has the added benefit of probably taking fewer engineers. Either you save money by firing/reassigning them, or you can have them work right now on the TSMC FinFET part and make sure that product ships on time.

-----

On another note, the Radeon 5000 series of graphics cards is now over 5 years old. I bring this up because Anand wrote an article called "The RV870 Story: AMD Showing up to the Fight" (aka the 5000 series), and in that article he borrowed a phrase from its subject:

(this was in 2009; the article was posted in 2010)
ATI’s Eric Demers (now the CTO of AMD's graphics group) put it best: if you don’t show up to the fight, by default, you lose. ATI was going to stop not showing up to the fight.

And every generation after the 5000 series, AMD/ATI just cannot pick up graphics share against Nvidia (it's the laptops, stupid, and for laptops you need good drivers that the OEMs trust).

Well, Qualcomm bought ATI/AMD's mobile graphics division (Adreno) in 2009 (Adreno is an anagram of Radeon). Then they hired Eric in 2012. And AMD, unfortunately, is adrift at sea while Qualcomm is doing better than ever.

(I am not saying this is all Eric's fault or that he is such a boon to Qualcomm. There are a lot of talented engineers and product heads at Qualcomm.)
 

Mopetar

Diamond Member
Jan 31, 2011
7,842
5,994
136
Agreed. The Tegra 2 lacking Neon was fail, and the Tegra 3's heat profile and GPU was fail. Tegra is fail, which sucks because I am a huge Nvidia fanboy.

The Tegra 2 wasn't overwhelmingly amazing and was quickly supplanted in performance after its release, but it actually hit a decent sweet spot in terms of performance, features, and price which made it widely successful, at least more so than any of the chips in that line since then.

It was rather widely used and made a lot of people expect that NV might come carve out a large portion of the SoC market for themselves.

Ever since then the Tegra line has gone down a path that has really limited the broad-market potential of the chip and has almost been relegated to the tablet or netbook space. The chip that Nvidia wants to make isn't really one that the market has a lot of use for.
 

eddman

Senior member
Dec 28, 2010
239
87
101
There is one thing I don't understand.

The X1's whitepaper says it supports HW decoding of 10-bit H.265 video, yet there's no mention of H.264 in that regard.

Does that really mean there is no 10-bit HW decoding for H.264? Isn't that a bit odd? If Nvidia went the distance for H.265, then why not enable it for H.264 too?
 

Mopetar

Diamond Member
Jan 31, 2011
7,842
5,994
136
Seems like a blow for NV. Denver was in the pipeline for a long time, and I don't think it worked out as well as they had hoped.

While the GPU portion of the chip looks impressive, it's utterly wasted on most devices. Qualcomm is going to have the better general purpose SoC and Samsung will tend to use their own in-house design if not going with Qualcomm.

At this point I think NV would be far better off licensing their GPU design to other SoC manufacturers. They tried for their own CPU design, but if it's not any better than the stock ARM designs there's no point in it. NV will just keep releasing more graphics heavy chips (it's what they do best) that aren't a good fit for most devices.
 

lopri

Elite Member
Jul 27, 2002
13,209
594
126
@Roland00Address: I do not believe your attempt at a Venn diagram succeeds. In particular, the definition of "good" is too open-ended and amorphous. The other barometers you proposed are likewise deficient. For example, these days die size does not necessarily correlate to price, especially across different nodes. "Fast" has also diverged into single-thread v. multi-thread and among different platforms, which makes it difficult, if not impossible, to render a definitive verdict. And we are not talking about Bulldozer v. Sandy Bridge, where the IPC and perf/watt differences are too large to overcome with more frequency/cores. Mobile SoCs in recent years are a lot more competitive with each other than x86 chips have ever been.

Anyhow, according to TechReport, a quad A57 cluster = 15.1 mm² and a quad A53 cluster = 4.6 mm². And according to ExtremeTech, Denver's core size is roughly 2x that of a single Cortex-A15 (or slightly larger), per NVIDIA.

http://techreport.com/review/27539/samsung-galaxy-note-4-with-the-exynos-5433-processor/2
http://www.extremetech.com/computin...nalysis-are-nvidias-x86-efforts-hidden-within
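
A rough back-of-the-envelope sketch of the die-shrink question using the figures above; note that the 28nm A15 core area and the 28nm-to-20nm scaling factor are assumed placeholders, not numbers from the linked articles:

```python
# Back-of-the-envelope estimate for the "would Denver shrunk be similar in size
# to an A57?" question. The A15 core area and the 28nm->20nm area-scaling factor
# are illustrative assumptions, NOT figures from TechReport or ExtremeTech.

QUAD_A57_20NM_MM2 = 15.1       # TechReport (Exynos 5433 die analysis)
A15_28NM_CORE_MM2 = 2.7        # assumed placeholder for a single 28nm Cortex-A15 core
AREA_SCALE_28_TO_20 = 0.65     # assumed shrink factor; real scaling varies by block

denver_28nm_core = 2 * A15_28NM_CORE_MM2                  # "roughly 2x an A15" per NVIDIA
denver_20nm_core = denver_28nm_core * AREA_SCALE_28_TO_20
a57_20nm_core = QUAD_A57_20NM_MM2 / 4                     # crude: ignores shared L2

print(f"Denver core shrunk to 20nm: ~{denver_20nm_core:.1f} mm^2")
print(f"A57 core (quad cluster / 4): ~{a57_20nm_core:.1f} mm^2")
```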

(image: the two Tegra K1 variants, Cortex-A15 cores vs. Denver core)
 