Nvidia Tegra CPU: "Tick, Tock" strategy?

soccerballtux · Nov 18, 2011

Computer Bottleneck said:
I was just thinking about how much performance gain someone with a 1H 2010 smartphone would see upon upgrading (through 2 year contract perk) to a 1H 2012 smartphone:

1H 2010 Qualcomm: 65nm Single core scorpion @ 1000 Mhz
1H 2012 Qualcomm: 28nm Dual core Krait @ 1.5 Ghz

Wow! That is a huge jump in CPU power!

Maybe the next selling point will be battery life (if the ARM CPU race starts to slow down)? Maybe even heterogeneous computing (Eight core ARM Mali T658 with 272 Gflops, etc)?

ya. I'll be upgrading from my OGDroid (800mhz OC, plenty fast enough due to heavy customization and tweaking), but no need to upgrade till this breaks. It's already negated my use of my netbook, no point in tossing it just for something "faster"

happy medium · Nov 18, 2011

Computer Bottleneck said:
I was just thinking about how much performance gain someone with a 1H 2010 smartphone would see upon upgrading (through 2 year contract perk) to a 1H 2012 smartphone:

1H 2010 Qualcomm: 65nm Single core scorpion @ 1000 Mhz
1H 2012 Qualcomm: 28nm Dual core Krait @ 1.5 Ghz

Wow! That is a huge jump in CPU power!

Maybe the next selling point will be battery life (if the ARM CPU race starts to slow down)? Maybe even heterogeneous computing (Eight core ARM Mali T658 with 272 Gflops, etc)?

How about the quad core+1 40nm Tegra @ 1.2ghz in 1h 2012 or the 28nm quadcore tegra 3 @ even higher clocks? They should be faster than a dual core @ 1.5
They will be in phones by then.

zebrax2 · Nov 18, 2011

So
Nvidia=AMD (more cores)
Qualcomm=Intel (faster cores)

cbn · Nov 18, 2011

zebrax2 said:
So
Nvidia=AMD (more cores)
Qualcomm=Intel (faster cores)

I guess you could say that.....

(The following is from another post I wrote.)

Regarding custom core, I found this article rather interesting.

Two custom ARM cores vs. Four vanilla ARM cores: Which approach is more efficient?

Raj Talluri, Qualcomm: "Nvidia brings a different marketing philosophy. They take a different attitude about how they market their products. Just because you make one device with four cores...[but] the rest of the system has to be big too," he said.

Talluri continued. "What's the big deal? I can get two or I can get four cores from ARM. When we do a core, we design it from the ground up. That means we do a lot of custom transistors. For example, when we run a processor at 1.4GHz or 1.5GHz...we don't push the voltage to get higher [speeds]. Because our design can run that fast at nominal voltage. The power consumption just explodes if you push the voltage up."

Qualcomm has, for many years, had an ARM architectural license, which allows the company to custom design its ARM processors--what Talluri is referring to when he speaks about designing a chip from the "ground up." Nvidia, only recently--earlier this year--got an architectural license from ARM.

Jen-Hsun Huang, Nvidia: "I feel that I answered this question once for dual-core. Using multiple processors is the best way to conserve energy when you need performance. And our strategy, our approach is efficient performance. We want performance but not at the expense of running transistors super hot. Parallel processing is really the most energy efficient way to get performance. And we'll use as many cores as the technology can afford. And the applications can use," he said.

Huang continued. "There's all kinds of application that benefit from mutli-core and quad-core. One is multi-tasking. [For example] if I'm buying an application, updating a whole bunch of applications, while I'm reading a book and connected to Wi-Fi and I'm streaming music. That's a lot of stuff going on. That's even a lot of stuff going on for a desktop PC. So, there's no question that performance lags a bit when that happens and when quad-core hits it's just going to crank right through all of that."

cbn · Nov 18, 2011

Here is the ARM/Mali Roadmap.

I just wonder if TSMC will be able to get a version of "tri-gate" working for their own 22nm process node? (possibly enabling ARMv8 for smartphones)

happy medium · Nov 18, 2011

Computer Bottleneck said:
Two custom ARM cores vs. Four vanilla ARM cores: Which approach is more efficient?

The one that is released in a product already,the quadcore.

By the time Qualcomm releases the fast dual core the die shrunk 28nm Tegra quadcore will be ready. I still don't think the faster dual core will even be faster than the 40nm quad.
I'd rather have a 1.2ghx quad than a 1.7 ghz dual ,for the same reason I sold my e8400 for a q9550.

happy medium · Nov 18, 2011

Computer Bottleneck;32589539 Here is the ARM/Mali Roadmap. I just wonder if TSMC will be able to get a version of "tri-gate" working for their own 22nm process node? (possibly enabling ARMv8 for smartphones)[/QUOTE said:
That roadmap must be OLD!
I dont think Nvidia will go from a quadcore A9 back to a dual core A15?
Am I reading this right?

Idontcare · Nov 18, 2011

Computer Bottleneck said:
I just wonder if TSMC will be able to get a version of "tri-gate" working for their own 22nm process node? (possibly enabling ARMv8 for smartphones)

Zero chance, and I do mean the mathematical kind, not the hyperbole kind.

You won't see 3D xtor from TSMC at 14nm either.

Maybe, and that is a graciously optimistic maybe, they will have them production-ready for the 11nm node. Maybe.

soccerballtux · Nov 19, 2011

Idontcare said:
Zero chance, and I do mean the mathematical kind, not the hyperbole kind.

You won't see 3D xtor from TSMC at 14nm either.

Maybe, and that is a graciously optimistic maybe, they will have them production-ready for the 11nm node. Maybe.

Could they trademark that? Or patent it? I hope not...

IntelUser2000 · Nov 19, 2011

Idontcare said:
Zero chance, and I do mean the mathematical kind, not the hyperbole kind.

You won't see 3D xtor from TSMC at 14nm either.

Maybe, and that is a graciously optimistic maybe, they will have them production-ready for the 11nm node. Maybe.

Really? Didn't they say they were aiming for it on 14nm? I ask because you don't sound speculative, you know.

cbn · Nov 19, 2011

happy medium said:
That roadmap must be OLD!

I got the roadmap from this Anandtech Nov. 9th, 2011 News Post:

http://www.anandtech.com/show/5077/arms-malit658-gpu-in-2013-up-to-10x-faster-than-mali400

cbn · Nov 19, 2011

happy medium said:
I dont think Nvidia will go from a quadcore A9 back to a dual core A15?
Am I reading this right?

Apparently Nvidia wants to go quad core A15 and octocore A15.

(although I am not sure if Octocore means eight A15s or four A15s + four Cortex A7s)

http://en.wikipedia.org/wiki/Nvidia_Tegra

Tegra (Wayne) series

Processor: quad- or octa-core ARM Cortex-A15 MPCore
Improved 24 (for the quad-core) and 32 to 64 (for the octa-core) GPU cores with support for Directx 11+, OpenGL 4.X, OpenCL 1.X, and Physx
28 nm[25]
About 10 times faster than Tegra 2
To be released in 2012

cbn · Nov 19, 2011

happy medium said:
By the time Qualcomm releases the fast dual core the die shrunk 28nm Tegra quadcore will be ready. I still don't think the faster dual core will even be faster than the 40nm quad.
I'd rather have a 1.2ghx quad than a 1.7 ghz dual ,for the same reason I sold my e8400 for a q9550.

Swapping dual core A15 to a quad core A9 is not analogous to swapping a E8400 for a q9500.

The A15 cores are a good deal wider (with much higher IPC) than the cortex A9s.

cbn · Nov 19, 2011

Idontcare said:
You are talking the opportunity of saving pennies by spending dollars. ARM's cut just isn't that much for it to make a big difference if they reduced it by 10%.

Wait a minute...

Are you saying a ARM architectural license only saves the CPU manufacturer 10% in licensing fees?

Am I reading this right? (If true, that doesn't seem like much savings compared to what it would cost to do the custom CPU R&D)

P.S. Thank you for your earlier explanation of how Qualcomm's market share was acquired by non-ARM IP related techs they have. The graph you posted in the bobcat thread here does a good job illustrating that. (eg, "Scorpion" custom CPU didn't arrive until Q4 2008, but by that time Qualcomm had already gain a significant amount of market share).

AtenRa · Nov 19, 2011

Computer Bottleneck said:
Maybe there is an economic reason for Qualcomm's ability to launch this new SOC earlier than the competition:

With Qualcomm's market share and chip volumes, I just wonder if their architectural license gives them an advantage to launch products on TSMC's more expensive wafers? (ie, Qualcomm pays more for wafers, but saves on core licensing costs from ARM)

As i have said before, ARM Cortex A15 target is at 1.5GHz with 32/28nm LP and 2.5GHz with 32/28 HP process.

We know Qualcomm will use both 28nm LP and HP.

Qualcomm and TSMC Collaborating on 28nm Process Technology

Qualcomm and TSMC worked closely on 65nm and 45nm technologies. They are continuing their relationship into low-power, low-leakage 28nm designs for high-volume manufacturing. Delivering up to twice the density of previous manufacturing nodes, 28nm technology allows semiconductors that power mobile devices to do far more with less power. Qualcomm and TSMC are working on both high-k metal gate (HKMG) 28HP and silicon oxynitride (SiON) 28LP technologies. Qualcomm expects to tape out its first commercial 28nm products in mid-2010.

28nm LP only has a 20% advantage over 40nm LP process but 28HPL brings 40%. I believe Nvidia will go for the HPL with Tegra 4 and not LP.

TSMC 28 Nanometer Process Technology

The low power (LP) process is the first available and has completed all the qualification tests. The 28LP process is the low
cost and fast time to market choice, ideal for low standby power applications such as cellular baseband. The process boasts
a 20 percent speed improvement over the 40LP process at the same leakage per gate.

TSMC also supports a high performance technology with extremely low leakage that is also in risk production. This process
adopts the HP technology gate stack while meeting more stringent low leakage requirements and a trades off speed for
performance. With extremely low leakage, 28HPL is best suited for cellular baseband, application processors, wireless
connectivity, and programmable logic. The 28HPL process reduces standby power by more than 40% compared to the 40LP
process.

According to TSMC roadmap, 28nm LP will be the first process ready. Qualcomm's first Cortex A-15 will be manufactured at 28nm LP. This will give them one,maybe two quarters head start at the 28nm if others have chosen the 28nm HP/HPL.

TSMC 28nm Portfolio

One more thing, Qualcomm's Cortex A-15 is targeting smartphones first when Tegra 3 targets Tablets and SuperPhones. Only Nvidia's Grey scheduled for 2013 will be targeting the SmartPhone market.

I believe Qualcomm's dual Core Cortex A15 is not a direct competitor with Nvidia Tegra 3 and 4 (28nm). By choosing the 28nm LP will make them lauch first a 28nm product for the SmartPhone's market.

Screen%20Shot%202011-09-12%20at%201.50.39%20PM_575px.png

Cerb · Nov 19, 2011

happy medium said:
I'd rather have a 1.2ghx quad than a 1.7 ghz dual ,for the same reason I sold my e8400 for a q9550.

With the exception of Apple fans, most people won't be buying the most expensive and high-performance offerings they can get. Slow duals will sell just fine, and performance will be tailored to the point where you won't even notice a difference on similar devices.

The real difference is that by the time Qualcomm has parts ready for high-end tablets and the like, NV will have been selling them already, and doing a die-shrink.

Idontcare · Nov 19, 2011

Computer Bottleneck said:
Wait a minute...

Are you saying a ARM architectural license only saves the CPU manufacturer 10% in licensing fees?

Am I reading this right? (If true, that doesn't seem like much savings compared to what it would cost to do the custom CPU R&D)

P.S. Thank you for your earlier explanation of how Qualcomm's market share was acquired by non-ARM IP related techs they have. The graph you posted in the bobcat thread here does a good job illustrating that. (eg, "Scorpion" custom CPU didn't arrive until Q4 2008, but by that time Qualcomm had already gain a significant amount of market share).

Sorry, I did a poor job of communicating my message.

What I meant was that if you took the net cost to Qualcomm on a per-chip basis for licensing fees to ARM and called that number $X, then the difference between $X and $0.9X (let Qualcomm have a 10% discount from ARM owing to their volume) is a mere $0.1X.

And this delta of $0.1X is itself way too small (its pennies, literally) to compensate for the R&D expenditure associated with the increased wafer price of very early production ramp wafers for a newly released node (which adds $'s per chip, literally, until the node itself matures enough to have entitlement yields, typically 2 quarters, sometimes 3).

Early adopters of newly released nodes generally are early adopters not for cost-savings reasons but for performance advantage reasons enabling them to command superior ASP's for their SKU's (and thus gross margin goals are achieved and justified).

Cost savings for new nodes (<2Q out from release) are only realized for extremely small chips (<15mm^2) where the functional yields will be intrinisicly high.

I look forward to updating this chart with 28nm ramp rate data

theeedude · Nov 19, 2011

Computer Bottleneck said:
Swapping dual core A15 to a quad core A9 is not analogous to swapping a E8400 for a q9500.

The A15 cores are a good deal wider (with much higher IPC) than the cortex A9s.

http://www.duke.edu/~BCL15/documents/azizi2010-isca-opt.pdf
If you believe this article, wider is faster, but not necessarily better for energy efficiency. What it asserts is that dual issue, which is what A9 is, is the sweet spot for power efficiency.

happy medium · Nov 19, 2011

senseamp said:
What it asserts is that dual issue, which is what A9 is, is the sweet spot for power efficiency

A Tegra 3 with a 5th idle A9 core is the sweet spot for power efficency. 61% more power efficient than a dual core Tegra 2.

"There's of course a fifth Cortex A9 on Tegra 3, limited to a maximum clock speed of 500MHz and built using LP transistors like the rest of the chip (and unlike the four-core A9 cluster). NVIDIA intends for this companion core to be used for the processing of background tasks, for example when your phone is locked and in your pocket. In light use cases where the companion core is active, the four high performance A9s will be power gated and overall power consumption should be tangibly lower than Tegra 2."

http://www.anandtech.com/show/5072/nvidias-tegra-3-launched-architecture-revealed

.

Cerb · Nov 19, 2011

happy medium said:
A Tegra 3 with a 5th idle A9 core is the sweet spot for power efficency. 61% more power efficient than a dual core Tegra 2.

Apples and oranges, and all that. The 5th T3 core is an idle power-saving provision only, based on a specific feature that may or may not be a future option. Senseamp's post is about power consumption while having to do work.

cbn · Nov 19, 2011

AtenRa said:
Qualcomm's first Cortex A-15 will be manufactured at 28nm LP. This will give them one,maybe two quarters head start at the 28nm if others have chosen the 28nm HP/HPL.

I noticed in this TSMC link, "LP" is described as "Low cost" and "HPL" is described as low leakage.

If Qualcomm is choosing 28nm LP (SiON) and the others choose 28nm HPL (HKMG) that will make for a very interesting comparison.

From the CPU side of things, that puts the Qualcomm Krait custom CPU (with an slightly inferior process tech) up against the stock vanilla ARM design (using a slightly better process tech).

Therefore:

More expensive Custom ARM core that is claimed to be a more efficient design (with cheaper, less energy efficient process tech)

vs

cheaper "synthesized" vanilla ARM core that is claimed be less efficient (with more expensive, more energy efficient process tech)

Which is the better choice in what situation?

I know the other tech on the SOC is very important, but this still makes me wonder at what point will the costs of "Custom CPU" will be justified by other competitors in the future.

cbn · Nov 19, 2011

Idontcare said:
Early adopters of newly released nodes generally are early adopters not for cost-savings reasons but for performance advantage reasons enabling them to command superior ASP's for their SKU's (and thus gross margin goals are achieved and justified).

Yep, I can definitely see that as being Qualcomm's reason for releasing Krait on 28nm so early.

It just find it interesting that 28nm (SiON) LP is also identified as "Low cost". This coupled with Qualcomm's history of keeping their past custom cpu around for so long makes me wonder how overall costs pan out for them?

cbn · Nov 21, 2011

Idontcare said:
Zero chance, and I do mean the mathematical kind, not the hyperbole kind.

You won't see 3D xtor from TSMC at 14nm either.

Maybe, and that is a graciously optimistic maybe, they will have them production-ready for the 11nm node. Maybe.

It sounds like TSMC is trying hard to compete against Intel......

http://www.electronicsweekly.com/bl...log/2011/11/arm-tsmc-moving-fast-to-20nm.html

ARM, TSMC Moving Fast To 20nm
By David Manners on November 21, 2011 1:11 AM

ARM and TSMC are moving fast to get Cortex A-15 out on a 20nm process. A chip has already been taped out and an ARM process team has been set up in Taiwan to handle the transition.

"The 20nm tape-out is a test vehicle," ARM's executive vp for marketing, Lance Howarth, tells me, "the expectation is that we're a year away from 20nm as a production technology."

The rationale for the tape-out is that: "We need to have proven IP and to prove the design flows, to verify the RTL and make sure it all works well on 20nm," says Howarth, "because the interdependencies between process technology and the IP are increasing all the time."

"We've tied that in with opening a small design centre in Taiwan's Hsinchu Science Park," adds Howarth, "we're putting in expertise in terms of physical IP from our PIPD division, process guys and graphics guys looking at the deployment of our IP on advanced processes. Initially we'll have eight guys rising to 12."

In an age when some process transitions don't deliver much in terms of increased performance due to higher leakage, TSMC's 20nm process is expected to deliver some surprisingly significant gains.

Maria Marced, President of TSMC Europe says: "Compared to 28nm the 20nm process is expected to deliver a 25% improvement in power consumption, a 15-20% improvement in performance and a 1.9x increase in density. The plan is to introduce the first version of 20nm in the second half of 2012."

Howarth is impressed by TSMC's moves on 20nm. "TSMC are quite aggressive in pushing 20nm, they are accelerating 20nm development," he says, "people think TSMC are responding well in respect to 20nm and don't think Intel are as advanced in 20nm compared to TSMC. Their vision is that finfet comes in at 20nm (at Intel) but the advantage of finfet will be marginal."

Asked if ARM might consider fully depleted SOI (FDSOI) as an alternative to finfet, Howarth responds: "We have a team in Grenoble specifically looking at SOI and have been for some time. The jury is still out on the mass adoption of SOI."

Last week, at the European Nanoelectronics Forum in Dublin, a report on the EU's Catrene SOI development project involving AMD, GloFo, ST, Soitec, Siltronic and others stated that, at 20nm, finfet and fully depleted SOI are on a par.

Delivering the report, ST's Gilles Thomas said: "Don't panic, the transistor architecture on finfet and FDSOI are the same but for a rotation of 90º."

For the time being, ARM's focus is on getting out Cortex A-15 on 28/32nm processes.

"We expect A-15 to be sampling in the first half of next year, to be in full production in Q4 2012, and to be out in hand-sets by the end of next year," says Howarth.

Some interesting points from the above blog:

1. 20nm is one year away as a production technology. Does this mean we could see mobile cpus built on 20nm a good deal earlier than 1H 2014?

2. 20nm is expected to deliver 25% improvement in power efficiency, 15-20% improvement in performance, 1.9 increase in density.

3. 20nm Fully depleted SOI is being considered as an alternative to Intel 20nm Finfet. It will be interesting to see if this FDSOI process technology on 20nm would allow ARMv8 to make the cut for smartphones?

Idontcare · Nov 21, 2011

Computer Bottleneck said:
It sounds like TSMC is trying hard to compete against Intel......

http://www.electronicsweekly.com/bl...log/2011/11/arm-tsmc-moving-fast-to-20nm.html

Some interesting points from the above blog:

1. 20nm is one year away as a production technology. Does this mean we could see mobile cpus built on 20nm a good deal earlier than 1H 2014?

2. 20nm is expected to deliver 25% improvement in power efficiency, 15-20% improvement in performance, 1.9 increase in density.

3. 20nm Fully depleted SOI is being considered as an alternative to Intel 20nm Finfet. It will be interesting to see if this FDSOI process technology on 20nm would allow ARMv8 to make the cut for smartphones?

TSMC doesn't compete with Intel. Any attempts to draw correlations between the two are being done by "technology journalists" who simply may or may not even understand the industry itself.

Remember EETimes extolling that TSMC might beat Intel to 3D?

Report: TSMC may beat Intel to 3-D chips

Chip foundry giant Taiwan Semiconductor Manufacturing Co. (TSMC) could deliver its first semiconductors with 3-D interconnects by the end of 2011, potentially beating Intel Corp. to the punch in offering the first 3-D chips, according to a report circulated Tuesday (July 5) by a Taiwan trade group.

http://www.eetimes.com/electronics-news/4217553/Report--TSMC-may-beat-Intel-to-3-D-chips

To just about everyone in the industry, this "report" was a laughable work of fiction, and it seemed like even the technology journalists of EETimes had been hoodwinked by their own ignorance of the industry into thinking the two (TSMC and Intel) efforts had anything to do with one another.

Then two weeks later EETimes humbly printed their retraction:

Setting the record straight on the Intel-TSMC 3-D 'race'

Last week, a report by the Taiwan External Trade Development Council (TAITRA), a nonprofit organization promoting trade with Taiwanese firms, did just that, issuing a report that tantalizingly suggested that TSMC might beat Intel to the punch in bringing "three-dimensional chips" to market. EE Times and other news organizations quickly seized on the report and published stories based on it.

The problem, as many EE Times readers promptly pointed out, is that the report was deeply flawed and based upon a false equivalency.

http://www.eetimes.com/electronics-...-record-straight-on-the-Intel-TSMC-3-D--race-

So be wary, very wary, of continued efforts to draw correlations between Intel and TSMC by those folks who make a living writing blogs and technology articles because they (1) need people to read their articles so they can make their mortgage payment, and (2) are writing about something that they themselves have likely not actually worked with hands-on...you are at risk of reading works of fiction and fantasy, even when liberally laced with quotes from industry workers.

The janitor at Intel knows just as much about Intel's 22nm xtors as the janitor at EETimes, but the one at Intel does have an Intel badge and can go on to say all kinds of stuff if they felt flattered to do so in an interview which then gets credited as "Intel employee's have said..." and so on.

Getting back to the article you cited. That particular alliance in France, what used to be referred to as the Crolles alliance and CEA-LETI is a rather minor league R&D consortia (dwindling to obscurity as they partner members have disbanded over time).

Even as far back as 90nm development they were very much a "following the pack" type outfit. No one in the industry looks to CEA-LETI for guidance on how the future of advanced CMOS is going to pan out, which is why you've probably never heard much of them before.

That's not a dig on them, for what they do they serve their purpose really well, but it is not surprising that they are putting out press releases that more or less attempt to legitimize their preferred path (not going 3D xtors anytime soon) because we've all seen it before.

GloFo and IBM did the same thing with HKMG when it first came out, downplayed it as an excuse for why they didn't have it when Intel did. Then we saw them do the same thing over gate-first versus gate-last. Now we are going to see the same PR notes recycled to argue for not going 3D at this time because "the benefits are minor until you get to smaller nodes".

Meanwhile, fast forward 3-4 yrs and you tell me if you'll be at ALL surprised at the possibility of you concluding "Intel, they knew what they were doing all along, including their transition to 3D xtors, and everyone else was just operating on hope and delusion".

TSMC knows what they are doing, and they are not competing with Intel. They are competing with GlobalFoundries and Samsung. Everyone else in the foundry business is really not a material concern for sub-32nm.

badb0y · Nov 21, 2011

GPU comparison.

Nvidia Tegra CPU: "Tick, Tock" strategy?

Lifer

Lifer

Senior member

Lifer

Lifer

Lifer

Lifer

Elite Member

Lifer

Elite Member

Lifer

Lifer

Lifer

Lifer

Lifer

Elite Member

Elite Member

Lifer

Lifer

Elite Member

Lifer

Lifer

Lifer

Elite Member

Diamond Member