Article: Tesla Dojo Chip

Leeea

Diamond Member
Apr 3, 2020
3,625
5,368
136
The Tesla Dojo Chip Is Impressive, But There Are Some Major Technical Issues
The two things that stood out to me were CFP8 and the magic compiler solution.

The CFP8 data type is odd to me; going to an 8-bit data type now seems completely opposite to the general trend. An 8-bit float just seems baffling. Assuming they do a 5/3 split, that is 0-31 for the significand and 0-7 for the exponent. That just does not seem like much resolution for driving a car. But the lack of memory in the design seems to force CFP8.

Tesla is also claiming they have/will have a compiler that automatically optimizes for their hardware. Having seen this claim made before, and knowing the success rate associated with it, this seems unlikely.
 

dullard

Elite Member
May 21, 2001
25,065
3,413
126
The CFP8 data type is odd to me; going to an 8-bit data type now seems completely opposite to the general trend. An 8-bit float just seems baffling. Assuming they do a 5/3 split, that is 0-31 for the significand and 0-7 for the exponent. That just does not seem like much resolution for driving a car.
1) This is for training AI, not for driving a car. Training is the step where you take in massive amounts of data and let the computer figure out what the data means and how to use it. A whole different chip should be used in the field for driving.

2) AI is not extremely precise. It doesn't have to be. Think about car data. The incoming data is probably things like speed, angle of wheels, etc. What is 8 bits on the speed of a car? Suppose Tesla doesn't train their cars with data going over 100 MPH. 8 bits then gives you a speed resolution of 100 MPH / 256 = 0.39 MPH. When have you ever needed to know or control your car's speed with better than 0.39 MPH resolution? If you are driving at 65.00 MPH instead of 65.38 MPH, would your decision about whether or not to hit the brakes change? Or take wheel angle: the most you can possibly turn most cars' wheels is through 130°. 256 levels thus gives you at worst a 0.5° angle resolution. When have you thought that you need to adjust your steering by less than 0.5°?

If the AI were 16 bits instead, you'd have a speed resolution of 0.002 MPH. Do cars really have that level of speed control? Do they really have that level of accuracy in the speedometer? No. Those extra bits of information are just useless for AI. Useless bits mean more power, more memory, and slower calculations. AI is all about doing as many calculations as you possibly can, even if the results are not perfectly accurate. The inference application (actually driving) can just calculate another speed on the next clock tick.
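To put that arithmetic in one place (the 100 MPH and 130° ranges are just the examples from this post, not anything Tesla has published), here's a quick sketch of the step sizes you get from 8 and 16 bits:

```python
# Step size when a physical range is split across all values of an n-bit integer.
def resolution(full_range, bits):
    return full_range / (2 ** bits)

print(resolution(100, 8))   # speed, 0-100 MPH at 8 bits    -> ~0.39 MPH per step
print(resolution(130, 8))   # wheel angle, 130 deg at 8 bits -> ~0.51 deg per step
print(resolution(100, 16))  # speed at 16 bits               -> ~0.0015 MPH per step
```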
 

Tuna-Fish

Golden Member
Mar 4, 2011
1,349
1,534
136
The CFP8 data type is odd to me; going to an 8-bit data type now seems completely opposite to the general trend.

No, it's perfectly in line with the general trend of ML accelerators reducing precision for more computing power. There is serious discussion of 4-bit types. In general, more nodes with less precision produce better results at lower power use than fewer nodes with more precision.

An 8-bit float just seems baffling. Assuming they do a 5/3 split, that is 0-31 for the significand and 0-7 for the exponent.

It's probably more like 1 or 2 bits for the significand and 6 or 7 for the exponent.
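Tesla hasn't published CFP8's exact bit layout, so both layouts below are guesses from this thread: the OP's 5/3 idea treated as sign + 3 exponent + 4 mantissa, and an exponent-heavy IEEE-style sign + 5 exponent + 2 mantissa. A quick sketch of the normal-number range each would give:

```python
# Normal-number range of an assumed 8-bit float layout (1 sign bit implied).
# Neither layout is confirmed by Tesla; both are guesses from this thread.
def fp8_range(exp_bits, man_bits):
    bias = 2 ** (exp_bits - 1) - 1
    max_exp = 2 ** exp_bits - 2                         # all-ones exponent reserved
    max_man = 1 + (2 ** man_bits - 1) / 2 ** man_bits   # 1.111... in binary
    return 2.0 ** (1 - bias), max_man * 2.0 ** (max_exp - bias)

print(fp8_range(exp_bits=3, man_bits=4))  # ~(0.25, 15.5): tiny dynamic range
print(fp8_range(exp_bits=5, man_bits=2))  # ~(6.1e-05, 57344.0): huge dynamic range
```

The exponent-heavy split gives up per-step precision to buy the dynamic range that weights and gradients actually need.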

That just does not seem like much resolution for driving a car.

ML does not require a lot of precision, regardless of what it's being used for. No result of the system directly maps to any quantity stored in any single weight.

But the lack of memory in the design seems to force CFP8.

It's probably the other way around: they chose CFP8 and then sized the memory to fit their needs. The author of that article seems a bit out of the loop; first he worries about the lack of memory, then seems mystified about why it has so much IO.

The basic idea of these kinds of gigantic-scale accelerators is that you keep the weights stationary in the device and then stream in the inputs and stream out the outputs. If you need to fit more weights, that also means you need to do more compute, so instead of adding SRAM to the design, you scale outwards and buy more hardware. This of course does not work for the kind of general-purpose workloads that, say, nVidia targets, where they want a small system or even a single card to be able to work on large problems by just taking more time. These systems, instead, will always be sized to the models they work on.
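A toy sketch of that weight-stationary pattern (nothing Dojo-specific; the matrix shapes and the 4-node split are made up): each node holds its slice of the weights permanently, activations are streamed through, and you add capacity by adding nodes rather than adding memory per node:

```python
import numpy as np

# Toy weight-stationary layout: each "node" permanently holds one slice of a
# layer's weight matrix; inputs are streamed in and partial outputs streamed out.
class Node:
    def __init__(self, weight_slice):
        self.w = weight_slice                 # stays resident on the node

    def process(self, x):
        return self.w @ x                     # stream in x, stream out our slice

rng = np.random.default_rng(0)
W = rng.standard_normal((1024, 256))          # full weight matrix of one layer
nodes = [Node(ws) for ws in np.split(W, 4, axis=0)]   # scale out: 4 nodes

x = rng.standard_normal(256)                  # one streamed input
y = np.concatenate([n.process(x) for n in nodes])
assert np.allclose(y, W @ x)                  # same math, but no node holds all of W
```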
 

Tuna-Fish

Golden Member
Mar 4, 2011
1,349
1,534
136
The incoming data is probably things like speed, angle of wheels, etc. What is 8 bits on the speed of a car?

No, no, no. In no world would that be a good idea. This isn't being used to literally train an AI to drive a car; it's being used to train an AI that handles parts of the problem of driving a car, most crucially image recognition and visual reasoning. The input data is not quantities like angle of wheels, it's images, and every input probably describes something like the brightness of a single color channel of a single pixel of an image. The ultimate output of the system is a massive array of confidences for things it thinks it sees.

(Such as, one output is how certain the AI is that there is a car in the lane ahead going in the same direction.)
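To make those shapes concrete (a generic classifier sketch; the resolution, labels, and outputs here are all made up, not Tesla's actual network): the input is millions of per-pixel brightness values, and the output is one confidence per thing the net can recognize:

```python
import numpy as np

# Generic sketch of the data flow: per-pixel brightness in, confidences out.
rng = np.random.default_rng(0)
image = rng.random((960, 1280, 3), dtype=np.float32)  # H x W x RGB channels
print(image.size, "scalar inputs")                    # ~3.7 million values per frame

labels = ["car_ahead_same_direction", "pedestrian", "lane_line", "traffic_light"]
logits = rng.standard_normal(len(labels))             # stand-in for the net's raw outputs
confidences = np.exp(logits) / np.exp(logits).sum()   # softmax -> one confidence per label

for name, conf in zip(labels, confidences):
    print(f"{name}: {conf:.2f}")
```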
 

dullard

Elite Member
May 21, 2001
25,065
3,413
126
No, no, no. In no world would that be a good idea. This isn't being used to literally train an AI to drive a car; it's being used to train an AI that handles parts of the problem of driving a car, most crucially image recognition and visual reasoning. The input data is not quantities like angle of wheels, it's images, and every input probably describes something like the brightness of a single color channel of a single pixel of an image. The ultimate output of the system is a massive array of confidences for things it thinks it sees.

(Such as, one output is how certain the AI is that there is a car in the lane ahead going in the same direction.)
So, you want it to output whether or not a car is ahead going in the same direction, but also NOT know which direction either car is going? Think about that.
 

Tuna-Fish

Golden Member
Mar 4, 2011
1,349
1,534
136
So, you want it to output whether or not a car is ahead going in the same direction, but also NOT know which direction either car is going?

It's not going to output that. All it's doing is figuring out "this is a car". This is done very often; the rest then happens outside the ML, with the objects from the ML system fed into some kind of simulation.
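A sketch of that handoff, with the detection format and the 0.7 threshold entirely my own assumptions: the ML stage only emits labeled confidences with rough positions, and ordinary non-ML code downstream decides what to do with them:

```python
# Hypothetical handoff from the ML stage to ordinary code. The detection
# format and the threshold are assumptions, not anything Tesla has described.
DETECTION_THRESHOLD = 0.7

ml_outputs = [  # one made-up frame of raw detections
    {"label": "car", "confidence": 0.94, "bbox": (412, 300, 520, 388)},
    {"label": "car", "confidence": 0.31, "bbox": (880, 310, 930, 350)},
    {"label": "pedestrian", "confidence": 0.88, "bbox": (120, 280, 160, 400)},
]

objects = [d for d in ml_outputs if d["confidence"] >= DETECTION_THRESHOLD]
for obj in objects:            # from here on it's tracking/planning, not ML
    print(obj["label"], obj["bbox"])
```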
 

Cogman

Lifer
Sep 19, 2000
10,277
125
106
Tesla is also claiming they have/will have a compiler that automatically optimizes for their hardware. Having seen this claim made before, and knowing the success rate associated with it, this seems unlikely.

Eh... it looks like LLVM is likely doing all the heavy lifting. Languages like Rust do little optimizing in their own compilation step before handing things off to LLVM, yet they get near-C performance.
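As an illustration of that division of labor (using llvmlite, a Python binding to LLVM; nothing here is Tesla's actual toolchain): the frontend only has to emit naive IR, and LLVM's optimizer and backends take it from there:

```python
# Minimal frontend-emits-IR sketch using llvmlite (pip install llvmlite).
# The "frontend" builds straightforward IR for a*b + c; LLVM's passes and
# backends handle optimization and machine-code generation from there.
from llvmlite import ir

fnty = ir.FunctionType(ir.FloatType(), [ir.FloatType()] * 3)
module = ir.Module(name="demo")
func = ir.Function(module, fnty, name="mul_add")
builder = ir.IRBuilder(func.append_basic_block(name="entry"))

a, b, c = func.args
builder.ret(builder.fadd(builder.fmul(a, b), c))

print(module)  # the textual LLVM IR that gets handed off to LLVM proper
```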

It's not going to output that. All it's doing is figuring out "this is a car". This is done very often; the rest then happens outside the ML, with the objects from the ML system fed into some kind of simulation.

So, they've moved a bit beyond the "this is a car" stage and (at least from the presentation) it looks like they are also pumping out "That car is the same car I saw in the last frame" as well as "I think there should be a car there even though my vision is occluded".

Using those properties, they are able to pull out things like speed.
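A back-of-the-envelope sketch of that last step (all the numbers here are made up): once the per-frame associations say "same car, new position", speed is just displacement over frame time:

```python
# Toy example: the same car matched across two consecutive frames, with its
# position in meters already recovered. Frame rate and positions are made up.
FRAME_DT = 1 / 30                            # assume a 30 fps camera

prev = {"id": 7, "position": (42.0, 3.1)}    # (forward, lateral) in meters
curr = {"id": 7, "position": (42.9, 3.1)}    # same track id one frame later

dx = curr["position"][0] - prev["position"][0]
dy = curr["position"][1] - prev["position"][1]
speed = (dx ** 2 + dy ** 2) ** 0.5 / FRAME_DT
print(f"{speed:.0f} m/s")                    # 0.9 m in 1/30 s -> 27 m/s (~60 MPH)
```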