https://cloud.google.com/blog/big-d...k-at-googles-first-tensor-processing-unit-tpu
Pretty in-depth article, but with all the buzz around the GV100, I figured you might want to see what's already out there for these specific tasks.
The road to TPUs
Although Google considered building an Application-Specific Integrated Circuit (ASIC) for neural networks as early as 2006, the situation became urgent in 2013. That’s when we realized that the fast-growing computational demands of neural networks could require us to double the number of data centers we operate.
Usually, ASIC development takes several years. In the case of the TPU, however, we designed, verified, built and deployed the processor to our data centers in just 15 months. Norm Jouppi, the tech lead for the TPU project (also one of the principal architects of the MIPS processor), described the sprint this way:
“We did a very fast chip design. It was really quite remarkable. We started shipping the first silicon with no bug fixes or mask changes. Considering we were hiring the team as we were building the chip, then hiring RTL (circuitry design) people and rushing to hire design verification people, it was hectic.”
(from First in-depth look at Google's TPU architecture, The Next Platform)
The TPU ASIC is built on a 28nm process, runs at 700MHz and consumes 40W when running. Because we needed to deploy the TPU to Google's existing servers as fast as possible, we chose to package the processor as an external accelerator card that fits into a SATA hard disk slot for drop-in installation. The TPU is connected to its host via a PCIe Gen3 x16 bus that provides 12.5GB/s of effective bandwidth.
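For reference, here's the back-of-the-envelope arithmetic behind that bandwidth number (a quick sanity check in Python; the PCIe Gen3 parameters are standard published values I'm assuming, and the ~79% overhead interpretation is mine, not the article's):

```python
# Sanity check of the PCIe Gen3 x16 figure quoted above.
# Assumptions (standard PCIe Gen3 parameters, not from the article):
#   - 8 GT/s signaling rate per lane
#   - 128b/130b line encoding
#   - 1 GB = 1e9 bytes

GT_PER_S_PER_LANE = 8.0      # Gen3 transfer rate per lane (gigatransfers/s)
ENCODING = 128.0 / 130.0     # payload fraction after 128b/130b encoding
LANES = 16

# Raw payload bandwidth per direction, in GB/s (/8 converts bits to bytes):
raw_gb_per_s = GT_PER_S_PER_LANE * ENCODING * LANES / 8
print(f"Raw Gen3 x16 payload bandwidth: {raw_gb_per_s:.2f} GB/s")  # ~15.75

# The article's "effective" figure, presumably after TLP/protocol overhead:
effective = 12.5
print(f"Effective / raw: {effective / raw_gb_per_s:.0%}")  # ~79%
```

So the quoted 12.5GB/s is roughly 79% of the raw x16 payload rate, which is plausible once packet headers and other protocol overhead are accounted for.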