RISC-V Latest Developments Discussion [No Politics]


DisEnchantment

Golden Member
Mar 3, 2017
Some background on my experience with RISC-V...
Five years ago we were developing a CI/CD pipeline for an arm64 SoC in some cloud, and we added tests to execute the binaries there as well.
We actually used some real HW instances with an ARM server chip of that era; unfortunately the vendor quickly dumped us and exited the market, leaving us with some amount of frustration.
We shifted the work to QEMU, which for our purposes turned out to be as good as the actual chips, but full-system emulation was buggy and slow, and in the end we ended up with qemu-user-static Docker images, which work quite well for us. We were running the arm64 Ubuntu cloud images of the time before moving on to Docker multi-arch QEMU images.

Lately we have been approached by many vendors with upcoming RISC-V chips, and out of curiosity I revisited the topic above.
To my pleasant surprise, running RISC-V QEMU is smooth as butter. Emulation is fast, and images from Debian, Ubuntu and Fedora are available out of the box.
I was running Ubuntu cloud images problem-free. Granted, it was headless, but I guess with the likes of Imagination Tech offering up their IP for integration, it is only a matter of time.

What is even more interesting is that Yocto/OpenEmbedded already has a meta layer for RISC-V, and apparently T-Head has already got the kernel packages and manifest for Android 10 working on RISC-V.
Very, very impressive for a CPU architecture in such a short span of time. What's more, I see active LLVM, GCC and kernel development happening.

From the latest conferences I saw this slide, and I can't help but think that they are eating somebody's lunch, starting from MCUs and moving up to application processors.
[attached slide: 1652093521458.png]

And based on many developments around the world, this trend seems to be accelerating greatly.
Many high-profile national and multinational RISC-V projects (e.g. the EU's EPI) are popping up left and right.
Intel is now a Premier member of the consortium, alongside the likes of Google, Alibaba, Huawei, etc.
NVDA, and soon AMD, seem to be doing RISC-V in their GPUs. Xilinx, Infineon, Siemens, Microchip, ST, AD, Renesas, etc. already have products in the pipeline or already launched.
It is only a matter of time before these companies start replacing their proprietary architectures with something from RISC-V. Tool support (compiler, debugger, OS, etc.) is taken care of by the community.
Also interesting is that there are lots of performant RISC-V implementations on GitHub: the XuanTie C910 from T-Head/Alibaba, SweRV from WD, and many more.
The embedded industry has already replaced a ton of traditional MCUs with RISC-V ones. AI-tailored CPUs from Jim Keller's Tenstorrent also seem to be in the spotlight.

Most importantly, a bunch of specs got ratified at the end of last year, mainly accelerated by developments around the world. Interesting times.
 

DrMrLordX

Lifer
Apr 27, 2000
Of course not. Grace is their magnum opus for ARM, but I wouldn't be surprised if it costs a lot and they want cheaper options in the meantime.
The other thing to keep in mind is that there may be a large market where current ARM and x86 host systems may not be an option but RISC-V may be. And NV has recently received some authorization to sell additional H100 cards there. Something to consider.
 

DZero

Golden Member
Jun 20, 2024
Seeing how RISC-V is advancing... I am now thinking that in 5 years it will end up going into more devices like tablets or mini PCs. Heck, even Huawei might consider using it on their low-tier phones in order to let the Kirin 710 rest in peace at last.
 

camel-cdr

Member
Feb 23, 2024
Tenstorrent decided to publish the first benchmark data for Ascalon's RVV implementation using the instruction-throughput benchmark from my rvv-bench suite. <3


Overall, the results look really good so far:
* Most instructions have an inverse throughput of 0.5/1/2/4 for LMUL=1/2/4/8, even vslide1up/down, 64-bit vmulh, viota, vpopc and integer reductions
* 0.5/0.5/1/2 for vector-scalar/immediate compares (0.5/2/4/8 for vector-vector), and 0.5/1/2/- for narrowing instructions (see "Microarchitecture speculations" section)
* dual-issue vrgather, with good scaling: 0.5/1/8/30
* dual-issue vcompress, with OK scaling: 0.5/3/6/17 (I still think this could get close to linear)
* Fault-only-first loads seem to have no overhead
* Segmented loads/stores look quite fast, even the more exotic ones like seg7 (see the sketch after this list)
* Ovlt behavior isn't supported, but I don't really care much about it
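
For anyone who hasn't played with them, here's roughly what I mean by segmented loads. This is my own minimal sketch in C with the RVV intrinsics (assuming the current tuple-based intrinsics API; the function name and the 3-field example are made up, it's not code from the benchmark): one vlseg3e32 per iteration de-interleaves packed (x,y,z) triples into three separate register groups.

```c
#include <riscv_vector.h>
#include <stddef.h>
#include <stdint.h>

// Split an interleaved x0,y0,z0,x1,y1,z1,... stream into three planar arrays.
// Hypothetical illustration; build with something like -march=rv64gcv.
void deinterleave3(const int32_t *xyz, int32_t *x, int32_t *y, int32_t *z, size_t n) {
    while (n > 0) {
        size_t vl = __riscv_vsetvl_e32m2(n);                   // triples handled this pass
        vint32m2x3_t t = __riscv_vlseg3e32_v_i32m2x3(xyz, vl); // one 3-field segmented load
        __riscv_vse32_v_i32m2(x, __riscv_vget_v_i32m2x3_i32m2(t, 0), vl); // field 0 -> x
        __riscv_vse32_v_i32m2(y, __riscv_vget_v_i32m2x3_i32m2(t, 1), vl); // field 1 -> y
        __riscv_vse32_v_i32m2(z, __riscv_vget_v_i32m2x3_i32m2(t, 2), vl); // field 2 -> z
        xyz += 3 * vl; x += vl; y += vl; z += vl;
        n -= vl;
    }
}
```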


The only bigger negative thing I've seen so far is that the vslideup/vslidedown instructions don't scale linearly, or close to linearly, with LMUL, even for a small immediate shift amount like "3". The vslide1up/vslide1down instructions do scale perfectly, though, with 0.5/1/2/4. It's not in the benchmark, but I hope vslideup/vslidedown with immediate "1" also do.
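
To make the slide distinction concrete, here's a rough sketch (again mine, with the C intrinsics; the function names are made up) of a one-element shift written both ways. Going by the numbers above, the vslide1up form is the one that scales nicely with LMUL:

```c
#include <riscv_vector.h>
#include <stddef.h>
#include <stdint.h>

// General slide by an offset of 1: result[0] comes from dest, result[i] = src[i-1].
vint32m4_t shift_up_general(vint32m4_t dest, vint32m4_t src, size_t vl) {
    return __riscv_vslideup_vx_i32m4(dest, src, 1, vl);
}

// Specialised slide: result[0] is the scalar argument, result[i] = src[i-1].
vint32m4_t shift_up_by_one(vint32m4_t src, int32_t first, size_t vl) {
    return __riscv_vslide1up_vx_i32m4(src, first, vl);
}
```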

We'll have to wait for the other microbenchmarks to get a more complete picture.

My takeaway so far is to not be scared of using the segmented loads/stores, and LMUL>1 permutes are good, but you probably want to avoid the LMUL=8 ones when possible. I'll continue manually unrolling non-lane-crossing permutes. For LMUL>1 comparisons, it's better to use .vx/.vi over .vv when possible.
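
As a concrete illustration of the compare point, a minimal sketch (mine, not part of rvv-bench; the function name is made up) that counts elements below a threshold using the vector-scalar form vmslt.vx, which the throughput numbers above favour over the .vv form at LMUL>1:

```c
#include <riscv_vector.h>
#include <stddef.h>
#include <stdint.h>

// Count how many elements of x are < threshold, at LMUL=4.
size_t count_below(const int32_t *x, size_t n, int32_t threshold) {
    size_t count = 0;
    while (n > 0) {
        size_t vl = __riscv_vsetvl_e32m4(n);                        // elements this pass
        vint32m4_t v = __riscv_vle32_v_i32m4(x, vl);                // unit-stride load
        vbool8_t lt = __riscv_vmslt_vx_i32m4_b8(v, threshold, vl);  // vmslt.vx compare
        count += __riscv_vcpop_m_b8(lt, vl);                        // popcount of mask
        x += vl;
        n -= vl;
    }
    return count;
}
```

The same loop with the threshold broadcast into a vector and compared via vmslt.vv would, per the numbers above, be noticeably slower at high LMUL.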

For the scalar instructions:
* 6-issue: add/sub/lui/xor/sll/shNadd/zext/clz/cpop/min/rotl/rev8/bext/...
* 3-issue: load/store
* 2-issue: fadd/fmul/fmacc/fmin/fcvt
* 1-issue: mul/mulh/feq/flt
* pipelined: fsqrt/fdiv: ~8.5, div/rem: 12-16
 

soresu

Diamond Member
Dec 19, 2014
Seeing how RISC-V is advancing... I am now thinking that in 5 years it will end up going into more devices like tablets or mini PCs. Heck, even Huawei might consider using it on their low-tier phones in order to let the Kirin 710 rest in peace at last.
Huawei's issue right now, and for the foreseeable future until their LDP system is proven and scaled, is more about achieving the best process node possible than about what to fab on it.
 

DZero

Golden Member
Jun 20, 2024
Huawei's issue right now, and for the foreseeable future until their LDP system is proven and scaled, is more about achieving the best process node possible than about what to fab on it.
Well... they reached 5nm, and that's their limit for the foreseeable future.
 

soresu

Diamond Member
Dec 19, 2014
Well... they reached 5nm, and that's their limit for the foreseeable future.
LDP isn't for 5nm, they already managed that by supercharging ASML DUV litho machines with moar patterning.

LDP is for EUV - though they may use it for 5nm, they are likely hoping for 3nm and below, and likely the GAA device transition, as that is the way everyone is going now.
 

DrMrLordX

Lifer
Apr 27, 2000
It's already off the ground and into high-volume products as microcontrollers, a market which it has basically taken by storm.
Yeah but are any of those microcontrollers running Linux? Do any of them need RISC-V Linux to support big endian?
 