Discussion RISC V Latest Developments Discussion [No Politics]


DisEnchantment

Golden Member
Mar 3, 2017
1,622
5,887
136
Some background on my experience with RISC V...
Five years ago, we were developing a CI/CD pipeline for an arm64 SoC in some cloud, and we added tests to execute the binaries in there as well.
We actually used some real HW instances based on an ARM server chip of that era. Unfortunately the vendor quickly dumped us and exited the market, leaving us with a fair amount of frustration.
We shifted work to QEMU, which turned out to be as good as the actual chips themselves functionally, but the emulation was buggy and slow, and in the end we ended up with qemu-user-static Docker images, which worked quite well for us. We were running arm64 Ubuntu cloud images of the time before moving on to Docker multi-arch QEMU images.

Lately, we have been approached by many vendors with upcoming RISC-V chips, and out of curiosity I revisited the topic above.
To my pleasant surprise, running RISC-V under QEMU is smooth as butter. Emulation is fast, and images from Debian, Ubuntu and Fedora are available out of the box.
I was running Ubuntu cloud images problem-free. Granted, it was headless, but I guess with the likes of Imagination Tech offering up their IP for integration, it is only a matter of time.
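For anyone wanting to script a similar headless boot in a CI job, here is a minimal sketch. The image and firmware file names are hypothetical placeholders, and the devices you actually need depend on the image you boot. The helper only builds the command line; it does not launch QEMU.

```python
import shlex

def qemu_riscv64_cmd(disk_image, kernel, mem_mb=2048, cpus=4):
    """Build an argv list for a headless RISC-V 'virt' machine.

    disk_image and kernel are paths supplied by the caller (hypothetical
    names in the example below); nothing here checks that they exist.
    """
    return [
        "qemu-system-riscv64",
        "-machine", "virt",          # generic virtual RISC-V board
        "-nographic",                # serial console only, no display
        "-m", str(mem_mb),
        "-smp", str(cpus),
        "-kernel", kernel,           # e.g. a U-Boot or kernel binary
        "-drive", f"file={disk_image},format=raw,if=virtio",
        "-netdev", "user,id=net0",
        "-device", "virtio-net-device,netdev=net0",
    ]

if __name__ == "__main__":
    # File names are placeholders, not real download artifacts.
    cmd = qemu_riscv64_cmd("ubuntu-riscv64.img", "u-boot.bin")
    print(shlex.join(cmd))
```

Returning a list rather than a string makes it easy to hand the result to `subprocess.run` in a pipeline step without shell quoting issues.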

What is even more interesting is that Yocto/OpenEmbedded already has a meta layer for RISC-V, and apparently T-Head already has the kernel packages and manifest for Android 10 working on RISC-V.
Very, very impressive for a CPU architecture in such a short span of time. What's more, I see active LLVM, GCC and kernel development happening.

I saw this slide at one of the latest conferences, and I can't help but think that it looks like they are eating somebody's lunch, starting from MCUs and moving up to application processors.
1652093521458.png

And based on many developments around the world, this trend seems to be accelerating greatly.
Many high-profile national and multinational (e.g. the EU's EPI) projects with RISC-V are popping up left and right.
Intel is now a premier member of the consortium, along with the likes of Google, Alibaba, Huawei, etc.
NVDA and soon AMD seem to be using RISC-V in their GPUs. Xilinx, Infineon, Siemens, Microchip, ST, AD, Renesas, etc. already have products in the pipeline or already launched.
It is only a matter of time before all these companies start replacing their proprietary architectures with something from RISC-V. Tool support, compilers, debuggers, OSes, etc. are taken care of by the community.
Also interesting is that there are lots of performant RISC-V implementations on GitHub: XuanTie C910 from T-Head/Alibaba, SweRV from WD, and many more.
The embedded industry has already replaced a ton of traditional MCUs with RISC-V ones. AI-tailored CPUs from Jim Keller's Tenstorrent also seem to be in the spotlight.

Most importantly, a bunch of specs got ratified at the end of last year, mainly accelerated by developments around the world. Interesting times.
 

DisEnchantment

Golden Member
Mar 3, 2017
1,622
5,887
136
Google removing support for RISC-V in Android Common Kernel?

Key statement is here

Android will continue to support RISC-V. Due to the rapid rate of iteration, we are not ready to provide a single supported image for all vendors. This particular series of patches removes RISC-V support from the Android Generic Kernel Image (GKI).

Looks like the rate at which RISC-V is evolving is too fast to maintain a stable GKI, this should be pretty obvious from the very beginning.
Folks are creating new extensions and standards are getting ratified like no tomorrow, they seem to be in a rush to create something comprehensive.

But it is no big deal as long as Google continues to maintain the build tooling (i.e., Soong, Kati, Bazel) to natively support RISC-V; the GKI is the least problematic part. There is no GKI for x86 or armv7 either, so they got ahead of themselves.
There are bigger things to do here, like ART and Zygote optimizations to name a few.
 

DrMrLordX

Lifer
Apr 27, 2000
21,686
10,944
136
"Folks are creating new extensions and standards are getting ratified like no tomorrow, they seem to be in a rush to create something comprehensive."
That's one of the problems for RISC-V: the lack of any standards regarding extensions. Sure, it's great to be able to make your own private extensions for an individual implementation, such as a storage microcontroller that will only ever see use in a very specific application. But what about CPUs intended for "general" computing? Critics have been pointing out this problem ever since RISC-V became a subject of public discussion.
 

NostaSeronx

Diamond Member
Sep 18, 2011
3,687
1,222
136
"Looks like the rate at which RISC-V is evolving is too fast to maintain a stable GKI, this should be pretty obvious from the very beginning.
Folks are creating new extensions and standards are getting ratified like no tomorrow, they seem to be in a rush to create something comprehensive."
RISC-V has actually been slowing down since RVA22. Fewer and fewer new extensions and standards are being proposed and ratified. The rush of RISC-V extensions/standards is nowhere near where it was in 2019~2022.

The removal of the GKI riscv64 code is because Qualcomm's implementation is incompatible with mainline riscv64.

RVA24 is better than most x86-64 and AArch64 ISAs on the market, and it only makes these additions mandatory:
"The following are new development options intended to become mandatory in RVA24U64:
• Zabha Byte and Halfword Atomic Memory Operations
• Zacas Compare-and-swap
• Ziccamoc Main memory regions with both the cacheability and coherence PMAs must provide AMOCASQ level PMA support.
• Zvbc Vector carryless multiply.
• Zama16b Misaligned loads, stores, and AMOs to main memory regions that do not cross a naturally aligned 16-byte boundary are atomic."
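To see what that list means for a given core, here is a minimal sketch that checks an ISA string for the five extensions quoted above. It assumes the Linux /proc/cpuinfo style of ISA string (single-letter extensions right after rv32/rv64, multi-letter ones separated by underscores), and the example string is made up. It is not a full profile conformance check.

```python
# Extension names are taken from the quoted list; the rest is illustrative.
RVA24_NEW = {"zabha", "zacas", "ziccamoc", "zvbc", "zama16b"}

def parse_isa(isa):
    """Split an ISA string such as 'rv64imafdcv_zicond_zvbc' into a set of names.

    Assumes single-letter extensions directly follow 'rv32'/'rv64' and
    multi-letter extensions are underscore-separated.
    """
    isa = isa.strip().lower()
    if not isa.startswith(("rv32", "rv64")):
        raise ValueError(f"not a RISC-V ISA string: {isa!r}")
    base, *multi = isa[4:].split("_")
    return set(base) | {m for m in multi if m}

def missing_rva24(isa):
    """Return the quoted RVA24U64 additions that the ISA string does not list."""
    return sorted(RVA24_NEW - parse_isa(isa))

if __name__ == "__main__":
    # Hypothetical ISA string, not taken from any real chip.
    print(missing_rva24("rv64imafdcv_zicond_zacas_zvbc"))
    # -> ['zabha', 'zama16b', 'ziccamoc']
```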

The only other specs of value are OS-A Common and OS-A Server, which were already built on RVA22 but will be finalized with RVA24.

For example:
c930.jpeg
 

camel-cdr

Junior Member
Feb 23, 2024
11
39
51
"The removal of the GKI riscv64 code is because Qualcomm's implementation is incompatible with mainline riscv64"
I don't think this is true, considering Qualcomm's involvement in things like the scalar efficiency SIG; here they even propose some 48-bit instructions: https://docs.google.com/spreadsheet...p9vVvVjS6Jz9vGWhwmsdbEOF3JBwUg/htmlview#gid=0

What is true is:

* nothing has changed with our work on Android/riscv64 support in AOSP

* we've stopped producing ACK/GKI builds for now

* until there is an official GKI kernel, we're working on
transitioning to a kernel that we -- the folks working on
Android/riscv64 -- maintain...

* ...but unfortunately the GKI changes went out before our changes are ready

note that the "non-GKI" kernel will still be to all intents and
purposes an ACK/GKI kernel (with the aim that Android/riscv64 devices
will use GKI kernels), but since maintenance of an officially
_labelled_ GKI kernel is more expensive, we're removing the sticker
for now.

In other news, the SpacemiT K1 is looking good so far: https://github.com/pigirons/cpufp?tab=readme-ov-file#spacemit-k18-x-spacemit-x60

The 2x vector compute over A55 seems to be true:


Code:
$ ./cpufp --thread_pool=[0] # Spacemit X60
Number Threads: 1
Thread Pool Binding: 0
---------------------------------------------------------------
| Instruction Set | Core Computation       | Peak Performance |
| ime             | vmadot(s32,s8,s8)      | 511.53 GOPS      |
| ime             | vmadotu(u32,u8,u8)     | 511.5 GOPS       |
| ime             | vmadotus(s32,u8,s8)    | 511.53 GOPS      |
| ime             | vmadotsu(s32,s8,u8)    | 511.51 GOPS      |
| ime             | vmadotslide(s32,s8,s8) | 511.51 GOPS      |
| vector          | vfmacc.vf(f16,f16,f16) | 66.722 GFLOPS    |
| vector          | vfmacc.vv(f16,f16,f16) | 63.936 GFLOPS    |
| vector          | vfmacc.vf(f32,f32,f32) | 33.36 GFLOPS     |
| vector          | vfmacc.vv(f32,f32,f32) | 31.968 GFLOPS    |
| vector          | vfmacc.vf(f64,f64,f64) | 16.679 GFLOPS    |
| vector          | vfmacc.vv(f64,f64,f64) | 15.985 GFLOPS    |
---------------------------------------------------------------
$ ./cpufp --thread_pool=[0] # Cortex-A55
Number Threads: 1
Thread Pool Binding: 0
----------------------------------------------------------------
| Instruction Set | Core Computation        | Peak Performance |
| asimd_dp        | dp4a.vs(s32,s8,s8)      | 58.305 GOPS      |
| asimd_dp        | dp4a.vv(s32,s8,s8)      | 58.311 GOPS      |
| asimd_dp        | dp4a.vs(u32,u8,u8)      | 58.313 GOPS      |
| asimd_dp        | dp4a.vv(u32,u8,u8)      | 58.311 GOPS      |
| asimd_hp        | fmla.vs(fp16,fp16,fp16) | 29.156 GFLOPS    |
| asimd_hp        | fmla.vv(fp16,fp16,fp16) | 29.156 GFLOPS    |
| asimd           | fmla.vs(f32,f32,f32)    | 14.579 GFLOPS    |
| asimd           | fmla.vv(f32,f32,f32)    | 14.577 GFLOPS    |
| asimd           | fmla.vs(f64,f64,f64)    | 7.2891 GFLOPS    |
| asimd           | fmla.vv(f64,f64,f64)    | 7.2834 GFLOPS    |

It also gets close to 10x the INT8 performance (about 8.8x going by the numbers above) using their custom matrix extension.
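The ratios can be read straight off the two tables. A quick sketch, using the first row of each data type from the cpufp output above (vmadot vs dp4a.vs for INT8, vfmacc.vf vs fmla.vs for floating point):

```python
# Single-thread peak numbers copied from the cpufp output above.
# Units are GOPS for int8 and GFLOPS for the float types.
x60 = {"int8": 511.53, "f16": 66.722, "f32": 33.36, "f64": 16.679}
a55 = {"int8": 58.305, "f16": 29.156, "f32": 14.579, "f64": 7.2891}

def speedups(new, old):
    """Per-type throughput ratio new/old."""
    return {kind: new[kind] / old[kind] for kind in new}

if __name__ == "__main__":
    for kind, ratio in speedups(x60, a55).items():
        print(f"{kind}: {ratio:.2f}x")
```

This prints 8.77x for int8 and 2.29x for each of f16, f32 and f64. These are peak-throughput ratios at whatever clocks the two chips were running, not per-clock figures.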
 

camel-cdr

Junior Member
Feb 23, 2024
11
39
51
"Thanks a lot for sharing :)

What are the respective frequencies?

Also I guess the K1 is simulated while the A55 is a real platform. And I guess your loops don't depend on memory?"
No, the K1 is real. You can order it on AliExpress now, though sadly not in Germany yet :-( See the Banana Pi BPI-F3.
I'm not sure about the frequency, but the Geekbench results list it at 1.6 GHz.
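Since the clock is uncertain, one way to sanity-check it is to convert a peak figure into operations per cycle. A small sketch, assuming the 1.6 GHz Geekbench figure also applied to the cpufp run, which is not confirmed:

```python
def ops_per_cycle(giga_ops_per_s, clock_ghz):
    """Convert a peak rate in GOPS/GFLOPS to operations per clock cycle."""
    if clock_ghz <= 0:
        raise ValueError("clock must be positive")
    return giga_ops_per_s / clock_ghz

if __name__ == "__main__":
    ASSUMED_CLOCK_GHZ = 1.6  # from the Geekbench listing; an assumption here
    # 33.36 GFLOPS is the f32 vfmacc.vf figure from the X60 table.
    print(round(ops_per_cycle(33.36, ASSUMED_CLOCK_GHZ), 2))
```

If the per-cycle result does not line up with what the vector width would allow, the assumed clock is the first thing to question.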
 

camel-cdr

Junior Member
Feb 23, 2024
11
39
51
On that topic, XiangShan's RVV backend still has some problems with my benchmarks, but there have been a few other things I've noticed:
  • There is a new branch that separates the float and vector pipelines
  • New SPECint 2006 numbers have been published, looks quite good so far:
    1.png
  • A new PR implements Zicond, this only took ~40 lines of code, and I found it quite interesting to look at.
  • Similarly, here is how the Zvbb (vector bitmanip) extension was implemented a while back, it took ~450 lines of code: functional unit changes, main repo changes
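For context on why Zicond can be such a small change: the extension adds only two instructions, czero.eqz and czero.nez. Below is a minimal Python model of their semantics and of the branchless select they enable. This is an illustration of the instruction behaviour only, not XiangShan's implementation, and the `select` helper is a name made up for the example.

```python
MASK64 = (1 << 64) - 1  # model 64-bit (XLEN=64) registers

def czero_eqz(rs1, rs2):
    """czero.eqz rd, rs1, rs2: rd = 0 if rs2 == 0, otherwise rs1."""
    return 0 if (rs2 & MASK64) == 0 else rs1 & MASK64

def czero_nez(rs1, rs2):
    """czero.nez rd, rs1, rs2: rd = 0 if rs2 != 0, otherwise rs1."""
    return 0 if (rs2 & MASK64) != 0 else rs1 & MASK64

def select(cond, a, b):
    """Branchless 'cond ? a : b': two czero ops plus an OR."""
    return czero_eqz(a, cond) | czero_nez(b, cond)

if __name__ == "__main__":
    print(select(1, 5, 9))  # -> 5
    print(select(0, 5, 9))  # -> 9
```

Exactly one of the two czero results is zeroed for any condition value, so OR-ing them yields the selected operand without a branch.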
Other open-source RVV implementations also had some updates:
  • The IntelLabs darecreek implementation published a small design description. "So far, the arithmetic functional units are sufficiently tested. Other functions such as load/store and control flow only passed basic test", so it's still very much in progress. Hopefully it will be ready enough to attach to rocket chip soon.
  • t1 has now had a public beta release. You can play with it using "docker run --name t1 -it -v $PWD:/workspace --rm ghcr.io/chipsalliance/t1-blastoise:latest /bin/bash" and run a program with "ip-emulator --no-logging -C yourProgram". It currently uses spike to execute the scalar instructions. Last time I tried it, I couldn't get my benchmarks to run, so I'll have to look into it again.
 
