New SiFive RISC-V cores

soresu

Platinum Member
Dec 19, 2014
2,617
1,812
136
SiFive just introduced new RISC-V CPU cores, named the U84 and the U87.

The U84 has 3.1x the performance at 7nm of the previous 16nm U74 core, a combination of 2.3x the IPC and 1.4x the clock frequency (I realise the math seems off, but it's their press release).

The U84 core area is about 0.28 mm² at 7nm, with a quad core (with 2MB L2$) taking up a whopping 2.63 mm².

They claim competitive performance with the Cortex A72, though exact numbers are not forthcoming; only a footnote implies they were comparing against a Cortex A72 implemented at 16nm.

The U87 is similar to the above core (I presume, for lack of further information and the similar core number), but with vector processing added; further details on the U87 will supposedly follow in a later press release.

(Sorry for the anaemic round of posts guys, I get a little over enthusiastic with a low attention span every so often.....)

Link here.
 
Last edited:

Thunder 57

Platinum Member
Aug 19, 2007
2,647
3,706
136
Yea, a little bit of commentary or a TLDR version would be nice. Anything to pique my interest a bit to give the thing a full read.
 

soresu

Platinum Member
Dec 19, 2014
2,617
1,812
136
For those of us less technical, can you tell us what this means? For me it's like reading Chinese or Greek.
Yea, a little bit of commentary or a TLDR version would be nice. Anything to pique my interest a bit to give the thing a full read.
Sorry guys, I amended the OP with more detail - not enough sleep and too much of it spent in a chair don't lead to Pulitzer (or Hugo) prize winning posts I'm afraid.

I agree that the SiFive press release is not ideal for a fast and easy read; something tells me their PR division is not up to the same standard as ARM's, AMD's or Intel's.
 
  • Like
Reactions: Thunder 57

soresu

Platinum Member
Dec 19, 2014
2,617
1,812
136
U87 has vector processing? Do tell.
Unfortunately there's nothing to tell at the moment - they only announced the fact, along with notice of a future press release with the details.

Their use of the words 'vector processing' rather than SIMD makes me think it uses a method closer to ARM SVE than the packed SIMD in x86-64. As far as I am aware, the R5 people have been working on both paths, but the packed-SIMD path seems to be further out.
 

soresu

Platinum Member
Dec 19, 2014
2,617
1,812
136
Let's go back to this post and discuss its validity sometime in the 2030s then.
While I get the pessimism, the resume linked is a fairly impressive one, if not so much recently as in the past.

I think he implied forward development at SiFive could be much better.

Though unless R5 makes it magically easier to design a uArch, I wouldn't expect any significant announced improvements from this particular hire as soon as 2020.
 
Mar 11, 2004
23,031
5,495
146
The math is probably off because it's more like 2.25x IPC or something and they rounded up, but they didn't want to double-fudge (mmm *homer drool*) the numbers beyond what they would be, so they basically kept the final overarching performance number right.
 
  • Like
  • Haha
Reactions: Kirito and soresu

NostaSeronx

Diamond Member
Sep 18, 2011
3,683
1,218
136
Their use of the words 'vector processing' rather than SIMD makes me think it uses a method closer to ARM SVE than the SIMD in x86/64 - as far as I am aware, the R5 people have been working on both paths, but the packed SIMD path seems to be further out.
RISC-V vectors are closer to Cray-1's implementation: https://www.chessprogramming.org/Cray-1

Vector Length and Vector Standard Element Width:
VL is always a power of 2, while vsew can be i8, i16, fp8, fp16 and up.
VL=2 w/ vsew=i8 => 2x 8-bit integers.

It can be run on SIMD, superscalar, or scalar units.

There is also the grouping multiplier (LMUL) for widening into a wider result or narrowing into many thinner results: 64-bit (LMUL=8) into 8-bit (LMUL=1) results, or 8-bit (LMUL=1) into 64-bit (LMUL=8) results.
element A(i8)&B(i8) ((i16)) + element C(i8)&D(i8) ((i16)) = element E(i8)&F(i8) ((i16)) <= if LMUL is 2
element A&B&C&D + element E&F&G&H = element I&J&K&L <= if LMUL is 4

This is closer to the ideal of "code once, run on all RISC-V architectures with the vector extension enabled".
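The widening behaviour described above can be sketched in plain C. This is only a scalar model of what an RVV-style widening add (SEW=8 sources producing SEW=16 results) does per element - not real RVV intrinsics, and the function name is made up for illustration:

```c
#include <stdint.h>
#include <stddef.h>

/* Scalar model of an RVV-style widening add: two SEW=8 source
 * vectors produce an SEW=16 result, so the result group occupies
 * twice the register capacity (2x LMUL), as described above.
 * Hypothetical helper, not a real intrinsic. */
void widening_add_i8(const int8_t *a, const int8_t *b,
                     int16_t *out, size_t vl)
{
    for (size_t i = 0; i < vl; i++)
        out[i] = (int16_t)a[i] + (int16_t)b[i]; /* widen first, so no i8 overflow */
}
```

With VL=2 and i8 elements this is exactly the "2x 8-bit integers" case from the post, except the sums land in 16-bit elements.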
 

NostaSeronx

Diamond Member
Sep 18, 2011
3,683
1,218
136
Then what is this about then:
The P-extension isn't vector processing. It is also more immature than the V-extension: V is at draft 0.7(.1), while the P-extension is at draft 0.2.

V is preferred over P in all cases. V is the only one that is pseudo-final, and thus can be HVM'd.
"This is a draft of a stable proposal for the vector specification to be used for implementation and evaluation. Once the draft label is removed, version 0.7 is intended to be stable enough to begin developing toolchains, functional simulators, and initial implementations, though will continue to evolve with minor changes and updates."
 

NostaSeronx

Diamond Member
Sep 18, 2011
3,683
1,218
136
Errr... isn't SIMD vector processing?
Nope, executing SIMD instructions is intrinsically scalar.
I also mentioned the difference in readiness of the 2 paths in an earlier post.
No general-purpose core is going to use P over V. SIMD instructions aren't vector instructions, and are thus vector-destructive.

The U87 isn't a DSP core, it's a superscalar + vector core. Hence, it will be using the V-extension, which is near-final; the P-extension, by contrast, was killed off at the 2016 meet, then revived only for extremely small cores that won't use V and operate only on fixed-point.

The RISC-V V-extension is not vector-destructive. There are no two paths; there is only one path, and that is Vector above all else.
 
  • Like
Reactions: teejee

NostaSeronx

Diamond Member
Sep 18, 2011
3,683
1,218
136
I may be completely off base here, but doesn't the MD part of SIMD stand for Multiple Data?

I thought that the 'multiple data' in question was classed as a vector?
How SIMD instructions work is that they always operate as scalar operations. This is different from a vector instruction and from how vector units operate.

SIMD instructions and SIMD units have to be heavily tweaked to get to non-destructive vectors. <== This is extremely harmful to the ISA byte-code/op-code/etc. and to scaling up: 256-bit, 512-bit, 1024-bit, etc.
Vector instructions on SIMD units, by contrast, do not have to be tweaked to be non-destructive vectors. <== This is not harmful.

The U87 isn't SIMD(P) processing, it is vector(V) processing.

It isn't SVE, it isn't Neon, it isn't AVX512 (EVEX), AVX (VEX), or SSE/MMX. It is, however, Cray-1-like.

---
The U87 is probably going to deploy something like Hwacha, since the creator is the CTO.

Hwacha4 is 7.84 mm² on 28nm; scaled to 7nm that would be a bit above ~1.284 mm² (7.84 mm² × 13440/82080).
0.28 mm² + >1.284 mm² => >1.564 mm² per core, so four cores with vector processing would thus come to more than 6.256 mm². With HBM2E and a chiplet strategy, that could probably yield more cores than EPYC, with more efficiency from V than from AVX/AVX2 - if someone had the money for it.
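The napkin math above, written out as code (all figures are the post's own - the 13440/82080 density ratio, Hwacha4's 7.84 mm², and the 0.28 mm² U84 core area):

```c
/* Scale Hwacha4's 28nm area to 7nm by the density ratio used in the
 * post, then add the quoted U84 core area. Pure arithmetic check of
 * the post's numbers, nothing more. */
static double hwacha4_7nm_mm2(void)      { return 7.84 * 13440.0 / 82080.0; }     /* ~1.284 */
static double core_plus_vector_mm2(void) { return 0.28 + hwacha4_7nm_mm2(); }     /* ~1.564 */
static double quad_core_mm2(void)        { return 4.0 * core_plus_vector_mm2(); } /* ~6.26  */
```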
 

Nothingness

Platinum Member
Jul 3, 2013
2,371
713
136
How SIMD instructions work is that they are always operating as a scalar operation. This is different from a vector instruction and how vector units operate.

SIMD instructions and simd units have to be heavily-tweaked to get to non-destructive vectors. <== This is extremely harmful to ISA byte-code/op-code/etc and scalability up: 256-bit, 512-bit, 1024-bit, etc.
While, Vector instructions with SIMD units do not have to be tweaked to be non-destructive vectors. <== This is not harmful.
Sorry but you make no sense at all or I completely misunderstood you. SIMD can be non-destructive; being non-destructive is a general property of an ISA and its encoding. ARM NEON SIMD extension is non-destructive and its units can be used to implement SVE which is size agnostic.

I took a quick look at the Vector Extension spec and I saw nothing fundamentally different from ARM SVE.

BTW RISC-V is starting to show one of its great weaknesses: a mess of extensions.
 

NostaSeronx

Diamond Member
Sep 18, 2011
3,683
1,218
136
Sorry but you make no sense at all or I completely misunderstood you. SIMD can be non-destructive; being non-destructive is a general property of an ISA and its encoding. ARM NEON SIMD extension is non-destructive and its units can be used to implement SVE which is size agnostic.

I took a quick look at the Vector Extension spec and I saw nothing fundamentally different from ARM SVE.
SIMD ISAs are always vector-destructive compared to true Vector ISAs. The destruction isn't in the data, but in the ISA/instruction/codebase itself.

SVE is a SIMD instruction set. The operation is always up to 16B*16 from 16B*1. <== Mostly a multimedia ISA.
RVV is a Vector instruction set. The operation can be 1B*x to 128B*y. <== Mostly a general purpose ISA.


The destruction is this:
MMX/SSE, SSE2, AVX256, AVX512, AVX1024, etc.
NEON to SVE isn't as destructive; however, SVE needs to replace NEON with SVE2. And if SVE users need a total width of 4096-bit, or SIMD widths smaller than 128-bit? Whelp, new instruction set.
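A sketch of what "destructive" means in practice for fixed-width SIMD: the lane count is baked into the source at compile time, so each width bump (128-bit SSE-class to 256-bit AVX-class, etc.) means new code. The macro and function here are illustrative stand-ins, not any real ISA's intrinsics:

```c
#include <stddef.h>

/* WIDTH_BYTES stands in for the SIMD generation: 16 = 128-bit
 * (SSE/NEON-class), 32 = 256-bit (AVX-class). Changing it changes
 * the shape of the code - the "rewrite per width" problem. */
#define WIDTH_BYTES 16
#define LANES (WIDTH_BYTES / (int)sizeof(int))

void fixed_width_add(const int *a, const int *b, int *c, size_t n)
{
    size_t i = 0;
    for (; i + (size_t)LANES <= n; i += (size_t)LANES) /* "SIMD" body, LANES at a time */
        for (int j = 0; j < LANES; j++)
            c[i + j] = a[i + j] + b[i + j];
    for (; i < n; i++)                                 /* scalar tail */
        c[i] = a[i] + b[i];
}
```

A vector-length-agnostic ISA avoids this by letting the hardware report its length at run time, so the same binary covers every width.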
 

Nothingness

Platinum Member
Jul 3, 2013
2,371
713
136
SIMD ISAs are always vector-destructive compared to true Vector ISAs. It destruction isn't in data, but in the ISA/instruction/codebase itself.

SVE is a SIMD instruction set. The operation is always up to 16B*16 from 16B*1. <== Mostly a multimedia ISA.
RVV is a Vector instruction set. The operation can be 1B*x to 32B*y. <== Mostly a general purpose ISA.

The destruction is this;
MMX/SSE, SSE2, AVX256, AVX512, AVX1024, etc.
NEON to SVE isn't as destructive, however SVE needs to replace NEON w/ SVE2.
So by "destructive" you mean some new instructions replace previous ones. SVE2 doesn't replace NEON, it comes on top of it; the encoding space is not shared. And how is the RISC-V vector extension any different here? How is the vector extension non-"destructive" against the RV Packed-SIMD extension?

If the SVE users need a total width of 4096-bit or needs smaller simd widths than 128-bit. Whelp, new instruction set.
That's ridiculous. If a user of RISC-V needs more than 4096-bit then a new instruction set will be needed, that's it? SVE can go up to 2048-bit. And if you need less than 128-bit, just use standard SIMD instructions.
 

NostaSeronx

Diamond Member
Sep 18, 2011
3,683
1,218
136
So by destructive you mean some new instructions replace previous ones. SVE2 doesn't replace NEON, it comes on top of it, the encoding space is not shared. And how is RISC-V vector extension any different here? How is the vector extension non "destructive" against RV Packed-SIMD extension?
SVE2 replaces NEON, it's a separate encoding. SVE2 intrinsics != NEON intrinsics. Need 2048-bit-wide SIMD? Out of luck. If RVP needs longer SIMD widths, well: new extension time, new compile time, etc.
If a user of RISC-V needs more than 4096-bit then a new instruction set will be needed, that's it?
Technically, RVV supports scalar elements of 1024-bit; if scalar length = vector length, that means v0-v31 are 32 1024-bit registers. So there is a lot of room for the RISC-V V-extension to expand. Most of the vector length is register-size driven, not op-code driven, so newer architectures simply have larger vector capacity, while older architectures can run the same code at a slower pace. Forward/backwards compatible, with no trashing of code to keep up with the latest 128-bit SIMD, 256-bit SIMD, 512-bit SIMD, etc.
SVE can go up to 2048-bit. And if you need less than 128-bit, just use standard SIMD instructions.
Then, what if my code needs NEON to be 32-bit or 256-bit? I'm out of luck and my code is destroyed.
 

soresu

Platinum Member
Dec 19, 2014
2,617
1,812
136
SVE2 doesn't replace NEON, it comes on top of it
SVE2 replaces NEON, it's a separate encoding.
ARM's blog post announcing SVE2 and TME explicitly stated that future processors implementing SVE2 would continue to keep NEON code compatibility/functionality.

So while SVE2 will functionally replace NEON, the compatibility will remain for quite some time to come.
 

Nothingness

Platinum Member
Jul 3, 2013
2,371
713
136
ARM's blog post announcing SVE2 and TME explicitly stated that future processors implementing SVE2 would continue to keep NEON code compatibility/functionality.

So while SVE2 will functionally replace NEON, the compatibility will remain for quite some time to come.
I think what @NostaSeronx is trying to say is that you can't use the same instructions interchangeably.

SVE2 replaces NEON, it's a separate encoding. SVE2 intrinsics != NEON intrinsics.
Yeah, obviously. And RISC-V Vector intrinsics != RISC-V Packed-SIMD intrinsics.

Need 2048-bit width SIMDs? Out of luck. If RVP needs longer SIMD widths well new extension time, new compile time, etc.
Technically, RVV supports scalar elements of 1024-bit, if Scalar length = Vector length, that means v0-v31 is 32 1024-bit registers. So, there is a lot of room for RISC-V v-extension to expand. Most of the vector length is register size driven not op-code driven. So, newer architectures just have larger vector capacity, while older architectures can run them at a slower pace. Forward/backwards compatible, no trashing to keep up to the latest 128-bit SIMD, 256-bit SIMD, 512-bit SIMD, etc.
The same applies to SVE. You can select size in increments of 128-bit.

Then, what if my code needs NEON to be 32-bit or 256-bit? I'm out of luck and my code is destroyed.
You want your code to be vector-length agnostic, which is what SVE allows: you don't have to code with a specified VL.
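The vector-length-agnostic pattern described above can be modelled in plain C: the loop never hard-codes a width, it asks for one each trip. On real hardware hw_elems would come from vsetvli (RVV) or from SVE's predicate machinery rather than a function argument - this is just a sketch:

```c
#include <stddef.h>

/* Strip-mining loop: process min(remaining, hardware length) elements
 * per iteration. The same source works whether the "hardware" vector
 * holds 4, 8, or 512 elements - no per-width rewrite needed. */
void vla_add(const int *a, const int *b, int *c, size_t n, size_t hw_elems)
{
    for (size_t i = 0; i < n; ) {
        size_t vl = (n - i < hw_elems) ? (n - i) : hw_elems; /* this trip's VL */
        for (size_t j = 0; j < vl; j++)   /* models one vector operation */
            c[i + j] = a[i + j] + b[i + j];
        i += vl;
    }
}
```

Running the same function with different hw_elems values gives identical results - which is the whole point of coding without a specified VL.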