New SiFive RISC-V cores

soresu

Platinum Member
Dec 19, 2014
2,617
1,812
136
SiFive just introduced new RISC-V CPU cores, named the U84 and the U87.

The U84 has 3.1x the performance at 7nm of the previous 16nm U74 core, a combination of 2.3x the IPC and 1.4x the clock frequency (I realise the math seems off, but it's their press release).

The U84 core area is about 0.28 mm² at 7nm, with a quad core (with 2MB L2$) taking up a whopping 2.63 mm².

They claim competitive performance with the Cortex A72, though exact numbers are not forthcoming; only a footnote implies they were comparing against a Cortex A72 implemented at 16nm.

The U87 is similar to the above core (I presume, for lack of further information and the similar core number), but with vector processing added; further details on the U87 will supposedly follow in a later press release.

(Sorry for the anaemic round of posts guys, I get a little over enthusiastic with a low attention span every so often.....)

Link here.
 
Last edited:

Thunder 57

Platinum Member
Aug 19, 2007
2,647
3,706
136
Yea, a little bit of commentary or a TLDR version would be nice. Anything to pique my interest a bit to give the thing a full read.
 

soresu

Platinum Member
Dec 19, 2014
2,617
1,812
136
For those of us less technical, can you tell us what this means? For me it's like reading Chinese or Greek.
Yea, a little bit of commentary or a TLDR version would be nice. Anything to pique my interest a bit to give the thing a full read.
Sorry guys, I amended the OP with more detail - not enough sleep and too much of it spent in a chair don't lead to Pulitzer (or Hugo) prize winning posts I'm afraid.

I agree that the SiFive press release is not ideal for a fast and easy read; something tells me their PR division is not up to the same standard as ARM's, AMD's or Intel's.
 
  • Like
Reactions: Thunder 57

soresu

Platinum Member
Dec 19, 2014
2,617
1,812
136
U87 has vector processing? Do tell.
Unfortunately there's nothing to tell at the moment - they only announced the fact, along with notice of a future press release with the details.

Their use of the words 'vector processing' rather than SIMD makes me think it uses a method closer to ARM SVE than the packed SIMD in x86-64. As far as I am aware, the R5 people have been working on both paths, but the packed-SIMD path seems to be further out.
 

soresu

Platinum Member
Dec 19, 2014
2,617
1,812
136
Let's go back to this post and discuss its validity sometime in the 2030s then.
While I get the pessimism, the resume linked is a fairly impressive one, if not so much recently as in the past.

I think he implied forward development at SiFive could be much better.

Though unless R5 makes it magically easier to design a uArch, I wouldn't expect any significant announced improvements from this particular hire as soon as 2020.
 
Mar 11, 2004
23,031
5,495
146
The math is probably off because it's more like 2.25x IPC or something and they rounded up, but they didn't want to double-fudge (mmm *homer drool*) the numbers beyond what they would be, so they basically kept the final overarching performance number right.
 
  • Like
  • Haha
Reactions: Kirito and soresu

NostaSeronx

Diamond Member
Sep 18, 2011
3,683
1,218
136
Their use of the words 'vector processing' rather than SIMD makes me think it uses a method closer to ARM SVE than the SIMD in x86/64 - as far as I am aware, the R5 people have been working on both paths, but the packed SIMD path seems to be further out.
RISC-V vectors are closer to Cray-1's implementation: https://www.chessprogramming.org/Cray-1

Vector Length and Vector Standard Element Width:
VL is always a power of 2, while vsew can be i8, i16, fp8, fp16 and up.
VL=2 w/ vsew=i8 => 2x 8-bit integers.

It can be run on SIMD, superscalar, or scalar units.

There is also the grouping multiplier (LMUL) for widening into a wider result or narrowing into many thinner results: 64-bit (LMUL=8) into 8-bit (LMUL=1) results, or 8-bit (LMUL=1) into 64-bit (LMUL=8) results.
element A(i8)&B(i8) ((i16)) + element C(i8)&D(i8) ((i16)) = element E(i8)&F(i8) ((i16)) <= if LMUL is 2
element A&B&C&D + element E&F&G&H = element I&J&K&L <= if LMUL is 4

This is closer to the ideal of "code once, run on all RISC-V architectures with the vector extension enabled".
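The widening behaviour described above can be sketched in plain C. This is only a scalar model of what an RVV-style widening add (SEW=8 sources producing SEW=16 results) does per element - not real RVV intrinsics, and the function name is made up for illustration:

```c
#include <stdint.h>
#include <stddef.h>

/* Scalar model of an RVV-style widening add: two SEW=8 source
 * vectors produce an SEW=16 result, so the result group occupies
 * twice the register capacity (2x LMUL), as described above.
 * Hypothetical helper, not a real intrinsic. */
void widening_add_i8(const int8_t *a, const int8_t *b,
                     int16_t *out, size_t vl)
{
    for (size_t i = 0; i < vl; i++)
        out[i] = (int16_t)a[i] + (int16_t)b[i]; /* widen first, so no i8 overflow */
}
```

With VL=2 and i8 elements this is exactly the "2x 8-bit integers" case from the post, except the sums land in 16-bit elements.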
 

NostaSeronx

Diamond Member
Sep 18, 2011
3,683
1,218
136
Then what is this about then:
The P-extension isn't vector processing. It is also more immature than the V-extension: V is at draft 0.7(.1), while the P-extension is at draft 0.2.

V is preferred over P in all cases. V is the only one that is pseudo-final, and thus can be HVM'd.
"This is a draft of a stable proposal for the vector specification to be used for implementation and evaluation. Once the draft label is removed, version 0.7 is intended to be stable enough to begin developing toolchains, functional simulators, and initial implementations, though will continue to evolve with minor changes and updates."
 

NostaSeronx

Diamond Member
Sep 18, 2011
3,683
1,218
136
Errr... isn't SIMD vector processing?
Nope, executing SIMD instructions is intrinsically scalar.
I also mentioned the difference in readiness of the 2 paths in an earlier post.
No general-purpose core is going to use P over V. SIMD instructions aren't vector instructions, and are thus vector-destructive.

The U87 isn't a DSP core, it's a superscalar + vector core. Hence, it will be using the V-extension, which is near-final; the P-extension, by contrast, was killed off at the 2016 meet, then revived only for extremely small cores that won't use V and operate only on fixed-point.

The RISC-V V-extension is not vector-destructive. There are no two paths; there is only one path, and that is Vector above all else.
 
  • Like
Reactions: teejee

NostaSeronx

Diamond Member
Sep 18, 2011
3,683
1,218
136
I may be completely off base here, but doesn't the MD part of SIMD stand for Multiple Data?

I thought that the 'multiple data' in question was classed as a vector?
How SIMD instructions work is that they always operate as scalar operations. This is different from a vector instruction and from how vector units operate.

SIMD instructions and SIMD units have to be heavily tweaked to get to non-destructive vectors. <== This is extremely harmful to the ISA byte-code/op-code/etc. and to scaling up: 256-bit, 512-bit, 1024-bit, etc.
Vector instructions on SIMD units, by contrast, do not have to be tweaked to be non-destructive vectors. <== This is not harmful.

The U87 isn't SIMD(P) processing, it is vector(V) processing.

It isn't SVE, it isn't Neon, it isn't AVX512 (EVEX), AVX (VEX), or SSE/MMX. It is, however, Cray-1-like.

---
The U87 is probably going to deploy something like Hwacha, since the creator is the CTO.

Hwacha4 is 7.84 mm² on 28nm; scaled to 7nm that would be a bit above ~1.284 mm² (7.84 mm² × 13440/82080).
0.28 mm² + >1.284 mm² => >1.564 mm² per core, so four cores with vector processing would thus come to more than 6.256 mm². With HBM2E and a chiplet strategy, that could probably yield more cores than EPYC, with more efficiency from V than from AVX/AVX2 - if someone had the money for it.
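The napkin math above, written out as code (all figures are the post's own - the 13440/82080 density ratio, Hwacha4's 7.84 mm², and the 0.28 mm² U84 core area):

```c
/* Scale Hwacha4's 28nm area to 7nm by the density ratio used in the
 * post, then add the quoted U84 core area. Pure arithmetic check of
 * the post's numbers, nothing more. */
static double hwacha4_7nm_mm2(void)      { return 7.84 * 13440.0 / 82080.0; }     /* ~1.284 */
static double core_plus_vector_mm2(void) { return 0.28 + hwacha4_7nm_mm2(); }     /* ~1.564 */
static double quad_core_mm2(void)        { return 4.0 * core_plus_vector_mm2(); } /* ~6.26  */
```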
 

Nothingness

Platinum Member
Jul 3, 2013
2,371
713
136
How SIMD instructions work is that they are always operating as a scalar operation. This is different from a vector instruction and how vector units operate.

SIMD instructions and simd units have to be heavily-tweaked to get to non-destructive vectors. <== This is extremely harmful to ISA byte-code/op-code/etc and scalability up: 256-bit, 512-bit, 1024-bit, etc.
While, Vector instructions with SIMD units do not have to be tweaked to be non-destructive vectors. <== This is not harmful.
Sorry but you make no sense at all or I completely misunderstood you. SIMD can be non-destructive; being non-destructive is a general property of an ISA and its encoding. ARM NEON SIMD extension is non-destructive and its units can be used to implement SVE which is size agnostic.

I took a quick look at the Vector Extension spec and I saw nothing fundamentally different from ARM SVE.

BTW RISC-V is starting to show one of its great weaknesses: a mess of extensions.
 

NostaSeronx

Diamond Member
Sep 18, 2011
3,683
1,218
136
Sorry but you make no sense at all or I completely misunderstood you. SIMD can be non-destructive; being non-destructive is a general property of an ISA and its encoding. ARM NEON SIMD extension is non-destructive and its units can be used to implement SVE which is size agnostic.

I took a quick look at the Vector Extension spec and I saw nothing fundamentally different from ARM SVE.
SIMD ISAs are always vector-destructive compared to true Vector ISAs. The destruction isn't in the data, but in the ISA/instruction/codebase itself.

SVE is a SIMD instruction set. The operation is always up to 16B*16 from 16B*1. <== Mostly a multimedia ISA.
RVV is a Vector instruction set. The operation can be 1B*x to 128B*y. <== Mostly a general purpose ISA.


The destruction is this:
MMX/SSE, SSE2, AVX256, AVX512, AVX1024, etc.
NEON to SVE isn't as destructive; however, SVE needs to replace NEON with SVE2. And if SVE users need a total width of 4096-bit, or SIMD widths smaller than 128-bit? Whelp, new instruction set.
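A sketch of what "destructive" means in practice for fixed-width SIMD: the lane count is baked into the source at compile time, so each width bump (128-bit SSE-class to 256-bit AVX-class, etc.) means new code. The macro and function here are illustrative stand-ins, not any real ISA's intrinsics:

```c
#include <stddef.h>

/* WIDTH_BYTES stands in for the SIMD generation: 16 = 128-bit
 * (SSE/NEON-class), 32 = 256-bit (AVX-class). Changing it changes
 * the shape of the code - the "rewrite per width" problem. */
#define WIDTH_BYTES 16
#define LANES (WIDTH_BYTES / (int)sizeof(int))

void fixed_width_add(const int *a, const int *b, int *c, size_t n)
{
    size_t i = 0;
    for (; i + (size_t)LANES <= n; i += (size_t)LANES) /* "SIMD" body, LANES at a time */
        for (int j = 0; j < LANES; j++)
            c[i + j] = a[i + j] + b[i + j];
    for (; i < n; i++)                                 /* scalar tail */
        c[i] = a[i] + b[i];
}
```

A vector-length-agnostic ISA avoids this by letting the hardware report its length at run time, so the same binary covers every width.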
 

Nothingness

Platinum Member
Jul 3, 2013
2,371
713
136
SIMD ISAs are always vector-destructive compared to true Vector ISAs. It destruction isn't in data, but in the ISA/instruction/codebase itself.

SVE is a SIMD instruction set. The operation is always up to 16B*16 from 16B*1. <== Mostly a multimedia ISA.
RVV is a Vector instruction set. The operation can be 1B*x to 32B*y. <== Mostly a general purpose ISA.

The destruction is this;
MMX/SSE, SSE2, AVX256, AVX512, AVX1024, etc.
NEON to SVE isn't as destructive, however SVE needs to replace NEON w/ SVE2.
So by "destructive" you mean some new instructions replace previous ones. SVE2 doesn't replace NEON, it comes on top of it; the encoding space is not shared. And how is the RISC-V vector extension any different here? How is the vector extension non-"destructive" against the RV Packed-SIMD extension?

If the SVE users need a total width of 4096-bit or needs smaller simd widths than 128-bit. Whelp, new instruction set.
That's ridiculous. If a user of RISC-V needs more than 4096-bit then a new instruction set will be needed, that's it? SVE can go up to 2048-bit. And if you need less than 128-bit, just use standard SIMD instructions.
 

NostaSeronx

Diamond Member
Sep 18, 2011
3,683
1,218
136
So by destructive you mean some new instructions replace previous ones. SVE2 doesn't replace NEON, it comes on top of it, the encoding space is not shared. And how is RISC-V vector extension any different here? How is the vector extension non "destructive" against RV Packed-SIMD extension?
SVE2 replaces NEON, it's a separate encoding. SVE2 intrinsics != NEON intrinsics. Need 2048-bit-wide SIMD? Out of luck. If RVP needs longer SIMD widths, well: new extension time, new compile time, etc.
If a user of RISC-V needs more than 4096-bit then a new instruction set will be needed, that's it?
Technically, RVV supports scalar elements of 1024-bit; if scalar length = vector length, that means v0-v31 are 32 1024-bit registers. So there is a lot of room for the RISC-V V-extension to expand. Most of the vector length is register-size driven, not op-code driven, so newer architectures simply have larger vector capacity, while older architectures can run the same code at a slower pace. Forward/backwards compatible, with no trashing of code to keep up with the latest 128-bit SIMD, 256-bit SIMD, 512-bit SIMD, etc.
SVE can go up to 2048-bit. And if you need less than 128-bit, just use standard SIMD instructions.
Then, what if my code needs NEON to be 32-bit or 256-bit? I'm out of luck and my code is destroyed.
 

soresu

Platinum Member
Dec 19, 2014
2,617
1,812
136
SVE2 doesn't replace NEON, it comes on top of it
SVE2 replaces NEON, it's a separate encoding.
ARM's blog post announcing SVE2 and TME explicitly stated that future processors implementing SVE2 would continue to keep NEON code compatibility/functionality.

So while SVE2 will functionally replace NEON, the compatibility will remain for quite some time to come.
 

Nothingness

Platinum Member
Jul 3, 2013
2,371
713
136
ARM's blog post announcing SVE2 and TME explicitly stated that future processors implementing SVE2 would continue to keep NEON code compatibility/functionality.

So while SVE2 will functionally replace NEON, the compatibility will remain for quite some time to come.
I think what @NostaSeronx is trying to say is that you can't use the same instructions interchangeably.

SVE2 replaces NEON, it's a separate encoding. SVE2 intrinsics != NEON intrinsics.
Yeah, obviously. And RISC-V Vector intrinsics != RISC-V Packed-SIMD intrinsics.

Need 2048-bit width SIMDs? Out of luck. If RVP needs longer SIMD widths well new extension time, new compile time, etc.
Technically, RVV supports scalar elements of 1024-bit, if Scalar length = Vector length, that means v0-v31 is 32 1024-bit registers. So, there is a lot of room for RISC-V v-extension to expand. Most of the vector length is register size driven not op-code driven. So, newer architectures just have larger vector capacity, while older architectures can run them at a slower pace. Forward/backwards compatible, no trashing to keep up to the latest 128-bit SIMD, 256-bit SIMD, 512-bit SIMD, etc.
The same applies to SVE. You can select size in increments of 128-bit.

Then, what if my code needs NEON to be 32-bit or 256-bit? I'm out of luck and my code is destroyed.
You want your code to be vector-length agnostic, which is what SVE allows: you don't have to code with a specified VL.
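The vector-length-agnostic pattern described above can be modelled in plain C: the loop never hard-codes a width, it asks for one each trip. On real hardware hw_elems would come from vsetvli (RVV) or from SVE's predicate machinery rather than a function argument - this is just a sketch:

```c
#include <stddef.h>

/* Strip-mining loop: process min(remaining, hardware length) elements
 * per iteration. The same source works whether the "hardware" vector
 * holds 4, 8, or 512 elements - no per-width rewrite needed. */
void vla_add(const int *a, const int *b, int *c, size_t n, size_t hw_elems)
{
    for (size_t i = 0; i < n; ) {
        size_t vl = (n - i < hw_elems) ? (n - i) : hw_elems; /* this trip's VL */
        for (size_t j = 0; j < vl; j++)   /* models one vector operation */
            c[i + j] = a[i + j] + b[i + j];
        i += vl;
    }
}
```

Running the same function with different hw_elems values gives identical results - which is the whole point of coding without a specified VL.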