Are these two the ones at the bottom of that list?
Elbrus-8S - Wikipedia (en.wikipedia.org)
Description | Cores | Frequency (MHz) | Prod year | TDP | Process (nm) | GFlops (double) | ISA generation | Model name | Alternative name |
---|---|---|---|---|---|---|---|---|---|
Cpu Elbrus | 1 | 300 | 2005 | 6 | 130 | 2.4 | (E2Kv1) | - | e3m |
SoC Elbrus | 1 | 300 | 2007 | 6 | 130 | 2.4 | (E2Kv2) | - | e3s |
SoC Elbrus-S | 1 | 500 | 2010 | 13 | 90 | 4 | elbrus-v2 | elbrus-s | Экскурсовод, Эльбрус-3S, Elbrus-2C1 |
SoC Elbrus-2C+DSP | 2 (+4 DSP cores) | 500 | 2011 | 25 | 90 | 8 | elbrus-v2 | elbrus-2c+ | Кубик, Elbrus-2C2, Elbrus-S2 |
SoC Elbrus-4C | 4 | 800 | 2014 | 60 | 65 | 25.6 | elbrus-v3 | elbrus-4c | e2s, Эльбрус-2S |
SoC Elbrus-8C | 8 | 1300 | 2016 | 80 | 28 | 124.8 | elbrus-v4 | elbrus-8c | e8c, Процессор-1, Эльбрус-4C+ |
SoC Elbrus-1C+ | 1 | 1000 | 2015 | 10 | 40 | 12 | elbrus-v4 | elbrus-1c+ | e1c+, e1cp, e1c, Процессор-2 |
SoC Elbrus-8CB | 8 | 1500 | 2018 | 90 | 28 | 288 | elbrus-v5 | elbrus-8c2 | e8c2, p9, Процессор-9 |
SoC Elbrus-16C | 16 | 2000 | 2021 | 110-130 | 16 | 768 | elbrus-v6 | elbrus-16c | |
SoC Elbrus-2C3 | 2 | 2000 | 2021 | 10-15 | 16 | 96 | elbrus-v6 | elbrus-2c3 | |
SoC Elbrus-12C | 12 | 2000 | 2022 | 85-90 | 16 | 576 | elbrus-v6 | elbrus-12c | |
SoC Elbrus-32C | 32 | 2500? | 2025/2026? | ? | 7 | 1920 | elbrus-v7 | elbrus-32c? | Процессор-21(СРВ) |
####################################################
getDetails and MHz
Assembler CPUID and RDTSC
CPU , Features Code 00000000, Model Code 00000000
Measured - Minimum 0 MHz, Maximum 0 MHz
Linux Functions
get_nprocs() - CPUs 8, Configured CPUs 8
get_phys_pages() and size - RAM Size 30.94 GB, Page Size 4096 Bytes
uname() - Linux, sumireko, 4.19.0-0.3-e8c2
#1 SMP Sat Jan 18 09:49:15 GMT 2020, e2k
##############################################
64 Bit MP SSE MFLOPS Benchmark 1, 8 Threads, Sat Mar 21 14:13:55 2020
Test 4 Byte Ops/ Repeat Seconds MFLOPS First All
Words Word Passes Results Same
Data in & out 102400 2 20000 0.050556 81019 0.620974 Yes
Data in & out 1024000 2 2000 0.045250 90519 0.942935 Yes
Data in & out 10240000 2 200 0.446652 9170 0.994032 Yes
Data in & out 102400 8 20000 0.066665 245766 0.749971 Yes
Data in & out 1024000 8 2000 0.065821 248916 0.965360 Yes
Data in & out 10240000 8 200 0.465831 35172 0.996409 Yes
Data in & out 102400 32 20000 0.177363 369501 0.498060 Yes
Data in & out 1024000 32 2000 0.174700 375134 0.910573 Yes
Data in & out 10240000 32 200 0.464857 140981 0.990447 Yes
End of test Sat Mar 21 14:13:57 2020
Press Enter
The wiki says the 8SV has 576 vs the 8S (250?), and the only differences were 1500 vs 1300 MHz and DDR4-2400 vs DDR3-1600?
How to calculate FLOPS (v1 .. v3):
- Single Precision: 4 FP ALUs * 4 Single operation * Cores * Frequency
- Double Precision: 4 FP ALUs * 2 Double operation * Cores * Frequency
How to calculate FLOPS (v4):
- Single Precision: 6 FP ALUs * 4 Single operation * Cores * Frequency
- Double Precision: 6 FP ALUs * 2 Double operation * Cores * Frequency
How to calculate FLOPS (v5+ [128 bit SIMD]):
- Single Precision: 6 FP ALUs * 4 Single operation * 2 SIMD * Cores * Frequency
- Double Precision: 6 FP ALUs * 2 Double operation * 2 SIMD * Cores * Frequency
Example for Elbrus-16C: 6 ALUs * 2 DP * 2 SIMD * 16 Cores * 2e9 Hz = 7.68e11 --> 768 GFlops
#include <stdio.h>
#include <stdlib.h>  /* calloc */
#include <math.h>

/* res[i] = a * x[i] + b * y[i] / 2 + c * z[i] on ints; the caller owns the returned buffer. */
int * calculate_mx4(int * x, int * y, int * z, int a, int b, int c, const int size) {
    int * res = (int *)calloc(size, sizeof(int));
    #pragma loop count(4)
    for(int i = 0; i < size; i++) {
        res[i] = a * x[i] + b * y[i] / 2 + z[i] * c;
    }
    return res;
}

/* Same shape on doubles, dividing by 1.99 so the division cannot become a shift. */
double * calculate_fx4(double * x, double * y, double * z, double a, double b, double c, const int size) {
    double * res = (double *)calloc(size, sizeof(double));
    for(int i = 0; i < size; i++) {
        res[i] = a * x[i] + b * y[i] / 1.99 + z[i] * c;
    }
    return res;
}

int main() {
    int x[] = { 1, -1, 1, -1 };
    int y[] = { 1, 2, 3, 4 };
    int z[] = { 8, 4, 2, 1 };
    int * res = calculate_mx4(x, y, z, 7, 8, 6, 4);
    printf("%d %d %d %d\n", res[0], res[1], res[2], res[3]);

    double fx[] = { 1.0, -1.0, 1.0, -1.0 };
    double fy[] = { 1.0, 2.0, 3.0, 4.0 };
    double fz[] = { 8.0, 4.0, 2.0, 1.0 };
    /* size is an int, so pass 4 rather than 4.0 */
    double * fres = calculate_fx4(fx, fy, fz, 7.0, 8.0, 6.0, 4);
    printf("%f %f %f %f\n", fres[0], fres[1], fres[2], fres[3]);
    return 0;
}
calculate_mx4(int*, int*, int*, int, int, int, int):
{
setwd wsz = 0xd, nfx = 0x1, dbl = 0x0
setbn rsz = 0x3, rbs = 0x9, rcur = 0x0
disp %ctpr2, calloc
addd,4,sm 0x0, %dr0, %dr17
}
{
cmplsb,0 0x0, %r6, %pred2
sxt,2 0x2, %r6, %db[0]
addd,5 0x4, 0x0, %db[1]
}
{
getsp,0 _f32s,_lts0 0xfffffff0, %dr8
}
{
nop 1
cmplsb,0,sm 0x2, %r6, %pred1 ? %pred2
cmplsb,1 0x1, %r6, %pred0 ? %pred2
addd,2,sm 0x8, 0x0, %dr14 ? %pred2
adds,3,sm 0x2, 0x0, %r15 ? %pred2
addd,4,sm 0x4, 0x0, %dr9 ? %pred2
addd,5 0x0, 0x0, %dr13 ? %pred2
}
{
call %ctpr2, wbs = 0x9
}
{
return %ctpr3
ldw,0,sm %dr1, 0x0, %g16
ldw,2,sm %dr2, 0x0, %g17
ldw,3 %dr0, 0x0, %g18 ? %pred2
addd,4 0x0, %db[0], %dr16
}
{
disp %ctpr1, .L345
addd,4 0x0, %dr16, %dr0 ? ~%pred2
}
{
ct %ctpr3 ? ~%pred2
}
{
nop 1
return %ctpr3
}
{
nop 5
muls,0,sm %r4, %g16, %g16
muls,1,sm %g17, %r5, %g17
muls,3,sm %r3, %g18, %g18
}
{
getfs,0,sm %g16, _f16s,_lts0lo 0x5f, %g19
adds,1 0x0, %g17, %r11 ? %pred2
adds,2 0x0, %g18, %r12 ? %pred2
}
{
adds,0,sm %g16, %g19, %g16
}
{
sars,0,sm %g16, 0x1, %r10 ? %pred2
}
.L345:
{
adds,0 %r12, %r10, %g16
addd,3 0x0, %dr16, %dr0 ? ~%pred0
pass %pred0, @p0
andp @p0, @p0, @p4
pass @p4, %pred2
}
{
adds,0 %g16, %r11, %g16
}
{
stw,2 %dr16, %dr13, %g16
ldw,5,sm %dr2, %dr9, %r11
}
{
ct %ctpr3 ? ~%pred2
ldw,0,sm %dr1, %dr9, %r8
ldw,2,sm %dr17, %dr9, %r10
}
{
adds,0,sm %r15, 0x1, %g16
addd,1,sm 0x0, %dr9, %dr13
addd,2,sm 0x0, %dr14, %dr9
addd,3,sm 0x4, %dr14, %dr14
pass %pred1, @p0
andp @p0, @p0, @p4
pass @p4, %pred0
}
{
cmplsb,0,sm %g16, %r6, %pred2
adds,1,sm 0x0, %g16, %r15
}
{
pass %pred2, @p0
andp @p0, @p0, @p4
pass @p4, %pred1
}
{
muls,3 %r11, %r5, %g16
}
{
nop 4
muls,0 %r4, %r8, %g17
muls,1 %r3, %r10, %g18
}
{
adds,0,sm 0x0, %g16, %r11
}
{
getfs,0 %g17, _f16s,_lts0lo 0x5f, %g16
adds,1,sm 0x0, %g18, %r12
}
{
adds,0 %g17, %g16, %g16
}
{
sars,0 %g16, 0x1, %g16
}
{
ct %ctpr1
adds,0,sm 0x0, %g16, %r10
}
calculate_fx4(double*, double*, double*, double, double, double, int):
{
setwd wsz = 0x10, nfx = 0x1, dbl = 0x1
setbn rsz = 0x3, rbs = 0xc, rcur = 0x0
disp %ctpr2, calloc
getsp,0 _f32s,_lts1 0xfffffff0, %dr8
addd,1,sm 0x0, %dr5, %dr5
addd,2,sm 0x0, %dr3, %dr3
addd,3,sm 0x0, %dr4, %dr4
addd,4,sm 0x0, %dr2, %dr2
addd,5,sm 0x0, %dr1, %dr1
}
{
nop 1
cmplsb,0,sm 0x0, %r6, %pred0
sxt,1 0x2, %r6, %db[0]
addd,2 0x8, 0x0, %db[1]
addd,3,sm 0x0, %dr0, %dr10
}
{
merges,0,sm 0x1, %r6, %r11, %pred0
}
{
subs,0,sm %r11, 0x1, %r8
}
{
call %ctpr2, wbs = 0xc
cmplsb,0,sm %r8, _f16s,_lts0lo 0x60, %pred1
}
{
return %ctpr3
ldd,0,sm %dr1, 0x0, %dg16
ldd,2,sm %dr1, 0x8, %dg17
ldd,3,sm %dr1, _f16s,_lts0lo 0x10, %dg18
addd,4 0x0, %db[0], %dr6
ldd,5,sm %dr1, _f16s,_lts0hi 0x18, %dg19
}
{
disp %ctpr1, .L933
ldd,0,sm %dr1, _f16s,_lts0lo 0x28, %dr15
merges,1,sm %r8, _f16s,_lts0hi 0x60, %g20, ~%pred1
ldd,2,sm %dr1, _f16s,_lts1lo 0x20, %dg21
cmpledb,3,sm %dr6, %dr1, %pred1
addd,4 0x0, %dr6, %dr0 ? ~%pred0
ldd,5,sm %dr1, _f16s,_lts1hi 0x30, %dr14
}
{
ct %ctpr3 ? ~%pred0
cmpledb,0,sm %dr6, %dr0, %pred2
sxt,1,sm 0x2, %g20, %dg20
ldd,2,sm %dr1, _f16s,_lts0lo 0x38, %dr13
subd,3,sm %dr6, 0x8, %dg23
subd,4,sm %dr6, 0x8, %dg22
ldd,5,sm %dr1, _f16s,_lts0hi 0x40, %dr12
}
{
shld,0,sm %dg20, 0x3, %dg20
cmpledb,1,sm %dr6, %dr2, %pred3
}
{
addd,0,sm 0x8, %dg20, %dg20
}
{
addd,0,sm %dg20, %dr1, %dg24
addd,1,sm %dg20, %dr0, %dg25
addd,2,sm %dg20, %dr2, %dg20
fmuld,3,sm %dr4, %dg16, %dr20
fmuld,4,sm %dr4, %dg17, %dr19
fmuld,5,sm %dr4, %dg18, %dr18
}
{
cmpledb,0,sm %dg24, %dg22, %pred4
cmpledb,1,sm %dg25, %dg23, %pred5
fmuld,2,sm %dr4, %dg19, %dr17
addd,3,sm 0x0, _f64,_lts0 0x3fffd70a3d70a3d7, %dr0
fmuld,4,sm %dr4, %dg21, %dr16
}
{
cmpledb,0,sm %dg20, %dg22, %pred4
pass %pred4, @p0
pass %pred1, @p1
landp ~@p0, ~@p1, @p4
pass @p4, %pred1
pass %pred5, @p2
pass %pred2, @p3
landp ~@p2, ~@p3, @p5
pass @p5, %pred2
}
{
pass %pred0, @p0
pass %pred2, @p1
landp @p0, ~@p1, @p4
pass @p4, %pred0
pass %pred1, @p2
landp @p4, ~@p2, @p5
pass @p5, %pred1
}
{
nop 2
pass %pred1, @p0
pass %pred4, @p1
landp @p0, @p1, @p4
pass @p4, %pred0
landp @p0, ~@p1, @p5
pass @p5, %pred1
pass %pred3, @p2
landp @p5, @p2, @p6
pass @p6, %pred2
}
{
ct %ctpr1 ? %pred0
}
{
ct %ctpr1 ? %pred2
}
{
setwd wsz = 0x35, nfx = 0x1, dbl = 0x1
setbn rsz = 0x28, rbs = 0xc, rcur = 0x0
disp %ctpr1, .L571
addd,0 0x0, 0x0, %dg16
addd,3 0x0, _f64,_lts1 0x3fffd70a3d70a3d7, %dr0
}
{
addd,0 0x0, _f64,_lts0 0x20ff2000000000, %dg17
aaurwd,2 %dr6, %aad0
addd,3,sm 0x0, 0x0, %db[32]
}
{
insfd,0 %dg17, _f32s,_lts1 0x8800, %dr11, %dg17
aaurwd,2 %dg16, %aasti1
addd,3,sm %db[32], _f16s,_lts0lo 0x10, %dg19
addd,4,sm 0x8, %db[32], %dg18
addd,5,sm %db[32], _f16s,_lts0hi 0x18, %dg20
}
{
ldd,0,sm %dr10, %db[32], %db[60], mas=0x4
addd,1,sm %db[32], _f16s,_lts0lo 0x30, %dg22
addd,2,sm %db[32], _f16s,_lts1hi 0x38, %dg23
ldd,3,sm %dr1, %db[32], %db[79], mas=0x4
addd,4,sm %db[32], _f16s,_lts0hi 0x20, %dg16
addd,5,sm %db[32], _f16s,_lts1lo 0x28, %dg21
}
{
ldd,0,sm %dr1, %dg19, %db[75], mas=0x4
addd,1,sm %db[32], _f16s,_lts0lo 0x50, %dg26
addd,2,sm %db[32], _f16s,_lts1hi 0x58, %dg27
ldd,3,sm %dr1, %dg18, %db[77], mas=0x4
addd,4,sm %db[32], _f16s,_lts0hi 0x40, %dg24
addd,5,sm %db[32], _f16s,_lts1lo 0x48, %dg25
}
{
ldd,0,sm %dr1, %dg16, %db[71], mas=0x4
addd,1,sm %db[32], _f16s,_lts0lo 0x60, %dg28
addd,2,sm %db[32], _f16s,_lts0hi 0x70, %dg30
ldd,3,sm %dr1, %dg20, %db[73], mas=0x4
addd,4,sm %db[32], _f16s,_lts1lo 0x68, %dg29
addd,5,sm %db[32], _f16s,_lts1hi 0x78, %db[2]
}
{
ldd,0,sm %dr10, %dg18, %db[58], mas=0x4
addd,1,sm 0x0, %dg18, %db[30]
addd,2,sm 0x0, %dg19, %db[28]
ldd,3,sm %dr1, %dg21, %db[69], mas=0x4
addd,4,sm 0x0, %dg20, %db[26]
addd,5,sm 0x0, %dg16, %db[24]
}
{
ldd,0,sm %dr10, %dg19, %db[56], mas=0x4
addd,1,sm 0x0, %dg21, %db[22]
addd,2,sm 0x0, %dg22, %db[20]
ldd,3,sm %dr1, %dg22, %db[67], mas=0x4
addd,4,sm 0x0, %dg23, %db[18]
addd,5,sm 0x0, %dg24, %db[16]
}
{
ldd,0,sm %dr1, %dg24, %db[63], mas=0x4
addd,1,sm 0x0, %dg25, %db[14]
addd,2,sm 0x0, %dg26, %db[12]
ldd,3,sm %dr1, %dg23, %db[65], mas=0x4
fmuld,4,sm %dr4, %db[79], %dg31
addd,5,sm 0x0, %dg27, %db[10]
}
{
ldd,0,sm %dr1, %dg25, %db[61], mas=0x4
addd,1,sm 0x0, %dg28, %db[8]
addd,2,sm 0x0, %dg29, %db[6]
fmuld,3,sm %dr4, %db[75], %dg24
fmuld,4,sm %dr4, %db[77], %dg23
addd,5,sm 0x0, %dg30, %db[4]
}
{
ldd,0,sm %dr2, %db[32], %db[72], mas=0x4
fmuld,3,sm %dr4, %db[71], %dr8
fmuld,4,sm %dr4, %db[73], %dg25
}
{
ldd,0,sm %dr2, %dg18, %db[70], mas=0x4
fmuld,4,sm %dr4, %db[69], %dr11
}
{
ldd,0,sm %dr2, %dg19, %db[68], mas=0x4
ldd,3,sm %dr10, %dg20, %db[54], mas=0x4
fmuld,4,sm %dr4, %db[67], %dg19
fdivd,5,sm %dg31, %dr0, %dg18
}
{
ldd,0,sm %dr10, %dg16, %db[52], mas=0x4
ldd,3,sm %dr10, %dg21, %db[50], mas=0x4
fmuld,4,sm %dr4, %db[65], %dg21
fmuld,5,sm %dr4, %db[63], %dg31
}
{
ldd,0,sm %dr1, %dg27, %db[57], mas=0x4
ldd,3,sm %dr1, %dg26, %db[59], mas=0x4
fmuld,4,sm %dr4, %db[61], %dg26
fdivd,5,sm %dg23, %dr0, %dg23
}
{
rwd,0 %dg17, %lsr
ldd,3,sm %dr2, %dg20, %db[66], mas=0x4
}
{
ldd,0,sm %dr2, %dg16, %db[64], mas=0x4
ldd,3,sm %dr10, %dg22, %db[48], mas=0x4
fdivd,5,sm %dg24, %dr0, %dg17
}
{
ldd,0,sm %dr1, %dg28, %db[55], mas=0x4
ldd,3,sm %dr1, %dg29, %db[53], mas=0x4
}
{
ldd,0,sm %dr1, %dg30, %db[51], mas=0x4
fdivd,5,sm %dg25, %dr0, %dg16
}
{
fmuld,0,sm %dr4, %db[59], %db[5]
fmuld,1,sm %dr4, %db[57], %db[3]
}
{
nop 1
fdivd,5,sm %dr8, %dr0, %dg20
}
{
nop 1
fdivd,5,sm %dr11, %dr0, %dg22
}
{
nop 1
fdivd,5,sm %dg19, %dr0, %db[35]
}
{
nop 1
fmul_addd,3,sm %dr3, %db[60], %dg18, %db[44]
fdivd,5,sm %dg21, %dr0, %db[33]
}
{
nop 1
fmul_addd,3,sm %dr3, %db[58], %dg23, %db[42]
fdivd,5,sm %dg31, %dr0, %db[31]
}
{
nop 1
fmul_addd,3,sm %dr3, %db[56], %dg17, %db[40]
fdivd,5,sm %dg26, %dr0, %db[29]
}
{
nop 1
fmul_addd,3,sm %dr3, %db[54], %dg16, %db[38]
}
{
nop 1
fmul_addd,3,sm %dr3, %db[52], %dg20, %db[36]
fmul_addd,4,sm %db[72], %dr5, %db[44], %db[80]
}
{
nop 1
fmul_addd,3,sm %dr3, %db[50], %dg22, %db[34]
fmul_addd,4,sm %db[70], %dr5, %db[42], %db[78]
}
{
fmul_addd,3,sm %db[68], %dr5, %db[40], %db[76]
}
.L571:
{
loop_mode
rbranch .L1495
ldd,0,sm %dr10, %db[18], %db[46], mas=0x4 ? %pcnt7
fmuld,1,sm %dr4, %db[55], %db[1]
ldd,2 %dr2, %db[32], %db[72], mas=0x3 ? %pcnt0
fdivd,5,sm %db[5], %dr0, %db[27]
}
.L1511:
{
loop_mode
rbranch .L1498
ldd,0,sm %dr2, %db[22], %db[62], mas=0x4 ? %pcnt5
addd,1,sm 0x8, %db[2], %db[0]
ldd,2 %dr1, %db[32], %db[79], mas=0x3 ? %pcnt0
ldd,3,sm %dr1, %db[2], %db[49], mas=0x4
fmul_addd,4,sm %db[66], %dr5, %db[38], %db[74]
ldd,5 %dr10, %db[32], %db[60], mas=0x3 ? %pcnt0
}
.L1508:
{
loop_mode
alc alcf=1, alct=1
abn abnf=1, abnt=1
ct %ctpr1 ? %NOT_LOOP_END
fmul_addd,4,sm %dr3, %db[48], %db[35], %db[32]
staad,5 %db[80], %aad0[ %aasti1 ]
incr,5 %aaincr0
}
{
setwd wsz = 0x10, nfx = 0x1, dbl = 0x1
setbn rsz = 0x3, rbs = 0xc, rcur = 0x0
disp %ctpr1, .L421
adds,0 0x0, 0x0, %g16
addd,1 0x0, 0x0, %dg17
}
{
return %ctpr3
mmurw,2 %dg17, %dam_inv
}
{
nop 3
aaurw,2 %g16, %aabf0
}
{
ct %ctpr1
}
.L933:
{
setwd wsz = 0x28, nfx = 0x1, dbl = 0x1
setbn rsz = 0x1b, rbs = 0xc, rcur = 0x0
ldisp %ctpr2, .L1133
addd,0 0x0, _f64,_lts1 0x20492000000000, %dg17
fmuld,1,sm %dr4, %dr14, %dg20
fmuld,2,sm %dr4, %dr15, %dg21
fmuld,3,sm %dr4, %dr12, %dg18
fmuld,4,sm %dr4, %dr13, %dg19
fdivd,5,sm %dr16, %dr0, %dg16
}
{
disp %ctpr1, .L608
insfd,0 %dg17, _f32s,_lts0 0x8800, %dr11, %dg17
addd,1,sm 0x0, %dr10, %dg22
addd,2,sm 0x0, %dr1, %dg24
addd,3,sm 0x0, %dr2, %dg26
addd,4,sm 0x0, 0x0, %dg23
addd,5,sm 0x0, 0x0, %dg25
}
{
return %ctpr3
rwd,0 %dg17, %lsr
aaurwd,2 %dr6, %aad3
fdivd,5,sm %dr17, %dr0, %dg28
}
{
ldd,0,sm %dr10, _f16s,_lts0hi 0x20, %dg29
addd,1,sm 0x0, 0x0, %dg27
addd,2,sm %dg26, _f16s,_lts0lo 0xa8, %dg26
addd,3,sm %dg22, _lit16_ref,_lts0lo 0xa8, %dg22
addd,4 0x0, 0x0, %dg17
addd,5,sm %dg24, _lit16_ref,_lts0lo 0xa8, %dg24
}
{
ldd,0,sm %dr10, _f16s,_lts0hi 0x10, %dr8
ldd,2,sm %dr10, 0x8, %dr11
ldd,3,sm %dr10, _f16s,_lts0lo 0x18, %dg31
fdivd,5,sm %dr18, %dr0, %dg30
}
{
aaurwq,2 %qg26, %aad0
}
{
ldd,0,sm %dr10, 0x0, %dg17
aaurwd,2 %dg17, %aasti1
fdivd,5,sm %dr19, %dr0, %dg26
}
{
aaurwq,2 %qg22, %aad2
}
{
ldd,0,sm %dr1, _f16s,_lts0lo 0x58, %dg27
ldd,2,sm %dr1, _f16s,_lts1lo 0x50, %dr12
ldd,3,sm %dr1, _f16s,_lts0hi 0x60, %dg23
fdivd,5,sm %dr20, %dr0, %dg22
}
{
aaurwq,2 %qg24, %aad1
}
{
bap
ldd,0,sm %dr1, _f16s,_lts0lo 0x78, %dg25
ldd,2,sm %dr1, _f16s,_lts1lo 0x70, %dr13
ldd,3,sm %dr1, _f16s,_lts0hi 0x48, %dg24
fdivd,5,sm %dg18, %dr0, %dg18
}
{
ldd,0,sm %dr2, _f16s,_lts0lo 0x20, %dr15
ldd,2,sm %dr2, _f16s,_lts1lo 0x18, %dr16
ldd,3,sm %dr1, _f16s,_lts0hi 0x68, %dr14
}
{
ldd,0,sm %dr2, 0x8, %dr18
ldd,2,sm %dr2, 0x0, %dr19
ldd,3,sm %dr2, _f16s,_lts0lo 0x10, %dr17
fdivd,5,sm %dg19, %dr0, %dg19
}
{
ldd,0,sm %dr10, _f16s,_lts0lo 0x38, %dr21
fmuld,1,sm %dr4, %dr12, %dr12
ldd,2,sm %dr10, _f16s,_lts1lo 0x30, %dr22
ldd,3,sm %dr10, _f16s,_lts0hi 0x40, %dr20
fmuld,4,sm %dr4, %dg23, %dg23
fmuld,5,sm %dr4, %dg27, %dg27
}
{
ldd,0,sm %dr1, _f16s,_lts0lo 0x80, %dr23
ldd,2,sm %dr1, _f16s,_lts1lo 0xa0, %db[50]
ldd,3,sm %dr10, _f16s,_lts0hi 0x28, %dg29
fmul_addd,4,sm %dr3, %dg29, %dg16, %dg16
fdivd,5,sm %dg20, %dr0, %dg20
}
{
ldd,0,sm %dr10, _f16s,_lts0lo 0xa0, %db[3]
fmuld,1,sm %dr4, %dr13, %dr13
ldd,2,sm %dr2, _lit16_ref,_lts0lo 0xa0, %db[2]
ldd,3,sm %dr1, _f16s,_lts0hi 0x98, %db[52]
fmuld,4,sm %dr4, %dg24, %dg24
fmuld,5,sm %dr4, %dg25, %dg25
}
{
ldd,0,sm %dr10, _f16s,_lts0lo 0x98, %db[5]
fmuld,1,sm %dr4, %dr14, %dg31
ldd,2,sm %dr2, _lit16_ref,_lts0lo 0x98, %db[4]
ldd,3,sm %dr1, _f16s,_lts0hi 0x90, %db[54]
fmul_addd,4,sm %dr3, %dg31, %dg28, %dg28
fdivd,5,sm %dg21, %dr0, %dg21
}
{
ldd,0,sm %dr10, _f16s,_lts0lo 0x90, %db[7]
ldd,2,sm %dr2, _lit16_ref,_lts0lo 0x90, %db[6]
ldd,3,sm %dr10, _f16s,_lts0hi 0x88, %db[9]
}
{
ldd,0,sm %dr2, _f16s,_lts0lo 0x88, %db[8]
ldd,2,sm %dr10, _f16s,_lts0hi 0x80, %db[11]
ldd,3,sm %dr2, _lit16_ref,_lts0hi 0x80, %db[10]
fmul_addd,4,sm %dr3, %dr8, %dg30, %dg30
ldd,5,sm %dr10, _f16s,_lts1lo 0x78, %db[13]
}
{
ldd,0,sm %dr2, _f16s,_lts0lo 0x78, %db[12]
fmuld,1,sm %dr4, %dr23, %db[51]
ldd,2,sm %dr10, _f16s,_lts0hi 0x70, %db[15]
ldd,3,sm %dr2, _lit16_ref,_lts0hi 0x70, %db[14]
fdivd,5,sm %dg23, %dr0, %db[42]
}
{
ldd,0,sm %dr10, _f16s,_lts0lo 0x68, %db[17]
ldd,2,sm %dr2, _lit16_ref,_lts0lo 0x68, %db[16]
ldd,3,sm %dr10, _f16s,_lts0hi 0x60, %db[19]
fmul_addd,4,sm %dr3, %dr11, %dg26, %dg23
ldd,5,sm %dr2, _lit16_ref,_lts0hi 0x60, %db[18]
}
{
ldd,0,sm %dr10, _f16s,_lts0lo 0x58, %db[21]
ldd,2,sm %dr2, _lit16_ref,_lts0lo 0x58, %db[20]
ldd,3,sm %dr10, _f16s,_lts0hi 0x50, %db[23]
fdivd,5,sm %dg27, %dr0, %db[44]
}
{
ldd,0,sm %dr2, _f16s,_lts0lo 0x50, %db[22]
ldd,2,sm %dr10, _f16s,_lts0hi 0x48, %db[25]
ldd,3,sm %dr2, _lit16_ref,_lts0hi 0x48, %db[24]
fmul_addd,4,sm %dr3, %dg17, %dg22, %dg17
ldd,5,sm %dr2, _f16s,_lts1lo 0x40, %db[26]
}
{
ldd,0,sm %dr2, _f16s,_lts0lo 0x38, %db[28]
ldd,2,sm %dr2, _f16s,_lts0hi 0x30, %db[30]
ldd,3,sm %dr2, _f16s,_lts1lo 0x28, %db[32]
fmul_addd,4,sm %dr15, %dr5, %dg16, %db[27]
fdivd,5,sm %dr12, %dr0, %db[46]
}
{
ldd,0,sm %dr1, _f16s,_lts0lo 0x88, %dg16
fmul_addd,3,sm %dr3, %dr20, %dg18, %db[39]
fmul_addd,4,sm %dr16, %dr5, %dg28, %db[29]
}
{
fdivd,5,sm %dg24, %dr0, %db[48]
}
{
fmul_addd,3,sm %dr17, %dr5, %dg30, %db[31]
fmul_addd,4,sm %dr3, %dr21, %dg19, %db[41]
}
{
fdivd,5,sm %dg25, %dr0, %db[36]
}
{
fmul_addd,3,sm %dr18, %dr5, %dg23, %db[33]
fmul_addd,4,sm %dr3, %dr22, %dg20, %db[43]
}
{
fmuld,0,sm %dr4, %dg16, %db[49]
fdivd,5,sm %dr13, %dr0, %db[38]
}
{
fmul_addd,3,sm %dr19, %dr5, %dg17, %db[35]
fmul_addd,4,sm %dr3, %dg29, %dg21, %db[45]
}
{
nop 7
fdivd,5,sm %dg31, %dr0, %db[40]
}
.L608:
{
loop_mode
fmul_addd,3,sm %dr3, %db[25], %db[48], %db[37]
fmuld,4,sm %dr4, %db[54], %db[47]
fdivd,5,sm %db[51], %dr0, %db[34]
movad,0 area=0, ind=0, am=1, be=0, %db[0]
movad,1 area=1, ind=0, am=1, be=0, %db[1]
}
{
loop_mode
alc alcf=1, alct=1
abn abnf=1, abnt=1
ct %ctpr1 ? %NOT_LOOP_END
staad,2 %db[35], %aad3[ %aasti1 ]
incr,2 %aaincr0
fmul_addd,3,sm %db[32], %dr5, %db[45], %db[25]
movad,3 area=0, ind=0, am=1, be=0, %db[48]
}
{
setwd wsz = 0x10, nfx = 0x1, dbl = 0x1
setbn rsz = 0x3, rbs = 0xc, rcur = 0x0
adds,0 0x0, 0x0, %g16
}
{
disp %ctpr2, disp=0x0
aaurw,2 %g16, %aabf0
}
.L421:
{
ct %ctpr3
addd,3 0x0, %dr6, %dr0
}
.L1133:
{
fapb ct=0, dcd=0, fmt=4, mrng=8, d=0, incr=0, ind=0, asz=4, abs=0, disp=0
fapb dpl=0, dcd=0, fmt=4, mrng=8, d=1, incr=0, ind=0, asz=5, abs=0, disp=0
}
{
fapb ct=1, dcd=0, fmt=4, mrng=8, d=2, incr=0, ind=0, asz=4, abs=16, disp=0
}
.L1495:
{
nop 3
}
{
nop 7
fmul_addd,0,sm %db[72], %dr5, %db[44], %db[80]
}
{
ibranch .L1511
}
.L1498:
{
nop 3
}
{
nop 3
fmuld,3,sm %dr4, %db[79], %db[25]
}
{
nop 7
fdivd,5,sm %db[25], %dr0, %db[47]
}
{
nop 5
}
{
nop 7
fmul_addd,3,sm %dr3, %db[60], %db[47], %db[44]
}
{
nop 7
fmul_addd,3,sm %db[72], %dr5, %db[44], %db[80]
}
{
nop 1
}
{
ibranch .L1508
}
main:
{
setwd wsz = 0x16, nfx = 0x0, dbl = 0x0
setbn rsz = 0x3, rbs = 0x12, rcur = 0x0
disp %ctpr1, calloc
getsp,0 _f32s,_lts1 0xffffff40, %dr2
addd,1 0x0, _f64,_lts2 0x400000003, %dr3
scrd,3 0x1, 0x2, %dr4
}
{
qppackdl,0 %dr3, _f64,_lts2 0x200000001, %xr3
addd,1,sm 0x4, 0x0, %db[1]
addd,2 0x0, _f64,_lts0 0xffffffff00000001, %dr5
addd,3,sm 0x4, 0x0, %db[0]
addd,4 0x0, %dr4, %dr6
}
{
addd,0 %dr2, _f64,_lts0 0xc0, %dr1
addd,1 0x0, _f64,_lts2 0x100000002, %dr7
addd,2 0x0, %dr5, %dr9
}
{
qppackdl,0 %dr9, %dr5, %xr5
qppackdl,1 %dr7, _f64,_lts1 0x400000008, %xr3
stqp,2 %dr1, _f16s,_lts0lo 0xffe0, %xr3
adds,3 0x0, _f16s,_lts0hi 0x705c, %r7
}
{
ldw,0 %dr1, _f16s,_lts0lo 0xffec, %r9
addd,1 0x0, [ _f64,_lts2 .LC.1 ], %dr13
ldw,2 %dr1, _f16s,_lts0hi 0xffe4, %r11
ldw,3 %dr1, _f16s,_lts1lo 0xffe8, %r10
ldw,5 %dr1, _f16s,_lts1hi 0xffe0, %r12
}
{
call %ctpr1, wbs = 0x12
stqp,2 %dr1, _f16s,_lts0lo 0xffd0, %xr3
addd,4 0x0, _f64,_lts1 0x4010000000000000, %dr14
ldw,5 %dr1, _f16s,_lts0hi 0xffdc, %r3
}
{
disp %ctpr1, printf
ldw,0 %dr1, _f16s,_lts0lo 0xffd8, %r5
stqp,2 %dr1, _f16s,_lts0hi 0xfff0, %xr5
addd,3 0x0, _f64,_lts2 0x3ff0000000000000, %dr16
ldw,5 %dr1, _f16s,_lts1lo 0xffd4, %r15
}
{
ldw,0 %dr1, _f16s,_lts0lo 0xfff8, %r19
ldw,2 %dr1, _f16s,_lts0hi 0xffd0, %r17
ldw,3 %dr1, _f16s,_lts1lo 0xfff4, %r20
qppackdl,4 %dr14, _f64,_lts2 0x4008000000000000, %xr21
ldw,5 %dr1, _f16s,_lts1hi 0xfffc, %r18
}
{
addd,1 0x0, _f64,_lts1 0x4020000000000000, %dr23
ldw,2 %dr1, _f16s,_lts0lo 0xfff0, %r22
qppackdl,4 %dr6, %dr16, %xr6
}
{
getfs,0 %r9, %r7, %r24
getfs,1 %r10, %r7, %r25
getfs,3 %r11, %r7, %r26
getfs,4 %r12, %r7, %r7
shls,5 %r9, 0x3, %r9
}
{
shls,0 %r10, 0x3, %r10
ands,1 %r25, 0x1, %r25
shls,2 %r11, 0x3, %r11
ands,3 %r26, 0x1, %r26
shls,4 %r12, 0x3, %r12
ands,5 %r7, 0x1, %r7
}
{
shls,0 %r3, 0x1, %r27
shls,1 %r3, 0x2, %r3
ands,2 %r24, 0x1, %r24
shls,3 %r5, 0x1, %r28
shls,4 %r5, 0x2, %r5
shls,5 %r15, 0x1, %r29
}
{
shls,0 %r15, 0x2, %r15
shls,1 %r20, 0x3, %r30
shls,2 %r18, 0x3, %r31
shls,3 %r19, 0x3, %r32
shls,4 %r17, 0x1, %r33
shls,5 %r17, 0x2, %r17
}
{
shls,0 %r22, 0x3, %r34
adds,1 %r10, %r25, %r10
adds,2 %r12, %r7, %r7
adds,3 %r11, %r26, %r11
subs,4 %r27, %r18, %r12
adds,5 %r9, %r24, %r9
}
{
adds,0 %r3, %r31, %r3
subs,1 %r28, %r19, %r18
adds,2 %r5, %r32, %r5
subs,3 %r29, %r20, %r19
adds,4 %r15, %r30, %r15
subs,5 %r33, %r22, %r20
}
{
adds,0 %r17, %r34, %r17
sars,1 %r10, 0x1, %r10
adds,2 %r18, %r5, %r5
sars,3 %r7, 0x1, %r7
sars,4 %r11, 0x1, %r11
adds,5 %r19, %r15, %r15
}
{
adds,0 %r12, %r3, %r3
adds,1 %r20, %r17, %r12
sars,2 %r9, 0x1, %r9
qpswitchd,3 %xr21, %xr15
qpswitchd,4 %xr6, %xr17
adds,5 %r15, %r11, %r11
}
{
adds,0 %r5, %r10, %r5
adds,1 %r3, %r9, %r3
adds,2 %r12, %r7, %r7
sxt,3 0x2, %r11, %db[2]
addd,4 0x0, _f64,_lts0 0x3fffd70a3d70a3d7, %dr9
addd,5 0x0, _f64,_lts2 0xbff0000000000000, %dr10
}
{
sxt,0 0x2, %r5, %db[3]
sxt,1 0x2, %r3, %db[4]
sxt,2 0x2, %r7, %dr12
qppackdl,3 %dr10, %dr16, %xr7
addd,4 0x0, _f64,_lts0 0x401c000000000000, %dr10
stw,5 %db[0], 0x0, %r7
}
{
addd,0 0x0, _f64,_lts2 0x4018000000000000, %dr18
addd,2,sm 0x0, %dr12, %db[1]
qpswitchd,3 %xr7, %xr11
qppackdl,4 %dr14, _f64,_lts0 0x4020000000000000, %xr14
stw,5 %db[0], 0x4, %r11
}
{
addd,0 0x0, [ _f64,_lts0 .LC.2 ], %dr16
qppackdl,3 %dr16, %dr4, %xr4
qpswitchd,4 %xr14, %xr5
stw,5 %db[0], 0x8, %r5
}
{
addd,2,sm 0x0, [ _f64,_lts0 .LC.1 ], %db[0]
qpswitchd,3 %xr4, %xr3
stw,5 %db[0], 0xc, %r3
}
{
std,2 0x18, %dr2, %db[3]
std,5 %dr2, _f16s,_lts0lo 0x20, %db[4]
}
{
std,2 %dr2, 0x8, %dr12
std,5 0x10, %dr2, %db[2]
}
{
std,2 %dr2, 0x0, %dr13
}
{
call %ctpr1, wbs = 0x12
}
{
nop 4
disp %ctpr1, calloc
addd,0,sm 0x4, 0x0, %db[0]
addd,1 0x8, 0x0, %db[1]
}
{
call %ctpr1, wbs = 0x12
}
{
disp %ctpr1, printf
fmuld,0,sm %dr6, %dr23, %dr6
fmuld,1 %dr11, %dr10, %dr11
fmuld,2 %dr7, %dr10, %dr7
fmuld,3,sm %dr15, %dr23, %dr12
fmuld,4,sm %dr21, %dr23, %dr13
fmuld,5,sm %dr17, %dr23, %dr15
}
{
nop 2
fmuld,0 %dr3, %dr18, %dr3
fmuld,1 %dr4, %dr18, %dr4
fmuld,2 %dr5, %dr18, %dr5
fmuld,3 %dr14, %dr18, %dr10
}
{
nop 1
fdivd,5,sm %dr12, %dr9, %dr12
}
{
nop 1
fdivd,5,sm %dr13, %dr9, %dr13
}
{
nop 1
fdivd,5,sm %dr15, %dr9, %dr14
}
{
nop 7
fdivd,5,sm %dr6, %dr9, %dr6
}
{
nop 1
faddd,3,sm %dr11, %dr12, %dr9
}
{
nop 1
faddd,3,sm %dr7, %dr13, %dr12
}
{
nop 1
faddd,3,sm %dr11, %dr14, %dr11
faddd,4,sm %dr9, %dr3, %dr3
}
{
nop 1
faddd,3,sm %dr7, %dr6, %dr6
faddd,4,sm %dr12, %dr4, %dr4
}
{
nop 1
faddd,3,sm %dr11, %dr5, %dr5
}
{
nop 1
addd,0,sm 0x0, %dr3, %db[4]
faddd,3,sm %dr6, %dr10, %dr6
}
{
nop 1
addd,0,sm 0x0, %dr4, %db[3]
}
{
addd,0,sm 0x0, %dr5, %db[2]
std,5 %db[0], 0x0, %dr6
}
{
std,5 %db[0], 0x8, %dr5
}
{
addd,0,sm 0x0, %dr6, %db[1]
std,5 %db[0], _f16s,_lts0lo 0x10, %dr4
}
{
addd,0,sm 0x0, [ _f64,_lts1 .LC.2 ], %db[0]
std,5 %db[0], _f16s,_lts0lo 0x18, %dr3
}
{
std,2 %dr2, _f16s,_lts0lo 0x20, %dr3
std,5 0x18, %dr2, %dr4
}
{
std,2 0x10, %dr2, %dr5
std,5 %dr2, 0x8, %dr6
}
{
std,2 %dr2, 0x0, %dr16
}
{
call %ctpr1, wbs = 0x12
}
{
nop 5
return %ctpr3
addd,3 0x0, 0x0, %dr0
}
{
ct %ctpr3
}
.LC.1:
.ascii "%d %d %d %d\n\000"
.LC.2:
.ascii "%f %f %f %f\n\000"
elbrus_optimizing_compiler_v1.24.10_Mar__8_2020 = 0x0
Can I ask what the plug is about? Getting the word on Elbrus out there, support from the IC?
Is any of this hardware readily available outside of Russia?
Look for IcepeakITX ELBRUS-8CB.
Interesting. So they crowdfunded the board but the CPU is sourced from Russia, basically?
Oof, too rich for my blood.
-------------------------------------------------------------
STREAM version $Revision: 5.10 $
-------------------------------------------------------------
This system uses 8 bytes per array element.
-------------------------------------------------------------
Array size = 100000000 (elements), Offset = 0 (elements)
Memory per array = 762.9 MiB (= 0.7 GiB).
Total memory required = 2288.8 MiB (= 2.2 GiB).
Each kernel will be executed 10 times.
The *best* time for each kernel (excluding the first iteration)
will be used to compute the reported bandwidth.
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 81921 microseconds.
(= 81921 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function Best Rate MB/s Avg time Min time Max time
Copy: 16674.2 0.102085 0.095957 0.142147
Scale: 16511.2 0.101129 0.096904 0.129083
Add: 19486.0 0.126514 0.123165 0.140751
Triad: 19358.4 0.124993 0.123977 0.125983
-------------------------------------------------------------
Solution Validates: avg error less than 1.000000e-13 on all three arrays
-------------------------------------------------------------
$ cc stream.c -O4 -DSTREAM_ARRAY_SIZE=100000000 -DSTREAM_TYPE=double -fopenmp
entityfx@yukari:~/STREAM$ ./a.out
-------------------------------------------------------------
STREAM version $Revision: 5.10 $
-------------------------------------------------------------
This system uses 8 bytes per array element.
-------------------------------------------------------------
Array size = 100000000 (elements), Offset = 0 (elements)
Memory per array = 762.9 MiB (= 0.7 GiB).
Total memory required = 2288.8 MiB (= 2.2 GiB).
Each kernel will be executed 10 times.
The *best* time for each kernel (excluding the first iteration)
will be used to compute the reported bandwidth.
-------------------------------------------------------------
Number of Threads requested = 32
Number of Threads counted = 32
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 31778 microseconds.
(= 31778 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function Best Rate MB/s Avg time Min time Max time
Copy: 33722.5 0.117056 0.047446 0.350954
Scale: 32959.0 0.133574 0.048545 0.429537
Add: 37047.9 0.193759 0.064781 0.668979
Triad: 36455.3 0.165948 0.065834 0.411430
-------------------------------------------------------------
Solution Validates: avg error less than 1.000000e-13 on all three arrays
-------------------------------------------------------------
STREAM version $Revision: 5.10 $
-------------------------------------------------------------
This system uses 8 bytes per array element.
-------------------------------------------------------------
Array size = 100000000 (elements), Offset = 0 (elements)
Memory per array = 762.9 MiB (= 0.7 GiB).
Total memory required = 2288.8 MiB (= 2.2 GiB).
Each kernel will be executed 10 times.
The *best* time for each kernel (excluding the first iteration)
will be used to compute the reported bandwidth.
-------------------------------------------------------------
Number of Threads requested = 8
Number of Threads counted = 8
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 45269 microseconds.
(= 45269 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function Best Rate MB/s Avg time Min time Max time
Copy: 23097.3 0.069884 0.069272 0.070459
Scale: 23137.4 0.069689 0.069152 0.070604
Add: 25578.7 0.094895 0.093828 0.096911
Triad: 25643.2 0.094898 0.093592 0.096150
-------------------------------------------------------------
Solution Validates: avg error less than 1.000000e-13 on all three arrays
-------------------------------------------------------------