
DNet responds to P4 core inquiry

Train

Lifer
Jun 22, 2000
13,587
82
91
www.bing.com
Has any work been done yet on a d.net core (RC5) for Pentium 4 processors?
I would really be interested in any benchmarks you have, even if still
preliminary.

Thanks,
- Train
----------------------------
Not yet.

If you think one of the current cores is best suited for the P4, or
if you can improve on one of the current cores, I'd be happy to hear
back from you!

Ivo
----------------------------
Well, I've looked at the Pentium 3 core source code and I've also been trying to learn the new instructions from Intel for the P4 and SSE2. It will take a while, but I have a few ideas that may take advantage of the new instructions. I'll let you know.
- Train



Looks like I'll have to break out the old assembly language to tackle this one. Maybe it would help if I actually had a P4 to work with. I think I'm just going to make a few different versions and send them off to Anand to see if he will benchmark them.

One thing I noticed is that the source for the P3 core only uses 2 pipes. I guess the P3 only HAS 2 pipes. If the P4 has 20, that's a huge difference right there. The P3 core was hard-coded to use only 2 pipes (one way to make sure it's a P3), which means that if you ran it on a P4 you would be using 10% of the available pipeline. The P3 core is also using MMX, not SSE, so moving up to SSE2 will again make a huge difference. This should be fun..
 

Joe O

Senior member
Oct 11, 1999
961
0
0


<< This should be fun. >>

I envy you!

You might want to take a look at the G4 Altivec code. It also has 128 bit wide registers, and the bit-slice algorithm might make sense.
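
For anyone unfamiliar with the bit-slice idea, here is a toy sketch in C. It is an illustration of bit-slicing in general, not the G4 core's actual code, and all names (`to_slices`, `from_slices`, `sliced_rotl`) are made up for this example. It uses 32-bit words so it runs anywhere; with a 128-bit register each slice would carry 128 instances instead of 32.

```c
#include <assert.h>
#include <stdint.h>

#define WIDTH 32  /* bits per word being processed */
#define LANES 32  /* independent instances, one per bit of a uint32_t */

/* Transpose: afterwards, bit i of slices[b] is bit b of words[i]. */
static void to_slices(const uint32_t words[LANES], uint32_t slices[WIDTH])
{
    for (int b = 0; b < WIDTH; b++) {
        uint32_t s = 0;
        for (int i = 0; i < LANES; i++)
            s |= ((words[i] >> b) & 1u) << i;
        slices[b] = s;
    }
}

/* Inverse transpose, back to the ordinary one-word-per-instance layout. */
static void from_slices(const uint32_t slices[WIDTH], uint32_t words[LANES])
{
    for (int i = 0; i < LANES; i++) {
        uint32_t w = 0;
        for (int b = 0; b < WIDTH; b++)
            w |= ((slices[b] >> i) & 1u) << b;
        words[i] = w;
    }
}

/* Rotate every instance left by a fixed n: bit b becomes bit (b + n) mod 32,
   so the rotate is only a renumbering of slices, with no shift instructions. */
static void sliced_rotl(const uint32_t in[WIDTH], uint32_t out[WIDTH], unsigned n)
{
    assert(in != out);  /* this simple version is not safe in place */
    for (unsigned b = 0; b < WIDTH; b++)
        out[(b + n) % WIDTH] = in[b];
}
```

One caveat: the renumbering trick only covers rotates by a fixed amount. RC5's rotates depend on the data, so each instance in a slice may need a different amount, and a bit-sliced version would have to handle that some other way.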
 

Train

Lifer
Jun 22, 2000
13,587
82
91
www.bing.com
Hmm, I didn't even think of that, great idea Joe!

I just assumed the P3 core would be the best to go off of, but the G4 core actually would make more sense.
 

JWMiddleton

Diamond Member
Aug 10, 2000
5,686
172
106
Hey Train,

I believe that 20 refers to the depth of the instruction pipeline, not the number of pipelines. I don't know how many parallel pipelines they have.

Sounds like you have a fun project ahead of you. Good luck with it!
 

jinsonxu

Golden Member
Aug 19, 2000
1,370
0
0
Sounds like great fun!

Train, when you want to work on the Athlon core, let me know. I offer the services of my Duron. :)
 

sciencewhiz

Diamond Member
Jun 30, 2000
5,885
8
81
Train,

The best starting point would probably be the P3 core, unless you know assembly for the PPC.

The main reason RC5 is so slow on the P4 is that it relies heavily on rotate instructions, which take 4 cycles to complete on the P4, vs 1 on a P3 and Athlon. I found an Intel whitepaper on the P4, which gives some hints on how to get around having to use rotates. It is on my other computer. Once I get over there, I'll post the link.

My suggestion for working on a core is to first try and get rid of the rotates. This should give a fairly nice improvement. After that, and after you are comfortable with the code, you could try messing with SSE2.
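
As a rough idea of what "getting rid of the rotates" could look like, here is a minimal C sketch that builds a rotate out of two shifts and an OR. This is one common substitution, not necessarily what the whitepaper recommends, and the names `rotl32_shifts` and `rc5_half_round` are invented for this example. The second function shows where the rotate sits in an RC5 round step, A = ((A xor B) <<< B) + S.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate x left by n bits using only shifts and OR.
   The caller must pass n in the range 0..31. The "& 31u" keeps the
   right-shift count at 0 (not 32) when n == 0, since shifting a
   32-bit value by 32 is undefined in C. */
static uint32_t rotl32_shifts(uint32_t x, unsigned n)
{
    assert(n < 32u);
    return (x << n) | (x >> ((32u - n) & 31u));
}

/* One half of an RC5 round: the rotate amount is the low 5 bits of b,
   so it changes with the data on every step. s is the round subkey. */
static uint32_t rc5_half_round(uint32_t a, uint32_t b, uint32_t s)
{
    return rotl32_shifts(a ^ b, b & 31u) + s;
}
```

Note that a compiler may recognize the shift/OR pattern and emit a rotate instruction anyway, so whether this actually avoids the slow instruction has to be checked in the generated code.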

The other important thing to find out is which core the client chooses automatically. Since Anand doesn't always answer his e-mail, you might also want to try to find a more reliable source of a P4.
 

Train

Lifer
Jun 22, 2000
13,587
82
91
www.bing.com
I fired off an email to Anand asking if he would benchmark a re-compiled core; hopefully he'll respond positively.

Intel is going to send me a 30-day evaluation of their new Version 5 C++ compiler, which has full IA-32 micro-architecture support and optimizations. It won't be as perfect as hand-coding SSE2, but it will be a lot easier to get something going. I have the IA-32 SSE2 manual, and it's a good 1,000 pages, so I don't see myself making any amazing breakthroughs for a while. One goody I did find, though, was some hand-coded ASM on Intel's site that demonstrates optimized math on IA-32 chips (P4 and Itanium), which should definitely offer some good insight as to where to go next.
 

Riv

Member
Oct 5, 2000
80
0
0
Hmm, last time I checked the dnet client source there was no special P3 core at all, just the PPro class core. That core doesn't even use MMX (but the P5 MMX core sucks on a P3 anyway...). To quickly start coding SSE2, the P4 instruction set reference manual and the basic architecture manual should be adequate. Although there are a lot of pages, there isn't a lot of text. Good luck!
 

toph99

Diamond Member
Aug 25, 2000
5,505
0
0
I don't know a whole lot about any of this stuff, but why start with the PIII core? After all, the P4 and PIII are different architectures (right?). Maybe one of the other cores would be a better starting place? Just an idea :)
 

Qythyx

Member
Feb 6, 2000
87
0
0
You might want to talk to Tom at Tom's Hardware. I know he's been benchmarking the P4 for MPEG4 encoding and would probably really appreciate a new benchmark tool.
 

Fandu

Golden Member
Oct 9, 1999
1,341
0
0
Just to let everyone know, we've already contacted Intel and they are looking into writing a P4 SSE2 core. We're getting some 512bit RSA SSE2 code, but I don't think it would be worth trying to convert that core. I think it's probably best to let Intel have a crack at it.
 

Riv

Member
Oct 5, 2000
80
0
0
If Intel tries to write a P4 RC5 core I hope they realize the usefulness of a SIMD rotate instruction, especially if each partial register could be rotated by a different amount than the others (e.g. rotate the high 32 bits 4 steps to the left and rotate the low 32 bits 7 steps to the left) =)
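
To make the wish concrete, here is a small C sketch, with invented names (`rotl4_uniform`, `rotl4_per_lane`). The first function emulates a packed rotate with two SSE2 shifts and an OR, where every 32-bit lane gets the same amount; it falls back to plain C if the compiler is not targeting SSE2. The second function is only a scalar reference for the per-lane behaviour described above, and it uses four 32-bit lanes rather than the two in the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Rotate each of four 32-bit lanes left by the SAME amount n. */
static void rotl4_uniform(uint32_t v[4], unsigned n)
{
    n &= 31u;
#if defined(__SSE2__)
    __m128i x  = _mm_loadu_si128((const __m128i *)v);
    /* The packed shifts give 0 for a count of 32, so n == 0 also works. */
    __m128i hi = _mm_sll_epi32(x, _mm_cvtsi32_si128((int)n));
    __m128i lo = _mm_srl_epi32(x, _mm_cvtsi32_si128((int)(32u - n)));
    _mm_storeu_si128((__m128i *)v, _mm_or_si128(hi, lo));
#else
    for (int i = 0; i < 4; i++)
        v[i] = (v[i] << n) | (v[i] >> ((32u - n) & 31u));
#endif
}

/* Scalar reference for the wished-for behaviour: lane i rotates by n[i]. */
static void rotl4_per_lane(uint32_t v[4], const unsigned n[4])
{
    assert(v != NULL && n != NULL);
    for (int i = 0; i < 4; i++) {
        unsigned k = n[i] & 31u;
        v[i] = (v[i] << k) | (v[i] >> ((32u - k) & 31u));
    }
}
```

The uniform version costs three packed operations per rotate instead of one, and the per-lane version has no packed form in this sketch, which is the gap the suggestion is pointing at.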
 

Fandu

Golden Member
Oct 9, 1999
1,341
0
0
I didn't bother making suggestions to the engineer, I pointed him to the source code and I'll just let them come up with the best solution.
 

Riv

Member
Oct 5, 2000
80
0
0
Oh, that was a hint for a future instruction. MMX, SSE and SSE2 lack such an instruction :(
 

Train

Lifer
Jun 22, 2000
13,587
82
91
www.bing.com
Well, I'm still going to compile my own. As soon as the evaluation version of Intel's C++ compiler with IA-32 support comes out, I'm going to recompile some cores and play around with them, and hopefully get someone with a P4 to test some of them.

Intel could take a year, or longer, to get around to this.
 

Train

Lifer
Jun 22, 2000
13,587
82
91
www.bing.com
Oh, and the new Version 5 compiler also has more refined optimizations for SSE, so we just might see a better P3 core as well.
 

idealego

Junior Member
Jan 17, 2000
3
0
0
As someone else already mentioned, there is no P3 core; the core used by everything from a PPro and up is the PPro core. All cores are hand-coded in asm, so I fail to see how a newer compiler would speed things up.

If I had a P4 I might take a shot at writing some new code myself, but I find it too frustrating writing for something you can't test yourself :)
 

Train

Lifer
Jun 22, 2000
13,587
82
91
www.bing.com
Only the PPro core was coded, and it doesn't take advantage of the new instructions in SSE, only those of MMX. New compilers know about the new instructions of both SSE and SSE2, and will use the proper instructions to speed things up wherever possible.
 

SSP

Lifer
Oct 11, 1999
17,727
0
0
I thought SSE only has FPU instructions? I know SSE2 has those 128-bit registers that might be useful.
 

ViRGE

Elite Member, Moderator Emeritus
Oct 9, 1999
31,516
167
106
Ya, SSE only has FPU instructions. It's SSE2 that has the register enhancements and integer instructions.
 

charrison

Lifer
Oct 13, 1999
17,033
1
81
Let's hope Ivo can figure out the difference between pipes and pipeline stages before he gets too far into the optimization.
 

Riv

Member
Oct 5, 2000
80
0
0
There were a few integer instructions introduced with SSE, but they only operated on 64-bit (the MMX) registers. SSE2 brings 128-bit SIMD integer instructions.