Nehalem


Nemesis 1

Lifer
Dec 30, 2006
No, I don't think Nehalem will be an EPIC (VLIW) core. I believe Larrabee will be EPIC and Nehalem will leverage that tech. Looking at Tera-scale, it seems to me that for a while Intel is going to have a master core (Nehalem/Gesher), and will use up to four Larrabee cards in PCI-E slots plus another in an extra CPU socket. I think it will be a while before we see tera-scale-only CPUs using EPIC, even though Intel may call 8-core Nehalems tera-scale rather than many-core.

Back to the transistors themselves: a transistor has two states, open and closed. Period. All logic is determined by the state of the transistors in a bit/byte/block.

Basically, what I am saying is: SOFTWARE is what makes a CPU functional.

As far as I understood the hardware inside a CPU, logic circuits are software, some fixed-function and some speculative; otherwise they couldn't translate the state the transistor is in. Now, if we could get rid of that hard-locked software (logic circuits)...

If simple in-order CPUs using VLIW can get rid of some of those logic circuits, I see that as only good. The problem is that x86 needs logic circuits to operate: stores, ordering, and such.

With all the x86 programs out there, we can't just switch to another type of CPU.

If we have software that can take advantage of in-order CPUs and still run x86 apps, it lets companies use VLIW to engineer a cooler, more efficient processor.

800 million transistors doesn't impress me much, seeing as a switch is about as exciting as watching daisies push up. Software that controls the state of the transistors and can input and execute most efficiently does excite me. x86 isn't where it's at.

We all hail the transistor as something great, which it is, but it's a very simple device: it has two states, open and closed. So if the transistor itself is that simple, I think we want software in the compiler doing the work rather than hardware logic. Just my thought. If the transistor is so simple, keep the rest of the components as simple as possible for best performance and efficiency. Sure, coding for VLIW may be a little harder on the programmers, but it will make a better CPU.
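To make the "compiler does the work" idea concrete, here is a minimal sketch of VLIW-style execution in Python, assuming a made-up two-slot bundle format (nothing like any real ISA): the compiler packs independent operations together ahead of time, so the hardware just steps through the slots and needs no scheduling logic.

```python
# Minimal sketch of the VLIW idea (hypothetical 2-slot bundles, not a
# real ISA): the compiler decides at build time which operations can
# run side by side, so the chip needs no reordering hardware at all.

regs = {"r0": 0, "r1": 0, "r2": 0, "r3": 0}

# Each bundle holds two ops the compiler has proven independent.
program = [
    [("set", "r0", 6), ("set", "r1", 7)],          # independent: share a bundle
    [("mul", "r2", "r0", "r1"), ("set", "r3", 1)],
    [("add", "r2", "r2", "r3"), ("nop",)],         # depends on r2: next bundle
]

def run(bundles):
    for bundle in bundles:
        for op in bundle:  # real hardware would execute these in parallel
            if op[0] == "set":
                regs[op[1]] = op[2]
            elif op[0] == "add":
                regs[op[1]] = regs[op[2]] + regs[op[3]]
            elif op[0] == "mul":
                regs[op[1]] = regs[op[2]] * regs[op[3]]

run(program)
assert regs["r2"] == 43  # 6 * 7 + 1
```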
 

Nemesis 1

Lifer
Dec 30, 2006
You guys scratch your heads at what I'm going on about. I am a lot older than most of you, and my health is not so good. I want to see the tech at this link in use in games before my time is over. Right now Intel is my best chance for that. To achieve it, I believe it will require very simple cores, and lots of them, running VLIW; and to keep the cores simple, a fantastic compiler is required. I just want to see this so badly. I really don't care who does it; I just believe Intel can make it happen the fastest.

http://www.pcper.com/article.p...=506&type=expert&pid=1
 

Lord Banshee

Golden Member
Sep 8, 2004
Originally posted by: Nemesis 1
No, I don't think Nehalem will be an EPIC (VLIW) core. I believe Larrabee will be EPIC and Nehalem will leverage that tech. Looking at Tera-scale, it seems to me that for a while Intel is going to have a master core (Nehalem/Gesher), and will use up to four Larrabee cards in PCI-E slots plus another in an extra CPU socket. I think it will be a while before we see tera-scale-only CPUs using EPIC, even though Intel may call 8-core Nehalems tera-scale rather than many-core.

Back to the transistors themselves: a transistor has two states, open and closed. Period. All logic is determined by the state of the transistors in a bit/byte/block.

Basically, what I am saying is: SOFTWARE is what makes a CPU functional.

As far as I understood the hardware inside a CPU, logic circuits are software, some fixed-function and some speculative; otherwise they couldn't translate the state the transistor is in. Now, if we could get rid of that hard-locked software (logic circuits)...

If simple in-order CPUs using VLIW can get rid of some of those logic circuits, I see that as only good. The problem is that x86 needs logic circuits to operate: stores, ordering, and such.

With all the x86 programs out there, we can't just switch to another type of CPU.

If we have software that can take advantage of in-order CPUs and still run x86 apps, it lets companies use VLIW to engineer a cooler, more efficient processor.

800 million transistors doesn't impress me much, seeing as a switch is about as exciting as watching daisies push up. Software that controls the state of the transistors and can input and execute most efficiently does excite me. x86 isn't where it's at.

We all hail the transistor as something great, which it is, but it's a very simple device: it has two states, open and closed. So if the transistor itself is that simple, I think we want software in the compiler doing the work rather than hardware logic. Just my thought. If the transistor is so simple, keep the rest of the components as simple as possible for best performance and efficiency. Sure, coding for VLIW may be a little harder on the programmers, but it will make a better CPU.

OK, I don't know much about Larrabee, so I'll end that discussion there.

Just one question: have you ever designed any digital circuits at the transistor level, or even at the gate level? A transistor by itself is simple; connecting 800 million of them together is not. Simple as that :)

About "reprogrammable logic", well this technology exist today but is not cheap, takes around 10 times as much area for transistor to transistor, and no where near as fast (1/4 or slower). There has been talks about using part of the core to have programmable logic and the other part being the core CPU, and this idea is possible with todays processing nodes. But again this stuff is not "simple". Anyway the technology you ask about existing in IC chips called FPGAs and CPLDs. The idea of having reprogrammable logic from software is one of the main concepts of "Reconfigurable Computing". Again seeing how much slower such technology is and how nonparallel a normal CPU is i don't thing CPUs will ever go that route, but adding some reprogrammable logic on core for things like specific application speed ups would be possible. For example, if i was running so sort of FEA analysis, the software could program circuits to forum as a massive parallel matrix ALU and the CPU would have direct access to this very specialized hardware. Anyway this kind of introduction of programmable logic probably will not see itself in normal CPUs within 5 years maybe more if ever, again not cost effective technology with you are selling millions-billions of these.


 

Idontcare

Elite Member
Oct 10, 1999
Originally posted by: Nemesis 1
You guys scratch your heads at what I'm going on about. I am a lot older than most of you, and my health is not so good. I want to see the tech at this link in use in games before my time is over. Right now Intel is my best chance for that. To achieve it, I believe it will require very simple cores, and lots of them, running VLIW; and to keep the cores simple, a fantastic compiler is required. I just want to see this so badly. I really don't care who does it; I just believe Intel can make it happen the fastest.

http://www.pcper.com/article.p...=506&type=expert&pid=1

I can appreciate your sense of "hurry up already!". It is this same desire to see more revolution than evolution that I enjoy about reading your threads and posts. I may disagree with the likelihood of a lot of what you write, but you keep your dreams within the realm of possibility (i.e., I have not seen you post that Intel is going to qubits at 32nm, which would be preposterous to expect).

Incidentally, it was my own desire to see the industry scale nodes faster that drove me to work in the industry; I could think of no better way to make sure it happened a bit faster than being there in person. TI was second only to Intel in releasing 65nm logic products, with both companies shipping 65nm product in late 2005. Sadly, you won't see TI shipping anything 45nm from their own fabs anytime soon, if ever.
 

BrownTown

Diamond Member
Dec 1, 2005
Nemesis 1, the problem here is that while you know a little about a lot of the things in a CPU, you don't actually know a lot about any one of them, and consequently there are errors at every level of your understanding. For example, a statement like "logic circuits are software" is silly: logic circuits are hardware. Hardware is the stuff made out of transistors; software is the stuff made out of lines of code. Here is a VERY basic sketch of how this all goes:

You start with your nice little transistor (ON/OFF, in a very simplified sense). Take four of these transistors and put them together and you can make a "logic gate": NAND/NOR/AND/OR. You can then take these gates and put them together and get "logic circuits"; say I take a handful of those gates and make a "1-bit adder". Now you build up from there: I can take 32 one-bit adders, put them together, and get a "32-bit ripple-carry adder". But that alone is no good, because I can't control it. However, those same gates can make other stuff: put six together and get a "1-bit SRAM cell" (cache memory); put a bunch more together and I can make decoders, encoders, multiplexers, registers, and everything else I could ever need. So I slap an adder, a multiplier, a shifter, bitwise logic circuits, etc. together and make an ALU; then I stick some registers together to make a register file for the ALU to use, and put in some decoders to take in instructions from the outside world and control the ALU. That is an absurdly basic hardware CPU; the inputs to the decoders I might take out on pins to the outside world.
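Here is that build-up as a tiny Python model (a software sketch only, not how you would actually design silicon): gates as boolean functions, a 1-bit full adder made of gates, and a ripple-carry adder made of full adders.

```python
# Software model of the build-up: gates -> 1-bit adder -> ripple-carry
# adder. Each level is composed only from the level below it.

def nand(a, b): return 1 - (a & b)

# Every other gate can be built from NAND alone.
def not_(a):    return nand(a, a)
def and_(a, b): return not_(nand(a, b))
def or_(a, b):  return nand(not_(a), not_(b))
def xor(a, b):  return and_(or_(a, b), nand(a, b))

def full_adder(a, b, carry_in):
    """1-bit adder: returns (sum, carry_out)."""
    s1 = xor(a, b)
    return xor(s1, carry_in), or_(and_(a, b), and_(s1, carry_in))

def ripple_carry_add(a_bits, b_bits):
    """Add two little-endian bit lists (LSB first)."""
    carry, out = 0, []
    for a, b in zip(a_bits, b_bits):
        s, carry = full_adder(a, b, carry)
        out.append(s)
    return out + [carry]

# 3 + 3 = 6: [1,1,0] is 0b011 LSB-first, result [0,1,1,0] is 0b0110.
assert ripple_carry_add([1, 1, 0], [1, 1, 0]) == [0, 1, 1, 0]
```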

Then, somewhere in another dimension (the software one), I am writing nice pleasant code in the C language. I hit the "compile" button and boom, the compiler translates all my C code into assembly (could be x86, could be a VLIW encoding, whatever). The exact nature of the instructions it generates is already defined in something called an "instruction set architecture" (ISA). This is where the hardware and the software meet: the two sides have agreed upon a certain list of instructions (like the x86 list), and the compiler knows that the CPU will understand what those instructions mean. However, the compiler does NOT know HOW the CPU will do this. It does not know if the CPU is in-order or OOO; it does not know if it is single-issue or superscalar; it does not know how much cache the CPU has, or how fast it will run, or anything like that. The compiler can only generate the stream of instructions based on some assumptions about what might help the CPU. Then that instruction stream is stored in memory somewhere and eventually reaches our CPU above, and the CPU can use its own logic to try to extract even MORE from the code; in a modern CPU this means a lot goes on in terms of reordering instructions, issuing to multiple functional units in parallel, etc.
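A toy sketch of that ISA-as-contract point, using a hypothetical three-instruction machine (nothing like real x86): the "compiler" side knows only the agreed-upon encoding, and the "CPU" side is free to execute it however it likes.

```python
# Toy illustration of an ISA as the hardware/software contract
# (a hypothetical 3-instruction machine, not any real ISA).

ISA = {"LOAD_IMM": 0x1, "ADD": 0x2, "HALT": 0xF}  # the agreed-upon list

def compile_program():
    """Software side: emit (opcode, operand) pairs per the ISA."""
    return [
        (ISA["LOAD_IMM"], 40),  # acc = 40
        (ISA["ADD"], 2),        # acc += 2
        (ISA["HALT"], 0),
    ]

def run(program):
    """Hardware side: decode and execute. Whether this loop is
    in-order, OOO, or superscalar is invisible to the compiler;
    it only ever sees the encoding above."""
    acc = 0
    for opcode, operand in program:
        if opcode == ISA["LOAD_IMM"]:
            acc = operand
        elif opcode == ISA["ADD"]:
            acc += operand
        elif opcode == ISA["HALT"]:
            break
    return acc

assert run(compile_program()) == 42
```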

The thing you need to understand here is that the software world and the hardware world are VERY, VERY different. The compiler is more like a translator between languages, translating from a higher-level language to assembly. It tries to do that as intelligently as possible, but in an x86 CPU, for example, there is no way for it to suggest to the CPU how to issue the instructions, because that is not part of the ISA. Also, it does not make sense to talk about how many "issues" a compiler is; it does not issue ANY instructions, it just translates them.

In terms of Nehalem: the CPU is an x86 CPU. It can run every single piece of x86 code that you or I are running right now on our own computers, without any need to recompile. It is not exactly known what all the improvements are that will let it gain extra speed, but a few ARE known, which I will try to relate to you.

One is an on-chip memory controller. This is a nice addition for two reasons: it should both increase bandwidth and decrease latency to memory. Currently the Core 2 CPU talks to memory over the front-side bus (FSB) to the northbridge, which has the memory controller. The FSB is a pretty low-bandwidth piece of crap these days, so even if you have two channels of DDR2 you are not getting full bandwidth, because the FSB can't handle all that information. Also, since the data takes a detour through the northbridge, it is slower than going directly to the CPU. This is "latency", and it is very important, because if the CPU can't get the data it needs soon enough, it just sits there being worthless for several clock cycles when it could be working.

Another improvement is simultaneous multithreading (SMT), which means each CPU core can handle two threads at once. The reason this is nice is that while the CPU can issue four instructions at a time, there are very rarely four possible instructions from a single thread all ready at the same time. With SMT, all those unused functional units can still be working on the second thread. This requires some more hardware, of course, to decode and control two threads at the same time, but it *should* be worth it in terms of speedup.

There also appear to be some changes to the cache structure, with three levels of cache. This means the L2 cache is smaller. Depending on the type of code and how well this is done, it *could* provide a speedup: we would hope that the smaller L2 cache will have a lower latency, which can be a HUGE factor in how fast code runs, and as long as the L3 isn't too slow, this will likely provide some improvement.

Also, there is talk of how Nehalem will "fully leverage" the 45nm node. This is not fully explained; however, it is known that at the 45nm node Intel was able to get surprisingly high drive current on the PMOS transistors. This *could* mean a tweaking of either the PMOS-to-NMOS W/L ratios, or of the logic gates used, to take advantage of the fact that the PMOS is less of a slowdown now. (See, this is 100% speculation based on REAL facts that actually make sense, and at least I admit there is no proof of it, but it still makes a lot more sense than what Nemesis 1 is saying.)

LINK:
PMOS drive article
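A back-of-the-envelope model of the SMT point above, with made-up numbers rather than Nehalem measurements: if a 4-wide core finds fewer than four ready instructions per cycle in one thread, a second thread can fill some of the idle issue slots.

```python
# Toy model of SMT on a 4-wide core (invented numbers, not Nehalem
# data): each cycle a thread has some number of ready instructions,
# and the core can issue at most ISSUE_WIDTH per cycle in total.

ISSUE_WIDTH = 4

# Hypothetical per-cycle ready-instruction counts for two threads.
thread_a = [2, 1, 3, 2, 1, 2]
thread_b = [1, 2, 2, 1, 3, 1]

def issued_single(thread):
    # One thread alone: capped by the issue width each cycle.
    return sum(min(ready, ISSUE_WIDTH) for ready in thread)

def issued_smt(a, b):
    # Two threads sharing the core: thread A issues first, thread B
    # fills whatever issue slots are left over that cycle.
    total = 0
    for ra, rb in zip(a, b):
        used = min(ra, ISSUE_WIDTH)
        total += used + min(rb, ISSUE_WIDTH - used)
    return total

print(issued_single(thread_a))         # 11 instructions in 6 cycles
print(issued_smt(thread_a, thread_b))  # 20 instructions in the same 6 cycles
```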
 

BrownTown

Diamond Member
Dec 1, 2005
Because TI got rid of their research group for new semiconductor nodes, and we will NEVER see it from them...
 

Nemesis 1

Lifer
Dec 30, 2006
BrownTown, I understand almost everything you talked about, but I am stuck on hardware logic; even the way you explained it, I don't get it. Take the fixed-function case: I always thought that was software. Maybe "software" is really a bad word to use, but that circuit performs a function, and a transistor in and of itself can't do anything other than 0/1.

So in order for the circuit to operate, as you said, the transistors are grouped, and in a certain state it does this or that. To me that's programming, so in effect I view it as software.
I think I understand how they are grouped and such, but I don't understand the correct terms to describe that state. If you have a bit, doesn't that require software to leverage the state of the grouped transistors? Bits, bytes, blocks. Let's say I have a group of transistors holding 10011010: doesn't it require programming to address that state, even if it's hardware?

I don't know, but that's how I always viewed it.
 

BrownTown

Diamond Member
Dec 1, 2005
Maybe you should Google "digital logic", or look up some pictures of what the different logic gates look like. There is no outside influence on the transistors in the logic gates I am talking about; by wiring up transistors in different ways you get the logical functions (NAND/NOR/AND/OR). You can find diagrams of these gates online and work through for yourself how they achieve those functions. It would take a good amount of time for me to explain how a MOSFET works and all of digital logic; I mean, I have taken 6-8 different college courses covering how transistors work and are fabricated and how logic circuits are designed, so it's not the easiest thing to understand. But if you simplify it enough (treat a transistor as just a switch), it shouldn't take too long to get the basics down.
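To illustrate the transistor-as-a-switch simplification, here is a rough Python model of a CMOS NAND gate (very idealized, with none of the real electrical behavior): the output is pulled low only when both series pull-down switches conduct.

```python
# Transistors as ideal switches: a CMOS NAND gate has two series NMOS
# pulling the output to ground and two parallel PMOS pulling it to the
# supply. No programming involved; the function comes purely from how
# the four switches are wired.

def cmos_nand(a, b):
    # NMOS conducts when its gate is 1; two in series to ground.
    pull_down = (a == 1) and (b == 1)
    # PMOS conducts when its gate is 0; two in parallel to Vdd.
    pull_up = (a == 0) or (b == 0)
    assert pull_up != pull_down  # exactly one network conducts
    return 1 if pull_up else 0

for a in (0, 1):
    for b in (0, 1):
        print(a, b, cmos_nand(a, b))  # only inputs 1,1 give output 0
```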
 

Lord Banshee

Golden Member
Sep 8, 2004
lol, that PMOS article BrownTown posted: the professor in the last paragraph, Scott Thompson, was my professor last semester for Digital Integrated Circuits class :p I feel special, lol.

Also, a very good read there, BrownTown; your knowledge of the EE field always impresses me.
 

Idontcare

Elite Member
Oct 10, 1999
Originally posted by: Nemesis 1
I assume TI is Texas Instruments. May I ask why we won't see 45nm from them anytime soon?

Correct. TI = Texas Instruments. They shuttered their entire semiconductor R&D program in spring of 2007. At the time I was one of many working on 32nm (and tying up loose ends on 45nm).

No one was more surprised than Sun, who was/is 100% reliant on TI as the sole-source manufacturer for their SPARC and Niagara product lines.

On the flip side, I can't imagine a nicer gift to IBM and Intel than TI bowing out of the market. Who doesn't like less competition?

Needless to say my exposure to that segment of the industry (the bleeding edge process development stuff) gives me a somewhat unique perspective into what is reasonable or unreasonable to expect from the industry in the coming decade or so.

That Intel is the king comes as no surprise; they have no excuse (resource-wise) to be anything but the best there can be. What has always amazed me is just how well AMD has done with vastly fewer resources, both design- and process-wise. You can always do more when you have more, but to do more (or even just the same, or even just slightly less) while operating with much, much less is rather special. No one expects the Kenyan toboggan team to win gold at the 2010 Olympics in Vancouver, but if they won silver or bronze, I think folks would be absolutely astounded.

At any rate, if you are expecting great things to happen in your lifetime, then you are best off expecting them to come from Intel before any other company. That's just common sense, aligning expectations with resources. If it were 1950 and you wanted to see a man on the moon before you died, then you were best off expecting it to happen at the hands of the well-resourced USA or USSR, rather than deluding yourself into thinking it would be done first by Canada.
 

Nemesis 1

Lifer
Dec 30, 2006
BrownTown, I understand that fixed-function logic has no outside programming changing its function. But isn't a logic circuit a mini-program?

Before I became a nurse, I was a maintenance tech; I have every certification you can get. So I had to reprogram more than one EPROM in that time period. I understand that burning an EPROM is a lot different from what we're discussing, but this may be why it's so hard for me to grasp what you're saying. We always wrote it as "E-Prom"; I guess to avoid misunderstanding I should write it as EEPROM.


I always get EPROMs and EEPROMs mixed up. The big difference for me is how they're programmed: I think EPROMs have to be burned, while EEPROMs can be changed electrically. I could have that backwards, as I have always mixed them up.
 

BrownTown

Diamond Member
Dec 1, 2005
In EPROMs and EEPROMs, fuses or floating-gate transistors (maybe other things too, I don't want to look it all up) are used to store data. In the things we are talking about (a CPU), that is not something that is used. What you appear to be describing would be something like a CPLD or FPGA implementation of a "soft core" CPU; that is not how an ASIC CPU is manufactured. Instead, all the connections are hardwired and can NEVER be changed.

In terms of the control of a CPU, it is all done via the pins. The instructions are all "codes" which the decoders convert into different control lines on the chip, which control different parts. So an instruction might contain codes for the operand registers and the destination register, as well as a code for the operation you want, etc. (x86 is pretty complicated and I have never worked with it, but you get the idea). The only way the software controls the hardware is through those codes. That is the point people here are trying to make concerning your multiple posts about the Elbrus compiler: the compiler can't see into the CPU the way you suggest.

In a different ISA you *could* give a lot more control to the compiler, which is the idea behind EPIC: the compiler does a lot of the control logic instead of the CPU. Obviously this might be nice for something like an 80-core CPU, because you get all that hardware off the chip so the cores are much smaller, but the compiler is not going to be as good as the hardware. So maybe for the tera-scale project or Larrabee the compiler would be a HUGE factor in performance, and maybe something like this Elbrus compiler technology could be used, but the point is that you need to completely redesign the CPU to do this, and Nehalem is not completely redesigned. Nehalem is an EVOLUTIONARY step over Core 2. Larrabee et al. are not, but at the same time they are not something we know much of anything about, and there is no good way for us to quantify their performance. Given how different they are, and the lack of knowledge Intel is likely to supply us with anytime soon, we don't even have any real understanding of this chip.
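A crude sketch of those "codes driving control lines", using a made-up 8-bit instruction format (not real x86): the decoder just splits fixed bit fields out of the instruction word, and those fields directly select the registers and the operation.

```python
# Crude model of a hardwired decoder (made-up 8-bit format, not x86):
# bits [7:6] opcode, [5:3] destination register, [2:0] source register.
# The "control lines" are just the decoded fields; nothing on the chip
# is reprogrammed, because the wiring that does this split is fixed.

REGS = [0] * 8  # tiny register file

def decode(instr):
    opcode = (instr >> 6) & 0b11
    dst = (instr >> 3) & 0b111
    src = instr & 0b111
    return opcode, dst, src

def execute(instr):
    opcode, dst, src = decode(instr)
    if opcode == 0b00:    # MOV dst, src
        REGS[dst] = REGS[src]
    elif opcode == 0b01:  # ADD dst, src
        REGS[dst] += REGS[src]
    elif opcode == 0b10:  # NOT dst (src ignored)
        REGS[dst] = ~REGS[dst]

REGS[1] = 5
execute(0b00_010_001)  # MOV r2, r1
execute(0b01_010_001)  # ADD r2, r1
assert REGS[2] == 10
```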

NOTE: I don't really know a single thing about Larrabee, and I know EPIC only in theory; I have no knowledge of Itanium, so there may be (probably are) some errors in that part. But that's how it is to the best of my knowledge. Maybe someone who knows about Itanium can fill you in on that more; Wikipedia definitely can if you really care.

EDIT: Oh yeah, as for ray-traced graphics: it would be possible to do that now to a pretty good extent if a ray-tracing processor were developed. Right now people use CPUs or maybe GPUs for those calculations, and neither is designed for the task. It would be like using software graphics acceleration on a CPU instead of a hardware GPU; obviously the performance is FAR less. Same for ray tracing: if it were done in hardware instead of software, it could be sped up considerably.
 

Phynaz

Lifer
Mar 13, 2006
They shuttered their entire semiconductor R&D program in spring of 2007. At the time I was one of many working on 32nm (and tying up loose ends on 45nm).

Crap, sorry to hear that.

What did you do? Are you still with them in some capacity?
 

Nemesis 1

Lifer
Dec 30, 2006
Yes, I understand we're not talking apples to apples; a CPU is a different animal than a memory device. It was to show why I don't understand the terminology "hardware logic".

I don't believe Nehalem will be anything other than an x86 processor.

I believe Larrabee will have a super compiler that translates x86 code in real time.

I also believe the compiler on Nehalem will leverage Larrabee, seeing as how the man who did the majority of the work on the Elbrus compiler works for Intel, and in fact is the head man on Intel compilers. This makes sense.

So basically, what I believe is this:

If you took a Larrabee PCI-E card and ran it on an AMD machine, it would either not work or its performance would be greatly diminished compared to an Intel system, because of the compiler on the Intel CPU. I think when Intel uses Larrabee as a GPU, it is going to use a lot of CPU resources, wasting nothing.
 

Idontcare

Elite Member
Oct 10, 1999
Originally posted by: Phynaz
They shuttered their entire semiconductor R&D program in spring of 2007. At the time I was one of many working on 32nm (and tying up loose ends on 45nm).

Crap, sorry to hear that.

What did you do? Are you still with them in some capacity?

I was a process development engineer working on both BEOL and FEOL processes; high-k/metal gate (HiK/MG), for instance.

They laid off some 95% of the R&D people, as well as closing the Kilby fab (named after Jack Kilby, co-inventor of the IC and recipient of the Nobel prize in physics; a great guy, I really enjoyed talking with him before he passed away) and laying off 95% of those people as well.

In the aftermath they kept six process development guys to optimize their leading-edge analog processes: 130nm design-rule stuff, truly ancient and totally uninspiring work for folks with our experience. I was one of the "chosen six" for a brief period of time before I quit to start my own business. It was a nice paycheck and they expected virtually nothing of us, but without a challenge to inspire you to show up for work, it got kind of old showing up for lunch and then leaving to take an afternoon nap.

So I am no longer in the industry, so to speak. Now I am merely a geek enthusiast consumer.
 

Idontcare

Elite Member
Oct 10, 1999
Originally posted by: Nemesis 1
Yes, I understand we're not talking apples to apples; a CPU is a different animal than a memory device. It was to show why I don't understand the terminology "hardware logic".

I don't believe Nehalem will be anything other than an x86 processor.

I believe Larrabee will have a super compiler that translates x86 code in real time.

I also believe the compiler on Nehalem will leverage Larrabee, seeing as how the man who did the majority of the work on the Elbrus compiler works for Intel, and in fact is the head man on Intel compilers. This makes sense.

So basically, what I believe is this:

If you took a Larrabee PCI-E card and ran it on an AMD machine, it would either not work or its performance would be greatly diminished compared to an Intel system, because of the compiler on the Intel CPU. I think when Intel uses Larrabee as a GPU, it is going to use a lot of CPU resources, wasting nothing.

They certainly have the resources and capability to accomplish what you are outlining here, but is this really their strategy, and does it have any priority within the company for bringing results to the marketplace? We have seen Intel release announcements about plenty of their skunkworks stuff (photonics, bubble memory, etc.) only to later watch them spin it off or ramp it down to nothing. (That is how skunkworks work, BTW; I am not knocking the reality that not everything becomes a product.)

So will tera-scale come to the market as Larrabee or Gesher? It's possible, but is it probable?
 

VirtualLarry

No Lifer
Aug 25, 2001
Originally posted by: Nemesis 1
If you took a Larrabee PCI-E card and ran it on an AMD machine, it would either not work or its performance would be greatly diminished compared to an Intel system, because of the compiler on the Intel CPU. I think when Intel uses Larrabee as a GPU, it is going to use a lot of CPU resources, wasting nothing.
That doesn't make a whole lot of sense. Obviously the compiler has to run on a CPU, most likely an x86-compatible CPU, which AMD also produces. So there's no reason why it wouldn't work with an AMD CPU as well.

I think your rose-colored glasses are starting to make you blind.