
Z170 Mobo recommendation?


VirtualLarry

No Lifer
Aug 25, 2001
56,587
10,225
126
Let me explain this simply. The CPU can only do one task at a time with software. First the BIOS instructions load into the CPU and then into memory, then Windows loads into memory. While doing all this, the CPU only works on one task at a time with everything linked to it; the CPU goes round-robin. So when it is a PCI-E device's turn to be called by the software, the slots run in order of IRQs within Windows and only one device communicates with the CPU at a time. The data is then sent to memory and read from memory, all of this in nanoseconds, and then the CPU takes the next instruction from the software.

The main difference between PCI and PCI-E is that PCI lanes send a parallel data set to and from the CPU, while PCI-E lanes send a serial data set to and from the CPU, like SATA drive communication.

The only thing I did not mention was the south bridge that controls the traffic one data set at a time.

So now you can see that the CPU only handles one task at a time, otherwise there would be a collision in calculations by the CPU.
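
For illustration, the strictly serial, round-robin model described above boils down to something like this toy Python sketch (device names and cycle count are invented):

```python
# Toy model of the claim above: one CPU servicing devices strictly
# round-robin, one transaction at a time, while everything else waits.
from collections import deque

devices = deque(["BIOS/RAM", "PCI-E slot 1", "PCI-E slot 2", "SATA"])

def serial_round_robin(cycles):
    for _ in range(cycles):
        current = devices[0]
        devices.rotate(-1)              # the next device gets the next turn
        print(f"CPU services {current}; all other devices wait")

serial_round_robin(6)
```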

Let me guess, you read "Computer architecture for dummies", and now you think you understand how the PC works.

Hint: look up "async operation" and "direct memory access".

PS. CPUs these days have multiple cores, and each core can have multiple operations "in flight" at once.
Doesn't that boggle your mind? ^_^

Edit: Even buses are multi-tasking / multi-queued / threaded. Look up SiS's MuTIOL bus specs for what was achieved years ago.
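
For a rough software analogy of those hints, here is a minimal sketch in which several transfers proceed concurrently while the main task keeps working, rather than strict one-at-a-time servicing (device names and delays are invented):

```python
# Software analogy for "async operation": transfers for several devices
# run concurrently while the main task keeps doing other work.
import asyncio

async def background_transfer(device, ms):
    await asyncio.sleep(ms / 1000)          # stands in for a DMA-style transfer
    return f"{device}: transfer complete"

async def main():
    pending = asyncio.gather(
        background_transfer("GPU", 30),
        background_transfer("NIC", 10),
        background_transfer("NVMe", 20),
    )
    print("CPU keeps executing other work while the transfers are in flight")
    for result in await pending:
        print(result)

asyncio.run(main())
```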

ID-Based Transaction Ordering: Reduces transaction latencies in the system

Conventional transaction ordering in PCIe is very strict (limits transactions bypassing older transactions) to ensure correctness of the producer-consumer model. For an example: Reads can not go around Writes to guarantee that resulting operations use correct and not stale data. This requirement has a root in a very simplified and restricted platform model (from an early era of PC) where there was a single Host CPU coordinating a work of an IO subsystem that used serialization of interconnect operations. Strictly following this requirement in modern systems can cause fairly long transaction stalls of 100's of nanoseconds affecting system performance. Going forward with multi-core/multi-thread Hosts connected to more sophisticated/complex IO devices via PCIe fully split-transaction protocol, there is an opportunity to optimize flows between unrelated transaction streams. For an example: graphics card initiated traffic may not have any direct relation to network controller traffic. Currently available mechanisms in PCIe to mitigate these restrictions are based on notion of Virtual Channels as well as on so called Relaxed Ordering transaction attribute. However, both of these have been defined from server usage point of view and carry inherent cost making it less attractive for mainstream use. To address this deficiency, PCIe 3.0 defines ID-Based Transaction Ordering mechanism that relies on current PCIe protocol mechanism for differentiating traffic sourced by different devices (i.e. Requestor ID and Target ID). New mechanism enables PCIe devices to assert an attribute flag (on a transaction basis) that allows relaxation of transaction ordering within PCIe fabric and within Host subsystem (including cache/memory hierarchy). This mechanism can be effectively applied to unrelated streams within:

Multi-Function device (MFD) / Root Port Direct Connect

Switched Environments

Multiple RC Integrated Endpoints (RCIEs)

http://www.intel.com/content/dam/doc/white-paper/pci-express3-accelerator-white-paper.pdf
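
As a rough sketch of the rule the white paper describes, the following toy code lets a transaction bypass older ones only when they come from a different Requester ID, i.e. an unrelated stream (the transactions themselves are invented):

```python
# ID-based relaxation, illustrated: ordering stays strict within one
# requester's stream, but transactions from different Requester IDs
# (unrelated streams) may be reordered past each other.
queue = [
    ("GPU", "write", "0x1000"),
    ("GPU", "read",  "0x1000"),
    ("NIC", "write", "0x8000"),
]

def can_bypass(newer, older):
    """A newer transaction may pass an older one only if the requester IDs differ."""
    return newer[0] != older[0]

print(can_bypass(queue[2], queue[0]))   # True: NIC traffic may pass older GPU traffic
print(can_bypass(queue[1], queue[0]))   # False: a GPU read may not pass an older GPU write
```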


5. PCI Express Architecture
And unlike PCI, where the bus bandwidth was shared among devices, this bandwidth is provided to each device.

http://www.ni.com/white-paper/3767/en/

If the CPU only serviced PCI-E devices round-robin, and there was no concurrency happening with devices and system RAM / cache / etc., then what would the purpose be of providing bus bandwidth to each device in parallel, if parallel operation wasn't possible?

Hint: Your model is wrong.

Edit: To clarify, I'm not calling you a "dummy", just saying that your model description seems so simplistic that it could have been taken from a "for Dummies" book.
 
Last edited:

stateofmind

Senior member
Aug 24, 2012
245
2
76
www.glj.io
So what are your suggestions?

Are there any advantages for, say, a "Z170-PRO-GAMING" over the Z170-K (lower price locally) or the Asrock pro4S?
 

wingman04

Senior member
May 12, 2016
393
12
51
How did I acquire voltage readings? I'd seen some people show a need to use a multi-meter; I just examine the software readings. I understand that VID changes with load.

I had everything on stock settings, and then experimented with lowering LLC by taking it off "Auto." By trial and error, watching the peak and drooped voltage during stress, I got a good idea that these boards give LLC=5 on Auto, and that -- going by the VCore reading -- the VID readout is higher by an increment. Ultimately, if you overclock this and maybe other Z170 boards, and judging by what others did, you need LLC=5 anyway above 4.5 GHz -- if all the boards use the same scale. On the Sabertooth, there are 8 LLC levels. My best judgment, from experience and from what I'd read: you WANT some vDroop in the mix, even if it's just 10 mV. I think I'd seen pronouncements, either about this particular board or about Z170 overclocking in general, suggesting you'd want Vcore to be close to the VID. However that may be, I could see how going from LLC=1 to 2 and so on would narrow the droop at load against the unloaded peak.
How do you use VID core voltage dynamically on your ASUS board?
 

BonzaiDuck

Lifer
Jun 30, 2004
16,632
2,027
126
So what are your suggestions?

Are there any advantages for, say, a "Z170-PRO-GAMING" over the Z170-K (lower price locally) or the Asrock pro4S?

There seems to be a plethora of boards marketed as "Gaming" models. I touted my purchase of a Sabertooth Z170 S in another thread, noting I only got "one M.2 NVMe" port. Someone told me I might have had two M.2 ports if I'd bought a Gigabyte Gaming Z170 board offered at the same price.

But I wasn't shopping for multiple M.2 ports. I could see in hindsight the advantage. Not a major loss, given what I want to do with that single port.

You can examine the features offered by this or that Z170 board. You can try and assess prospects for trouble through customer reviews and lab-test reviews.

I still stand by my rule of thumb, which directs me to look for the best phase-power design (highest phase count). Even that is of limited usefulness, if one considers that the quality of the components plays just as much of a role in overclocking capability as the phase-power design.

I wouldn't discourage anyone from spending a pile on a top-tier motherboard, but I would encourage them to save money when it makes sense to do so.
 

stateofmind

Senior member
Aug 24, 2012
245
2
76
www.glj.io
There seems to be a plethora of boards marketed as "Gaming" models. I touted my purchase of a Sabertooth Z170 S in another thread, noting I only got "one M.2 NVMe" port. Someone told me I might have had two M.2 ports if I'd bought a Gigabyte Gaming Z170 board offered at the same price.

But I wasn't shopping for multiple M.2 ports. I could see in hindsight the advantage. Not a major loss, given what I want to do with that single port.

You can examine the features offered by this or that Z170 board. You can try and assess prospects for trouble through customer reviews and lab-test reviews.

I still stand by my rule of thumb, which directs me to look for the best phase-power design (highest phase count). Even that is of limited usefulness, if one considers that the quality of the components plays just as much of a role in overclocking capability as the phase-power design.

I wouldn't discourage anyone from spending a pile on a top-tier motherboard, but I would encourage them to save money when it makes sense to do so.

1. Thanks. How do you see how many phases one MB has?
2. If there are no features that make one MB preferred over the other, for me, how can I find out anything about stability/OC abilities?
3. Why one brand and not the other? Where I live, the Z170 Gaming Pro is relatively cheap, compared to the Gigabyte at the same price in the US. No clue, huh?
 

wingman04

Senior member
May 12, 2016
393
12
51
Let me guess, you read "Computer architecture for dummies", and now you think you understand how the PC works.

Hint: look up "async operation" and "direct memory access".

PS. CPUs these days have multiple cores, and each core can have multiple operations "in flight" at once.
Doesn't that boggle your mind? ^_^

Edit: Even buses are multi-tasking / multi-queued / threaded. Look up SiS's MuTIOL bus specs for what was achieved years ago.



http://www.intel.com/content/dam/doc/white-paper/pci-express3-accelerator-white-paper.pdf




http://www.ni.com/white-paper/3767/en/

If the CPU only serviced PCI-E devices round-robin, and there was no concurrency happening with devices and system RAM / cache / etc., then what would the purpose be of providing bus bandwidth to each device in parallel, if parallel operation wasn't possible?

Hint: Your model is wrong.

Edit: To clarify, I'm not calling you a "dummy", just saying that your model description seems so simplistic that it could have been taken from a "for Dummies" book.
There are calculations going on at the same time, and out-of-order execution at the same time; however, the sum is reduced to one output at a time. That is why, with 2 computers with 2 cores each, the output is twice as fast as one computer with 4 cores.

Each component connected to the CPU, including memory, sends and receives data to the CPU in sequential order, round-robin; PCI sends data in parallel, PCI-E sends data serially.
In a computer, an interrupt request (or IRQ) is a hardware signal sent to the processor that temporarily stops a running program and allows a special program, an interrupt handler, to run instead. https://en.wikipedia.org/wiki/Interrupt_request_(PC_architecture)

Conventional PCI and PCI-X are sometimes called Parallel PCI in order to distinguish them technologically from their more recent successor PCI Express, https://en.wikipedia.org/wiki/Conventional_PCI
PCI Express (Peripheral Component Interconnect Express), officially abbreviated as PCIe, is a high-speed serial computer expansion bus standard https://en.wikipedia.org/wiki/PCI_Express

Now you explain how the PCI and PCI-E buses work and the difference, just explain it simply. (Einstein quote) If you can't explain it simply, you don't understand it well enough.

A CPU just finishes one task at a time no matter how many cores it has; put simply, there is only one input and output on an x86 CPU.
The multiple cores help with parallel calculations from work that has to be done; the multiple cores don't help with serial calculations.

Of course, when you look under the hood, you discover that a processor can only execute one thing at a time. Everything that goes beyond this simple technical fact is “smoke and mirrors.” For an operating system to support multithreading (or multitasking for that matter), it must implement some mechanism that allows one task or thread to do a little bit of work, and then quickly switch to the next thread or task. Once that thread has been given some processing time, the OS moves on to the next thread, and so on. This switching happens so frequently and so quickly that the illusion of things happening at the same time works quite well. http://www.codemag.com/article/060033
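
A toy version of the time-slicing that quote describes might look like the sketch below: one execution resource, several tasks, switched quickly enough to look simultaneous (task names are invented; this is the software illusion only, and says nothing about how many hardware cores exist):

```python
# One execution resource, many tasks, each given a short "time slice" in turn.
def task(name, steps):
    for i in range(steps):
        yield f"{name}: step {i}"

def round_robin_scheduler(tasks):
    while tasks:
        current = tasks.pop(0)
        try:
            print(next(current))        # give this task one time slice
            tasks.append(current)       # then switch to the next task
        except StopIteration:
            pass                        # task finished; drop it

round_robin_scheduler([task("browser", 2), task("editor", 3)])
```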

I'm trying to explain this simply for you, because folks need to grasp the basics of how a processor works;
it is simple to understand how computers work.
Here is some help.
 

UsandThem

Elite Member
May 4, 2000
16,068
7,383
146
1. Thanks. How do you see how many phases one MB has?
2. If there are no features that make one MB preferred over the other, for me, how can I find out anything about stability/OC abilities?
3. Why one brand and not the other? Where I live, the Z170 Gaming Pro is relatively cheap, compared to the Gigabyte at the same price in the US. No clue, huh?

1. From reviews or on the manufacturer's website.

2. Reviews, or user posts on the various hardware forums; e.g. just Google the motherboard model number plus "overclocking".

3. Personal preference. I decide what features I want, narrow my choices down based on reviews, and wait for whichever one goes on sale first. ;)

The $130 - $180 range Z170 motherboards (can be lower with rebates or promotions) are the sweet spot for most users. They usually are well built, have higher-end audio, and are mostly well equipped. Below that range, they usually start stripping away features (like better audio and USB Type-C), and if you go above that price range, you are entering the 'enthusiast' category. The only way to find out which board works for you is to read some reviews.
 

wingman04

Senior member
May 12, 2016
393
12
51
So what are your suggestions?

Are there any advantages for, say, a "Z170-PRO-GAMING" over the Z170-K (lower price locally) or the Asrock pro4S?
For a mild overclock to 4.5 GHz, all you need is a budget motherboard for Intel. They all have a 3-year warranty, and don't get caught up in phases; that is for overclocking with LN2, not air or water.

I have a budget Gigabyte Z170 HD3 board that cost $89.00 USD, overclocked to 4.5 GHz, and it will last 5+ years overclocked, like all the rest of the budget OC boards I have built in the past.
 

UsandThem

Elite Member
May 4, 2000
16,068
7,383
146
Now you explain how the PCI and PCI-E buses work and the difference, just explain it simply. (Einstein quote) If you can't explain it simply, you don't understand it well enough.

I'm trying to explain this simply for you, because folks need to grasp the basics of how a processor works;
it is simple to understand how computers work.
Here is some help.

Let me guess, you read "Computer architecture for dummies", and now you think you understand how the PC works.

Hint: look up "async operation" and "direct memory access".

Hint: Your model is wrong.

Edit: To clarify, I'm not calling you a "dummy", just saying that your model description seems so simplistic that it could have been taken from a "for Dummies" book.

It's a walk-off folks. We've got ourselves a good 'ole fashioned walk-off. ;)

 

VirtualLarry

No Lifer
Aug 25, 2001
56,587
10,225
126
There are calculations going on at the same time, and out-of-order execution at the same time; however, the sum is reduced to one output at a time. That is why, with 2 computers with 2 cores each, the output is twice as fast as one computer with 4 cores.
No, it has to do with bottlenecks. Modern PCs (let's use Skylake as an example) have dual-channel RAM / memory controllers. Each memory controller is 72 bits wide. However, 8 of those bits are only used for ECC in servers. So the effective net data width is 64 bits per channel, or 128 bits total.

Now, due to pre-fetching and whatnot, generally, the CPU reads a cache line at a time. I don't know how wide a cache line is for Skylake's L3, but I could look up that info. But generally, it's larger than 128-bits at a time.

Now, as for CPU core multi-processing, each core can be reading or writing data. There are arbiters for each core, and the memory controller / uncore / L3 cache, that control access.

Generally, though, modern quad-cores are going to be bottle-necked if they are reading / writing main memory with all four cores, and not hitting the cache instead. You won't see a simple 2x to 4x speedup, just by reading and writing main memory with two versus four cores. That's way too simplistic a model.
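
For a back-of-the-envelope feel for that shared peak, using the 64-bit-per-channel width above and DDR4-2133 as an example speed grade:

```python
# Theoretical peak bandwidth for dual-channel DDR4-2133 (example numbers only;
# the 64-bit-per-channel data path excludes the ECC bits mentioned above).
transfers_per_second = 2133e6       # DDR4-2133: 2133 MT/s
bytes_per_transfer = 64 // 8        # 64-bit data path per channel
channels = 2

peak = transfers_per_second * bytes_per_transfer * channels
print(f"Theoretical peak bandwidth: {peak / 1e9:.1f} GB/s")   # ~34.1 GB/s

# All cores share that peak, which is one reason a memory-bound workload
# does not simply scale 2x or 4x with more cores.
```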

Each component connected to the CPU, including memory, sends and receives data to the CPU in sequential order, round-robin; PCI sends data in parallel, PCI-E sends data serially.
Now you explain how the PCI and PCI-E buses work and the difference, just explain it simply. (Einstein quote) If you can't explain it simply, you don't understand it well enough.
Yes, PCI-E uses a serial PHY, rather than parallel, for greater bandwidth per pin. But the differences don't stop there. They allow peer-to-peer transfers, async DMA essentially between PCI-E devices, at full speed. Instead of a purely host-driven model. They have a switching fabric between the PCI-E root ports and the CPU, much in the same way a networking switch operates. Your simplistic model doesn't address the new concurrency possible with PCI-E.

Edit: For an analogy, consider the evolution of ethernet networks. Originally (OK, there was "thick ethernet", that was before my time) there was coax ThinNet (10base2?), that was a shared medium, between network nodes, and required terminators on each ethernet segment. That's how PCI worked, effectively - it was a shared bus, with each slot having bus request / grant lines, and a bus arbiter in the system chipset that would act like a traffic cop, and allow whichever bus slot to control the bus. (Edit: In ethernet, access to the shared medium is via the CSMA/CD mechanism. You can Google that for more info.)

But ethernet evolved, and now we have (faster) point-to-point links, rather than a shared medium, and they are switched, connected by a device with a switching fabric, that allows concurrency between nodes.

PCI-E is like that, it's a bunch of faster point-to-point links, connected by a switching fabric, that allows node-to-node (or device-to-device) concurrency.
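
A toy comparison of the two topologies, with invented endpoints and durations, where a shared bus serializes everything while a switched fabric overlaps transfers between disjoint endpoint pairs:

```python
# Shared bus (PCI-style) vs. switched point-to-point links, as a toy model.
transfers = [("GPU", "RAM", 4), ("NIC", "RAM", 2), ("NVMe", "GPU", 3)]

def shared_bus_time(xfers):
    return sum(t for _, _, t in xfers)              # one bus master at a time

def switched_fabric_time(xfers):
    remaining, total = list(xfers), 0
    while remaining:
        busy, batch = set(), []
        for x in remaining:                         # greedily batch transfers
            if not ({x[0], x[1]} & busy):           # whose endpoints are free
                batch.append(x)
                busy |= {x[0], x[1]}
        total += max(t for _, _, t in batch)        # a batch runs concurrently
        remaining = [x for x in remaining if x not in batch]
    return total

print("shared bus      :", shared_bus_time(transfers))      # 9 time units
print("switched fabric :", switched_fabric_time(transfers)) # 7 time units
```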

For an example of this concurrency, look at how AMD implements PCI-E based XDMA Crossfire.

Or do you believe that every byte crossing the PCI-E root ports goes through the CPU, caches, registers, whatever?

A CPU just finishes one task at a time no matter how many cores it has; put simply, there is only one input and output on an x86 CPU.
I guess it would blow your mind, to find out that Xeon CPUs have MULTIPLE QPI bus connections, for 2S and 4S and even 8S configurations. There is NOT "only one input and output" on an x86 CPU.

Of course, when you look under the hood, you discover that a processor can only execute one thing at a time. Everything that goes beyond this simple technical fact is “smoke and mirrors.”
It's not just "smoke and mirrors". You just don't understand it, and hand-wave that additional complexity and concurrency away.
Edit: You're confusing the Von Neumann-like architectural model, with the actual hardware implementations. According to the architectural model, opcodes in the binary execute and retire in serial fashion. What I've been trying to tell you is, the real-world model is FAR more complex, in what it actually does.

For an operating system to support multithreading (or multitasking for that matter), it must implement some mechanism that allows one task or thread to do a little bit of work, and then quickly switch to the next thread or task. Once that thread has been given some processing time, the OS moves on to the next thread, and so on. This switching happens so frequently and so quickly that the illusion of things happening at the same time works quite well. http://www.codemag.com/article/060033
Yes. Time-slicing of CPU architectural "cores", in software. Which, has nothing to do with hardware concurrency.

I'm trying to explain this simply for you, because folks need to grasp the basics of how a processor works;
it is simple to understand how computers work.
Here is some help.
HAHA. Funny. Thanks, though, for the effort.

Edit: My analogy is this: Computer CPUs are like little factories. Your argument seems to be, because the factory only has one driveway leading to the road, that there's really only one person working inside the factory. Which, in my mind, couldn't be more wrong.

Then, there are the questions, would two small factories or one big factory be better? How many assembly lines per factory? How many loading docks do you need? Etc.

Edit: I just noticed. You're confusing IRQ signals with bus-request / grant signals. They are not the same thing.

Edit: You are correct about one thing though - the prior PCI bus, generally did have a "round-robin" mode for the bus arbiter (among possible others), and PCI was primarily host-driven (though devices also could do bus-master DMA, I think in PCI 2.1 or 2.2).
 
Last edited:
  • Like
Reactions: UsandThem

wingman04

Senior member
May 12, 2016
393
12
51
No, it has to do with bottlenecks. Modern PCs (let's use Skylake as an example) have dual-channel RAM / memory controllers. Each memory controller is 72 bits wide. However, 8 of those bits are only used for ECC in servers. So the effective net data width is 64 bits per channel, or 128 bits total.

Now, due to pre-fetching and whatnot, generally, the CPU reads a cache line at a time. I don't know how wide a cache line is for Skylake's L3, but I could look up that info. But generally, it's larger than 128-bits at a time.

Now, as for CPU core multi-processing, each core can be reading or writing data. There are arbiters for each core, and the memory controller / uncore / L3 cache, that control access.

Generally, though, modern quad-cores are going to be bottle-necked if they are reading / writing main memory with all four cores, and not hitting the cache instead. You won't see a simple 2x to 4x speedup, just by reading and writing main memory with two versus four cores. That's way too simplistic a model.


Yes, PCI-E uses a serial PHY, rather than parallel, for greater bandwidth per pin. But the differences don't stop there. They allow peer-to-peer transfers, async DMA essentially between PCI-E devices, at full speed. Instead of a purely host-driven model. They have a switching fabric between the PCI-E root ports and the CPU, much in the same way a networking switch operates. Your simplistic model doesn't address the new concurrency possible with PCI-E.

Edit: For an analogy, consider the evolution of ethernet networks. Originally (OK, there was "thick ethernet", that was before my time) there was coax ThinNet (10base2?), that was a shared medium, between network nodes, and required terminators on each ethernet segment. That's how PCI worked, effectively - it was a shared bus, with each slot having bus request / grant lines, and a bus arbiter in the system chipset that would act like a traffic cop, and allow whichever bus slot to control the bus. (Edit: In ethernet, access to the shared medium is via the CSMA/CD mechanism. You can Google that for more info.)

But ethernet evolved, and now we have (faster) point-to-point links, rather than a shared medium, and they are switched, connected by a device with a switching fabric, that allows concurrency between nodes.

PCI-E is like that, it's a bunch of faster point-to-point links, connected by a switching fabric, that allows node-to-node (or device-to-device) concurrency.

For an example of this concurrency, look at how AMD implements PCI-E based XDMA Crossfire.

Or do you believe that every byte crossing the PCI-E root ports goes through the CPU, caches, registers, whatever?


I guess it would blow your mind, to find out that Xeon CPUs have MULTIPLE QPI bus connections, for 2S and 4S and even 8S configurations. There is NOT "only one input and output" on an x86 CPU.


It's not just "smoke and mirrors". You just don't understand it, and hand-wave that additional complexity and concurrency away.
Edit: You're confusing the Von Neumann-like architectural model, with the actual hardware implementations. According to the architectural model, opcodes in the binary execute and retire in serial fashion. What I've been trying to tell you is, the real-world model is FAR more complex, in what it actually does.


Yes. Time-slicing of CPU architectural "cores", in software. Which, has nothing to do with hardware concurrency.


HAHA. Funny. Thanks, though, for the effort.

Edit: My analogy is this: Computer CPUs are like little factories. Your argument seems to be, because the factory only has one driveway leading to the road, that there's really only one person working inside the factory. Which, in my mind, couldn't be more wrong.

Then, there are the questions, would two small factories or one big factory be better? How many assembly lines per factory? How many loading docks do you need? Etc.

Edit: I just noticed. You're confusing IRQ signals with bus-request / grant signals. They are not the same thing.

Edit: You are correct about one thing though - the prior PCI bus, generally did have a "round-robin" mode for the bus arbiter (among possible others), and PCI was primarily host-driven (though devices also could do bus-master DMA, I think in PCI 2.1 or 2.2).
All of what you said is not my point. My point from the beginning is that the CPU does one processing task at a time even though CPUs now have multiple cores. That is what I'm trying to explain; it is the basics.

In computer architecture, 64-bit computing is the use of processors that have datapath widths, integer size, and memory address widths of 64 bits (eight octets). Also, 64-bit computer architectures for central processing units (CPUs) and arithmetic logic units (ALUs) are those that are based on processor registers, address buses, or data buses of that size. From the software perspective, 64-bit computing means the use of code with 64-bit virtual memory addresses. https://en.wikipedia.org/wiki/64-bit_computing

AMD Zen microarchitecture details: one input from the Micro-op Queue to the Integer or FPU units, then one output from the load/store queues. Also, multiple cores cannot write to or read from the L3 and system memory at the same time.



Conceptually, the PCI Express bus is a high-speed serial replacement of the older PCI/PCI-X bus.[5] One of the key differences between the PCI Express bus and the older PCI is the bus topology; PCI uses a shared parallel bus architecture, in which the PCI host and all devices share a common set of address, data and control lines. In contrast, PCI Express is based on point-to-point topology, with separate serial links connecting every device to the root complex (host). Due to its shared bus topology, access to the older PCI bus is arbitrated (in the case of multiple masters), and limited to one master at a time, in a single direction. Furthermore, the older PCI clocking scheme limits the bus clock to the slowest peripheral on the bus (regardless of the devices involved in the bus transaction). In contrast, a PCI Express bus link supports full-duplex communication between any two endpoints, with no inherent limitation on concurrent access across multiple endpoints. https://en.wikipedia.org/wiki/PCI_Express

The CPU processes one bus task at a time.

[Figure: Example PCI Express topology (Wikipedia)]


How Computers Work: Computation (Part II)


So now you can see that a Processor can only process one task at a time.


 
Last edited:

VirtualLarry

No Lifer
Aug 25, 2001
56,587
10,225
126
Due to its shared bus topology, access to the older PCI bus is arbitrated (in the case of multiple masters), and limited to one master at a time, in a single direction. Furthermore, the older PCI clocking scheme limits the bus clock to the slowest peripheral on the bus (regardless of the devices involved in the bus transaction). In contrast, a PCI Express bus link supports full-duplex communication between any two endpoints, with no inherent limitation on concurrent access across multiple endpoints. https://en.wikipedia.org/wiki/PCI_Express

The CPU processes one bus task at a time.

So now you can see that a Processor can only process one task at a time.
So, now you've resorted to quoting Wikipedia, because you can't explain in your own words, like I did, the differences between PCI and PCI-Express. Worse yet, what you quoted directly contradicts what you stated in your own words right after it. (I bolded the contradiction for you.)

Do you even know what the word "concurrency" means? I think I've brought it up in nearly every post that I made, as an argument about something that you refused to mention, and you constantly say that the CPU and bus can only process one thing at a time. THIS IS WRONG.

So, either you're a troll, or a simpleton. Either way, welcome to my ignore list. Not because you're wrong, but because you refuse to learn from being wrong and accept the truth. Enjoy quoting Wikipedia to yourself.

Edit: Just so you don't think that I don't know what "concurrency" means - it means things happen at the same time.

Edit: OK, I apologize for calling you a simpleton. I was upset at seeing you repeatedly spout what I consider to be misinformation. You really need to read up more on this subject, research what kind(s) of concurrency are possible on modern x64 CPUs, and understand that there are a heck of a lot of different transactions happening, AT THE SAME TIME, over the PCI-E bus, the memory controller, the various CPU cores, that themselves are HyperThreaded, etc.

The CPU core-centric one-thing-at-a-time model is just that, a theoretical model. The real world is very different.
 
Last edited:
Feb 25, 2011
16,992
1,621
126
PCI and PCI-E are used to connect expansion cards. The differences and inner workings really only matter to engineers, because everything is PCI-E and that's it. No other choices.

Some use cases benefit from additional PCI-E lanes... sometimes... so Xeon CPUs. Otherwise... it doesn't matter.

So relax and have a beer. Jesus H Christ.
 

BonzaiDuck

Lifer
Jun 30, 2004
16,632
2,027
126
PCI and PCI-E are used to connect expansion cards. The differences and inner workings really only matter to engineers, because everything is PCI-E and that's it. No other choices.

Some use cases benefit from additional PCI-E lanes... sometimes... so Xeon CPUs. Otherwise... it doesn't matter.

So relax and have a beer. Jesus H Christ.

At first I thought I'd started it by posing a question, but that was in another thread. So I'm a bit puzzled over the ruffled feathers concerning parallelism. You only get so many PCI-E lanes with this or that processor. Devices on the board share some IRQs. On some boards, it is possible to enable too many onboard devices at once. It's that simple.

The OP just wanted a recommendation for a Z170 motherboard.
 

VirtualLarry

No Lifer
Aug 25, 2001
56,587
10,225
126
I guess, I have this thing about mis-information. Wingman04 is spouting some very simplistic ideas about PC architecture, that I've tried to correct him on, and he refused to listen, and stop spouting his mis-information.

Such as, PCI-E bus only does one thing at a time, multi-core CPUs, still only process one thing at a time, and since x64 CPUs are "64-bit", wider memory buses do nothing useful that can be utilized / noticed by software running on the CPU, because "it's limited to 64-bit".

For the record, all of those statements are incorrect, when talking about actual modern x86 / x64 multi-core processors of today.

As for my credentials, I used to do system-level programming in DOS, for bit-banging hardware. PCI was around back then, but not PCI-E. But I have read up on PCI-E quite a bit, and feel that I have an adequate seat-of-the-pants understanding of its main capabilities.

Edit: Heck, even the 16-bit ISA bus had a DMA controller. It wasn't used often, but the Sound Blaster 16 could utilize it, to play sound in the background, without constant CPU involvement.

You would set up the buffers in RAM (I think, under 1MB mark), and then program the ISA DMA controller, for the DMA channel that the SB card used (set in an environment variable in the DOS config files), then program the SB DMA controller registers, using PIO, and then enable the DMA start, and then the SB would signal an IRQ when it was finished. Actually, I think that there was more to it, there was some double-buffering involved, to prevent sound glitches.
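
That double-buffering pattern, stripped of any real Sound Blaster register programming, amounts to something like this sketch (buffer contents and IRQ count are invented):

```python
# Not real Sound Blaster register programming -- just the double-buffering
# pattern described above: while one buffer plays out "in the background",
# the CPU refills the other, swapping on each completion interrupt.
def fill_buffer(block, size=4):
    return [f"sample{block}.{i}" for i in range(size)]

buffers = [fill_buffer(0), fill_buffer(1)]
playing = 0

for irq in range(4):                          # pretend 4 "buffer done" IRQs arrive
    print(f"IRQ {irq}: buffer {playing} finished playing")
    finished = playing
    playing = 1 - playing                     # playback moves on to the other buffer
    buffers[finished] = fill_buffer(irq + 2)  # CPU refills while playback continues
```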

Ahh, the fun times of DOS. When there was nothing between your software and the hardware that it was controlling.

Edit: Sorry for going spaz-tastic on the n00b, people.

Edit: And apologies to the OP, for the thread-jack.
 
Last edited:

BonzaiDuck

Lifer
Jun 30, 2004
16,632
2,027
126
Well, I was trying to save my fingers by staying out of the thick of it, and if you noticed my metaphor about "ruffled feathers" -- well -- it wasn't exactly your replies I was speaking of. ;-) Still, taken by themselves, certainly long excerpts of published articles could be of interest to the OP, if he has the patience.

I guess, I have this thing about mis-information. Wingman04 is spouting some very simplistic ideas about PC architecture, that I've tried to correct him on, and he refused to listen, and stop spouting his mis-information.

Such as, PCI-E bus only does one thing at a time, multi-core CPUs, still only process one thing at a time, and since x64 CPUs are "64-bit", wider memory buses do nothing useful that can be utilized / noticed by software running on the CPU, because "it's limited to 64-bit".

For the record, all of those statements are incorrect, when talking about actual modern x86 / x64 multi-core processors of today.

As for my credentials, I used to do system-level programming in DOS, for bit-banging hardware. PCI was around back then, but not PCI-E. But I have read up on PCI-E quite a bit, and feel that I have an adequate seat-of-the-pants understanding of its main capabilities.

Edit: Heck, even the 16-bit ISA bus had a DMA controller. It wasn't used often, but the Sound Blaster 16 could utilize it, to play sound in the background, without constant CPU involvement.

You would set up the buffers in RAM (I think, under 1MB mark), and then program the ISA DMA controller, for the DMA channel that the SB card used (set in an environment variable in the DOS config files), then program the SB DMA controller registers, using PIO, and then enable the DMA start, and then the SB would signal an IRQ when it was finished. Actually, I think that there was more to it, there was some double-buffering involved, to prevent sound glitches.

Ahh, the fun times of DOS. When there was nothing between your software and the hardware that it was controlling.

Edit: Sorry for going spaz-tastic on the n00b, people.

Edit: And apologies to the OP, for the thread-jack.
 

wingman04

Senior member
May 12, 2016
393
12
51
I guess, I have this thing about mis-information. Wingman04 is spouting some very simplistic ideas about PC architecture, that I've tried to correct him on, and he refused to listen, and stop spouting his mis-information.

Such as, PCI-E bus only does one thing at a time, multi-core CPUs, still only process one thing at a time, and since x64 CPUs are "64-bit", wider memory buses do nothing useful that can be utilized / noticed by software running on the CPU, because "it's limited to 64-bit".

For the record, all of those statements are incorrect, when talking about actual modern x86 / x64 multi-core processors of today.

As for my credentials, I used to do system-level programming in DOS, for bit-banging hardware. PCI was around back then, but not PCI-E. But I have read up on PCI-E quite a bit, and feel that I have an adequate seat-of-the-pants understanding of its main capabilities.

Edit: Heck, even the 16-bit ISA bus had a DMA controller. It wasn't used often, but the Sound Blaster 16 could utilize it, to play sound in the background, without constant CPU involvement.

You would set up the buffers in RAM (I think, under 1MB mark), and then program the ISA DMA controller, for the DMA channel that the SB card used (set in an environment variable in the DOS config files), then program the SB DMA controller registers, using PIO, and then enable the DMA start, and then the SB would signal an IRQ when it was finished. Actually, I think that there was more to it, there was some double-buffering involved, to prevent sound glitches.

Ahh, the fun times of DOS. When there was nothing between your software and the hardware that it was controlling.

Edit: Sorry for going spaz-tastic on the n00b, people.

Edit: And apologies to the OP, for the thread-jack.

As far as my credentials, I attended college for computer science.

Now you have attacked me with one-word quotes and keep telling me that I'm wrong. So now it is your turn to shine: tell me how a CPU and the PCI-E bus work with all the data at the same time? I would like to understand what I missed in college.
 

VirtualLarry

No Lifer
Aug 25, 2001
56,587
10,225
126
Well, first of all, the design philosophy, as I understand it, behind 3GIO (what later became known as PCI-Express) was that they were moving towards both a higher-speed serial interconnect for the PHY (physical layer), and towards using networking principles to implement it, rather than a "dumb bus".
If you study the whitepapers and technical details, you'll see that they allow for concurrency, between multiple PCI-E links, and that they use a request-reply protocol, much like a networking stack. That is, you can have multiple transactions, as I understand it, "in flight". Much like TCP/IP packets on a wire. (They do use a physical framing / packetization scheme for data on the wire on PCI-E too.)

Also, the memory-controller of a modern CPU, besides having multiple channels, they also address multiple DRAM pages at a time. DRAM is accessed in units called "pages", and they use a command request-reply mechanism, but entirely host-driven. (The DRAM can't interrupt, nor signal the host, that I know of.) But it doesn't access just one memory location at a time, it sends commands for "open page", "read page / line", "close page", and they interleave these commands, such that multiple pages can be "open" in the DRAM at one time.
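
A very simplified picture of that command interleaving, with invented banks, rows, and sequence, might look like:

```python
# The controller keeps pages open in more than one bank and interleaves
# activate/read/precharge commands between them (toy sequence only).
commands = ["ACT bank0 row5", "ACT bank1 row9"]       # open two pages
for beat in range(3):
    for bank in ("bank0", "bank1"):                   # interleave reads
        commands.append(f"RD  {bank} col{beat}")
commands += ["PRE bank0", "PRE bank1"]                # close both pages

for c in commands:
    print(c)
```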

Everything is oriented towards multiple transfers, making more efficient usage of cache and cache lines on the CPU. So, if you only need one 64-bit quantity of data, it may well read an entire cache line from DRAM and into the L3 cache.

Likewise, with the CPU cores, the architectural model of x86 is that opcodes execute and retire in sequence, one by one. That's the model. The actual implementation is pretty much close to rocket science (well, computer science). It has a unit that analyses the incoming opcode stream, breaks it down into micro-ops, analyses dependencies, and sends the non-dependent micro-ops down various multiple execution "ports" / pipelines in the CPU core, where they are executed, using a pipeline mechanism that does a little bit of the execution every clock. Some micro-ops that are really simple can probably execute in only a few clocks. (See Agner Fog's opcode timing charts, for approx. instruction latencies.) But the important thing is, in a modern x86 / x64 CPU, there are multiple execution pipes, and multiple steps in the pipelines for micro-ops to execute in. It doesn't just do them "one at a time".
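
As a toy illustration of that point, with an invented port count and invented micro-ops (real schedulers are far more elaborate), independent micro-ops can issue together while dependent ones wait:

```python
# Micro-ops whose inputs are ready issue to different execution ports in the
# same cycle; a dependent micro-op waits for its producer to complete.
uops = [
    ("u0", "add r1, r2", set()),
    ("u1", "mul r3, r4", set()),
    ("u2", "add r5, r1", {"u0"}),   # needs u0's result
    ("u3", "load r6",    set()),
]

completed, cycle, ports = set(), 0, 3
pending = list(uops)
while pending:
    ready = [u for u in pending if u[2] <= completed][:ports]
    cycle += 1
    print(f"cycle {cycle}: issue {[u[1] for u in ready]}")
    completed |= {u[0] for u in ready}
    pending = [u for u in pending if u not in ready]
```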

As far as L3 cache and multiple-core access goes, the L3 cache is divided up into "slices", and each slice is connected via a ring-bus (in Haswell and newer? Maybe Sandy Bridge and newer? Please correct me where I'm wrong on this.) I guess each core has a slice, and that's how core-to-core communication works too? I know that on Core2-era CPUs, there were really two separate CPU dies, that communicated with each other's L2 cache, as well as the system chipset, using a shared FSB. (Very in-efficient.) Also, while the system chipsets of the day supported dual-channel RAM, the FSB itself was only 72-bits wide for data, IIRC, so effectively, those CPUs couldn't really make good use of the dual-channel memory. But for modern (with IMC) Intel CPUs, that's no longer an issue.

Edit: Just a quick note here about AMD's dual-channel memory controllers on their AM2/AM2+/AM3/AM3+ CPUs. They have a BIOS option for "ganged" (operates as a single 128-bit channel), or "unganged" (operates as two truly independent 64-bit channels). Unganged can offer lower latency for multi-processing.

I mean, at some point, there's only so many wires going into the CPU, and at that level, unless the signals were RF multiplexed like cable TV / internet is, then there's only one signal on those wires at any given clock.

But that reasoning, is like seeing a factory, with only one road leading to it, and concluding that that factory only makes one product at a time.

Basically, pipelining, multiplexing, and concurrency, are the things that appear to be missing from your models.
 
Last edited:
  • Like
Reactions: UsandThem

wingman04

Senior member
May 12, 2016
393
12
51
I work in a building full of those. Half of 'em barely know how Outlook works.
My quote below was simplified for understanding and it caused controversy.
Motherboard resources are mostly marketing hype. PCs can only do one task at a time; it is all just smoke and mirrors when it comes to sharing tasks. The PCI-E bus actually runs data in order from one device to the next.
Here is a short video showing input data and output data.