Search results

  1. N

    Discussion Apple Silicon SoC thread

    I used to think this, until I investigated closely the exact hardware present in both. Now I think "unification GPU and NPU" is something nv will push (for obvious reasons) ... right up until they release their separate NPU... A substantial part of the reason for a GPU, and then an NPU, is the...
  2. N

    Discussion Apple Silicon SoC thread

    This is a business decision, not a technical decision. It may stay the same indefinitely, implemented as good enough for human factors needs (in the same way that no-one expects a Mac Pro to come bundled with three keyboards), so Apple ships an ANE that matches what they expect for inference...
  3. N

    Discussion Apple Silicon SoC thread

    That's to multiple devices. I believe the Blackwell chip-to-chip link is 1.8TB/s so still slightly behind Apple. (Of course to be fair we know nvLink scales, in a way that we believe is true for UltraFusion but have not actually seen; AND nvLink can cover longer distances.)
  4. N

    Discussion Apple Silicon SoC thread

    So then what are we fighting about? I reacted to the claim "Isn’t autovectorization still pretty shoddy for SVE?" by saying "no it's not pretty shoddy, for any reasonable definition of the term, and here's my evidence for why". And was piled on for doing so. But it seems that you AGREE with...
  5. N

    Discussion Apple Silicon SoC thread

    You can read the full set of tweets here: and assume I am wrong. OR you can actually understand what I am saying, which is equivalent to what Chris Lattner and many others are saying: you can only do so much if you are forced to use C's low-level abstractions (in particular specification of...
  6. N

    Discussion Apple Silicon SoC thread

    I'm curious. Smart people at LLVM, for example, have been working on Linalg for 4+ years. https://mlir.llvm.org/docs/Dialects/Linalg/ What do you think drives them? What do you think their endgame is? Likewise for SVE and SME. You expect the outcome hoped for ten years from now is that...
  7. N

    Discussion Apple Silicon SoC thread

    Like I said, now we are getting into semantics about what counts as "pretty shoddy". I'm frustrated that people (frequently the same people) get excited about some chip being able to boost by 100MHz, but still insist that a free boost of their code by 5% or so from the compiler is not...
  8. N

    Discussion Apple Silicon SoC thread

    I JUST GAVE a reply answering exactly that question. If people refuse to look at the references the first time, why would I bother to answer again? If you're interested in scoring debating points around the precise meaning of "pretty shoddy", well go find someone to fight with. Is 2/3 or so...
  9. N

    Discussion Apple Silicon SoC thread

    The ARM blog doesn't think so. If you go through their annual changes to LLVM and GCC, every year they call out some big change in one of the SPEC benchmarks enabled by some new vectorization, though each year it tends to be a different function. eg...
  10. N

    Discussion Apple Silicon SoC thread

    What happens if you allow SME? In PRINCIPLE LLVM should
    - detect loops that look like matrix multiplies or similar (and also appropriate long vector loops)
    - map them to linalg operations
    - which should then be lowered to SME or SSVE if the compiler has been given permission to do so
    The...
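    A minimal sketch of the kind of loop nest this refers to: a plain C matrix multiply that a compiler could, in principle, recognize as a matrix operation. The name `matmul_naive` and the row-major layout are assumptions for illustration and do not come from the post. Whether any particular LLVM build actually maps this to linalg and lowers it to SME or SSVE is not something this sketch shows.

```c
#include <stddef.h>

/* Hypothetical example: C = A * B for row-major matrices.
   A is m x k, B is k x n, C is m x n.
   `restrict` tells the compiler the three arrays do not overlap,
   which removes one common obstacle to vectorization. */
void matmul_naive(size_t m, size_t n, size_t k,
                  const float *restrict a,
                  const float *restrict b,
                  float *restrict c)
{
    for (size_t i = 0; i < m; i++) {
        for (size_t j = 0; j < n; j++) {
            float acc = 0.0f;
            /* Inner reduction: dot product of row i of A and column j of B. */
            for (size_t p = 0; p < k; p++) {
                acc += a[i * k + p] * b[p * n + j];
            }
            c[i * n + j] = acc;
        }
    }
}
```

    The triple loop with a multiply-accumulate in the innermost body is the pattern a matrix-aware compiler pass would have to detect; the code itself makes no use of SME or SVE.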
  11. N

    Discussion Apple Silicon SoC thread

    OK, so your concern is with scoring debating points, not with either understanding hardware or understanding how to write OPTIMAL code for that hardware. Good to know, going forward. It's minds like this that answer the question "Why doesn't Apple tell us that they have added SME...
  12. N

    Discussion Apple Silicon SoC thread

    We have a second confirmation of the 250GB/s SSVE results here: https://forums.macrumors.com/threads/m4-chip-generation-speculation-megathread-merged.2393843/page-22?post=33148171#post-33148171 We have my thoughts on the issue here...
  13. N

    Discussion Apple Silicon SoC thread

    Dude, wait for the people who know what's going on before making ideologically based comments. The initial SSVE results are in fact way too low. Accumulating to a Z register rather than a ZA register is apparently allowed (terrible footgun by ARM there, allowing that – how long till the...
  14. N

    Discussion Apple Silicon SoC thread

    What evidence do we have for this (increased L1 latency)? BTW if the pipeline grows longer, that will NOT show up in "per instruction" latencies, only in branch misprediction cost (which is EXTREMELY difficult to measure correctly on an Apple-level chip, I haven't seen any good analyses). If...
  15. N

    Discussion Apple Silicon SoC thread

    I think SOMETHING like that occurred but (controversial claim), different ordering, and so there was a falling out between ARM and Apple over details. If we look at SVE, the spec seems to have, uh, a troubled history. SVE, various obvious problems so SVE2, then a constant stream of minor...
  16. N

    Discussion Apple Silicon SoC thread

    My finances do not allow me to buy every shiny new Apple device that appears :). And I'm leaving the low level testing game to younger folk. I have other projects I need to work on. My hope is that over the next few years at least a few such people will take up the challenge. All the students...
  17. N

    Discussion Apple Silicon SoC thread

    There are two ways to push frequency higher. Run the individual transistors faster, or cut the pipeline into more stages that can each run faster because they do less sequential work in each stage. My GUESS (given the power numbers) is that Apple has been concentrating on the second rather...
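    The two options can be put into a rough back-of-envelope model. This is a simplified textbook-style assumption, not something taken from the post: the cycle time is the total sequential logic delay divided evenly across the stages, plus a fixed per-stage latch overhead. The function name `max_freq_ghz` and all numbers used with it are hypothetical.

```c
/* Simplified model (an assumption, not measured data):
     cycle_ps = logic_ps / stages + overhead_ps
   logic_ps    total sequential logic delay through the pipeline, in picoseconds
   overhead_ps fixed latch/clocking overhead added to every stage
   stages      number of pipeline stages
   Returns the maximum clock frequency in GHz (1000 / cycle time in ps). */
double max_freq_ghz(double logic_ps, double overhead_ps, int stages)
{
    double cycle_ps = logic_ps / stages + overhead_ps;
    return 1000.0 / cycle_ps;
}
```

    In this model, faster transistors correspond to smaller `logic_ps` and `overhead_ps`, while a deeper pipeline corresponds to a larger `stages`. For example, with 2000 ps of logic and 50 ps of overhead, 10 stages give a 250 ps cycle (4 GHz), and adding stages raises the frequency with diminishing returns because the overhead term does not shrink.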
  18. N

    Discussion Apple Silicon SoC thread

    Supposedly TSMC has Arrow Lake GPU commitments to Intel using N3B. So in terms of that, I assume it will persist. (And Intel is probably not agile enough to easily move to N3E the way Apple can, especially since they NEED Arrow Lake to ship on schedule, otherwise their whole "4 Processes in 5...
  19. N

    Discussion Apple Silicon SoC thread

    I understand the logic that SME might be present because of the header discovery. I don't fully understand the logic that it's speeding up Object Discovery. Here's my issue: how does Object Discovery execute its neural nets? I THOUGHT that GB6 CPU, deliberately, executes all such code (and...
  20. N

    Question Geekbench 6 released and calibrated against Core i7-12700

    Uhh, wot? So your contention is that most people were buying GB5 to engage in dick-measuring, and are upset that it's now targeted at useful information rather than dick-measuring? OK...
  21. N

    Discussion Apple Silicon SoC thread

    Here's a nicer (or at least richer context) version of the same GB5 result: https://browser.geekbench.com/v5/cpu/compare/19959697?baseline=19960927
  22. N

    Discussion Apple Silicon SoC thread

    You might want to add Jetstream2 to your list... If we believe preliminary numbers, there has been a MASSIVE improvement in JS performance...
  23. N

    Discussion Intel current and future Lakes & Rapids thread

    You can for Wolfram (company that makes Mathematica). Their design meetings are open and available on YouTube and have been for at least five years or more. Mathematica probably has the best bugs to complexity ratio of any software on earth (at least that I know of; maybe avionics is better –...
  24. N

    Question Alder Lake - Official Thread

    That's a wrong (or at least extremely limited) presumption. You can compare the two on git: https://github.com/videolan/x265/tree/master/source/common/aarch64 https://github.com/videolan/x265/tree/master/source/common/arm vs https://github.com/videolan/x265/tree/master/source/common/x86 Unless...
  25. N

    Discussion Apple Silicon SoC thread

    Test. Are article comments down? --- Well forum is working. I guess we wait and see if they are fixed tomorrow.
  26. N

    Discussion Microarchitecture Comparison Chart

    The basic numbers are nice to have, but as I keep saying even more important are the ALGORITHMS used by the CPU. What are the branch prediction algorithms, the cache replacement algorithms? Something as basic as: is a cache maintained of structures encountered during a page walk, and if so of...
  27. N

    News Ampere Altra Launched with 80 Arm Cores for the Cloud(Performance Estimates)

    (a) You assume people read these threads in a particular order and so know what others have said in reply. That's a bad assumption. There are many ways that you land up thrown into the middle of a thread. One common one is looking at one's comments that were replied to. Another is starting at...
  28. N

    News Ampere Altra Launched with 80 Arm Cores for the Cloud(Performance Estimates)

    Just as a future hint — it doesn’t come across as very impressive when your complaint that someone has hijacked a thread gets the topic of the thread wrong... The CPU is called Altra not Altera. I’ll ignore the fact that you clearly didn’t understand my comment, since one thing the internet...
  29. N

    News Ampere Altra Launched with 80 Arm Cores for the Cloud(Performance Estimates)

    What are you talking about??? No "SVE2 is locked-up behind ARMv9" exists anywhere except in your imagination. SVE and SVE2 are part of ARMv8, have been since the day they were announced.
  30. N

    News Ampere Altra Launched with 80 Arm Cores for the Cloud(Performance Estimates)

    We now know a *lot* more than a few days ago: and the whole successor threads
  31. N

    News Ampere Altra Launched with 80 Arm Cores for the Cloud(Performance Estimates)

    I'd put it differently. People talk about diversity in CPUs as though that's some sort of wonderful flower that everybody wants -- OEMs want to support 5 different CPUs, customers want to own seven different OS's running on 8 different cores. Utter nonsense! Diversity is a massive expensive...
  32. N

    Discussion Apple Silicon SoC thread

    Yes, $40 for A14 (just the SoC, not including the DRAM also on the package) matches other iPhone BOM estimates.
  33. N

    News Ampere Altra Launched with 80 Arm Cores for the Cloud(Performance Estimates)

    What EXACTLY are you wondering about? There is no ARM 256bit SIMD. There is SVE/2, and you don't have to wonder about it, there's plenty of documentation available on the internet...
  34. N

    News Ampere Altra Launched with 80 Arm Cores for the Cloud(Performance Estimates)

    Please, for the love of god, stop talking about "128bit simd" like that is some sort of monolithic thing across all ARM. There is ZERO reason why multiple SIMD units implemented on a very wide machine should perform any worse than fewer wider SIMD units, and this is in fact exactly what we see...
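    As a rough illustration of the peak-throughput side of this argument, the sketch below counts how many elements a core can process per cycle under a deliberately simple model: every SIMD unit issues one full-width operation per cycle. The function name `peak_lanes_per_cycle` and the example configurations are hypothetical, and the model ignores everything else that affects real performance.

```c
/* Peak elements processed per cycle, assuming (as a simplification)
   that each SIMD unit issues one full-width operation every cycle.
   units      number of SIMD execution units
   unit_bits  width of each unit in bits
   elem_bits  width of one element in bits */
unsigned peak_lanes_per_cycle(unsigned units, unsigned unit_bits,
                              unsigned elem_bits)
{
    return units * (unit_bits / elem_bits);
}
```

    Under this model, four 128-bit units and two 256-bit units both reach 16 32-bit elements per cycle, so width per unit alone does not determine peak throughput.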
  35. N

    News Ampere Altra Launched with 80 Arm Cores for the Cloud(Performance Estimates)

    The reason for "bothering" is an expectation that ARM will get faster more rapidly than AMD. Which means that the large companies that aren't Amazon (and Apple?...?), if they have any sense, will be buying a few today to start preparing for their large scale transitions over the next few years.
  36. N

    Discussion Speculation: The Rise of RISC-V

    No. If you rely on a BS benchmark, which you then cannot even calculate close to correctly, you have UTTERLY squandered your credibility. (As have EE Times, not that they had much to begin with.) Next question.
  37. N

    Discussion Apple Silicon SoC thread

    No it is not! It has to do with the ordering of stores. Suppose I have a CPU that executes code store rA to addrA store rB to addrB These instructions can be executed out of order depending on the order in which their operands rA, rB, addrA, addrB become available. But effective execution on...
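    A minimal sketch of why the order in which two such stores become visible matters, written with C11 atomics. The names `payload`, `ready`, `publish`, and `try_consume` are hypothetical, and the release/acquire pairing is one assumed way of asking the language for ordering; the post does not specify any of this.

```c
#include <stdatomic.h>
#include <stdbool.h>

static int payload;          /* plays the role of addrA */
static atomic_bool ready;    /* plays the role of addrB; starts out false */

/* Writer: two stores in program order. */
void publish(int value)
{
    payload = value;                                   /* store rA to addrA */
    /* store rB to addrB: the release ordering means the payload store
       above must be visible to any thread that observes ready == true
       through an acquire load. */
    atomic_store_explicit(&ready, true, memory_order_release);
}

/* Reader: only touches payload after seeing the flag. */
bool try_consume(int *out)
{
    if (!atomic_load_explicit(&ready, memory_order_acquire)) {
        return false;        /* flag not yet visible */
    }
    *out = payload;          /* safe: ordered after the acquire load */
    return true;
}
```

    If the second store could become visible to another core before the first, a reader could see the flag set and still read a stale `payload`. How much of this ordering the hardware provides by default, and how much has to be requested explicitly, is where memory models differ.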
  38. N

    Discussion Apple Silicon SoC thread

    The very fact that you can say such a thing "If it's doing work, it should be counted!" shows how UTTERLY clueless you are. Quick question. How many CPUs do you think exist on an M1? 8? 15? 50? The same is somewhat true (though less extreme) for an Intel chip. No-one bothers counting how many M0...
  39. N

    Discussion Apple Silicon SoC thread

    Well this is the difference between people whose goal is to understand technology and people whose goal is [redacted]
    [Moderator note: Inappropriate language for the tech forums. esquared, Anandtech Forum Director]
  40. N

    Discussion Apple Silicon SoC thread

    Apple is comfortable with training on their devices right now... https://blog.tensorflow.org/2020/11/accelerating-tensorflow-performance-on-mac.html Sure it's not at nV hundreds of watts level - yet... It is likely that at least some of the training is being done using AMX on the CPU, and this...