Yup, everyone wants this, although I think an almost-assembly like CUDA PTX might enable evolution better. The hard part is how to start a project which leads to all OSes and GPU manufacturers coordinating. I think Apple is best integrated for it, but Google is much more likely to do it.
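(For context: PTX is NVIDIA's virtual almost-assembly; the driver JIT-compiles it to the GPU's native ISA at load time. A minimal sketch of hand-written PTX, roughly what nvcc emits for a trivial "add 1.0f" kernel; the kernel name is illustrative, not from the thread.)

    .version 6.0
    .target sm_50
    .address_size 64

    // *p += 1.0f; for the float the kernel parameter points to
    .visible .entry add_one(
        .param .u64 add_one_param_0
    )
    {
        .reg .f32  %f<3>;
        .reg .b64  %rd<3>;

        ld.param.u64       %rd1, [add_one_param_0];
        cvta.to.global.u64 %rd2, %rd1;            // generic -> global address
        ld.global.f32      %f1, [%rd2];
        add.f32            %f2, %f1, 0f3F800000;  // 0f3F800000 == 1.0f
        st.global.f32      [%rd2], %f2;
        ret;
    }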
-
The thing is, to get the real benefits out of the hardware you're going to need to do hardware-specific stuff. And all the hardware can't just be the same, due to patents etc.
-
I don't want one ISA. I mean, that's fine if it can happen, but I don't care. I just want an ISA per vendor. There's no reason we can't have this right now, because they all basically have one - we just need people to stop with the API layers and make OSes load GPU ISAs the way they load CPU ISAs.
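(For concreteness, the closest existing analogue: the CUDA driver API loads either PTX, which the driver JIT-compiles on the spot, or a cubin that is already in the GPU's native ISA and needs no runtime compilation. A minimal C sketch of the native path; kernel.cubin and add_one are illustrative names, and error handling is omitted.)

    /* Load a precompiled, native-ISA GPU binary, skipping the runtime JIT.
     * The cubin would come from an ahead-of-time build step such as
     * nvcc -arch=sm_70 -cubin kernel.cu (the architecture is an assumption). */
    #include <cuda.h>

    int main(void) {
        CUdevice dev;
        CUcontext ctx;
        CUmodule mod;
        CUfunction fn;

        cuInit(0);
        cuDeviceGet(&dev, 0);
        cuCtxCreate(&ctx, 0, dev);

        /* No JIT here: the file already contains the GPU's native ISA.
         * Passing PTX text to the same API triggers the runtime JIT instead. */
        cuModuleLoad(&mod, "kernel.cubin");
        cuModuleGetFunction(&fn, mod, "add_one");

        /* ... set up buffers and cuLaunchKernel(fn, ...) ... */

        cuModuleUnload(mod);
        cuCtxDestroy(ctx);
        return 0;
    }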
-
You mean not doing final compilation at runtime? That has performance impacts.
-
So does doing final compilation at runtime.
-
Sure, but that's a one-off. If you're running a 6-hour simulation, the fact that it takes 10 seconds longer to start is not a big factor if that makes it run 2% faster.
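(That one-off cost is the step sketched below: compiling kernel source at startup with NVRTC and timing it. A minimal C sketch; the kernel source and names are illustrative, error handling is omitted, and the emitted PTX still gets driver-JIT'd to native code when loaded.)

    /* Measure the one-off runtime-compilation cost under discussion. */
    #include <nvrtc.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    static const char *src =
        "extern \"C\" __global__ void add_one(float *p) { *p += 1.0f; }\n";

    int main(void) {
        nvrtcProgram prog;
        clock_t t0 = clock();

        nvrtcCreateProgram(&prog, src, "kernel.cu", 0, NULL, NULL);
        nvrtcCompileProgram(prog, 0, NULL);  /* the runtime compile step */

        size_t ptx_size;
        nvrtcGetPTXSize(prog, &ptx_size);
        char *ptx = malloc(ptx_size);
        nvrtcGetPTX(prog, ptx);  /* PTX out; the driver still JITs this to SASS */

        printf("runtime compile took %.3fs\n",
               (double)(clock() - t0) / CLOCKS_PER_SEC);

        nvrtcDestroyProgram(&prog);
        free(ptx);
        return 0;
    }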
-
Or it makes it run 2% slower, because there's a CRC mismatch and it runs the wrong code path (AMD), or it's a different version of the driver than the one you optimized for and they broke something (everybody), etc., etc. Stable ISAs have their own significant perf advantages.
-
But that requires hardware vendors to either create bloated instruction sets or add an instruction-set decoder, like modern CPUs have, that converts it into the hardware-specific instructions. Both of these have a high transistor cost, increasing core size a lot.
-
Honestly, that sounds like something you just made up. Taking ARM as an example, since that is a long-lifetime RISC-like ISA, which is what a GPU would be in the worst case, can you show me a modern high-FLOP chip die shot where the _decoder_ is taking up a significant portion of the chip?
-
Modern ARM CPUs do not run ARM internally; they translate it into micro-ops.
-
So, before we continue this discussion, do you actually know what you're talking about, or not? Like do you know what a front-end decoder is or not? Not trying to be insulting, just asking. Because your last tweet makes no sense at all, and I'd like to know why.
-
Yes, I do. The key difference between GPU ISAs and most modern CPU ISAs is that GPUs do not have an additional internal ISA, and thus save on not needing the transistors to convert into that internal representation.
-
You didn't answer my question. I asked you to show me some evidence that the decoder on ARM takes up significant die space. You said, "Modern ARM cpus do not run ARM internals they translate it into micro ops." That is a nonsense answer. Do you know why, or not?