AMD, NVIDIA, and Intel each define their own long-term binary ISA and we compile to them. This is what we've been doing for literally decades on every other platform known to mankind. It's time for GPU vendors to stop getting a free pass.
Instead, for some reason we have these "APIs" that are supposed to make programming easier, but all they do is multiply the problem, because now you have m APIs times n drivers, so it's an O(m*n) compat problem when it could have just been O(n). It's insane and absurd.
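A quick back-of-the-envelope for the m*n point, with placeholder API and vendor lists (nothing here is specific to any real driver stack):

    #include <stdio.h>

    int main(void)
    {
        const char *apis[]    = { "D3D12", "Vulkan", "Metal" };  /* m graphics APIs  */
        const char *vendors[] = { "AMD", "NVIDIA", "Intel" };    /* n driver vendors */
        int m = (int)(sizeof apis / sizeof apis[0]);
        int n = (int)(sizeof vendors / sizeof vendors[0]);

        /* Every (API, driver) pair is its own compatibility surface to keep working. */
        printf("API-layer model: %d combinations to validate (m*n)\n", m * n);  /* 9 */

        /* With one stable, documented ISA per vendor, only the vendor count matters. */
        printf("Per-vendor ISAs: %d targets to validate (n)\n", n);             /* 3 */
        return 0;
    }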
Replying to @cmuratori @Jonathan_Blow
Yup, everyone wants this, although I think an almost-assembly like CUDA PTX might enable evolution better. The hard part is how to start a project that leads to all the OSes and GPU manufacturers coordinating. I think Apple is best integrated for it, but Google is much more likely to do it.
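For anyone unfamiliar with the PTX point: PTX is NVIDIA's portable, assembly-like intermediate, and the driver finishes compiling it to the GPU's real instructions at load time, whereas a cubin is already fixed for one GPU generation. A minimal sketch of the two load paths using the CUDA driver API (the file names and kernel name are made up, and error handling is omitted):

    /* Assumes the files were built offline, e.g.:
         nvcc -ptx kernel.cu                -> kernel.ptx   (portable, JIT-compiled by the driver)
         nvcc -cubin -arch=sm_70 kernel.cu  -> kernel.cubin (fixed binary for one GPU generation) */
    #include <cuda.h>
    #include <stdio.h>
    #include <stdlib.h>

    /* Slurp a file into a NUL-terminated buffer (PTX is text; for a cubin the
       extra NUL is harmless). */
    static void *read_file(const char *path)
    {
        FILE *f = fopen(path, "rb");
        fseek(f, 0, SEEK_END);
        long size = ftell(f);
        rewind(f);
        char *buf = malloc(size + 1);
        fread(buf, 1, size, f);
        buf[size] = '\0';
        fclose(f);
        return buf;
    }

    int main(void)
    {
        CUdevice dev; CUcontext ctx; CUmodule mod; CUfunction fn;
        cuInit(0);
        cuDeviceGet(&dev, 0);
        cuCtxCreate(&ctx, 0, dev);

        /* Path A: PTX ("almost-assembly") - the driver JIT-compiles it right here,
           which is the runtime-compilation cost debated below. */
        cuModuleLoadData(&mod, read_file("kernel.ptx"));

        /* Path B: a prebuilt, architecture-specific cubin - no JIT at load time,
           but it only runs on the GPU generation it was compiled for. */
        /* cuModuleLoadData(&mod, read_file("kernel.cubin")); */

        cuModuleGetFunction(&fn, mod, "my_kernel");
        printf("module loaded, kernel handle %p\n", (void *)fn);
        return 0;
    }

Whether the JIT path's flexibility is worth more than a fixed binary's predictability is exactly the trade-off argued over in the rest of the thread.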
The thing is, to get the real benefits out of the hardware you're going to need to do hardware-specific stuff. And all the hardware can't just be the same, due to patents etc.
I don't want one ISA. I mean, that's fine if it can happen, but I don't care. I just want an ISA per vendor. There is no reason we can't have this right now, because they all basically have one already - we just need people to stop with the API layers and make OSes load GPU ISAs the way they load CPU ones.
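To make "load GPU ISAs the way they load CPU ones" concrete, here is a hedged sketch of what an OS-side loader could do if each vendor published a stable ISA. Every name here (the container structs, the ISA strings, the fields) is invented for illustration - think of a fat/universal binary with one code slice per GPU vendor:

    #include <stddef.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    /* One precompiled code slice per vendor ISA, shipped inside the executable. */
    typedef struct {
        char        isa[16];       /* vendor-defined, stable ISA name, e.g. "rdna3", "ada" */
        uint32_t    min_revision;  /* oldest hardware revision the slice supports */
        const void *code;          /* machine code compiled offline for that ISA */
        size_t      size;
    } GpuSlice;

    typedef struct {
        const GpuSlice *slices;
        size_t          count;
    } FatGpuBinary;

    /* What the OS loader could do at executable-load time: no driver JIT, no API
       translation layer, just "does this machine's GPU speak one of these ISAs?" */
    static const GpuSlice *select_slice(const FatGpuBinary *bin,
                                        const char *machine_isa,
                                        uint32_t machine_revision)
    {
        for (size_t i = 0; i < bin->count; ++i)
            if (strcmp(bin->slices[i].isa, machine_isa) == 0 &&
                machine_revision >= bin->slices[i].min_revision)
                return &bin->slices[i];
        return NULL;  /* no matching slice: fail at load time, not mid-frame */
    }

    int main(void)
    {
        static const unsigned char amd_code[] = { 0x00 };  /* stand-in blobs */
        static const unsigned char nv_code[]  = { 0x00 };
        const GpuSlice slices[] = {
            { "rdna3", 1, amd_code, sizeof amd_code },
            { "ada",   1, nv_code,  sizeof nv_code  },
        };
        const FatGpuBinary bin = { slices, sizeof slices / sizeof slices[0] };

        const GpuSlice *s = select_slice(&bin, "rdna3", 2);
        if (s)
            printf("loading %zu-byte %s slice\n", s->size, s->isa);
        else
            printf("no compatible GPU slice in this binary\n");
        return 0;
    }

The point of the sketch is that this is no more machinery than CPU executables already rely on: the ISA is the contract, and the loader's only job is to match it.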
You mean not doing final compilation at runtime? That has performance impacts.
So does doing final compilation at runtime.
Sure, but that's a one-off cost. If you're running a 6-hour simulation, the fact that it takes 10 seconds longer to start is not a big factor if it makes the run 2% faster.
Or it makes it run 2% slower, because there's a CRC mismatch and it runs the wrong code path (AMD), or it's a different version of the driver than the one you optimized for and they broke something (everybody), etc., etc. Stable ISAs have their own significant perf advantages.
But that requires hardware vendors to either create bloated instruction sets or add an instruction-set decoder, like modern CPUs have, that converts it into the hardware-specific instructions. Both of these have a high transistor cost, increasing core size a lot.
Honestly, that sounds like something you just made up. Taking ARM as an example, since that is a long-lifetime RISC-like ISA, which is what a GPU ISA would be in the worst case, can you show me a modern high-FLOP chip die shot where the _decoder_ is taking up a significant portion of the chip?
Modern ARM CPUs do not run ARM internally; they translate it into micro-ops.
Replying to @hishnash @cmuratori and
https://en.m.wikipedia.org/wiki/Micro-operation … any CPU that does branch prediction etc. will be doing this.
End of conversation