I'm not sure about the specifics of SIMD, but I think there are levels of SIMD support for different processors? You could that support as part of the API, and ensure that certain properties have been checked for before calling certain ops.
Conversation
I already have adequate solutions for that, actually, and a compiler is unlikely to place the check patterns correctly because that depends on out-of-band information, so it is fine to insist the programmer do so.
You see, in today's world you need to account for code mobility.
1
1
That, or you need to simply make it UB to move from machine with higher feature levels to a lower feature level.
But it would be totally unacceptable to make the check before every call in to a SIMD function. It's enough to make the choice based on entering the right scope.
1
1
I'm not talking about checking on every call, I'm talking about ensuring that you a calling in a context where the check has previously been made 😅
2
1
Current compiler analyses and programming idioms really are good enough tho', if it can't actually magically find and set the right CPUID checks at exactly the right places then why bother /s
and it's actually worse because each architecture has its own model for SIMD execution.
2
1
Fair enough! As I say I don't know the specifics of this situation (I realise things like SIMD can be fraught the more you look into it), and I'm certainly naive here. I'm curious about the issues and if we can do better, but I understand if you are fine with the status quo. 😅
1
2
well like,
x86-64 has 128, 256, or 512 bit registers.
Arm Neon has 128 bit registers.
Arm SVE2 will have 128 * N bit registers.
RISCVV allows vector registers to have almost any length starting at 32 bits.
And users don't care about this and want their program to "just work".
1
1
They don't really wanna write a lemma about program validity for each different architecture, y'see, and each of these arches has subtly different rules about how these instructions are legalized and so on, but they want the maximum perf the hardware affords anyways.
1
2
Yeah, I can see how this stuff is painful… and yeah I'm definitely not advocating for forcing you to write validity lemmas for this stuff – sorry if I sounded like I was! Most of the stuff I'm interested in is a bit more high level and automated (there are limits there though).
2
1
I'm also interested in what the F* folks are doing wrt. generating code for cryptographic stuff (which sometimes involves SIMD I think) in project Everest, but that more on trusted code generators for small DSLs that automate writing F* proofs for specific domains.
2
1
I wish we could do better than that (seems kinda clunky), but I think it's the state of the art right now. And it might still struggle to scale to your use cases, and is probably still not worth your time. PL people still have a lot of work to do 😫

