As for Rosetta 2, it's good, but I'm still *really* curious how it'll do in the audio domain. We're talking lots of floating point processing with some integer mixed in, written by lots of different teams, some scalar, some vector, *definitely* a lot of it not well optimized.
And with hard realtime constraints - if the JIT fires off anything substantial in the audio processing thread, you *will* get a dropout - and even if it's not substantial, you'll probably get a pile of priority inversion hazards that will cause inconsistent dropouts.
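To make the "hard realtime" point concrete: native audio code usually keeps the processing thread free of locks, allocation, and anything else with unbounded latency, which is exactly what a JIT pause or a lock held by a lower-priority thread would violate. A minimal sketch of that style, assuming a single producer and a single consumer thread; `SpscQueue` is a hypothetical name, not something from any particular audio framework.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>

// Hypothetical single-producer/single-consumer queue: one thread pushes,
// one thread pops, and neither side ever blocks, locks, or allocates.
template <typename T, std::size_t N>
class SpscQueue {
    static_assert(N > 0 && (N & (N - 1)) == 0, "N must be a power of two");

public:
    // Called only by the producer thread. Returns false when the queue is full.
    bool push(const T& value) {
        const std::size_t head = head_.load(std::memory_order_relaxed);
        const std::size_t tail = tail_.load(std::memory_order_acquire);
        if (head - tail == N) return false;  // full: report it instead of waiting
        slots_[head & (N - 1)] = value;
        head_.store(head + 1, std::memory_order_release);  // publish the slot
        return true;
    }

    // Called only by the consumer thread. Returns false when the queue is empty.
    bool pop(T& out) {
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        const std::size_t head = head_.load(std::memory_order_acquire);
        if (head == tail) return false;  // empty
        out = slots_[tail & (N - 1)];
        tail_.store(tail + 1, std::memory_order_release);  // hand the slot back
        return true;
    }

private:
    std::array<T, N> slots_{};
    std::atomic<std::size_t> head_{0};  // written only by the producer
    std::atomic<std::size_t> tail_{0};  // written only by the consumer
};
```

Both operations finish in a bounded number of steps, so the audio callback never waits on another thread. Under translation, that guarantee only holds if the translated code itself is already resident and nothing in the runtime takes a lock on the thread's behalf.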
So it looks like for day-to-day stuff Mac users can probably be confident that they won't lose much vs. their older Intel Mac under Rosetta 2, and gain in many instances. But I wouldn't put my money on M1+R2 for all workloads yet.
It'll be interesting to see these performance characteristics worked out in more detail; e.g. people have talked about M1 being way faster at ObjC object management, so presumably it has *way* faster atomics. That matters a lot for some kinds of software, and not at all for others.
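For anyone wondering why object management leans so hard on atomics: every retain and release of a shared reference-counted object is an atomic read-modify-write, even when only one thread ever touches it. A rough sketch of the pattern follows. `RefCounted`, `retain`, and `release` are hypothetical names for illustration, and this is not how the Objective-C runtime actually stores its counts.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a reference-counted object header.
struct RefCounted {
    std::atomic<std::uint32_t> refs{1};  // the creator holds the first reference
};

// Every retain is an atomic increment, contended or not.
inline void retain(RefCounted& obj) {
    obj.refs.fetch_add(1, std::memory_order_relaxed);
}

// Returns true when the caller dropped the last reference and must free the object.
inline bool release(RefCounted& obj) {
    // acq_rel: earlier writes to the object are visible to whichever thread frees it.
    return obj.refs.fetch_sub(1, std::memory_order_acq_rel) == 1;
}
```

On x86 the increment typically compiles to a `lock`-prefixed instruction. On AArch64 it is either a load-exclusive/store-exclusive loop or, with the ARMv8.1-A atomics, a single instruction. Code that creates and destroys lots of small objects pays that cost constantly, which is why the speed of an uncontended atomic shows up so clearly in this kind of workload.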
But the question is how, and why - presumably their bus system is tighter than typical x86 ones? I'm looking forward to a deeper dive, and to seeing whether AMD/Intel care to improve this in the future.
Also, remember that Apple cheated with their control over the CPU for Rosetta 2. Getting R2-level x86 performance on any other ARM chip is impossible, due to the memory model mismatch. You have to massively slow down all loads and stores.
Replying to @ohunt
See the next tweet. Apple made the M1 able to switch to x86's consistency model. No other ARM chip can do that.
Hmmm... no. NVIDIA Denver and Carmel processors (64-bit Tegra K1 and Tegra Xavier) implement sequential consistency as the memory model *for everything*, which is even stronger than x86's. So "no other ARM chip" doesn't apply, and there are others too in server land...
Also, Arm Cortex cores have much faster barriers than before, though it took them a while. (And Qualcomm's SD820 Kryo cores had very fast barriers, but RIP.) Please don't use arguments like that; it just hurts the credibility of what you write more than anything...
Okay, I didn't know about Denver and Carmel having that property... but those also aren't ARM chips, they're a proprietary NVIDIA architecture running "Rosetta" and pretending to be ARM, so it's no surprise they're special :-)
Fast barriers don't help unless they're practically *free*; you can't really predict what loads/stores of x86 code need to have the properties of the x86 model, so you pretty much have to do it for all of them.
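A small illustration of why the translator can't be selective, written with C++ atomics standing in for machine code. The classic case is message passing: write some data, then set a flag. `Mailbox`, `publish`, and `try_consume` are hypothetical names for the sketch.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical mailbox: a payload plus a flag saying the payload is valid.
struct Mailbox {
    std::atomic<int> payload{0};
    std::atomic<bool> ready{false};
};

// Producer side. On x86 both stores are ordinary MOVs, and the hardware
// guarantees other cores see them in program order.
inline void publish(Mailbox& box, int value) {
    box.payload.store(value, std::memory_order_relaxed);
    // On a weakly ordered CPU this store must be a release (or follow a barrier),
    // otherwise another core could see ready == true before the payload.
    box.ready.store(true, std::memory_order_release);
}

// Consumer side. On x86 both loads are ordinary MOVs as well.
inline bool try_consume(Mailbox& box, int& out) {
    // On a weakly ordered CPU this load must be an acquire (or precede a barrier),
    // otherwise the payload load below could be satisfied too early.
    if (!box.ready.load(std::memory_order_acquire)) return false;
    out = box.payload.load(std::memory_order_relaxed);
    return true;
}
```

In the source the programmer marks which accesses need ordering. In compiled x86 code that information is gone: the flag and the payload are both plain loads and stores, and they are correct only because the hardware orders everything. A translator targeting a weaker model therefore has to treat every store at least like the release above and every load at least like the acquire, unless the CPU can provide x86's ordering itself.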
Yeah, that's being handled in multiple ways, one of which is having chips where uncontended atomics are as cheap as a regular memory access. For those reasons, a change to use ARMv8.1-A atomics in the JIT-generated code was rolled out in Windows 10 a while ago.