if we're talkin' strange timelines: ship everything at -O1 and have a JIT infrastructure that can build and drop in the -O3 version using runtime feedback on where dynamic calls land etc. (kinda like .NET NGEN, but starting from a language made for AOT)
Android has this kind of compilation stack for code targeting the Android Runtime (ART). Historically, they moved from an interpreter, to an interpreter + JIT, to near-full AOT compilation (with the interpreter still used for one-time initialization and dynamic code) before arriving at the current model.
The interpreter / JIT compilation mode generates profiles that identify the hot code for profile-guided AOT compilation in the background. This is helped by app updates being installed in the background, since the AOT compilation can be redone without making the user wait (source.android.com/devices/tech/o).
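For reference, the compiler filter used for each dexopt "reason" is selected through system properties covered in that ART configuration documentation. A minimal sketch, assuming the usual property names (exact names and defaults vary across Android releases):

```
# Illustrative pm.dexopt properties; values are compiler filters.
pm.dexopt.install=speed-profile     # at install time, compile using any bundled profile
pm.dexopt.bg-dexopt=speed-profile   # background dexopt recompiles hot code from collected profiles
pm.dexopt.first-boot=verify         # keep first boot fast, defer heavy compilation
```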
The JIT compiler can do some low-level tricks and optimizations that an AOT compiler cannot, so it still has a purpose once code has been AOT compiled. There are also a lot of options for configuring how this works. You can still use near-full AOT compilation or full AOT compilation.
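As a rough example of that configurability, a specific compiler filter can also be forced per app from a development host (the package name below is just a placeholder):

```
# Force full AOT compilation for one app, then undo it and inspect the state.
adb shell cmd package compile -m everything -f com.example.app
adb shell cmd package compile --reset com.example.app
adb shell dumpsys package dexopt
```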
There's partial documentation on this here:
source.android.com/devices/tech/d
For example, I use the near-full AOT compilation mode ('speed') without JIT or profiling:
github.com/GrapheneOS/pla
The full AOT compilation mode, which disables the heuristics that leave cold code interpreted, is 'everything'.
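A minimal sketch of what that kind of build configuration might look like, assuming the standard pm.dexopt / dalvik.vm property names; the actual GrapheneOS config linked above may differ:

```
# Hypothetical product makefile fragment: near-full AOT ('speed') everywhere,
# with the JIT and JIT profiling disabled. Property names are assumptions
# based on common AOSP conventions.
PRODUCT_SYSTEM_DEFAULT_PROPERTIES += \
    pm.dexopt.install=speed \
    pm.dexopt.bg-dexopt=speed \
    pm.dexopt.first-boot=speed \
    dalvik.vm.usejit=false \
    dalvik.vm.usejitprofiles=false
```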
They've also done some interesting optimizations, most of which I need to disable for my work. You may know that Android spawns apps by forking from the Zygote, which acts as a template process with the common classes / libraries already loaded and initialized, a preloaded OpenGL context and so on.
They also have some weird optimizations like shared RELRO sections and pre-generating the heaps for libraries, not just the code. I think the way it works is they load them up in a deterministic environment and then write out the heap in a way that's quick to verify on boot.
That sounds a lot like what Darwin does with the shared cache: prelinking the system dylibs into one image, pre-binding ObjC and Swift runtime data structures, etc., though we're starting from already-native code.
The shared RELRO thing is for native libraries / executables, specifically Chromium, since the library gets mapped in all of the Chromium renderer sandbox processes which are also spawned for every app using the WebView. There's actually a separate WebView sandbox Zygote now.
I have to drop most of this in order to have proper per-app ASLR and other probabilistic mitigations, rather than shared base addresses and other shared secrets. The Zygote approach makes a big difference for app spawning time and also memory usage, since a lot of the initial heap from the Zygote remains identical across apps.
You can see some of how they take advantage of the Zygote here: android.googlesource.com/platform/frame. This is run a single time on boot for each supported architecture.
The image classes section in source.android.com/devices/tech/d explains that bit. For their use it only really optimizes Zygote start-up.
In my past work, which I still need to port forward, I just disabled all of this and used on-demand loading. It definitely makes a noticeable difference, though: there's a 200-300 ms or so delay, like launching a desktop app, instead of everything launching within a couple of frames.