There's partial documentation on this here:
source.android.com/devices/tech/d
For example, I use the near-full AOT compilation mode ('speed') without JIT or profiling:
github.com/GrapheneOS/pla
Full AOT compilation mode (disabling the heuristics that leave cold code interpreted) is 'everything'.
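On a stock device you can force a compiler filter per package with the cmd package compile shell command; a quick illustration (the package name is a placeholder, and dumpsys output details vary by release):

    # Near-full AOT ('speed'), ignoring any collected profiles:
    adb shell cmd package compile -m speed -f com.example.app

    # Full AOT ('everything'):
    adb shell cmd package compile -m everything -f com.example.app

    # Inspect the resulting dexopt state:
    adb shell dumpsys package com.example.app | grep -A3 Dexopt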
They've also done some interesting optimizations, most of which I need to disable for my work. You may know that Android spawns apps by forking from a Zygote process acting as a template, with the common classes / libraries already loaded and initialized, a preloaded OpenGL context and so on.
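As a rough illustration of that model (a minimal sketch, not the actual Zygote, which preloads Java classes and forks from native code in frameworks/base; all names here are illustrative):

    /* Fork-from-template: pay the loading cost once in the parent,
     * then fork() each "app" so it inherits everything copy-on-write. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/wait.h>
    #include <unistd.h>

    static void preload_common_state(void) {
        /* Stand-in for preloading classes, resources, a GL context... */
        printf("zygote %d: preloading shared state once\n", getpid());
    }

    static void run_app(const char *name) {
        /* The child starts with the template's state already in place. */
        printf("app '%s' (pid %d): running with inherited state\n",
               name, getpid());
        exit(0);
    }

    int main(void) {
        const char *apps[] = { "mail", "browser", "camera" };
        preload_common_state();
        for (size_t i = 0; i < sizeof(apps) / sizeof(apps[0]); i++) {
            pid_t pid = fork();
            if (pid == 0)
                run_app(apps[i]);   /* child becomes the "app" */
            else if (pid < 0)
                perror("fork");
        }
        while (wait(NULL) > 0) {}   /* reap children */
        return 0;
    }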
They also have some weird optimizations like shared RELRO sections and pre-generating the heaps for libraries, not just the code. I think the way it works is that they load them up in a deterministic environment and then write out the heap in a form that's quick to verify on boot.
That sounds a lot like what Darwin does with the shared cache: prelinking the system dylibs into one image, pre-binding ObjC and Swift runtime data structures, etc., though we're starting from already-native code.
The shared RELRO thing is for native libraries / executables, specifically Chromium, since the library gets mapped into all of the Chromium renderer sandbox processes, which are also spawned for every app using the WebView. There's actually a separate WebView sandbox Zygote now.
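That mechanism is exposed through the real android_dlopen_ext() API in bionic's <android/dlext.h>. A rough sketch of the two halves (library name, path, and error handling are illustrative; in practice the library also has to land at the same base in every process, e.g. via ANDROID_DLEXT_RESERVED_ADDRESS, for the pages to match):

    #include <android/dlext.h>
    #include <dlfcn.h>   /* RTLD_NOW */
    #include <fcntl.h>
    #include <unistd.h>

    /* Done once: load the library, relocate it, and dump the resulting
     * RELRO segment to a file. */
    static int write_relro(const char *lib, const char *relro_path) {
        int fd = open(relro_path, O_CREAT | O_RDWR | O_TRUNC, 0644);
        if (fd < 0) return -1;
        android_dlextinfo info = { .flags = ANDROID_DLEXT_WRITE_RELRO,
                                   .relro_fd = fd };
        void *handle = android_dlopen_ext(lib, RTLD_NOW, &info);
        close(fd);
        return handle ? 0 : -1;
    }

    /* Done in every renderer: load the same library and let the loader
     * replace identical RELRO pages with shared mappings of the file
     * instead of keeping private dirty copies. */
    static void *load_with_relro(const char *lib, const char *relro_path) {
        int fd = open(relro_path, O_RDONLY);
        if (fd < 0) return NULL;
        android_dlextinfo info = { .flags = ANDROID_DLEXT_USE_RELRO,
                                   .relro_fd = fd };
        void *handle = android_dlopen_ext(lib, RTLD_NOW, &info);
        close(fd);
        return handle;
    }

    int main(void) {
        const char *relro = "/data/local/tmp/webview.relro";
        if (write_relro("libmonochrome.so", relro) == 0)
            load_with_relro("libmonochrome.so", relro);
        return 0;
    }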
I have to drop most of this in order to have proper per-app ASLR and other probabilistic mitigations, rather than shared bases and other shared secrets. The Zygote model makes a big difference for app spawning time and also for memory usage, since a lot of the initial heap in the Zygote remains the same across apps.
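A tiny standalone demonstration of the trade-off: fork() duplicates the address space, so every child inherits the parent's randomized bases instead of getting fresh ASLR (compare the printed addresses when you run it):

    #include <stdio.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(void) {
        int stack_var;
        /* Addresses are randomized once, when the parent exec'd. */
        printf("parent %d: code=%p stack=%p\n",
               getpid(), (void *)main, (void *)&stack_var);
        for (int i = 0; i < 2; i++) {
            if (fork() == 0) {
                /* Same values as the parent: the layout was inherited,
                 * not re-randomized. An execve() here would give fresh
                 * ASLR, at the cost of redoing all of the preloading. */
                printf("child  %d: code=%p stack=%p\n",
                       getpid(), (void *)main, (void *)&stack_var);
                return 0;
            }
        }
        while (wait(NULL) > 0) {}
        return 0;
    }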
You can see some of how they take advantage of the Zygote here: android.googlesource.com/platform/frame. This is run a single time on boot for each supported architecture.
The image classes section in source.android.com/devices/tech/d explains that bit. For their use case, it only really optimizes Zygote start-up.
In my past work, which I still need to port forward, I just disabled all of this and used on-demand loading. It definitely makes a noticeable difference, though: there's a 200-300ms or so delay, like launching a desktop app, instead of everything launching within a couple of frames.
The JIT compiler is also very different from something like the JIT in the standard Java runtime, for reasons tied to all of that. The GC is also quite different, with priorities much more like the Go GC's. I'm sure people using Java for desktop apps would love to have a desktop version of ART.
They tried to use LLVM for AOT compilation, and maybe even for JIT compilation, at one point. It never shipped. They have a homegrown optimizing compiler instead: android.googlesource.com/platform/art/+. So the pipeline is Java bytecode -> dex -> odex (what is shipped) -> { interpreter, SSA IR -> native }.
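Roughly, with the real toolchain binaries (the dex2oat flags below are simplified and version-dependent; on a device it's normally invoked by the OS at install or build time, not by hand):

    # Java source -> JVM bytecode
    javac Hello.java                          # produces Hello.class

    # JVM bytecode -> dex
    d8 Hello.class --output .                 # produces classes.dex

    # dex -> odex: AOT compilation through the optimizing compiler's SSA IR
    dex2oat --dex-file=classes.dex --oat-file=classes.odex \
            --compiler-filter=speed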
It's really so much better having the specialized compilation stack, despite it lacking so much of the optimization work in LLVM. If you don't care about compilation time, you can layer a higher-level IR on top of LLVM, but if you JIT, compile time matters a lot even with many layers.
It's obviously a nice ideal to have the enormous amount of work shared between languages, but it really breaks down with JIT. I think WebKit's move to LLVM and then away from it is very telling: LLVM just takes far too long and uses too many resources (hurting battery life).
Thank you for laying this out. It's funny that ART is an open-source system used by so many people, yet few (including me) know much about how it works. This makes me curious about the details of the GC.