After 3 months of coding weekends, my attempt to create a relocating nonblocking garbage collector in C++ has failed. It works in theory, but in practice the read barrier is pervasive and costly, and the threading invariants are incredibly tricky to maintain.
-
-
What's the use case for relocation? Fragmentation? Large allocs with serial access?
-
Relocation ensures less fragmentation and higher cache and TLB locality, and can also enable lower-cost freeing than random access memory managers. Yet any purely concurrent scheme requires some pinning, so the benefits of relocation are reduced.
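To make the "pervasive read barrier" cost concrete, here is a minimal sketch of one well-known way to do it, a Brooks-style forwarding pointer. This is not the poster's design, which the thread does not describe; `Object`, `read_barrier` and `relocate` are hypothetical names, and the sketch ignores mutator writes that race with the copy.

```cpp
#include <atomic>
#include <cassert>
#include <new>

// Hypothetical object header (assumption): every object carries a forwarding
// pointer that refers to itself until the collector relocates the object.
struct Object {
    std::atomic<Object*> forward;
    int payload;
    explicit Object(int v) : forward(this), payload(v) {}
};

// Read barrier: every access through a reference pays one extra
// dependent load to find the object's current address.
inline Object* read_barrier(Object* ref) {
    return ref->forward.load(std::memory_order_acquire);
}

// Relocation: copy the object into to-space, then publish the copy with a
// single CAS on the old object's forwarding pointer. If another thread has
// already relocated the object, discard our copy and return the winner.
// Simplification: writes to `from` between the copy and the CAS are lost.
inline Object* relocate(Object* from, void* to_space) {
    Object* copy = new (to_space) Object(from->payload);
    Object* expected = from;
    if (from->forward.compare_exchange_strong(expected, copy,
                                              std::memory_order_acq_rel,
                                              std::memory_order_acquire)) {
        return copy;
    }
    copy->~Object();
    return expected;
}
```

Every field access has to go through `read_barrier`, which is why the cost shows up everywhere; and anything holding a raw `Object*` across a relocation (native code, a register-cached pointer) needs the object pinned, which is where the pinning trade-off comes in.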
-
-
Some years ago it became quite clear to me that GC is not designed for shared-memory multithreading, at least if you want any realtime guarantees. And if you try writing low-garbage/garbage-free code in Java, it becomes way more tedious and restrictive than C.
-
All the leading GCs were designed for a small number of cores or applications where some pauses were acceptable. I’m optimistic a new GC can be developed that scales well to many-core, is non-blocking, and minimizes contention.
-
Sorry, but after messing a bit with the JVM and with .NET I don’t see this happening in the near future. VMs already appear to be full of hacky opts to get decent perf on current archs and reduce GC frequency. They seem to become more and more bogged down by all that complexity.
-
Making GC not suck is hard, which is why I mostly gave up on it. Some tricks to make GC less bad can be made explicit and combined with a manual MM general case (say, "new!" means "this object has an automatic lifetime", giving an error if the object goes out of scope, ...).
-
I suspect existing VM JIT impls already try to pin data to a scope (possible stack alloc), then to a thread (TLS alloc, simplified GC) before giving up and putting it into the common heap.
-
Essentially, they do. Stack alloc = bump heap, escape analysis here and there, generations, and hacks upon hacks after a lot of testing. An idealized GC is indeed impossibly hard to devise, esp. with the limited tools / primitives provided by OS and CPU.
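For readers unfamiliar with "stack alloc = bump heap", a rough sketch of a bump arena: allocation is a pointer increment, and freeing everything at once is a pointer reset. `BumpArena` and its members are hypothetical names for illustration and are not taken from any VM.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Minimal bump ("arena") allocator over a caller-provided buffer.
class BumpArena {
public:
    BumpArena(void* buffer, std::size_t size)
        : begin_(static_cast<std::uint8_t*>(buffer)),
          top_(begin_),
          end_(begin_ + size) {}

    // Returns `size` bytes aligned to `align` (must be a power of two),
    // or nullptr if the arena is exhausted.
    void* allocate(std::size_t size, std::size_t align) {
        const std::uintptr_t p = reinterpret_cast<std::uintptr_t>(top_);
        const std::uintptr_t mask = static_cast<std::uintptr_t>(align) - 1;
        const std::size_t padding =
            static_cast<std::size_t>(((p + mask) & ~mask) - p);
        const std::size_t left = static_cast<std::size_t>(end_ - top_);
        if (padding > left || size > left - padding) return nullptr;
        top_ += padding;          // skip alignment padding
        void* result = top_;
        top_ += size;             // "bump" past the new object
        return result;
    }

    // Frees every object at once; no per-object bookkeeping.
    void reset() { top_ = begin_; }

    std::size_t used() const { return static_cast<std::size_t>(top_ - begin_); }

private:
    std::uint8_t* begin_;
    std::uint8_t* top_;
    std::uint8_t* end_;
};
```

A nursery generation or a per-thread allocation buffer behaves roughly like this; the hard part the thread is discussing is deciding what may live in such an arena and what has to escape to the shared heap.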
-
And then there's C++. I did try to sketch one once. Generic, reusable, relocating and a bunch of other properties. It was a mistake to mix these two. But then they were also interdependent... Eh.
-
... But something manageable is still possible if you are to specialize. :) https://github.com/hsutter/gcpp
-
-
-
HW support for read barriers has been proposed, have you looked into that? Might be relevant for you: https://people.eecs.berkeley.edu/~maas/papers/maas-asbd16-hwgc.pdf. And there is Azul, who abuse the page tables to implement read barriers. I've been wondering for some time why Intel/AMD don't plan to offer sth. in hw.
-
I think of modern CPU architectures like Skylake as offering hardware support for everything, since they can issue 4-6 instructions per clock and often do. The only problem is the high cost of unspeculated atomics.
-
The authors of the paper I linked claim that their approach allows for speculation, and that marking and relocation are a bad fit for general purpose cores. My understanding of the area is very shallow though, and sadly the paper is not very detailed :)
-
-
Doesn’t the new .NET background GC allow allocations on non-GC collecting threads to continue for a prespecified amount of allocations while GC is happening? IIRC there should be a 16MB window, at least on workstations...
-
My understanding of .NET: mutator threads can run during some parts of GC but not during the stop-the-world stack scanning and relocation phase. For that, it has to pause all threads.
-
There seems to be some conflicting and unclear information out there... This seems like a good resource: https://docs.microsoft.com/en-us/dotnet/standard/garbage-collection/fundamentals. There’s a distinction between foreground collection and plain background collection. Foreground halts all managed threads.
-
Foreground collection is the combination of the background collection (gen 2) and the ephemeral one (gen 0 and 1).
-
-
For games I think incremental may be better than concurrent: you manually invoke the GC once per frame while waiting for the present queue semaphore (or for some minimum time after presenting if not GPU bound).
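A rough sketch of what "invoke the GC once per frame" could look like: an incremental marker that does a bounded amount of work per call and resumes on the next frame. `Node`, `IncrementalMarker` and the object-count budget are assumptions for illustration; a real engine would more likely budget by elapsed time, and would need a write barrier if the game mutates the graph between steps.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical heap object: a mark bit plus outgoing references.
struct Node {
    bool marked = false;
    std::vector<Node*> children;
};

// Incremental marker: each step() scans at most `budget` objects and then
// returns, so the caller can run it in the idle gap of every frame.
// Assumption: the object graph does not change between steps (no write
// barrier is implemented here).
class IncrementalMarker {
public:
    void add_root(Node* root) { push(root); }

    // Returns true once marking is complete.
    bool step(std::size_t budget) {
        while (budget > 0 && !gray_.empty()) {
            Node* n = gray_.back();
            gray_.pop_back();
            for (Node* child : n->children) push(child);
            --budget;
        }
        return gray_.empty();
    }

private:
    // Mark on push so an object is never queued twice.
    void push(Node* n) {
        if (n != nullptr && !n->marked) {
            n->marked = true;
            gray_.push_back(n);
        }
    }

    std::vector<Node*> gray_;  // marked but not yet scanned
};
```

In a frame loop this would be called as something like `marker.step(budget)` right after submitting the frame, with the sweep (or relocation) phase starting only once `step` has returned true.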
-
-
-
Thoughts on using an auto reference counting (ARC) system?
-
-
-
Curious if you have ideas about @Jonathan_Blow's idea of "context"? https://youtu.be/uZgbKrDEzAs?t=38m54s
-
Forgot to mention: Context has a "Local Storage" concept https://youtu.be/uZgbKrDEzAs?t=41m18s. Interesting alternative to GC'ed languages.