So Firefox 106 triggered a bug in first-generation Ryzen processors. We found it by noticing a bunch of nonsensical crashes in the painting code that only happened on these processors.
How did we fix it? By rebuilding the exact same code as 106. The timing differences in the new profiling run used for PGO were enough to change the generated code so that it no longer triggered the processor bug.
How common is it for PGO perturbations to affect stability?
Even having it happen once must terrify anyone thinking of releasing a hotfix, no matter how small the change is.
Can the PGO process be made more deterministic while keeping the performance benefits and without adding much complexity? For example, by measuring instruction counts rather than time, by profiling with a more deterministic thread scheduler, or even by reusing the PGO data from a previous run.
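To make the determinism question concrete, here is a toy model (not how any real compiler makes its decisions) of why run-to-run noise in a profile can flip an optimization decision while a reproducible metric cannot. The hot/cold threshold, the function names, and the +/-20% jitter are all made-up assumptions for illustration.

```python
# Toy model only: real PGO implementations use their own heuristics.
# HOT_THRESHOLD, BASE, and the function names are hypothetical.
import random

HOT_THRESHOLD = 0.10  # hypothetical: "hot" if >= 10% of total profile weight

# Hypothetical per-function weights from one profiling run.
BASE = {"paint": 60, "layout": 30, "borderline": 11}


def classify(profile):
    """Return {function: is_hot} using a simple share-of-total cutoff."""
    total = sum(profile.values())
    return {fn: w / total >= HOT_THRESHOLD for fn, w in profile.items()}


def noisy_profile(base, seed):
    """Simulate timing-dependent weights: each value jitters by up to +/-20%."""
    rng = random.Random(seed)
    return {fn: w * rng.uniform(0.8, 1.2) for fn, w in base.items()}


def deterministic_profile(base):
    """Simulate a reproducible metric (e.g. counts): identical on every run."""
    return dict(base)


if __name__ == "__main__":
    noisy = {classify(noisy_profile(BASE, s))["borderline"] for s in range(200)}
    stable = {classify(deterministic_profile(BASE))["borderline"] for _ in range(200)}
    print("noisy decisions for 'borderline':", sorted(noisy))
    print("deterministic decisions for 'borderline':", sorted(stable))
```

In this model, only the function sitting near the cutoff changes classification between noisy runs, while the reproducible profile always yields the same decision. Reusing a stored profile corresponds to fixing the seed: the same input profile gives the same decisions, assuming the compiler itself is deterministic.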
What actually is the processor bug, though? Even if a crash only shows up on one processor type, my first assumption would be unlucky timing exposing an application bug or undefined behavior (UB).