Does anyone know what happens with a compressed file (specifically Zstandard) if there is a bit flip in the file? Does it cause garbage decompression after that point? How tolerant is Zstandard of bit rot? Might be worth exploring, considering I have 250 TB compressed now.
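This is easy to test empirically. A minimal sketch, assuming the third-party python-zstandard package ("pip install zstandard"; the payload here is made up): flip one bit mid-frame and attempt a decompress. Note that zstd frames can carry an XXH64 content checksum, which flags corruption that would otherwise decode to silent garbage.

    # Minimal corruption experiment; assumes the "zstandard" package.
    import zstandard as zstd

    data = b"some archival payload\n" * 50_000               # hypothetical payload
    cctx = zstd.ZstdCompressor(level=19, write_checksum=True)  # embed XXH64 checksum
    blob = bytearray(cctx.compress(data))

    blob[len(blob) // 2] ^= 0x01                             # flip one bit mid-frame

    try:
        out = zstd.ZstdDecompressor().decompress(bytes(blob))
        print("no error raised; output matches original:", out == data)
    except zstd.ZstdError as exc:
        print("decompression failed:", exc)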
I have multiple copies currently, but this is something I'm going to need to eventually figure out for long-term archival.
Replying to
As I recall: libz is not fault tolerant. A single corruption breaks the stream at that point. If the corruption is 40% into the file, you can decode the first 40% and the rest is corrupt. libbz2 has the same issue.
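That prefix-recovery behavior can be demonstrated with the stdlib zlib module. A sketch (payload and chunk size are arbitrary choices):

    # Feed a corrupted deflate stream incrementally and count how much
    # decodes before the decoder gives up. Depending on where the flip
    # lands, zlib may also emit garbage until the final Adler-32 check.
    import zlib

    data = b"some log line\n" * 50_000
    blob = bytearray(zlib.compress(data))
    blob[len(blob) // 2] ^= 0xFF                 # corrupt a byte halfway in

    d = zlib.decompressobj()
    recovered = b""
    try:
        for i in range(0, len(blob), 4096):
            recovered += d.decompress(bytes(blob[i:i + 4096]))
        recovered += d.flush()
    except zlib.error as exc:
        print("stream broke:", exc)
    print(f"recovered {len(recovered)} of {len(data)} bytes")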
Replying to
So if the bit rot wasn't severe, you might be able to detect garbage if you know what type of data to expect (e.g., JSON), and you could start flipping bits around the corruption point as a poor man's error correction (see the sketch below).
Is PAR2 still the best for this sort of thing?
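The bit-flipping idea can be sketched directly. Everything here (helper name, window size) is hypothetical, not from an actual tool:

    # Poor man's error correction: if decompression fails near a known
    # offset, try single-bit flips in a window around it and accept the
    # first candidate that both decompresses and parses as JSON.
    import json
    import zstandard as zstd

    def try_repair(blob: bytes, around: int, window: int = 64):
        dctx = zstd.ZstdDecompressor()
        for off in range(max(0, around - window), min(len(blob), around + window)):
            for bit in range(8):
                cand = bytearray(blob)
                cand[off] ^= 1 << bit
                try:
                    out = dctx.decompress(bytes(cand))
                    json.loads(out)      # expected-data-type sanity check
                    return out           # plausible repair found
                except (zstd.ZstdError, ValueError):
                    continue
        return None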
I don't know whether you'd run into a situation where the JSON is syntactically correct but every 'a' is now a 'b', or something silly like that. I guess if you had enough data you might detect that. Might be worth experimenting.
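One cheap guard against "valid JSON, wrong bytes": store a digest of the original alongside the archive and compare after any repair attempt. A stdlib sketch (the sidecar-digest convention is an assumption; zstd's optional frame checksum plays the same role inside the frame):

    # Compare a repair candidate against a digest stored at compress time.
    import hashlib

    def verify(candidate: bytes, stored_sha256_hex: str) -> bool:
        return hashlib.sha256(candidate).hexdigest() == stored_sha256_hex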
Replying to
These days, I wouldn't expect a bit-flip error. I don't even see that anymore with streaming network data. (Many lower-layer protocols have ECC, so upper layers don't see errors.) But a corrupt sector (e.g., 512 consecutive corrupted bytes) is a serious concern.
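One way to bound the blast radius of a corrupt sector is to compress in independent frames, so a bad sector loses one chunk instead of everything after it. A sketch, with chunk size and the list-of-frames layout as assumptions:

    # Independent zstd frames per chunk: a corrupt sector only kills
    # the chunk it lands in.
    import zstandard as zstd

    CHUNK = 1 << 20  # 1 MiB of input per frame (arbitrary choice)

    def compress_chunked(data: bytes) -> list:
        cctx = zstd.ZstdCompressor(write_checksum=True)
        return [cctx.compress(data[i:i + CHUNK]) for i in range(0, len(data), CHUNK)]

    def decompress_surviving(frames: list) -> list:
        dctx = zstd.ZstdDecompressor()
        out = []
        for f in frames:
            try:
                out.append(dctx.decompress(f))
            except zstd.ZstdError:
                out.append(None)  # only this chunk is lost
        return out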
Replying to
Testing PAR2 now -- it's actually faster than I thought. It's been around since what, the Usenet binaries days?
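For reference, typical par2cmdline usage looks roughly like this (flags from memory; check par2 --help before trusting them):

    par2 create -r10 archive.tar.zst    # writes .par2 files with ~10% recovery data
    par2 verify archive.tar.zst.par2    # detect damage
    par2 repair archive.tar.zst.par2    # rebuild from recovery blocks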