Conversation

TIL rsync's best checksum is md5. This means that if you're conducting a supply chain attack on a system which uses rsync to keep the mirrors in sync, you could pollute the mirrors w/ bad copies that collide w/ md5 and even rsync --checksum wouldn't know they were modified.
Replying to
It also computes all the checksums up front, and nothing gets uploaded until that pass is finished. It's very impractical for syncing incremental changes to a very large overall amount of data. The problem is a lot more than just not being parallel / async.
Replying to and
I find rsync very impractical with checksums enabled. For large amounts of data, you need to use -t and rely on the last modified date + size checks to make it happen in a reasonable amount of time. Can use checksum mode as an integrity check but probably not for regular usage.
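To make the cost difference concrete, here is a rough Python sketch of the two kinds of comparison. It is not rsync's code: the function names are made up for illustration, and SHA-256 is just a stand-in digest. The quick check only needs a stat() per file, while the checksum check has to read every byte on both sides.

```python
import hashlib
import os


def quick_check_differs(src: str, dst: str) -> bool:
    """Cheap check: compare size and modification time only (no file reads)."""
    if not os.path.exists(dst):
        return True
    s, d = os.stat(src), os.stat(dst)
    return s.st_size != d.st_size or int(s.st_mtime) != int(d.st_mtime)


def file_digest(path: str, algo: str = "sha256", chunk: int = 1 << 20) -> str:
    """Hash a whole file in fixed-size chunks; cost grows with file size."""
    h = hashlib.new(algo)
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            h.update(block)
    return h.hexdigest()


def checksum_differs(src: str, dst: str, algo: str = "sha256") -> bool:
    """Expensive check: ignore timestamps and compare full-content digests."""
    if not os.path.exists(dst):
        return True
    if os.path.getsize(src) != os.path.getsize(dst):
        return True
    return file_digest(src, algo) != file_digest(dst, algo)
```

Two files with the same size and timestamp but different bytes pass the quick check and only get caught by the checksum comparison, which is why the checksum pass is better kept as an occasional integrity check.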
Replying to and
It could start syncing changed directory/file structure and files with a mismatched modification date or time right away. For everything else, it could be calculating those checksums in a thread pool. It's not good at actually using the available network, CPU and I/O resources.
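A minimal sketch of that idea in Python, assuming a hypothetical `plan_sync` helper that is handed (source, destination) path pairs. None of this is rsync's behavior or API. Files that a stat() already shows as changed are handed out for transfer right away, and everything else is hashed in a thread pool in the background. SHA-256 and the worker count are arbitrary choices here.

```python
import hashlib
import os
from concurrent.futures import ThreadPoolExecutor, as_completed


def stat_differs(src, dst):
    """True when size or mtime alone shows the file must be transferred."""
    if not os.path.exists(dst):
        return True
    s, d = os.stat(src), os.stat(dst)
    return s.st_size != d.st_size or int(s.st_mtime) != int(d.st_mtime)


def file_digest(path, algo="sha256", chunk=1 << 20):
    """Hash a file in chunks; hashlib releases the GIL on large updates."""
    h = hashlib.new(algo)
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            h.update(block)
    return h.hexdigest()


def content_differs(src, dst, algo="sha256"):
    """Full-content comparison, run inside a worker thread."""
    return file_digest(src, algo) != file_digest(dst, algo)


def plan_sync(pairs, workers=4, algo="sha256"):
    """Yield (src, dst) pairs to transfer as soon as each is known to differ."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        pending = {}
        for src, dst in pairs:
            if stat_differs(src, dst):
                # Obvious change: hand it to the caller immediately.
                yield src, dst
            else:
                # Looks unchanged: verify the content in the background.
                fut = pool.submit(content_differs, src, dst, algo)
                pending[fut] = (src, dst)
        # Drain the background checks in whatever order they finish.
        for fut in as_completed(pending):
            if fut.result():
                yield pending[fut]
```

Because `plan_sync` is a generator, the caller can start uploading the first yielded pair while the pool is still hashing the rest.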
Replying to and
I agree. I had to switch the MidnightBSD portsnap mirror over to zfs send/recv because of the rsync overhead. Colin did write a client to fetch them, but I didn't know about it when I set it up.