TIL rsync's best checksum is md5. This means that if you're conducting a supply chain attack on a system which uses rsync to keep the mirrors in sync, you could pollute the mirrors w/ bad copies that collide w/ md5 and even rsync --checksum wouldn't know they were modified.
It also calculates all the checksums in advance, and that entirely blocks progress on uploading until it's done. It's very impractical for syncing incremental changes to a very large overall amount of data. It's a lot more than just not being parallel / async.
I find rsync very impractical with checksums enabled. For large amounts of data, you need to use -t and rely on the last modified date + size checks to make it happen in a reasonable amount of time. Can use checksum mode as an integrity check but probably not for regular usage.
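For anyone unfamiliar with what the "last modified date + size" check amounts to, here is a rough Python sketch of the idea. It is not rsync's actual code: the function name `needs_transfer` is made up for illustration, and comparing mtimes at whole-second granularity is an assumption of this sketch.

```python
import os

def needs_transfer(src, dst):
    """Return True if dst is missing or differs from src in size or mtime.

    Hypothetical helper: only metadata is compared, file contents are
    never read, which is why this kind of check is fast on large trees.
    """
    try:
        dst_stat = os.stat(dst)
    except FileNotFoundError:
        return True  # nothing on the destination yet
    src_stat = os.stat(src)
    # Assumption: compare modification times at whole-second granularity.
    return (src_stat.st_size != dst_stat.st_size
            or int(src_stat.st_mtime) != int(dst_stat.st_mtime))
```

The trade-off is the one described above: a file whose contents changed while its size and mtime stayed the same goes unnoticed, so a full checksum pass is still useful as an occasional integrity check.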
It could start syncing changed directory/file structure and files with a mismatched modification date or time right away. For everything else, it could be calculating those checksums in a thread pool. It's not good at actually using the available network, CPU and I/O resources.
Some people may want to minimize network usage but I'm used to using OVH where bandwidth is entirely unmetered / unlimited for nearly everything. I simply want it to go as fast as possible. I'd rather have it uploading data opportunistically while it's calculating hashes too...
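As a rough illustration of the thread pool idea, here is a minimal Python sketch that hashes every file under a directory concurrently. It is not how rsync works internally; `file_digest`, `digest_tree`, the SHA-256 choice and the worker count are all assumptions made for the example.

```python
import hashlib
import os
from concurrent.futures import ThreadPoolExecutor

def file_digest(path, algo="sha256", chunk_size=1 << 20):
    """Hash one file in 1 MiB chunks so large files never sit fully in memory."""
    h = hashlib.new(algo)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def digest_tree(root, workers=8):
    """Return {relative_path: hex_digest} for every file under root.

    Hypothetical helper: files are hashed by a pool of worker threads,
    so several files can be read and hashed at the same time.
    """
    paths = [os.path.join(dirpath, name)
             for dirpath, _, names in os.walk(root)
             for name in names]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        digests = pool.map(file_digest, paths)  # results come back in input order
        return {os.path.relpath(p, root): d for p, d in zip(paths, digests)}
```

Threads are enough here because CPython's `hashlib` releases the GIL while hashing large buffers, so the workers can overlap disk reads and hashing. A tool built along these lines could start uploading files that already fail the size / mtime check while the pool works through the rest.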
If properly threaded, it really shouldn't take very long to figure out what's out of sync and needs to be sync'd up.
Yep, why I miss cvsup, and wish there was a CLI version of it still around (along w/ the magic to make it work over ssh).

