By the way, if anyone has clever ideas about how to move 80 TB of data more virtually, or wants an exciting unpaid internship at Pinboard ops, I'm all ears. Last time I did a full off-site backup I was chased out of California by fire; this time, by ice.
Replying to @Pinboard
It depends entirely on which part of the system is hard to change. If your database/datastore doesn't have an easy replication option, I'd look at putting your primary local backups on a server with ZFS and pushing incrementals to the remote with zfs send.
Replying to @Luke_s_Crawford @Pinboard
Easy to script, and easy to send just the changed blocks for the day. zfs send can be tunneled over ssh, too, so setup and authentication are also easy. The only gotcha is that a snapshot is like yanking one drive out of a mirror and using that for your backup.
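A minimal sketch of that pattern (the pool, dataset, and host names here are invented for illustration, not anything from the thread):

    # One-time seed: full send of the first snapshot
    zfs snapshot tank/backups@day1
    zfs send tank/backups@day1 | ssh backuphost zfs receive pool/backups

    # Nightly after that: snapshot again, then send only the blocks
    # that changed since the previous snapshot
    zfs snapshot tank/backups@day2
    zfs send -i tank/backups@day1 tank/backups@day2 \
        | ssh backuphost zfs receive pool/backups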
Replying to @Luke_s_Crawford @Pinboard
It doesn't quiesce anything by default, but you could run a FLUSH TABLES WITH READ LOCK (or whatever your equivalent is) before taking the snapshot. Also, if you're backing up from your local backup disk to a remote backup disk, this isn't an issue.
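One way that quiesce step is often scripted (a sketch, not a tested recipe: the dataset name is invented, and it assumes the mysql client's "system" command, which runs a shell command from inside the locking session so the lock is held across the snapshot):

    mysql <<'EOF'
    FLUSH TABLES WITH READ LOCK;
    system zfs snapshot tank/mysql@nightly
    UNLOCK TABLES;
    EOF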
Replying to @Luke_s_Crawford @Pinboard
It's better than plain rsync because it doesn't need to touch every file; it just looks at changed blocks, so it's usually way faster and causes way less disk read load.
Replying to @Luke_s_Crawford
The really big backups I do (the ones I have to drive around) are of archived web pages, not database anything. Just a few thousand tgz files per user.
Replying to @Pinboard
Well, how much do they change every day? That's the idea: maybe you drive the first run, but then you have a daily replication that only moves the changes. In the simple case, you drive the data to get it started, then have a nightly rsync.
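In that simple case the nightly job is roughly this, run from cron (paths and hostname are placeholders):

    # Nightly incremental: by default rsync only copies files whose
    # size or mtime changed, so each unchanged archive costs one stat
    rsync -a /srv/archives/ backuphost:/srv/archives/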
Replying to @Luke_s_Crawford @Pinboard
rsync has a per-file overhead. (You can lower it by telling rsync to only checksum files whose timestamps have changed; then it's just a stat per file that hasn't changed.) That's probably more your style.
Replying to @Luke_s_Crawford @Pinboard
(Note: for the files with newer timestamps, it checksums the file on both ends, which eats a lot of disk bandwidth but not much network bandwidth if the files hold the same data.)
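For what it's worth, that both-ends checksumming is rsync's delta-transfer step, and it can be skipped when disk reads are scarcer than network bandwidth (same placeholder paths as above):

    # -W / --whole-file skips the block-checksum delta step and just
    # resends any changed file in full; often a win when disk I/O,
    # not bandwidth, is the bottleneck
    rsync -aW /srv/archives/ backuphost:/srv/archives/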
Replying to @Luke_s_Crawford @Pinboard
rsync also has a handy option to limit throughput, so if you don't want to drive the first run, you just set --bwlimit= to something small enough that it doesn't hurt you. 80 TiB just isn't what it used to be; even if you limit it to 10 Mbps, that's like a day.
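For example (rsync's --bwlimit takes 1024-byte units per second by default, so 1250 is roughly 10 Mbps; paths are placeholders):

    # Trickle the initial copy at ~10 Mbps so it never hurts anything else
    rsync -a --bwlimit=1250 /srv/archives/ backuphost:/srv/archives/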
Yeah, rsync and I are besties, and I'm going to try your approach now that I have uncapped residential internet. But I think it takes ten days to transfer a terabyte at 10 Mbps, not one day.
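The correction checks out:

    10 Mbps ≈ 1.25 MB/s ≈ 108 GB/day, so a terabyte takes roughly 9-10 days;
    80 TB at that rate is on the order of two years, not a day.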