You know you're in for a wild ride when you pipe "gzip -cd file | wc -l" and it takes over an hour to tell you how much data there is.
Replying to
You can "cheat" and get an estimate with `head -n 10000 data.file > sample.file; ls -lh sample.file`. That tells you the size on disk for 10k lines; divide the size of the whole file by the sample's size to get the number of 10k-line chunks, then multiply by 10,000 to estimate lines.
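The tip above can be sketched end to end. This is a hedged example, not the exact commands from the thread: the synthetic `data.file` and the variable names are mine, and I do the chunk arithmetic with integer math (`total * 10000 / sample`) to avoid losing precision:

```shell
# Build a synthetic data file so the sketch is self-contained (100k lines).
seq 1 100000 | awk '{print "row", $1, "payload"}' > data.file

# The trick: sample the first 10k lines, then scale by byte sizes.
head -n 10000 data.file > sample.file
sample_bytes=$(wc -c < sample.file)   # bytes in the 10k-line sample
total_bytes=$(wc -c < data.file)      # bytes in the whole file

# estimated lines = total_bytes / (sample_bytes / 10000),
# rearranged so the division happens last.
est=$(( total_bytes * 10000 / sample_bytes ))
echo "estimated lines: $est"

rm -f data.file sample.file
```

On this synthetic file the estimate overshoots the true 100,000 by a few percent, because the early lines (small numbers) are shorter than the later ones, which is exactly the "first lines must be representative" caveat discussed below.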
Replying to
That's really awesome -- so you're basically finding the average line length and then using that to guesstimate the total lines. Thanks!
Replying to
Yup! It assumes that the first however-many lines are representative. Often that's true, but not always. Bigger samples help too, but if the sample gets big enough, it's just the whole dataset and you save nothing haha.

