Here's something I've found really useful in my DeOldify research: keep a separate, huge master set of images gathered from various sources (e.g. Open Images), then use Jupyter notebooks made specifically to generate training datasets from that master and output them elsewhere. 1/
In the notebooks I do a lot of filtering of images from the master, depending on the task. For example, you can filter out grayscale images when you're doing colorization, or filter out anything below a minimum resolution. I've found this helps data quality a lot. 2/
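A minimal sketch of what such a notebook filter might look like, not the actual DeOldify code. `ImageRecord`, `is_grayscale`, `meets_min_resolution`, `build_dataset`, and the default thresholds are all hypothetical names and values chosen for illustration. Pixels are passed in as plain RGB tuples so the example stays self-contained; in practice they would be loaded with an image library.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Sequence, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each 0-255


@dataclass
class ImageRecord:
    """Hypothetical stand-in for one image in the master set."""
    path: str
    width: int
    height: int
    pixels: Sequence[Pixel]  # flattened RGB pixels (could be a small sample)


def is_grayscale(rec: ImageRecord, tolerance: int = 8) -> bool:
    """True if every pixel's channels agree to within `tolerance` (assumed value)."""
    return all(max(p) - min(p) <= tolerance for p in rec.pixels)


def meets_min_resolution(rec: ImageRecord, min_side: int = 256) -> bool:
    """True if the shorter side is at least `min_side` pixels (assumed value)."""
    return min(rec.width, rec.height) >= min_side


def build_dataset(
    master: Iterable[ImageRecord],
    keep_if: Sequence[Callable[[ImageRecord], bool]],
) -> List[str]:
    """Return paths of master images that pass every check in `keep_if`."""
    return [rec.path for rec in master if all(check(rec) for check in keep_if)]


master = [
    ImageRecord("a.jpg", 512, 512, [(200, 30, 30), (10, 120, 40)]),
    ImageRecord("b.jpg", 512, 512, [(90, 90, 92), (10, 10, 10)]),  # grayscale
    ImageRecord("c.jpg", 100, 400, [(200, 30, 30), (0, 0, 255)]),  # too small
]

# Task-specific filter list for a colorization dataset.
colorization_filters = [lambda r: not is_grayscale(r), meets_min_resolution]
selected = build_dataset(master, colorization_filters)
print(selected)  # ['a.jpg']
```

Keeping each check as a small function means a different task can reuse the same master set with a different filter list.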
The bigger your master set is, the more aggressive the filters you can apply while still ending up with a big enough dataset. This also gives you the freedom to use sloppy but good-enough filtering techniques that have a lot of false positives, such as blurry image detection. 3/
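As one illustration of a sloppy but good-enough filter, here is a sketch of blur detection using the variance of a Laplacian response, a common heuristic. The thread does not say which blur detector was actually used, so this technique, the function names, and the threshold of 100.0 are all assumptions. It works on a grayscale image given as a 2D list of intensities.

```python
from statistics import pvariance
from typing import Sequence


def laplacian_variance(gray: Sequence[Sequence[float]]) -> float:
    """Variance of a 4-neighbour Laplacian over the interior pixels."""
    h, w = len(gray), len(gray[0])
    if h < 3 or w < 3:
        raise ValueError("need at least a 3x3 image")
    responses = [
        gray[y - 1][x] + gray[y + 1][x] + gray[y][x - 1] + gray[y][x + 1]
        - 4 * gray[y][x]
        for y in range(1, h - 1)
        for x in range(1, w - 1)
    ]
    return pvariance(responses)


def looks_blurry(gray: Sequence[Sequence[float]], threshold: float = 100.0) -> bool:
    """Flag images with little edge energy. The threshold is an assumed value
    and will produce false positives, e.g. on smooth skies or plain backdrops."""
    return laplacian_variance(gray) < threshold


# A flat image has no edges at all; a checkerboard is all edges.
flat = [[128] * 5 for _ in range(5)]
checker = [[255 * ((x + y) % 2) for x in range(5)] for y in range(5)]

print(looks_blurry(flat))     # True
print(looks_blurry(checker))  # False
```

With a large master set, discarding some sharp images by mistake costs little, so a crude threshold like this can still be worth using.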
For the master you can use a huge and fairly slow/cheap drive as opposed to a fancy SSD/NVMe. Save the speedy drive space for the resulting filtered and processed datasets. 4/
Replying to @citnaj
This sounds a lot like your own image data lake pattern. Raw, transformed, curated, published images. On cheap storage. How do you track the models, versions and what inputs / parameters affected the model? How are outcomes measured / what metrics would you review?
Replying to @andrew_sears
One more thing: I've tried W&B, but I found I was ending up with a whole bunch of data that was both overwhelming and almost entirely meaningless. I think what's more important anyway is developing an intuition for what works and what doesn't by -doing-.
Replying to @citnaj @andrew_sears
Isn't this a function of what data you choose to log?
Replying to @safijari @andrew_sears
Btw...Simbe....know a guy named Mirza who went to Penn State? If so tell him I said hi :)
Replying to @citnaj @andrew_sears
That's my boss. I was the second non-founder he hired. I'll convey your regards.
We actually worked together as interns back at Penn State. He was my favorite person there honestly!