Here's something I've found really useful in my DeOldify research: keep a separate, huge master set of images built from various sources (e.g., Open Images), then use Jupyter notebooks made specifically to generate training datasets from that master and output them elsewhere. 1/
In the notebooks I do a lot of filtering of images from the master, depending on the task. You can filter out images for being grayscale when you're doing colorization, for example, or filter out images below a minimum resolution, and so on. I've found this helps data quality a lot. 2/
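A minimal sketch of that kind of notebook filter, using Pillow and NumPy. The directory name, minimum side length, and grayscale tolerance are hypothetical placeholders, and the "effectively grayscale" heuristic (small per-pixel channel spread) is one reasonable choice, not necessarily the one used in DeOldify:

```python
from pathlib import Path

import numpy as np
from PIL import Image

# Hypothetical locations/thresholds -- tune per task.
MASTER_DIR = Path("master_images")
MIN_SIDE = 256  # minimum resolution for the task


def is_effectively_grayscale(img: Image.Image, tol: int = 8) -> bool:
    """Treat an image as grayscale if its channels barely differ.

    Catches both true single-channel files and RGB files that are
    visually gray (common in scans of old photos).
    """
    if img.mode in ("1", "L", "LA"):
        return True
    arr = np.asarray(img.convert("RGB"), dtype=np.int16)
    # Max per-pixel spread between channels; a small spread means gray.
    spread = (arr.max(axis=-1) - arr.min(axis=-1)).max()
    return spread <= tol


def passes_filters(path: Path) -> bool:
    try:
        with Image.open(path) as img:
            if min(img.size) < MIN_SIDE:
                return False
            return not is_effectively_grayscale(img)
    except OSError:  # truncated/corrupt file: filter it out too
        return False


def build_dataset(master: Path = MASTER_DIR):
    """Yield paths from the master set that survive the filters."""
    for p in sorted(master.rglob("*.jpg")):
        if passes_filters(p):
            yield p
```

The surviving paths would then be copied or resized out to the task-specific dataset directory.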
The bigger your master set is, the more aggressive the filters you can apply while still getting a big enough dataset at the end. This also gives you the freedom to use sloppy-but-good-enough filtering techniques that have a lot of false positives, such as blurry image detection. 3/
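As an example of such a sloppy-but-good-enough filter, here is a common blur heuristic: the variance of a Laplacian response over the image, implemented in plain NumPy. The threshold is a made-up placeholder you would tune on a labelled sample; false positives (sharp images scored as blurry) are acceptable precisely because the master set is large:

```python
import numpy as np
from PIL import Image


def laplacian_variance(path) -> float:
    """Variance of a 4-neighbour Laplacian; low values suggest blur."""
    gray = np.asarray(Image.open(path).convert("L"), dtype=np.float64)
    # Laplacian via shifted differences (no OpenCV dependency).
    lap = (
        -4 * gray[1:-1, 1:-1]
        + gray[:-2, 1:-1]
        + gray[2:, 1:-1]
        + gray[1:-1, :-2]
        + gray[1:-1, 2:]
    )
    return float(lap.var())


# Hypothetical threshold -- tune on a labelled sample. Setting it high
# discards some usable images, which a big master set can absorb.
BLUR_THRESHOLD = 100.0


def looks_sharp(path) -> bool:
    return laplacian_variance(path) >= BLUR_THRESHOLD
```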
For the master you can use a huge and fairly slow/cheap drive as opposed to a fancy SSD/NVMe. Save the speedy drive space for the resulting filtered and processed datasets. 4/
Replying to @citnaj
This sounds a lot like your own image data lake pattern. Raw, transformed, curated, published images. On cheap storage. How do you track the models, versions and what inputs / parameters affected the model? How are outcomes measured / what metrics would you review?
Replying to @andrew_sears
Metrics: I use an array of metrics relevant to the task, such as PSNR, SSIM, and FID (and one more I don't want to reveal). The first two aren't as sophisticated, but they're fast and still helpful. I use all but FID after each epoch.
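For reference, PSNR and a simplified SSIM can be computed in a few lines of NumPy. This is a sketch, not the thread author's actual evaluation code: real SSIM averages over local sliding windows (e.g. `skimage.metrics.structural_similarity`), while this single-window version is only a cheap per-epoch sanity signal:

```python
import numpy as np


def psnr(ref: np.ndarray, out: np.ndarray, max_val: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB (higher is better)."""
    mse = np.mean((ref.astype(np.float64) - out.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")
    return 10 * np.log10(max_val**2 / mse)


def global_ssim(ref: np.ndarray, out: np.ndarray, max_val: float = 255.0) -> float:
    """Single-window SSIM: the SSIM formula applied to the whole image
    at once, rather than averaged over local windows as in proper SSIM."""
    x = ref.astype(np.float64)
    y = out.astype(np.float64)
    c1 = (0.01 * max_val) ** 2  # standard SSIM stabilising constants
    c2 = (0.03 * max_val) ** 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / (
        (mx**2 + my**2 + c1) * (vx + vy + c2)
    )
```

Both are cheap enough to run after every epoch, which is exactly why they are useful despite being less sophisticated than FID.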
Replying to @citnaj
Thanks, interesting! Do you leverage video frames or streams as a data source for training / testing? Could be interesting to add a feature to invert the color of an image and then colorize/enhance it. For tintype photos.
I haven't really done much with video since August 2019, and I haven't tried that yet. It's on the radar though!
Replying to @citnaj
Seems like DeOldify and your enhancing model would work well as a video post-processor. And it would probably be interesting to watch a video feed improve as the model is trained. https://github.com/osai-ai/tensor-stream … I've done this frame-by-frame but never in a realtime stream with an audio channel.