Here's something I've found really useful in my DeOldify research: keep a separate, huge master set of images built from various sources (e.g. Open Images), then use Jupyter notebooks made specifically to generate training datasets from that master set and output them elsewhere. 1/
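Very roughly, one of those dataset-generating notebooks boils down to something like this (a minimal sketch for illustration; the paths, sample size, and resolution filter are assumptions, not the actual DeOldify code):

```python
# Sketch: sample a training set from a large master folder of images
# and copy the selected files elsewhere. All paths/thresholds are made up.
import random
import shutil
from pathlib import Path
from PIL import Image

MASTER_DIR = Path("/data/master_images")      # hypothetical master set
OUTPUT_DIR = Path("/data/train_colorize_v1")  # hypothetical training set
SAMPLE_SIZE = 50_000
MIN_SIDE = 256  # skip images too small to be useful for training

candidates = list(MASTER_DIR.rglob("*.jpg"))
random.shuffle(candidates)

OUTPUT_DIR.mkdir(parents=True, exist_ok=True)
kept = 0
for path in candidates:
    if kept >= SAMPLE_SIZE:
        break
    try:
        with Image.open(path) as im:
            if min(im.size) < MIN_SIDE:
                continue
    except OSError:
        continue  # unreadable/corrupt file, skip it
    shutil.copy(path, OUTPUT_DIR / path.name)
    kept += 1

print(f"Copied {kept} images to {OUTPUT_DIR}")
```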
-
Thanks for sharing this approach, very useful.
Thanks.
-
This sounds a lot like your own image data lake pattern: raw, transformed, curated, and published images on cheap storage. How do you track the models and versions, and which inputs/parameters affected each model? How are outcomes measured, and what metrics would you review?
-
Model versions: GitHub! Inputs/parameters/outcomes: well-documented experiment Jupyter notebooks that I save to a separate backup drive to free up space, along with TensorBoard outputs and model checkpoints. Point is I can reproduce with ease and not think too much about it.
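To make that concrete, here's a minimal sketch of the one-folder-per-experiment idea, assuming PyTorch and TensorBoard; the names and directory layout are illustrative, not the actual DeOldify setup:

```python
# Sketch: keep TensorBoard logs and checkpoints together per experiment,
# on a drive you can archive later. Names/paths are hypothetical.
from pathlib import Path
import torch
from torch.utils.tensorboard import SummaryWriter

EXPERIMENT = "colorize_exp_042"                  # hypothetical experiment name
ROOT = Path("/backup/experiments") / EXPERIMENT

writer = SummaryWriter(log_dir=str(ROOT / "tensorboard"))
(ROOT / "checkpoints").mkdir(parents=True, exist_ok=True)

def log_step(model, step, loss):
    """Record the loss and periodically snapshot the model weights."""
    writer.add_scalar("train/loss", loss, step)
    if step % 1000 == 0:
        torch.save(model.state_dict(),
                   ROOT / "checkpoints" / f"step_{step}.pth")
```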
-
Would a metadata database be helpful here? Store each image's location plus metadata like grayscale, resolution, etc.? I'd imagine that if you have many images on a slow mechanical hard drive, this could speed up the filtering quite a bit.
-
Sure, that would be a really smart idea actually, and I have yet to do it LOL. I'd just suggest keeping it simple: make a simple local file with key/value entries, i.e. treat it as a cache. Then you can just as easily delete it if you want to start fresh.
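A minimal sketch of that cache idea, assuming a JSON file keyed by image path; the field names and paths are made up for illustration:

```python
# Sketch: a throwaway key/value cache of image metadata, so filtering
# doesn't require re-reading every file from a slow drive.
import json
from pathlib import Path
from PIL import Image

CACHE_FILE = Path("image_metadata.json")
MASTER_DIR = Path("/data/master_images")  # hypothetical master set

cache = json.loads(CACHE_FILE.read_text()) if CACHE_FILE.exists() else {}

for path in MASTER_DIR.rglob("*.jpg"):
    key = str(path)
    if key in cache:
        continue  # already scanned, skip the slow disk read
    try:
        with Image.open(path) as im:
            cache[key] = {
                "width": im.width,
                "height": im.height,
                "grayscale": im.mode in ("L", "1"),
            }
    except OSError:
        continue  # unreadable file, leave it out of the cache

CACHE_FILE.write_text(json.dumps(cache))

# Filtering now hits the cache instead of the drive:
big_color = [k for k, v in cache.items()
             if not v["grayscale"] and min(v["width"], v["height"]) >= 512]
```

Deleting image_metadata.json resets everything, which is exactly the "treat it as a cache" property described above.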