I don’t think it’s solvable within the deep learning class of techniques tbh. It needs an injection of both GOFAI and data on sensory-verbal correlations, beyond narrow application-tagged sets like car images or driverless-car road images. It needs broad, open-tagged image sets to start.