If the goal is to reproduce human thinking (which may be a bad goal), I don’t think multimodal learning is enough. You need a dose of GOFAI, because human symbolic representation systems are… symbolic.
But otoh, it’s worth asking: what’s a natural “native” language for AIs?
Quote Tweet
Replying to @vgr
The future is multimodal models. To me, generative image models are akin to a visual perceptual module in the human brain. To approximate human cognition, we need to connect modules just as our brains do.
Work on past-tense verbs from the ’80s that I’m studying: stanford.edu/~jlmcc/papers/
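
Not from the thread, but a minimal sketch of the “connect modules” idea, assuming a PyTorch-style setup: a stand-in image encoder plays the role of a visual perceptual module, and its output is projected into a language model’s embedding space. All module names and dimensions here are illustrative, not anything the thread specifies.

import torch
import torch.nn as nn

class VisionToLanguageBridge(nn.Module):
    # Illustrative only: a stand-in "visual perceptual module" whose
    # features are projected into a language model's embedding space,
    # so the two modules are connected rather than trained as one blob.
    def __init__(self, vision_dim=768, lm_dim=4096):
        super().__init__()
        self.vision_encoder = nn.Linear(3 * 224 * 224, vision_dim)  # placeholder encoder
        self.projection = nn.Linear(vision_dim, lm_dim)             # bridge between modules

    def forward(self, images, text_embeddings):
        visual_features = self.vision_encoder(images.flatten(1))
        visual_token = self.projection(visual_features).unsqueeze(1)
        # Prepend the projected visual token to the text embedding sequence.
        return torch.cat([visual_token, text_embeddings], dim=1)

bridge = VisionToLanguageBridge()
combined = bridge(torch.randn(2, 3, 224, 224), torch.randn(2, 10, 4096))
print(combined.shape)  # torch.Size([2, 11, 4096])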