I’m already bored of frustratingly vague but visually rich DALL-E 2 stuff… want more precise, impoverished stuff
Can someone with access try to generate comics in xkcd style?
Like “joke about sidewalk scooters in xkcd style”
Or anything with very few bits
Here’s an attempt… I think it tripped on the problem of images that already have text
Quote Tweet
Replying to @slashdottir @vgr and @huggingface
Your wish is my command...
The joke went over my head
Huggingface takes silver with an entry that’s… not in the ballpark visually either
Quote Tweet
Replying to @vgr
This is by @huggingface, it's called LAION and is kind of like an almost DALL-E 2
New Yorker cartoon attempt
Quote Tweet
Replying to @vgr
I don't have DALL-E access yet, but if this is doable with Latent Diffusion it's probably doable in DALL-E as well
Notice something? The text is broken in all 3. The text-to-image system uses a text model to drive an image model, but it didn’t recognize that the image model’s training data includes images that themselves contain text… so it treated the cartoon text as just another picture element. I think we broke this.
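Not from the thread, just to make the failure mode concrete: a minimal sketch of the kind of pipeline being poked at here, assuming the Hugging Face diffusers library and a Stable Diffusion checkpoint as stand-ins for DALL-E 2 / LAION (the checkpoint name and prompt are placeholders, not what the replies actually used).

```python
from diffusers import StableDiffusionPipeline

# Placeholder checkpoint; any text-to-image diffusion model shows the same behavior.
pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")

# The prompt implicitly asks for in-image text (a caption, a speech bubble).
# The model only ever saw rendered text as pixel patterns, so the "joke"
# usually comes out as glyph-shaped scribbles rather than readable words.
prompt = "an xkcd-style stick-figure comic, a joke about sidewalk scooters"
image = pipe(prompt).images[0]
image.save("xkcd_attempt.png")
```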
This reminds me of something I learned recently (claim made by a psychologist character on a TV show, but sounded true) — you can’t read text while dreaming because your text processing part is not active while dreaming.
For this thing to do cartoons it will need some sort of fractal mixing of the models, not one pass
Another one, again fails to generate in-image text
If anyone wants to try a simpler version of this challenge, prompt the system with ‘A man holding a sign saying “hello world”’
This might work because signs are a simpler mixed image-text concept than speech bubbles that form a joke
But I suspect you’ll get an unreadable sign
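If you want to actually run this, here’s a hedged sketch, assuming the Hugging Face diffusers library (since that’s what’s openly scriptable, not DALL-E itself) plus pytesseract to check whether the sign came out readable; the checkpoint name is a placeholder.

```python
from diffusers import StableDiffusionPipeline
import pytesseract  # requires the Tesseract OCR binary installed

pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
image = pipe('A man holding a sign saying "hello world"').images[0]
image.save("hello_world_sign.png")

# If the model had learned the visual form of text, OCR would recover something
# close to "hello world"; in practice you mostly get glyph-shaped noise.
print(pytesseract.image_to_string(image))
```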
Basically, the image model didn’t learn text, and the text model didn’t learn images. This is two blind men and an elephant
Remember text models are trained on input that’s already encoded as digital text strings. They are not literate, and can’t read in an OCR way. This thing only has oral literacy. It hasn’t realized that certain visual patterns correspond to text it knows in the form of a code.
The text part is blind and the vision part is pre-verbal. The entire effect rests on black-box correlations between text and images… labeled images where the labels live in text space and the images live in image space
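To make the “labels in text space, images in image space” point concrete, a small illustration of my own (nothing from the thread’s models), assuming OpenAI’s CLIP via the transformers library and two hypothetical local files: a photo of a cat, and a rendering of the typeset word “cat”.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Hypothetical inputs: a photo of a cat, and an image of the written word "cat".
images = [Image.open("cat_photo.png"), Image.open("word_cat_rendered.png")]
inputs = processor(text=["a cat"], images=images, return_tensors="pt", padding=True)

with torch.no_grad():
    out = model(**inputs)

# logits_per_text[0] is the similarity of the caption to each image.
# The only bridge between the two spaces is this learned correlation;
# nothing in it says the second image *is* the word "cat".
print(out.logits_per_text[0])
```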
Hah, I guessed right. This thing is trying to learn language literacy in a way humans don’t even try to. We get literate by being taught the alphabet and the mapping from sounds to visual symbols (for non-Chinese scripts), and by learning the “formula” to decode anything we recognize as a string
It’s trying to learn “to read” through brute statistics of billions of images of text
I think there may be a fundamental problem here… how do you learn that input patterns in one kind of model map to tokens in another? The word “cat” is a label for the category of cats in images, but also the name of the set of visual renderings of the word itself…
There’s some kind of subtle levels/indirection thing going on here. I recall Dennett, I think, talking about it in some book. The name of a thing is itself a nameable thing. Cat is the name of the category containing 🐱🐈, but the name of the word ‘cat’ is actually grbxl (say)…
The prompt ‘A man with a cat holding a sign saying “cat”’ should be rendered ‘A man with a cat holding a grbxl sign’
But then you can get infinite regress… or pointer chasing
At some point the system also needs to learn that grbxl is the name of the word for cat
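A toy way to see the levels problem, purely my own illustration: a grounded renderer would need two separate maps for the same token, plus a rule for which map a given occurrence should go through.

```python
# Map 1: the token names a visual category (draw the animal).
denotation = {"cat": "pictures-of-cats"}

# Map 2: the token names the written word itself (draw the glyphs);
# the thread nicknames this rendered form "grbxl".
rendering = {"cat": "grbxl"}

def resolve(token: str, quoted: bool) -> str:
    """One level of pointer-chasing: quoted occurrences name the word,
    unquoted occurrences name the thing."""
    return rendering[token] if quoted else denotation[token]

# 'A man with a cat holding a sign saying "cat"'
print(resolve("cat", quoted=False))  # -> pictures-of-cats
print(resolve("cat", quoted=True))   # -> grbxl
```

Current text-to-image models collapse both maps into one statistical association, which is exactly why in-image text comes out garbled.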
In the current paradigm it will never learn this… it will just learn that the world contains lots of images with highly correlated presence of cats and grbxls… but grbxls might as well be any other thing cats are often found correlated with, like food saucers
Deep learning gonna have issues
These are either Gödel-grade objections, or AIs will evolve without ever needing to make maps or distinguish them from territories.
Maybe sufficiently powerful statistics is indistinguishable from signification
Maybe signification is a hack for meatbags that AIs don’t need
tldr: AIs haven’t learned (in any sense) that the visual form of the written word “cat” corresponds to the token “cat” in language models, but HAVE learned that the cat token corresponds to pictures of cats themselves
Not a showstopper but not a trivial issue either
Speech recognition might have an important bridging role to play here somehow 🤔
The map between text-token space and a continuous sensor field is more direct there, and recorded sound data is dominated by verbal sounds
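What that more direct bridge looks like in practice, as a hedged sketch (assuming a wav2vec 2.0 checkpoint via the transformers library; the audio file is a placeholder): the model maps a raw waveform straight onto the same character/token space the text models live in, no OCR-like detour.

```python
import torch
import soundfile as sf
from transformers import Wav2Vec2Processor, Wav2Vec2ForCTC

processor = Wav2Vec2Processor.from_pretrained("facebook/wav2vec2-base-960h")
model = Wav2Vec2ForCTC.from_pretrained("facebook/wav2vec2-base-960h")

# Placeholder audio file: someone saying the word "cat", sampled at 16 kHz.
waveform, sample_rate = sf.read("cat_spoken.wav")
inputs = processor(waveform, sampling_rate=sample_rate, return_tensors="pt")

with torch.no_grad():
    logits = model(inputs.input_values).logits

# CTC decoding lands directly on character tokens: the continuous sensor field
# and the text-token space meet here without a separate "reading" step.
print(processor.batch_decode(torch.argmax(logits, dim=-1))[0])
```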
If the goal is to reproduce human thinking (which may be a bad goal) I don’t think multimodal learning is enough. You need a dose of GOFAI because human symbolic representation systems are… symbolic.
But otoh, it’s worth asking what’s a natural “native” language for AIs?
Quote Tweet
Replying to @vgr
The future is multimodal models. To me, generative image models are more akin to a visual perceptual module in human brains. To approx human cognition, we need to connect modules just as our brains do.
Work on past-tense verbs from the 80's I'm studying: stanford.edu/~jlmcc/papers/
If AIs made up their own symbolic language from raw sensor data, what would it be like? Would it even need concepts like “cat” encoded in an arbitrary 26-character string scheme?
No!
Making jokes in human cultural media like cartoons is an unnatural problem for AIs
Having AIs learn human language at all is a transient hack priority. Like humans learning whale squeak language.
And trying to come up with cartoon squeaks whales would find funny.
Humans use text representation schemes because we have specific limits/constraints
If you gave an AI a rich enough embodiment with raw sensor info and showed it birds and airplanes, it might figure out flight but bypass all the human representation schemes we think must be encountered along the way: symbolic language and math, physics theories, etc.
We can no more teach AIs to fly than we can teach birds to fly. Birds sort of taught us how to fly but by simply existing as a demonstration, not by conveying their evolutionary learning history.
This makes me think language models are a giant yak shave: valuable right now, but irrelevant in the long term. If AIs need language they’ll invent their own. If they need math they’ll discover their own.
Learning the human versions of important signification maps will be a temporary crutch, but it won’t speed up their own actual evolution of comparable native structural phenomena.
It’s not clear to me they ever will though. Faking shallow versions of the human thing might be it.
They have a lot of useful power but it may not be relevant to the capabilities we are considering here. Horses are much more powerful than humans in muscle terms but Clever Hans learning to fake counting didn’t lead to a species of genius horses
We are simultaneously overestimating and underestimating what AIs can do by projecting the arbitrary biases of our own embodiments onto a radically different substrate.
What humans have learned over 1 million years of evolution with a 2lb brain is simply not that relevant here