Conversation

If the main advantage of written language is out-of-order scannability, could you create a type of “reading” of audio where the visualization of the soundtrack is not text or a waveform but, say, sentiment color or evocative iconic DALL-E images, and eye tracking drives the audio track?
I don’t listen to podcasts, and find straight transcripts unreadably lightweight, but I can see myself “affect reading” like this
This might also be how actual reading becomes obsolete and gets turned into something like machine code. You can retain words, but as a sort of vibe medium, via word clouds. Like subtitles, but not windows of serial text. More like what GPT has in its hidden layer attention buffers.
As in, in a 1000 word text, if you’re on word 523, you wouldn’t show the 515-530 window but:
a) immediate phrase
b) local context words from paragraph neighborhood
c) important multi-scale context words from chapter, act, book etc
d) gestalt vibe words
All picked up by GPT
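A minimal sketch of the layered context idea, under loud assumptions: plain word frequency stands in for whatever GPT would actually pick, the "multi-scale" layer is collapsed to whole-document frequency, and layer d) (gestalt vibe words) is left out because it needs a model. The names `context_cloud`, `top_words`, and `STOPWORDS` are hypothetical, invented for illustration.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not a real lexicon).
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "on", "at",
             "with", "while", "is", "it"}

def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def top_words(words, k, exclude=()):
    """Return the k most frequent non-stopwords not already shown."""
    counts = Counter(w for w in words
                     if w not in STOPWORDS and w not in exclude)
    return [w for w, _ in counts.most_common(k)]

def context_cloud(paragraphs, para_idx, word_idx, phrase_radius=2, k=3):
    """Build layers a) to c) for the word at word_idx in paragraph para_idx."""
    para_words = tokenize(paragraphs[para_idx])
    lo = max(0, word_idx - phrase_radius)
    # a) immediate phrase: a small window around the current word
    phrase = para_words[lo:word_idx + phrase_radius + 1]
    # b) local context: frequent words from the same paragraph
    local = top_words(para_words, k, exclude=set(phrase))
    # c) wider context: frequent words across the whole document
    all_words = [w for p in paragraphs for w in tokenize(p)]
    wide = top_words(all_words, k, exclude=set(phrase) | set(local))
    return {"phrase": phrase, "local": local, "wide": wide}
```

Calling `context_cloud(paragraphs, para_idx, word_idx)` as the reading position advances would give the words to display at each moment. Frequency is a crude proxy for importance; a real version would presumably rank words with a language model instead.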
Yeah this is the image track idea in primitive form. We already navigate streaming video kinda like this when rewinding/seeking to a frame
Quote Tweet
Replying to @vgr
That’s a cool idea. Similarish to this demo twitter.com/karenxcheng/st…
DALL-E is bad at text-in-images, but comic books are the primitive version of this type of “reading++”. They persist context across many frames of dialogue, and sometimes persist dialogue across frames of fast-moving action. Cf. various subtleties in Scott McCloud’s books.
Eg: Stylized 3-frame rendition of Will Smith v. Chris Rock*
[start of punch]: keep my wife’s name…
[end of punch]: …out of your…
[follow through]: …goddamn mouth!
* in reality the line came after the punch, but this is a better comic book viz
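The mechanical part of spreading one line of dialogue over several frames can be sketched as below. This is only an illustration: the function name `split_across_frames` is hypothetical, and it splits by word count as evenly as it can, whereas a real comic would break on beats in the action (as the hand-made split above does).

```python
def split_across_frames(line, n_frames):
    """Split a line of dialogue into n_frames chunks joined by ellipses."""
    words = line.split()
    if n_frames < 1 or n_frames > len(words):
        raise ValueError("n_frames must be between 1 and the word count")
    # Spread words as evenly as possible; earlier frames get the remainder.
    base, extra = divmod(len(words), n_frames)
    frames, start = [], 0
    for i in range(n_frames):
        size = base + (1 if i < extra else 0)
        chunk = " ".join(words[start:start + size])
        start += size
        # A leading ellipsis marks a continuation, a trailing one marks more to come.
        prefix = "…" if i > 0 else ""
        suffix = "…" if i < n_frames - 1 else ""
        frames.append(prefix + chunk + suffix)
    return frames
```

For example, `split_across_frames("one two three four five six seven", 3)` yields three speech-bubble strings, one per frame, with ellipses on the joins.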
Oh shit, I think I just defined the problem in a clean Grand Challenge way: auto-transcribe a movie/live video into a comic book with speech bubbles and those narrator boxes, with occasional block-quotes. Very challenging.