Conversation

a colab allowing you to tweak SD images by editing prompts, incredibly powerful!
Quote Tweet
The absolute chads at @weights_biases (I think @soumikRakshit96) took the prompt-editing paper from Google Research (which used Imagen), and implemented it with Stable Diffusion instead. So yea, tweaking SD images just got wayyy easier 👀 wandb.ai/wandb/cross-at
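For context, the core trick in that prompt-editing paper is to record the cross-attention maps while denoising with the original prompt, then inject them while denoising with the edited prompt, so the image layout stays put and only the swapped words change the content. Below is a toy, framework-agnostic sketch of that injection — not the W&B colab's actual code; the tensor shapes and names are made up for illustration.

```python
# Conceptual sketch only: plain PyTorch, toy shapes, no diffusers internals.
import torch
import torch.nn.functional as F

def cross_attention(q, k, v, attn_override=None):
    """Single-head cross-attention between image queries (q) and text keys/values (k, v).
    If attn_override is given, reuse those attention maps instead of the ones
    computed from (q, k) -- that reuse is the prompt-editing trick."""
    d = q.shape[-1]
    attn = F.softmax(q @ k.transpose(-1, -2) / d**0.5, dim=-1)
    if attn_override is not None:
        attn = attn_override
    return attn @ v, attn

# Pass 1: denoise with the original prompt and record its attention maps.
q = torch.randn(1, 64, 32)                                     # 64 latent positions, dim 32 (toy sizes)
k_src, v_src = torch.randn(1, 77, 32), torch.randn(1, 77, 32)  # 77 text tokens, as in CLIP
_, attn_src = cross_attention(q, k_src, v_src)

# Pass 2: use the *edited* prompt's embeddings, but keep the source attention
# maps so composition is preserved while the edited words change the content.
k_edit, v_edit = torch.randn(1, 77, 32), torch.randn(1, 77, 32)
out_edit, _ = cross_attention(q, k_edit, v_edit, attn_override=attn_src)
```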
super cool strategy to get actually coherent videos with Stable Diffusion (colab to be released soon supposedly)
Quote Tweet
1.3 -"But we've already had 3D algorithm for DD and SD!" - Yes, and it also worked on top of MiDaS depth maps. The difference is that it interpolates missing info by 'warping' space, which is good for artistic or trippy videos, but not good for realistic animation. Visualization:
Someone finally managed to add speaker detection to Whisper! Should be very handy for podcast/interview transcriptions :D (colab in thread)
Quote Tweet
I've been using OpenAI's Whisper model to generate initial drafts of transcripts for my podcast. But Whisper doesn't identify speakers. So I stitched it to a speaker recognition model. Code is below in case it's useful to you. Let me know how it can be made more accurate.
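In case it helps reproduce the idea, here is a minimal sketch of that stitching (not the author's actual notebook): Whisper gives timestamped segments, a diarization model — assumed here to be pyannote.audio — gives who-spoke-when, and each segment is labeled with the speaker it overlaps the most. The file name, model size, and overlap heuristic are all illustrative placeholders.

```python
import whisper
from pyannote.audio import Pipeline

AUDIO = "episode.wav"                       # hypothetical input file

# 1) Transcribe: Whisper returns timestamped segments, but no speaker labels.
asr = whisper.load_model("medium")
segments = asr.transcribe(AUDIO)["segments"]

# 2) Diarize: who speaks when (pyannote needs a Hugging Face access token).
diarizer = Pipeline.from_pretrained("pyannote/speaker-diarization",
                                    use_auth_token="hf_...")
turns = [(t.start, t.end, spk)
         for t, _, spk in diarizer(AUDIO).itertracks(yield_label=True)]

# 3) Label each transcript segment with the speaker that overlaps it the most.
def overlap(a0, a1, b0, b1):
    return max(0.0, min(a1, b1) - max(a0, b0))

for seg in segments:
    best = max(turns, key=lambda t: overlap(seg["start"], seg["end"], t[0], t[1]))
    print(f"[{best[2]}] {seg['text'].strip()}")
```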
Runway releases AI magic tools for video! Closed source and you have to use them thru their website, but way too cool not to share
Quote Tweet
Introducing AI Magic Tools: dozens of creative tools to edit and generate content like never before. New tools added every week. Available now: runwayml.com
reading people's thoughts with frightening accuracy? yea why not (would be very interesting to check whether accuracy varies for people who report not having an internal monologue)
Quote Tweet
very excited to share our paper on reconstructing language from non-invasive brain recordings! we introduce a decoder that takes in fMRI recordings and generates continuous language descriptions of perceived speech, imagined speech, and possibly much more biorxiv.org/content/10.110
text-to-{textured 3d model with animations}?? only paper for now, but looks very very cool
Quote Tweet
#AvatarCLIP (Text2Avatar Model) generates and animates a 3D avatar from a natural language description of body shape, appearance and motion.
- Project Page: hongfz16.github.io/projects/Avata
- Demo Video: youtube.com/watch?v=-l2ZMe
- Code: github.com/hongfz16/Avata
first text-to-HDR model I've seen, and looks like it comes with code too!
Quote Tweet
#Text2Light generates a VR environment (in the form of an HDR panorama) from a natural language description.
- Project Page: frozenburning.github.io/projects/text2
- Demo Video: youtube.com/watch?v=XDx6tO
- Code: github.com/FrozenBurning/
Remember Phenaki? (it was one of the first text-to-video models in this thread, released with a paper but no code). Well, someone is working on re-implementing the code from the paper! Should be able to generate videos of up to 2 minutes once done 🤩
Quote Tweet
Implementation of Phenaki Video, which uses Mask GIT to produce text guided videos of up to 2 minutes in length, in Pytorch github: github.com/lucidrains/phe
"audio hypercompression" model by Meta, seems to get insane results and be fully open-source 👀
Quote Tweet
Today we’re sharing some of our latest progress on AI-powered hypercompression of audio. Our researchers achieved ~10x compression rate vs. MP3 at 64 kbps with no quality loss — the first time this has been done with 48 kHz sampled stereo audio ➡️ bit.ly/3FfgPCb 1/5
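Assuming this is the codec Meta later released as EnCodec — my inference from the announcement, not something stated in the tweet — compressing and reconstructing a file with the open-source `encodec` package looks roughly like the sketch below; the input file and target bandwidth are placeholders.

```python
import torch
import torchaudio
from encodec import EncodecModel
from encodec.utils import convert_audio

model = EncodecModel.encodec_model_48khz()   # 48 kHz stereo variant
model.set_target_bandwidth(6.0)              # target bitrate in kbps; lower = smaller

wav, sr = torchaudio.load("music.wav")       # placeholder input file
wav = convert_audio(wav, sr, model.sample_rate, model.channels)

with torch.no_grad():
    frames = model.encode(wav.unsqueeze(0))  # list of (codebook indices, scale) per chunk
    restored = model.decode(frames)[0]       # decode back to a waveform for listening
```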
NeRFs in 4D! (aka, reconstructing a scene from a video so you can freely move a virtual camera around it, but now the scene can also play back in time instead of just staying still)
Quote Tweet
NeRFPlayer: A Streamable Dynamic Scene Representation with Decomposed Neural Radiance Fields abs: arxiv.org/abs/2210.15947 project page: lsongx.github.io/projects/nerfp
Fine-tuning for Whisper! The article explains how you can improve transcription quality for a given language, but I wonder if you could improve domain knowledge too (to make, for example, crypto podcast transcriptions more accurate)
Quote Tweet
Whisper fine-tuning is **here**! Check out the blog post for a step-by-step guide on fine-tuning Whisper with 🤗 Transformers: huggingface.co/blog/fine-tune Boost WER performance vs zero-shot with as little as 8h of training data 🤯 All on a single Google Colab notebook...
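For a sense of what the recipe looks like, here is a heavily condensed sketch with 🤗 Transformers — the linked blog post has the full version. The dataset, language, and hyperparameters below are placeholders, and the padding collator is simplified (the blog's version also strips a leading special token from the labels).

```python
from datasets import load_dataset, Audio
from transformers import (WhisperProcessor, WhisperForConditionalGeneration,
                          Seq2SeqTrainingArguments, Seq2SeqTrainer)

processor = WhisperProcessor.from_pretrained("openai/whisper-small",
                                             language="Hindi", task="transcribe")
model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-small")

# Placeholder dataset: Common Voice Hindi, resampled to Whisper's 16 kHz input rate.
ds = load_dataset("mozilla-foundation/common_voice_11_0", "hi", split="train")
ds = ds.cast_column("audio", Audio(sampling_rate=16_000))

def prepare(batch):
    # Turn raw audio into log-mel input features and the transcript into token labels.
    audio = batch["audio"]
    batch["input_features"] = processor.feature_extractor(
        audio["array"], sampling_rate=audio["sampling_rate"]).input_features[0]
    batch["labels"] = processor.tokenizer(batch["sentence"]).input_ids
    return batch

ds = ds.map(prepare, remove_columns=ds.column_names)

def collate(features):
    # Pad features and labels separately; mask label padding out of the loss with -100.
    batch = processor.feature_extractor.pad(
        [{"input_features": f["input_features"]} for f in features], return_tensors="pt")
    labels = processor.tokenizer.pad(
        [{"input_ids": f["labels"]} for f in features], return_tensors="pt")
    batch["labels"] = labels["input_ids"].masked_fill(labels["attention_mask"].ne(1), -100)
    return batch

args = Seq2SeqTrainingArguments(output_dir="whisper-small-hi",
                                per_device_train_batch_size=16,
                                learning_rate=1e-5, max_steps=4000)
trainer = Seq2SeqTrainer(model=model, args=args, train_dataset=ds,
                         data_collator=collate,
                         tokenizer=processor.feature_extractor)
trainer.train()
```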
Generate videogame cutscene camera shots in the style of a movie director you like 🤩
Quote Tweet
Human-AI co-creation of #videogame cutscenes in the style of eminent directors, at @acmchiplay! Great work by @lineupthesky, supervised by @perttu_h & me. @unity plugin, extendable dataset github.com/inanevin/Cine-, OA paper dl.acm.org/doi/10.1145/35, presentation programs.sigchi.org/chi%20play/202
really nice greenscreening model, with code/weights supposedly coming soon!
Quote Tweet
FactorMatte: Redefining Video Matting for Re-Composition Tasks abs: arxiv.org/abs/2211.02145 project page: factormatte.github.io