Agrim Gupta
@agrimgupta92
CS PhD
Science & Technology · web.stanford.edu/~agrim/ · Joined January 2017

Agrim Gupta’s Tweets

Feature request for prompt: "Interesting follow-up ideas on X." The model should do more than extract future-work snippets. With the paper as context combined with a huge database of papers, this could potentially be really helpful --> a brainstorming partner!
Quote Tweet
🪐 Introducing Galactica. A large language model for science. Can summarize academic literature, solve math problems, generate Wiki articles, write scientific code, annotate molecules and proteins, and more. Explore and get weights: galactica.org
Every day we have a new LLM paper, but web search still sucks! Having trouble answering simple questions in 2022. Interestingly, the 4th link does have the correct age highlighted 🤷‍♂️. Also checked - same failure mode :/ Hopefully LLMs disrupt search soon!
I'm recruiting! If you are excited about teaching robots to perceive, reason about, manipulate, and move around everyday environments, apply to the CS Ph.D. program at GT (Interactive Computing) and mention my name. Applications from underrepresented groups in AI & Robotics are especially welcome!
Task specification is a challenging problem in robotics. We introduce VIMA: a transformer-based model which can perform *any* task specified by multimodal prompts. An intuitive & multimodal task interface is going to be essential for useful embodied agents.
Quote Tweet
We trained a transformer called VIMA that ingests a *multimodal* prompt and outputs controls for a robot arm. A single agent is able to solve visual goals, one-shot imitation from video, novel concept grounding, visual constraints, etc. Strong scaling with model capacity and data!🧵
🤯🤯
Quote Tweet
Excited to announce Imagen Video, our new text-conditioned video diffusion model that generates 1280x768 24fps HD videos! #ImagenVideo imagen.research.google/video/ Work w/ @wchan212 @Chitwan_Saharia @jaywhang_ @RuiqiGao @agritsenko @dpkingma @poolio @mo_norouzi @fleet_dj @TimSalimans
Amazing results!
Quote Tweet
Happy to announce DreamFusion, our new method for Text-to-3D! dreamfusion3d.github.io We optimize a NeRF from scratch using a pretrained text-to-image diffusion model. No 3D data needed! Joint work w/ the incredible team of @BenMildenhall @ajayj_ @jon_barron #dreamfusion
Replying to
Haha, that was the fun of working with great students - just push them till it's "impossible" 🤣 In 2018, our lab did publish an early scene-generation paper - it's amazing to see the progress of our field!
Delighted to announce the public open source release of #StableDiffusion! Please see our release post and retweet! stability.ai/blog/stable-di Proud of everyone involved in releasing this tech that is the first of a series of models to activate the creative potential of humanity
Our new paper uses insights from the classic 8-point algorithm to tweak modern ViTs, giving a clean end-to-end model that inputs two RGB images and outputs their relative translation + rotation. It helps to think carefully about a model's inductive biases!
Quote Tweet
What if ViTs could do classical computer vision? Our #3DV2022 work shows that, with minor changes, a ViT performs computations similar to the 8-point algorithm.💡 This is a useful inductive bias for estimating camera pose! abs: arxiv.org/abs/2208.08988 ✍️ @jcjohnss and David Fouhey
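For context on the 8-point algorithm the tweet refers to, here is a minimal sketch of the classical two-view relative pose pipeline using OpenCV. This is not the paper's ViT model; the correspondences, intrinsics, and thresholds are illustrative assumptions.

```python
# Classical baseline sketch: relative rotation + translation from matched points.
import cv2
import numpy as np

def relative_pose(pts1, pts2, K):
    """pts1, pts2: (N, 2) matched pixel coordinates from the two images; K: 3x3 intrinsics."""
    # Essential matrix from >= 8 correspondences (RANSAC guards against bad matches).
    E, _ = cv2.findEssentialMat(pts1, pts2, K, cv2.RANSAC, 0.999, 1.0)
    # Decompose E into the rotation R and unit-norm translation t consistent with the matches.
    _, R, t, _ = cv2.recoverPose(E, pts1, pts2, K)
    return R, t  # t is only recoverable up to scale from two views

# Hypothetical usage with matches from any feature matcher (e.g. SIFT + ratio test):
# K = np.array([[500., 0., 320.], [0., 500., 240.], [0., 0., 1.]])
# R, t = relative_pose(pts1, pts2, K)
```

The end-to-end model in the quoted paper takes the two RGB images directly, with the ViT nudged toward this kind of correspondence computation instead of the hand-designed steps above.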
Excited to share ACID, our RSS 2022 work on volumetric deformable manipulation🐻. Honored to be a Best Student Paper Award finalist. Much thanks to friends from #guibaslab for the discussion and feedback along the way.
Quote Tweet
We've released an updated version of ACID, our #RSS2022 paper on volumetric deformable manipulation, with real-robot experiments. ACID predicts dynamics, 3d geometry, and point-wise correspondence from partial observations. It learns to maneuver a cute teddy bear into any pose.
3/ Second, during training, we mask a variable percentage of tokens instead of a fixed mask ratio. For inference, MaskViT generates all tokens via iterative refinement where we incrementally decrease the masking ratio following a mask scheduling function.
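The iterative refinement described in 3/ above is, in spirit, confidence-based masked decoding. Below is a minimal sketch under that assumption: `model`, the cosine schedule, and the token shapes are illustrative, not MaskViT's actual inference code.

```python
import math
import torch

def cosine_schedule(t):
    # Fraction of tokens still masked at progress t in [0, 1]; reaches 0 at t = 1.
    return math.cos(0.5 * math.pi * t)

@torch.no_grad()
def iterative_decode(model, num_tokens, mask_id, steps=8):
    """Generate every token by repeatedly predicting all masked positions and keeping
    only the most confident predictions, so the mask ratio shrinks each step."""
    tokens = torch.full((1, num_tokens), mask_id, dtype=torch.long)   # start fully masked
    for step in range(steps):
        logits = model(tokens)                              # (1, num_tokens, vocab) - assumed interface
        conf, pred = logits.softmax(-1).max(-1)             # per-token confidence and argmax
        # Never re-mask tokens that were already committed in earlier steps.
        conf = torch.where(tokens == mask_id, conf, torch.full_like(conf, float("inf")))
        # Target number of tokens that may remain masked after this step.
        keep_masked = int(num_tokens * cosine_schedule((step + 1) / steps))
        tokens = torch.where(tokens == mask_id, pred, tokens)   # tentatively fill everything
        remask = conf.argsort(dim=-1)[:, :keep_masked]          # least confident predictions...
        tokens.scatter_(1, remask, mask_id)                     # ...go back to being masked
    return tokens
```

At the final step `keep_masked` reaches zero, so every token is committed.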
2/ Our approach is based on two simple design decisions. First, for memory and training efficiency, we use two types of window attention: spatial and spatiotemporal. We find that this simple de-coupling greatly improves the training speed without sacrificing quality.
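To make 2/ above concrete, here is a minimal sketch of the two window shapes, assuming a (B, T, H, W, C) grid of video tokens; the shapes, window sizes, and helper names are illustrative assumptions, not the MaskViT code.

```python
import torch

def spatial_windows(x):
    """One window per frame: tokens attend only within their own H x W grid."""
    B, T, H, W, C = x.shape
    return x.reshape(B * T, H * W, C)                   # (windows, tokens_per_window, C)

def spatiotemporal_windows(x, h, w):
    """A small h x w spatial patch extended across all T frames forms one window."""
    B, T, H, W, C = x.shape
    x = x.reshape(B, T, H // h, h, W // w, w, C)
    x = x.permute(0, 2, 4, 1, 3, 5, 6)                  # (B, H//h, W//w, T, h, w, C)
    return x.reshape(B * (H // h) * (W // w), T * h * w, C)

# Each set of windows goes through ordinary multi-head self-attention; alternating the
# two keeps every attention matrix small (memory-friendly) while still mixing
# information across both space and time.
# x = torch.randn(2, 8, 16, 16, 256)
# spatial_windows(x).shape               # (16, 256, 256)
# spatiotemporal_windows(x, 4, 4).shape  # (32, 128, 256)
```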
Our new paper detects objects and predicts their 3D shape and layout, trained without any 3D shapes! Instead we use multiview 2D silhouettes + differentiable rendering losses. This is an important step in scaling up 3D perception, where we can't rely on large-scale ground-truth.
Quote Tweet
Object recognition should happen in 3D because the world is 3D. On the heels of Mesh R-CNN, we build a model that detects all objects, predicts their 3D position in space (layout) and their 3D shape *from a single image of a novel complex scene*. At CVPR 2022. (1/n)
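The supervision in this setup can be pictured as a per-view mask loss. A minimal sketch, assuming a differentiable silhouette renderer is available; the `render_silhouette` callable is a placeholder, not the paper's renderer.

```python
import torch
import torch.nn.functional as F

def multiview_silhouette_loss(pred_shape, cameras, gt_masks, render_silhouette):
    """Supervise a predicted 3D shape using only 2D silhouettes from several views.

    pred_shape        : predicted shape in any differentiable representation
    cameras           : per-view camera parameters (one entry per view)
    gt_masks          : (V, H, W) binary object masks for the V views
    render_silhouette : placeholder differentiable renderer -> (H, W) soft mask in [0, 1]
    """
    loss = 0.0
    for cam, gt in zip(cameras, gt_masks):
        soft_mask = render_silhouette(pred_shape, cam)
        # The error is measured purely in 2D, but gradients flow through the renderer
        # back into the 3D shape parameters - no 3D ground truth is needed.
        loss = loss + F.binary_cross_entropy(soft_mask.clamp(1e-6, 1 - 1e-6), gt.float())
    return loss / len(cameras)
```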
After LLMs, the next big thing will be LCPs: Large Control Policies. Very general pretrained goal-conditioned policies for embodied agents. If you provide it with a goal vector / example / text, it can do a large number of tasks in a large number of environments. Then we retire🤖
Aaand it's finally out, so so excited to share what we've been working on! Dall-E makes images from descriptions. I could tell you how amazing this neural net is, but it's nothing like seeing it, so go check it out here! :) openai.com/dall-e-2/
Quote Tweet
Our newest system DALL·E 2 can create realistic images and art from a description in natural language. See it here: openai.com/dall-e-2/
Thanks a lot for inviting me to share our latest work on evolving agent morphologies and learning universal controllers! Joint work with my wonderful collaborators
Quote Tweet
First up: @agrimgupta92 from Stanford is interested in how the form of a machine changes its ability to learn, shifting the focus away from learning algorithms operating by themselves and onto learning combined with a kind of bodily evolution. #EmTechDigital
AI augmenting human creativity is exciting! Generative models are attractive for that. But users need to have more control. Make-A-Scene does just that (in addition to being SOTA)! Also check out the video of a lovely story Oran wrote and illustrated with this approach! 👇
Check out our work on large-scale pre-training for robot morphologies! A Transformer learns universal controllers that generalize zero-shot to novel robot configurations and tasks 🦾
Quote Tweet
1/ Can we replicate the success of large scale pre-training --> task specific fine tuning for robotics? This is hard as robots have different act/obs space, morphology and learning speed! We introduce MetaMorph🧵👇 Paper: arxiv.org/abs/2203.11931 Code: github.com/agrimgupta92/m
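A minimal sketch of the general idea behind a single controller for many morphologies: treat each limb/joint as one token, so robots with different numbers of joints become sequences of different lengths. The dimensions, padding scheme, and one-action-per-token head are illustrative assumptions, not the released MetaMorph code.

```python
import torch
import torch.nn as nn

class MorphologyAgnosticPolicy(nn.Module):
    """One transformer policy shared across robots with different obs/action spaces:
    each limb contributes one token, so sequence length varies with the morphology."""

    def __init__(self, limb_obs_dim=32, d_model=128, n_heads=4, n_layers=4):
        super().__init__()
        self.embed = nn.Linear(limb_obs_dim, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.action_head = nn.Linear(d_model, 1)        # one actuator command per limb token

    def forward(self, limb_obs, pad_mask):
        # limb_obs: (B, L, limb_obs_dim) per-limb features, padded to the max limb count L
        # pad_mask: (B, L) True where a slot is padding (robot has fewer than L limbs)
        h = self.encoder(self.embed(limb_obs), src_key_padding_mask=pad_mask)
        return self.action_head(h).squeeze(-1)          # (B, L) actions; padded slots are ignored

# Two robots with different limb counts can share this policy; they differ only in
# how many of the L token slots hold real limbs versus padding.
```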
Our new #iclr2022 paper - Towards a foundation model for robotics: one transformer to control many new robot morphologies through large-scale pre-training on another set of morphologies. Expertly led by & collab w/ paper: arxiv.org/abs/2203.11931 thread ->
Quote Tweet
1/ Can we replicate the success of large scale pre-training --> task specific fine tuning for robotics? This is hard as robots have different act/obs space, morphology and learning speed! We introduce MetaMorph🧵👇 Paper: arxiv.org/abs/2203.11931 Code: github.com/agrimgupta92/m
Our newest #iclr work, MetaMorph, leverages Transformer models to learn a universal controller for modular robot morphologies. Awesome collaboration with supported by 😀
Quote Tweet
1/ Can we replicate the success of large scale pre-training --> task specific fine tuning for robotics? This is hard as robots have different act/obs space, morphology and learning speed! We introduce MetaMorph🧵👇 Paper: arxiv.org/abs/2203.11931 Code: github.com/agrimgupta92/m