After thinking about it for weeks and lots of procrastinating, I finally decided to reboot my computer
Boris Dayma 
@borisdayma
craiyon.com · Joined February 2012
Boris Dayma 🖍️’s Tweets
Real world applications of Large Models
Learn how leaders in the emerging LLM industry are deploying models:
▪️ from
▪️ from
▪️ from
alongside our co-founder .
Register: fullyconnected.com/?utm_source=tw
If you're a founder/CEO and struggling today, feel free to reach out to me or other founders. My DMs are open; we need to support one another more than ever on days like this!
The conference next week is stacked with leaders in machine learning😮
fullyconnected.com/?utm_source=tw
For those interested in understanding how to build an image search engine, I recommend going through the repo, which makes it very easy and explains the technical details:
You can now easily search through millions of images generated by the new Craiyon model
1/ There is huge headroom for improving the capabilities of our vision models, and given the lessons we've learned from LLMs, scaling is a promising bet. We are introducing ViT-22B, the largest vision backbone reported to date: arxiv.org/abs/2302.05442
So apparently it's a "bat-eared fox" from the African savanna!
Found it thanks to rom1504.github.io/clip-retrieval/
Craiyon's brand new AI model is LIVE🚀
⚡We decided to make it accessible for everyone right away... Could be a big mistake but we'll see 😅
Try the new model at craiyon.com
🧵there's more...
#nowaitlist #craiyon #AIart
Amazing video 😍
Strongly recommend! I couldn't have thought of a smarter way to gather data to reproduce ChatGPT!
Pretty confident it's gonna work 👏
Quote Tweet
It's surprisingly fun to collect data for OpenAssistant - Our open-source alternative to ChatGPT!
Check out the video:
youtu.be/64Izfm24FKA
#openassistant #chatgpt
To index embeddings efficiently there is faiss: github.com/facebookresear
To make the process easier there is autofaiss: github.com/criteo/autofai
To make it even easier there is clip-retrieval index: github.com/rom1504/clip-r
Next step would be "Guess what I want and where it is"
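As a rough picture of what these tools do under the hood, here is a minimal numpy sketch of embedding search: brute-force cosine similarity, which faiss replaces with approximate index structures (IVF, HNSW, ...) at scale. The function names and toy corpus are illustrative, not from any of these libraries:

```python
import numpy as np

def build_index(embeddings):
    # the "index" here is just L2-normalized embeddings; faiss would
    # build an approximate structure over the same data
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    return embeddings / norms

def search(index, query, k=3):
    # return indices of the k most similar items by cosine similarity
    q = query / np.linalg.norm(query)
    scores = index @ q
    return np.argsort(-scores)[:k]

# toy corpus of 5 "image embeddings" in 4 dimensions
rng = np.random.default_rng(0)
corpus = rng.normal(size=(5, 4))
index = build_index(corpus)
hits = search(index, corpus[2], k=1)
# an item's own embedding is its best match
```

clip-retrieval essentially wires this pipeline together end to end: encode with CLIP, index with (auto)faiss, then serve the `search` step behind a UI.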
I've actually been wanting to write something similar, illustrated by my logged experiments on W&B.
Excellent thread on training guidelines ❤️
This type of knowledge is essential for training large models well
Quote Tweet
Excited to announce our Deep Learning Tuning Playbook, a writeup of tips & tricks we employ when designing DL experiments. We use these techniques to deploy numerous large-scale model improvements and hope formalizing them helps the community do the same! github.com/google-researc
Efficiency is key in ML apps; be it training or inference
And the most critical decisions for efficiency are usually the initial key choices.
It's much harder to stack improvements afterwards, but it's not impossible.
So glad people had fun with dalle-mini this year.
More cool stuff coming in 2023!
Quote Tweet
Replying to @ClementDelangue @borisdayma and 7 others
my this year favourite was the dalle mini by @borisdayma it was just amazing and was really quick when almost everyone was waiting for their chance to get access to dall e, he allowed us to use it before openai
What am I excited about for 2023?
Supporting more open-source science, models, datasets and demos like Dalle-mini by , Bloom by , GPTJ by , by compvis , Santacoder by & many more!
CLIP is a fun model to experiment with while training because it's visual.
It can be challenging to train because you need large batches in which every sample sees all the others: you cannot rely on per-device gradient averaging plus accumulation, but need all-gather operations for the loss.
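A minimal numpy sketch of the symmetric contrastive (InfoNCE) loss involved, assuming the image and text embeddings have already been all-gathered into one global batch; the function name and temperature value are illustrative:

```python
import numpy as np

def clip_loss(img_emb, txt_emb, temperature=0.07):
    # symmetric InfoNCE over a (globally gathered) batch: each image must
    # contrast against *every* text, which is why per-device loss plus
    # gradient averaging is not enough
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature  # (N, N) similarity matrix
    labels = np.arange(len(logits))     # matching pairs sit on the diagonal

    def xent(l):
        # stable cross-entropy of each row against the diagonal label
        l = l - l.max(axis=1, keepdims=True)
        logp = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -logp[labels, labels].mean()

    # average of image->text and text->image directions
    return (xent(logits) + xent(logits.T)) / 2

rng = np.random.default_rng(0)
emb = rng.normal(size=(8, 16))
loss_aligned = clip_loss(emb, emb)  # perfectly matched pairs -> tiny loss
```

In the real multi-device setting, `img_emb` and `txt_emb` would be the result of an all-gather across devices before computing `logits`.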
I like the new JAX/Flax API for large models.
I'll be playing with it in my CLIP repo and experimenting with how fast we can train a small CLIP model.
Great guide on parallelism with JAX pjit. Glad to see the API getting simpler and more concise.
Note that for a large range of smaller models, pmap is super easy to use.
Quote Tweet
New Flax guide on auto-parallelism with JAX's pjit!
We are adding a guide that shows how to scale up your Flax Module on multiple devices using JAX's latest auto-SPMD API, pjit, and to explore various partition layouts with custom dimension names.
flax.readthedocs.io/en/latest/guid
Using transformers for diffusion, awesome results and great detailed ablations!
Interesting that code is released in Pytorch when it was trained in JAX on TPU pods 🤔
Transformers are ideal in JAX because you can "scan" repetitive blocks for fast compilation
Quote Tweet
Scalable Diffusion Models with Transformers
abs: arxiv.org/abs/2212.09748
largest DiT-XL/2 models outperform all prior diffusion models on the class conditional ImageNet 512×512 and 256×256 benchmarks, achieving a state-of-the-art FID of 2.27 on the latter
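The "scan" trick mentioned above can be sketched in pure Python: jax.lax.scan compiles the block body once and reuses it for every layer, with the per-layer weights stacked along a leading axis. The `block` function here is a hypothetical stand-in for a transformer layer:

```python
import numpy as np

def scan(f, carry, xs):
    # pure-Python model of jax.lax.scan semantics: thread a carry through
    # f for each slice of the stacked inputs xs, collecting outputs
    ys = []
    for x in xs:
        carry, y = f(carry, x)
        ys.append(y)
    return carry, np.stack(ys)

def block(h, params):
    # stand-in for one transformer block: a residual linear layer whose
    # weights come from one slice of the stacked params
    h = h + h @ params
    return h, h  # new carry, per-layer output

# stack per-layer weights along a leading "layer" axis, so one compiled
# block is reused for every layer instead of unrolling depth copies
depth, width = 4, 8
weights = np.zeros((depth, width, width))  # toy weights: zero residual
h0 = np.ones(width)
h_final, per_layer = scan(block, h0, weights)
```

The compilation win comes from tracing `block` once for the whole stack rather than once per layer, which is what makes deep repetitive architectures compile fast in JAX.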
It's always fun to see 2 runs racing each other where the lead keeps on changing
A huge thanks to the HITY (Has It Trained Yet) workshop organizers ( ) for bringing and me together for a day of commiserating on "graduate student descent"!
Quote Tweet
To inspire you for our just-released Diffusion Models Course
with @johnowhitaker
we are excited to share the free online event with @hardmaru, @deviparikh, @Buntworthy, @robrombach, @pess_r and @multimodalart on Nov 30th at 18h CET
Register here: huggingface.us17.list-manage.com/subscribe?u=7f
Final touches for my talk in "Has It Trained Yet" workshop at NeurIPS
👉 Full schedule here:
Does the number of channels per head matter in the attention layer?
I see it's typically 64. Would it make a big difference to go to 32 or 128?
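For context on what "channels per head" refers to, a minimal numpy sketch of how d_model is split across heads; the shapes are illustrative:

```python
import numpy as np

def split_heads(x, n_heads):
    # reshape (batch, seq, d_model) -> (batch, n_heads, seq, head_dim);
    # head_dim = d_model // n_heads is the "channels per head" in question
    b, s, d = x.shape
    head_dim = d // n_heads
    return x.reshape(b, s, n_heads, head_dim).transpose(0, 2, 1, 3)

x = np.zeros((1, 10, 512))
# the common choice: 8 heads of 64 channels each
print(split_heads(x, 8).shape)   # (1, 8, 10, 64)
# same d_model, finer split: 16 heads of 32 channels each
print(split_heads(x, 16).shape)  # (1, 16, 10, 32)
```

The total parameter count is unchanged either way; only the granularity of the attention patterns (and the 1/sqrt(head_dim) scaling of the dot products) differs.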
Very cool approach for video editing using atlas + edit layer 😍
Shows how beneficial it is to learn existing techniques and enhance them
Quote Tweet
Played with optimizing Neural Atlases through Stable Diffusion. So much fun! Here are a few examples of video edits:
@RafailFridman @DanahYatim
Craiyon updates:
-📈Upscaler:
Images are going from 256x256 to 1024x1024 resolution. You can zoom in with less pixelation and make your image creations larger.
-👕Shirts & Merch:
Love what you created? You can now get it printed on a shirt!
🧵
#craiyon
I think something that can also be dangerous with LeCun init is if you add absolute position embeddings as well, which would have a very different initialization scale (due to vocab size vs. max number of positions).
Since the embedding lookup is a one-hot operation, I don't see why the initialization of this layer should depend on vocab size, especially for a very large vocab.
But maybe I'm missing the point 🤔
I've seen lots of ways people initialize them:
- std of 1.0
- std of 0.02 (like all layers)
- Lecun uniform (scale depends on vocab size)
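A small numpy sketch of how these three scales compare, assuming LeCun uniform treats vocab size as the fan-in (one reading of "scale depends on vocab size"; frameworks differ here):

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, d_model = 10_000, 64

# the three schemes from the thread
emb_std1   = rng.normal(0.0, 1.0,  size=(vocab_size, d_model))
emb_std002 = rng.normal(0.0, 0.02, size=(vocab_size, d_model))
# LeCun uniform with vocab_size as fan-in: limit = sqrt(3 / fan_in),
# giving std = sqrt(1 / vocab_size) -- tiny for large vocabularies
limit = np.sqrt(3.0 / vocab_size)
emb_lecun = rng.uniform(-limit, limit, size=(vocab_size, d_model))

for name, e in [("std 1.0", emb_std1), ("std 0.02", emb_std002),
                ("lecun", emb_lecun)]:
    print(f"{name:9s} std ~ {e.std():.4f}")
```

This makes the spread concrete: the three schemes differ by orders of magnitude in scale, which is plausibly why the choice shows up so clearly in training curves.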
There is a 15% difference when initializing embedding layers with a normal distribution of std 0.02 vs. 1.0