Happy to hear you've found it easier to use and more accurate than other systems ☺️! Best of luck with your project researching the women who worked at the Smithsonian!
One trick that sometimes seems to work is to add examples of the desired behavior to the prompt. This seems to work "ok" if you have a set of examples that OpenAI got wrong before.
But this turns the problem into a new one: which examples are good or bad to put in the prompt?
Words that seem really minor to me can totally tip the output the other way.
I might be comparing it too much to a bag of words model, which isn't fair. But I really would like to understand/debug my prompt better here.
I'm exploring prompt engineering a bit with OpenAI, and it turns out to be tricky!
The difference between these two input examples is "television ad" vs "television appearance".
Just to ping folks: has anyone had the same experience? I'm trying to understand this better.
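A minimal sketch of the "add examples to the prompt" trick, assuming a simple text-labelling task. The helper `build_prompt`, the instruction, the label names and the example texts are all hypothetical, and no OpenAI API is called; the point is only to show how labelled examples (for instance, ones the model got wrong before) get stitched into the prompt so you can swap them in and out.

```python
def build_prompt(instruction, examples, query):
    """Assemble a few-shot prompt from (text, label) example pairs.

    Hypothetical helper: it only builds a string and does not call any API.
    """
    lines = [instruction.strip(), ""]
    for text, label in examples:
        lines += [f"Text: {text}", f"Label: {label}", ""]
    # The query goes last, with its label left open for the model to complete.
    lines += [f"Text: {query}", "Label:"]
    return "\n".join(lines)


# Hypothetical examples, e.g. inputs the model previously got wrong.
examples = [
    ("First example that was misclassified before.", "relevant"),
    ("Second example that was misclassified before.", "not relevant"),
]

prompt = build_prompt(
    "Decide whether each text is relevant. Answer 'relevant' or 'not relevant'.",
    examples,
    "The new text we want a label for.",
)
print(prompt)
```

Keeping the examples as plain data makes it cheap to try different subsets and compare outputs, which is exactly where the "which examples are good or bad" question shows up.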
NEW RELEASE: Introducing spaCy v3.5!
Since v3.4, we've added:
⌨️ Three new CLI commands
💥 Fuzzy matching
🚀 Improvements to our entity linking functionality
💫 A range of language updates and bug fixes!
https://spacy.io/usage/v3-5
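To illustrate the idea behind fuzzy matching, here is a stdlib-only sketch that flags tokens within a small Levenshtein edit distance of a target word. This is not spaCy's implementation: spaCy v3.5 exposes fuzzy matching through a `FUZZY` operator in `Matcher` patterns (see the release notes linked above for the real syntax). The helper names and the `max_edits=2` threshold here are assumptions for illustration.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution (free if chars match)
            ))
        prev = curr
    return prev[-1]


def fuzzy_find(tokens, target, max_edits=2):
    """Return (index, token) pairs whose lowercase form is within max_edits of target."""
    target = target.lower()
    return [(i, tok) for i, tok in enumerate(tokens)
            if levenshtein(tok.lower(), target) <= max_edits]


tokens = "I will defintely come , definately !".split()
print(fuzzy_find(tokens, "definitely"))  # [(2, 'defintely'), (5, 'definately')]
```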
TIL that there was a pandemic on World of Warcraft in 2005.
It was called the "Corrupted Blood incident", and it was caused by a disease that spread from a raid boss to players, then to their pets, then to NPCs in the capital cities, and from there to low-level players.
https://koaning.io/til/social-distance-wow/…
But maybe that A/B comparison dataset can be super useful for something else! Verification!
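One way the A/B dataset could serve as verification, sketched under assumptions: if each record says which of two items a human preferred, you can measure how often a classifier's score ranks the preferred item higher. The `pairwise_agreement` helper, the scores and the pairs below are hypothetical placeholders.

```python
def pairwise_agreement(pairs, score):
    """Fraction of (winner, loser) pairs where score(winner) > score(loser).

    Ties count as disagreement, which keeps the metric conservative.
    """
    if not pairs:
        raise ValueError("need at least one pair")
    hits = sum(1 for winner, loser in pairs if score(winner) > score(loser))
    return hits / len(pairs)


# Hypothetical classifier scores in [0, 1] for a few items.
scores = {"a": 0.9, "b": 0.4, "c": 0.7, "d": 0.2}
# Hypothetical A/B annotations: (preferred item, other item).
ab_pairs = [("a", "b"), ("c", "d"), ("b", "c"), ("a", "d")]

print(pairwise_agreement(ab_pairs, scores.get))  # 0.75
```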
Still a work in progress, but I'm pretty interested in how it'll turn out.
If folks did something similar, please ping!
One idea is to train two models: a classifier and a sorter.
But technically, if I have a classifier that predicts a score between 0 and 1, then I can already use that score to sort...
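A minimal sketch of that observation: any classifier that outputs a probability-like score in [0, 1] already gives you an ordering. The logistic model below uses hypothetical hand-set weights and made-up features; in practice the weights would be learned from the "would click" labels.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical linear model: these weights would normally be learned.
WEIGHTS = {"mentions_python": 1.5, "is_clickbait": -2.0, "from_favourite_blog": 1.0}
BIAS = -0.5


def click_probability(features):
    """Probability-like score in (0, 1) from a linear model plus sigmoid."""
    z = BIAS + sum(WEIGHTS[name] * value for name, value in features.items())
    return sigmoid(z)


candidates = {
    "New scikit-learn release": {"mentions_python": 1, "is_clickbait": 0, "from_favourite_blog": 0},
    "You won't believe this trick": {"mentions_python": 0, "is_clickbait": 1, "from_favourite_blog": 0},
    "Notes on annotation": {"mentions_python": 0, "is_clickbait": 0, "from_favourite_blog": 1},
}

# The classifier's probability doubles as a sort key for the front page.
front_page = sorted(candidates, key=lambda t: click_probability(candidates[t]), reverse=True)
print(front_page)
```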
But I also made a much simpler one that just lets me mark an example as "would click".
This feels a lot simpler to reason about, but I'm wondering how I might be able to combine annotations from both.
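One possible way to combine both annotation types, offered as an assumption rather than the author's method: fit a single linear scorer with two losses, a logistic loss on the "would click" labels and a RankNet-style pairwise logistic loss on the A/B choices, where P(A preferred over B) = sigmoid(s(A) - s(B)). The feature names, data, learning rate and the equal weighting of the two losses are all hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def score(w, x):
    """Linear score; x[0] is a constant 1.0 acting as the bias feature."""
    return sum(wi * xi for wi, xi in zip(w, x))


def train(pointwise, pairwise, n_features, lr=0.1, epochs=200):
    """Fit one weight vector on both annotation types with full-batch gradient descent.

    pointwise: list of (features, label) where label is 1 for "would click", else 0.
    pairwise:  list of (winner_features, loser_features) from the A/B interface.
    """
    w = [0.0] * n_features
    n = len(pointwise) + len(pairwise)
    for _ in range(epochs):
        grad = [0.0] * n_features
        # Logistic loss on click labels: gradient is (sigmoid(s) - y) * x.
        for x, y in pointwise:
            err = sigmoid(score(w, x)) - y
            for k in range(n_features):
                grad[k] += err * x[k]
        # Pairwise loss -log sigmoid(s_win - s_lose); for a linear score,
        # s_win - s_lose equals score(w, x_win - x_lose).
        for x_win, x_lose in pairwise:
            diff = [a - b for a, b in zip(x_win, x_lose)]
            err = sigmoid(score(w, diff)) - 1.0
            for k in range(n_features):
                grad[k] += err * diff[k]
        w = [wk - lr * gk / n for wk, gk in zip(w, grad)]
    return w


# Hypothetical features: [bias, mentions_topic_i_like, is_long_read]
pointwise = [([1.0, 1.0, 0.0], 1), ([1.0, 0.0, 1.0], 0), ([1.0, 1.0, 1.0], 1), ([1.0, 0.0, 0.0], 0)]
pairwise = [([1.0, 1.0, 0.0], [1.0, 0.0, 1.0]), ([1.0, 1.0, 1.0], [1.0, 0.0, 0.0])]

w = train(pointwise, pairwise, n_features=3)
print([round(wk, 2) for wk in w])
```

How much each loss counts is a free design choice; here both simply share one average over all annotations.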
Working on a system that can generate my own front page of the internet.
So I'm toying around with annotation interfaces. The first one allows me to compare two options.
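For a compare-two-options interface, one simple way to generate the pairs is to sample distinct unordered combinations of candidates and randomise which item appears on the left. The function name, seed and candidate list are hypothetical.

```python
import itertools
import random


def make_ab_pairs(candidates, n_pairs, seed=42):
    """Sample up to n_pairs distinct unordered pairs, randomising left/right placement."""
    rng = random.Random(seed)
    all_pairs = list(itertools.combinations(candidates, 2))
    chosen = rng.sample(all_pairs, k=min(n_pairs, len(all_pairs)))
    # Flip each pair at random so position bias doesn't creep into the labels.
    return [pair if rng.random() < 0.5 else pair[::-1] for pair in chosen]


headlines = ["headline A", "headline B", "headline C", "headline D"]
for left, right in make_ab_pairs(headlines, n_pairs=3):
    print(f"{left!r}  vs  {right!r}")
```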
It is finally here, the v0.13 release of BERTopic! 🎉
Explore multi-topic assignments, supervised topic modeling, outlier reduction, light- and heavyweight options, and much more in this bigger-than-expected release!
Changelog:
https://maartengr.github.io/BERTopic/changelog.html…
An overview thread👇🧵
Announcing finetuners for embetter!
There are scikit-learn components that can update the embedded space to make bulk labelling easier!
I wrote a full tutorial here: https://koaning.github.io/embetter/finetuners/…
Here's a before and after on the PCA space of a sentence-transformer. The color is the class in the test set.
After finetuning, the space is much more "polarised" but also makes it easier to select candidates of interest.
The trick is to reuse a neural network as a scikit-learn transformer. We use the `y` labels, but only as a gradient signal to update the representation.
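A rough, dependency-free sketch of that trick, not embetter's actual implementation: a tiny network with one tanh hidden layer is trained with a softmax head on the labels, but `transform` only returns the hidden layer, so `y` shapes the representation without being part of the output. The class name `TinyFinetuner`, its parameters and the toy data are hypothetical; it mimics the scikit-learn `fit`/`transform` convention without importing scikit-learn.

```python
import math
import random


class TinyFinetuner:
    """Hypothetical sklearn-style transformer: learns a hidden layer using y, outputs only that layer."""

    def __init__(self, hidden_dim=2, lr=0.3, n_epochs=200, seed=0):
        self.hidden_dim, self.lr, self.n_epochs, self.seed = hidden_dim, lr, n_epochs, seed

    def _hidden(self, x):
        # tanh(x @ W) for a single row x.
        return [math.tanh(sum(xi * self.W_[i][h] for i, xi in enumerate(x)))
                for h in range(self.hidden_dim)]

    def fit(self, X, y):
        rng = random.Random(self.seed)
        self.classes_ = sorted(set(y))
        d, k, n = len(X[0]), len(self.classes_), len(X)
        self.W_ = [[rng.gauss(0, 0.5) for _ in range(self.hidden_dim)] for _ in range(d)]
        V = [[rng.gauss(0, 0.5) for _ in range(k)] for _ in range(self.hidden_dim)]  # label head
        self.losses_ = []
        for _ in range(self.n_epochs):
            gW = [[0.0] * self.hidden_dim for _ in range(d)]
            gV = [[0.0] * k for _ in range(self.hidden_dim)]
            loss = 0.0
            for x, label in zip(X, y):
                h = self._hidden(x)
                logits = [sum(h[j] * V[j][c] for j in range(self.hidden_dim)) for c in range(k)]
                m = max(logits)
                exps = [math.exp(z - m) for z in logits]
                probs = [e / sum(exps) for e in exps]
                t = self.classes_.index(label)
                loss -= math.log(probs[t])
                # Softmax + cross-entropy gradient w.r.t. logits: probs - one_hot.
                dlogits = [p - (c == t) for c, p in enumerate(probs)]
                for j in range(self.hidden_dim):
                    dh = sum(dlogits[c] * V[j][c] for c in range(k))
                    dpre = dh * (1 - h[j] ** 2)  # derivative of tanh
                    for c in range(k):
                        gV[j][c] += h[j] * dlogits[c]
                    for i in range(d):
                        gW[i][j] += x[i] * dpre
            self.losses_.append(loss / n)
            for j in range(self.hidden_dim):
                for c in range(k):
                    V[j][c] -= self.lr * gV[j][c] / n
            for i in range(d):
                for j in range(self.hidden_dim):
                    self.W_[i][j] -= self.lr * gW[i][j] / n
        return self

    def transform(self, X):
        # Only the learned representation is returned; the label head is discarded.
        return [self._hidden(x) for x in X]


# Hypothetical toy "embeddings" for two classes.
X = [[1.0, 0.2, 0.1], [0.9, -0.1, 0.3], [1.1, 0.0, -0.2],
     [-1.0, 0.1, 0.2], [-0.8, -0.2, 0.0], [-1.2, 0.3, -0.1]]
y = ["pos", "pos", "pos", "neg", "neg", "neg"]

ft = TinyFinetuner().fit(X, y)
print(ft.transform(X)[:2])
```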
Is there an app to turn blogposts into audio/podcast segments?
I recall folks talking about it a while ago, and I'd really like to add more blogs to my morning walk with the stroller.
@fishnets88 @jeremyjordan this was just posted on Medium, covering human-learn and linking to Jeremy's NormConf video :) https://towardsdatascience.com/human-learn-rule-based-learning-as-an-alternative-to-machine-learning-baf1899ecb3a…
@fishnets88 attended a few conferences talking about techniques and tools for data and machine learning. He's always handing out stickers!
@DamianRomero_CL and Magda also discussed data annotation at several events, and @victorialslocum presented on spancat at PyData Global!
Here's the link again for the paper: https://arxiv.org/abs/2212.09255
The hyperparameters are in the appendix as part of the config file, allowing easy reproducibility. We will also release a spaCy project for this very soon! (4/4)
🆕 Typeset plugin ✏️
Insiders 4.27.0 ships a new built-in plugin that allows you to preserve formatting in parts of the site where MkDocs normally doesn't 😏 This has been requested soooo many times, and finally it's possible!
Documentation:
https://squidfunk.github.io/mkdocs-material/reference/#built-in-typeset-plugin…
Did you know that you can use Prodigy from your mobile phone?
In our latest Prodigy Short, @fishnets88 (who has a newborn) shows how to set up Prodigy so that you can use swipe gestures and need only one hand to annotate. Watch it here: https://youtu.be/-Bx-3DaE64A
So instead of starting big... maybe it's better to start small. That way, you're more nimble and you can iterate when you spot an issue.
Iteration really is a beautiful thing.
For details, check the blogpost!
https://koaning.io/posts/go-emotions/…
That means that you shouldn't be surprised if these annotators end up not agreeing with each other. Especially when you're giving them a subjective task like emotion detection.
Imagine training an algorithm on a dataset full of disagreement.
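To put a number on that disagreement before training, one common option (our choice here, not necessarily what the blogpost does) is Cohen's kappa, which compares the observed agreement between two annotators with the agreement expected by chance. The emotion labels below are made up for illustration.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equally long, non-empty label lists")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement: probability both annotators independently pick the same label.
    expected = sum(counts_a[lbl] * counts_b[lbl] for lbl in counts_a) / n ** 2
    if expected == 1.0:
        return 1.0  # convention: both annotators used one identical label throughout
    return (observed - expected) / (1 - expected)


annotator_1 = ["joy", "anger", "joy", "sadness", "joy", "anger"]
annotator_2 = ["joy", "joy", "joy", "sadness", "anger", "anger"]
print(round(cohens_kappa(annotator_1, annotator_2), 3))  # 0.455
```

Here the annotators agree on 4 of 6 items, but once chance agreement is subtracted the kappa drops to about 0.45, which is the kind of number worth knowing before training on the labels.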