We've fine-tuned GPT-2 using human feedback for tasks such as summarizing articles, matching the preferences of human labelers (if not always our own). We're hoping this brings safety methods closer to machines learning values by talking with humans. https://openai.com/blog/fine-tuning-gpt-2/
-
-
I dunno about that. How do you train that adversarial Transformer? By distinguishing real/fake? Even assuming you have the relevant corpus, that's just a GAN, and those don't work well for sequences like text.
-
Traditional GANs don't work for text because text is discrete rather than continuous, which breaks the gradients (as I understand it). But an adversarial objective has got to be fundamentally the right answer vs. maximum likelihood. Those advantageous properties are still there.
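-

The gradient problem above can be seen concretely. A minimal sketch in NumPy (the toy "critic" and the specific logits are made up for illustration): decoding a token by argmax is a piecewise-constant operation, so the discriminator's score gives the generator a zero gradient, whereas a continuous relaxation like a softmax (the idea behind Gumbel-softmax-style text GANs) does pass gradient through.

```python
import numpy as np

def sample_token_hard(logits):
    # Discrete decoding: argmax yields a one-hot token choice.
    token = np.zeros_like(logits)
    token[np.argmax(logits)] = 1.0
    return token

def sample_token_soft(logits, tau=1.0):
    # Continuous relaxation: temperature softmax instead of argmax.
    z = logits / tau
    e = np.exp(z - z.max())
    return e / e.sum()

def finite_diff_grad(f, logits, critic, eps=1e-4):
    # Numerical gradient of critic(f(logits)) w.r.t. each logit.
    grad = np.zeros_like(logits)
    for i in range(len(logits)):
        hi = logits.copy(); hi[i] += eps
        lo = logits.copy(); lo[i] -= eps
        grad[i] = (critic(f(hi)) - critic(f(lo))) / (2 * eps)
    return grad

# Toy discriminator score: a fixed linear function of the token vector.
critic = lambda token: float(token @ np.array([0.1, 0.7, 0.2]))
logits = np.array([0.5, 2.0, -1.0])

g_hard = finite_diff_grad(sample_token_hard, logits, critic)
g_soft = finite_diff_grad(sample_token_soft, logits, critic)
# g_hard is all zeros: nudging the logits doesn't change the argmax,
# so no gradient reaches the generator. g_soft is nonzero, which is
# why relaxations (or RL-style estimators) are needed for text GANs.
```

This is the sense in which text "messes up the gradients": the discrete sampling step blocks backpropagation, and the workarounds (Gumbel-softmax, REINFORCE-style policy gradients) each bring their own variance or bias problems.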