We've fine-tuned GPT-2 using human feedback for tasks such as summarizing articles, matching the preferences of human labelers (if not always our own). We're hoping this brings safety methods closer to machines learning values by talking with humans. https://openai.com/blog/fine-tuning-gpt-2/
-
-
Surely if you can do this, you can use another transformer as an adversarial module to clean up some of the artifacts (e.g. repetition), right? There's no reason these supervision signals have to come from a human.
-
I dunno about that. How do you train that adversarial Transformer? By distinguishing real from fake text? Even assuming you have the relevant corpus, that's just a GAN, and GANs don't work well for discrete sequences like text.
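The core obstacle behind "GANs don't work well for text" is that sampling a discrete token isn't differentiable, so a discriminator's signal can't backprop straight through the generator's output. The usual workaround is a score-function (REINFORCE) estimator: ∇ E[r] = E[r · ∇ log p(token)]. Here's a minimal sketch of that identity, using a toy 3-token categorical "generator" and a fixed reward vector standing in for a hypothetical discriminator's scores (both made up for illustration); it compares the exact gradient against a Monte Carlo REINFORCE estimate.

```python
import math
import random

random.seed(0)

# Toy "generator": a categorical distribution over 3 tokens, parameterised
# by logits. The reward vector plays the role of a discriminator's
# "how real does this token look" score (assumed values, not a real model).
logits = [0.2, -0.1, 0.5]
reward = [1.0, 0.0, 0.3]

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    z = sum(exps)
    return [e / z for e in exps]

probs = softmax(logits)

# Exact gradient of expected reward w.r.t. logit k:
#   d/dl_k  sum_i p_i r_i  =  p_k * (r_k - sum_i p_i r_i)
baseline = sum(p * r for p, r in zip(probs, reward))
exact_grad = [p * (r - baseline) for p, r in zip(probs, reward)]

def reinforce_grad(n_samples=200_000):
    """Monte Carlo REINFORCE estimate of the same gradient.

    We sample discrete tokens (the non-differentiable step) and weight
    grad log p(token) by the reward:  grad_k log p(tok) = 1[k == tok] - p_k.
    """
    grad = [0.0, 0.0, 0.0]
    for _ in range(n_samples):
        tok = random.choices(range(3), weights=probs)[0]
        for k in range(3):
            grad[k] += reward[tok] * ((1.0 if k == tok else 0.0) - probs[k])
    return [g / n_samples for g in grad]

est_grad = reinforce_grad()
```

With enough samples the estimate matches the exact gradient, but the per-sample variance is exactly the practical pain point: real text GANs (e.g. SeqGAN-style training) fight this noisy, high-variance signal over long sequences, which is a big part of why they underperform plain maximum-likelihood training.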