So, the multi-speaker neural TTS models have gotten eerily good; the few-sample/voice-cloning NNs are also now credible with a few minutes of samples. Has anyone told the various media fandoms they could get their favorite characters' voices on tap with some elbow-grease...?
-
Replying to @gwern
Without an implementation that also does style transfer for emotion and inflection, it's hard to make media using these tools. But yeah, technically we're a couple of thesis projects away from being able to resurrect any show as a fan cartoon with the original voice actors.
-
Replying to @AndreTI
My thinking is normal conversational speech is already a big chunk of what people would use it for. I also suspect existing models may encode substantial emotion/inflection range into the embeddings/latents already which can be reverse-engineered & then controlled like w/GANs.
-
"$CHARACTER narrates your voicemail message" or "$CHARACTER narrates your Facebook comment" would be extremely popular product features, aesthetic concerns about artistic integrity and the opinions of rightsholders aside.
-
The future may be audio memes, too. (Oh dear.)