Music Player Network
Cornell University - Voice Cloning from 5 Seconds of source
#3028631 02/12/20 04:46 PM
Joined: Nov 2014
Posts: 7,770
Likes: 10
MP Hall of Fame Member
OP Offline
"We describe a neural network-based system for text-to-speech (TTS) synthesis that is able to generate speech audio in the voice of many different speakers, including those unseen during training. Our system consists of three independently trained components: (1) a speaker encoder network, trained on a speaker verification task using an independent dataset of noisy speech from thousands of speakers without transcripts, to generate a fixed-dimensional embedding vector from seconds of reference speech from a target speaker; (2) a sequence-to-sequence synthesis network based on Tacotron 2, which generates a mel spectrogram from text, conditioned on the speaker embedding; (3) an auto-regressive WaveNet-based vocoder that converts the mel spectrogram into a sequence of time domain waveform samples"

https://arxiv.org/abs/1806.04558
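
For anyone curious how the three pieces in that abstract fit together, here is a toy sketch of the data flow only: reference audio goes to a fixed-size embedding, text plus embedding goes to a mel spectrogram, and the spectrogram goes to waveform samples. Every function name, constant, and formula below is a hypothetical stand-in, not the paper's code. The real components are trained neural networks, the real vocoder is auto-regressive (this one is not), and the real synthesizer does not emit one frame per character.

```python
import math

EMBED_DIM = 8   # assumed size of the fixed-dimensional speaker embedding
N_MELS = 4      # assumed number of mel channels per spectrogram frame
HOP = 16        # assumed number of waveform samples produced per frame


def speaker_encoder(reference_audio):
    """Stand-in for (1): reduce audio of any length to a fixed-size unit vector."""
    chunk = max(1, len(reference_audio) // EMBED_DIM)
    feats = []
    for i in range(EMBED_DIM):
        part = reference_audio[i * chunk:(i + 1) * chunk]
        # Mean absolute amplitude of each chunk is a crude placeholder feature.
        feats.append(sum(abs(s) for s in part) / len(part) if part else 0.0)
    norm = math.sqrt(sum(f * f for f in feats)) or 1.0
    return [f / norm for f in feats]


def synthesizer(text, speaker_embedding):
    """Stand-in for (2): one mel frame per character, shifted by the embedding."""
    frames = []
    for ch in text:
        base = (ord(ch) % 32) / 32.0
        frames.append([base + speaker_embedding[m % EMBED_DIM] for m in range(N_MELS)])
    return frames


def vocoder(mel_frames):
    """Stand-in for (3): expand each frame into HOP time-domain samples."""
    samples = []
    for frame in mel_frames:
        level = sum(frame) / len(frame)
        for n in range(HOP):
            samples.append(level * math.sin(2 * math.pi * n / HOP))
    return samples


def clone_voice(text, reference_audio):
    """Chain the three stages in the order the abstract lists them."""
    embedding = speaker_encoder(reference_audio)
    mel = synthesizer(text, embedding)
    return vocoder(mel)


# Five seconds of fake reference audio at an assumed 16 kHz sample rate.
reference = [math.sin(0.01 * n) for n in range(5 * 16000)]
waveform = clone_voice("hello", reference)
```

The point of the sketch is that the speaker embedding is the only thing that carries the target voice: the synthesizer never sees the reference audio itself, which is why each stage can be trained independently, as the abstract says.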



Live: Casio PX-560, Roland VR-700
Home: Rebuilt 1910 Chickering 5'2", Fender Rhodes MKI 88k, Yamaha S90ES
KC Island
Re: Cornell University - Voice Cloning from 5 Seconds of source
ElmerJFudd #3028673 02/12/20 08:09 PM
Joined: Jan 2011
Posts: 129
Senior Member
Offline
This is both amazing and terrifying. The era of Deep Fakes is upon us. It will not end well.

Re: Cornell University - Voice Cloning from 5 Seconds of source
ElmerJFudd #3028675 02/12/20 08:13 PM
Joined: Aug 2017
Posts: 728
Likes: 5
Gold Member
Offline

Gives me the heebs. That work was done in my neck of the woods here, it seems.


Samuel B. Lupowitz
Composer. Arranger. Musician. Food Enthusiast. Bad Pun Aficionado.
Re: Cornell University - Voice Cloning from 5 Seconds of source
ElmerJFudd #3028684 02/12/20 09:11 PM
Joined: Feb 2015
Posts: 4,216
Likes: 4
MP Hall of Fame Member
Offline
That's really cool. Do you think he was drawn to this work because of how exactly his own voice sounds like a machine produced it?


Re: Cornell University - Voice Cloning from 5 Seconds of source
MathOfInsects #3028687 02/12/20 09:41 PM
Joined: May 2009
Posts: 2,999
MP Hall of Fame Member
Offline
Originally Posted by MathOfInsects
That's really cool. Do you think he was drawn to this work because of how exactly his own voice sounds like a machine produced it?

I thought he was going to jump out from behind the curtain and shout "TADA! This entire voice over was TTV from a 5 second sample of my voice."


Whenever you find yourself on the side of the majority, it is time to pause and reflect.
-Mark Twain
Re: Cornell University - Voice Cloning from 5 Seconds of source
ElmerJFudd #3028689 02/12/20 09:48 PM
Joined: May 2009
Posts: 2,999
MP Hall of Fame Member
Offline
Another thing that occurred to me: how would it perform across different languages? One example had an English woman with an English accent converting English text. Okay, what if they had a person speaking Spanish into the sampler? Would the English text sound like their voice speaking Spanglish? Or Mandarin to French? It would be truly amazing if it could convincingly capture the inflection, intonation, and cadence of dissimilar languages.


Whenever you find yourself on the side of the majority, it is time to pause and reflect.
-Mark Twain
Re: Cornell University - Voice Cloning from 5 Seconds of source
ElmerJFudd #3028692 02/12/20 10:11 PM
Joined: Aug 2012
Posts: 2,047
MP Hall of Fame Member
Offline
T-800 speaking to the T-1000 on the phone using young John Connor's voice:

"How's Wolfie?" T-1000 (in his mother's voice) "Wolfie's just fine dear, when are you coming home?"

T-800 hangs up and says to John: "Your mother is dead."

They not only can deep fake us, they can deep fake each other.

Other than that fun thought, I think this is pretty cool


Hammond SK1, Kurzweil PC3, Korg Pa3x, Roland FA06, Band in a Box, Real Band, Studio One, too much stuff...
Re: Cornell University - Voice Cloning from 5 Seconds of source
ElmerJFudd #3028695 02/12/20 10:26 PM
Joined: Jun 2010
Posts: 2,307
Likes: 12
MP Hall of Fame Member
Offline
I'm so glad we're now that much closer to being able to produce indistinguishably convincing evidence of people doing and saying things they never did and said. What could possibly go wrong?

Re: Cornell University - Voice Cloning from 5 Seconds of source
ElmerJFudd #3028703 02/12/20 11:14 PM
Joined: Nov 2014
Posts: 7,770
Likes: 10
MP Hall of Fame Member
OP Offline
I think languages and accents, regional habits and colloquialisms, word choice, etc. would take a lot more backend work (sourcing such a speaker to do the phrasing) before applying the other person’s timbre. Of course, with public officials who are on the mic all the time, it’s easy for a creative person to put together entirely believable performances, not unlike what an impressionist does.

In other uses, I can imagine AI capturing the timbre and phrasing of say Eric Clapton’s playing and applying it to songs he never played.


Live: Casio PX-560, Roland VR-700
Home: Rebuilt 1910 Chickering 5'2", Fender Rhodes MKI 88k, Yamaha S90ES
Re: Cornell University - Voice Cloning from 5 Seconds of source
ElmerJFudd #3028711 02/12/20 11:33 PM
Joined: Jul 2018
Posts: 743
Likes: 2
Gold Member
Offline
There are actually some VERY positive uses for this too. My wife is getting her PhD in linguistics, and computer modeling is absolutely huge in the field. Even outside of voice reconstruction, linguistics uses a lot of statistical modeling in areas like language documentation, to help save endangered languages, which are disappearing across the planet at an alarming rate (along with cultural identity). Linguists in the field are using modeling to more precisely document and eventually retrain indigenous cultures whose native speakers are aging out. We have a friend who just got accepted to the PhD program at Cornell, a fantastic school (even if they're still mired in dated Chomskyan theory at times). I wouldn't be surprised if she gets involved in this project.


Puck Funk! :)

Equipment: Laptop running lots of nerdy software, some keyboards, noise makers…yada yada yada…maybe a cat?
Re: Cornell University - Voice Cloning from 5 Seconds of source
ElmerJFudd #3028727 02/13/20 03:28 AM
Joined: May 2015
Posts: 496
Senior Member
Offline
This is really fantastic!

I can’t wait to have my favorite Penthouse Letters read to me in Spongebob’s voice. oh yeah baby.

Re: Cornell University - Voice Cloning from 5 Seconds of source
ElmerJFudd #3028790 02/13/20 12:47 PM
Joined: Feb 2005
Posts: 21,112
Likes: 5
Triple Secret Banninated
20k Club
Offline
I can think of one excellent use of this. They can reproduce Majel Barrett-Roddenberry's voice and use it for all the computers on all new Star Trek projects!

Personally, I want her voice for Siri as well, at least on my own devices. But good luck with the rights to that.


The great thing about music is that there's always something to learn. The frustrating thing about music is that there's always something to learn!
Re: Cornell University - Voice Cloning from 5 Seconds of source
Jazzmammal #3028809 02/13/20 02:34 PM
Joined: Aug 2017
Posts: 728
Likes: 5
Gold Member
Offline
Originally Posted by EricBarker
We have a friend who just got accepted to the PhD program at Cornell
Tell your friend to drop by the Language Resource Center and say hi!

Originally Posted by Joe Muscara
I can think of one excellent use of this. They can reproduce Majel Barrett-Roddenberry's voice and use it for all the computers on all new Star Trek projects!

Personally, I want her voice for Siri as well, at least on my own devices. But good luck with the rights to that.
If a petition starts to circulate on the internet, count me in. Didn't Snoop Dogg license his voice for some pre-Siri GPS a while back? Wait, can we get Snoop Dogg to be the voice of the Star Trek computers now? Count me in for that petition, too.

Originally Posted by Jazzmammal
T-800 speaking to the T-1000 on the phone using young John Connor's voice:

"How's Wolfie?" T-1000 (in his mother's voice) "Wolfie's just fine dear, when are you coming home?"

T-800 hangs up and says to John: "Your mother is dead."

They not only can deep fake us, they can deep fake each other.

Other than that fun thought, I think this is pretty cool
"WHAT IS THE DOG'S NAME."


Samuel B. Lupowitz
Composer. Arranger. Musician. Food Enthusiast. Bad Pun Aficionado.
Re: Cornell University - Voice Cloning from 5 Seconds of source
samuelblupowitz #3028830 02/13/20 04:46 PM
Joined: Feb 2015
Posts: 4,216
Likes: 4
MP Hall of Fame Member
Offline
Originally Posted by samuelblupowitz
Tell your friend to drop by the Language Resource Center and say hi!


What do you do there? My uncle was a prof there until about three years ago. I don't think he minds Chomsky even a little.


Re: Cornell University - Voice Cloning from 5 Seconds of source
MathOfInsects #3028869 02/13/20 06:09 PM
Joined: Aug 2017
Posts: 728
Likes: 5
Gold Member
Offline
Originally Posted by MathOfInsects
Originally Posted by samuelblupowitz
Tell your friend to drop by the Language Resource Center and say hi!


What do you do there? My uncle was a prof there until about three years ago. I don't think he minds Chomsky even a little.
I handle the media (digital and otherwise) and distance learning equipment here, and manage the small recording studio. My background in literary criticism does occasionally come in handy, I say while keeping out of the Chomsky debate for the purposes of this thread. ;)

I will say, I can imagine this voice cloning technology coming in handy for when our weekly podcast guests ignore our e-mails and don't schedule a time to come into the studio!


Samuel B. Lupowitz
Composer. Arranger. Musician. Food Enthusiast. Bad Pun Aficionado.
Re: Cornell University - Voice Cloning from 5 Seconds of source
samuelblupowitz #3028907 02/13/20 10:07 PM
Joined: Feb 2015
Posts: 4,216
Likes: 4
MP Hall of Fame Member
Offline
Very cool. Yah, I was a Lit major undergrad. Clearly we were only interested in careers that were guaranteed to make us filthy rich.


Re: Cornell University - Voice Cloning from 5 Seconds of source
ElmerJFudd #3028932 02/14/20 12:38 AM
Joined: Sep 2012
Posts: 1,689
Likes: 3
Platinum Member
Offline
So, I can't believe anything I read, only half of what I see, and now some clown can send me an offer of marriage in the voice of Jack Nicholson. Wow, that tech sure curdled fast! :D


First SETI contact hailed as aliens beam out Voyager response!:
"SEND MORE CHUCK BERRY!"

Moderated by  Dave Bryce, Stephen Fortner 
