AI Speech vs Human Speech

Question


Is it possible to use MATLAB to detect whether a human or an AI voice is talking? If so, can someone point me to links that would help?

3 Comments

Walter Roberson on 17 Apr 2019

Not if it is a sufficiently good AI program.

But until then:

Synthesized speech is usually cleaner (less noisy) than human speech.
Synthesized speech usually says the same word the same way each time. Human speech seldom does.
Human speech does much more blending: the initial sounds of a word are modified depending on the sounds at the end of the previous word. Some of this is simply that moving smoothly between sounds is easier than moving abruptly, but humans also tend to modify the sounds themselves, in ways you can notice if you really listen but might have trouble describing.
If you can get the voice to say "Merry Mary, marry", and you can clearly tell which word is which, then it is probably AI. If two of the words come out exactly the same, then it is probably AI. If some of the words come out almost, but not quite, the same and you have trouble saying what the difference is, then the voice might be human. (There are large regional differences in how these words are said, but it takes speech synthesis to make them exactly the same.)
Try it on words that are spelled the same but pronounced differently (heteronyms). For example, I recently told Alexa to play one of Elton John's albums, and it said it was going to play "Live in Australia" with a short i (the verb form, as in "I live in Canada") instead of the long i used in "Filmed in front of a live audience".
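As a rough illustration of the first point (synthesized speech tends to be cleaner), the sketch below estimates a signal-to-noise ratio by splitting the audio into frames, treating the quietest frames as the noise floor and the loudest frames as speech. It is written in plain Python rather than MATLAB, and the function names, the 400-sample frame length (25 ms at 16 kHz) and the 10th/90th percentile choices are illustrative assumptions, not an established detector.

```python
import math


def frame_rms(signal, frame_len):
    """RMS level of each non-overlapping frame; a trailing partial frame is dropped."""
    levels = []
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        frame = signal[start:start + frame_len]
        levels.append(math.sqrt(sum(x * x for x in frame) / frame_len))
    return levels


def percentile(values, p):
    """Nearest-rank percentile (p from 0 to 100) of a non-empty list."""
    ordered = sorted(values)
    k = max(0, math.ceil(p / 100 * len(ordered)) - 1)
    return ordered[k]


def estimate_snr_db(signal, frame_len=400, low_p=10, high_p=90, eps=1e-12):
    """Rough SNR in dB: quiet frames stand in for noise, loud frames for speech.

    Assumes mono float samples and that the clip contains both pauses and speech."""
    levels = frame_rms(signal, frame_len)
    if not levels:
        raise ValueError("signal is shorter than one frame")
    noise_level = percentile(levels, low_p)
    speech_level = percentile(levels, high_p)
    return 20 * math.log10((speech_level + eps) / (noise_level + eps))
```

A high value only hints at cleaner audio; recording conditions influence it as much as the speaker does, so it would be one feature among several rather than a verdict on its own.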

Brantley on 17 Apr 2019

How would you use MATLAB to determine whether an AI or a human is talking?

Walter Roberson on 17 Apr 2019

The first two items I posted are obviously actionable:

Measure noise in the signal. More noise would tend to imply human.
Find copies of the same word and compare them to see how similar they are. You might use mfcc features to recognize words; once a word is recognized, isolate each occurrence from the stream and compare them with xcorr. High cross-correlation makes it more likely that the speech is AI-generated. You might also look at dynamic time warping: the less warping that is needed, the more likely it is that the speech is AI-generated, since AI is less likely to have micro-changes in timing.
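The two comparisons described above can be sketched as follows, again in plain Python rather than MATLAB (where the comment points to mfcc and xcorr, and the Signal Processing Toolbox also provides a dtw function). The sketch assumes the repeated words have already been isolated and reduced to 1-D sequences, for example one MFCC coefficient per frame. The function names are hypothetical, and counting the non-diagonal steps of the optimal DTW path is just one assumed way to quantify "how much warping is needed".

```python
import math


def _centered(x):
    """Subtract the mean so that a constant offset does not inflate the correlation."""
    mean = sum(x) / len(x)
    return [v - mean for v in x]


def normalized_xcorr_peak(a, b):
    """Largest normalized cross-correlation of two 1-D sequences over all lags.

    Both inputs are mean-centred, so the result lies in [-1, 1] and is 1.0
    when the two sequences have exactly the same shape."""
    a, b = _centered(a), _centered(b)
    norm = math.sqrt(sum(v * v for v in a) * sum(v * v for v in b))
    if norm == 0.0:
        return 0.0  # a constant segment has no shape to compare
    best = -1.0
    for lag in range(-(len(b) - 1), len(a)):
        s = sum(a[j + lag] * b[j] for j in range(len(b)) if 0 <= j + lag < len(a))
        best = max(best, s / norm)
    return best


def dtw(x, y):
    """Dynamic time warping between two non-empty 1-D sequences.

    Returns (total alignment cost, number of non-diagonal steps on the optimal
    path). A non-diagonal step means one sequence had to be stretched in time."""
    if not x or not y:
        raise ValueError("sequences must be non-empty")
    n, m = len(x), len(y)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(x[i - 1] - y[j - 1])
            cost[i][j] = local + min(cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1])
    # Walk back from the end of both sequences, preferring diagonal moves on ties.
    i, j, stretches = n, m, 0
    while (i, j) != (1, 1):
        diag, up, left = cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1]
        if diag <= up and diag <= left:
            i, j = i - 1, j - 1
        elif up <= left:
            i -= 1
            stretches += 1
        else:
            j -= 1
            stretches += 1
    return cost[n][m], stretches
```

Under the heuristic in the comment, repeated words with a correlation peak close to 1 and few stretch steps would point towards synthesized speech. The thread gives no threshold, so any cut-off would have to be tuned on labelled examples.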


Answer 1

Gagan Agarwal on 30 May 2024


Hi Brantley

Yes, it's possible to use MATLAB to detect whether speech was produced by a human or generated by AI. This task falls under the broader areas of audio analysis and machine learning.

Here's a high-level overview of how you might approach this problem:

Collect a dataset that includes both human and AI-generated voices. The dataset should be large and diverse enough to train a robust model.
Audio data generally requires preprocessing before it can be used to train a model, for example converting the audio files to a uniform format and normalizing the sampling rate.
Choose a deep learning model and train it on the prepared data.
After training, evaluate the model's performance on data that was held out from training.
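The answer leaves the choice of model open, but holding out data and scoring the predictions can be sketched independently of it. This plain-Python sketch assumes string labels "ai" and "human" and uses hypothetical helper names; the predictions would come from whatever classifier you train.

```python
import random


def holdout_split(items, test_fraction=0.2, seed=0):
    """Shuffle a copy of `items` reproducibly and split it into (train, test)."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def evaluate(y_true, y_pred, positive="ai"):
    """Confusion counts plus accuracy, precision and recall for a binary classifier.

    `positive` is the label treated as the class to detect (here: AI speech)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty label lists of equal length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = len(pairs) - tp - fp - fn
    return {
        "tp": tp, "fp": fp, "fn": fn, "tn": tn,
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }
```

Splitting by speaker or by synthesis system, rather than by individual clip, is usually the safer choice, so that a high score reflects generalization rather than the model recognizing voices it has already heard.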

I hope it helps!
