Extracting Audio File Frequency

Hello there,
I need to find the frequency of an audio file in specific segments. In my code I find the segments of talking, take the FFT of these portions, and find the frequencies. The problem is in the frequency part: I expect different frequencies for the segments, but I get exactly the same values. Could you please help?
Thanks in advance.

Accepted Answer

William Rose on 15 Apr 2022
I have listened to file A1.wav. The instances of singing are not at 15-second intervals, even though the code assumes they are. Therefore the segments analyzed do not always contain singing. The amplitude of the singing is small, and there are significant unrelated background noises. The pitch being sung sounds like the E flat above middle C (E-flat4). Therefore the detected dominant frequency should be around 311.1 Hz.
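The 311.1 Hz value follows from equal-temperament tuning with A4 = 440 Hz, where each semitone step multiplies the frequency by 2^(1/12). A minimal MATLAB sketch, assuming MIDI note numbering (A4 = 69, middle C = 60); `noteFreq` is a hypothetical helper, not part of rmscalculation.m:

```matlab
% Equal-temperament pitch frequency, assuming A4 = 440 Hz (A440 tuning).
% n is the MIDI note number: A4 = 69, middle C (C4) = 60, E-flat4 = 63.
noteFreq = @(n) 440 * 2.^((n - 69) / 12);

fC4  = noteFreq(60);   % middle C, about 261.63 Hz
fEb4 = noteFreq(63);   % E flat above middle C, about 311.13 Hz
fEb5 = noteFreq(75);   % one octave higher, about 622.25 Hz
fprintf('C4 = %.2f Hz, Eb4 = %.2f Hz, Eb5 = %.2f Hz\n', fC4, fEb4, fEb5);
```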
Approximate times of vocalization, in seconds: 1-5, 22-27, 42-46, 61-66, 82-87, 102-107.
There is background talking during 61-66. There is coughing or some other background sound in 82-87.
Conclusion: The frequency analysis of file A1.wav by rmscalculation.m is affected by background noise and incorrect timing. The signal-to-noise ratio is poor.
Recommendations:
  • Improve the recording setup to increase signal amplitude and reduce background noise.
  • Edit the audio file to extract the exact segments that contain the singing which you want to analyze.
I have looked at your code: rmscalculation.m.
Analysis of the script:
rmscalculation.m has three nested loops.
The outer loop is: for k=1:number of participants.
The middle loop is: for l=1:number of tests. This loop reads in a different audio file on each pass. It computes the envelope of the signal as the moving average (width 1000 points, or about 1/44 of a second) of the absolute value of the signal. The time when the moving average crosses a threshold is deemed to be the time when talking starts.
The inner loop is: for i=1:6. Each pass extracts a segment of the signal. The segment start times are 15 seconds apart. The segments are 4.9 seconds long. The power spectrum of the segment is determined. The frequency that has max. power, within the frequency range 236 to 367 Hz, is determined for each segment.
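A minimal sketch of the onset detection and segmentation described above, run on a synthetic signal rather than the actual recordings. The threshold `thr`, the test tone, and the variable names are assumptions, not values taken from rmscalculation.m; the moving average is computed here with `cumsum`, which is one of several equivalent ways to do it:

```matlab
% Sketch: envelope-threshold onset detection plus a 15 s segment grid.
% Synthetic signal; the threshold thr is a hypothetical value.
Fs = 44100;                               % sampling rate (Hz)
T  = 35;                                  % signal duration (s)
t  = (0:T*Fs-1)'/Fs;                      % time values (column vector)
x  = 0.01*sin(2*pi*60*t);                 % weak background hum
on = t >= 3 & t < 7;                      % "talking" between 3 s and 7 s
x(on) = x(on) + 0.3*sin(2*pi*311*t(on));  % louder tone during talking

% Envelope: moving average of |x| over w samples (w/Fs is about 23 ms).
w   = 1000;
c   = cumsum([0; abs(x)]);
env = (c(w+1:end) - c(1:end-w)) / w;      % env(k) = mean(abs(x(k:k+w-1)))
thr = 0.1;                                % hypothetical threshold
iStart = find(env > thr, 1);              % first window above threshold
tStart = (iStart - 1)/Fs;                 % onset time (s); window starts at k,
                                          % so this can be up to w samples early

% Segments start every 15 s and last 4.9 s; keep those that fit in x.
segLen = round(4.9*Fs);
i0     = round((tStart + 15*(0:5))*Fs) + 1;   % first sample of each segment
i0     = i0(i0 + segLen - 1 <= numel(x));
segs   = zeros(segLen, numel(i0));
for i = 1:numel(i0)
    segs(:,i) = x(i0(i) : i0(i) + segLen - 1);
end
fprintf('Onset at %.3f s, %d segments of %.1f s\n', tStart, numel(i0), segLen/Fs);
```

Because every segment start is tied to a single detected onset plus fixed 15 s steps, any drift in the real singing times moves the later segments away from the vocalizations.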
Does that sound correct?
The script rmscalculation.m does not run. It gives the error
Error using xlsread (line 136)
Unable to open file 'F4_A1'.
File 'F4_A1' not found.
Error in rmscalculation (line 11)
a = xlsread(fname1); % comand to read excel/ particle count file
I commented out the lines related to file F4_A1. Then the script ran without error, but it does not display any results.
To see the results:
>> disp(seg_Freq')
261.9312
239.8933
261.9312
261.9312
255.0339
262.0995
The frequency range of 90% to 140% of the middle C frequency (about 235.5 to 366.3 Hz) will allow detection of frequencies corresponding to pitches from just below B3 to just above F4.
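The band edges and a band-limited peak search can be sketched as follows. This is an illustration on a synthetic 4.9 s segment (a 311 Hz tone plus a stronger 622 Hz component), not the original script; it shows that restricting the search to the band returns the strongest in-band bin even when most of the power lies outside the band:

```matlab
% Sketch: band-limited peak search between 0.9 and 1.4 times middle C.
Fs  = 44100;                          % sampling rate (Hz)
fC4 = 440*2^(-9/12);                  % middle C, about 261.63 Hz
fLo = 0.9*fC4;                        % about 235.5 Hz (just below B3, 246.9 Hz)
fHi = 1.4*fC4;                        % about 366.3 Hz (just above F4, 349.2 Hz)

% Synthetic segment: 311 Hz tone plus a stronger 622 Hz component, 4.9 s long.
N   = round(4.9*Fs);
t   = (0:N-1)'/Fs;
seg = 0.5*sin(2*pi*311*t) + sin(2*pi*622*t);

P = abs(fft(seg)).^2;                 % power spectrum
P = P(1:floor(N/2)+1);                % keep 0 .. Fs/2
f = (0:floor(N/2))'*Fs/N;             % frequency of each bin (Hz)

inBand = f >= fLo & f <= fHi;         % restrict the search to the band
fBand  = f(inBand);
[~, k] = max(P(inBand));
fPeakBand = fBand(k);                 % strongest frequency inside the band
[~, kAll] = max(P);
fPeakAll  = f(kAll);                  % strongest frequency overall
fprintf('Band %.1f-%.1f Hz: peak %.2f Hz; overall peak %.2f Hz\n', ...
        fLo, fHi, fPeakBand, fPeakAll);
```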

More Answers (3)

William Rose on 12 Apr 2022
[moved my answer from a comment to an answer]
The Google Drive link you provided requires access permission. You may attach the audio file if you zip it first.
You probably know this already, but I will mention it just in case:
When you compute the FFT or power spectrum of a segment of the signal, the frequencies of the FFT or power spectrum will be the same for each different segment (assuming the segment lengths are the same). The amplitude or power at each frequency will vary from segment to segment. You can compute the mean frequency for a segment, or you can compute the frequency with maximum power in each segment, etc. The script below does both, for an 8-second signal with gradually increasing frequency, divided into 0.5 second long segments. It plots the results. It appears that the max power frequency is better behaved than the mean frequency, in this example.
%% constants
Fs=8000; %sampling rate (Hz)
T=8; %signal duration (s)
wi=220*2*pi; %initial frequency (radians/s)
wf=880*2*pi; %final frequency (radians/s)
Tseg=0.5; %segment duration (s)
%% compute the signal
dt=1/Fs; %sampling interval
N=Fs*T; %signal duration (samples)
t=dt*(0:N-1); %vector of time values
phase=wi*t+(wf-wi)*t.*t/(2*T); %phase for signal with changing frequency
x=cos(phase); %signal amplitude
%% compute FFT of each segment
N1=Fs*Tseg; %segment duration (samples)
Nseg=T/Tseg; %number of segments
fmax=zeros(1,Nseg); %allocate array for max.power frequency of each segment
fmean=zeros(1,Nseg); %allocate array for mean frequency of each segment
df=1/Tseg; %frequency interval
f=(0:N1/2)*df; %vector of frequencies, up to Nyquist frequency
Nf=length(f); %number of frequencies in one-sided FFT
Y=zeros(Nf,Nseg); %allocate array for FFTs
for i=1:Nseg
X=fft(x((i-1)*N1+1:i*N1));
Y(:,i)=abs(X(1:Nf)); %magnitude of one-sided FFT
[~,indmax]=max(Y(:,i)); %index of largest element of Y
fmax(i)=f(indmax); %frequency with maximum power
fmean(i)=sum(f'.*Y(:,i))/sum(Y(:,i)); %mean frequency (amplitude-weighted)
end
%% plot results
figure;
subplot(211), plot(1:Nseg,fmax,'rx',1:Nseg,fmean,'bo');
xlabel('Segment'); ylabel('Frequency (Hz)');
legend('Max.Freq.','Mean Freq.'); grid on
title('Max & Mean Frequency vs. Segment')
subplot(212)
colorspec=[1,0,0;1,.33,0;1,.67,0;
1,1,0;.67,1,0;.33,1,0;
0,1,0;0,1,.33;0,1,.67;
0,1,1;0,.67,1;0,.33,1;
0,0,1;.5,0,1;
1,0,1;1,0,.5];
for i=1:Nseg
plot(f,Y(:,i),'Color',colorspec(i,:));
hold on;
end
xlabel('Frequency (Hz)'); ylabel('Amplitude'); xlim([0,1200])
grid on; title('Amplitude Spectra for Segments')
Try it. Good luck.
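To illustrate the point about identical frequency bins: the frequency vector depends only on the sampling rate and the segment length, not on the samples themselves. A small sketch with two synthetic 0.5 s segments (the tone frequencies 300 Hz and 700 Hz are arbitrary choices):

```matlab
% Sketch: the FFT frequency axis depends only on Fs and the segment length,
% so equal-length segments always share identical frequency bins.
Fs = 8000;                            % sampling rate (Hz), as in the script above
N1 = 4000;                            % segment length (samples) = 0.5 s
f  = (0:N1/2)*Fs/N1;                  % one-sided frequency vector, 2 Hz spacing

t    = (0:N1-1)/Fs;
segA = cos(2*pi*300*t);               % two different segments ...
segB = cos(2*pi*700*t);
YA = abs(fft(segA)); YA = YA(1:N1/2+1);
YB = abs(fft(segB)); YB = YB(1:N1/2+1);

% ... use the same f, but their magnitude peaks sit at different bins.
[~, iA] = max(YA);
[~, iB] = max(YB);
fprintf('Bin spacing %.1f Hz; peaks at %.0f Hz and %.0f Hz\n', f(2)-f(1), f(iA), f(iB));
```

Both segments use the same `f`; what distinguishes them is where the magnitude peaks, so the per-segment result has to come from the spectrum (for example via `max`), not from `f` itself.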
mehtap agirsoy on 12 Apr 2022
Hi, many thanks for the help.
The zip file exceeds the size limit, so I added a Drive link but forgot to change the permissions; it is fixed now. When you have time, I'd be glad if you could check it.
My frequency results should fluctuate around 262 Hz. When I tried the max and mean methods, the results were 617 Hz and 22049.9 Hz, respectively. My segment frequencies are:
261.931228637695
239.893341064453
261.931228637695
261.931228637695
255.033874511719
262.099456787109
I'm not sure whether these are correct; they look a bit suspicious.



William Rose on 13 Apr 2022
Middle C (about 261.6 Hz)! The frequency sweep in my code goes from A3 (220 Hz) to A5 (880 Hz).
mehtap agirsoy on 13 Apr 2022
Sorry for the inconvenience. Even when compressed, the file exceeds the size limit. Anyone with the link is an editor now.



William Rose on 16 Apr 2022
I wrote a script that extracts 3 seconds of sound from each vocalization. As I said before, the times of note-singing are approximately: 1-5, 22-27, 42-46, 61-66, 82-87, 102-107 seconds.
Therefore I extract sound from 2-5, 23-26, 43-46, 62-65, 83-86, 103-106 seconds.
I measure the mean frequency and the frequency of maximum power in each segment.
The max.power frequencies are about 620-630 Hz, consistent with the subjects singing E flat 5, also known as the E flat above treble C. The expected frequency of this pitch is 622 Hz, with A440 equal temperament tuning.
The script plots the max frequency for each segment and the power spectrum for each segment.
You confined the frequency search to 0.9 - 1.4 times middle C. This singing signal has very little power in that frequency range; most of the power is around 630 Hz. I initially thought these children were singing in octave 4 (using scientific pitch notation). Now I think they are singing an octave higher, in octave 5. It is not always easy to decide.
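One rough way to judge the octave is to compare the spectral power near E-flat4 and E-flat5. A sketch on a synthetic 3 s signal; the ±3% band width and the component amplitudes are assumptions, not values from the scripts in this thread:

```matlab
% Sketch: compare spectral power near E-flat4 and E-flat5 to judge the octave.
% The +/-3% band width is an assumed value.
Fs  = 44100;
N   = 3*Fs;                                   % 3 s segment
t   = (0:N-1)'/Fs;
seg = 0.2*sin(2*pi*311*t) + sin(2*pi*622*t) + 0.3*sin(2*pi*1244*t);  % synthetic voice

P = abs(fft(seg)).^2;                         % power spectrum
P = P(1:N/2+1);                               % keep 0 .. Fs/2
f = (0:N/2)'*Fs/N;                            % bin frequencies (Hz)

fEb4 = 440*2^(-6/12);                         % about 311.13 Hz
bandPower = @(fc) sum(P(abs(f - fc) <= 0.03*fc));   % power within +/-3% of fc
ratio = bandPower(2*fEb4) / bandPower(fEb4);  % E-flat5 power relative to E-flat4
fprintf('Power near Eb5 is %.1f times the power near Eb4\n', ratio);
```

A large ratio is only a hint: a voice singing E-flat4 can still have a strong second harmonic at E-flat5, which is part of why the octave is not always easy to decide.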
My code also creates a file, A1sel.wav, which contains the selected audio segments, each followed by 1 second of silence. The graphical output from the script is below.
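The extraction-and-concatenation step can be sketched like this. A synthetic 10 s tone stands in for A1.wav, and the two windows and the output name `A1sel_sketch.wav` are hypothetical; for the real file the windows would be the extraction times listed above:

```matlab
% Sketch: cut fixed time windows out of a recording, append 1 s of silence
% after each, and write the result to a WAV file.
% Synthetic stand-in for A1.wav; windows and file name are hypothetical.
Fs  = 44100;
x   = 0.1*sin(2*pi*622*(0:10*Fs-1)'/Fs);     % 10 s stand-in recording (column)
win = [2 5; 6 9];                            % [start end] of each window (s)

gap = zeros(Fs, 1);                          % 1 s of silence
y = [];
for i = 1:size(win, 1)
    i0 = round(win(i,1)*Fs) + 1;             % first sample of the window
    i1 = round(win(i,2)*Fs);                 % last sample of the window
    y  = [y; x(i0:i1); gap];                 %#ok<AGROW>
end
outFile = fullfile(tempdir, 'A1sel_sketch.wav');
audiowrite(outFile, y, Fs);
fprintf('Wrote %.1f s of audio to %s\n', numel(y)/Fs, outFile);
```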
William Rose on 16 Apr 2022
@mehtap agirsoy, You are welcome. Good luck with your work!

