- Improve the recording setup to increase signal amplitude and reduce background noise.
- Edit the audio file to extract the exact segments that contain the singing which you want to analyze.
Extracting Audio File Frequency
I need to find the frequency of an audio file for specific segments. In my code I find the segments that contain talking, take the FFT of those portions, and find the frequencies. The problem is in the frequency step: I expect a different frequency for each segment, but I get exactly the same values. Could you please help?
Thanks in advance.
William Rose on 15 Apr 2022
I have listened to file A1.wav. The instances of singing are not at 15-second intervals, even though the code expects this. Therefore the segments analyzed do not always contain singing. The amplitude of the singing is small. There are significant unrelated background noises. The pitch being sung sounds like the E flat above middle C (Eflat4). Therefore the detected dominant frequency should be around 311.1 Hz.
Approximate times of vocalization, in seconds: 1-5, 22-27, 42-46, 61-66, 82-87, 102-107.
There is background talking during 61-66. There is coughing or some other background sound in 82-87.
Conclusion: The frequency analysis of file A1.wav by rmscalculation.m is affected by background noises and incorrect timing. The signal-to-noise ratio is poor.
I have looked at your code: rmscalculation.m.
Analysis of the script:
rmscalculation.m has three nested loops.
The outer loop is: for k=1:number of participants.
The middle loop is: for l=1:number of tests. This loop reads in a different audio file on each pass. It computes the envelope of the signal as the moving average (with width 1000 points = 1/44 of a second) of the absolute value of the signal. The time at which the moving average crosses a threshold is deemed to be the time when talking starts.
The inner loop is: for i=1:6. Each pass extracts a segment of the signal. The segment start times are 15 seconds apart. The segments are 4.9 seconds long. The power spectrum of the segment is determined. The frequency that has max. power, within the frequency range 236 to 367 Hz, is determined for each segment.
Does that sound correct?
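The steps described above can be sketched on synthetic data as follows. This is only an illustration of my reading of the script, not the original rmscalculation.m: the sampling rate, the threshold value, the tone frequencies in fTrue, and all variable names are assumptions. Note that at Fs = 8000 Hz a 1000-point window spans 1/8 s rather than 1/44 s.

```matlab
% Synthetic stand-in for one recording: six 5-s tones, 15 s apart, first at 1 s
rng(0);                                  % reproducible noise
Fs = 8000;                               % assumed sampling rate for this sketch (Hz)
t  = (0:90*Fs-1)/Fs;                     % 90 s of data
x  = 0.01*randn(size(t));                % low-level background noise
fTrue = [300 310 320 330 340 350];       % hypothetical pitch of each tone (Hz)
for i = 1:6
    on = t >= 1+15*(i-1) & t < 6+15*(i-1);
    x(on) = x(on) + 0.5*sin(2*pi*fTrue(i)*t(on));
end

% Envelope = moving average of |x|; onset = first threshold crossing
env    = movmean(abs(x), 1000);          % 1000-sample window
thresh = 0.1;                            % hypothetical threshold; tune per recording
i0     = find(env > thresh, 1);          % sample index where the sound starts

% Peak frequency within 236-367 Hz for six 4.9-s segments, 15 s apart
Nseg   = round(4.9*Fs);                  % samples per segment
f      = (0:Nseg-1)*Fs/Nseg;             % FFT bin frequencies (Hz)
inBand = f >= 236 & f <= 367;            % search band
fPeak  = zeros(1,6);
for i = 1:6
    k0 = i0 + round(15*(i-1)*Fs);        % first sample of segment i
    P  = abs(fft(x(k0 : k0+Nseg-1))).^2; % power spectrum of segment i
    P(~inBand) = 0;                      % restrict the search to the band
    [~, k] = max(P);
    fPeak(i) = f(k);                     % frequency of maximum power in the band
end
disp(fPeak)
```

Because the segment start times are fixed at 15-second steps from the first onset, this approach only works when the vocalizations really are 15 seconds apart.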
The script rmscalculation.m does not run. It gives the error
Error using xlsread (line 136)
Unable to open file 'F4_A1'.
File 'F4_A1' not found.
Error in rmscalculation (line 11)
a = xlsread(fname1); % comand to read excel/ particle count file
I commented out the lines related to file F4_A1. Then the script ran without error. It does not display any results.
To see the results:
The frequency range of 90% to 140% of the middle C frequency will allow detection of frequencies corresponding to pitches from just below B3 to just above F4.
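As a rough check of that range, the sketch below lists the equal-temperament pitches that fall between 0.9 and 1.4 times middle C. It assumes A4 = 440 Hz tuning and uses MIDI note numbers (60 = middle C) purely as a convenient index; the variable names are mine.

```matlab
fC4  = 440 * 2^((60 - 69)/12);          % middle C, assuming A4 = 440 Hz
band = [0.9 1.4] * fC4;                 % search band limits (Hz)

names = {'C','C#','D','Eb','E','F','F#','G','Ab','A','Bb','B'};
midi  = 57:67;                          % candidate notes, A3 to G4
fNote = 440 * 2.^((midi - 69)/12);      % equal-temperament frequencies (Hz)
inBand = midi(fNote >= band(1) & fNote <= band(2));

fprintf('Band: %.1f to %.1f Hz\n', band(1), band(2));
for m = inBand
    fprintf('%s%d  %.1f Hz\n', names{mod(m,12)+1}, floor(m/12)-1, ...
        440 * 2^((m - 69)/12));
end
```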
More Answers (3)
William Rose on 12 Apr 2022
[moved my answer from a comment to an answer]
The Google Drive link you provided requires access permission. You may attach the audio file if you zip it first.
You probably know this already, but I will mention this just in case you do not know this:
When you compute the FFT or power spectrum of a segment of the signal, the frequencies of the FFT or power spectrum will be the same for each different segment (assuming the segment lengths are the same). The amplitude or power at each frequency will vary from segment to segment. You can compute the mean frequency for a segment, or you can compute the frequency with maximum power in each segment, etc. The script below does both, for an 8-second signal with gradually increasing frequency, divided into 0.5 second long segments. It plots the results. It appears that the max power frequency is better behaved than the mean frequency, in this example.
Fs=8000; %sampling rate (Hz)
T=8; %signal duration (s)
wi=220*2*pi; %initial frequency (radians/s)
wf=880*2*pi; %final frequency (radians/s)
Tseg=0.5; %segment duration (s)
%% compute the signal
dt=1/Fs; %sampling interval
N=Fs*T; %signal duration (samples)
t=dt*(0:N-1); %vector of time values
phase=wi*t+(wf-wi)*t.*t/(2*T); %phase for signal with changing frequency
x=cos(phase); %signal amplitude
%% compute FFT of each segment
N1=Fs*Tseg; %segment duration (samples)
Nseg=T/Tseg; %number of segments
fmax=zeros(1,Nseg); %allocate array for max.power frequency of each segment
fmean=zeros(1,Nseg); %allocate array for mean frequency of each segment
df=1/Tseg; %frequency interval
f=(0:N1/2)*df; %vector of frequencies, up to Nyquist frequency
Nf=length(f); %number of frequencies in one-sided FFT
Y=zeros(Nf,Nseg); %allocate array for FFTs
for i=1:Nseg
    X=fft(x((i-1)*N1+1:i*N1)); %FFT of segment i
    Y(:,i)=abs(X(1:Nf)); %magnitude of one-sided FFT
    [~,indmax]=max(Y(:,i)); %index of largest element of Y
    fmax(i)=f(indmax); %frequency with maximum power
    fmean(i)=sum(f'.*Y(:,i))/sum(Y(:,i)); %mean frequency (amplitude-weighted)
end
%% plot results
figure; plot(1:Nseg,fmax,'-o',1:Nseg,fmean,'-x');
xlabel('Segment'); ylabel('Frequency (Hz)');
legend('Max.Freq.','Mean Freq.'); grid on
title('Max & Mean Frequency vs. Segment')
figure; plot(f,Y);
xlabel('Frequency (Hz)'); ylabel('Amplitude'); xlim([0,1200])
grid on; title('Amplitude Spectra for Segments')
Try it. Good luck.
William Rose on 16 Apr 2022
I wrote a script that extracts 3 seconds of sound from each vocalization. As I said before, the times of note-singing are approximately: 1-5, 22-27, 42-46, 61-66, 82-87, 102-107 seconds.
Therefore I extract sound from 2-5, 23-26, 43-46, 62-65, 83-86, 103-106 seconds.
I measure the mean frequency and the frequency of maximum power in each segment.
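The extraction step might look roughly like the sketch below. It is not the script itself: the sampling rate is assumed, random noise stands in for the audio (with the real file you would start from something like [x,Fs]=audioread('A1.wav')), and the variable names are mine. The last line shows one way to join the segments with 1 second of silence after each.

```matlab
Fs = 8000;                               % assumed sampling rate for this sketch (Hz)
x  = randn(110*Fs, 1);                   % placeholder for 110 s of mono audio
tStart = [2 23 43 62 83 103];            % segment start times (s)
Tseg   = 3;                              % segment duration (s)
Nseg   = Tseg*Fs;                        % samples per segment

seg = zeros(Nseg, numel(tStart));        % one column per segment
for i = 1:numel(tStart)
    i0 = round(tStart(i)*Fs) + 1;        % first sample of segment i
    seg(:,i) = x(i0 : i0+Nseg-1);        % copy 3 s of signal
end

% Join the segments, with 1 s of silence after each one
y = reshape([seg; zeros(Fs, numel(tStart))], [], 1);
```

Each column of seg can then be passed to the FFT to get the mean and maximum-power frequencies for that segment.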
The max.power frequencies are about 620-630 Hz, consistent with the subjects singing E flat 5, also known as the E flat above treble C. The expected frequency of this pitch is 622 Hz, with A440 equal temperament tuning.
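To relate a measured peak frequency to a pitch name, one option is to round to the nearest equal-temperament note. The sketch below assumes A4 = 440 Hz; the three test frequencies are example values inside the 620-630 Hz range, not measurements from the file.

```matlab
names = {'C','C#','D','Eb','E','F','F#','G','Ab','A','Bb','B'};
fMeas = [620 625 630];                          % example peak frequencies (Hz)
midi  = round(69 + 12*log2(fMeas/440));         % nearest MIDI note (69 = A4)
fNote = 440 * 2.^((midi - 69)/12);              % frequency of that note (Hz)
for k = 1:numel(fMeas)
    fprintf('%.0f Hz -> %s%d (%.1f Hz)\n', fMeas(k), ...
        names{mod(midi(k),12)+1}, floor(midi(k)/12)-1, fNote(k));
end
```

All three example values round to Eb5.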
The script plots the max frequency for each segment and the power spectrum for each segment.
You confined the frequency search to 0.9 - 1.4 times middle C. This singing signal has very little power in that frequency range. Most of the power is around 630 Hz. I initially thought these children were singing in octave 4 (using scientific pitch notation). Now I think they are singing an octave higher, in octave 5. It is not always easy to decide.
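One way to check which octave holds the power is to compare the fraction of spectral power in the original search band with the fraction in the same band shifted up one octave. The sketch below uses a synthetic signal (a strong 622 Hz tone plus a weak 311 Hz component); the signal, the band one octave up, and the variable names are assumptions for illustration only.

```matlab
Fs = 8000;                                   % assumed sampling rate (Hz)
t  = (0:3*Fs-1)/Fs;                          % one 3-s segment
x  = sin(2*pi*622*t) + 0.05*sin(2*pi*311*t); % synthetic segment

N = numel(x);
P = abs(fft(x)).^2;                          % power spectrum (unnormalized)
f = (0:N-1)*Fs/N;                            % FFT bin frequencies (Hz)
half = f <= Fs/2;                            % keep bins up to Nyquist

fC4 = 440 * 2^((60 - 69)/12);                % middle C, assuming A4 = 440 Hz
bandLow  = f >= 0.9*fC4   & f <= 1.4*fC4;    % original search band
bandHigh = f >= 0.9*2*fC4 & f <= 1.4*2*fC4;  % same band, one octave up

fracLow  = sum(P(bandLow))  / sum(P(half));  % fraction of power in each band
fracHigh = sum(P(bandHigh)) / sum(P(half));
fprintf('Power fraction: %.4f (octave 4 band), %.4f (octave 5 band)\n', ...
    fracLow, fracHigh);
```

If the second fraction is much larger than the first, the search band should be moved up an octave before looking for the peak.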
My code also creates a file, A1sel.wav, which is the selected audio segments, plus 1 second of silence after each segment. The graphical output from the script is below.