Preserving relative proportions in graph after introducing bias in log scale plot

7 vues (au cours des 30 derniers jours)
I am trying to plot multiple spectra togheter with a prediction interval based on historic data with the aim of detecting outliers using the code below:
load('allSpectraData')
for m = 1:1
desiredMeasurementPoint = 'Data';
% Retrieve the spectra data for the current measurement point
if isfield(allSpectraData, desiredMeasurementPoint)
spectraDataMP = allSpectraData.(desiredMeasurementPoint);
else
disp(['No data found for MP: ', desiredMeasurementPoint]);
end
% Initialize a new figure for each MP
figure;
disp(['Generating plot for MP: ', desiredMeasurementPoint]); % Debug line to check the MP
hold on;
% Get the list of years in the spectraData structure for this MP
yearsList = fieldnames(spectraDataMP);
disp(['Found years for MP: ', desiredMeasurementPoint, ': ', strjoin(yearsList, ', ')]);
colors = lines(length(yearsList)); % Generate a distinct color for each year
plotDate = datestr( datetime('today'), 'yy/mm'); % Format date for the legend (optional)
% Iterate over each year in the spectraData structure for the current MP
% Initialize matrices to store spectrum data for prediction band calculation
allFrequencies = [];
allPowerSpectra = [];
for j=1:length(yearsList)
yearString = yearsList{j}; % Current year as a string (e.g., 'x2007')
% Get the spectrum data for this year and MP
ut = spectraDataMP.(yearString).(desiredMeasurementPoint);
% Extract frequencies and power spectra for this year
frequencies = ut(:, 2); % Frequency values
powerSpectra = ut(:, 1); % Power (dB) values
% Store data for prediction band calculation
allFrequencies = union(allFrequencies, frequencies);
allPowerSpectra = [allPowerSpectra, powerSpectra];
end
% Initialize vector to store interpolated values for frequencies that
% are not included in the orginal data set for historic data
IPMPS=[];
for i = 1:length(yearsList)-1
yearString = yearsList{i}; % Current year as a string (e.g., 'x2007')
% Extract the numeric part of the year using regexp
year = regexp(yearString, '\d{4}', 'match', 'once'); % Extract the first 4 digits
% Get the spectrum data for this year and MP
ut = spectraDataMP.(yearString).(desiredMeasurementPoint);
% Get interpolated power spectral densities for non existing
% frequencies
utniterP=interp1(ut(:, 2), ut(:, 1), allFrequencies, 'linear', 'extrap');
IPMPS = [IPMPS, utniterP];
if length(yearsList)-1==0
IPMPS=utniterP;
end
end
% Caluclate prediction interval based on the historic data
varMPhist=var(IPMPS,0,2); % Variance for each frequency
meanMPhist=mean(IPMPS,2); % Mean for each frequency
t_critical=tinv(1-(1-0.9995),length(yearsList)-1); % Critical t-test value
a=sqrt(varMPhist+varMPhist./(length(yearsList)-1)); % Standard deviation
PIupper=meanMPhist+t_critical*a; % Upper prediction band
PIlower=meanMPhist-t_critical*a; % Lower prediction band
%Plot the the 5 latest measurments if there are multiple measurements
%for that MP
for i = 1:length(yearsList)
yearString = yearsList{i}; % Current year as a string (e.g., 'x2007')
% Extract the numeric part of the year using regexp
year = regexp(yearString, '\d{4}', 'match', 'once'); % Extract the first 4 digits
% Get the spectrum data for this year and MP
ut = spectraDataMP.(yearString).(desiredMeasurementPoint);
% Retrieve the sampling frequency `Fs`
Fs = spectraDataMP.(yearString).Fs; % Access the Fs field for this year
% Ensure ut is not empty
if isempty(ut)
disp(['No data for year ', year, ' and MP ', desiredMeasurementPoint]);
continue; % Skip if no data to plot for this year
end
if i>length(yearsList)-5
cleanedMeasurementPoint = strrep(desiredMeasurementPoint, '_U_V_', '');
% Plot the spectrum for this year and MP with proper DisplayName format
plot(ut(:, 2), ut(:, 1), 'DisplayName', sprintf('%s - %s - %s - Fs: %s Hz - %s',year, cleanedMeasurementPoint, num2str(Fs), plotDate), 'Color', colors(i, :)); % Frequency (x), Power (y)
elseif 0>=length(yearsList)-5
plot(ut(:, 2), ut(:, 1), 'DisplayName', sprintf('%s - %s - %s - Fs: %s Hz - %s',year, cleanedMeasurementPoint, num2str(Fs), plotDate), 'Color', colors(i, :)); % Frequency (x), Power (y)
end
end
% Plot prediction interval based on historic data and adjust axis
% limits
if length(yearsList)-1>0
% Remove neagtive prediction
if isempty(PIlower(PIlower<0))
plot(allFrequencies,PIupper,'k',allFrequencies,PIlower,'k','HandleVisibility','off')
xlabel('Frequency (Hz)');
ylabel('Power (dB)');
xlim([1e-2, 1e2]); % Limit the x-axis to a range (you can adjust this as needed)
elseif isempty(PIlower(PIlower>0))
PIlower=zeros(size(PIlower));
plot(allFrequencies,PIupper,'k',allFrequencies,PIlower,'k','HandleVisibility','off')
xlabel('Frequency (Hz)');
ylabel('Power (dB)');
xlim([1e-2, 1e2]); % Limit the x-axis to a range (you can adjust this as needed)
else
plot(allFrequencies,PIupper,'k',allFrequencies,PIlower,'k','HandleVisibility','off')
xlabel('Frequency (Hz)');
ylabel('Power (dB)');
xlim([1e-2, 1e2]); % Limit the x-axis to a range (you can adjust this as needed)
ylim([min(min(PIlower(PIlower>realmin)),min(min(IPMPS(IPMPS>0)))) max(max(PIupper),max(max(IPMPS)))])
end
else
% Set logarithmic scale for both axes
set(gca, 'XScale', 'log', 'YScale', 'log');
% Show legend
legend('show','Location', 'southwest'); % Show legend to label each line by year and measurement point
hold off;
continue
end
% Initialize a flag to ensure only one legend entry for the red dot
outlierLegendAdded = false;
regionLegendAdded = false;
% Get the spectra for the last year in the year range for the specified
% MP
ut=spectraDataMP.(yearsList{end}).(desiredMeasurementPoint);
% Interpolate to find values for frequencies not included in the
% original data
Ut=interp1(ut(:, 2), ut(:, 1), allFrequencies, 'linear', 'extrap');
% Find indices for power spectral densities where the latest measurment exceeds the
% prediction intervals
exceedIndices=find(Ut>PIupper);
% Initialize vector for storing frequencies where the prediction
% interval is exceeded
freqexceed=[];
%Extract frequencies
for k=1:length(exceedIndices)
freqexceed(k)=allFrequencies(exceedIndices(k));
end
% Extract which of the exceeding indices belongs to the orginal data
% set that has no interpolated values
utindices=find(ismember(ut(:,2),freqexceed)>0);
% Initialize vector for storing frequencies where the power spectral
% density exceeds the prediction band that belongs to the original data
% set
utfreqexceed=[];
% Extract frequencies
for k=1:length(utindices)
utfreqexceed(k)=ut(utindices(k),2);
end
%Find exceeding indices that belong to the orginal data set
exceedIndices=find(ismember(allFrequencies,utfreqexceed)>0);
if ~isempty(exceedIndices)
intervals = diff(exceedIndices) >100; % Find breaks in consecutive indices
groupEdges = [0; find(intervals); length(exceedIndices)]; % Get edges of groups
for idx = 1:length(groupEdges)-1
% Get the frequency and power at the index where the spectrum exceeds PIupper
% Get the indices for the current group
groupStart = groupEdges(idx) + 1;
groupEnd = groupEdges(idx + 1);
groupIndices = exceedIndices(groupStart:groupEnd);
% Find the middle index of the group
middleIndex = groupIndices(ceil(length(groupIndices) / 2));
% Get the frequency and power at the middle index
freq = allFrequencies(middleIndex);
power = Ut(middleIndex);
% Plot a red dot at the middle point
if ~outlierLegendAdded
% Add only the first red dot to the legend
plot(freq, power, 'ro', 'MarkerSize', 6, 'MarkerFaceColor', 'r', ...
'DisplayName', 'Outlier');
outlierLegendAdded = true;
text(freq, ...
power + 0.0001, ... % Place text just above the region
sprintf('%.2f Hz', ceil(freq)), ...
'HorizontalAlignment', 'center', 'VerticalAlignment', 'bottom', ...
'FontSize', 10, 'FontWeight', 'bold', 'Color', 'r');
else
% Add red dots without updating the legend
plot(freq, power, 'ro', 'MarkerSize', 6, 'MarkerFaceColor', 'r', ...
'HandleVisibility', 'off');
text(freq, ...
power + 0.00001, ... % Place text just above the region
sprintf('%.2f Hz', ceil(freq)), ...
'HorizontalAlignment', 'center', 'VerticalAlignment', 'bottom', ...
'FontSize', 10, 'FontWeight', 'bold', 'Color', 'r');
end
end
end
% Set logarithmic scale for both axes
set(gca, 'XScale', 'log', 'YScale', 'log');
hold off;
end
Generating plot for MP: Data
Found years for MP: Data: x2011, x2012, x2013, x2014, x2015, x2016, x2017, x2018, x2019, x2021, x2022, x2023, x2024
The issue with the generated plot is then that due to the variance being to high for certain frequencies the lower prediction interval becomes negative and is therefore not included in the plot. To solve this i altered the code to introduce a bias to displace the negative values, see code below:
load('allSpectraData')
for m = 1:1
desiredMeasurementPoint = 'Data';
% Retrieve the spectra data for the current measurement point
if isfield(allSpectraData, desiredMeasurementPoint)
spectraDataMP = allSpectraData.(desiredMeasurementPoint);
else
disp(['No data found for MP: ', desiredMeasurementPoint]);
continue; % Skip if no data exists for this MP
end
% Initialize a new figure for each MP
figure;
disp(['Generating plot for MP: ', desiredMeasurementPoint]); % Debug line to check the MP
hold on;
% Get the list of years in the spectraData structure for this MP
yearsList = fieldnames(spectraDataMP);
disp(['Found years for MP: ', desiredMeasurementPoint, ': ', strjoin(yearsList, ', ')]);
colors = lines(length(yearsList)); % Generate a distinct color for each year
plotDate = datestr(datetime('today'), 'yy/mm'); % Format date for the legend (optional)
% Initialize matrices to store spectrum data for prediction band calculation
allFrequencies = [];
allPowerSpectra = [];
for j = 1:length(yearsList)
yearString = yearsList{j}; % Current year as a string (e.g., 'x2007')
% Get the spectrum data for this year and MP
ut = spectraDataMP.(yearString).(desiredMeasurementPoint);
% Extract frequencies and power spectra for this year
frequencies = ut(:, 2); % Frequency values
powerSpectra = ut(:, 1); % Power (dB) values
% Store data for prediction band calculation
allFrequencies = union(allFrequencies, frequencies);
allPowerSpectra = [allPowerSpectra, powerSpectra];
end
% Initialize vector to store interpolated values for frequencies that
% are not included in the original data set for historic data
IPMPS = [];
for i = 1:length(yearsList) - 1
yearString = yearsList{i};
ut = spectraDataMP.(yearString).(desiredMeasurementPoint);
% Get interpolated power spectral densities for non-existing frequencies
utniterP = interp1(ut(:, 2), ut(:, 1), allFrequencies, 'linear', 'extrap');
IPMPS = [IPMPS, utniterP];
if length(yearsList) - 1 == 0
IPMPS = utniterP;
end
end
% Calculate prediction interval based on historic data
varMPhist = var(IPMPS, 0, 2); % Variance for each frequency
meanMPhist = mean(IPMPS, 2); % Mean for each frequency
t_critical = tinv(1 - (1 - 0.9995), length(yearsList) - 1); % Critical t-test value
a = sqrt(varMPhist + varMPhist ./ (length(yearsList) - 1)); % Standard deviation
PIupper = meanMPhist + t_critical * a; % Upper prediction band
PIlower = meanMPhist - t_critical * a; % Lower prediction band
% Adjust prediction interval with a bias to ensure PIlower >= 0
minPIlower = min(PIlower);
if minPIlower < 0
bias = abs(minPIlower)+1e-9; % Bias to shift all values above 0
else
bias = 0; % No bias needed if all values are already >= 0
end
PIlower = PIlower + bias;
PIupper = PIupper + bias;
% Plot the the 5 latest measurements if there are multiple measurements for that MP
for i = 1:length(yearsList)
yearString = yearsList{i};
% Extract the numeric part of the year using regexp
year = regexp(yearString, '\d{4}', 'match', 'once'); % Extract the first 4 digits
ut = spectraDataMP.(yearString).(desiredMeasurementPoint);
Fs = spectraDataMP.(yearString).Fs; % Access the Fs field for this year
% Ensure ut is not empty
if isempty(ut)
disp(['No data for year ', yearString, ' and MP ', desiredMeasurementPoint]);
continue;
end
if i > length(yearsList) - 5
cleanedMeasurementPoint = strrep(desiredMeasurementPoint, '_U_V_', '');
plot(ut(:, 2), ut(:, 1)+bias, 'DisplayName', ...
sprintf('%s - %s - %s - Fs: %s Hz - %s', year, ...
cleanedMeasurementPoint, num2str(Fs), plotDate), 'Color', colors(i, :));
elseif 0>=length(yearsList)-5
plot(ut(:, 2), ut(:, 1), 'DisplayName',...
sprintf('%s - %s - %s - Fs: %s Hz - %s', year,...
cleanedMeasurementPoint, num2str(Fs), plotDate), 'Color', colors(i, :)); % Frequency (x), Power (y)
end
end
% Plot prediction interval
if length(yearsList) - 1 > 0
plot(allFrequencies, PIupper, 'k', allFrequencies, PIlower, 'k', 'HandleVisibility', 'off')
xlabel('Frequency (Hz)');
ylabel('Power (dB)');
title(['Power Spectrum Density: ', cleanedMeasurementPoint]);
xlim([1e-2, 1e2]); % Adjust x-axis limits
else
set(gca, 'XScale', 'log', 'YScale', 'log');
legend('show', 'Location', 'southwest');
hold off;
continue;
end
% Initialize a flag to ensure only one legend entry for the red dot
outlierLegendAdded = false;
regionLegendAdded = false;
% Get the spectra for the last year in the year range for the specified
% MP
ut=spectraDataMP.(yearsList{end}).(desiredMeasurementPoint);
% Interpolate to find values for frequencies not included in the
% original data
Ut=interp1(ut(:, 2), ut(:, 1), allFrequencies, 'linear', 'extrap');
% Find indices for power spectral densities where the latest measurment exceeds the
% prediction intervals
exceedIndices=find(Ut+bias>PIupper);
% Initialize vector for storing frequencies where the prediction
% interval is exceeded
freqexceed=[];
%Extract frequencies
for k=1:length(exceedIndices)
freqexceed(k)=allFrequencies(exceedIndices(k));
end
% Extract which of the exceeding indices belongs to the orginal data
% set that has no interpolated values
utindices=find(ismember(ut(:,2),freqexceed)>0);
% Initialize vector for storing frequencies where the power spectral
% density exceeds the prediction band that belongs to the original data
% set
utfreqexceed=[];
% Extract frequencies
for k=1:length(utindices)
utfreqexceed(k)=ut(utindices(k),2);
end
%Find exceeding indices that belong to the orginal data set
exceedIndices=find(ismember(allFrequencies,utfreqexceed)>0);
if ~isempty(exceedIndices)
intervals = diff(exceedIndices) >100; % Find breaks in consecutive indices
groupEdges = [0; find(intervals); length(exceedIndices)]; % Get edges of groups
for idx = 1:length(groupEdges)-1
% Get the frequency and power at the index where the spectrum exceeds PIupper
% Get the indices for the current group
groupStart = groupEdges(idx) + 1;
groupEnd = groupEdges(idx + 1);
groupIndices = exceedIndices(groupStart:groupEnd);
% Find the middle index of the group
middleIndex = groupIndices(ceil(length(groupIndices) / 2));
% Get the frequency and power at the middle index
freq = allFrequencies(middleIndex);
power = Ut(middleIndex)+bias;
% Plot a red dot at the middle point
if ~outlierLegendAdded
% Add only the first red dot to the legend
plot(freq, power, 'ro', 'MarkerSize', 6, 'MarkerFaceColor', 'r', ...
'DisplayName', 'Outlier');
outlierLegendAdded = true;
text(freq, ...
power + 0.0001, ... % Place text just above the region
sprintf('%.2f Hz', ceil(freq)), ...
'HorizontalAlignment', 'center', 'VerticalAlignment', 'bottom', ...
'FontSize', 10, 'FontWeight', 'bold', 'Color', 'r');
else
% Add red dots without updating the legend
plot(freq, power, 'ro', 'MarkerSize', 6, 'MarkerFaceColor', 'r', ...
'HandleVisibility', 'off');
text(freq, ...
power + 0.00001, ... % Place text just above the region
sprintf('%.2f Hz', ceil(freq)), ...
'HorizontalAlignment', 'center', 'VerticalAlignment', 'bottom', ...
'FontSize', 10, 'FontWeight', 'bold', 'Color', 'r');
end
end
end
% Set logarithmic scale for both axes
set(gca, 'XScale', 'log', 'YScale', 'log');
% Show legend
legend('show','Location', 'southwest'); % Show legend to label each line by year and measurement point
hold off;
end
Generating plot for MP: Data
Found years for MP: Data: x2011, x2012, x2013, x2014, x2015, x2016, x2017, x2018, x2019, x2021, x2022, x2023, x2024
This solved the problem with negative values but decreased the resolution of the graph towards higher frequencies as the bias is a lot larger than the power at these frequencies, which is the most interesting region. I was therefore wondering if someone has any idea of how i can change the scale on the y axis or scale the data so that the second graph retains the resolution of the first graph and thereby the same features?
  3 commentaires
Samuel
Samuel le 9 Déc 2024
Yes, but then i would lose the ability to see if the lower confidence bound is exceeded in that point
Mathieu NOE
Mathieu NOE le 9 Déc 2024
hello again .
I wonder if you could use a smoothed version of PIupper and PIlower to avoid sharp notches (try with smoothdata)
and you can also use ylim to restrict the displayed y range

Connectez-vous pour commenter.

Réponse acceptée

William Rose
William Rose le 6 Déc 2024
Modifié(e) : William Rose le 9 Déc 2024
[Edits: Correct spelling errors, including "ch2inv" to "chi2inv".]
[Edit: Correct formula for .]
I have not tried to read your many lines of code in detail. My sense is that you are not happy that you get confidence intervals for your spectra that go negative. That's bad, because a spectral intensity cannot go negative. The root cause of this problem is the use of an incorrect statistical model. The most common approach for estimating an approximate 95% confidence interval is: Conf. Int. = Mean +- 2*SD, where Mean and SD are estimated from the data. This implicitly assumes the data is normally distributed. That is an incorrect assumption for intensities, which are non-negative. Instead, assume the intensity at a given frequency has a scaled chi-square distribution (, where ), where we must estimate A and k from the data. This is a much more sound approach than adding a bias term to avoid negative values.
The 95% confidence interval for Y, at a given frequency, is
where and and y_bar=sample mean, s^2=sample variance, and = the 0.025 critical value of the chi-squared distribution with k degrees of freedom, etc. You can compute the critical values in Matlab with chi2inv(p,k), where p=0.025 or p=0.975 and k=estimated degrees of freedom. Adjust as needed, if you want a 99% C.I., etc. The formula above yields an asymmetric confidence interval that will never go negative.
I hope that you can apply the ideas above to solve the issue.
  6 commentaires
Samuel
Samuel le 12 Déc 2024
You are absolutley correct the unit on the y-axis is incorrect, thank you very much for your answers they have helped immensly

Connectez-vous pour commenter.

Plus de réponses (0)

Catégories

En savoir plus sur Object Containers dans Help Center et File Exchange

Tags

Produits


Version

R2018b

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by