Removing outliers from the data creates gaps. Filling these gaps with missing values or the median of surrounding values does not address the issue. Why?
I am analyzing EMG data in windows. In each window, I apply z-score normalization to identify and remove outliers. To address the gaps created by removing these outliers, I attempt to fill the empty spaces with the median of the surrounding values. Additionally, I have experimented with MATLAB built-in functions such as 'movmedian' for this purpose.
Here is my function:
function data_clean = remove_outliers_and_fill(data)
% Calculate z-scores for each column
z_scores = zscore(data);
% Define outlier threshold
threshold = 3;
% Identify outliers
outliers = abs(z_scores) > threshold;
% Copy data to preserve original shape
data_clean = data;
% Loop through each column
[num_rows, num_cols] = size(data);
for col = 1:num_cols
    for row = 1:num_rows
        if outliers(row, col)
            range_start = max(1, row-10);
            range_end = min(num_rows, row+10);
            neighbors = data(range_start:range_end, col);
            % Exclude the outlier from median calculation
            filtered_neighbors = neighbors(neighbors ~= data(row, col));
            median_value = median(filtered_neighbors);
            data_clean(row, col) = median_value;
        end
    end
end
end
Here is the plot, which shows the gaps that appear after applying the above function.

Accepted Answer
  Star Strider
      
      
 on 17 Jun 2024
Your version/release is not stated; however, the filloutliers function has been available since R2017a. With 'median' as the ‘findmethod’ (which I use here), it flags as outliers anything more than three scaled median absolute deviations (MAD) from the median. With 'mean', it flags anything more than three standard deviations from the mean, which is equivalent to your ‘zscore’ threshold. See the filloutliers documentation for details.
If you have R2017a or a later version/release, try this:
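V1 = readmatrix('data.csv');                 % readmatrix requires R2019a or later
L = numel(V1);
X1 = linspace(0, L-1, L);                    % sample index used as the x-axis
figure
plot(X1, V1)
grid
xlim([4300 5000])
title('Original')
% Detect outliers with the 'median' method and fill them by linear interpolation.
% The threshold outputs are named so that they do not overwrite L above.
[B,TF,LowThr,UpThr,Ctr] = filloutliers(V1, 'linear', 'median');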
figure
plot(X1, V1, 'DisplayName','Original Data')
hold on
plot(X1, B, '-r', 'DisplayName','Outliers Filled (Linear Interpolation)')
hold off
grid
xlim([4300 5000])
legend('Location','best')
title('Filled Outliers')
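The same call can be tried without data.csv. The following is a minimal sketch on synthetic data, assuming R2017a or later; the signal, the spike positions, and the variable names are invented for illustration and are not the poster's data.

```matlab
% Synthetic stand-in for the EMG vector (assumption: not the real data)
rng(0);                                   % reproducible noise
v = 1e-5 * randn(5000, 1);
v([1200 3400 4700]) = [2e-4 -2e-4 2e-4];  % isolated artificial spikes

% Detect with the 'median' method, replace by linear interpolation
[vFilled, isOut] = filloutliers(v, 'linear', 'median');

% The output keeps the original length and should contain no NaN,
% so the plotted line has no gaps
fprintf('Flagged %d samples; NaN in output: %d\n', ...
    nnz(isOut), nnz(isnan(vFilled)));
```

Because the flagged samples are replaced rather than deleted, the filled vector can still be plotted against the original x-axis.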
  Star Strider
      
      
 on 18 Jun 2024
As always, my pleasure!
I am not certain what you intend by ‘I am looking for something which can remove the outliers from both time and frequency.’  If you want to remove the outliers rather than fill them by interpolating them, you can use the rmoutliers function.  I do not usually suggest that because it disrupts the integrity of the data.  
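To illustrate the difference, here is a rough sketch on made-up data, assuming a release that has both functions (rmoutliers arrived later than filloutliers). It shows that rmoutliers shortens the vector, so the remaining samples no longer line up with the original time axis, while filloutliers preserves the length.

```matlab
% Made-up signal with one artificial spike (values are assumptions)
rng(1);
x = 1e-5 * randn(1000, 1);
x(500) = 5e-4;

xRemoved = rmoutliers(x, 'median');             % deletes flagged samples
xFilled  = filloutliers(x, 'linear', 'median'); % replaces flagged samples

fprintf('Original: %d, after rmoutliers: %d, after filloutliers: %d\n', ...
    numel(x), numel(xRemoved), numel(xFilled));
```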
If you want to remove specific frequencies from your data, use the Signal Processing Toolbox to create frequency-selective filters.  There are several filtering options, and I can help you design and implement the filters.  
One caution, however: you need a matching vector of sampling times for each dependent variable data element before you do any processing of the data.  The reason is that the sampling times provide the frequency information and show how regular the samples themselves are.  For optimal performance, the sampling frequency must be constant, and the sampling intervals consistent from sample to sample.  If that is not the situation for your data, there is a function (resample) that can regularise the sampling frequency (and interpolate the dependent variable data) to provide that.  At that point, you can use various filters.  Again, I can help you design and implement them.  
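As a rough sketch of that workflow, assuming the Signal Processing Toolbox is installed: the sampling frequency, the jittered time vector, the test signal, and the 20 to 450 Hz passband below are all assumptions chosen for illustration, not values taken from the posted data.

```matlab
% Hypothetical irregularly sampled signal (all values are assumptions)
fs = 1000;                                   % target sampling frequency, Hz
rng(2);
t = (0:1999)'/fs + 2e-4*rand(2000, 1);       % jittered sample times, s
x = sin(2*pi*50*t) + 0.5*sin(2*pi*5*t);      % 50 Hz plus 5 Hz components

% Regularise the sampling onto a uniform grid at fs
[xr, tr] = resample(x, t, fs);

% Butterworth band-pass; butter(2, ...) with a two-element band gives a
% fourth-order filter. The 20-450 Hz band is an assumed example.
[b, a] = butter(2, [20 450]/(fs/2), 'bandpass');

% Zero-phase filtering so the filter does not shift the signal in time
xf = filtfilt(b, a, xr);
```

Filtering after outlier filling, rather than before, keeps large spikes from ringing through the filter; whether that ordering suits a given recording is a judgment call.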
More Answers (1)
  Nipun
      
 on 17 Jun 2024
        Hi Seemab,
I understand that you want to remove outliers from your EMG data, fill the gaps with the median of the surrounding values, and avoid gaps in the resulting data. The gaps might be due to not considering edge cases correctly or the outlier removal leaving isolated data points.
Here's an improved version of your function to address the gaps:
- Use movmedian to smooth the data after outlier removal.
- Ensure the median replacement does not create new outliers.
function data_clean = remove_outliers_and_fill(data)
    % Calculate z-scores for each column
    % (zscore requires the Statistics and Machine Learning Toolbox)
    z_scores = zscore(data);
    % Define outlier threshold
    threshold = 3;
    % Identify outliers
    outliers = abs(z_scores) > threshold;
    % Copy data to preserve original shape
    data_clean = data;
    % Loop through each column
    [num_rows, num_cols] = size(data);
    for col = 1:num_cols
        % Fallback value for windows that contain only outliers
        column_median = median(data(~outliers(:, col), col));
        for row = 1:num_rows
            if outliers(row, col)
                range_start = max(1, row-10);
                range_end = min(num_rows, row+10);
                neighbors = data(range_start:range_end, col);
                % Exclude every flagged outlier in the window, not only
                % samples whose value equals the current one
                is_valid = ~outliers(range_start:range_end, col);
                filtered_neighbors = neighbors(is_valid);
                if isempty(filtered_neighbors)
                    % median of an empty array is NaN, which plots as a gap
                    data_clean(row, col) = column_median;
                else
                    data_clean(row, col) = median(filtered_neighbors);
                end
            end
        end
    end
    % Use movmedian to smooth the data after filling
    window_size = 5; % Adjust window size as needed
    for col = 1:num_cols
        data_clean(:, col) = movmedian(data_clean(:, col), window_size);
    end
end
Example Usage
% Sample data (replace with actual EMG data)
data = randn(5000, 1) * 1e-5;
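% Add some artificial outliers for testing (1e-4 is roughly 8 standard
% deviations here; 3e-5 would typically fall just under the threshold of 3)
data(4700:4720) = 1e-4;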
% Clean the data
data_clean = remove_outliers_and_fill(data);
% Plot original and cleaned data
figure;
subplot(2,1,1);
plot(data);
title('Original Data');
xlabel('Time (windows)');
ylabel('Amplitude');
subplot(2,1,2);
plot(data_clean);
title('Cleaned Data');
xlabel('Time (windows)');
ylabel('Amplitude');
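One way such gaps can arise, offered as a possibility rather than a diagnosis of the posted data: if every neighbor in the window is excluded (for example, a constant run filtered by value, as in the function posted in the question), median receives an empty array and returns NaN, and plot draws nothing at NaN samples. A minimal sketch with made-up values:

```matlab
% A clipped, constant run of samples (made-up values)
neighbors = [3e-5; 3e-5; 3e-5];

% Value-based exclusion removes every sample equal to the current one
filtered = neighbors(neighbors ~= neighbors(2));

% The median of an empty array is NaN
m = median(filtered);
fprintf('Empty after filtering: %d, median is NaN: %d\n', ...
    isempty(filtered), isnan(m));

% Locating NaN samples in a cleaned vector (hypothetical variable name)
cleaned = [1; NaN; 3];
gap_idx = find(isnan(cleaned));
```

Running find(isnan(...)) on the cleaned output is a quick way to confirm whether the gaps in a plot are NaN values.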
For more information on the movmedian function, refer to the MathWorks documentation: https://www.mathworks.com/help/matlab/ref/movmedian.html
Hope this helps.
Regards,
Nipun