How to remove extra values from a histogram in MATLAB

Hello everyone, I hope you are doing well.
I have the following dataset, which contains a pattern. Some values are outliers (or, you could say, missing values) that occur in different places. I want to remove these values using a histogram.
I have computed the histogram of the data, as you can see in the image untitled.jpg. There are three values, 4800, 5130 and 5540, whose histogram bins have counts of 312, 317 and 322, while the other bins have counts below 50.
I want to keep the values and indexes of 50% (161 in the case above) of the maximum value of the histogram and remove the remaining values.
I have written the following code, but it returns only a single value, not the original matrix (4800 5130 5540).
Can anybody please help me with this?
h=histogram(Values)
sumofbins=max(h.Values);
size_MP=round(50/100*sumofbins);
ValueofHistogram= h.Values;
Bindata=h.Data
for i=1: length(ValueofHistogram)
if ValueofHistogram(i)<size_MP;
Bindata(i)=0;
end
end

Accepted Answer

Try this:
s = load('his.mat')
data = s.Values;
maxValue = max(data)
brightData = data(data >= 0.5 * maxValue)
histogram(brightData);
grid on;

9 Comments

@Image Analyst This is not working
It is. I actually downloaded your data and ran the code and it worked. It made this histogram
You can see that the max value of your data is 21,010 and the histogram does not show any values less than 10,505, just as you requested: "I want to keep the Values and Indexes of 50% (in above case 161) of maximum value of histogram and remove the remaining values."
"brightData" is the data from the original array with the values less than half the max removed.
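A minimal sketch of the same value threshold that also reports which samples were removed, assuming a short hypothetical vector in place of s.Values from his.mat. Keeping the comparison as a logical mask lets find return the removed positions directly:

```matlab
% Hypothetical sample data standing in for s.Values from his.mat.
data = [4800 5130 10340 5540 21010 4800 12000];

maxValue = max(data);                % largest data value (21010 here)
keepMask = data >= 0.5 * maxValue;   % true where the value is at least half the max

brightData = data(keepMask);         % values that are kept
removedIdx = find(~keepMask);        % positions of the removed values in data
removedValues = data(removedIdx);    % the removed values themselves

disp(brightData);
disp(removedIdx);
```

With the real data, removedIdx lists positions in the original vector, so it can be used to look up or delete the same samples elsewhere.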
@Image Analyst Not in that way. Can you check the solution above? I want that kind of solution, with the indexes of the deleted values.
Give no name time - I'm sure he'll give you what you want eventually.
@Image Analyst In this code, I want the indexes of the deleted values in the original dataset. For example, sample 18 has the value 10340, which is deleted; I want that index.
Can you please modify it?
load('His.mat')
h=histogram(Values);
sumofbins=max(h.Values);
size_MP=round(50/100*sumofbins);
ValueofHistogram= h.Values;
Bindata=h.Data;
Binedges=h.BinEdges;
Binedges(end) = Inf;
deleted_values_idx = [];
for i=1: length(ValueofHistogram)
if ValueofHistogram(i)<size_MP;
deleted_values_idx(end+1) = i;
Bindata(Bindata >= Binedges(i) & Bindata < Binedges(i+1)) = [];
end
end
xl = xlim();
h.Data = Bindata;
xlim(xl); % restore axes xlim, if you want to
Sort the values, then find the value at the index that separates the lower 50% of the elements from the upper 50%. Then delete the darker (lower) half.
load('His.mat')
% h=histogram(Values, 100)
maxValue = max(Values)
% Sort the values
[sortedValues, sortOrder] = sort(Values, 'ascend');
% Find out the value of the darkest 50% of the elements.
index50 = round(numel(sortedValues) / 2)
value50 = sortedValues(index50)
% Find out which elements are less than value50 and delete them.
% This will keep the brightest 50% of the elements.
indexesToDelete = find(Values < value50);
Values2 = Values; % Initialize. Make a new output variable.
Values2(indexesToDelete) = []; % Delete elements with low values.
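To make the effect of this rank-based split visible, here is a small sketch on hypothetical data (not the data in His.mat). Because the cut is made by sorted position rather than by how often a value occurs, a frequent low value such as 4800 can fall in the deleted half while rare high values survive:

```matlab
% Hypothetical data: 4800 occurs often, 10340 and 21010 occur once each.
Values = [4800 5130 5540 4800 5130 10340 5130 21010];

sortedValues = sort(Values, 'ascend');
index50 = round(numel(sortedValues) / 2);   % middle position (4 here)
value50 = sortedValues(index50);            % value at that position (5130 here)

% Rank-based rule from the answer: delete everything below value50.
indexesToDelete = find(Values < value50);   % both copies of 4800 are selected
Values2 = Values;
Values2(indexesToDelete) = [];              % rare 10340 and 21010 are kept
```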
@Image Analyst You are deleting the wrong values: you deleted the minimum values. I want to delete the values that occur only a few times. For example, 4800 occurs 312 times, so it should remain in the array, but you are removing 4800 from the array.
I almost understand now. You want to remove values that fall into bins with more than 50 counts in them. But what if a value does not occur more than 50 times but is in the bin? For example, let's say that you have 312 instances of 4800 and 9 instances of 4801, and your bin includes values from 4800 to 4900 inclusive. Do you want the 9 instances of 4801 removed from the data also? If so, this code will do it:
load('His.mat')
uniqueValues = unique(Values)
whos Values
histObject = histogram(Values, 'BinEdges', uniqueValues)
grid on;
% Find out which bins have 50 or more counts in them.
bins50 = find(histObject.Values >= 50)
indexesToDelete = false(1, length(Values)); % Array to keep track of what values to delete.
% Mark values from the original data that are in a bin with 50 or more counts.
for k = 1 : length(bins50)
thisIndex = bins50(k);
% Get values included in this histogram bin.
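lowValue = histObject.BinEdges(thisIndex);
highValue = histObject.BinEdges(thisIndex+1);
if thisIndex == histObject.NumBins
highValue = Inf; % The last histogram bin also includes its right edge.
end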
% Find indexes of original data where these values lie.
theseIndexes = (Values >= lowValue) & (Values < highValue);
% Mark for deletion.
indexesToDelete(theseIndexes) = true;
end
% Delete the elements
Values(indexesToDelete) = [];
whos Values
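Whether the 9 samples of 4801 are grouped with the 312 samples of 4800 depends only on the bin edges. A small sketch with the hypothetical counts from the example above, comparing one 100-wide bin (an assumed edge choice) with counting each distinct value separately via unique and accumarray:

```matlab
% Hypothetical data from the example: 312 copies of 4800 and 9 copies of 4801.
x = [repmat(4800, 1, 312), repmat(4801, 1, 9)];

% Assumed single bin covering 4800 to 4900: both values share it.
binCounts = histcounts(x, [4800 4900]);        % one count for the whole bin

% Counting each distinct value separately keeps 4800 and 4801 apart.
[uniqueValues, ~, whichValue] = unique(x);
valueCounts = accumarray(whichValue(:), 1)';   % one count per unique value
```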
I think it will do what you want now. It gives you the indexes in your original data where the counts are less than 50% of the max count. It then uses those indexes to delete those infrequently occurring data from the original data set.
load('His.mat')
uniqueValues = unique(Values)
whos Values
subplot(2, 1, 1);
histObject = histogram(Values, 'BinEdges', uniqueValues)
grid on;
% "I want to keep the Values and Indexes of 50% (in above case 161) of maximum value of histogram
% and remove the remaining values."
maxBinCounts = max(histObject.Values)
% Find out which bins have at most 50% of the max bin count in them.
bins50 = find(histObject.Values <= 0.50 * maxBinCounts)
indexesToDelete = false(1, length(Values)); % Array to keep track of what values to delete.
% Mark values from the original data that are in one of these low-count bins.
for k = 1 : length(bins50)
thisIndex = bins50(k);
% Get values included in this histogram bin.
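lowValue = histObject.BinEdges(thisIndex);
highValue = histObject.BinEdges(thisIndex+1);
if thisIndex == histObject.NumBins
highValue = Inf; % The last histogram bin also includes its right edge.
end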
% Find indexes of original data where these values lie.
theseIndexes = (Values >= lowValue) & (Values < highValue);
% Mark for deletion.
indexesToDelete(theseIndexes) = true;
end
% Delete the elements. indexesToDelete marks the lower-count values in the original data set.
% By the way it's confusing to call your data "Values" because the histogram object calls
% them "Data" and has another variable for "Values" which is the counts in the bins.
% I'd recommend you call your original data "Data" instead of "Values" to avoid confusion.
Values(indexesToDelete) = [];
whos Values
subplot(2, 1, 2);
histObject = histogram(Values, 'BinEdges', uniqueValues)
grid on;
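The same rule can be applied per distinct value without building a histogram at all, by counting occurrences with unique and accumarray and mapping each count back to its samples. This is a swapped-in alternative, not the code above; the data is hypothetical and the cutoff (drop values that occur less than half as often as the most frequent value) is an assumption modeled on the question:

```matlab
% Hypothetical data: 4800, 5130 and 5540 are frequent; 10340 and 4340 are rare.
Values = [4800 5130 5540 4800 5130 10340 5130 5540 4800 4340 5540 4800 5130 5540];

% Count how often each distinct value occurs.
[~, ~, whichValue] = unique(Values);
valueCounts = accumarray(whichValue(:), 1);    % column: one count per unique value

% Assumed rule: drop samples whose value occurs less than half as often
% as the most frequent value.
isRareValue = valueCounts < 0.5 * max(valueCounts);
indexesToDelete = find(isRareValue(whichValue))';   % positions in the original data

Values2 = Values;
Values2(indexesToDelete) = [];
```

Because no bin edges are involved, a rare value can never be kept just because it shares a bin with a frequent one.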


More Answers (1)

You can't change the 'Values' property of a histogram directly, but you can change its underlying 'Data'. In this case, you can remove data from within those bins whose Value is less than half the maximum Value:
load('His.mat')
h=histogram(Values)
h =
  Histogram with properties:
             Data: [4800 5130 5540 4800 5130 5540 4800 5130 5540 4800 5130 5540 4800 5130 5540 4800 5130 10340 5130 5540 4800 5130 5540 4800 5130 5540 4800 5130 5540 4800 5130 5540 4800 5130 5540 4800 5130 5540 4800 5130 5540 4800 5130 10340 5130 … ]
           Values: [312 317 322 0 0 0 0 0 0 0 18 17 10 1 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 1]
          NumBins: 34
         BinEdges: [4500 5000 5500 6000 6500 7000 7500 8000 8500 9000 9500 10000 10500 11000 11500 12000 12500 13000 13500 14000 14500 15000 15500 16000 16500 17000 17500 18000 18500 19000 19500 20000 20500 21000 21500]
         BinWidth: 500
        BinLimits: [4500 21500]
    Normalization: 'count'
        FaceColor: 'auto'
        EdgeColor: [0 0 0]
sumofbins=max(h.Values);
size_MP=round(50/100*sumofbins);
ValueofHistogram= h.Values;
Bindata=h.Data;
Binedges=h.BinEdges;
Binedges(end) = Inf;
for i=1: length(ValueofHistogram)
if ValueofHistogram(i)<size_MP;
Bindata(Bindata >= Binedges(i) & Bindata < Binedges(i+1)) = [];
end
end
xl = xlim();
h.Data = Bindata;
xlim(xl); % restore axes xlim, if you want to
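The loop over bins can also be replaced by a lookup: discretize assigns each sample the number of the bin it falls in, using the same convention as histogram (each bin includes its left edge, and the last bin also includes its right edge), so indexing the bin counts with those bin numbers gives every sample the height of its own bin. A minimal sketch, assuming hypothetical data, explicit 500-wide edges like those in the display above, and the same round(50/100*max) threshold:

```matlab
% Hypothetical data standing in for Values from His.mat.
Values = [4800 5130 5540 4800 5130 10340 5130 5540 4800 5540 21010];

% Assumed 500-wide bin edges, matching the histogram display above.
edges = 4500:500:21500;
counts = histcounts(Values, edges);          % number of samples in each bin
binOfSample = discretize(Values, edges);     % bin number of each sample

% Same threshold as the loop: half the tallest bin, rounded.
size_MP = round(50/100 * max(counts));
deletedMask = counts(binOfSample) < size_MP; % true for samples in low bins
deletedIdx = find(deletedMask);              % indexes into the original Values
keptValues = Values(~deletedMask);
```

With the real data, h.Data = keptValues; updates the plot as in the loop version.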

12 Comments

@_ Thanks for your answer. Can I also get the indexes of the values that are less than 50% of the maximum (the ones that are deleted)?
Voss on 17 Apr 2022
Edited: Voss on 18 Apr 2022
Yes:
load('His.mat')
h=histogram(Values);
sumofbins=max(h.Values);
size_MP=round(50/100*sumofbins);
ValueofHistogram= h.Values;
Bindata=h.Data;
Binedges=h.BinEdges;
Binedges(end) = Inf;
deleted_data_idx = false(size(Bindata));
for i=1: length(ValueofHistogram)
if ValueofHistogram(i)<size_MP;
deleted_data_idx(Bindata >= Binedges(i) & Bindata < Binedges(i+1)) = true;
end
end
Bindata(deleted_data_idx) = [];
xl = xlim();
h.Data = Bindata;
xlim(xl); % restore axes xlim, if you want to
disp(find(deleted_data_idx));
  Columns 1 through 37
    18 44 49 106 110 140 155 191 223 228 241 258 295 298 301 305 306 362 386 392 406 452 462 468 489 496 510 512 523 560 583 612 666 673 688 696 721
  Columns 38 through 49
    749 797 817 831 857 888 905 929 948 967 993 998
@_ The output Bindata is 1x951, so there should be 49 indexes, but deleted_values_idx gives 1x31 indexes. Why?
@_ @_ I want the indexes of the deleted values in the original dataset. For example, sample 18 has the value 10340, which is deleted; I want that index.
Voss on 18 Apr 2022
@Med Future Sorry, I misunderstood the request. I have modified my comment above to show the indices of the deleted data.
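The 1x31 versus 49 mismatch comes from counting bins rather than samples: the earlier loop appended one bin number per low bin, while the deleted samples are all the data inside those bins. Using the bin counts from the histogram display of His.mat earlier in the thread, the arithmetic can be checked directly:

```matlab
% Bin counts copied from the histogram display of His.mat shown in this thread.
counts = [312 317 322 0 0 0 0 0 0 0 18 17 10 1 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 1];

size_MP = round(50/100 * max(counts));     % threshold used in the loop
lowBins = find(counts < size_MP);          % one entry per low bin
numLowBins = numel(lowBins);               % what deleted_values_idx collected
numDeletedSamples = sum(counts(lowBins));  % samples actually removed
numKeptSamples = sum(counts) - numDeletedSamples;
```

This gives 31 low bins holding 49 samples, leaving 951, which matches the 1x951 Bindata and the 49 indexes listed above.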
@_ I have attached the dataset.
As you can see, the value 4340 is not removed. Can you modify the above code?
A histogram of all data in newone:
S = load('newone.mat');
Values = S.ans;
h=histogram(Values);
I want to check if 4340 is in there:
find(Values == 4340)
ans = 1×16
23 105 190 310 325 541 564 593 620 639 689 708 789 849 864 871
It is.
Now, applying the above code to newone:
h=histogram(Values);
sumofbins=max(h.Values);
size_MP=round(50/100*sumofbins);
ValueofHistogram= h.Values;
Bindata=h.Data;
Binedges=h.BinEdges;
Binedges(end) = Inf;
deleted_data_idx = false(size(Bindata));
for i=1: length(ValueofHistogram)
if ValueofHistogram(i)<size_MP;
deleted_data_idx(Bindata >= Binedges(i) & Bindata < Binedges(i+1)) = true;
end
end
Bindata(deleted_data_idx) = [];
xl = xlim();
h.Data = Bindata;
xlim(xl); % restore axes xlim, if you want to
disp(find(deleted_data_idx));
28 40 75 91 125 168 198 218 237 257 279 285 286 341 353 371 372 377 389 415 474 502 516 565 573 664 741 777 803 812 815 915
4340 is still in there:
find(Bindata == 4340)
ans = 1×16
23 101 184 297 312 518 541 568 595 614 663 682 761 818 833 840
@Stephen john: Here is a question for you: Why would I modify the code to remove 4340, when the specification was to remove data that falls into bins whose height is less than 50% of the maximum bin height? As you can see, 4340 is in the tallest bin. Are you changing the specification of the question?
@_ No, I am not changing the question.
Here is the data. I think that because of the bin size and bin edges, some values remain and are not removed. How can I remove them?
@_ On the newone data your code works fine, but the bin spans 4000 to 5000, so the values at 4340, which should be removed, still remain.
Voss on 18 Apr 2022
@Stephen john Why should the values at 4340 in newone.mat be removed? Please explain how to determine whether a data point is removed or not removed, in general.
I thought the process was to remove points that fall into bins smaller than 50% of the largest bin. The bin from 4000 to 5000 is the largest bin, so its data is not removed. If data points of value 4340 should in fact be removed, then obviously I am not understanding what the process should be. Please advise.
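A small sketch of why 4340 survives the per-bin rule, on hypothetical data shaped like the description of newone.mat (a rare 4340 sharing the 4000 to 5000 bin with a frequent value), with assumed 1000-wide edges. Under the per-bin rule the rare value inherits the count of its bin; judging each distinct value by its own count removes it:

```matlab
% Hypothetical data: 4340 is rare but shares the 4000-5000 bin with the frequent 4800.
Values = [4800 4800 4800 4340 9100 9100 9100 4800];
edges = 4000:1000:10000;                     % assumed 1000-wide bins

% Per-bin rule: each sample is judged by the height of its bin.
counts = histcounts(Values, edges);
binOfSample = discretize(Values, edges);
keepByBin = counts(binOfSample) >= 0.5 * max(counts);

% Per-value rule: each sample is judged by how often its own value occurs.
[~, ~, whichValue] = unique(Values);
valueCounts = accumarray(whichValue(:), 1);
keepByValue = (valueCounts(whichValue) >= 0.5 * max(valueCounts))';
```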
@_ Let me explain. The count of the value 4340 is very small, so it does not form the shape/pattern that I want to detect. The value 4340 makes an extra pattern that has no complete shape, just a few points. Is there any method to remove values that have a small number of counts?
Try my well commented last comment below, at the end of my answer. I think it will do what you want now. It gives you the indexes in your original data where the counts are less than 50% of the max count. It then uses those indexes to delete those infrequently occurring data from the original data set.


Version: R2021b
