Remove duplicate variables depending on the value in a second column

Marty Dutch on 24 Sep 2015
Commented: Marty Dutch on 25 Sep 2015
Dear experts, I have a list of variables from which I need to remove duplicates (repeated names in column 1), choosing which copy to keep based on the value in column 2. Variables with a '1' in column 2 are of better quality than variables with a '0'.
1) For duplicate variables, I want to keep the copy that has the value 1 in the second column. If several duplicates have a 1, it should keep just one of them, chosen at random. In the example below, I want to keep the BG1028 row whose third column is 4.2, and for BG1030 either the row with 1.3 or the row with 1.7 in the third column.
2) If all copies of a duplicated variable have a 0 in the second column, it should keep just one of them, chosen at random. In the example below, I need to keep one BG1027 row (random choice).
I hope this is clear. I'm puzzling over how to do this. This is the code I have come up with so far, with help from Kirby Fear; it removes duplicates at random but does not yet take column 2 into account.
ppn = [ {'BG1026';'BG1027';'BG1027';'BG1028';'BG1028';'BG1028';'BG1029';'BG1029';...
'BG1030';'BG1030';'BG1030';'BG1030'},... % start col 2
{'0';'0';'0';'1';'0';'0';'1';'0';'0';'1';'0';'1'},... % start col 3
{'1.2';'2.2';'5.2';'4.2';'0.2';'8.9';'3.4';'3.0';'0.3';'1.3';'0.3';'1.7'} ];
% Store ppn column 2 as numerical values (not used below yet)
bPpn = str2double(ppn(:,2));
% Get names of duplicates, each listed once (assumes rows are sorted by name)
chooseNames = unique(ppn([strcmp(ppn(1:end-1,1),ppn(2:end,1));false],1));
% Loop over chooseNames and keep one row at random.
for j = 1:numel(chooseNames)
    dupidx = find(strcmp(chooseNames{j},ppn(:,1)));  % all rows with this name
    dupidx(randi(numel(dupidx))) = [];               % spare one random row...
    ppn(dupidx,:) = [];                              % ...and delete the others
end
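For reference, here is a minimal sketch of how the loop above could take column 2 into account. It assumes the same ppn layout (name, quality flag as text, value as text); the variable names quality, keep and best are my own. Instead of deleting rows, it marks one row per name to keep, choosing at random among the rows with the highest value in column 2, so it does not require the rows to be sorted by name.

```matlab
% Example data: name, quality flag (as text), value (as text)
ppn = [ {'BG1026';'BG1027';'BG1027';'BG1028';'BG1028';'BG1028';'BG1029';'BG1029';...
         'BG1030';'BG1030';'BG1030';'BG1030'},...
        {'0';'0';'0';'1';'0';'0';'1';'0';'0';'1';'0';'1'},...
        {'1.2';'2.2';'5.2';'4.2';'0.2';'8.9';'3.4';'3.0';'0.3';'1.3';'0.3';'1.7'} ];

quality = str2double(ppn(:,2));   % str2double converts a whole cell array of strings
names   = unique(ppn(:,1));       % each name once
keep    = false(size(ppn,1), 1);  % mask of rows to keep

for j = 1:numel(names)
    rows = find(strcmp(names{j}, ppn(:,1)));           % all rows with this name
    best = rows(quality(rows) == max(quality(rows)));  % rows with the highest quality
    keep(best(randi(numel(best)))) = true;             % keep one of them at random
end

ppn_out = ppn(keep,:);            % one row per name, original order preserved
```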

Accepted Answer

WAT on 24 Sep 2015
Give something like this a try:
ppn = [ {'BG1026';'BG1027';'BG1027';'BG1028';'BG1028';'BG1028';'BG1029';'BG1029';...
'BG1030';'BG1030';'BG1030';'BG1030'},... % start col 2
{'0';'0';'0';'1';'0';'0';'1';'0';'0';'1';'0';'1'},... % start col 3
{'1.2';'2.2';'5.2';'4.2';'0.2';'8.9';'3.4';'3.0';'0.3';'1.3';'0.3';'1.7'} ];
% 'first' makes ia point at the first row of each name in every release
[uniqNames, ia] = unique(ppn(:,1), 'first');
ia = [ia; 1+size(ppn,1)];          % sentinel: one past the last row
quality = str2double(ppn(:,2));    % column 2 as numbers, not characters
ppn_out = {}; % initialize output
for i = 1:length(uniqNames)
    rows = ia(i):ia(i+1)-1;   % rows with uniqNames(i) (assumes ppn is sorted by name)
    rows = rows(quality(rows) == max(quality(rows)));      % only rows with the maximal value in col 2
    ppn_out = [ppn_out; ppn(rows(randi(numel(rows))),:)];  % select one row at random, put it in ppn_out
end
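A loop-free variant of the same idea, offered as a sketch rather than a tested drop-in: sort all rows by quality (highest first) with a random tie-breaker, then let unique(..., 'first') pick the first row of each name. Because of the random second sort key, that first row is a randomly chosen row among those with the best column-2 value. The names quality, ranked and firstIdx are illustrative.

```matlab
% Example data: name, quality flag (as text), value (as text)
ppn = [ {'BG1026';'BG1027';'BG1027';'BG1028';'BG1028';'BG1028';'BG1029';'BG1029';...
         'BG1030';'BG1030';'BG1030';'BG1030'},...
        {'0';'0';'0';'1';'0';'0';'1';'0';'0';'1';'0';'1'},...
        {'1.2';'2.2';'5.2';'4.2';'0.2';'8.9';'3.4';'3.0';'0.3';'1.3';'0.3';'1.7'} ];

quality = str2double(ppn(:,2));

% Sort by quality descending (negated key), breaking ties with a random key
[~, order] = sortrows([-quality, rand(size(ppn,1),1)]);
ranked = ppn(order,:);

% First occurrence of each name = best quality, random among ties
[~, firstIdx] = unique(ranked(:,1), 'first');
ppn_out = ranked(firstIdx,:);   % one row per name, sorted by name
```

Unlike the loop above, this does not assume that rows with the same name are adjacent in ppn.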
3 Comments
WAT on 24 Sep 2015
Edited: WAT on 24 Sep 2015
That's odd; it's also skipping BG1026 for you. It behaves fine for me, so I wonder whether unique() is doing something different on your version. (I've tried R2013a and R2015a, and it works on both.)
Try removing all the semicolons so that each intermediate result is printed; the code is short enough that it should be easy to follow what it is doing.
Marty Dutch on 25 Sep 2015
Yes, you're right. I tried running it in R2013b and R2015a, and it works fine in both. Thanks for the response. It did not work in R2012b.
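One likely explanation for the R2012b difference: before R2013a, unique returned the index of the last occurrence of each value by default, whereas R2013a and later return the first. Block boundaries of the form ia(i):ia(i+1)-1 only make sense with first occurrences. The small check below, using a shortened list of names, shows the two behaviors side by side; passing 'first' explicitly, as in [uniqNames, ia, ic] = unique(ppn(:,1), 'first'), should give the same ia in old and new releases.

```matlab
names = {'BG1026';'BG1027';'BG1027';'BG1028';'BG1028'};

[~, iaFirst] = unique(names, 'first');  % index of the first occurrence of each name
[~, iaLast]  = unique(names, 'last');   % index of the last occurrence (old default)

disp(iaFirst.')   % → 1 2 4
disp(iaLast.')    % → 1 3 5
```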
