Remove rows in an array containing a non-matching element
I have a datafile data.txt:
gene12 489 483 838
gene82 488 763 920
gene31 974 837 198
gene45 489 101 378
gene59 89 827 138
I have another data file genelist.txt that lists just genes I'm interested in for my study:
gene45
gene59
gene61
I want to modify the first dataset by removing all rows where the gene isn't found in the second list, so that I end up with this array:
gene45 489 101 378
gene59 89 827 138
How do I go about doing this?
Accepted Answer
Guillaume
on 11 Apr 2017
Probably the easiest:
geneswithdata = readtable('data.txt'); %load file as a table
geneswithdata.Properties.VariableNames{1} = 'genes'; %rename first column for clarity (optional).
%I would also rename all the other columns
genesonly = readtable('genelist.txt'); %load as a table
genesonly.Properties.VariableNames = {'genes'}; %rename columns. Common columns must have the same name
filteredgenes = innerjoin(genesonly, geneswithdata);
Done.
Using ismember, that last line could be done as:
found = ismember(geneswithdata.genes, genesonly.genes); %compare the key columns only; the two tables have different variables
filteredgenes = geneswithdata(found, :);
Using intersect (rather than setdiff) it could be done as:
[~, tokeep] = intersect(geneswithdata.genes, genesonly.genes); %compare the key columns only; tokeep follows sorted gene order
filteredgenes = geneswithdata(tokeep, :);
3 Comments
Guillaume
on 12 Apr 2017
By default, readtable treats the first line as a header line and uses it to name the variables. To tell it not to do that:
readtable(___, 'ReadVariableNames', false)
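Putting the pieces together, here is a minimal end-to-end sketch, not a definitive implementation. It assumes both files have no header line and that data.txt is space-delimited, as in the question. The temporary folder and the explicit 'Delimiter' option are my own additions for illustration, so that the example runs on its own.

```matlab
% Recreate the two files from the question in a temporary folder
% (the folder and file paths here are illustrative only).
workdir = tempname;
mkdir(workdir);
datafile = fullfile(workdir, 'data.txt');
listfile = fullfile(workdir, 'genelist.txt');

fid = fopen(datafile, 'w');
fprintf(fid, 'gene12 489 483 838\ngene82 488 763 920\ngene31 974 837 198\n');
fprintf(fid, 'gene45 489 101 378\ngene59 89 827 138\n');
fclose(fid);

fid = fopen(listfile, 'w');
fprintf(fid, 'gene45\ngene59\ngene61\n');
fclose(fid);

% Neither file has a header line, so do not take variable names from row 1.
% Assumption: the data file is space-delimited, hence the explicit 'Delimiter'.
geneswithdata = readtable(datafile, 'ReadVariableNames', false, 'Delimiter', ' ');
genesonly = readtable(listfile, 'ReadVariableNames', false);

% Give the key column the same name in both tables.
geneswithdata.Properties.VariableNames{1} = 'genes';
genesonly.Properties.VariableNames = {'genes'};

% Keep only the rows whose gene appears in the list of interest.
found = ismember(geneswithdata.genes, genesonly.genes);
filteredgenes = geneswithdata(found, :);
disp(filteredgenes);
```

With the sample data from the question, this should leave only the gene45 and gene59 rows. gene61 is in the list but not in the data, so it simply does not appear.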