concatenating and comparing two datsets
Afficher commentaires plus anciens
Hi,
I have two datasets in excel of cell arrays which have columns containing charaters and numbers. The first dataset has 300 rows and 10 cols, and the second dataset 250 rows and 9 columns.
Column 10th and 9th of dataset1 and datset 2 are numbers , while all other columns are characters.
- for the first dataset I want to concatenate the first 8 columns.
- For the second data sets I want to do the same as step.1
- If the concatenated description of 1st dataset is same as second then I want to add 10th column of first data to the 9th colmn of send datasets.
I am new here but is there a method in matlab to do is?. I was doing this in excel in the past and it has been time-consuming and I was making erros doing this manually.
Any help greatly appreciated.
Thanks.
SSR
Réponse acceptée
Plus de réponses (1)
Something like this should work:
dataset1 = readtable('C:\somewhere\your1stexcelfile.xlsx'); %may need extra options, depending on the excel file
dataset2 = readtable('C:\somewhere\your2ndexcelfile.xlsx'); %may need extra options, depending on the excel file
assert(width(dataset1) == 10 & width(dataset2) == 9, 'Dataset content does not match your description')
dataset1 = [rowfun(@(varargin) strjoin(varargin, ''), dataset1, 'InputVariables', 1:8, 'ExtractCellContents', true), dataset1(:, 9:10)];
dataset2 = [rowfun(@(varargin) strjoin(varargin, ''), dataset2, 'InputVariables', 1:8, 'ExtractCellContents', true), dataset2(:, 9)];
joined = outerjoin(dataset1, dataset2, 'Keys', 1, 'MergeKeys', true, 'Type', 'right')
edit: rereading the description, it sounds like a right outerloin, not a left one
3 commentaires
Sheetal Shirsat
le 16 Août 2019
Modifié(e) : Guillaume
le 16 Août 2019
Yes, I make sure that dataset1 has indeed 10 columns and dataset2 has indeed 9 columns as you've described. Clearly, if matlab throws the error, this is not the case. The number of rows is unimportant for the code, so I never check that.
As I said, it is trivial to convert cell arrays into table. At this point, you can use the join functions that do all the hard work for you:
%inputs
%raw1: a MxN cell array
%raw2: a Px(N-1) cell array
%note that columns 1:N-2 are used as keys in order to add column N of raw1 as new column N in raw2
traw1 = cell2table(raw1);
traw2 = cell2table(raw2);
tmerged = outerjoin(traw1, traw2, 'Keys', 1:size(raw1, 2)-2, 'MergeKeys', true, 'Type', 'right');
%if result is desired as a cell array
merged = table2cell(merged);
The same can be achieved the old fashioned way, with ismember indeed. However, you can't use ismember will 2D cell arrays of char vectors, so you'd have to merge each row of text as you initially requested, or assign a unique numeric id to each char vector. I'm choosing the latter option here:
%generation of numeric id for each cell array
uniquetext = unique([raw1(:, 1:end-2); raw2(:, 1:end-1)]);
[~, raw1key] = ismember(raw1, uniquetext);
[~, raw2key] = ismember(raw2, uniquetext);
%now we can use ismember to find which keys are present in both sets
[found, where] = ismember(raw2key, raw1key, 'rows');
merged = [raw2, num2cell(NaN(size(raw2, 1), 1))]; %prepare destination by adding a column of NaN
merged(found, end) = raw2(where(found), end); %and copy relevant elements
I tried Guillaume's method and its not working either because, the unique function he uses in his example isn't working for my datasets
"doesn't work" is a useless statement if you don't provide more details. What happens? Or doesn't happen? What if the full error message if there is one?
We're guessing what your inputs are and keep guessing wrong as you don't give us the full information. A simple way to resolve that and get an answer that works for you is to attach a mat file with example inputs.
Catégories
En savoir plus sur Logical dans Centre d'aide et File Exchange
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!