concatenating and comparing two datsets

Question

0 votes

Hi,

I have two datasets in excel of cell arrays which have columns containing charaters and numbers. The first dataset has 300 rows and 10 cols, and the second dataset 250 rows and 9 columns.

Column 10th and 9th of dataset1 and datset 2 are numbers , while all other columns are characters.

for the first dataset I want to concatenate the first 8 columns.
For the second data sets I want to do the same as step.1
If the concatenated description of 1st dataset is same as second then I want to add 10th column of first data to the 9th colmn of send datasets.

I am new here but is there a method in matlab to do is?. I was doing this in excel in the past and it has been time-consuming and I was making erros doing this manually.

Any help greatly appreciated.

Thanks.

SSR

0 commentaires
Afficher -2 commentaires plus anciens Masquer -2 commentaires plus anciens

Connectez-vous pour commenter.

Connectez-vous pour répondre à cette question.

Follow Question

Answer 1

Jon le 16 Août 2019

Modifié(e) : Jon le 16 Août 2019

Ouvrir dans MATLAB Online

1 vote

Assuming that the 9th column of both data sets is the same, I think you can do this just with an innerjoin

tbl1 = readtable('data1.xlsx')
tbl2 = readtable('data2.xlsx')
% join them to make third table which adds additional column to 
% second table where character columns match
tbl3 = innerjoin(tbl1,tbl2)

There are probably some use cases that Guillaume is covering in his answer that may be important, but maybe this simple approach works for what you are doing.

21 commentaires
Afficher 19 commentaires plus anciens Masquer 19 commentaires plus anciens

Jon le 20 Août 2019

Modifié(e) : Jon le 20 Août 2019

Ouvrir dans MATLAB Online

If you have more than one matching row then neither of our codes will work. It's just that Guillaume traps the problem with the assert (which is always good practice). I just wanted to give you a sketch of how the approach might be implemented, my objective wasn't to give you fully bullet proof code. You of course will need to do that for a production version.

You will have to modify the approach, by making a loop, and performing those same basic actions for each element of idx returned by the initial find.

idx = find(strcmp(car1.make,'subaru')&strcmp(car1.model,'forester'))

Basically just set up a for loop and where I now have idx you would instead have idx(k) where k is the for loop index. I think you should be able to work out the details.

I don't know if there is any advantage to deleteing the old rows and putting the new ones on the end as Guillaume does versus inserting them as I do, but either way accomplishes the same basic requirement.

Actually, it is probably simpler to take Guillaumes approach and add them to the end, but wait until you are all done with the loop before deleting out the old rows. Otherwise the indices that locate the rows that need to be changed could become invalid after the first loop iteration as everything will have shifted.

Jon le 20 Août 2019

Sorry Guilaaume. I read your response too quickly, and just saw the assert, and didn't notice that the later code handles multiple matching rows.

Guillaumes approach is much cleaner than my suggestion of putting it in a loop. Definitely do it that way!

I think you should be all set now.

Sheetal Shirsat le 21 Août 2019

This is Perfect !.

Thank you so very much Guillaume and Jon for your time and help.. ! Much appreciated !.

Best regards,

Shital

Connectez-vous pour commenter.

Answer 2

Guillaume le 16 Août 2019

Modifié(e) : Guillaume le 16 Août 2019

Ouvrir dans MATLAB Online

0 votes

Something like this should work:

dataset1 = readtable('C:\somewhere\your1stexcelfile.xlsx');  %may need extra options, depending on the excel file
dataset2 = readtable('C:\somewhere\your2ndexcelfile.xlsx');  %may need extra options, depending on the excel file
assert(width(dataset1) == 10 & width(dataset2) == 9, 'Dataset content does not match your description')
dataset1 = [rowfun(@(varargin) strjoin(varargin, ''), dataset1, 'InputVariables', 1:8, 'ExtractCellContents', true), dataset1(:, 9:10)];
dataset2 = [rowfun(@(varargin) strjoin(varargin, ''), dataset2, 'InputVariables', 1:8, 'ExtractCellContents', true), dataset2(:, 9)];
joined = outerjoin(dataset1, dataset2, 'Keys', 1, 'MergeKeys', true, 'Type', 'right')

edit: rereading the description, it sounds like a right outerloin, not a left one

3 commentaires
Afficher 1 commentaire plus ancien Masquer 1 commentaire plus ancien

Guillaume le 16 Août 2019

Modifié(e) : Guillaume le 16 Août 2019

Ouvrir dans MATLAB Online

Yes, I make sure that dataset1 has indeed 10 columns and dataset2 has indeed 9 columns as you've described. Clearly, if matlab throws the error, this is not the case. The number of rows is unimportant for the code, so I never check that.

As I said, it is trivial to convert cell arrays into table. At this point, you can use the join functions that do all the hard work for you:

%inputs
%raw1: a MxN cell array
%raw2: a Px(N-1) cell array
%note that columns 1:N-2 are used as keys in order to add column N of raw1 as new column N in raw2
traw1 = cell2table(raw1);
traw2 = cell2table(raw2);
tmerged = outerjoin(traw1, traw2, 'Keys', 1:size(raw1, 2)-2, 'MergeKeys', true, 'Type', 'right');
%if result is desired as a cell array
merged = table2cell(merged);

The same can be achieved the old fashioned way, with ismember indeed. However, you can't use ismember will 2D cell arrays of char vectors, so you'd have to merge each row of text as you initially requested, or assign a unique numeric id to each char vector. I'm choosing the latter option here:

%generation of numeric id for each cell array
uniquetext = unique([raw1(:, 1:end-2); raw2(:, 1:end-1)]);
[~, raw1key] = ismember(raw1, uniquetext);
[~, raw2key] = ismember(raw2, uniquetext);
%now we can use ismember to find which keys are present in both sets
[found, where] = ismember(raw2key, raw1key, 'rows');
merged = [raw2, num2cell(NaN(size(raw2, 1), 1))];  %prepare destination by adding a column of NaN
merged(found, end) = raw2(where(found), end);  %and copy relevant elements

Guillaume le 19 Août 2019

Modifié(e) : Guillaume le 19 Août 2019

I tried Guillaume's method and its not working either because, the unique function he uses in his example isn't working for my datasets

"doesn't work" is a useless statement if you don't provide more details. What happens? Or doesn't happen? What if the full error message if there is one?

We're guessing what your inputs are and keep guessing wrong as you don't give us the full information. A simple way to resolve that and get an answer that works for you is to attach a mat file with example inputs.

Connectez-vous pour commenter.

concatenating and comparing two datsets

0 commentaires
Afficher -2 commentaires plus anciens Masquer -2 commentaires plus anciens

Réponse acceptée

21 commentaires
Afficher 19 commentaires plus anciens Masquer 19 commentaires plus anciens

Plus de réponses (1)

3 commentaires
Afficher 1 commentaire plus ancien Masquer 1 commentaire plus ancien

Catégories

Produits

Version

Tags

Community Treasure Hunt

concatenating and comparing two datsets

0 commentaires Afficher -2 commentaires plus anciens Masquer -2 commentaires plus anciens

Réponse acceptée

21 commentaires Afficher 19 commentaires plus anciens Masquer 19 commentaires plus anciens

Plus de réponses (1)

3 commentaires Afficher 1 commentaire plus ancien Masquer 1 commentaire plus ancien

Catégories

Produits

Version

Tags

Voir également

Community Treasure Hunt

0 commentaires
Afficher -2 commentaires plus anciens Masquer -2 commentaires plus anciens

21 commentaires
Afficher 19 commentaires plus anciens Masquer 19 commentaires plus anciens

3 commentaires
Afficher 1 commentaire plus ancien Masquer 1 commentaire plus ancien