Hi,
I have two datasets in excel of cell arrays which have columns containing charaters and numbers. The first dataset has 300 rows and 10 cols, and the second dataset 250 rows and 9 columns.
Column 10th and 9th of dataset1 and datset 2 are numbers , while all other columns are characters.
  1. for the first dataset I want to concatenate the first 8 columns.
  2. For the second data sets I want to do the same as step.1
  3. If the concatenated description of 1st dataset is same as second then I want to add 10th column of first data to the 9th colmn of send datasets.
I am new here but is there a method in matlab to do is?. I was doing this in excel in the past and it has been time-consuming and I was making erros doing this manually.
Any help greatly appreciated.
Thanks.
SSR

 Réponse acceptée

Jon
Jon le 16 Août 2019
Modifié(e) : Jon le 16 Août 2019
Assuming that the 9th column of both data sets is the same, I think you can do this just with an innerjoin
tbl1 = readtable('data1.xlsx')
tbl2 = readtable('data2.xlsx')
% join them to make third table which adds additional column to
% second table where character columns match
tbl3 = innerjoin(tbl1,tbl2)
There are probably some use cases that Guillaume is covering in his answer that may be important, but maybe this simple approach works for what you are doing.

21 commentaires

Guillaume
Guillaume le 16 Août 2019
Modifié(e) : Guillaume le 16 Août 2019
True, there's probably no point in merging the variables as requested.If merging does make a difference on the keys (because the parts of an identical key is in different columns for different rows, e.g {'a', 'ab'} and {'aa', 'b'} makes the same key when merged), then I'd say the original design was deeply flawed.
My reading is that it's a right outerjoin (not a left as i initially wrote) that's wanted, not a an innerjoin.
You may also want to be explicit about the key columns (1:8) in case column 9 shares the same name in both tables (if there's no header it'll be Var9 for both).
Sheetal Shirsat
Sheetal Shirsat le 16 Août 2019
Hi
Thanks for your help but I am not able to use innerjoin , because its a cell array.
regards,
Shital
Guillaume
Guillaume le 16 Août 2019
Modifié(e) : Guillaume le 16 Août 2019
because its a cell array
What is "it"?
You told us "it" is an excel file. Both answers import the excel file as a table.
In any case, it's trivial to convert a cell array into a table.
Jon
Jon le 16 Août 2019
Here is a very simple example which I think illustrates what you are trying to accomplish.
If this isn't equivalent to what you are trying to do then please explain the differences.
Here is the result of running my small attached demo:
tbl1 =
4×4 table
make model year mileage
__________ __________ ____ _______
'chrysler' 'caravan' 2019 50000
'toyota' 'sienna' 2014 83000
'subaru' 'forester' 2013 90000
'toyota' 'corrola' 2018 45000
tbl2 =
3×3 table
make model year
____________ _________ ____
'chrysler' 'caravan' 2019
'toyota' 'corrola' 2018
'cheverolet' 'camaro' 2017
tbl3 =
2×4 table
make model year mileage
__________ _________ ____ _______
'chrysler' 'caravan' 2019 50000
'toyota' 'corrola' 2018 45000
Sheetal Shirsat
Sheetal Shirsat le 16 Août 2019
Modifié(e) : Sheetal Shirsat le 16 Août 2019
Hi
Thanks for your reply,
In the demo that you have shown,
  1. Your Table2 should also have a mileage column.
  2. The final Table3, should contain sum of mileages from 'chrysler' 'caravan' 2019 for Table1 and Table2 ,
  3. Table3 should have 'cheverolet' 'camaro' 2017 mileage from only Table1 (as there is no data for this in table2).
thats is to say I want Table3 to contain same rows as Table1.
The innerjoin function doesnot give me the sum and it doesnot give me those variables which are in Tabel1 but not in Table2.
is there an alternative function that can be used ?
many thanks and regards,
Shital
Jon
Jon le 16 Août 2019
OK I misunderstood, what you meant by "add" in your description "I want to add 10th column of first data to the 9th column of second dataset." I thought you wanted to create an additional column of data. I didn't realize you really meant "sum".
I would have to think more about this, but skimming quickly it looks like @Guillaume probably has already answered your question.
Sheetal Shirsat
Sheetal Shirsat le 16 Août 2019
Hi Jon,
I tried Guillaume's method and its not working either because, the unique function he uses in his example isn't working for my datasets . And thats becase of the different rows in dataset1 and dataset2.
Any help is greatly appreciated!
thanks and regards,
Shital
Hi Shital,
If I am finally understanding what it is you are trying to do, I think this may be what you are looking for. Please use the attached example input tables and run this:
car1 = readtable('car1.xlsx')
car2 = readtable('car2.xlsx')
% find where rows match for selected columns
[liA,locB] = ismember(car2(:,1:2),car1(:,1:2),'rows')
% modify the second table so that it records total mileage of both tables
% for rows in second table that have a match in the first table
car2.mileage(liA) = car2.mileage(liA) + car1.mileage(locB(liA))
If this does what you want, I think you can fairly easily adapt it to your specific situation. In particular the column ranges would go 1:8 instead of 1:2 and your last column is not named mileage, so you would have to change that.
Sheetal Shirsat
Sheetal Shirsat le 20 Août 2019
Hi Jon,
This is Perfect !!
This is exactly what I was trying to do..
Thank you so very much for your time and help in this.. Greatly appreciate this..
best regards,
Shital
Jon
Jon le 20 Août 2019
Great to hear that this is working for you.
Sheetal Shirsat
Sheetal Shirsat le 20 Août 2019
Modifié(e) : Sheetal Shirsat le 20 Août 2019
Hi Jon,
Thanks for this,
Just a query !
If I have to change the name of only one of the variables in your Car1.xls file, say Subaru Forester to Subaru Forester_LongwheelBase can I do it using an if statement ?
thanks so much.
Best regards,
Shital
Jon
Jon le 20 Août 2019
Modifié(e) : Jon le 20 Août 2019
Sorry, I don't understand what you are asking. The names of the variables are make and model
The values of the variables are things like 'Chrysler', 'Subaru' for the make and 'Caravan' and 'Forester' for the model.
So do you really want to change the name of a variable or do you want to change the value of one of the variables?
What is the situation where you would use the if statement? Presumably some condition that is only true some of the time where you need to branch. I can't tell from your description where you would have a branching condition or how this fits into the overall problem.
Sheetal Shirsat
Sheetal Shirsat le 20 Août 2019
Modifié(e) : Sheetal Shirsat le 20 Août 2019
Hi,
So basically what I am looking to do is:
  1. Use an if statement to filter the 'Make' Subaru from Car1.xls file,
  2. Change the Model name from Forester to Forester_LongWheelBase and Forester_ShortWheelBase, and assign say 60% of the value under Mileage for Forester_LongWheelBase and 40% to Forester_ShortWheelBase ;
  3. This would mean that car1 would now ahve additional rows because of the Longwheelbase and ShortWheelBase seggregation.
Thanks for everything !
Shital
"Use an if statement to filter"
You rarely have to use if in matlab, and certainly not for filtering.
issubaru = strcmp(car1.Make, 'Subaru');
returns a logical vector telling you which rows have 'Subaru' for Make. Editing these rows is then:
car1.Model(issubaru) = 'Wathever you want';
I don't understand how you choose between long and short wheel base.
Jon
Jon le 20 Août 2019
Modifié(e) : Jon le 20 Août 2019
I am understanding that you want turn the one row with subaru forester into two rows subaru forester_shortWheelBase and subaru forester_longWheelBase and split the mileage for the two with 60% going to one and 40% going to the other. If so, you could do that like this.
As Guillaume suggests, using the strcmp is central, to locating the row to be manipulated.
% find the row with forester subaru
idx = find(strcmp(car1.make,'subaru')&strcmp(car1.model,'forester'))
% insert identical row
car1 = vertcat(car1(1:idx,:),car1(idx,:),car1(idx+1:end,:))
% modify the model names for the two rows
car1.model{idx} = 'forester_longWheelBase'
car1.model{idx+1} = 'forester_shortWheelBase'
% modify the mileage for the two rows
car1.mileage(idx) = 0.6*car1.mileage(idx) % round parenthesis for numerical entries
car1.mileage(idx+1) = 0.4*car1.mileage(idx+1)
Guillaume
Guillaume le 20 Août 2019
Modifié(e) : Guillaume le 20 Août 2019
Note that Jon's code only work if there's only one 'forester' entry in the whole table. I would have an assert to avoid wreaking havoc if it's not the case:
idx = find(...)
assert(numel(idx) == 1, 'less or more than one forester in the table');
The following would work regardless of the number of 'forester' entries:
isforester = strcmp(car1.make, 'subaru') & strcmp(car1.model, 'forester');
%however since it's unlikely that a forester is not a subaru you could just do:
%isforester = strcmp(car1.model, 'forester')
longbase = car1(isforester, :);
longbase{:, 'model'} = {'forester_longwheelbase'}; %scalar assignment to multiple rows
longbase.mileage = 0.6 * longbase.mileage;
shortbase = car1(isforester, :);
shortbase{:, model} = {'forester_shortbase'}; %scalar assignment to multiple rows
shortbase.mileage = 0.4 * shortbase.mileage;
car1 = [car1(~isforester, :); longbase; shortbase]
Sheetal Shirsat
Sheetal Shirsat le 20 Août 2019
Modifié(e) : Sheetal Shirsat le 20 Août 2019
Thanks Guillaume, You are right !
Yes, I was trying Jon's Code, but car1.model{idx} = 'forester_longWheelBase' only worked if there is only one row with Subaru Forester.
In case of multiple rows, say for example Subaru Forester 2013 and Subaru Forester 2018, then the code doesn't work .
I will try using your code and see if its working
Thanks Jon and Guillaume for your time
Shital
Jon
Jon le 20 Août 2019
Modifié(e) : Jon le 20 Août 2019
If you have more than one matching row then neither of our codes will work. It's just that Guillaume traps the problem with the assert (which is always good practice). I just wanted to give you a sketch of how the approach might be implemented, my objective wasn't to give you fully bullet proof code. You of course will need to do that for a production version.
You will have to modify the approach, by making a loop, and performing those same basic actions for each element of idx returned by the initial find.
idx = find(strcmp(car1.make,'subaru')&strcmp(car1.model,'forester'))
Basically just set up a for loop and where I now have idx you would instead have idx(k) where k is the for loop index. I think you should be able to work out the details.
I don't know if there is any advantage to deleteing the old rows and putting the new ones on the end as Guillaume does versus inserting them as I do, but either way accomplishes the same basic requirement.
Actually, it is probably simpler to take Guillaumes approach and add them to the end, but wait until you are all done with the loop before deleting out the old rows. Otherwise the indices that locate the rows that need to be changed could become invalid after the first loop iteration as everything will have shifted.
Guillaume
Guillaume le 20 Août 2019
Modifié(e) : Guillaume le 20 Août 2019
Huh? Unless I've overlooked something, my code works regardless of the number of 'forester' (0, 1 or more)
edit: Do'h! Of course, I used the wrong indexing for the model assignment. I even tested the correct indexing beforehand and then forgot all about it. Fixed now.
Jon
Jon le 20 Août 2019
Sorry Guilaaume. I read your response too quickly, and just saw the assert, and didn't notice that the later code handles multiple matching rows.
Guillaumes approach is much cleaner than my suggestion of putting it in a loop. Definitely do it that way!
I think you should be all set now.
Sheetal Shirsat
Sheetal Shirsat le 21 Août 2019
This is Perfect !.
Thank you so very much Guillaume and Jon for your time and help.. ! Much appreciated !.
Best regards,
Shital

Connectez-vous pour commenter.

Plus de réponses (1)

Guillaume
Guillaume le 16 Août 2019
Modifié(e) : Guillaume le 16 Août 2019
Something like this should work:
dataset1 = readtable('C:\somewhere\your1stexcelfile.xlsx'); %may need extra options, depending on the excel file
dataset2 = readtable('C:\somewhere\your2ndexcelfile.xlsx'); %may need extra options, depending on the excel file
assert(width(dataset1) == 10 & width(dataset2) == 9, 'Dataset content does not match your description')
dataset1 = [rowfun(@(varargin) strjoin(varargin, ''), dataset1, 'InputVariables', 1:8, 'ExtractCellContents', true), dataset1(:, 9:10)];
dataset2 = [rowfun(@(varargin) strjoin(varargin, ''), dataset2, 'InputVariables', 1:8, 'ExtractCellContents', true), dataset2(:, 9)];
joined = outerjoin(dataset1, dataset2, 'Keys', 1, 'MergeKeys', true, 'Type', 'right')
edit: rereading the description, it sounds like a right outerloin, not a left one

3 commentaires

Sheetal Shirsat
Sheetal Shirsat le 16 Août 2019
Modifié(e) : Guillaume le 16 Août 2019
Hi,
I get an error for "Dataset content does not match your description" using the above code..
I am using something like this
[~, ~, raw1] = xlsread('ROE_Volume.xlsx') ;
to read my data 1 and 2.. Both the datasets have the same column descriptions , except the number of rows. Datset2 has less rows than Dataset1.
I am using "ismember" for comparing the two datasets description , but when using the logical (in my case n==1) to be true I cannot add numbers from column 10 of dataset1 to column 9 of dataset2.
Please help.
Thanks.
Shital
Guillaume
Guillaume le 16 Août 2019
Modifié(e) : Guillaume le 16 Août 2019
Yes, I make sure that dataset1 has indeed 10 columns and dataset2 has indeed 9 columns as you've described. Clearly, if matlab throws the error, this is not the case. The number of rows is unimportant for the code, so I never check that.
As I said, it is trivial to convert cell arrays into table. At this point, you can use the join functions that do all the hard work for you:
%inputs
%raw1: a MxN cell array
%raw2: a Px(N-1) cell array
%note that columns 1:N-2 are used as keys in order to add column N of raw1 as new column N in raw2
traw1 = cell2table(raw1);
traw2 = cell2table(raw2);
tmerged = outerjoin(traw1, traw2, 'Keys', 1:size(raw1, 2)-2, 'MergeKeys', true, 'Type', 'right');
%if result is desired as a cell array
merged = table2cell(merged);
The same can be achieved the old fashioned way, with ismember indeed. However, you can't use ismember will 2D cell arrays of char vectors, so you'd have to merge each row of text as you initially requested, or assign a unique numeric id to each char vector. I'm choosing the latter option here:
%generation of numeric id for each cell array
uniquetext = unique([raw1(:, 1:end-2); raw2(:, 1:end-1)]);
[~, raw1key] = ismember(raw1, uniquetext);
[~, raw2key] = ismember(raw2, uniquetext);
%now we can use ismember to find which keys are present in both sets
[found, where] = ismember(raw2key, raw1key, 'rows');
merged = [raw2, num2cell(NaN(size(raw2, 1), 1))]; %prepare destination by adding a column of NaN
merged(found, end) = raw2(where(found), end); %and copy relevant elements
Guillaume
Guillaume le 19 Août 2019
Modifié(e) : Guillaume le 19 Août 2019
I tried Guillaume's method and its not working either because, the unique function he uses in his example isn't working for my datasets
"doesn't work" is a useless statement if you don't provide more details. What happens? Or doesn't happen? What if the full error message if there is one?
We're guessing what your inputs are and keep guessing wrong as you don't give us the full information. A simple way to resolve that and get an answer that works for you is to attach a mat file with example inputs.

Connectez-vous pour commenter.

Produits

Version

R2017b

Tags

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by