MATLAB Answers

How do I isolate rows from a table, containing the same column value?

3 views (last 30 days)
I received a very large spreadsheet document with data of students, per student the columns exist of name, school, date of birth etc. I imported this spreadsheat as a table in matlab, and set all columns as strings.
The goal is to extract rows of students who went to the same highschool. Let's say this value is in collumn 3 and is alphabetically ordered.
I tried this in matlab by iterating and trying to save the students in a new table when the schools match. I found a solution where I have to copy one school in my code and then it gives a clean table with all students who went to that school.
But since there are 3000 different schools in the document it would be nice let a loop iterate over all schools and do all the work in one time.
This is what I thought would have worked but it didnt't:
MASSADOC = sortrows(MASSADOC,'school','ascend');
T=table;
Names=strings;
for i =1:15310%there are 15310 rows in my document
Names(i)=MASSADOC.Omschrijvingin(i); %to get a string array with all schools
end
U=unique(Names); %a string array with all schools 1 time in it
for i = 1:size(U)
x=U(i);
for j = 1:15310
if MASSADOC.school(j)==U(i)
T(j,:)=MASSADOC(j,:);
else
end
writetable(T, x+'.xlsx','Sheet',1);
end
end

  3 Comments

Mikel Spillemaekers
Mikel Spillemaekers on 5 Dec 2019
Massadoc2 is just a part of a copy of MASSADOC. So in the code MASSADOC should be Massadoc2 in this case.

Sign in to comment.

Accepted Answer

Guillaume
Guillaume on 5 Dec 2019
Edited: Guillaume on 5 Dec 2019
There is no need to sort the table beforehand:
[group, schoolname] = findgroups(MASSADOC.school); %get unique schools and assign unique group to each one
for g = 1:numel(schoolname) %iterate over each school/group
writetable(MASSADOC(group == g, :), sprintf('%s.xlsx', schoolname{g})); %and save the rows that match the group
end

  6 Comments

Show 3 older comments
Guillaume
Guillaume on 5 Dec 2019
It's very rare that you need if in matlab, and typically the way beginners use if makes the code more complicated.
Here, the simplest way to do this is first to create a new table with only the 'Belgium' schools then use the same code as above with that table:
destinationfolder = 'C:\somewhere\somefolder';
Belgium_massadoc = MASSADOC(strcmp(MASSODOC.SomeVariable, 'Belgium', :); %keep only the rows for which SomeVariable is 'Belgium'
[group, schoolname] = findgroups(Belgium_massadoc.school); %get unique schools and assign unique group to each one
for g = 1:numel(schoolname) %iterate over each school/group
writetable(Belgium_massadoc(group == g, :), fullfile(destionationfolder, sprintf('%s.xlsx', schoolname{g}))); %and save the rows that match the group
end
However, if you want to do that for each country a slightly different approach would be better (basically same code as the original but with a different findgroups).
Mikel Spillemaekers
Mikel Spillemaekers on 6 Dec 2019
This works like a charm! Thank you very much. I think I'm too much thinking in a Python or Java way with the usage of If and For.
Do you have any idea how big excel files can go that matlab can handle? or does this just depend on you computer power?
Guillaume
Guillaume on 6 Dec 2019
Ultimately, it all depends on your computer but most likely, Excel will struggle on a big excel sheet before matlab does since excel needs to keep in memory not only the cell values, but their formatting, formulae and other properties which matlab doesn't store.
In addition, in matlab you can always resort to datastore and tall arrays to deal with data that would never fit in memory.

Sign in to comment.

More Answers (0)

Sign in to answer this question.