Reading multiple sheets from an excel file

793 vues (au cours des 30 derniers jours)
JMG
JMG le 24 Juil 2021
Commenté : dpb le 25 Juil 2021
I have an excel spreadsheet and the data is seperated into a different sheet for each year but I want to have all of the years worth of data in one table so I was wondering what the best way to approach this is? Right now my code only gets the data from 2007 and thats it.
opt=detectImportOptions('BC1 2007-2020.xlsx');
opt.VariableTypes(3)={'double'};
BC1=readtable('BC1 2007-2020.xlsx',opt);
BC1.Date=datetime(BC1{:,1},'InputFormat','dd/MM/yy HH:mm:ss');
BC1=table2timetable(BC1);
plot(BC1.Date,BC1.WaterLevel_Meters);

Réponse acceptée

dpb
dpb le 24 Juil 2021
Modifié(e) : dpb le 24 Juil 2021
You'll just have to loop over the sheets; use sheetnames to return the list of sheets in the given file and iterate over it--presuming the sheets are in the desired order in the workbook, they'll be returned in chronological order; if not you can always sort() it first.
Presuming the sheets are also in the same format, you can use the same import object for each, changing only the sheet name/number although naming the sheet explicitly in the readtable call overrides the import options value.
I'll note that you could also incorporate the datetime variable type and input format into the options object thus saving the explicit conversion code.
Since it's not known a priori (I presume) how many records are in each sheet, the simplest coding is to simply dynamically catentate the new table onto the existing one...
fn='BC1 2007-2020.xlsx'; % using a fully-qualified filename here would be good practice
tBC=[];
opt=detectImportOptions(fn);
% wanted/need options code/fixup here...
shts=sheetnames(fn);
for i=1:numel(shts)
tBC=[tBC;readtable(fn,opt,'Sheet',shts(i))];
end
...
Do the conversion to timetable from all only once at the end as well as the time conversion if choose to not use the import options route for import for it...
While dynamic allocation is not the most elegant solution, it is by far the simplest and as long as the sizes and numbers of the sheets to be imported isn't too large, the overhead will be not too painful -- particularly if only doing the reading once or a few times and mostly using the resulting data. Of course, once read, one can save the table as a .mat file and retrieve the whole thing very quickly; only when new data are added would it be necessary to reread the raw file again.
  5 commentaires
JMG
JMG le 24 Juil 2021
Thank you I actually just found the document with the data all formatted together so I no longer need to do any of this! I appreciate your help though and now I know for future reference.
dpb
dpb le 24 Juil 2021
I downloaded and looked at the file --
IMO, you should fix the broken file -- the date,time are sometimes in column A, sometimes A+B and there's other meta-data in the water level header line in some not in others but either way, it'll break if read the names.
That you can work around by not reading the variable names at all and just using the column numbers/generic Var1...VarN names then set the final variable names in the table properties struct.
You can write code to read either date, time or datetime as one string, but I still think it would be better to fix these data and then ensure create future in the same format going forward instead.
But, the other way would be to use detectImportOptions in turn on each file inside the loop and see how many variables it returns and then handle the different cases.
That also means have to convert to datetime for each file in turn using the appropriate code to convert the datetime and add the duration for those with both (or join the two strings) first. I didn't notice if the time/date formats were the same across sheets or not -- that could be another sticky wicket if somebody did those differently, too.

Connectez-vous pour commenter.

Plus de réponses (1)

Image Analyst
Image Analyst le 24 Juil 2021
Try this:
baseFileName = 'BC1 2007-2020.xlsx'; % using a fully-qualified filename here would be good practice
tblAllSheetData = [];
% opt=detectImportOptions(baseFileName);
% wanted/need options code/fixup here...
sheetNames = sheetnames(baseFileName)
for k = 1 : numel(sheetNames)
thisTable = readtable(baseFileName,'Sheet',sheetNames{k});
fprintf('Read a table of %d rows and %d columns from a sheet names %s.\n',...
height(thisTable), width(thisTable), sheetNames{k});
% Stitch onto the end
tblAllSheetData = vertcat(tblAllSheetData, thisTable);
end
but it will fail when it gets to sheet 2008, which has 5 columns instead of only 3. How do you want to handle the situation where some sheets don't have the same number of columns as other sheets? Getting to the root of the problem . . . why DON'T all sheets have the same format?
  2 commentaires
Image Analyst
Image Analyst le 25 Juil 2021
@JMG, are you going to answer these questions?
dpb
dpb le 25 Juil 2021
https://www.mathworks.com/matlabcentral/answers/885144-reading-multiple-sheets-from-an-excel-file#comment_1654434 comment above to my response of essentially same content that he located an existing file containing all the data so the Q? became moot...

Connectez-vous pour commenter.

Catégories

En savoir plus sur Data Import from MATLAB dans Help Center et File Exchange

Tags

Produits


Version

R2021a

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by