Error encountered when using Datastores on non-rectangular data sets

9 vues (au cours des 30 derniers jours)
Alexander
Alexander le 14 Août 2023
Commenté : dpb le 15 Août 2023
Hello, I have been trying to use the following code to create a datastore for some data that has 4 rows of non-data headers at the top of the file, however I keep getting an error.
function bigDataTest(fileLoc)
% This function takes in a folder and extracts all the data into MATLAB
files = dir(fileLoc)
location = fullfile({files(3:end).folder}, {files(3:end).name});
ds = datastore(location, 'type', 'spreadsheet', 'FileExtensions', '.XLSB', 'NumHeaderLines', 4)
end
When I try to run this, I get an error that says
"Error using datastore. Unable to determine range. Range must be of the form 'A1' (cell), 'A:B' (column-select), '1:5' (row-select), 'A1:B5' (rectangle-select), or a valid named range in the sheet."
I'm assuming that this is because datastore needs a rectangular dataset to run, but the set (with the four headers removed) is rectangular. Any help is appreciated!
Thanks.

Réponses (1)

dpb
dpb le 14 Août 2023
function bigDataTest(fileLoc)
% This function takes in a folder and extracts all the data into MATLAB
ds=spreadsheetDatastore(fileLoc,"FileExtensions",".xlsb","NumHeaderLines",4);
...
end
As is, the function is useless; it doesn't return anything to the caller; ergo, it can't affect anything.
You have to either return the datastore object and use it elsewhere to actually access the data or as indicated by the ellipses I left, go ahead and use it inside the function as indicated by the comment.
Then, of course, it either has to be fully contained and do everything including outputting whatever results are expected or return that data to the caller or it's still pointless...
  3 commentaires
dpb
dpb le 15 Août 2023
The above seems to work just fine here, I tried it for a local folder containing a collection of spreadsheets as follows
>> fnF='C:\SCCC Foundation\Financial';
>> fn=fullfile(fnF,'2019');
>> fs=spreadsheetDatastore(fullfile(fn,'DistF*.xlsx'),"NumHeaderLines",4)
fs =
SpreadsheetDatastore with properties:
Files: {
'C:\SCCC Foundation\Financial\2019\DistFunds_2019_01_Jan.xlsx';
'C:\SCCC Foundation\Financial\2019\DistFunds_2019_02_Feb.xlsx';
'C:\SCCC Foundation\Financial\2019\DistFunds_2019_03_Mar.xlsx'
... and 9 more
}
Folders: {
'C:\SCCC Foundation\Financial\2019'
}
AlternateFileSystemRoots: {}
Sheets: ''
Range: ''
Sheet Format Properties:
NumHeaderLines: 4
VariableNamingRule: 'modify'
ReadVariableNames: true
VariableNames: {'Var1', 'Var2', 'Var3' ... and 17 more}
VariableTypes: {'double', 'double', 'double' ... and 17 more}
Properties that control the table returned by preview, read, readall:
SelectedVariableNames: {'Var1', 'Var2', 'Var3' ... and 17 more}
SelectedVariableTypes: {'double', 'double', 'double' ... and 17 more}
ReadSize: 'file'
OutputType: 'table'
RowTimes: []
Write-specific Properties:
SupportedOutputFormats: ["txt" "csv" "xlsx" "xls" "parquet" "parq"]
DefaultOutputFormat: "xlsx"
>>
For a specific set of files and then with the whole folder
>> fs=spreadsheetDatastore(fullfile(fn),"FileExtensions",'.xlsx',"NumHeaderLines",4)
fs =
SpreadsheetDatastore with properties:
Files: {
'C:\SCCC Foundation\Financial\2019\2019 Budget.xlsx';
'C:\SCCC Foundation\Financial\2019\2019 CompGainLoss.xlsx';
' ...\Financial\2019\2019 IncStmt Bal Sheet_2019_12_Dec.xlsx'
... and 46 more
}
Folders: {
'C:\SCCC Foundation\Financial\2019'
}
AlternateFileSystemRoots: {}
Sheets: ''
Range: ''
Sheet Format Properties:
NumHeaderLines: 4
VariableNamingRule: 'modify'
ReadVariableNames: true
VariableNames: {'Income', 'Var2', 'Var3' ... and 37 more}
VariableTypes: {'char', 'double', 'double' ... and 37 more}
Properties that control the table returned by preview, read, readall:
SelectedVariableNames: {'Income', 'Var2', 'Var3' ... and 37 more}
SelectedVariableTypes: {'char', 'double', 'double' ... and 37 more}
ReadSize: 'file'
OutputType: 'table'
RowTimes: []
Write-specific Properties:
SupportedOutputFormats: ["txt" "csv" "xlsx" "xls" "parquet" "parq"]
DefaultOutputFormat: "xlsx"
>>
It doesn't appear the the datastore object really cares about the file content itself; it just constructs the datastore object as told; it'll then be up to the functions that use the object to be consistent with its definitions in reading the file and then is when something that isn't the same would show up.
So, don't think it's the above code that is the problem by itself.
Unless there's something that is version specific and an earlier release doesn't support the 'NumHeaderLines' option for a spreadsheet file. I'm using R2021b(R3) here...
dpb
dpb le 15 Août 2023
The error message is indicative of one from readtable and friends; I've seen that or very similar if attempting to use 'NumHeaderLines' on spreadsheet files; the doc says it is only available for text files, not spreadsheets and .xlsb is a spreadsheet format.
I was surprised to see that the spreadsheet datastore accepted it with the spreadsheet extension.
Show us the whole error message in context with the code that generated it; the datastore creation itself won't give me that error although if I try to set it arbitrarily for a set of files it discovers the files don't all then fit a given number of header lines and it then can't find variable names or other parsing problems.
It could/would help to see a typical file you're trying to process and how you're then trying to read it, not just this one line in isolation.

Connectez-vous pour commenter.

Produits


Version

R2022b

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by