How to join multiple Excel files of varying lengths

Hello
So I have a folder containing a large number of Excel files. Each file has one sheet only, and the data is located in the 1st column. However the number of rows of data varies between the files. I would like to join all of these files into a single variable in Matlab, such that the number of columns is equal to the number of files, and the number of rows in each column corresponds to the data in the file. Empty cells can be filled with NaN.
The objective is that I have a number of such folders, each represents certain physical measurement. I would then like to conduct mathematical operations between the variables once I get all files/folders on to Matlab.
Thank you

5 commentaires

dpb
dpb le 5 Oct 2022
This will be straightforward, start with returning the dir() struct with the file names and read each sheet into a cell array by column. Once you have that cell array for all files, you can then find the longest file amongst the lot and augment each to that length. That done, then cell2mat will convert the whole thing to an array and you're done...
Give it a shot; post what you have and where you run into a problem.
There are many Answers on the forum processing multiple files and a section in the doc with some examples as well to get you going...
So I tried using the following code:
files = dir('*.csv');
for i=1:length(files)
combined_array(:,i)=load(files(i).name, '-ascii');
end
However, Matlab terminates the loop as soon as it encounters a file which has a different number of rows compared to the previous iteration. I have about 30-40 files in each folder, so not possible to do it manually.
dpb
dpb le 5 Oct 2022
Modifié(e) : dpb le 5 Oct 2022
Because in
combined_array(:,i)=load(files(i).name, '-ascii');
combined_array is NOT a cell array as needs be to hold disparate sized arrays...
I wouldn't call .csv files "Excel files" even though you may open them using Excel; use readmatrix to return the data to each...
combined_array{:,i}=readmatrix(files(i).name);
NOTA BENE "the curlies" {} around the index expression.
The above after that step would be equivalent to @Cris LaPierre's suggestion of a datastore and findall; the cell array; that is a cute trick I don't think about often.
The Q? then is how you want to process the results; if you were operating only on each independently, then that would be a good place to leave off; if, however, as your original Q? said, you want to operate across the various columns together, then the step to augment to a full 2D array (or truncate to the shortest instead, maybe?) is going to be more convenient final result I'll wager.
HINT cellfun and an anonymous function will turn out to be useful in next step, no doubt...
L=cellfun(@numel,combined_array); % determine length of each array in cell array
Thank you both for your comments. I'll post my results soon.
Hello dpb
Thanks for your comments, it worked.

Connectez-vous pour commenter.

 Réponse acceptée

Here's something based on using fileDataStore. This assumes your data is numeric, as it uses readmatrix to load the data from your spreadsheets.
dsFiles = fileDatastore("file*.xlsx","ReadFcn",@readmatrix);
data = readall(dsFiles);
D = cellfun(@length,data)
L = max(D)
for a = 1:length(D)
C = nan(L-D(a),1);
out(:,a) = [data{a};C];
end

Plus de réponses (1)

Mathieu NOE
Mathieu NOE le 6 Oct 2022
Modifié(e) : Mathieu NOE le 6 Oct 2022
hello
try this :
fileDir = pwd; % current directory (or specify which one is the working directory)
outfile = 'OUT.xlsx'; % output file name
S = dir(fullfile(fileDir,'*.xlsx')); % get list of data files in directory
S = natsortfiles(S); % sort file names into natural order , see :
%(https://fr.mathworks.com/matlabcentral/fileexchange/47434-natural-order-filename-sort)
out_data = [];
max_nn = 0;
for k = 1:length(S)
filename = S(k).name % to actually show filenames are sorted (see command window)
out = readmatrix( fullfile(fileDir, filename),"Range",'A:A'); % extract column 1 (A) of each xlsx files
nn = numel(out);
oo = size(out_data,1);
% case 1 : out_data longer (strictly) than new data (out)
if oo<nn % need to padd some NaN's to out_data before doing the final horizontal concatenation
out_data = [out_data; NaN(nn-oo,size(out_data,2))]; % vertical concatenation
end
% case 2 : out_data shorter (strictly) than new data (out)
if oo>nn % need to padd some NaN's to out before doing the final horizontal concatenation
out = [out; NaN(oo-nn,1)]; % vertical concatenation
end
% case 3 : out_data same length as new data (out)
% nothing to do !!
out_data = [out_data out]; % final horizontal concatenation
end
% store out_data in excel file
writematrix(out_data,fullfile(fileDir,outfile));

2 commentaires

Hi Mathieu
The program cannot determine this command 'natsortfiles'. Do you know what could be the issue?
Regards
hello Murtaza
as I wrote in my comments , this is a function usefull to make sure the files are sorted out correctly (what matlab does not by default)
so look at the File Exchange (FEX in short) and get this excellent submission :
all the best

Connectez-vous pour commenter.

Produits

Version

R2021b

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by