
Building a tall table from tall arrays generates an error

Harry Cho on 14 Mar 2023
Commented: Harry Cho on 15 Mar 2023
clear
dataFile = 'data.csv';
ds = tabularTextDatastore(dataFile, FileExtensions='.csv');
ds.ReadVariableNames = true;
ds.Delimiter = ',';
ds.SelectedVariableNames = ["hash", "count"];
ds.SelectedFormats = {'%s', '%f'};
data = tall(ds);
% Starting parallel pool (parpool) using the 'Processes' profile ...
% Connected to the parallel pool (number of workers: 2).
[g, THash] = findgroups(data.hash);
TCount = splitapply(@(x) {x}, data.count, g);
%% This works, but I cannot use it because the actual data file is far larger than memory
hash = gather(THash);
% Evaluating tall expression using the Parallel Pool 'Processes':
% - Pass 1 of 1: Completed in 1.9 sec
% Evaluation completed in 2.8 sec
count = gather(TCount);
% Evaluating tall expression using the Parallel Pool 'Processes':
% - Pass 1 of 3: Completed in 0.54 sec
% - Pass 2 of 3: Completed in 0.46 sec
% - Pass 3 of 3: Completed in 0.58 sec
% Evaluation completed in 2.3 sec
T1 = table(hash, count);
%% This is the intended code, but it doesn't work
TT = table(THash,TCount);
% Error using tall/table
% Incompatible non-scalar tall array arguments. Each of the tall arrays must be the same size in the first dimension, must be derived from a single tall array, and must not have been indexed differently in the first dimension (indexing operations include functions such as VERTCAT, SPLITAPPLY, SORT, CELL2MAT, SYNCHRONIZE, RETIME and so on).
write(fullfile(pwd,'data'),TT,FileType="parquet");

Answers (1)

Oguz Kaan Hancioglu on 15 Mar 2023
Your code doesn't work because gather(TCount) returns a cell array with one cell per element. You are therefore trying to put a double array into a single cell. Instead, you can find the length of each array in the cell. I hope this solves your problem.
%% This works, but I cannot use it because the actual data file is far larger than memory
hash = gather(THash);
count = gather(TCount);
% Size of the array stored in each cell
cellsz = cellfun(@size,count,'uni',false);
% Keep only the first dimension, i.e. the length of each array
newCount = cellfun(@(x) x(1),cellsz,'UniformOutput',false);
T1 = table(hash, newCount);
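If only the length of each cell's array is needed, as in the answer above, cellfun(@numel, ...) returns the lengths as a plain double column in one step, without the intermediate cell of sizes. This is a minimal in-memory sketch; the hash and count values are made-up stand-ins for the gathered THash and TCount, not the real data, and groupLength is a hypothetical name.

```matlab
% Hypothetical stand-in for the gathered results: one cell per group,
% each holding a double column vector of a different length.
hash  = ["a"; "b"; "c"];
count = {[1; 2; 3]; 4; [5; 6]};

% numel returns one scalar per cell, so UniformOutput can stay true
% and the result is a double column rather than a cell array.
groupLength = cellfun(@numel, count);

T1 = table(hash, groupLength);
disp(T1)
```

Note that this keeps only the lengths; the original arrays themselves are discarded.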
1 comment
Harry Cho on 15 Mar 2023
Thank you for the reply. Unfortunately, I need to collect a cell array in which each cell holds a double array of a different length. My question is why this works for the in-memory table T1 but not for the tall table TT.
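For in-memory arrays, table only requires every variable to have the same number of rows, so a cell column whose cells hold double arrays of different lengths is accepted. The tall version adds the stricter conditions quoted in the error message: the tall arrays must be derived from a single tall array and must not have been indexed differently in the first dimension (SPLITAPPLY is listed as such an operation). A minimal in-memory sketch with hypothetical values:

```matlab
% Hypothetical values: two groups, each storing a double column of a different length
hash  = ["a"; "b"];
count = {[1; 2; 3]; 4};

% In memory, table only checks that both variables have the same number of rows (2)
T1 = table(hash, count);

% Each row of the count variable still holds its own array
disp(T1.count{1})
```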


Version

R2022b