Efficient access and manipulation of arrays in nested cells

I have nested cells of the form mycell{i}{j,k}, with an array in each of those cells. I have not found working examples of operations like getting a statistic (e.g., the max) of all the arrays without a loop, to return something like cellstat(i,j,k). Another example: I'm performing a fit with each array, and it would be nice to gather one of the goodness-of-fit stats into a single array, or to take stats of a goodness-of-fit value across i so I can see it at each j,k.
I think with an example of each of those, I could figure out anything else that comes up. Thanks!
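For the first request (a stat of every array gathered into cellstat(i,j,k)), here is a minimal sketch with placeholder random data; it assumes every mycell{i} has the same j,k size. cellfun handles the j,k level, though a short loop over i remains:

```matlab
% Hypothetical example data: mycell{i}{j,k} each holds a numeric vector
I = 2; J = 2; K = 3;
mycell = cell(I,1);
for i = 1:I
    % J-by-K cell array of random-length vectors
    mycell{i} = arrayfun(@(~) rand(randi(10),1), zeros(J,K), 'UniformOutput', false);
end

% Gather the max of every array into cellstat(i,j,k)
cellstat = zeros(I,J,K);
for i = 1:I
    cellstat(i,:,:) = cellfun(@max, mycell{i});  % cellfun covers the j,k level
end
```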
**********************
Adding an example:
data = rand(2e5,1); % one data set, I have many
datay = rand(2e5,1); % y-coordinate of the data
dataz = rand(2e5,1); % z-coordinate of the data
The first task with this data is to create a grid of y,z pairs and sort each data set into it. Since rand is [0,1], say the grid is every 0.1. This only has to be done once, but I suppose how the data are stored could affect the speed of future steps.
After that, I'm doing a windowed fit on the points that are sorted into each y,z bin for each dataset. There may be some trial and error here, and, while I can test on subsets, it would be helpful if the data are structured in a way that makes the fitting routine as fast as possible. Would any more information be useful?

8 comments

MATLAB does not support cell range dereferencing. I'd suggest that a very small illustrative sample would help in seeing about clever ideas and/or alternate storage schemes.
Sometimes with things of this nature, just because one can write complex referencing expressions, it still doesn't mean one should. :J)
I was afraid of that...
In this scheme the i's are all different datasets and I'm grouping the data of each set into a j,k grid. Each j,k will be a different size, so I need cells. I guess they could be mycell{i,j,k} if that's easier? Otherwise, I guess I need to do things in a loop, right?
mycell{i,j,k} takes a lot more space and is a lot less efficient than mycell{i}(j,k)
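A rough way to check this storage claim on your own machine is to build both layouts and compare with whos (empty cells shown here; actual byte counts depend on contents and MATLAB release):

```matlab
% Flat scheme: mycell{i,j,k}
flat = cell(10,10,10);

% Nested scheme: mycell{i}{j,k}
nested = cell(10,1);
for i = 1:10
    nested{i} = cell(10,10);
end

w = whos('flat','nested');
fprintf('%s: %d bytes\n', w(1).name, w(1).bytes);
fprintf('%s: %d bytes\n', w(2).name, w(2).bytes);
```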
data = rand(2e5,1); % one data set, I have many
datay = rand(2e5,1); % y-coordinate of the data
dataz = rand(2e5,1); % z-coordinate of the data
What, specifically, does "create a grid of y,z pairs and sort each data set into those" mean when there are only as many points in the data array as in each y,z array?
So each data point has a y and z coordinate. Imagine taking the y and z coordinates and binning them. In this example, with rand being from 0 to 1, you might put them in 0.1 sized bins. That would make 100 bins defined by their lower bounds as: (0,0), (0,0.1), (0,0.2)...(0.1,0), (0.1,0.1)...etc. Make sense?
Yeah, but there are two position vectors and only a single point for each, so there can't be a y-z "grid": points are defined only at each combined location. What is the definition of what to do with a point if the y bin is 28 but the z bin is 90? Or how are they defined jointly?
nsets = 100; % this is how many datasets there are, so data = cell(nsets,1) and each data{m} = rand(2e5,1)
yrange = 0:0.1:1;
zrange = 0:0.1:1;
% assume preallocation of yz_data, but not shown
for m = 1:nsets
for k = 1:length(yrange)-1
for l = 1:length(zrange)-1
yz_data{m}{k,l} = data{m}(datay{m} > yrange(k) & datay{m} < yrange(k+1) & dataz{m} > zrange(l) & dataz{m} < zrange(l+1));
end
end
end
This is what I did. I think what I'm trying to ask (sorry for the confusion) is whether there's a storage scheme that will speed up future access.
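One alternative to testing every point against every bin is to compute each point's bin index once and split with accumarray. This is only a sketch, with placeholder data standing in for one dataset, and note that accumarray does not guarantee the original point order within a bin:

```matlab
% Placeholder setup for one dataset (m = 1)
m = 1;
data  = {rand(2e5,1)};
datay = {rand(2e5,1)};
dataz = {rand(2e5,1)};
yrange = 0:0.1:1;
zrange = 0:0.1:1;
yz_data = cell(size(data));

% Bin index of every point, computed once
[~,~,~,ky,kz] = histcounts2(datay{m}, dataz{m}, yrange, zrange);
ok = ky > 0 & kz > 0;                 % guard against out-of-range points
% Split the data points into a 10x10 cell by bin
yz_data{m} = accumarray([ky(ok) kz(ok)], data{m}(ok), ...
    [numel(yrange)-1, numel(zrange)-1], @(v){v}, {zeros(0,1)});
```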
OK, I let the "grid" and the initial structure stuff confuse me...@Voss got back before I did and answered the basics; as he points out, there's no reason to create excessively complex storage structures; use the data the way it comes. I'd still be looking into how the data are initially created and what the multiple cases for further consolidation are, but if there really are 2e5 points per dataset, it's probably not practical to actually combine anything until it's time to summarize results.
The only other thing, compared to @Voss's approach, is you might see how
N = 10;
edges = linspace(0,1,N+1);
iyz = discretize([datay dataz],edges);
performs compared to histcounts2. It returns the indices by column in one output array and uses the same binning in both directions, so it isn't quite as flexible, but it might be a little faster; although, given the tasks so far, I don't see performance being a big issue if you don't make things more difficult than need be... :J>
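A quick sanity check (hypothetical data) that the two functions agree on the bin indices when the same edges are used in both directions:

```matlab
datay = rand(1000,1);
dataz = rand(1000,1);
N = 10;
edges = linspace(0,1,N+1);

iyz = discretize([datay dataz], edges);                  % both columns at once
[~,~,~,iy,iz] = histcounts2(datay, dataz, edges, edges);
same = isequal(iyz, [iy iz]);                            % should be true for in-range data
```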


Accepted Answer

data = rand(2e5,1); % one data set, I have many
datay = rand(2e5,1); % y-coordinate of the data
dataz = rand(2e5,1); % z-coordinate of the data
"The first task with this data, is to create a grid of y,z pairs and sort each data set into those. Since rand is [0,1], say the grid is every 0.1.... how the data are stored could affect the speed of future steps"
Store the bin index of each data point, so you know what bin each data point belongs to. (It's not necessary to make a new copy of the data with a different structure.)
NY = 10;
NZ = 10;
yedges = linspace(0,1,NY+1);
zedges = linspace(0,1,NZ+1);
[~,~,~,yidx,zidx] = histcounts2(datay,dataz,yedges,zedges);
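With the bin indices stored, pulling out the points of any one bin is a single logical index; for instance (bin numbers and placeholder data here are arbitrary):

```matlab
% Continuing the idea above, with placeholder data
data = rand(2e5,1); datay = rand(2e5,1); dataz = rand(2e5,1);
[~,~,~,yidx,zidx] = histcounts2(datay, dataz, linspace(0,1,11), linspace(0,1,11));

j = 3; k = 5;                          % an example y,z bin
sel = (yidx == j) & (zidx == k);
d_bin = data(sel);                     % the data points in that bin
y_bin = datay(sel);
z_bin = dataz(sel);
```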
"After that, I'm doing a windowed fit on the points that are sorted into each y,z bin for each dataset."
Maybe something like the following. groupsummary uses the bin indices found in the previous step:
function out = your_fit_function(d,y,z)
[f,gof] = fit([y,z],d,'poly11');
out = {{f,gof}};
end
[C,BG] = groupsummary({data,datay,dataz},[zidx,yidx],@your_fit_function);
Now you have an sfit object and goodness-of-fit struct, returned from fit, for each grid cell:
C{1}
ans = 1x2 cell array
{1x1 sfit} {1x1 struct}
C{1}{:}
ans =
     Linear model Poly11:
     ans(x,y) = p00 + p10*x + p01*y
     Coefficients (with 95% confidence bounds):
       p00 =      0.5103  (0.4767, 0.5439)
       p10 =    -0.09779  (-0.5436, 0.348)
       p01 =     -0.1282  (-0.559, 0.3026)
ans =
  struct with fields:
           sse: 170.9652
       rsquare: 2.5376e-04
           dfe: 2035
    adjrsquare: -7.2879e-04
          rmse: 0.2898
And you can do what you want with those:
for ii = 1:3%numel(C)
fprintf(1,'region %0.1f<y<%0.1f, %0.1f<z<%0.1f:\n\n', ...
yedges(BG{2}(ii)),yedges(BG{2}(ii)+1),zedges(BG{1}(ii)),zedges(BG{1}(ii)+1));
fprintf(1,' fit object:\n');
disp(C{ii}{1})
fprintf(1,' goodness:\n');
disp(C{ii}{2})
fprintf(1,' \n');
end
region 0.0<y<0.1, 0.0<z<0.1:
 fit object:
     Linear model Poly11:
     (x,y) = p00 + p10*x + p01*y
     Coefficients (with 95% confidence bounds):
       p00 =      0.5103  (0.4767, 0.5439)
       p10 =    -0.09779  (-0.5436, 0.348)
       p01 =     -0.1282  (-0.559, 0.3026)
 goodness:
           sse: 170.9652
       rsquare: 2.5376e-04
           dfe: 2035
    adjrsquare: -7.2879e-04
          rmse: 0.2898
region 0.1<y<0.2, 0.0<z<0.1:
 fit object:
     Linear model Poly11:
     (x,y) = p00 + p10*x + p01*y
     Coefficients (with 95% confidence bounds):
       p00 =       0.505  (0.434, 0.576)
       p10 =     -0.1254  (-0.5669, 0.316)
       p01 =     0.04957  (-0.3817, 0.4809)
 goodness:
           sse: 162.3938
       rsquare: 1.8595e-04
           dfe: 1961
    adjrsquare: -8.3374e-04
          rmse: 0.2878
region 0.2<y<0.3, 0.0<z<0.1:
 fit object:
     Linear model Poly11:
     (x,y) = p00 + p10*x + p01*y
     Coefficients (with 95% confidence bounds):
       p00 =      0.5457  (0.4333, 0.6581)
       p10 =     -0.2367  (-0.6725, 0.1991)
       p01 =     0.09185  (-0.3504, 0.5341)
 goodness:
           sse: 164.6248
       rsquare: 6.5738e-04
           dfe: 1993
    adjrsquare: -3.4548e-04
          rmse: 0.2874

More Answers (3)

Example:
function gof = getgof(PAGE)
[~, gof] = fit(PAGE somehow);
end
gof_stats = cellfun(@getgof, mycell, 'uniform', 0);
gof_stats = vertcat(gof_stats{:});
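Once the per-cell results are collected, a single goodness-of-fit field can be gathered into a plain numeric array. This assumes gof_stats ends up as a struct array of goodness-of-fit structs (as the vertcat above would produce); the values below are made-up stand-ins:

```matlab
% Hypothetical struct array standing in for collected fit results
gof_stats = struct('rmse', {0.29, 0.31, 0.27}, 'sse', {171, 162, 165});

all_rmse = [gof_stats.rmse];           % 1-by-N numeric array
[best, idx] = min(all_rmse);           % e.g., locate the best-fitting cell
```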
Matt J on 1 Apr 2025
Edited: Matt J on 1 Apr 2025
There is no way to iterate over cells (nested or otherwise) without a loop, or something equivalent in performance to a loop (cellfun, arrayfun, cell2mat, etc...).

4 comments

Can you give an example without a loop, e.g., cellfun?
@Walter Roberson gave the generic outline previously. Again, attaching a small representative dataset would undoubtedly elicit more specific code; when folks have to create data to work on besides, it's just more work for volunteers, and there's no guarantee it will actually match the actual use case.
I'm still big on the idea that the other arrangement of the data will be much simpler to process and would avoid these hassles.
Matt J on 1 Apr 2025
Edited: Matt J on 1 Apr 2025
Can you give an example without a loop, e.g., cellfun?
How would an example of cellfun help you? You said you are looking for something more efficient than a loop, and as I have said, nothing is more efficient than a loop when dealing with cell arrays.
dpb on 1 Apr 2025
Edited: dpb on 1 Apr 2025
To amplify on @Matt J's comment: at their heart, all the cell-, array-, and struct-functions are looping constructs internally, "syntactic sugar" that replaces the for ... end loop with a single source-code line. But the performance of these cannot exceed that of JIT-compiled looping code, and given that they have not been subject to all the optimizations MathWorks has made to for loops over the years, including multi-threading, they will all be at least somewhat slower than a "deadahead" for loop.
Functionally, cellfun is a wrapper for arrayfun -- it passes the dereferenced cell contents to the function instead; you could construct the same with arrayfun if you did the dereferencing in its argument list. See this <recent post> for a general discussion and some pertinent remarks from TMW staff members on the differences.
MORAL: Do NOT assume that fewer lines of source code equate to faster execution speed.
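The relative speeds are easy to check on one's own data with timeit; a sketch, to be saved as a script (local functions in scripts need R2016b or later):

```matlab
% 1000 cells of 1000 random values each
c = arrayfun(@(~) rand(1000,1), 1:1000, 'UniformOutput', false);

t_cellfun = timeit(@() cellfun(@max, c));
t_loop    = timeit(@() loopmax(c));
fprintf('cellfun: %.4g s, loop: %.4g s\n', t_cellfun, t_loop);

function out = loopmax(c)
% Plain for loop doing the same per-cell max
out = zeros(size(c));
for ii = 1:numel(c)
    out(ii) = max(c{ii});
end
end
```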


dpb on 1 Apr 2025
Edited: dpb on 1 Apr 2025
The other alternative to investigate is to turn the metadata you're segregating/tracking by cell indices into real data in a flat table or array. Ideally, those would be recognizable things like test number, date, whatever..., but for starters they could just be the indices. Then the power of <grouping variables> and/or grpstats and/or varfun could be brought to bear on the problem. Large datasets can be dealt with via tall arrays and/or memory mapping. See also findgroups.
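A minimal sketch of that flat arrangement (all names and sizes here are made up): every point lives in one tall vector, and dataset/bin membership becomes ordinary grouping variables:

```matlab
npts = 1e4;
v    = rand(npts,1);            % all points from all datasets, stacked
set_ = randi(5,  npts, 1);      % dataset index, now an ordinary variable
ybin = randi(10, npts, 1);      % precomputed y bin index
zbin = randi(10, npts, 1);      % precomputed z bin index

g = findgroups(set_, ybin, zbin);
binmax = splitapply(@max, v, g);   % one statistic per (dataset, y, z) group
```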

4 comments

I believe I could reorganize the data into a table. For example, the j,k above could be rows and columns, and each entry could be a cell with a number of arrays equal to the number of datasets (all datasets are organized into the same j,k). Not sure if that's the best way. I need to iterate on the operations I'll be performing, so organizing the data in a way that makes those operations faster is what's most important. Do you know which option would allow the fastest computation? Thanks!
dpb on 1 Apr 2025
Edited: dpb on 1 Apr 2025
"...j,k above could be rows and columns and each entry could be a cell with a number of arrays equal to the number of datasets"
That sounds like yet more nightmares and not at all what I would envision. Again, give us an actual representative dataset we can poke at, rather than just trying to describe it.
The point would be that "dataset" would become a variable as would all other metadata that can then be used for selection/grouping for calculations without having to dereference a bunch of cells and then try to put the results back together.
"Do you know what option would allow for the fastest computation?"
Not a priori, without a more specific example of what you actually are working with and what iterations you're talking about, no.
Unless the datasets are truly huge or the iterations are deep in loops, whether it's a few more msec or not is probably immaterial, particularly if one takes development time into consideration at all.
"I believe I could reorganize the data into a table"
Accessing a range of table rows is notably less efficient than accessing a range of rows of a numeric array.
dpb on 1 Apr 2025
Edited: dpb on 1 Apr 2025
"... turn the metadata you're segregating/tracking by cell indices into real data in a flat table or array." (emphasis added...dpb)
The table is awfully convenient for display and is generally "fast enough" ...but, agreed, findgroups and splitapply to do the calculations will be faster on an array than will be varfun or grpstats on a table.
I was interpreting the Q? about speed as including the existing cell array structure as well, not just the comparison of an array to a table. Dereferencing a cell itself is generally quick, but by the time one calls cellfun() a number of times and then has to reconstruct/collect the results, who knows how it might compare?
But, it's pretty tough to attack @Dan Houck's real problem without an example to poke at...others may be able to write air code that might be applicable to his actual situation, but I'm not that clairvoyant and, as @John D'Errico was complaining the other day, the Crystal Ball TB is notably dark these days.


Version: R2024b