My nested For Loops takes 40 seconds for 20,000 records. Anyway to vectorize or improve?

Question

Mark McGrath le 15 Juin 2020

0
Lien

Utiliser le lien direct vers cette question

https://fr.mathworks.com/matlabcentral/answers/548823-my-nested-for-loops-takes-40-seconds-for-20-000-records-anyway-to-vectorize-or-improve

Clôturé : MATLAB Answer Bot le 20 Août 2021

 % Consolidate Results by Security and Date
    uniqueDate = unique(d.RiskData(:,2));
    uniqueSec = unique(d.RiskData(:,1));
    numSecs = length(unique(d.RiskData(:,1)));
    numDates = length(unique(d.RiskData(:,2)));
    
    d.RiskDataNew{(numDates * numSecs) + 1,size(d.RiskData,2)} =[];
    
    d.RiskDataNew(1,1) = {'SECURITY'};
    d.RiskDataNew(1,2) = {'DATE'};
    for i = 1:numFields
        d.RiskDataNew(1,i + 2) = fldList(i);
    end
    
    
    %d.RiskDataNew = {};
    tic
    n = 1;
    for x = 1:numSecs 
      for y = 1:numDates 
        idx = find(contains(d.RiskData(1:end,1),uniqueSec(x)) & ...
                   contains(d.RiskData(1:end,2),uniqueDate(y)));
        if ~isempty(idx)
          n = n + 1;
          d.RiskDataNew(n,:)=d.RiskData(idx(1),:);
          if length(idx) > 1
            numFields = size(d.RiskData,2); 
            for j = 2:length(idx)
              for k = 3:numFields % first two fields are defaults: Security and Date
                if (isempty(d.RiskDataNew{n,k}))
                  d.RiskDataNew{n,k} = d.RiskData{idx(j),k};
                end
              end
            end
          end
        end
      end
    end
    toc

2 commentaires
Afficher AucuneMasquer Aucune

dpb le 15 Juin 2020

find is superfluous here and somewhat costly...

Hard to decipher what is going on -- how about an explanation of what your'e trying to do and a sample of the raw data to work from?

Is it mandatory to use struct? They're expensive relative to just plain data arrays.

Walter Roberson le 16 Juin 2020

Ouvrir dans MATLAB Online

I have not tested, but I have a suspicion that ismember is faster than contains().

contains(d.RiskData(1:end,1),uniqueSec(x))

That sub-expression can be calculated at the for x level. You can possibly even do

    for x = 1:numSecs 
        xidx = find(contains(d.RiskData(1:end,1),uniqueSec(x)));
      for y = xidx
        idx = find(contains(d.RiskData(1:end,2),uniqueDate(y)));

Cette question est clôturée.

Answer 1

Sayyed Ahmad le 16 Juin 2020

0
Lien

Utiliser le lien direct vers cette réponse

https://fr.mathworks.com/matlabcentral/answers/548823-my-nested-for-loops-takes-40-seconds-for-20-000-records-anyway-to-vectorize-or-improve#answer_451803

Ouvrir dans MATLAB Online

You have to avoiding the nested loops.

may be you can use the bsxfun to avoiding some loops.

An example:

A = rand(50); % 50-by-50 matrix of random values between 0 and 1
% method 1: slow and lots of lines of code
tic
meanA = mean(A); % mean of every matrix column: a row vector
% pre-allocate result for speed, remove this for even worse performance
result = zeros(size(A));
for j = 1:size(A,1)
    result(j,:) = A(j,:) - meanA;
end
toc
clear result % make sure method 2 creates its own result
% method 2: fast and only one line of code
tic
result = bsxfun(@minus,A,mean(A));
toc

the Answer wold be

Elapsed time is 0.015153 seconds.
Elapsed time is 0.007884 seconds.

see the following links for more details.

https://riptutorial.com/matlab

0 commentaires
Afficher -2 commentaires plus anciensMasquer -2 commentaires plus anciens

Answer 2

Mark McGrath le 16 Juin 2020

0
Lien

Utiliser le lien direct vers cette réponse

https://fr.mathworks.com/matlabcentral/answers/548823-my-nested-for-loops-takes-40-seconds-for-20-000-records-anyway-to-vectorize-or-improve#answer_451974

Modifié(e) : Mark McGrath le 16 Juin 2020

Ouvrir dans MATLAB Online

Hello Thanks for soem of the responses.

I am trying to consolidate data and can use a table with summary grouping but i need it to be dynamic as i do not know all the fields ahead of time other than security and date (so can't use). So i am iterating through to find all fields, distinct securities and distinct dates. For security and date i want all fields to be on one row.

The example below happens to show security and date with 3 extra fields. I want the three extra fields to be rolled up into one line. BUT, those 3 extra fields still need to be dynamic as i may have 20 extra fields. Depends on scenario.

testData = {'IBM','01/04/2020',[],'23',1;
        'IBM','01/06/2020',[],'20',2;
        'IBM','01/08/2020',[],'25',3;
        'IBM','01/12/2020',[],'26.23',4;
        'IBM','01/20/2020',[],'28.21',5;
        'IBM','01/22/2020',[],'56',6;
        'IBM','01/04/2020','0.1475',[],7;
        'IBM','01/06/2020','0.56',[],8;
        'IBM','01/08/2020','0.785',[],9;
        'IBM','01/12/2020','0.654',[],10;
        'IBM','01/20/2020','0.8941',[],11;
        'IBM','01/21/2020','0.7841',[],12;
        'IBM','01/30/2020','0.7777',[],13;
        'IBM','01/31/2020','0.3333',[],14;
        'CSCO','01/04/2020',[],'23',[];
        'CSCO','01/06/2020',[],'20',[];
        'CSCO','01/08/2020',[],'25',17;
        'CSCO','01/12/2020',[],'26.23',[];
        'CSCO','01/20/2020',[],'28.21',[];
        'CSCO','01/22/2020',[],'56',20;
        'CSCO','01/04/2020','0.1475',[],21;
        'CSCO','01/06/2020','0.56',[],22;
        'CSCO','01/08/2020','0.785',[],23;
        'CSCO','01/12/2020','0.654',[],24;
        'CSCO','01/20/2020','0.8941',[],25;
        'CSCO','01/21/2020','0.7841',[],26;
        'CSCO','01/30/2020','0.5555',[],27;
    };

3 commentaires
Afficher 1 commentaire plus ancienMasquer 1 commentaire plus ancien

Mark McGrath le 17 Juin 2020

Hello,

Thanks for helping out.

For ease - lets drop the 5th record (as it could be multiple) and use the first 4 records. In this scenario, there should only be one record for column 3 otr column 4 per date that needs to be aggregated to one row using cell arrays and not table summary group.

The rsults fro a couple would be:

IBM,01/04/2020,.1475,23

IBM,01/06/2020,.56,20

CSCO,01/04/2020,.1475,23

and so forth. I wrote the logic to allow fields 3.....n. The first 2 fields security and date will always be there but starting on 3rd field, itr can be 3, or in this case 3 to 4, or 3 to n.

dpb le 17 Juin 2020

Are all the "fields", numeric? Then just a cell array would hold them -- the problem will still be processing a different number of elements per cell will kill about any vector operations as will a variable number of struct fields or the like.

The most straightforward way altho a little more memory-costly would be to either define a maximum N and preallocate or determine the max number in the dataset and allocate that size of array using NaN or other way to indicate the missing values.

Or, you could then use a table or timetable with the necessary number of variable columns -- code could then be written for that scenario as well that would be generic based on the table size/number variables/columns.

My nested For Loops takes 40 seconds for 20,000 records. Anyway to vectorize or improve?

2 commentaires
Afficher AucuneMasquer Aucune

Réponses (2)

0 commentaires
Afficher -2 commentaires plus anciensMasquer -2 commentaires plus anciens

3 commentaires
Afficher 1 commentaire plus ancienMasquer 1 commentaire plus ancien

Voir également

Tags

Produits

Version

Community Treasure Hunt

My nested For Loops takes 40 seconds for 20,000 records. Anyway to vectorize or improve?

2 commentaires Afficher AucuneMasquer Aucune

Réponses (2)

0 commentaires Afficher -2 commentaires plus anciensMasquer -2 commentaires plus anciens

3 commentaires Afficher 1 commentaire plus ancienMasquer 1 commentaire plus ancien

Voir également

Tags

Produits

Version

Community Treasure Hunt

2 commentaires
Afficher AucuneMasquer Aucune

0 commentaires
Afficher -2 commentaires plus anciensMasquer -2 commentaires plus anciens

3 commentaires
Afficher 1 commentaire plus ancienMasquer 1 commentaire plus ancien