Efficient script to isolate one sub-dataset k-times.

Vic on 3 Mar 2024
Commented: Vic on 7 Mar 2024
Hi everyone,
The idea is to divide the main dataset into k bins, then delete one bin at a time and merge the remaining bins. In a nutshell, k bins produce k different sub-datasets. Since the number of rows in the matrix may not be a multiple of the number of bins (bin k often has fewer rows), I had to use cell arrays.
Here is an illustration of the general idea for k = 2.
Question:
How can I remove the loop or make this code more efficient?
Here is my script.
------------------------------------------------------
Variables = rand(245,57);
Bin_numb = 11;
Bin_size = [1:floor(length(Variables)/Bin_numb):length(Variables) length(Variables)];
for i = 1:length(Bin_size)-1
    if i == 1
        Bin_Variables2{1} = Variables(Bin_size(2):Bin_size(end),:);
    else
        Bin_Variables2{i} = [Variables(Bin_size(1):Bin_size(i)-1,:); Variables(Bin_size(i+1):Bin_size(end),:)];
    end
end
Thanks for your input.
2 comments
Voss on 5 Mar 2024
Edited: Voss on 5 Mar 2024
Two observations:
  1. The last row of Variables is included as the last row of every element of Bin_Variables2 (because Bin_size(end) is always included).
  2. When size(Variables,1) is a multiple of Bin_numb, I expect you'd want each element of Bin_Variables2 to be the same size, but that's not what happens.
To illustrate:
Variables = rand(242,7);
Bin_numb = 11;
Bin_size = [1:floor(length(Variables)/Bin_numb):length(Variables) length(Variables)];
for i = 1:length(Bin_size)-1
    if i == 1
        Bin_Variables2{1} = Variables(Bin_size(2):Bin_size(end),:);
    else
        Bin_Variables2{i} = [Variables(Bin_size(1):Bin_size(i)-1,:); Variables(Bin_size(i+1):Bin_size(end),:)];
    end
end
Observation 1: last row always the same:
fprintf('%36s%s\n','Last row of Variables: ',sprintf('%6.4g ',Variables(end,:)));
Last row of Variables: 0.02797 0.5595 0.2128 0.4162 0.0364 0.1367 0.6156
for ii = 1:numel(Bin_Variables2)
    fprintf('%36s%s\n',sprintf('Last row of Bin_Variables2{%d}: ',ii),sprintf('%6.4g ',Bin_Variables2{ii}(end,:)));
end
Last row of Bin_Variables2{1}: 0.02797 0.5595 0.2128 0.4162 0.0364 0.1367 0.6156
Last row of Bin_Variables2{2}: 0.02797 0.5595 0.2128 0.4162 0.0364 0.1367 0.6156
Last row of Bin_Variables2{3}: 0.02797 0.5595 0.2128 0.4162 0.0364 0.1367 0.6156
Last row of Bin_Variables2{4}: 0.02797 0.5595 0.2128 0.4162 0.0364 0.1367 0.6156
Last row of Bin_Variables2{5}: 0.02797 0.5595 0.2128 0.4162 0.0364 0.1367 0.6156
Last row of Bin_Variables2{6}: 0.02797 0.5595 0.2128 0.4162 0.0364 0.1367 0.6156
Last row of Bin_Variables2{7}: 0.02797 0.5595 0.2128 0.4162 0.0364 0.1367 0.6156
Last row of Bin_Variables2{8}: 0.02797 0.5595 0.2128 0.4162 0.0364 0.1367 0.6156
Last row of Bin_Variables2{9}: 0.02797 0.5595 0.2128 0.4162 0.0364 0.1367 0.6156
Last row of Bin_Variables2{10}: 0.02797 0.5595 0.2128 0.4162 0.0364 0.1367 0.6156
Last row of Bin_Variables2{11}: 0.02797 0.5595 0.2128 0.4162 0.0364 0.1367 0.6156
Observation 2: unequally sized result matrices even though 242 is a multiple of 11:
bin_sizes = cellfun(@(x)size(x,1),Bin_Variables2)
bin_sizes = 1×11
220 220 220 220 220 220 220 220 220 220 221
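Both observations can be checked against a version that treats the bins as a true partition of the rows. The following is only a minimal sketch, assuming each row should belong to exactly one bin; the names n, edges, folds and fold_rows are illustrative and do not appear in the original script.

```matlab
Variables = rand(242,7);
Bin_numb  = 11;
n = size(Variables,1);                       % row count (size, not length)

% Bin k covers rows edges(k)+1 : edges(k+1), with edges(1) = 0 and edges(end) = n.
edges = round(linspace(0, n, Bin_numb+1));

folds = cell(1, Bin_numb);                   % preallocate the output cell array
for k = 1:Bin_numb
    keep = true(n,1);
    keep(edges(k)+1:edges(k+1)) = false;     % drop the rows of bin k only
    folds{k} = Variables(keep,:);
end

fold_rows = cellfun(@(x) size(x,1), folds);  % rows remaining in each fold
```

With 242 rows and 11 bins, every entry of fold_rows is 220, and the last fold no longer ends with the last row of Variables. When the row count is not a multiple of Bin_numb, the rounded edges should give bins that differ by at most one row.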
Vic on 7 Mar 2024
@Voss Thanks for these observations. @Manikanta Aditya & @Dyuman Joshi Thanks for your help. I hadn't thought of using a logical array. It is an elegant way to solve this.
Here is my current script.
Variables = rand(245,7);
Bin_numb = 11;
Bin_size = 1:floor(length(Variables)/Bin_numb):length(Variables);
if length(Variables)-Bin_size(end) <= 12
    Bin_size(end) = length(Variables);
end
Bin_Variables2 = cell(1, length(Bin_size)-1);
for i = 1:length(Bin_size)-1
    idx = true(length(Variables), 1);
    idx(Bin_size(i):Bin_size(i+1)) = false;
    Bin_Variables2{i} = Variables(idx, :);
end
for ii = 1:numel(Bin_Variables2)
    fprintf('%1s%s\n',sprintf('Last row {%d}: ',ii),sprintf('%6.4g ',Bin_Variables2{ii}(end,:)));
end
bin_sizes = cellfun(@(x)size(x,1),Bin_Variables2)
length(Variables)-bin_sizes
Bin_size
I added an if condition that sets Bin_size(end) = length(Variables) when size(Variables,1) is not a multiple of Bin_numb. As a result, the last bin has floor(length(Variables)/Bin_numb) + mod(length(Variables),Bin_numb) rows (22+3), and I get this:
bin_sizes =
222 222 222 222 222 222 222 222 222 222 220
length(Variables)-bin_sizes =
23 23 23 23 23 23 23 23 23 23 25
It works.
As for the last row always being the same, it seems to be fine now, but I still have some doubts about bin N-1 and its size.
Last row {1}: 0.6559 0.4365 0.5963 0.3045 0.6676 0.5343 0.5316
Last row {2}: 0.6559 0.4365 0.5963 0.3045 0.6676 0.5343 0.5316
Last row {3}: 0.6559 0.4365 0.5963 0.3045 0.6676 0.5343 0.5316
Last row {4}: 0.6559 0.4365 0.5963 0.3045 0.6676 0.5343 0.5316
Last row {5}: 0.6559 0.4365 0.5963 0.3045 0.6676 0.5343 0.5316
Last row {6}: 0.6559 0.4365 0.5963 0.3045 0.6676 0.5343 0.5316
Last row {7}: 0.6559 0.4365 0.5963 0.3045 0.6676 0.5343 0.5316
Last row {8}: 0.6559 0.4365 0.5963 0.3045 0.6676 0.5343 0.5316
Last row {9}: 0.6559 0.4365 0.5963 0.3045 0.6676 0.5343 0.5316
Last row {10}: 0.6559 0.4365 0.5963 0.3045 0.6676 0.5343 0.5316
Last row {11}: 0.1865 0.9516 0.07304 0.0887 0.697 0.9751 0.5142
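One way to probe the doubt about bin sizes is to count how often each row is removed across all folds; in a clean partition every row is removed exactly once. This sketch rebuilds the Bin_size vector from the script above for 245 rows (where the if condition holds, since 245 - 243 = 2). The names n, removed_count, twice, last_row and bin_rows are illustrative, and the fix at the end is only one possible option.

```matlab
n = 245;
Bin_numb = 11;
Bin_size = 1:floor(n/Bin_numb):n;            % 1 23 45 ... 243
Bin_size(end) = n;                           % what the if condition does for these sizes

% Count how many folds remove each row when both range ends are inclusive.
removed_count = zeros(n,1);
for i = 1:numel(Bin_size)-1
    rows = Bin_size(i):Bin_size(i+1);
    removed_count(rows) = removed_count(rows) + 1;
end
twice = find(removed_count == 2).';          % rows removed in two different folds

% Possible fix: end each bin one row before the next edge, except the last bin.
last_row = [Bin_size(2:end-1)-1, n];
bin_rows = last_row - Bin_size(1:end-1) + 1; % rows per bin after the fix
```

For these sizes, twice equals Bin_size(2:end-1): each interior edge row is removed by two adjacent folds, which is why folds 1 to 10 drop 23 rows instead of 22. With last_row as the upper limit, bin_rows sums to 245, so every row is removed exactly once.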


Accepted Answer

Manikanta Aditya on 4 Mar 2024
Moved: Dyuman Joshi on 4 Mar 2024
Here is a code snippet that should make the code more efficient by using logical indexing instead of a loop:
Variables = rand(245,57);
Bin_numb = 11;
Bin_size = [1:floor(length(Variables)/Bin_numb):length(Variables) length(Variables)];
Bin_Variables2 = cell(1, length(Bin_size)-1);
for i = 1:length(Bin_size)-1
    idx = true(size(Variables, 1), 1);
    idx(Bin_size(i):Bin_size(i+1)-1) = false;
    Bin_Variables2{i} = Variables(idx, :);
end
In this code, 'idx' is a logical array that is true for the rows of Variables that you want to keep. This approach avoids the need to concatenate arrays, which can be slow in MATLAB because it involves memory allocation. Instead, you’re just creating a logical index and using it to select the rows you want.
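If the aim is to drop the explicit for loop altogether, a minimal sketch is to label each row with a bin number and let arrayfun build the cell array. Note that arrayfun still iterates internally, so this is mainly more compact and is not guaranteed to run faster. The bin_id vector and the use of ceil to assign labels are assumptions, not part of the answer above.

```matlab
Variables = rand(245,57);
Bin_numb  = 11;
n = size(Variables,1);

% Label each row with a bin number 1..Bin_numb.
bin_id = ceil((1:n).' * Bin_numb / n);

% Fold k keeps every row whose label is not k (logical indexing on rows).
Bin_Variables2 = arrayfun(@(k) Variables(bin_id ~= k, :), ...
                          1:Bin_numb, 'UniformOutput', false);
```

Each row carries exactly one label, so it is left out of exactly one fold, and the bins should differ in size by at most one row.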
2 comments
Dyuman Joshi on 4 Mar 2024
Edited: Dyuman Joshi on 4 Mar 2024
@Manikanta Aditya, this looks good, though I would suggest using numel(Bin_size) instead of length(Bin_size).
" ... by using logical indexing instead of a loop:"
You are still using a loop.
@Vic, an important part of the code above is preallocation, which is good programming practice in MATLAB and improves code performance.
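On the choice between length, size and numel: length returns the largest dimension, which only equals the row count when a matrix has at least as many rows as columns. A small illustration, with A and v as made-up example variables:

```matlab
A = rand(5,57);          % fewer rows than columns
len_A  = length(A);      % 57: the largest dimension, not the row count
rows_A = size(A,1);      % 5: the number of rows

v = 1:22:245;            % row vector of bin edges: 1 23 45 ... 243
num_v  = numel(v);       % 12: number of elements, whatever the orientation
rows_v = size(v,1);      % 1: a row vector has a single row
```

For the 245-by-57 matrix in the question, length(Variables) happens to return 245, but size(Variables,1) stays correct if the data ever has more columns than rows.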
Manikanta Aditya on 4 Mar 2024
Thanks for the reply, @Dyuman Joshi. My bad, I overlooked the statement about the loop.
