I need your help, please. I generated random data, for example T1 = randn(1000,1); T2 = randn(1000,1); ... T100 = randn(1000,1); and I want to check whether any of the T's are repeated and, if so, remove the repeats. How can I do that? Thanks in advance :)
Regards, Ahmed

11 comments

doc unique
Student for ever on 4 Jan 2018
Thank you so much :)
Jan on 4 Jan 2018
Do you mean repetition inside each vector, or between elements of all different vectors?
Student for ever on 4 Jan 2018
Hi Jan, actually, these are time series resulting from a parallel computation, and sometimes some of these time series are repeated.
Image Analyst on 4 Jan 2018
Are you setting a seed? Do you know what a seed is?
Jan on 5 Jan 2018
Edited: Jan on 5 Jan 2018
@Ahmed: This does not answer my question. Do you want to avoid repetitions of elements inside each T, or should different T not have the same value at the same index, or should the elements of each T not appear anywhere in any other T, or should the vectors T be different, while single values can be identical? Which kind of "repetitions" has to be avoided in your problem? Should the corresponding T vector be removed, or replaced by new data, or combined, or re-ordered?
Are you talking about time series or data created by randn?
Thank you Jan
Yes, the vectors T should be different, but single values can be identical.
Which kind of "repetitions" has to be avoided in your problem? Should the corresponding T vector be removed? ... YES
Student for ever on 7 Jan 2018
Are you talking about time series or data created by randn?
about time series created independently
Student for ever on 7 Jan 2018
@Image Analyst: What do you mean by your question?
Star Strider on 7 Jan 2018
@Ahmed — See the documentation on rng (link), and more generally, the discussion on Generate Random Numbers That Are Repeatable (link).
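As a minimal sketch of what a seed does (assuming the default global generator and that rng is available, i.e. R2011a or later): resetting the generator to the same seed reproduces exactly the same sequence, which is one plausible way for independently started computations to end up with identical series. The seed values 42 and 43 are arbitrary choices for illustration.

```matlab
rng(42);              % seed the global generator (42 is an arbitrary choice)
a = randn(1000, 1);
rng(42);              % same seed again
b = randn(1000, 1);   % reproduces a exactly
rng(43);              % a different seed
c = randn(1000, 1);   % a different sequence

sameSeedRepeats  = isequal(a, b);    % same seed, same data
otherSeedDiffers = ~isequal(a, c);   % different seed, different data
```

If each parallel run starts from the same generator state, the runs can therefore return the same "random" series.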
Student for ever on 7 Jan 2018
Edited: Jan on 7 Jan 2018
Dear Jan,
I am new to MATLAB :). Maybe my question was not clear, but I think your answer is very close to what I want to do. I have 200 time series that came from a parallel computation, and I just want to make sure that none of the 200 is repeated (that is, if I plot them, they should give different graphs). So I put all 200 time series into a matrix with 200 columns, and I just want to check that these columns are not the same.


Accepted Answer

Jan on 4 Jan 2018
Edited: Jan on 8 Jan 2018
Do not create a list of variables called T1, T2, ... See https://www.mathworks.com/matlabcentral/answers/57445-faq-how-can-i-create-variables-a1-a2-a10-in-a-loop. Use a cell or multidimensional array instead.
I assume your problem is to have no repeated values inside each vector and between all vectors. Then you need 1000*100 different random numbers at first:
ready = false;
while ~ready
    Pool = rand(1, 100000);
    ready = (length(unique(Pool)) == length(Pool));
end
T = reshape(Pool, 1000, 100);
Maybe this is faster:
ready = all(diff(sort(Pool)));
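Both checks should give the same answer. A small worked example with hand-picked values (hypothetical data, not taken from the question) illustrates why: after sorting, equal values sit next to each other, so their difference is zero.

```matlab
PoolDup  = [0.3 0.1 0.3 0.7];   % contains a repeated value
PoolUniq = [0.3 0.1 0.5 0.7];   % all values distinct

% Check 1: compare the number of unique elements with the total number
c1Dup  = numel(unique(PoolDup))  == numel(PoolDup);
c1Uniq = numel(unique(PoolUniq)) == numel(PoolUniq);

% Check 2: any zero difference between sorted neighbours means a repeat
c2Dup  = all(diff(sort(PoolDup)));
c2Uniq = all(diff(sort(PoolUniq)));
```

Here both checks flag PoolDup as containing a repeat and accept PoolUniq.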
[EDITED] If all you want is to create a unique set of vectors, and randn was just an example to create test data for the forum:
[T, Idx] = unique(T, 'rows')
[EDITED] And for unique columns:
T = unique(T.', 'rows').'
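If it also matters which columns were dropped and the original column order should be kept, a sketch along these lines may help. It assumes a release that supports the 'stable' flag together with 'rows'; the small matrix A and the variable names ia, isDup and removed are made up for illustration.

```matlab
% Hypothetical test data: 4 samples per series, 5 series stored as columns.
A = [10 20 30; 11 21 31; 12 22 32; 13 23 33];
T = A(:, [1 2 1 3 2]);          % columns 3 and 5 repeat columns 1 and 2

% unique works on rows, so transpose; ia holds the first occurrence
% of each distinct column, in order of appearance because of 'stable'.
[~, ia] = unique(T.', 'rows', 'stable');

isDup = true(1, size(T, 2));    % mark every column as duplicate ...
isDup(ia) = false;              % ... except the first occurrences
removed = find(isDup);          % indices of the repeated columns

Tu = T(:, ia);                  % matrix with the repeated columns removed
```

An empty removed means that all columns were already different.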

More Answers (2)

Birdman on 4 Jan 2018
Firstly, generate random data as follows:
T=randn(1000,100);
Secondly, as Adam said, use the unique function to check for repetitions.
Tun=unique(T,'stable');
The 'stable' flag preserves the initial order of the values.

5 comments

Jan on 4 Jan 2018
Edited: Jan on 4 Jan 2018
Then Tun is a vector. The initial order of the elements is not important for random data. Then omitting the 'stable' flag decreases the processing time.
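To make the difference concrete, here is a tiny sketch with a made-up 3x2 matrix: without 'rows', unique looks at single elements and returns a column vector, while with 'rows' it compares whole rows and keeps the matrix shape.

```matlab
T = [1 2; 1 2; 3 4];                    % hypothetical data, rows 1 and 2 are equal

Tun   = unique(T, 'stable');            % unique elements of T(:), as a column vector
Trows = unique(T, 'rows', 'stable');    % unique rows, still a matrix

tunIsVector = iscolumn(Tun);            % the row structure of T is lost in Tun
```

So for removing repeated vectors rather than repeated values, the 'rows' form is the one that applies.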
Student for ever on 4 Jan 2018
Thank you so much :)
Birdman on 4 Jan 2018
Edited: Birdman on 4 Jan 2018
Yes Jan, exactly. But of course the initial vector T can also be overwritten. It is up to the user.
You are welcome, Ahmed.
Edit: Jan, using the 'stable' flag is just a habit of mine. I do not want to lose the order of the data I am working with, so I use it, but of course it can be removed if wanted.
Jan on 4 Jan 2018
A "habit"? :-) I'd suggest using time-consuming methods only if they are needed for the results.
Birdman on 4 Jan 2018
It is needed for the result, exactly.


John BG on 5 Jan 2018
Hi Ahmed
So far, the supplied answers increase the probability of generating all-different, random Ts.
Each of the answers improves generation randomness. Yet if you really want to make sure that all T sequences are different once generated (let's say you don't really have control over the randomness of the data, and the suggested randn(1000,1) is your model), then there's no other way than comparing them in pairs.
1. Let N be the number of T sequences:
N=5
2. Then all possible pairs of T sequences are:
L=combinator(N,2,'c')
L =
1 2
1 3
1 4
1 5
2 3
2 4
2 5
3 4
3 5
4 5
3. As Jan Simon mentions, sometimes it's more practical to put all data in a structure that can be indexed, instead of working with N different sequence names.
Let T be all your input Ti sequences compiled into a single matrix:
T=randi([1 10],N)
T =
8 2 3 9 3
3 5 8 10 9
7 10 3 6 3
7 4 6 2 9
2 6 7 2 3
4. Checking that no 2 sequences are equal
D=[0 0];
for k=1:1:size(L,1)
    if isequal(T(L(k,1),:),T(L(k,2),:))
        D=[D;L(k,:)];
    end
end
5. Removing repeated sequences
if size(D,1)>1
    D(1,:)=[]; % drop the [0 0] placeholder row
    T(D(:,1),:)=[]; % removing one of the repeated identical pairs
end
T
Ahmed, I have overwritten some sequences on purpose, so that D lists the repeated sequences that were found. These simple lines remove all repetitions without losing data (even when a given sequence is repeated more than once).
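Note that combinator is not a built-in MATLAB function (as far as I know it is a File Exchange submission). A hedged sketch of the same pairwise idea using the built-in nchoosek instead; the small matrix T and the name isDupPair are made up for illustration, and this variant deletes the later member of each equal pair so that the first occurrence is kept, which differs slightly from the steps above.

```matlab
% Hypothetical data: 5 sequences stored as rows; rows 3 and 5 repeat rows 1 and 2.
T = [8 2 3; 3 5 8; 8 2 3; 7 4 6; 3 5 8];
N = size(T, 1);

L = nchoosek(1:N, 2);                 % all index pairs (i, j) with i < j

isDupPair = false(size(L, 1), 1);     % pre-allocated flag, one per pair
for k = 1:size(L, 1)
    isDupPair(k) = isequal(T(L(k, 1), :), T(L(k, 2), :));
end
D = L(isDupPair, :);                  % pairs of identical rows

% Delete the later row of every equal pair; the first occurrence survives.
T(unique(D(:, 2)), :) = [];
```

The number of pairs grows as N*(N-1)/2, so this is only practical for a moderate number of sequences.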
If you find this answer useful would you please be so kind to consider marking my answer as Accepted Answer?
To any other reader, if you find this answer useful please consider clicking on the thumbs-up vote link
thanks in advance for time and attention
John BG

12 comments

Jan on 5 Jan 2018
Edited: Jan on 8 Jan 2018
If a loop is wanted for any reasons, the iterative growing of arrays should be avoided, because it is extremely inefficient. Step 4 could be:
D = zeros(size(L, 1), 2); % Pre-allocation!!!
iD = 0;
for k = 1:size(L,1)
    if isequal(T(L(k,1),:),T(L(k,2),:))
        iD = iD + 1;
        D(iD, :) = L(k, :);
    end
end
D = D(1:iD, :); % Crop unneeded elements
Or even leaner by storing only a flag for each pair index k:
dup = false(size(L, 1), 1); % Pre-allocation: one flag per pair
for k = 1:size(L,1)
    if isequal(T(L(k,1),:),T(L(k,2),:))
        dup(k) = true; % Keep looping, so that all equal pairs are found
    end
end
L = L(dup, :);
Two loops are easy here, such that calling combinator is not needed:
nT = size(T, 1);
keep = true(nT, 1); % Pre-allocation!!!
for i1 = 1:nT
    Ti1 = T(i1, :);
    for i2 = i1 + 1:nT
        if isequal(Ti1, T(i2, :))
            keep(i1) = false;
            break; % No need to proceed the search
        end
    end
end
T = T(keep, :);
But the set of unique vectors can be obtained much easier by a single built-in function:
[T, Idx] = unique(T, 'rows')
Checking delays for 100 strings shows that unique is the fastest option:
N=100
M=1000
% T=randi([1000 9999],N,M);
T=repmat(randi([1000 9999],1,M),N,1);
tic
D=[0 0];
L=combinator(N,2,'c');
for k=1:1:size(L,1)
if isequal(T(L(k,1),:),T(L(k,2),:))
D=[D;L(k,:)];
end
end
if size(D,1)>1
D(1,:)=[];
T(D(:,1),:)=[]; % removing one of the repeated identical pairs
end
toc
100: Elapsed time is 0.068783 seconds.
1000: Elapsed time is 7.953736 seconds.
single string repeated 100 times: Elapsed time is 0.112921 seconds.
tic
L=combinator(N,2,'c');
D = zeros(size(L, 1), 2); % Pre-allocation!!!
iD = 0;
for k = 1:size(L,1)
if isequal(T(L(k,1),:),T(L(k,2),:))
iD = iD + 1;
D(iD, :) = L(k, :);
end
end
D = D(1:iD, :);
toc
100: Elapsed time is 0.075002 seconds.
1000: Elapsed time is 7.884542 seconds.
single string repeated 100 times: Elapsed time is 0.085103 seconds.
tic
L=combinator(N,2,'c');
dup = false(size(L, 1), 2); % Pre-allocation!!!
for k = 1:size(L,1)
if isequal(T(L(k,1),:),T(L(k,2),:))
dup(k) = true;
break;
end
end
L = L(dup, :);
toc
100: Elapsed time is 0.062778 seconds.
1000: Elapsed time is 7.863167 seconds.
single string repeated 100 times: Elapsed time is 0.030683 seconds.
tic
nT = size(T, 1);
keep = true(nT, 1); % Pre-allocation!!!
for i1 = 1:nT
for i2 = i1 + 1:nT
if isequal(T(i1, :), T(i2, :))
keep(i1) = false;
break; % No need to proceed the search
end
end
end
T = T(keep, :);
toc
100: Elapsed time is 0.068909 seconds.
1000: Elapsed time is 7.784486 seconds.
single string repeated 100 times: Elapsed time is 0.034376 seconds.
tic
[T2, Idx] = unique(T, 'rows');
toc
100: Elapsed time is 0.023476 seconds.
1000: Elapsed time is 0.061907 seconds.
single string repeated 100 times: Elapsed time is 0.024031 seconds.
As the number of strings increases, unique outperforms every other solution in terms of run time.
Regards
John BG
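A single tic/toc measurement can fluctuate from run to run. As a hedged alternative sketch, timeit (available in newer releases, R2013b and later to my knowledge) calls a function handle several times and returns a typical run time. The data mirrors the "single string repeated" test above; the names fUnique and tUnique are made up, and absolute numbers will differ from machine to machine.

```matlab
N = 100;
M = 1000;
T = repmat(randi([1000 9999], 1, M), N, 1);   % one row repeated N times

fUnique = @() unique(T, 'rows');              % the call to be measured
tUnique = timeit(fUnique);                    % typical run time in seconds

fprintf('unique(T, ''rows''): %.6f s\n', tUnique);
```

The loop-based variants can be wrapped in functions and timed the same way for a like-for-like comparison.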
Student for ever on 7 Jan 2018
Thank you all guys
Student for ever on 7 Jan 2018
Dear John BG, actually, I am new to MATLAB :). Maybe my question was not clear, but I think your answer is very close to what I want to do. I have 200 time series that came from a parallel computation, and I just want to make sure that none of the 200 is repeated (that is, if I plot them, they should give different graphs). So I put all 200 time series into a matrix like your T here; you have 5 columns, whereas I will have 200 columns. Then I just want to check that these columns are not the same.
John BG on 8 Jan 2018
Edited: John BG on 8 Jan 2018
Hi Ahmed
ok, columns
N=200
M=1000
% T=randi([1000 9999],N,M); % test
% T=repmat(randi([1000 9999],1,M),M,N); % test
T=randi([1000 9999],M,N);
tic
D=[0 0];
L=combinator(N,2,'c');
for k=1:1:size(L,1)
if isequal(T(:,L(k,1)),T(:,L(k,2)))
D=[D;L(k,:)];
end
end
if size(D,1)>1
D(1,:)=[];
T(:,D(:,1))=[]; % removing repetitions
end
toc
Elapsed time is 0.092280 seconds.
John BG on 8 Jan 2018
Edited: John BG on 8 Jan 2018
Also, please correct me if I am wrong, but:
1. If you had wanted to use the unique command, you would have already done so; yet the unique command requires all series to have the same length.
2. If the lengths of the series vary, then unique, as now suggested, cannot be used, and even the suggested loops need further refinement.
How would you like to proceed? Is the single unique command OK?
Or do the lengths of the 200 samples vary from sample to sample?
Does the following solve the question? It's quite fast.
D=[0 0];
L=combinator(N,2,'c');
for k=1:1:size(L,1)
if isequal(T(:,L(k,1)),T(:,L(k,2)))
D=[D;L(k,:)];
end
end
if size(D,1)>1
D(1,:)=[];
T(:,D(:,1))=[]; % removing repetitions
end
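For the variable-length case raised above, one possible sketch (an assumption on my side, not something from the original data) is to store the series in a cell array and compare them with isequal, which returns false for vectors of different lengths. The cell array C and its contents are made up for illustration.

```matlab
% Hypothetical series of different lengths, one per cell.
C = {[1 2 3], [4 5], [1 2 3], [4 5 6], [4 5]};

n = numel(C);
keep = true(1, n);                    % pre-allocated flag, one per series
for i = 2:n
    for j = 1:i-1
        % Compare only against earlier series that are themselves kept
        if keep(j) && isequal(C{i}, C{j})
            keep(i) = false;          % series i repeats an earlier one
            break;                    % no need to search further
        end
    end
end
C = C(keep);                          % first occurrences only
```

Since the series in this thread all have the same length, the matrix form with unique remains the simpler choice here.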
Student for ever on 8 Jan 2018
Hi John, yes, all time series have the same length. I already have the 200 time series, so I am not going to use randn any more. I already have the matrix, which contains 200 columns, each a 131072x1 double time series, and I just want to check that those 200 columns are not the same and, if some are, remove the repetitions.
Jan on 8 Jan 2018
@Ahmed: Then see my answer. Just transpose the input.
John BG on 8 Jan 2018
Hi Ahmed
correct, what's the point of simulating the data with randn if you can work directly on the data.
Ahmed, would you please be so kind to confirm that you have accepted the command unique answer?
Jan on 8 Jan 2018
Edited: Jan on 8 Jan 2018
@John BG: Why should Ahmed confirm this? As you know, only the OP can accept an answer in the first week. This fact was mentioned just some days ago: https://www.mathworks.com/matlabcentral/answers/375136-solving-system-of-equations#comment_520771 . You can take a look into "More > Recent Activity" also: See 8 Jan 2018 at 11:30.
You have started too many discussions about accepting answers already.
Stephen23 on 8 Jan 2018
Edited: Stephen23 on 8 Jan 2018
The help clearly states that "Answers can only be accepted by someone other than the author of the question after 7 days of inactivity from the author".
Student for ever on 9 Jan 2018
Edited: Student for ever on 9 Jan 2018
Thanks, all, for helping; your comments are really useful for me. @John BG, I have already accepted Jan's answer.

