How do I find similar columns in a matrix?

Raúl Alonso Merino on 1 Feb 2019
Commented: Star Strider on 6 Feb 2019
Hello! I have a 32x129600 matrix, that is, 129600 columns of 32 elements each. I want to find the columns that are similar, for example where the difference between their corresponding elements is less than 0.001. How can I do this?
Thank you so much for your answers.
2 comments
madhan ravi on 1 Feb 2019
Illustrate with a short example.
Raúl Alonso Merino on 4 Feb 2019
Sorry for the delay. Let's say I have the 32x129600 matrix, with 129600 columns like these four:
4.41062884160757 4.41196217494090 4.41212293144208 4.41388179669031
1.86364066193853 1.86469030732861 1.86571158392435 1.86776359338061
0.444756501182033 0.445465721040189 0.446458628841608 0.448189125295508
0 0 0 0
0.443820330969267 0.442430260047281 0.440368794326241 0.428586288416076
1.86229787234043 1.85912056737589 1.83472340425532 1.82652482269504
4.40844444444444 4.37501654846336 4.36463356973995 4.32198581560284
8.29286052009456 8.24457683215130 8.19818439716312 8.13262411347518
10.2203971631206 10.2081040189125 10.1855697399527 10.1534373522459
11.3760472813239 11.3614657210402 11.3377777777778 11.3021938534279
12.0883782505910 12.0723026004728 12.0445390070922 12.0104775413712
12.3284539007092 12.3124917257683 12.2850874704492 12.2464775413712
12.0873191489362 12.0701654846336 12.0421465721040 12.0041607565012
11.3757068557920 11.3597068557920 11.3314042553191 11.2919432624113
10.2197825059102 10.2027044917258 10.1758203309693 10.1380709219858
8.28737588652482 8.24057683215130 8.18247754137116 8.08170212765958
4.40765957446809 4.37169739952719 4.32725295508274 4.27895981087471
1.86131442080378 1.85562174940898 1.82788652482270 1.79739007092199
0.443546099290780 0.440737588652482 0.427962174940898 0.422061465721040
0 0 0 0
0.445068557919622 0.446364066193853 0.458373522458629 0.462193853427896
1.86421749408983 1.86642080378251 1.88921040189125 1.89646335697400
4.41186761229314 4.41580141843972 4.42049172576832 4.45572576832151
8.29808037825059 8.30067139479906 8.34285579196217 8.35034515366430
10.2278770685579 10.2300141843972 10.2317352245863 10.2389881796690
11.3817304964539 11.3826004728132 11.3858912529551 11.3876595744681
12.0936453900709 12.0933427895981 12.0932671394799 12.0930023640662
12.3330118203310 12.3328605200946 12.3330212765957 12.3311016548463
12.0913947990544 12.0924255319149 12.0905815602837 12.0882080378251
11.3808037825059 11.3790354609929 11.3786666666667 11.3758865248227
10.2252482269504 10.2248416075650 10.2240283687943 10.2222505910165
8.28993853427896 8.28993853427896 8.28993853427896 8.28993853427896
How can I compare them and find which ones are similar, meaning that the difference between corresponding elements (same row) is, for example, less than 0.001? For instance, if one row contains 8.53, 9.53 and 8.54, the most similar values are 8.53 and 8.54, so those two columns could be considered similar (with a tolerance larger than their 0.01 difference). I hope I have explained myself clearly.
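For a single pair of columns, the element-wise criterion can be checked directly. This is a minimal sketch, assuming M is the data matrix and i, j are column indices; the function name isSimilarPair and the argument tol are hypothetical (save it as isSimilarPair.m):

```matlab
function tf = isSimilarPair(M, i, j, tol)
% isSimilarPair (hypothetical helper): true if every element of column i
% differs from the corresponding element of column j by less than tol.
% Equivalent to max(abs(M(:,i) - M(:,j))) < tol, i.e. the Chebyshev
% (largest absolute) distance between the two columns is below tol.
tf = all(abs(M(:, i) - M(:, j)) < tol);
end
```

On the posted sample, isSimilarPair(M, 1, 2, 0.001) is false: in the first row alone, 4.41062884160757 and 4.41196217494090 differ by about 0.0013.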


Accepted Answer

Star Strider on 1 Feb 2019
Edited: Star Strider on 4 Feb 2019
I would use the pdist function. pdist compares rows (observations), so transposing your matrix lets it compare your columns, using one of the built-in distance metrics or one you define yourself.
EDIT (4 Feb 2019 at 21:50):
Using the matrix you posted (which I will call ‘M’), the pdist call would be:
dr = pdist(M','cityblock'); % Transpose ‘M’ To Compare Columns
Result = squareform(dr)
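The 'cityblock' metric sums the absolute differences over all 32 rows, so requiring a cityblock distance of at most 0.001 is stricter than "every element within 0.001". The criterion as stated in the question corresponds to the largest absolute difference, which pdist offers as the 'chebychev' metric. A sketch, assuming the Statistics and Machine Learning Toolbox (which provides pdist and squareform) is installed; the toy matrix is hypothetical:

```matlab
% Toy data (hypothetical): columns 1 and 2 are within 0.001 in every row,
% column 3 is far from both.
M = [1 1.0005 2;
     2 2.0002 2;
     3 2.9996 5];

% Transpose so pdist compares columns of M.
% 'chebychev' = largest absolute element-wise difference between two columns.
ResultCheb = squareform(pdist(M', 'chebychev'));

% For comparison, 'cityblock' adds up the absolute differences instead.
ResultCity = squareform(pdist(M', 'cityblock'));
```

Here ResultCheb(1,2) is about 0.0005, so columns 1 and 2 pass the question's rule, while ResultCity(1,2) is about 0.0011 and would be rejected by a cityblock threshold of 0.001.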
and the results for this matrix are then:
Result =
0 0.3065 0.8071 1.4538
0.3065 0 0.5030 1.1494
0.8071 0.5030 0 0.6467
1.4538 1.1494 0.6467 0
To find the rows and columns of those values that meet your criterion:
[Row,Col] = find(Result <= 0.001 & Result > 0);
For this sample it returns empty Row and Col, since no two of the four columns are within 0.001 of each other.
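Because Result is symmetric, the find call above reports each pair twice, as (i,j) and (j,i), and the Result > 0 test also drops columns that are exact duplicates of each other. Keeping only the strict upper triangle avoids both. The sketch below builds the distance matrix without any toolbox, using implicit expansion (R2016b or later) and the element-wise (Chebyshev) criterion; the function name similarPairsSmall is hypothetical, and it is only practical for a few thousand columns because it forms a 32xNxN intermediate array (save it as similarPairsSmall.m):

```matlab
function pairs = similarPairsSmall(M, tol)
% similarPairsSmall (hypothetical helper): list each pair of similar
% columns once, as rows [i j] with i < j. Two columns are similar when
% every element differs by less than tol.

% M is 32xN and permute(M,[1 3 2]) is 32x1xN, so the difference is 32xNxN;
% taking the max over rows gives the Chebyshev distance for every pair.
D = squeeze(max(abs(M - permute(M, [1 3 2])), [], 1));

% triu(...,1): strict upper triangle, so each pair appears once and the
% diagonal is excluded, while duplicate columns (distance 0) are kept.
[i, j] = find(triu(D < tol, 1));
pairs = [i j];
end
```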
11 comments
Raúl Alonso Merino on 6 Feb 2019
Edited: Raúl Alonso Merino on 6 Feb 2019
Thank you, I will test it and reply when it finishes. And yes, when preallocating it, MATLAB says it needs a 129600x129600x32 array, which would take 4004.5 GB of memory, so...
I will probably have to do it in parts, taking a subset of the columns each time. But thank you.
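Even the 2-D output of pdist for 129600 columns would hold about 8.4e9 distances (around 67 GB of doubles), so working in parts makes sense. With the element-wise criterion, most comparisons can be skipped: two columns can only be within tol in every row if they are within tol in one chosen row. The sketch below swaps in a sort-and-sweep technique (not something proposed in the thread): it sorts the columns by one row and compares each column only with the later columns whose value in that row lies within tol. The function name similarColumnPairs is hypothetical, choosing the row with the widest spread is a heuristic assumption, and the run time still depends on how many columns crowd together in that row (save it as similarColumnPairs.m):

```matlab
function pairs = similarColumnPairs(M, tol)
% similarColumnPairs (hypothetical helper): every pair [i j], i < j, of
% columns of M whose corresponding elements all differ by less than tol.

% Heuristic: use the row with the largest spread as the sort key,
% so fewer columns fall inside each tol-wide window.
[~, keyRow] = max(max(M, [], 2) - min(M, [], 2));
[key, order] = sort(M(keyRow, :));

n = numel(order);
pairs = zeros(0, 2);
last = 1;                               % end of the window in sorted order
for a = 1:n-1
    % key is sorted, so the window end never moves backwards.
    last = max(last, a);
    while last < n && key(last + 1) - key(a) < tol
        last = last + 1;
    end
    if last > a
        cand = order(a+1:last);         % columns close in the key row
        ref = order(a);                 % column being tested
        d = max(abs(M(:, cand) - M(:, ref)), [], 1);   % Chebyshev distance
        hit = cand(d < tol);
        newPairs = sort([repmat(ref, numel(hit), 1), hit(:)], 2);
        pairs = [pairs; newPairs]; %#ok<AGROW>
    end
end
pairs = sortrows(pairs);                % order by i, then j
end
```

Only the 32xk block of candidate columns is held at any time, so memory stays small even for all 129600 columns.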
Star Strider on 6 Feb 2019
My pleasure.


More Answers (1)

KSSV on 1 Feb 2019
You can have a look at ismember and ismembertol. You can also use isequal as a logical check of whether the elements match exactly. You may also plot a histogram and inspect it, or run a loop, compute the differences and check them. There are multiple ways to achieve what you want.
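A related core-MATLAB function, uniquetol, can group columns rather than list pairs. It works on rows, so the matrix is transposed, and with 'DataScale', 1 the tolerance is absolute: two columns count as equal when every element differs by at most tol. This is a sketch with hypothetical toy data; note that uniquetol uses <= rather than <, and it groups around representative columns, so its groups can differ from a strict pairwise listing when similar columns form long chains:

```matlab
% Toy data (hypothetical): columns 1, 2 and 4 are within 0.001 of each other.
M = [1 1.0005 2 1;
     2 2.0002 2 2;
     3 2.9996 5 3];
tol = 0.001;

% Each row of M' is one column of M; 'DataScale', 1 makes tol absolute.
[~, ~, groupId] = uniquetol(M', tol, 'ByRows', true, 'DataScale', 1);

% Collect the column indices that share a group label.
groups = accumarray(groupId, (1:size(M, 2))', [], @(c) {sort(c)'});
```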
1 comment
Raúl Alonso Merino on 4 Feb 2019
Could you please add example code showing how to do this? I'm quite new to MATLAB and don't really know how, sorry. Let's say I have 129600 columns with 32 rows: how do I compare the 129600 elements of one row with each other and set a maximum difference of 0.001, so that values within that difference are considered similar?
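Following the loop suggestion above, a plain double loop over column pairs shows the logic step by step. It is a sketch for small matrices only (on 129600 columns it would perform about 8.4e9 column comparisons), using tol = 0.001 and hypothetical toy data:

```matlab
% Toy data (hypothetical); replace with the real 32x129600 matrix.
M = [1 1.0005 2 1;
     2 2.0002 2 2;
     3 2.9996 5 3];
tol = 0.001;

nCols = size(M, 2);
pairs = zeros(0, 2);                    % each row: [i j] with i < j
for i = 1:nCols-1
    for j = i+1:nCols
        % Absolute difference between columns i and j, row by row.
        rowDiff = abs(M(:, i) - M(:, j));
        % Similar only if every row differs by less than tol.
        if all(rowDiff < tol)
            pairs(end+1, :) = [i j]; %#ok<AGROW>
        end
    end
end
disp(pairs)
```

For this toy matrix, pairs ends up as [1 2; 1 4; 2 4].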


Categories

Matrix Indexing



Version

R2018b

