Data normalization using robust scaling

Hello all, I am trying to implement robust scaling, but I am confused about one thing: should I pass the "all" argument to the "median" and "iqr" functions?
Thanks for the help.
DataSet = readtable('Datasets/Test.csv');
DataSet = table2array(DataSet); % Rows: 7195 x Columns: 22
RScaling = (DataSet - median(DataSet))./iqr(DataSet)

Accepted Answer

Voss on 4 Jun 2024

1 vote

If you want to normalize all columns the same way (i.e., using the median and inter-quartile range of the entire data set), then use "all".
If you want to normalize each column separately (i.e., using each column's own median and inter-quartile range), then do not use "all". In this case, it's best to set the dim argument to 1 to say explicitly that you want the median and iqr by column, so that a data set with only one row is still handled correctly.
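To make the two options concrete, here is a minimal sketch on a small made-up matrix (the values and the variable names `X`, `perColumn` and `scaledGlobal` are illustrative assumptions, not the poster's data):

```matlab
% Small illustrative matrix (made-up values): 5 observations, 2 variables.
X = [1 10; 2 20; 3 30; 4 40; 100 50];

% Per-column robust scaling: each column uses its own median and IQR.
perColumn = (X - median(X, 1)) ./ iqr(X, 1);

% Global robust scaling: one median and one IQR computed over all elements.
scaledGlobal = (X - median(X, "all")) ./ iqr(X, "all");
```

With the per-column version every variable ends up centered on its own median, while the global version keeps the relative offsets between columns, which only makes sense if all columns share the same units.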

4 Comments

MB on 4 Jun 2024
Edited: MB on 4 Jun 2024
Thank you for your answer. So, I can normalize each column separately or all columns together. I want to explore the effects of various normalization techniques on clustering. I've experimented with the methods defined in the "normalize" function without specifying the "dim" argument. If I understand correctly, this normalizes each column separately. "If A is a matrix, then normalize operates on each column of A separately."
RScaling = (DataSet - median(DataSet, 1))./iqr(DataSet, 1)
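For comparison, "normalize" also documents a "medianiqr" method that centers on the median and scales by the IQR. The sketch below, on made-up data, assumes this method matches the manual formula above; the names `X`, `manual`, `viaNormalize` and `maxDiff` are hypothetical, and the agreement is worth checking on your own data rather than taking for granted:

```matlab
% Made-up data for illustration only.
X = [1 10; 2 20; 3 30; 4 40; 100 50];

% Manual robust scaling, column by column.
manual = (X - median(X, 1)) ./ iqr(X, 1);

% normalize with the "medianiqr" method along dimension 1.
viaNormalize = normalize(X, 1, "medianiqr");

% Largest absolute difference between the two results (displayed for inspection).
maxDiff = max(abs(manual - viaNormalize), [], "all")
```

If `maxDiff` is at the level of floating-point round-off, the two approaches can be used interchangeably in the clustering experiments.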
Voss on 4 Jun 2024
Edited: Voss on 4 Jun 2024
You're welcome!
"If I understand correctly, this normalizes each column separately. "If A is a matrix, then normalize operates on each column of A separately.""
That's right. For a matrix that's not a vector, the default dim is 1, so you don't have to specify it (but it doesn't hurt to specify it). However, if you ever had the situation where your data set had one row, then you would need to specify dim as 1 if you want to normalize by column. Therefore, it's a good idea to always include the dim as 1. That's all I was suggesting.
Example: Matrix:
data = [1 2 3; 4 5 6] % non-vector matrix
data = 2x3
     1     2     3
     4     5     6
normalize(data) % normalize each column
ans = 2x3
   -0.7071   -0.7071   -0.7071
    0.7071    0.7071    0.7071
normalize(data,1) % same
ans = 2x3
   -0.7071   -0.7071   -0.7071
    0.7071    0.7071    0.7071
normalize(data,2) % normalize each row
ans = 2x3
    -1     0     1
    -1     0     1
Row vector (matrix with one row):
data = [1 2 3] % row vector
data = 1x3
1 2 3
normalize(data) % without the dim specified, this normalizes all together this time
ans = 1x3
-1 0 1
normalize(data,1) % normalize each column
ans = 1x3
NaN NaN NaN
normalize(data,2) % normalize each row (same as all together in this case)
ans = 1x3
-1 0 1
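The same single-row behavior carries over to the manual median/iqr formula from the question. A minimal sketch, using the same made-up row vector (`alongRow` and `byColumn` are illustrative names):

```matlab
data = [1 2 3];  % one observation with three variables (made-up values)

% No dim given: median and iqr treat the row as a single vector.
alongRow = (data - median(data)) ./ iqr(data);

% dim = 1: each column holds a single value, so its IQR is 0 and 0/0 gives NaN.
byColumn = (data - median(data, 1)) ./ iqr(data, 1);
```

The NaN result from the dim = 1 version is arguably the more honest outcome: it signals that per-column scaling is undefined for a single row, instead of silently scaling across variables.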
MB on 4 Jun 2024
Many thanks.
Voss on 4 Jun 2024
You're welcome!

Release: R2024a

Asked: MB on 4 Jun 2024

Commented: on 4 Jun 2024
