Increasing efficiency of one-hot encoding

Ruaridh Mon-Williams on 14 Jan 2020
I have a dataset with 50 variables and an output, and there are 17 categories for this dataset. I want to do feature selection to determine which variables are significant. I am using the fsrnca function with one-hot encoding: I build a matrix of size (number of observations) x 17 containing 1s and 0s to represent the categories and concatenate this matrix to X, so that X' = [X_categories X] while y remains as it is. Is there a faster way of doing this than the standard one-hot encoding approach? The run time is very slow because of the very high dimensionality. Hope this makes sense. Thanks!
  3 comments
darova on 16 Jan 2020
And where is the code?
Athul Prakash on 28 Jan 2020
Kindly provide your code so that others can investigate which step is slowing you down.

Answers (1)

Walter Roberson on 28 Jan 2020
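% Category index of each observation, as a row vector
catnum = uint8(TheCategorical(:).');
% Number of one-hot columns (highest category index present)
numcat = max(catnum);
% Preallocate, then set one entry per row via linear indexing
OH = zeros(NumberOfObservations, numcat);
OH(sub2ind(size(OH), 1:NumberOfObservations, catnum)) = 1;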
Or, building the same matrix in sparse form:
catnum = uint8(TheCategorical(:).');
OH = sparse(1:NumberOfObservations, catnum, 1);
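
For anyone who wants to check the two constructions against each other, here is a minimal sketch on made-up data. The toy values of TheCategorical and X, and the name Xaug, are assumptions for illustration and are not from the question. The sketch also uses double rather than uint8 for the indices and passes the matrix size to sparse explicitly; both are choices of this sketch, not part of the answer above.

```matlab
% Hypothetical toy data: 6 observations, 3 categories, 2 numeric predictors.
TheCategorical = categorical({'a'; 'b'; 'c'; 'a'; 'c'; 'b'});
X = rand(6, 2);
NumberOfObservations = numel(TheCategorical);

% Category index of each observation, as a row vector.
catnum = double(TheCategorical(:).');
numcat = max(catnum);

% Dense construction: one linear index per (row, category) pair.
OH = zeros(NumberOfObservations, numcat);
OH(sub2ind(size(OH), 1:NumberOfObservations, catnum)) = 1;

% Sparse construction, with the dimensions given explicitly.
OHs = sparse(1:NumberOfObservations, catnum, 1, NumberOfObservations, numcat);

% The two constructions should hold the same values.
assert(isequal(OH, full(OHs)));

% Prepend the indicator columns to the numeric predictors,
% matching X' = [X_categories X] from the question.
Xaug = [OH X];
```

Each row of OH contains exactly one 1, so the indicator block adds numcat columns to the predictor matrix, here 3 + 2 = 5 columns in Xaug.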
