How do I categorize a feature?
HelpAStudent on 14 May 2022
Commented: the cyclist on 14 May 2022
Hi! I have a dataset whose histogram is shown here, with some data around 0 and the rest around 1, 2, 3, 4 and 5.
I would like to make the features categorical as the amount at witch are they roughly equal in value. 
This is the histogram of the features: 

Please help me 
Accepted Answer
the cyclist on 14 May 2022
Edited: the cyclist on 14 May 2022
      Do you mean that you have numerical values, and you want to treat those as categorical instead? You can convert numeric to categorical using the categorical function.
x = 1:5
c = categorical(x)
You said "roughly" equal in value, so maybe you need to do some rounding first?
x = [1.1 2.2 2.9 3.8 5.1]
c = categorical(round(x))
1 Comment
the cyclist on 14 May 2022
				When I wrote this answer, I hadn't noticed that your values are not 1,2,3,4,5, but rather 10^-3 times that. So, you'll need to round differently:
x = [1.1 2.2 2.9 3.8 5.1]*1.e-3
rx = round(x,3)
c = categorical(rx)
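To check that the rounding collapsed nearby values into the intended groups, one option is to list the resulting categories and their counts. This is only a minimal sketch: the vector x below is made-up sample data (not the asker's), and it relies on the built-in categories and countcats functions.

```matlab
% Illustrative values only: clusters near 0, 1e-3, 2e-3 and 5e-3
x = [0.02 0.9 1.1 1.05 2.2 1.9 5.1] * 1e-3;

% Round to 3 decimal places so that nearby values share one exact value
rx = round(x, 3);

% Each distinct rounded value becomes one category
c = categorical(rx);

% List the category names and how many samples fall in each category
names  = categories(c)
counts = countcats(c)
```

If the number of categories is larger than expected, the rounding precision is probably too fine for the spread of the data.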
More Answers (1)
Image Analyst on 14 May 2022
        You can add a tiny bit of noise then recompute the histogram edges such that the bins will be equal percentages (heights).  Like this:
data = [zeros(1, 1580), ones(1, 50), 2*ones(1, 70), 2*ones(1, 50), 3*ones(1, 40), 4*ones(1, 25), 4.7*ones(1, 10)]/1000;
subplot(2, 1, 1);
[counts, edges] = histcounts(data);
bar(edges(1:end-1), counts);
grid on;
title('Uneven Bars', 'FontSize', 20);
% Now add a tiny bit of noise and sort
noisyData = data + 0.000001 * rand(size(data));
sortedData = sort(noisyData);
% Cumulative sum of the sorted values, as a rough CDF (not used in the steps below)
c = cumsum(sortedData);
c = rescale(c, 0, 100); % Convert to percent.
% Find 6 bins
numBins = 6;
indexes = round(linspace(1, length(data), numBins+1))
edges2 = sortedData(indexes)
subplot(2, 1, 2);
counts2 = histcounts(noisyData, edges2)
bar(edges2(1:end-1), counts2);
grid on;
title('Even Bars', 'FontSize', 20);
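If the goal is a categorical label for each sample rather than only a bar chart, the equal-count edges could also be passed to discretize. The following is a sketch under the assumption that equal-count bins are what is wanted; the data vector is a smaller made-up example, and the names binIdx, binLabel and countsPerBin are illustrative.

```matlab
% Made-up data: many values at 0, fewer at 1e-3 ... 4e-3
data = [zeros(1, 60), ones(1, 20), 2*ones(1, 20), 3*ones(1, 10), 4*ones(1, 10)] / 1000;

% Tiny noise breaks ties so that sorted values can serve as distinct edges
rng(0);  % fixed seed so the sketch is repeatable
noisyData  = data + 1e-6 * rand(size(data));
sortedData = sort(noisyData);

% Pick edges at evenly spaced positions in the sorted data
numBins = 6;
indexes = round(linspace(1, numel(sortedData), numBins + 1));
edges   = sortedData(indexes);

% Bin index (1..numBins) for every sample, then convert to categorical
binIdx   = discretize(noisyData, edges);
binLabel = categorical(binIdx);

% Number of samples per bin; these should be close to equal
countsPerBin = countcats(binLabel)
```

Note that with this approach samples that started out with the same value (for example, the many zeros) can end up in different categories, because the noise alone decides which bin they land in.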
1 Comment
the cyclist on 14 May 2022
				I'll point out here that @Image Analyst seems to have interpreted your phrase "as the amount at witch are they roughly equal in value" to mean you want the bar heights to be equal.
I interpreted that differently, and took it to mean that you wanted your data values to be equal (rather than "roughly equal"), which is why our two approaches are very different.
