How do I create a new table variable based on the concatenation of two or more categorical variables?

I am looking to simply concatenate three variables in a table (2310 rows), where those variables are categorical. I tried:
my_table.new_variable = strcat(my_table.variable1, "_", my_table.variable2, "_", my_table.variable3);
but it does not like the delimiter or the categories, and gave an error.
Error using strcat
Inputs must be character vectors, cell arrays of character vectors, or string arrays.
I tried
a = [my_table.variable1, my_table.variable2, my_table.variable3];
my_table.new_variable = strjoin(a , "_");
but these are categorical, and maybe that is why I get this error (possibly for the same reason as above)? Here a is a 2310x3 categorical array.
Error using strjoin
First input must be a string array or cell array of character vectors.
What I have are three categorical variables that I use for grouping, and I want to create charts and summary statistics by various combinations of the grouping variables. An example would be
my_table.make - [ Ford, Toyota, BMW]
my_table.types - [SUV, Sedan, Compact ]
I want
my_table.configuration - [ Ford_SUV, Ford_Sedan, BMW_Compact, ... etc ].
However, my data set does not have all combinations, which is why I first thought of strcat(). Maybe I am missing a feature in the use of groups in charts and stats.

Answers (1)

Steven Lord on 6 Jan 2023
Do you want these new identifiers (Ford_SUV, Ford_Sedan, BMW_Compact, etc.) to be new categories in a new categorical variable in your table or do you want them to be text labels ("Ford_SUV", "Ford_Sedan", "BMW_Compact", etc.)?
If the latter:
c = categorical(["Red", "Blue", "Green"])
c = 1×3 categorical array
Red Blue Green
s = string(c(1)) + "_" + string(c(3))
s = "Red_Green"
Or for a vector case:
whichColors = randi(numel(c), 5, 3);
sv = string(c(whichColors))
sv = 5×3 string array
    "Green"    "Red"     "Blue"
    "Blue"     "Red"     "Blue"
    "Blue"     "Red"     "Blue"
    "Green"    "Blue"    "Green"
    "Green"    "Red"     "Blue"
y = join(sv(:, [1 3]), "_")
y = 5×1 string array
    "Green_Blue"
    "Blue_Blue"
    "Blue_Blue"
    "Green_Green"
    "Green_Blue"
If the former, it gets a little trickier to work with, depending on what you're trying to do. You won't be able to easily refer to elements of the new categorical array using the categories from the old one, even if one of the new categories was created from an instance of that category in the original categorical array.
c2 = categorical(y)
c2 = 5×1 categorical array
    Green_Blue
    Blue_Blue
    Blue_Blue
    Green_Green
    Green_Blue
joinedCategories = categories(c2)
joinedCategories = 3×1 cell array
    {'Blue_Blue'  }
    {'Green_Blue' }
    {'Green_Green'}
originalCategories = categories(c)
originalCategories = 3×1 cell array
    {'Blue' }
    {'Green'}
    {'Red'  }
y == originalCategories(1)
ans = 5×1 logical array
    0
    0
    0
    0
    0
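For the table case in the question, the same string conversion can be applied column by column. The following is a minimal sketch, not a definitive implementation: the table contents are invented stand-ins, and the variable names make, types, and configuration are taken from the question's example rather than from real data.

```matlab
% Hypothetical stand-in for the asker's table (values are assumptions)
make  = categorical(["Ford"; "Ford"; "BMW"; "Toyota"]);
types = categorical(["SUV"; "Sedan"; "Compact"; "SUV"]);
my_table = table(make, types);

% Convert each categorical column to string, concatenate elementwise
% with "_", then convert the result back to categorical
my_table.configuration = categorical( ...
    string(my_table.make) + "_" + string(my_table.types));

% Only combinations that actually occur in the data become categories
configCategories = categories(my_table.configuration);
```

Because the new categories are built from the rows that exist, combinations missing from the data set (for example a Toyota Compact here) never appear as categories.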
1 Comment
Ted H on 6 Jan 2023
Edited: Ted H on 6 Jan 2023
I am trying to achieve the former. "new identifiers (Ford_SUV, Ford_Sedan, BMW_Compact, etc.) to be new categories in a new categorical variable in your table"
The join recommendation yields this error message:
Error using join
First argument must be text.
I want to run summary statistics for a Ford SUV and compare them to a BMW Compact, or create a box-and-whisker plot with both of these (and the other 9 configurations) on the same chart. I can easily get this information for Ford, or for SUV, etc. I would like to combine them.
I suppose one way would be to create a non-categorical variable based on the categorical one, then use strjoin(). But I can't be the first person who wants to look at combinations of categories (not just filtering or indexing by multiple categories), so I was thinking there must be an easy way to do this. strjoin() should work, but many MATLAB tools become unavailable when a table variable becomes categorical.
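A likely cause of the join error is that join was handed the categorical array directly, whereas the answer converts to string first. The sketch below shows that conversion, and also groupsummary, which accepts several grouping variables at once and so may avoid the concatenation entirely for summary statistics. The data and the price variable are hypothetical, added only for illustration.

```matlab
% Hypothetical data; make, types, and price are illustrative assumptions
make  = categorical(["Ford"; "Ford"; "BMW"; "BMW"; "Ford"]);
types = categorical(["SUV"; "SUV"; "Compact"; "Compact"; "Sedan"]);
price = [30; 34; 40; 44; 25];
my_table = table(make, types, price);

% join requires text, so convert the N-by-2 categorical array to a
% string array before joining each row with "_"
a = [my_table.make, my_table.types];
my_table.configuration = categorical(join(string(a), "_"));

% Summary statistics per make/types combination, no concatenation needed;
% only combinations present in the data produce a row
stats = groupsummary(my_table, ["make", "types"], "mean", "price");
```

The combined configuration variable is still useful for plotting: in releases that provide boxchart, passing my_table.configuration as the grouping input and my_table.price as the data should draw one box per configuration on a single chart.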
