Why does categories(TableA.Var2) have more elements than the rows in TableA?

I do not understand the behaviour I'm seeing when I use categories(TableA.Var2) with a table. I'm getting more categories than there are rows in the table, and I'm also seeing values in the category list that do not appear in the table column I'm looking at.
I read a table from an Excel file and then want to create a list of the unique Fault Labels. When I use unique(ModuleFC.FaultLabel) I get the list of unique Fault Labels as expected. I'm very confused about why FaultList = categories(ModuleFC.FaultLabel) gives me more elements than there are rows in the table.
function FaultCodeGroups
%% Import data from spreadsheet
% Script for importing data from the following spreadsheet:
%
% Workbook: C:\FaultCodesTable.xlsx
% Worksheet: FaultCodesTable
%
% To extend the code for use with different selected data or a different
% spreadsheet, generate a function instead of a script.
% Auto-generated by MATLAB on 2020/01/16 12:14:02
%% Import the data
[~, ~, raw] = xlsread('C:\FaultCodes.xlsx','FaultCodes');
raw(cellfun(@(x) ~isempty(x) && isnumeric(x) && isnan(x),raw)) = {''};
stringVectors = string(raw(:,[1,2,3]));
stringVectors(ismissing(stringVectors)) = '';
%% Replace non-numeric cells with NaN
R = cellfun(@(x) ~isnumeric(x) && ~islogical(x),raw); % Find non-numeric cells
raw(R) = {NaN}; % Replace non-numeric cells
%% Create table
FaultCodesTable = table;
%% Allocate imported array to column variable names
FaultCodesTable.Model = categorical(stringVectors(:,1));
FaultCodesTable.FaultLabel = categorical(stringVectors(:,2));
FaultCodesTable.Module = categorical(stringVectors(:,3));
%% Clear temporary variables
clearvars data raw stringVectors R;
% Get a list of all the different car models
CarList = categories(FaultCodesTable.Model);
% Get a list of all the modules
ModuleList = categories(FaultCodesTable.Module);
%% Extract the fault codes for each car line
for CLCount = 1:numel(CarList)
    % Extract table with only the fault codes of the currently selected car line
    CarLineFaultCodes = FaultCodesTable(FaultCodesTable.Model == CarList{CLCount},:);
    % Delete FaultCodesTable to ensure its FaultLabel column does not exist in memory
    % This is only for debugging. Cannot use this in the actual for loop
    FaultCodesTable = [];
    %% Extract the fault codes for each module
    for MCount = 1:numel(ModuleList)
        % Extract the fault codes for the currently selected module
        ModuleFC = CarLineFaultCodes(CarLineFaultCodes.Module == ModuleList{MCount},:);
        % Delete CarLineFaultCodes to ensure its FaultLabel column does not exist
        % This is only for debugging. Cannot use this in the actual for loop
        CarLineFaultCodes = [];
        % ModuleFC has 996 rows
        % Get a list of the unique Fault Labels in ModuleFC
        UniqueFaults = unique(ModuleFC.FaultLabel);
        numel(UniqueFaults) % 59 unique Fault Labels
        % Get categories in FaultLabel column of ModuleFC
        FaultList = categories(ModuleFC.FaultLabel);
        UniqueFL = unique(FaultList);
        numel(UniqueFL) % 1127, but ModuleFC has 996 rows ???
    end
end
end

Accepted Answer

Steven Lord on 16 Jan 2020
Your CarLineFaultCodes table contains only a subset of the rows of FaultCodesTable, and your ModuleFC table contains only a subset of the rows of CarLineFaultCodes. But CarLineFaultCodes and ModuleFC were created from FaultCodesTable via indexing. Therefore the FaultLabel variables in those tables were created from the FaultLabel variable in the original table, and they can take as a value any of the categories that the FaultLabel variable in the original table could take.
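A minimal sketch of this behaviour, using a small made-up column rather than the asker's spreadsheet (the fault label values here are hypothetical):

```matlab
% Hypothetical stand-in for the FaultLabel column of the full table
labels = categorical({'P0100'; 'P0200'; 'P0300'; 'P0100'; 'P0400'});

% Logical indexing, as in the question, keeps only the matching elements
subset = labels(labels == 'P0100');

numRows  = numel(subset);        % number of elements left in the subset
catsKept = categories(subset);   % category list inherited from the parent
usedOnly = unique(subset);       % only the values actually present

fprintf('%d rows, %d categories, %d unique values\n', ...
    numRows, numel(catsKept), numel(usedOnly));
```

Here the subset has fewer rows than categories, which is the same effect the question describes on a larger scale: categories reports what the column could contain, while unique reports what it does contain.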
Indexing into a categorical variable to create a new variable with a subset of the entries doesn't trim the list of categories the new variable can take to just those actually present in that subset. Doing so would be inefficient if the subset was large, because we'd need to compute the unique set of categories present. It could also cause problems if you wanted to concatenate that subset with a different subset that contained a category that had been trimmed, especially if the original categorical array is protected (see isprotected).
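As a rough illustration of the concatenation point, the sketch below splits a protected categorical array into two pieces and joins them again. The category names are invented for the example. Because neither piece has had its category list trimmed, the two lists still match and the pieces can be recombined.

```matlab
% Hypothetical protected categorical with a fixed set of allowed values
parent = categorical({'low'; 'mid'; 'high'; 'low'}, ...
                     {'low', 'mid', 'high'}, 'Protected', true);

a = parent(1:2);   % holds only 'low' and 'mid'
b = parent(3:4);   % holds only 'high' and 'low'

% Both pieces still carry the parent's full category list
sameCats = isequal(categories(a), categories(b));

% The pieces inherit the protected property from the parent
stillProtected = isprotected(a);

% Matching category lists let the pieces be joined back together
joined = [a; b];
roundTrip = isequal(joined, parent);
```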
If you really do want to trim the list of categories, you can use removecats. But you will need to do so explicitly; MATLAB will not do it for you automatically.
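A short sketch of the explicit trimming step, again on hypothetical data. Calling removecats with a single input drops every category that has no elements in the array.

```matlab
% Hypothetical stand-in for the FaultLabel column of the full table
labels = categorical({'P0100'; 'P0200'; 'P0300'; 'P0100'; 'P0400'});
subset = labels(labels == 'P0100');

before  = categories(subset);    % all categories inherited from labels
trimmed = removecats(subset);    % drop categories with no elements
after   = categories(trimmed);   % only the categories still in use

fprintf('%d categories before, %d after\n', numel(before), numel(after));
```

Applied to the code in the question, the equivalent call would presumably be FaultList = categories(removecats(ModuleFC.FaultLabel)), which should then agree with the result of unique(ModuleFC.FaultLabel).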
  4 Comments
Peter Perkins on 16 Jan 2020
Just to expand on what Steve said:
The main purpose of categorical (over, say, an array of strings) is to maintain a list of all the possible values that your data could take on. Just because your current data set only contains things that happened on Mon, Tue, and Fri doesn't mean that you stopped caring about Wed and Thu; it just means that the data you have in hand happens not to have anything on Wed and Thu. So if you want to compute, say, the total sales by day of week, it would probably be useful to know that the totals for Wed and Thu are 0.
Sometimes not, but mostly you want to hang on to knowledge of the possible values even if your current data don't happen to have any instances of them.
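The day-of-week example could look something like the sketch below. The sales figures are made up, and accumarray is just one way to form the totals, not necessarily what was meant above.

```matlab
% Hypothetical sales records: only Mon, Tue and Fri occur in the data
allDays = {'Mon', 'Tue', 'Wed', 'Thu', 'Fri'};
days    = categorical({'Mon'; 'Tue'; 'Fri'; 'Mon'}, allDays);
sales   = [10; 20; 5; 15];

% Number of records per category, including categories with no records
counts = countcats(days);

% Total sales per category; double(days) gives each element's category index
totals = accumarray(double(days), sales, [numel(allDays) 1]);

disp(table(categories(days), counts, totals, ...
    'VariableNames', {'Day', 'Count', 'TotalSales'}));
```

Because Wed and Thu remain in the category list, they show up in the result with a count and total of zero instead of silently disappearing.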
It's easy to drop "unused" categories, just call removecats. The mirror image is that when you are reading data from a file and converting to categorical, it is often a good idea to specify all the possible categories, in case your data don't hit all the possibilities.
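A sketch of specifying the possible categories up front when converting imported text. The module names and the knownModules list are hypothetical. The second input to categorical is the set of allowed values, and any input value not in that set becomes undefined.

```matlab
% Hypothetical text column as it might come out of a spreadsheet import
rawModules = ["Engine"; "Brakes"; "Engine"; "Radio"];

% Hypothetical list of every module that could ever appear
knownModules = ["Engine", "Brakes", "Gearbox", "Airbag"];

modules = categorical(rawModules, knownModules);

catList   = categories(modules);      % all four known modules, used or not
notListed = isundefined(modules);     % true where the value was not in the list

fprintf('%d categories, %d undefined entries\n', ...
    numel(catList), nnz(notListed));
```

A side effect of this approach is that unexpected values in the file (here 'Radio') are flagged as undefined rather than quietly becoming new categories.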
Hannes Truter on 17 Jan 2020
Thank you very much Eric and Peter for taking time to explain categorical in even more detail and giving examples. I definitely had the wrong idea of how categorical worked and how it should be used correctly. With your explanations I should be able to modify my code to get out the data I want.
I am very impressed by the MATLAB support by staff members. It is very rare nowadays to see software companies providing real support. Many rely on user forums to do it for them.
