What is the reference category in the output of a fitlme model with categorical variables and three-way interaction terms?

The table below summarizes the output of a linear mixed-effects model with a random intercept and slope, fit to structured panel data ('tbl_early'). The model is specified as:
lme_PrimaryHU = fitlme(tbl_early, 'logRoL ~ 1 + logLoL + logAnnLioL + Dur + PPI + AvgEffTax_1 + HU + logEP*logAP + EQ*PrimaryHU*relInsLoss_1 + Wstorm*relInsLoss_1 +Storm*PrimaryHU*relInsLoss_1 +(FFR|ID)')
'Dur' has 4 levels, so I understood that the output shows three levels with estimates relative to the fourth, i.e. the reference level ('Dur_one'). From the results, one could interpret that Dur_onehalf trades at a discount compared with Dur_one, all else equal.
'HU', 'Storm', 'EQ' and 'Wstorm' are binary variables. They are not mutually exclusive (cross-sectional analysis), and there is no case in the data in which all of them are 0. So the question is: which of these variables did MATLAB choose as the reference case? Note that some of the peril variables are used in two- or three-way interaction terms that appear further down in the table.

'PrimaryHU' is a binary variable that controls for a certain condition which affects the potential effects of relInsLoss or of 'HU', 'Storm', 'EQ' and 'Wstorm'. For example, 'EQ' alone is positive but not significant at p<0.1, 'EQ*relInsLoss' is negative and still not significant, 'EQ*PrimaryHU' is negative and significant, and 'EQ*relInsLoss*PrimaryHU' is positive and significant. All remaining variables are continuous.
Two-way interactions used:
  • 'logAP*logEP'
  • 'Wstorm*relInsLoss'
Three-way interactions used:
  • 'Storm*PrimaryHU*relInsLoss'
  • 'EQ*PrimaryHU*relInsLoss'
The estimates for the other interaction terms, and the separate estimates for the underlying variables, should result from expanding the interaction terms above.
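To see which separate terms those interactions generate, here is a small Python sketch that expands the fixed-effects part of the formula using the Wilkinson rule that A*B stands for A + B + A:B. The helpers expand_product and expand_fixed_effects are hypothetical: they are a simplified string expander (no parentheses, no `-` or `^`, random-effects part omitted), not MATLAB's formula parser. They list terms, not coefficients; Dur is one term but contributes three coefficients.

```python
from itertools import combinations

def expand_product(product):
    """Expand a Wilkinson product such as 'A*B*C' into every main effect
    and interaction it implies: A, B, C, A:B, A:C, B:C, A:B:C."""
    names = [n.strip() for n in product.split("*")]
    return [list(c) for k in range(1, len(names) + 1)
            for c in combinations(names, k)]

def expand_fixed_effects(rhs):
    """Expand a '+'-separated right-hand side, keeping each term once
    (in order of first appearance) no matter how many products imply it."""
    seen, terms = set(), []
    for chunk in rhs.split("+"):
        chunk = chunk.strip()
        if chunk == "1":  # the intercept is not listed as a term here
            continue
        for names in expand_product(chunk):
            key = frozenset(names)  # A:B and B:A are the same term
            if key not in seen:
                seen.add(key)
                terms.append(":".join(names))
    return terms

# Fixed-effects part of the formula in the question (random part omitted).
fixed_rhs = ("1 + logLoL + logAnnLioL + Dur + PPI + AvgEffTax_1 + HU"
             " + logEP*logAP + EQ*PrimaryHU*relInsLoss_1"
             " + Wstorm*relInsLoss_1 + Storm*PrimaryHU*relInsLoss_1")
terms = expand_fixed_effects(fixed_rhs)
for t in terms:
    print(t)
print(len(terms), "terms besides the intercept")
```

It prints 22 terms. PrimaryHU:relInsLoss_1 appears only once even though both three-way products imply it, so the model has a single PrimaryHU:relInsLoss_1 term rather than one per peril.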
Many thanks for any help in advance!

Accepted Answer

Peng Li on 10 May 2020
The table you copied isn't MATLAB's default display, so it's difficult to tell anything from it. It looks more like ANOVA output, because each categorical term (including interaction terms) occupies only one line.
As you mentioned, for categorical variables the regression output states explicitly which level each row refers to, and the level without an output row is the reference level. A dichotomous variable is just a special case of a categorical variable. For example, if you have sex (0/1), the output usually contains a row labeled sex_1 with its estimate, standard error, and so on, which means 0 is used as the reference. The same convention is used to display interaction terms that involve categorical variables.
You have to make them categorical explicitly, e.g. tbl.sex = categorical(tbl.sex); otherwise, by default, the variable is treated as continuous, and 0 is always the reference value.
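A minimal Python sketch of reference (treatment) coding, the scheme behind the rows described above. It assumes the behavior MATLAB uses by default: categories are sorted, and the first one is the reference level with no column of its own. The function reference_code and the Dur levels 'two' and 'three' are hypothetical. Because HU, Storm, EQ and Wstorm are separate 0/1 variables rather than levels of one categorical, each gets its own reference (0), so there is no single reference among them; the intercept describes the case where all of them are 0 (with Dur at its reference and continuous variables at 0), even if no observation has that combination.

```python
def reference_code(values, name, categories=None):
    """Treatment (reference) coding: the first category is the reference
    level and gets no column; every other level gets a 0/1 indicator
    column named '<name>_<level>'."""
    if categories is None:
        categories = sorted(set(values))  # default: sorted levels
    reference, others = categories[0], categories[1:]
    columns = [f"{name}_{level}" for level in others]
    rows = [[1 if v == level else 0 for level in others] for v in values]
    return reference, columns, rows

# Hypothetical data: 'two' and 'three' are invented Dur levels.
dur_ref, dur_cols, dur_rows = reference_code(
    ["one", "onehalf", "two", "three", "one"], "Dur")
print(dur_ref, dur_cols)   # one ['Dur_onehalf', 'Dur_three', 'Dur_two']
print(dur_rows[0])         # [0, 0, 0]  (a reference-level row)

# A 0/1 variable is a two-level categorical: 0 is the reference.
hu_ref, hu_cols, _ = reference_code([0, 1, 1, 0], "HU")
print(hu_ref, hu_cols)     # 0 ['HU_1']
```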
In the equation you used, FFR doesn't appear as a fixed effect. If you only want a subject-specific intercept, use (1|ID); otherwise, make sure that a random FFR slope without a fixed FFR effect is really what you want.
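For reference, a sketch of the model that (FFR|ID) specifies, with i indexing ID, j the observations within an ID, and x_ij collecting the fixed-effects terms. It assumes Gaussian random effects with a covariance matrix Σ that is left unrestricted here; check the fitlme documentation for the covariance pattern actually used.

```latex
\text{logRoL}_{ij} = \mathbf{x}_{ij}^{\top}\boldsymbol{\beta} + b_{0i} + b_{1i}\,\mathrm{FFR}_{ij} + \varepsilon_{ij},
\qquad
\begin{pmatrix} b_{0i} \\ b_{1i} \end{pmatrix} \sim \mathcal{N}(\mathbf{0}, \Sigma),
\qquad
\varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)
```

Because FFR is not among the terms in x_ij, each ID's FFR slope is b_{1i} alone, so the average FFR slope across IDs is fixed at zero. Adding FFR to the fixed effects gives slopes β_FFR + b_{1i}; replacing (FFR|ID) with (1|ID) keeps only the random intercept b_{0i}.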
Peng Li on 28 May 2020
Hi Rob,
Sorry that I overlooked this thread. To answer your last question: you can explicitly drop the three-way interaction term from your equation by adding "- Storm:PrimaryHU:relInsLoss_1", or you can add each term separately. Again, to make this simpler:
y ~ x1*x2
y ~ x1 + x2 + x1:x2
These two are equivalent: both include the interaction between x1 and x2.
y ~ x1 + x2
y ~ x1*x2 - x1:x2
These two are equivalent: both exclude the interaction term.
y ~ x1*x2*x3 - x1:x2:x3
means all main effects plus the two-way interactions between each pair of variables; the three-way interaction term is dropped by - x1:x2:x3.
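The equivalences above can be checked with a small Python sketch that treats a model's fixed-effects terms as a set: `*` expands to every main effect and interaction, and `-` removes the listed term. The helpers expand, term and labels are hypothetical and ignore the intercept, parentheses and other operators; they illustrate the notation rather than reproduce MATLAB's formula parser.

```python
from itertools import combinations

def expand(product):
    """Terms implied by a Wilkinson product 'A*B*...': every main effect
    and interaction, each stored as a frozenset of variable names."""
    names = [n.strip() for n in product.split("*")]
    return {frozenset(c) for k in range(1, len(names) + 1)
            for c in combinations(names, k)}

def term(spec):
    """A single term written with ':' (e.g. 'x1:x2') as a frozenset."""
    return frozenset(n.strip() for n in spec.split(":"))

def labels(terms):
    """Readable labels: main effects first, then by variable names."""
    ordered = sorted(terms, key=lambda t: (len(t), sorted(t)))
    return [":".join(sorted(t)) for t in ordered]

# y ~ x1*x2  is the same model as  y ~ x1 + x2 + x1:x2
print(expand("x1*x2") == {term("x1"), term("x2"), term("x1:x2")})  # True

# y ~ x1*x2 - x1:x2  is the same model as  y ~ x1 + x2
print(labels(expand("x1*x2") - {term("x1:x2")}))  # ['x1', 'x2']

# y ~ x1*x2*x3 - x1:x2:x3 keeps main effects and all two-way interactions
print(labels(expand("x1*x2*x3") - {term("x1:x2:x3")}))
# ['x1', 'x2', 'x3', 'x1:x2', 'x1:x3', 'x2:x3']
```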
Hope this helps!
