Effacer les filtres
Effacer les filtres

fitlme different to lmer in R

49 vues (au cours des 30 derniers jours)
Rik Henson
Rik Henson le 25 Juin 2024 à 14:33
Commenté : Rik Henson le 26 Juin 2024 à 12:14
I am trying to illustrate Simpsons Paradox using fitlme:
dat = [
8.7050 21.0000 1.0000
7.3362 22.0000 1.0000
6.2369 23.0000 1.0000
4.8375 24.0000 1.0000
4.9990 25.0000 1.0000
13.5644 31.0000 2.0000
12.5219 32.0000 2.0000
12.3329 33.0000 2.0000
11.4778 34.0000 2.0000
10.0626 35.0000 2.0000
18.9573 41.0000 3.0000
17.9516 42.0000 3.0000
15.1706 43.0000 3.0000
16.1398 44.0000 3.0000
15.1729 45.0000 3.0000
23.7340 51.0000 4.0000
23.6630 52.0000 4.0000
22.4108 53.0000 4.0000
21.0697 54.0000 4.0000
20.3200 55.0000 4.0000
28.0275 61.0000 5.0000
28.0699 62.0000 5.0000
27.3365 63.0000 5.0000
25.8270 64.0000 5.0000
24.5141 65.0000 5.0000];
tab = array2table(dat,'VariableNames',{'y','Age','Participant'})
figure,plot(tab.Age, tab.y,'o')
%tab.Participant = nominal(tab.Participant) % doesn't have any effect
m = fitlme(tab,'y ~ Age + (1|Participant)','FitMethod','REML')
but I get a positive coefficient for the fixed effect of Age, when I was expecting a negative one:
Linear mixed-effects model fit by REML
Model information:
Number of observations 25
Fixed effects coefficients 2
Random effects coefficients 5
Covariance parameters 2
Formula:
y ~ 1 + Age + (1 | Participant)
Model fit statistics:
AIC BIC LogLikelihood Deviance
120.67 125.21 -56.334 112.67
Fixed effects coefficients (95% CIs):
Name Estimate SE tStat DF pValue Lower Upper
'(Intercept)' -4.4658 1.3833 -3.2283 23 0.0037183 -7.3274 -1.6042
'Age' 0.49496 0.030545 16.204 23 4.4887e-14 0.43177 0.55815
Random effects covariance parameters (95% CIs):
Group: Participant (5 Levels)
Name1 Name2 Type Estimate Lower Upper
'(Intercept)' '(Intercept)' 'std' 2.4099e-16 NaN NaN
Group: Error
Name Estimate Lower Upper
'Res Std' 2.1706 1.6259 2.898
If I run what I think is equivalent model using "lmer" in R on exactly the same data, I get a negative coefficient for Age, as expected:
summary(lmer('y ~ Age + (1|Participant)', data = tab)) # "tab" is dataframe version of tab above
Linear mixed model fit by REML ['lmerMod']
Formula: y ~ Age + (1 | Participant)
Data: tab
REML criterion at convergence: 88.2
Scaled residuals:
Min 1Q Median 3Q Max
-1.8806 -0.5016 0.1545 0.7367 1.3270
Random effects:
Groups Name Variance Std.Dev.
Participant (Intercept) 673.2727 25.9475
Residual 0.4153 0.6445
Number of obs: 25, groups: Participant, 5
Fixed effects:
Estimate Std. Error t value
(Intercept) 65.94094 12.24102 5.387
Age -1.13831 0.09058 -12.567
Correlation of Fixed Effects:
(Intr)
Age -0.318
What am I doing wrong with fitlme?
  2 commentaires
the cyclist
the cyclist le 25 Juin 2024 à 18:04
I don't have an answer for you; instead, I'm going to make it even more puzzling.
If I include only the first three participants, I get the sign you expect.
dat = [
8.7050 21.0000 1.0000
7.3362 22.0000 1.0000
6.2369 23.0000 1.0000
4.8375 24.0000 1.0000
4.9990 25.0000 1.0000
13.5644 31.0000 2.0000
12.5219 32.0000 2.0000
12.3329 33.0000 2.0000
11.4778 34.0000 2.0000
10.0626 35.0000 2.0000
18.9573 41.0000 3.0000
17.9516 42.0000 3.0000
15.1706 43.0000 3.0000
16.1398 44.0000 3.0000
15.1729 45.0000 3.0000
% 23.7340 51.0000 4.0000
% 23.6630 52.0000 4.0000
% 22.4108 53.0000 4.0000
% 21.0697 54.0000 4.0000
% 20.3200 55.0000 4.0000
% 28.0275 61.0000 5.0000
% 28.0699 62.0000 5.0000
% 27.3365 63.0000 5.0000
% 25.8270 64.0000 5.0000
% 24.5141 65.0000 5.0000
];
tab = array2table(dat,'VariableNames',{'y','Age','Participant'});
figure
plot(tab.Age, tab.y,'o')
% tab.Participant = nominal(tab.Participant) % doesn't have any effect
m = fitlme(tab,'y ~ Age + (1|Participant)','FitMethod','REML')
m =
Linear mixed-effects model fit by REML Model information: Number of observations 15 Fixed effects coefficients 2 Random effects coefficients 3 Covariance parameters 2 Formula: y ~ 1 + Age + (1 | Participant) Model fit statistics: AIC BIC LogLikelihood Deviance 54.254 56.514 -23.127 46.254 Fixed effects coefficients (95% CIs): Name Estimate SE tStat DF pValue Lower Upper {'(Intercept)'} 41.176 8.9474 4.602 13 0.00049589 21.846 60.506 {'Age' } -0.89328 0.11223 -7.9596 13 2.3634e-06 -1.1357 -0.65083 Random effects covariance parameters (95% CIs): Group: Participant (3 Levels) Name1 Name2 Type Estimate Lower Upper {'(Intercept)'} {'(Intercept)'} {'std'} 14.105 5.2253 38.073 Group: Error Name Estimate Lower Upper {'Res Std'} 0.61861 0.40713 0.93996
the cyclist
the cyclist le 25 Juin 2024 à 18:07
Also, if I increase the size of your dataset (still including participants 4 & 5):
dat = repmat(dat,2,1);
and run the model, I also get the sign you expect.
I wonder if MATLAB is somehow getting stuck in a local minimum solution, on the original dataset.

Connectez-vous pour commenter.

Réponses (1)

the cyclist
the cyclist le 25 Juin 2024 à 18:26
Modifié(e) : the cyclist le 25 Juin 2024 à 19:59
Disclaimer: I don't fully understand the specifics on why you are seeing what you are.
Thinking about my comment about being stuck in a local minimum, I remembered that one can randomize the start. Here, I do that, run your code 500 times, and see what age coefficient I get. Clearly, things are a bit dicy.
The model with the negative age coefficient, is definitely better, with a much lower AIC.
rng default
dat = [
8.7050 21.0000 1.0000
7.3362 22.0000 1.0000
6.2369 23.0000 1.0000
4.8375 24.0000 1.0000
4.9990 25.0000 1.0000
13.5644 31.0000 2.0000
12.5219 32.0000 2.0000
12.3329 33.0000 2.0000
11.4778 34.0000 2.0000
10.0626 35.0000 2.0000
18.9573 41.0000 3.0000
17.9516 42.0000 3.0000
15.1706 43.0000 3.0000
16.1398 44.0000 3.0000
15.1729 45.0000 3.0000
23.7340 51.0000 4.0000
23.6630 52.0000 4.0000
22.4108 53.0000 4.0000
21.0697 54.0000 4.0000
20.3200 55.0000 4.0000
28.0275 61.0000 5.0000
28.0699 62.0000 5.0000
27.3365 63.0000 5.0000
25.8270 64.0000 5.0000
24.5141 65.0000 5.0000
];
tab = array2table(dat,'VariableNames',{'y','Age','Participant'});
figure
plot(tab.Age, tab.y,'o')
NS = 200;
[aic,ageCoefficient] = deal(zeros(NS,1));
for ns = 1:NS
% tab.Participant = nominal(tab.Participant) % doesn't have any effect
m = fitlme(tab,'y ~ Age + (1|Participant)','FitMethod','REML','StartMethod','random');
ageCoefficient(ns) = m.Coefficients.Estimate(2);
aic(ns) = m.ModelCriterion.AIC;
end
figure
histogram(ageCoefficient)
xlabel('Age coefficient')
ylabel('Frequency')
figure
histogram(aic)
xlabel('AIC')
ylabel('Frequency')
  3 commentaires
Rik Henson
Rik Henson le 26 Juin 2024 à 7:14
Thank you so much for your time @the cyclist. I will explore the convergence problems you discovered (and perhaps see whether R is fitting differently).
Rik Henson
Rik Henson le 26 Juin 2024 à 12:14
I've played with the Optimizer options (and set FitMethod to REML, to match R's default), but still get the same problem of a local minimum. If I double the number of trials per participant (from 5 to 10), then I consistently get the correct negative coefficient for Age. So I guess there is still something different between Matlab's optimiser and R's, and the latter is just more robust to small data sets? Thanks again for confirming that I wasn't just being stupid (or perhaps I still am...;-).

Connectez-vous pour commenter.

Catégories

En savoir plus sur Preprocessing Data dans Help Center et File Exchange

Produits


Version

R2018a

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by