Load the sample data.
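The load command did not survive in this copy of the example. A minimal sketch, assuming the data is the mfr sample data set that ships with Statistics and Machine Learning Toolbox (the file name is an assumption, inferred from the variables described below):

```matlab
% Assumption: the sample data file is mfr.mat, which loads a table named mfr
load mfr
```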
This simulated data is from a manufacturing company that operates 50 factories across the world, with each factory running a batch process to create a finished product. The company wants to decrease the number of defects in each batch, so it developed a new manufacturing process. To test the effectiveness of the new process, the company selected 20 of its factories at random to participate in an experiment: Ten factories implemented the new process, while the other ten continued to run the old process. In each of the 20 factories, the company ran five batches (for a total of 100 batches) and recorded the following data:
Flag to indicate whether the batch used the new process (newprocess)
Processing time for each batch, in hours (time)
Temperature of the batch, in degrees Celsius (temp)
Categorical variable indicating the supplier (A, B, or C) of the chemical used in the batch (supplier)
Number of defects in the batch (defects)
The data also includes time_dev and temp_dev, which represent the absolute deviation of time and temperature, respectively, from the process standard of 3 hours at 20 degrees Celsius.
Fit a generalized linear mixed-effects model using newprocess, time_dev, temp_dev, and supplier as fixed-effects predictors. Include a random-effects term for intercept grouped by factory, to account for quality differences that might exist due to factory-specific variations. The response variable defects has a Poisson distribution, and the appropriate link function for this model is log. Use the Laplace fit method to estimate the coefficients. Specify the dummy variable encoding as 'effects', so the dummy variable coefficients sum to 0.
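The number of defects can be modeled using a Poisson distribution:

$$\text{defects}_{ij} \sim \text{Poisson}(\mu_{ij})$$

This corresponds to the generalized linear mixed-effects model

$$\log(\mu_{ij}) = \beta_0 + \beta_1\,\text{newprocess}_{ij} + \beta_2\,\text{time\_dev}_{ij} + \beta_3\,\text{temp\_dev}_{ij} + \beta_4\,\text{supplier\_C}_{ij} + \beta_5\,\text{supplier\_B}_{ij} + b_i,$$

where

$\text{defects}_{ij}$ is the number of defects observed in the batch produced by factory $i$ during batch $j$.
$\mu_{ij}$ is the mean number of defects corresponding to factory $i$ (where $i = 1, 2, \ldots, 20$) during batch $j$ (where $j = 1, 2, \ldots, 5$).
$\text{newprocess}_{ij}$, $\text{time\_dev}_{ij}$, and $\text{temp\_dev}_{ij}$ are the measurements for each variable that correspond to factory $i$ during batch $j$. For example, $\text{newprocess}_{ij}$ indicates whether the batch produced by factory $i$ during batch $j$ used the new process.
$\text{supplier\_C}_{ij}$ and $\text{supplier\_B}_{ij}$ are dummy variables that use effects (sum-to-zero) coding to indicate whether company C or B, respectively, supplied the process chemicals for the batch produced by factory $i$ during batch $j$.
$b_i \sim N(0, \sigma_b^2)$ is a random-effects intercept for each factory $i$ that accounts for factory-specific variation in quality.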
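The fitting call itself is missing from this copy. The following sketch reconstructs it from the settings described above and from the formula echoed in the model display. It assumes the data is held in a table named mfr, which is an assumption about the variable name.

```matlab
% Assumption: mfr is the table holding defects, newprocess, time_dev,
% temp_dev, supplier, and factory.
glme = fitglme(mfr, ...
    'defects ~ 1 + newprocess + time_dev + temp_dev + supplier + (1|factory)', ...
    'Distribution','Poisson', ...   % response distribution
    'Link','log', ...               % link function
    'FitMethod','Laplace', ...      % estimation method
    'DummyVarCoding','effects')     % sum-to-zero coding for supplier
```

The trailing semicolon is omitted so that MATLAB prints the model summary.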
glme =
Generalized linear mixed-effects model fit by ML

Model information:
    Number of observations         100
    Fixed effects coefficients     6
    Random effects coefficients    20
    Covariance parameters          1
    Distribution                   Poisson
    Link                           Log
    FitMethod                      Laplace

Formula:
    defects ~ 1 + newprocess + time_dev + temp_dev + supplier + (1 | factory)

Model fit statistics:
    AIC       BIC       LogLikelihood    Deviance
    416.35    434.58    -201.17          402.35

Fixed effects coefficients (95% CIs):
    Name               Estimate     SE          tStat       DF    pValue        Lower        Upper
    {'(Intercept)'}    1.4689       0.15988     9.1875      94    9.8194e-15    1.1515       1.7864
    {'newprocess' }    -0.36766     0.17755     -2.0708     94    0.041122      -0.72019     -0.015134
    {'time_dev'   }    -0.094521    0.82849     -0.11409    94    0.90941       -1.7395      1.5505
    {'temp_dev'   }    -0.28317     0.9617      -0.29444    94    0.76907       -2.1926      1.6263
    {'supplier_C' }    -0.071868    0.078024    -0.9211     94    0.35936       -0.22679     0.083051
    {'supplier_B' }    0.071072     0.07739     0.91836     94    0.36078       -0.082588    0.22473

Random effects covariance parameters:
Group: factory (20 Levels)
    Name1              Name2              Type       Estimate
    {'(Intercept)'}    {'(Intercept)'}    {'std'}    0.31381

Group: Error
    Name                    Estimate
    {'sqrt(Dispersion)'}    1
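Because the link function is log, each fixed-effects coefficient acts multiplicatively on the expected number of defects. As a reading aid, the newprocess estimate and its confidence limits can be exponentiated to give an approximate rate ratio, holding the other predictors and the factory fixed. The values below are computed from the rounded display, so they are approximate:

```latex
\begin{align*}
\frac{\mu_{ij}\,(\text{newprocess}=1)}{\mu_{ij}\,(\text{newprocess}=0)}
  &= e^{\beta_1} \approx e^{-0.36766} \approx 0.692, \\
\text{95\% CI:}\quad
  &\left(e^{-0.72019},\; e^{-0.015134}\right) \approx (0.487,\; 0.985).
\end{align*}
```

Under this model, the new process is associated with roughly 31% fewer expected defects per batch, although the interval comes close to a ratio of 1.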
Perform an F-test for each fixed-effects term to determine whether all coefficients representing that term are equal to 0.
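The command for this step is missing from this copy. A minimal sketch, assuming the fitted model is stored in glme as in the display above:

```matlab
% Marginal F-tests, one per fixed-effects term
% (default degrees-of-freedom method is 'residual')
stats = anova(glme)
```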
stats =
    ANOVA MARGINAL TESTS: DFMETHOD = 'RESIDUAL'

    Term               FStat       DF1    DF2    pValue
    {'(Intercept)'}    84.41       1      94     9.8194e-15
    {'newprocess' }    4.2881      1      94     0.041122
    {'time_dev'   }    0.013016    1      94     0.90941
    {'temp_dev'   }    0.086696    1      94     0.76907
    {'supplier'   }    0.59212     2      94     0.5552
The p-values for the intercept, newprocess, time_dev, and temp_dev are the same as in the coefficient table of the glme display. The small p-values for the intercept and newprocess indicate that these are significant predictors at the 5% significance level. The large p-values for time_dev and temp_dev indicate that these are not significant predictors at this level.
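The two tables agree for single-coefficient terms because a marginal F-test with one numerator degree of freedom equals the square of the corresponding t-statistic, with the same denominator degrees of freedom (94 here). A quick check using the rounded values from the displays, so agreement holds only to display precision:

```latex
\begin{align*}
F_{\text{(Intercept)}} &= (9.1875)^2   \approx 84.41 \\
F_{\text{newprocess}}  &= (-2.0708)^2  \approx 4.288 \\
F_{\text{time\_dev}}   &= (-0.11409)^2 \approx 0.01302 \\
F_{\text{temp\_dev}}   &= (-0.29444)^2 \approx 0.08670
\end{align*}
```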
The p-value of 0.5552 for supplier measures the combined significance for both coefficients representing the categorical variable supplier. This includes the dummy variables supplier_C and supplier_B as shown in the coefficient table of the glme display. The large p-value indicates that supplier is not a significant predictor at the 5% significance level.
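The joint test for supplier can also be written as an explicit contrast. The sketch below uses coefTest with a contrast matrix H that picks out the two supplier coefficients. H is an illustrative name, and the column order is assumed to match the fixed-effects table in the glme display. If that assumption holds, the result should agree with the supplier row of the anova output.

```matlab
% Columns of H follow the fixed-effects order in the glme display:
% (Intercept), newprocess, time_dev, temp_dev, supplier_C, supplier_B
H = [0 0 0 0 1 0;
     0 0 0 0 0 1];

% Test the hypothesis H*beta = 0, that is, both supplier coefficients are 0
[pVal,F,DF1,DF2] = coefTest(glme,H)
```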