Can we use the output of a regression ML algorithm as an input for another ML algorithm?

I've used a regression decision tree for prediction and got around 92% correlation between the predicted and actual values.
Then, the actual values and other features were used to train another regression DT to predict another parameter, and a 90% correlation was achieved.
However, when I tried to use the predicted values from the first model to train different ML algorithms including DT, I got less than a 40% correlation.
So, why do I get bad results when the predicted values from the first model, rather than the actual values, are fed to the second model as inputs?
I've read some ideas about doing a cascaded DT, but I am not sure if this would help and if it's doable for regression DT or not.
Thanks

Answers (1)

Ayush Aniket on 8 May 2025
The drop in performance when using predicted values from the first regression decision tree as inputs to the second model is a common issue known as error propagation.
Why does it happen?
  • The first model's predictions are not perfect; they contain errors. When you use these predicted values as features for a second model, the second model learns from noisy, less informative inputs, leading to lower predictive accuracy.
  • Actual values contain the true signal, whereas predicted values are approximations. This loss of information reduces the second model's ability to capture underlying relationships.
  • Training the second model on actual values and then feeding it predicted values at prediction time introduces a mismatch: the model was fit to clean inputs but receives noisier, less reliable ones.
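The mismatch described above can be reproduced on synthetic data. The following is a minimal sketch, not the asker's setup: the data, the variable names (X, y1, y2), the sample sizes, and the noise levels are all assumptions made for illustration. It trains the second tree on the actual first-stage values and then evaluates it twice, once with actual and once with predicted inputs. The second correlation is typically the lower one.

```matlab
% Synthetic illustration only: data, sizes, and noise levels are assumptions.
rng(0);                                            % reproducible random numbers
n  = 2000;
X  = rand(n, 3);                                   % original features
y1 = sin(2*pi*X(:,1)) + X(:,2) + 0.1*randn(n, 1);  % first target
y2 = y1.^2 + X(:,3) + 0.1*randn(n, 1);             % second target depends on y1

idxTrain = (1:n)' <= 1500;                         % simple hold-out split
idxTest  = ~idxTrain;

% Stage 1: predict y1 from the original features
tree1 = fitrtree(X(idxTrain,:), y1(idxTrain));
y1Hat = predict(tree1, X(idxTest,:));

% Stage 2: trained on the ACTUAL y1 values plus one original feature
tree2 = fitrtree([y1(idxTrain), X(idxTrain,3)], y2(idxTrain));

% Evaluate stage 2 with actual vs. predicted y1 as its input
y2FromActual    = predict(tree2, [y1(idxTest), X(idxTest,3)]);
y2FromPredicted = predict(tree2, [y1Hat,       X(idxTest,3)]);

rActual    = corr(y2FromActual,    y2(idxTest));
rPredicted = corr(y2FromPredicted, y2(idxTest));
fprintf('Correlation with actual y1 as input:    %.3f\n', rActual);
fprintf('Correlation with predicted y1 as input: %.3f\n', rPredicted);
```

How large the gap is depends on how strongly the second target depends on the first and on how noisy the first model's predictions are.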
You can try the approaches below:
  • When training the second model, use cross-validated predictions from the first model (not the actuals), so the second model learns to handle the prediction noise.
% Example: Use cross-validation to generate predicted values for training
cvmodel = crossval(firstTreeModel);
predictedVals = kfoldPredict(cvmodel);
% Use predictedVals as features for the second model
Refer to the following documentation to read about RegressionPartitionedModel: https://www.mathworks.com/help/stats/classreg.learning.partition.regressionpartitionedmodel.html
  • Analyze the residuals (errors) of the first model. If they are large or systematic, the second model will struggle.
  • Instead of only using the predicted value from the first model, consider including other original features to provide more information to the second model.
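Put together, the three suggestions above might look like the sketch below. It is an illustration under stated assumptions rather than a definitive implementation: X, y1, and y2 are placeholder names filled with synthetic data, and the 25% hold-out and 5 folds are arbitrary choices. The second tree is trained on out-of-fold predictions of the first target plus the original features, so its training inputs carry the same kind of noise it will see at prediction time.

```matlab
% Sketch only: X, y1, y2 are placeholders, filled here with synthetic data.
rng(1);
n  = 2000;
X  = rand(n, 3);
y1 = sin(2*pi*X(:,1)) + X(:,2) + 0.1*randn(n, 1);
y2 = y1.^2 + X(:,3) + 0.1*randn(n, 1);

% Hold out 25% of the rows for a final check (arbitrary choice)
cv  = cvpartition(n, 'HoldOut', 0.25);
Xtr = X(training(cv),:);  y1tr = y1(training(cv));  y2tr = y2(training(cv));
Xte = X(test(cv),:);      y2te = y2(test(cv));

% Stage 1: fit the first tree, then get out-of-fold predictions for the
% training rows so that no row is predicted by a tree that saw it
tree1   = fitrtree(Xtr, y1tr);
cvTree1 = crossval(tree1, 'KFold', 5);
y1OOF   = kfoldPredict(cvTree1);

% Residual check: large or clearly biased residuals signal trouble downstream
res = y1tr - y1OOF;
fprintf('Stage-1 out-of-fold RMSE: %.3f, mean residual: %.3f\n', ...
        sqrt(mean(res.^2)), mean(res));

% Stage 2: train on the PREDICTED y1 together with the original features
tree2 = fitrtree([y1OOF, Xtr], y2tr);

% At prediction time, chain the two models in the same way
y1HatTe = predict(tree1, Xte);
y2HatTe = predict(tree2, [y1HatTe, Xte]);
fprintf('Stage-2 hold-out correlation: %.3f\n', corr(y2HatTe, y2te));
```

Keeping the original features in the second model gives it a fallback when the first model's prediction is unreliable for a given row.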

Version

R2021b
