Classification by logistic regression
I am a new learner in the field of classification, and I am stuck on a problem while implementing logistic regression:
My data set consists of about 300 measurements with 20 features. I implemented a logistic regression model using glmfit and obtained the predicted probabilities (Y). Next, I used the model output (Y) to generate an ROC curve, which gives me the sensitivity and specificity of the model.
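The workflow described above can be sketched as follows. This is a minimal, hypothetical sketch assuming X is a 300-by-20 predictor matrix and y is a 0/1 response vector (neither is given in the post):

```matlab
% Fit a logistic regression model (binomial family, logit link)
b = glmfit(X, y, 'binomial', 'link', 'logit');

% Predicted probabilities on the same data used for fitting
scores = glmval(b, X, 'logit');

% ROC curve: perfcurve returns 1-specificity (fpr) vs sensitivity (tpr)
[fpr, tpr, ~, auc] = perfcurve(y, scores, 1);
plot(fpr, tpr);
xlabel('1 - specificity'); ylabel('sensitivity');
```

Note that because the scores here come from the training data, the resulting ROC curve is optimistic, which is exactly the concern raised in question (1) below.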
(1) I am using the entire data set for both training and testing. Is that correct? If not, how can I validate my model? Is there a way to know whether I am overfitting by using all the features?
(2) I have tried to implement k-fold cross-validation (k = 10) by running logistic regression and computing the sensitivity/specificity on the test set 10 times. But my concern is that I am creating a new model for each of the 10 training sets, so in the end I do not have a single classifier.
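One way to sketch this 10-fold procedure while still obtaining a single cross-validated ROC curve is to pool the held-out predictions across folds. A hedged sketch, again assuming X and y as above:

```matlab
% 10-fold cross-validation, pooling held-out predictions so that one
% ROC curve can be computed from out-of-sample scores only.
c = cvpartition(y, 'KFold', 10);   % stratified when y is a class label
scores = zeros(size(y));
for k = 1:c.NumTestSets
    tr = training(c, k);
    te = test(c, k);
    b = glmfit(X(tr,:), y(tr), 'binomial', 'link', 'logit');
    scores(te) = glmval(b, X(te,:), 'logit');   % out-of-fold predictions
end
[fpr, tpr, ~, auc] = perfcurve(y, scores, 1);   % cross-validated ROC
```

The 10 per-fold models are only used to estimate generalization performance; the single classifier you deploy can then be refit once on all 300 observations.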
Thanks,
Vikrant
Accepted Answer
Ilya
on 28 Dec 2011
Because logistic regression is a simple linear model and because you have 10 times as many observations as predictors, the classification error measured on the training set should not be far off the true value. Even so, it is best to validate your model on data not used for training. 300 observations are not a lot, so you would likely be better off cross-validating the classification error and ROC curve.
10-fold stratified cross-validation is a good rule of thumb. This is what you get from function CROSSVAL by default. Several runs of 10-fold cross-validation would be even better.
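As a minimal sketch of what CROSSVAL provides (the predictor matrix X and 0/1 response y are assumptions, not given in the thread), the 'mcr' option returns a cross-validated misclassification rate when you supply a prediction function:

```matlab
% Cross-validated misclassification rate with crossval (10-fold by default).
% predfun fits logistic regression on the training fold and classifies
% the test fold at a 0.5 probability threshold.
predfun = @(Xtr, ytr, Xte) ...
    double(glmval(glmfit(Xtr, ytr, 'binomial', 'link', 'logit'), ...
                  Xte, 'logit') > 0.5);
cvErr = crossval('mcr', X, y, 'Predfun', predfun);
```

Passing a cvpartition object built from the class labels via the 'partition' option gives stratified folds.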
The Hosmer-Lemeshow goodness-of-fit test is often used for logistic regression models. It is described in many places.
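There is no built-in Hosmer-Lemeshow function in the toolbox, but the test is short to hand-roll. A hypothetical sketch, assuming scores holds the fitted probabilities and y the 0/1 outcomes: sort observations by predicted probability, split them into g = 10 groups, and compare observed and expected event counts with a chi-square statistic.

```matlab
% Hand-rolled Hosmer-Lemeshow test (sketch, not a toolbox function).
g = 10;
[p, idx] = sort(scores);                  % sort by fitted probability
ys = y(idx);
edges = round(linspace(0, numel(p), g + 1));
H = 0;
for k = 1:g
    in = edges(k)+1 : edges(k+1);         % observations in group k
    O  = sum(ys(in));                     % observed events
    E  = sum(p(in));                      % expected events
    n  = numel(in);
    H  = H + (O - E)^2 / (E * (1 - E/n));
end
pval = 1 - chi2cdf(H, g - 2);             % chi-square with g-2 df
```

A large p-value indicates no evidence of lack of fit; a small one suggests the fitted probabilities are poorly calibrated.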
You can use SEQUENTIALFS (with cross-validation) to see if you need all predictors.
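A sketch of sequential feature selection with cross-validation, under the same assumptions about X and y (here with backward elimination, which the follow-up comment below recommends for small data sets):

```matlab
% Backward sequential feature selection, scored by 10-fold
% cross-validated misclassification count.
c = cvpartition(y, 'KFold', 10);
critfun = @(Xtr, ytr, Xte, yte) ...
    sum(yte ~= (glmval(glmfit(Xtr, ytr, 'binomial'), Xte, 'logit') > 0.5));
[keep, history] = sequentialfs(critfun, X, y, ...
                               'cv', c, 'direction', 'backward');
% keep is a logical mask over the 20 columns of X
```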
Logistic regression and cross-validation are described in many textbooks, by the way.
2 comments
Ilya
on 29 Dec 2011
It is best to gain some understanding of the theory and then look at demos and documentation examples in the Statistics Toolbox.
The doc page for glmfit has a few references at the bottom. Cross-validation is discussed, for example, in Elements of Statistical Learning by Hastie, Tibshirani & Friedman. I don't have a good reference for sequential feature selection, but examples on the doc page for sequentialfs should suffice. For small data with not too many predictors, I would recommend backward elimination.
More Answers (0)