Linear Regression

What is stepAIC in R?

In R, stepAIC is one of the most commonly used search methods for feature selection. It repeatedly adds or removes features so as to keep minimizing the AIC value, and so arrives at the final set of features. “stepAIC” is not necessarily meant to improve model performance; rather, it is used to simplify the model without hurting performance much. AIC quantifies the amount of information lost due to this simplification. AIC stands for Akaike Information Criterion.

If we are given two models fitted to the same data, we prefer the one with the lower AIC value. Hence AIC provides a means for model selection. AIC is only a relative measure among multiple models.
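As a rough illustration of comparing two models by AIC, the sketch below fits two linear models on R's built-in mtcars data and also computes AIC by hand from the standard definition, AIC = 2k - 2 log(L), where k is the number of estimated parameters and L is the maximized likelihood. The response (mpg), the predictors and the helper name manual_aic are assumptions chosen only for this example.

```r
# Two candidate models on the built-in mtcars data.
# The response (mpg) and the predictors are illustrative choices.
small <- lm(mpg ~ wt, data = mtcars)
large <- lm(mpg ~ wt + hp, data = mtcars)

# AIC by hand: 2k - 2 * log-likelihood.
# For lm, k counts the regression coefficients plus the error variance.
manual_aic <- function(fit) {
  k <- length(coef(fit)) + 1
  2 * k - 2 * as.numeric(logLik(fit))
}

aic_table <- data.frame(
  model   = c("small", "large"),
  manual  = c(manual_aic(small), manual_aic(large)),
  builtin = c(AIC(small), AIC(large))
)
print(aic_table)

# The model with the lower AIC is the preferred one.
preferred <- aic_table$model[which.min(aic_table$builtin)]
print(preferred)
```

The manual and builtin columns should agree, which is a quick check that AIC() follows the definition above for linear models.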

AIC is similar to adjusted R-squared in that it also penalizes adding more variables to the model. The absolute value of AIC has no significance on its own. We only look at whether the AIC value increases or decreases as variables are added or removed, and among several candidate models the one with the lower AIC value is preferred.
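To see why only changes in AIC matter, here is a small sketch comparing AIC() with extractAIC(), the function that stepAIC relies on internally. For linear models the two use different additive constants, so their absolute values differ while the difference between two models fitted to the same data is the same. The models are illustrative assumptions (mpg as the response on the built-in mtcars data); adjusted R-squared is printed alongside for comparison.

```r
# Illustrative models; the response and predictors are assumptions.
m1 <- lm(mpg ~ wt, data = mtcars)
m2 <- lm(mpg ~ wt + hp, data = mtcars)

# AIC() uses the full log-likelihood; extractAIC() drops constant terms.
aic_full <- c(AIC(m1), AIC(m2))
aic_step <- c(extractAIC(m1)[2], extractAIC(m2)[2])
print(rbind(aic_full, aic_step))

# The absolute values disagree, but the change from m1 to m2 is identical.
diff_full <- diff(aic_full)
diff_step <- diff(aic_step)
print(c(diff_full = diff_full, diff_step = diff_step))

# Adjusted R-squared also penalizes extra variables, but higher is better.
adj_r2 <- c(summary(m1)$adj.r.squared, summary(m2)$adj.r.squared)
print(adj_r2)
```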

So let's see how stepAIC works in R. We will use the mtcars data set. First, remove the feature “x” by setting it to NULL, since it contains only the car model names, which do not carry much meaning in this case. Then remove any rows that contain missing (NA) values in any of the columns using the na.omit function. Missing values have to be handled, otherwise stepAIC will give an error. Then build the model and run stepAIC. For this we need the MASS and car packages; stepAIC itself is provided by MASS.
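The steps above can be sketched roughly as follows. This is a minimal sketch under assumptions, not the exact code behind this post: it uses R's built-in copy of mtcars (where the car names are row names rather than a column called “x”), and it assumes mpg as the response, since the response variable is not stated here. The names dat, full_model and step_model are made up for the example.

```r
library(MASS)  # provides stepAIC()

# Built-in copy of mtcars; here the car names are row names, not a column.
dat <- mtcars

# Drop the name column if present. With the built-in data this is a no-op;
# it matters when the data was read from a file with the names in column "x".
dat$x <- NULL

# Remove rows with missing values in any column.
dat <- na.omit(dat)

# Assumption: mpg is the response, all remaining columns are candidates.
full_model <- lm(mpg ~ ., data = dat)

# Stepwise search in both directions; trace = FALSE hides the step log.
step_model <- stepAIC(full_model, direction = "both", trace = FALSE)

print(formula(step_model))  # the selected model
print(step_model$anova)     # the sequence of steps and their AIC values
```

Setting trace = TRUE (the default) prints every step, which is useful for seeing which variable is dropped or added at each stage.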

The first parameter of stepAIC is the fitted model object. The second parameter, direction, specifies which feature selection technique to use, and it can take the following values:

  • “both” (for stepwise regression, both forward and backward selection);
  • “backward” (for backward selection) and
  • “forward” (for forward selection).
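As a sketch of how the three direction values might be used, the example below runs each one on the built-in mtcars data with mpg as an assumed response. Note that a forward search starting from an intercept-only model needs a scope argument listing the candidate variables; if scope is missing, the initial model is taken as the upper limit and nothing can be added. The object names are illustrative.

```r
library(MASS)  # provides stepAIC()

# Assumption: mpg is the response; all other columns are candidates.
full <- lm(mpg ~ ., data = mtcars)
null <- lm(mpg ~ 1, data = mtcars)

# Candidate variables for searches that start from the empty model.
upper <- ~ cyl + disp + hp + drat + wt + qsec + vs + am + gear + carb
search_scope <- list(lower = ~ 1, upper = upper)

# Backward: start from the full model and drop variables.
backward <- stepAIC(full, direction = "backward", trace = FALSE)

# Forward: start from the empty model and add variables within the scope.
forward <- stepAIC(null, direction = "forward",
                   scope = search_scope, trace = FALSE)

# Both: variables can be added or dropped at every step.
both <- stepAIC(null, direction = "both",
                scope = search_scope, trace = FALSE)

# Variables kept by each search.
selected <- lapply(
  list(backward = backward, forward = forward, both = both),
  function(m) attr(terms(m), "term.labels")
)
print(selected)
```

The three searches are not guaranteed to end at the same model, which is one reason to treat the result as a good candidate rather than a proven optimum.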

At the very last step, stepAIC has produced the final set of features {drat, wt, gear, carb}. stepAIC can also reduce multicollinearity in the model when it exists, since redundant features tend to get dropped, although it does not guarantee this. I will explain that in an upcoming article.
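One way to check how much collinearity remains after selection is to compare variance inflation factors (VIF) before and after running stepAIC. The sketch below computes VIFs in base R as the diagonal of the inverse correlation matrix of the predictors, which holds for numeric predictors in a model with an intercept; the vif function in the car package is a packaged alternative. The helper name vif_from_fit and the choice of mpg as the response are assumptions for illustration.

```r
library(MASS)  # provides stepAIC()

# VIF for each numeric predictor: diagonal of the inverse correlation matrix.
vif_from_fit <- function(fit) {
  X <- model.matrix(fit)[, -1, drop = FALSE]  # drop the intercept column
  diag(solve(cor(X)))
}

# Assumption: mpg is the response on the built-in mtcars data.
full <- lm(mpg ~ ., data = mtcars)
reduced <- stepAIC(full, direction = "backward", trace = FALSE)

vif_full <- vif_from_fit(full)
vif_reduced <- vif_from_fit(reduced)

print(round(vif_full, 2))     # VIFs with every predictor included
print(round(vif_reduced, 2))  # VIFs for the predictors stepAIC kept
```

A common rule of thumb treats VIF values well above 5 or 10 as a sign of problematic collinearity, so large values in the second printout would mean the selected model still needs attention.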

In the previous post, Feature Selection Techniques in Regression Model, we learnt how to perform Stepwise Regression, Forward Selection and Backward Elimination in detail. stepAIC is an automated method that runs these searches and returns the set of features with the lowest AIC it finds.
