## What is stepAIC in R?

In R, stepAIC is one of the most commonly used search methods for feature selection. It repeatedly adds or drops variables to minimize the AIC value and arrives at a final set of features. stepAIC does not necessarily improve model performance; rather, it is used to simplify the model without hurting performance much. AIC estimates the amount of information a model loses, so it quantifies the cost of this simplification. AIC stands for Akaike Information Criterion.
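By definition, AIC = 2k - 2 ln(L), where k is the number of estimated parameters and L is the maximized likelihood. The sketch below assumes a linear model on the built-in mtcars data with mpg as the response (our illustrative choice). It computes AIC by hand and compares it with base R's `AIC()` and with `extractAIC()`, which is what stepAIC uses internally. For `lm` fits the two conventions differ only by a constant that depends on the sample size, so they rank models the same way.

```r
# Fit a linear model on the built-in mtcars data (mpg as response is our assumption)
fit <- lm(mpg ~ wt + hp, data = mtcars)

n  <- nobs(fit)          # number of observations
ll <- logLik(fit)        # maximized log-likelihood
k  <- attr(ll, "df")     # parameters: coefficients plus the residual variance

# AIC by definition: 2k - 2 ln(L)
aic_manual <- 2 * k - 2 * as.numeric(ll)
aic_base   <- AIC(fit)

# extractAIC() is what stepAIC() reports; for lm it is n*log(RSS/n) + 2*edf
aic_step <- extractAIC(fit)[2]

cat("manual:", aic_manual, " AIC():", aic_base, " extractAIC():", aic_step, "\n")

# The gap between the two conventions is a constant for a given n
gap <- aic_base - aic_step
```

This is why the AIC values printed by stepAIC do not match `AIC()`: only the differences between models matter.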

Given two models fitted to the same data, we prefer the one with the lower AIC value. Hence AIC provides a means for model selection. AIC is only a relative measure among multiple models.
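A minimal sketch of comparing two candidate models, assuming mpg as the response on the built-in mtcars data (the two formulas are illustrative choices, not taken from the article). Passing several models to `AIC()` returns a small table, and we keep the row with the lowest value.

```r
# Two candidate models for mpg on the built-in mtcars data (illustrative choices)
m1 <- lm(mpg ~ wt, data = mtcars)
m2 <- lm(mpg ~ wt + hp, data = mtcars)

# AIC() on several models returns a data frame with df and AIC columns
scores <- AIC(m1, m2)
print(scores)

# Prefer the model with the lower AIC
best <- rownames(scores)[which.min(scores$AIC)]
cat("Preferred model:", best, "\n")
```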

AIC is similar to adjusted R-squared in that it also penalizes adding more variables to the model. The absolute value of AIC has no significance on its own; we only check whether AIC increases or decreases as variables are added. Likewise, among multiple models, the one with the lower AIC value is preferred.
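To see the penalty at work, the sketch below adds a hypothetical column of pure random noise to mtcars (mpg as response is again our assumption). Plain R-squared can never go down when a variable is added, while adjusted R-squared and AIC both charge for the extra term. For a linear model, one extra coefficient changes AIC by n*log(RSS1/RSS0) + 2, so it helps only if the residual sum of squares falls enough to outweigh the +2.

```r
set.seed(42)
df <- mtcars
df$noise <- rnorm(nrow(df))   # hypothetical column of pure noise, unrelated to mpg

m0 <- lm(mpg ~ wt, data = df)
m1 <- lm(mpg ~ wt + noise, data = df)

r2  <- c(m0 = summary(m0)$r.squared,     m1 = summary(m1)$r.squared)
ar2 <- c(m0 = summary(m0)$adj.r.squared, m1 = summary(m1)$adj.r.squared)
aic <- c(m0 = AIC(m0),                   m1 = AIC(m1))
print(rbind(r2, ar2, aic))

# The extra coefficient costs exactly 2 AIC units; it pays off only if the
# drop in residual sum of squares outweighs that penalty.
n    <- nrow(df)
rss0 <- sum(residuals(m0)^2)
rss1 <- sum(residuals(m1)^2)
penalty_check <- n * log(rss1 / rss0) + 2
```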

So let's see how stepAIC works in R. We will use the mtcars data set. First remove the feature “x”, which contains only the car model names and carries little meaning in this case, by setting it to NULL. Then remove the rows that contain missing (NA) values in any column using the na.omit function. Handling missing values is required; otherwise stepAIC can stop with an error, because dropping a variable that has missing values changes the number of rows used to fit the model. Then build the model and run stepAIC. For this we need the MASS and car packages (stepAIC itself comes from MASS).
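The steps above can be sketched as follows. The built-in mtcars stores car names as row names rather than as a column, so this sketch copies them into a column named `x` to mimic the version of the data described here. The article does not state the response variable, so using mpg is our assumption, and the selected variables may not match the article's result.

```r
library(MASS)   # provides stepAIC()

# Built-in mtcars keeps car names as row names; copy them into a column "x"
# to mimic a CSV-loaded version of the data (an assumption for this sketch)
dat <- data.frame(x = rownames(mtcars), mtcars, row.names = NULL)

# Drop the name column: it only identifies the car model
dat$x <- NULL

# Drop rows with any missing value so every candidate model uses the same rows
dat <- na.omit(dat)

# Full model: mpg (assumed response) on all remaining columns
full <- lm(mpg ~ ., data = dat)

# Stepwise search in both directions; trace = FALSE suppresses the step log
step_fit <- stepAIC(full, direction = "both", trace = FALSE)
print(formula(step_fit))
```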

The first argument to stepAIC is the fitted model object, and the direction argument specifies which selection technique to use. It can take the following values:

- “both” (for stepwise regression, both forward and backward selection);
- “backward” (for backward selection) and
- “forward” (for forward selection).
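A sketch of the three directions on the built-in mtcars data, again assuming mpg as the response. Backward and "both" can start from the full model, but forward selection needs a starting model with fewer terms and a `scope` that lists the candidate terms it may add; started from the full model without a scope, it has nothing to add.

```r
library(MASS)

dat <- mtcars  # built-in data; mpg as response is our assumption

full  <- lm(mpg ~ ., data = dat)  # all predictors
empty <- lm(mpg ~ 1, data = dat)  # intercept only

# Upper scope: every predictor column, as a one-sided formula
upper <- reformulate(setdiff(names(dat), "mpg"))

# Backward: start full, only drop terms
back <- stepAIC(full, direction = "backward", trace = FALSE)

# Forward: start empty, only add terms from the scope
fwd <- stepAIC(empty, scope = list(lower = ~ 1, upper = upper),
               direction = "forward", trace = FALSE)

# Both: start full, may drop or re-add terms at each step
both <- stepAIC(full, direction = "both", trace = FALSE)

# Selected terms for each direction
sapply(list(backward = back, forward = fwd, both = both),
       function(m) paste(attr(terms(m), "term.labels"), collapse = " + "))
```

The three searches can end at different subsets, since each explores a different path through the candidate models.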

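At the final step, stepAIC produced the set of features {drat, wt, gear, carb}. Because the search is greedy, this is the best set found along the search path rather than a guaranteed global optimum. stepAIC can also reduce multicollinearity as a side effect, since a predictor that is largely redundant with others adds little fit relative to its AIC penalty; I will explain multicollinearity in an upcoming article.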
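To inspect how stepAIC reached its final set, the returned model carries an `anova` component that records each step. This sketch assumes mpg as the response on the built-in mtcars data, so the kept variables may differ from the set reported above.

```r
library(MASS)

full <- lm(mpg ~ ., data = mtcars)   # mpg as response is our assumption
step_fit <- stepAIC(full, direction = "both", trace = FALSE)

# Variables kept at the final step
kept <- attr(terms(step_fit), "term.labels")
print(kept)

# The "anova" component records each step taken and the AIC after it
print(step_fit$anova)
```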
