In R, stepAIC is one of the most commonly used search methods for feature selection. At each step it adds or drops the feature that lowers the model's AIC the most, and it stops when no further change lowers the AIC, which gives us the final set of features. stepAIC does not necessarily improve model performance; rather, it is used to simplify the model without hurting performance much. AIC quantifies the amount of information lost through this simplification. AIC stands for Akaike Information Criterion.
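For reference, the standard textbook definition of AIC (not specific to this article) can be written as follows, where $k$ is the number of estimated parameters and $\hat{L}$ is the maximized value of the model's likelihood:

$$
\mathrm{AIC} = 2k - 2\ln(\hat{L})
$$

The first term grows with every added parameter, while the second term shrinks as the fit improves, so a lower AIC reflects a better trade-off between simplicity and fit.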


If we are given two models, we prefer the one with the lower AIC value. AIC therefore provides a means of model selection. It is only a relative measure among the models being compared.
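As a minimal sketch of such a comparison, the snippet below fits two linear models on the built-in mtcars data and compares them with base R's AIC function. The response (mpg) and the predictors (wt, hp) are assumptions chosen only for illustration.

```r
# Two candidate models on the built-in mtcars data.
# Assumption: mpg as response, wt and hp as predictors (illustrative only).
m_small <- lm(mpg ~ wt, data = mtcars)
m_large <- lm(mpg ~ wt + hp, data = mtcars)

# AIC() accepts several fitted models and returns one row per model.
aic_table <- AIC(m_small, m_large)
print(aic_table)

# The preferred model is the one with the lowest AIC.
best_model <- rownames(aic_table)[which.min(aic_table$AIC)]
print(best_model)
```

Both models must be fitted to the same response and the same rows for the comparison to be meaningful.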

AIC is similar to adjusted R-squared in that it also penalizes adding more variables to the model. The absolute value of AIC has no significance on its own. We only check whether the AIC value increases or decreases when more variables are added. When comparing multiple models, the one with the lowest AIC value is preferred.
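One way to see why the absolute value carries no meaning is to rescale the response. In the sketch below (mpg, wt and hp are again assumed purely for illustration), multiplying mpg by 10 shifts the AIC of every model by the same amount, so the difference between the two models stays the same.

```r
# Original models (assumed response: mpg).
m1 <- lm(mpg ~ wt, data = mtcars)
m2 <- lm(mpg ~ wt + hp, data = mtcars)

# Same models after rescaling the response by a factor of 10.
scaled <- transform(mtcars, mpg = mpg * 10)
s1 <- lm(mpg ~ wt, data = scaled)
s2 <- lm(mpg ~ wt + hp, data = scaled)

# The absolute AIC values change with the scale of the response ...
print(c(original = AIC(m1), rescaled = AIC(s1)))

# ... but the difference between the two models does not.
diff_original <- AIC(m1) - AIC(m2)
diff_scaled <- AIC(s1) - AIC(s2)
print(c(original = diff_original, rescaled = diff_scaled))
```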

So let's see how stepAIC works in R. We will use the mtcars data set. First, remove the feature “x” by setting it to NULL, since it contains only the car model names, which do not carry much meaning in this case. Then remove the rows that contain missing values (NA) in any column using the na.omit function. Missing values must be handled, otherwise stepAIC will give an error. Then build the model and run stepAIC. For this we need the MASS package, which provides stepAIC, and the car package.
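The steps above can be sketched roughly as follows. This is a sketch under assumptions, not the original script: the built-in mtcars stores the car names as row names, so the code copies them into a column named x to mimic the data described here, and it assumes mpg is the response since the text does not name one.

```r
library(MASS)  # provides stepAIC

# Assumption: recreate a data set where the car names arrive as a column "x".
cars <- data.frame(x = rownames(mtcars), mtcars, row.names = NULL)

cars$x <- NULL         # drop the car-name column
cars <- na.omit(cars)  # drop rows with missing values in any column

# Assumption: mpg is the response and all other columns are candidate features.
full_model <- lm(mpg ~ ., data = cars)

# Stepwise search; trace = FALSE suppresses the step-by-step printout.
step_model <- stepAIC(full_model, direction = "both", trace = FALSE)

print(formula(step_model))
```

The features selected this way depend on the data and on the chosen response, so they need not match the set reported in this article.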

The first parameter of stepAIC is the fitted model object. The second parameter, direction, specifies which feature selection technique to use, and it can take the following values:

- “both” (for stepwise regression, both forward and backward selection);
- “backward” (for backward selection) and
- “forward” (for forward selection).
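The three directions listed above might be used as in the following sketch. It assumes the built-in mtcars with mpg as the response. Note that forward selection starts from a small model and needs a scope argument that tells stepAIC which features it may add.

```r
library(MASS)  # provides stepAIC

# Assumption: mpg is the response; every other column is a candidate feature.
full_model <- lm(mpg ~ ., data = mtcars)
null_model <- lm(mpg ~ 1, data = mtcars)  # intercept-only starting point

# One-sided formula listing every candidate feature, used as the upper scope.
upper_scope <- reformulate(setdiff(names(mtcars), "mpg"))

backward <- stepAIC(full_model, direction = "backward", trace = FALSE)
both <- stepAIC(full_model, direction = "both", trace = FALSE)
forward <- stepAIC(null_model, direction = "forward",
                   scope = list(lower = ~ 1, upper = upper_scope),
                   trace = FALSE)

# Compare the AIC reached by each search direction.
print(c(backward = AIC(backward), both = AIC(both), forward = AIC(forward)))
```

The three searches are greedy, so they can end at different feature sets.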

At the very last step, stepAIC produced the final set of features {drat, wt, gear, carb}. stepAIC can also reduce multicollinearity in the model, since redundant features tend to be dropped during the search, although it does not guarantee that all of it is removed. I will explain this in an upcoming article.
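To read the final feature set from a fitted result, something like the sketch below can be used. It assumes the built-in mtcars with mpg as the response, so the features it prints may differ from the set reported above.

```r
library(MASS)  # provides stepAIC

# Assumption: mpg is the response in the built-in mtcars data.
full_model <- lm(mpg ~ ., data = mtcars)
step_model <- stepAIC(full_model, direction = "both", trace = FALSE)

# Names of the features kept in the final model.
selected <- attr(terms(step_model), "term.labels")
print(selected)

# The anova component records each step of the search and the AIC reached.
print(step_model$anova)
```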

