Data Science Duniya

How to Use LinkedIn to Drive Traffic to Your Blog

July 8, 2019 Chetna Tripathi

LinkedIn is a professional networking platform where employers and employees can connect with each other. LinkedIn had 630 million registered members in 200 countries as

What is Logistic Regression?

June 17, 2019 Ashutosh Tripathi

Logistic regression is used for binary classification problems, where there are only two classes to predict. However, with a little extension and some human ingenuity, it can easily be used for multiclass classification problems as well. In this post, I will explain binary classification. I will also explain the reason behind maximizing the log-likelihood function.
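A minimal sketch of binary logistic regression fitted by maximizing the log-likelihood, in plain Python with batch gradient ascent (a simple choice for illustration, not necessarily what production libraries use). Taking the log turns the product of per-example probabilities into a sum, which is easier to differentiate and avoids numerical underflow; its gradient is just the sum of (y - p) times the feature. The toy data and the function names `sigmoid`, `log_likelihood` and `fit` are hypothetical.

```python
import math


def sigmoid(z):
    """Logistic function: maps any real score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def log_likelihood(w, b, xs, ys):
    """Bernoulli log-likelihood: sum of y*ln(p) + (1 - y)*ln(1 - p)."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(w * x + b)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return total


def fit(xs, ys, lr=0.1, epochs=5000):
    """Maximize the log-likelihood with plain batch gradient ascent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Partial derivatives of the log-likelihood: sum((y - p) * x) and sum(y - p)
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            residual = y - sigmoid(w * x + b)
            grad_w += residual * x
            grad_b += residual
        # Step uphill (ascent, not descent), averaged over the sample
        w += lr * grad_w / n
        b += lr * grad_b / n
    return w, b


# Hypothetical toy data: one feature, two overlapping classes (0 and 1)
xs = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0]
ys = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]

w, b = fit(xs, ys)
print("w =", round(w, 3), "b =", round(b, 3))
print("log-likelihood:", round(log_likelihood(0.0, 0.0, xs, ys), 3), "->",
      round(log_likelihood(w, b, xs, ys), 3))
# Classify with a 0.5 probability threshold
print([1 if sigmoid(w * x + b) >= 0.5 else 0 for x in xs])
```

The classes overlap on purpose: if they were perfectly separable, the log-likelihood would keep increasing as the weight grows without bound, and no finite maximum would exist.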

What is Multicollinearity?

June 13, 2019 Ashutosh Tripathi

Multicollinearity occurs in a multiple linear regression model, where we have more than one predictor variable. Multicollinearity exists when one predictor variable (note: not the target variable) can be linearly predicted from the other predictor variables with a significant degree of accuracy. This typically means that two or more predictor variables are highly correlated. However, the reverse is not true: low correlation among the predictors does not rule out multicollinearity.
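One common way to measure how well a predictor can be predicted from the others is the variance inflation factor, VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing predictor j on all the other predictors. Because it uses the other predictors jointly rather than one pair at a time, it can flag multicollinearity that pairwise correlations miss. Below is a minimal pure-Python sketch; the toy data (x3 built to be almost exactly x1 + x2) and the helper names are hypothetical, not from any library.

```python
def solve(a, b):
    """Solve the small linear system a @ x = b (Gauss-Jordan with partial pivoting)."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        if abs(m[col][col]) < 1e-12:
            raise ValueError("singular system: columns are perfectly collinear")
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols_rss(y, columns):
    """Least-squares fit of y on an intercept plus the given columns; returns the RSS."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(rows[0])
    xtx = [[sum(row[i] * row[j] for row in rows) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * target for row, target in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)  # normal equations: (X'X) beta = X'y
    fitted = [sum(coef * value for coef, value in zip(beta, row)) for row in rows]
    return sum((target - f) ** 2 for target, f in zip(y, fitted))


def r_squared(y, columns):
    mean = sum(y) / len(y)
    ss_tot = sum((v - mean) ** 2 for v in y)
    return 1.0 - ols_rss(y, columns) / ss_tot


def vif(predictors, j):
    """VIF of predictor j: 1 / (1 - R^2) from regressing it on the other predictors.
    The target variable plays no part."""
    others = [col for i, col in enumerate(predictors) if i != j]
    r2 = r_squared(predictors[j], others)
    return float("inf") if r2 >= 1.0 else 1.0 / (1.0 - r2)


# Hypothetical toy predictors: x3 is x1 + x2 plus a tiny wobble
x1 = [1, 2, 3, 4, 5, 6, 7, 8]
x2 = [3, 1, 4, 1, 5, 9, 2, 6]
x3 = [a + b + e for a, b, e in zip(x1, x2, [0.1, -0.1] * 4)]

predictors = [x1, x2, x3]
for j, name in enumerate(["x1", "x2", "x3"]):
    print(name, "VIF =", round(vif(predictors, j), 1))
# Without x3, the same x1 and x2 are only mildly related
print("x1 vs x2 only: VIF =", round(vif([x1, x2], 0), 2))
```

A frequently quoted rule of thumb treats a VIF above about 5 or 10 as a warning sign, though the cutoff is a convention rather than a hard rule.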

What is stepAIC in R?

June 10, 2019 Ashutosh Tripathi

In R, stepAIC (from the MASS package) is one of the most commonly used search methods for feature selection. We keep minimizing the AIC value step by step to arrive at the final set of features. Minimizing AIC does not necessarily improve the model's performance; rather, it is used to simplify the model without much impact on performance. AIC quantifies the amount of information lost due to this simplification. AIC stands for Akaike Information Criterion.
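stepAIC itself is an R function, so the following is only a Python sketch of the same idea under stated assumptions: backward elimination only (stepAIC's `direction` argument also allows forward and "both" searches), ordinary least squares, and the Gaussian form AIC = n ln(RSS/n) + 2k, which differs from the full-likelihood AIC only by an additive constant. The toy data and the helper names `ols_rss`, `aic` and `backward_step_aic` are hypothetical; the feature `z` is constructed to carry no information about y.

```python
import math


def solve(a, b):
    """Solve the small linear system a @ x = b (Gauss-Jordan with partial pivoting)."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        if abs(m[col][col]) < 1e-12:
            raise ValueError("singular system: columns are perfectly collinear")
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols_rss(y, columns):
    """Least-squares fit of y on an intercept plus the given columns; returns the RSS."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(rows[0])
    xtx = [[sum(row[i] * row[j] for row in rows) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * target for row, target in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(coef * value for coef, value in zip(beta, row)) for row in rows]
    return sum((target - f) ** 2 for target, f in zip(y, fitted))


def aic(y, columns):
    """Gaussian AIC up to a constant: n*ln(RSS/n) + 2k, k = coefficients incl. intercept."""
    n = len(y)
    return n * math.log(ols_rss(y, columns) / n) + 2 * (len(columns) + 1)


def backward_step_aic(y, features):
    """Repeatedly drop the feature whose removal lowers AIC the most;
    stop when no single removal lowers it."""
    current = dict(features)
    best_aic = aic(y, list(current.values()))
    while current:
        trials = {}
        for name in current:
            remaining = [col for other, col in current.items() if other != name]
            trials[name] = aic(y, remaining)
        drop, trial_aic = min(trials.items(), key=lambda kv: kv[1])
        if trial_aic >= best_aic:
            break  # every removal would raise AIC, so keep the current set
        del current[drop]
        best_aic = trial_aic
    return list(current), best_aic


# Hypothetical toy data: y depends on x1 and x2; z is unrelated to y by construction
x1 = [1, 2, 3, 4, 5, 6, 7, 8]
x2 = [0, 1, 1, 0, 0, 1, 1, 0]
z = [1, -1, -1, 1, -1, 1, 1, -1]
noise = [0.1, 0.0, -0.1, 0.0, 0.0, -0.1, 0.0, 0.1]
y = [1 + 2 * a + 3 * b + e for a, b, e in zip(x1, x2, noise)]

selected, best = backward_step_aic(y, {"x1": x1, "x2": x2, "z": z})
print("kept:", selected, "AIC:", round(best, 2))
```

Dropping `z` leaves the RSS unchanged but removes one parameter, so AIC falls by 2 and `z` is eliminated; dropping `x1` or `x2` inflates the RSS far more than the 2-point penalty saved, so the search stops there.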

Feature Selection Techniques in Regression Model

June 7, 2019 Ashutosh Tripathi

Feature selection is a way to reduce the number of features and hence the computational complexity of the model. It is often very useful for overcoming overfitting. Feature selection helps us determine the smallest set of features needed to predict the response variable with high accuracy. If we ask whether adding new features necessarily improves the model's performance significantly and the answer is no, then why add features that only increase the model's complexity?
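To make that question concrete, here is a small sketch comparing plain R^2 with adjusted R^2, one common (but not the only) way to charge a model for each extra feature. It assumes ordinary least squares and uses hypothetical toy data in which the extra feature `z` is deliberately constructed to be uncorrelated with both x1 and y; the helper names are illustrative only.

```python
def solve(a, b):
    """Solve the small linear system a @ x = b (Gauss-Jordan with partial pivoting)."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        if abs(m[col][col]) < 1e-12:
            raise ValueError("singular system: columns are perfectly collinear")
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols_rss(y, columns):
    """Least-squares fit of y on an intercept plus the given columns; returns the RSS."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(rows[0])
    xtx = [[sum(row[i] * row[j] for row in rows) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * target for row, target in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(coef * value for coef, value in zip(beta, row)) for row in rows]
    return sum((target - f) ** 2 for target, f in zip(y, fitted))


def r2_and_adjusted(y, columns):
    """Return (R^2, adjusted R^2) for y ~ intercept + columns."""
    n, p = len(y), len(columns)
    mean = sum(y) / n
    ss_tot = sum((v - mean) ** 2 for v in y)
    r2 = 1.0 - ols_rss(y, columns) / ss_tot
    # Adjusted R^2 charges for each extra predictor through the degrees of freedom
    adj = 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)
    return r2, adj


# Hypothetical toy data: y is roughly 2*x1 + 1; z carries no information about y
x1 = [1, 2, 3, 4, 5, 6]
y = [3.1, 4.9, 7.2, 8.8, 11.1, 12.9]
z = [1, -1, 0, 0, -1, 1]

base = r2_and_adjusted(y, [x1])
extended = r2_and_adjusted(y, [x1, z])
print("x1 only: R2=%.5f  adjusted=%.5f" % base)
print("x1 + z:  R2=%.5f  adjusted=%.5f" % extended)
```

With least squares, adding a feature can never lower R^2; here it stays the same, while adjusted R^2 drops because the model spent a degree of freedom on a feature that explains nothing.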