What is Bagging in Ensemble Learning

Ensemble learning asks: if we can build multiple models, why select only the best one? Why not the top 2, the top 3, or even the top 10? Once you have found the top 10, deploy all 10 models. When new data arrives, get a prediction from each of the 10 models and combine them into a single joint prediction. This is the key idea of ensemble learning.
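
To make the "combine the predictions" step concrete, here is a minimal sketch of majority voting. The three threshold "models" are hypothetical stand-ins for trained classifiers, and the sketch covers only the combining step, not how each model is trained.

```python
from collections import Counter

def ensemble_predict(models, x):
    """Ask every model for a prediction and return the majority vote."""
    votes = [model(x) for model in models]
    return Counter(votes).most_common(1)[0][0]

# Hypothetical stand-ins for trained classifiers: each one applies
# a different threshold to a single numeric input.
models = [
    lambda x: int(x > 0.3),
    lambda x: int(x > 0.5),
    lambda x: int(x > 0.7),
]

high = ensemble_predict(models, 0.6)  # votes are 1, 1, 0
low = ensemble_predict(models, 0.4)   # votes are 1, 0, 0
```

For regression, the same idea would use an average of the predictions instead of a vote.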

How to start career in Data Science and Machine Learning

It does not matter how much experience you have: anybody can start in, or switch to, data science and machine learning. The only important thing is how eager you are and what the field means to you. If you are keen to work in this field, nobody can stop you. There may be some short-term hurdles, but if you stay focused and know where you want to see yourself in a few years, you will definitely overcome them.

A Complete Guide to K-Nearest Neighbors Algorithm – KNN using Python

k-Nearest Neighbors, or kNN, is a simple yet powerful machine learning algorithm. It can be used for both classification and regression (predicting a continuous value). The basic idea is to first find the k data points nearest to the new data point for which we need a prediction; these points are known as its neighbors. For regression, the predicted value is the mean of the neighbors' y-values, which serves as an estimate of the conditional mean at that point. For classification, the predicted class is the mode (majority value) of the neighbors' y-values.
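
The two prediction rules above can be sketched in a few lines of plain Python. This is a minimal illustration, assuming Euclidean distance and a tiny made-up data set; the function names are hypothetical and not taken from any library.

```python
import math
from collections import Counter

def k_nearest_targets(train_X, train_y, query, k):
    """Return the y-values of the k training points closest to query."""
    order = sorted(range(len(train_X)),
                   key=lambda i: math.dist(train_X[i], query))
    return [train_y[i] for i in order[:k]]

def knn_regress(train_X, train_y, query, k):
    """Regression: mean of the neighbors' y-values."""
    neighbors = k_nearest_targets(train_X, train_y, query, k)
    return sum(neighbors) / len(neighbors)

def knn_classify(train_X, train_y, query, k):
    """Classification: mode (majority value) of the neighbors' y-values."""
    neighbors = k_nearest_targets(train_X, train_y, query, k)
    return Counter(neighbors).most_common(1)[0][0]

# Made-up data: two well-separated clusters.
X = [(1, 1), (1, 2), (2, 1), (6, 6), (7, 6), (6, 7)]
labels = ["a", "a", "a", "b", "b", "b"]
values = [1.0, 1.2, 0.8, 5.0, 5.5, 4.5]

predicted_class = knn_classify(X, labels, (1.5, 1.5), k=3)
predicted_value = knn_regress(X, values, (1.5, 1.5), k=3)
```

The query point sits inside the first cluster, so its three nearest neighbors all carry label "a" and values close to 1.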

A Complete Guide to Principal Component Analysis – PCA in Machine Learning

Principal Component Analysis, or PCA, is a widely used technique for reducing the dimensionality of a large data set. Reducing the number of components or features costs some accuracy, but it makes a large data set simpler and easier to explore and visualize. It also reduces the computational cost of the model, which makes machine learning algorithms run faster. How much accuracy is sacrificed for a smaller, less complex data set is always open to debate. There is no fixed answer, but we try to retain most of the variance when choosing the final set of components.
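
To illustrate what "keeping most of the variance" means, here is a small sketch restricted to two features, so that the eigenvalues of the covariance matrix have a closed form and no library is needed. The data and the function name are made up for illustration; real PCA on many features would normally rely on a numerical library.

```python
import math

def explained_variance_ratio_2d(xs, ys):
    """Share of total variance captured by each principal component
    of a two-feature data set."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    # Entries of the 2x2 sample covariance matrix.
    sxx = sum((x - mean_x) ** 2 for x in xs) / (n - 1)
    syy = sum((y - mean_y) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
    # Closed-form eigenvalues of [[sxx, sxy], [sxy, syy]].
    half_trace = (sxx + syy) / 2
    spread = math.sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
    first, second = half_trace + spread, half_trace - spread
    total = first + second
    return first / total, second / total

# Made-up data in which the two features move almost together.
xs = [1, 2, 3, 4, 5]
ys = [1.1, 1.9, 3.2, 3.8, 5.0]

ratio_pc1, ratio_pc2 = explained_variance_ratio_2d(xs, ys)
```

Because the two features are nearly collinear here, the first component alone carries almost all of the variance, so dropping the second one would lose very little information.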

What is Logistic Regression?

Logistic regression is used for binary classification problems, where there are only two classes to predict. With a small extension, however, it can also be used for multi-class classification problems. In this post I will explain binary classification. I will also explain the reason behind maximizing the log-likelihood function.
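
As a rough sketch of what "maximizing the log-likelihood" looks like in practice, the snippet below fits a one-feature logistic model by plain gradient ascent. The data, learning rate, and number of steps are assumptions chosen for illustration, not values from the post.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def log_likelihood(w, b, xs, ys):
    """Sum of y*log(p) + (1-y)*log(1-p) over the data, with p = sigmoid(w*x + b)."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(w * x + b)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return total

def fit(xs, ys, learning_rate=0.1, steps=100):
    """Gradient ascent on the log-likelihood, starting from w = b = 0."""
    w, b = 0.0, 0.0
    for _ in range(steps):
        errors = [y - sigmoid(w * x + b) for x, y in zip(xs, ys)]
        w += learning_rate * sum(e * x for e, x in zip(errors, xs))
        b += learning_rate * sum(errors)
    return w, b

# Made-up binary data: negative x tends to class 0, positive x to class 1.
xs = [-2.0, -1.0, 1.0, 2.0]
ys = [0, 0, 1, 1]

ll_before = log_likelihood(0.0, 0.0, xs, ys)
w, b = fit(xs, ys)
ll_after = log_likelihood(w, b, xs, ys)
```

The log-likelihood is never positive, and it moves closer to zero as the fitted probabilities agree better with the observed classes.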

What is Multicollinearity?

Multicollinearity occurs in a multiple linear regression model, where we have more than one predictor variable. It exists when one predictor variable (note: not the target variable) can be linearly predicted from the other predictor variables with a significant degree of accuracy. Two or more highly correlated predictor variables therefore imply multicollinearity. The converse does not hold: multicollinearity may exist even when the pairwise correlations among predictors are low.
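
The point that pairwise correlation alone can miss multicollinearity can be checked on a small made-up example. In the sketch below, a third predictor is (almost) the sum of the other two: no single pairwise correlation is close to 1, yet the third predictor is almost perfectly predictable from the other two together. The variance inflation factor (VIF) used at the end is a standard diagnostic that the text above does not mention; it is included here as an assumption about how one might quantify the effect.

```python
import math

def centered(v):
    m = sum(v) / len(v)
    return [a - m for a in v]

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def pearson(a, b):
    ca, cb = centered(a), centered(b)
    return dot(ca, cb) / math.sqrt(dot(ca, ca) * dot(cb, cb))

def r_squared_two_predictors(x1, x2, y):
    """R^2 of the least-squares fit of y on x1 and x2 (with intercept),
    solved through the 2x2 normal equations on centered data."""
    c1, c2, cy = centered(x1), centered(x2), centered(y)
    s11, s22, s12 = dot(c1, c1), dot(c2, c2), dot(c1, c2)
    s1y, s2y = dot(c1, cy), dot(c2, cy)
    det = s11 * s22 - s12 ** 2
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return (b1 * s1y + b2 * s2y) / dot(cy, cy)

# Made-up predictors: x3 is x1 + x2 plus a little noise.
x1 = [1, 2, 3, 4, 5, 6, 7, 8]
x2 = [3, -2, -4, 1, 2, -1, 4, -3]
noise = [0.1, -0.1, 0.1, -0.1, -0.1, 0.1, -0.1, 0.1]
x3 = [a + b + e for a, b, e in zip(x1, x2, noise)]

corr_13 = pearson(x1, x3)   # moderate
corr_23 = pearson(x2, x3)   # moderate
r2 = r_squared_two_predictors(x1, x2, x3)  # very close to 1
vif = 1.0 / (1.0 - r2)      # large value signals multicollinearity
```

With more predictors the individual pairwise correlations can be made lower still while the joint linear dependence stays just as strong.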

What is stepAIC in R?

In R, stepAIC is one of the most commonly used search methods for feature selection. We keep minimizing the AIC value to arrive at the final set of features. “stepAIC” does not necessarily improve model performance; rather, it is used to simplify the model without much impact on performance. AIC quantifies the amount of information lost due to this simplification. AIC stands for Akaike Information Criterion.
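
To show how AIC trades fit against complexity, here is a small Python sketch (not R, and not the stepAIC function itself). It uses the common least-squares form of AIC, n * ln(RSS / n) + 2k, which holds up to an additive constant under Gaussian errors. The sample size, residual sums of squares, and parameter counts below are made-up numbers.

```python
import math

def gaussian_aic(n, rss, k):
    """AIC for a least-squares model, up to an additive constant.
    n: number of observations, rss: residual sum of squares,
    k: number of estimated parameters."""
    return n * math.log(rss / n) + 2 * k

# Hypothetical comparison: the larger model fits only slightly better.
n = 50
aic_small = gaussian_aic(n, rss=120.0, k=3)
aic_large = gaussian_aic(n, rss=118.0, k=6)

preferred = "small" if aic_small < aic_large else "large"
```

Here the three extra parameters reduce the residual error only marginally, so the penalty term outweighs the gain and the smaller model ends up with the lower AIC.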

Feature Selection Techniques in Regression Model

Feature selection is a way to reduce the number of features and hence the computational complexity of the model. It is often very useful for overcoming overfitting. Feature selection helps us determine the smallest set of features needed to predict the response variable with high accuracy. The question to ask is: does adding a new feature significantly improve model performance? If not, why add a feature that only increases model complexity?
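
One simple way to act on that question is greedy forward selection: keep adding the feature that helps most, and stop once the gain is no longer significant. The sketch below assumes a user-supplied `score` function (higher is better) and a `min_gain` threshold; both names, the feature names, and the toy scores are hypothetical.

```python
def forward_select(features, score, min_gain=0.01):
    """Greedily add the feature that improves the score most,
    stopping when the best available gain drops below min_gain."""
    selected = []
    remaining = list(features)
    best = score(selected)
    while remaining:
        candidate_score, candidate = max(
            (score(selected + [f]), f) for f in remaining
        )
        if candidate_score - best < min_gain:
            break  # the extra feature would only add complexity
        selected.append(candidate)
        remaining.remove(candidate)
        best = candidate_score
    return selected, best

# Toy scoring function: each feature contributes a fixed, made-up amount.
contribution = {"size": 0.6, "rooms": 0.2, "color": 0.001}

def toy_score(subset):
    return sum(contribution[f] for f in subset)

chosen, final_score = forward_select(["size", "rooms", "color"], toy_score)
```

In a real setting the score would come from something like cross-validated accuracy, and the threshold would need tuning for the problem at hand.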

What is Linear Regression? Part:1

Linear regression is a statistical method that models the relationship between two continuous variables, known as the predictor and the response. The predictor variable is most often denoted x and is also known as the independent variable. The response variable is most often denoted y and is also known as the dependent variable.
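
As a minimal sketch of the idea, the snippet below fits a straight line y = slope * x + intercept to one predictor and one response by ordinary least squares. The data are made up so that the points lie exactly on a line.

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for one predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Made-up data lying exactly on y = 2x + 1.
xs = [1, 2, 3, 4]   # predictor (independent variable)
ys = [3, 5, 7, 9]   # response (dependent variable)

slope, intercept = fit_line(xs, ys)
prediction = slope * 5 + intercept  # predicted response for x = 5
```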

Covariance and Correlation

Covariance and correlation are very helpful for understanding the relationship between two continuous variables. Covariance tells us whether the two variables vary in the same direction (positive covariance) or in opposite directions (negative covariance). Correlation goes further: it standardizes the covariance to a value between -1 and 1, indicating how strongly a change in one variable is associated with a proportional change in the other.
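
Both quantities are easy to compute by hand. The sketch below uses the sample (n - 1) versions and made-up data; it also shows that rescaling a variable changes the covariance but leaves the correlation unchanged.

```python
import math

def covariance(xs, ys):
    """Sample covariance of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)

def correlation(xs, ys):
    """Pearson correlation: covariance divided by both standard deviations."""
    std_x = math.sqrt(covariance(xs, xs))
    std_y = math.sqrt(covariance(ys, ys))
    return covariance(xs, ys) / (std_x * std_y)

# Made-up data: y rises with x, z falls with x.
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]
z = [10, 8, 6, 4, 2]

cov_xy = covariance(x, y)    # positive: same direction
cov_xz = covariance(x, z)    # negative: opposite directions
corr_xy = correlation(x, y)
corr_xz = correlation(x, z)

# Rescaling y by 100 scales the covariance but not the correlation.
y_scaled = [100 * v for v in y]
cov_scaled = covariance(x, y_scaled)
corr_scaled = correlation(x, y_scaled)
```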