In the last post, we discussed the Bagging technique and learnt how Bagging helps us reduce model variance. In this post, we will learn one more Ensemble learning technique: Boosting. So let me ask you a question. Suppose you have tried all the possible models and none of them is performing as expected. What will you do now? I would go with Boosting. Got the point?
Let me explain it in detail. Before deep-diving into Boosting, let me tell you that, unlike Bagging, Boosting is not primarily used to reduce model variance; it is used to improve the accuracy of the model, mainly by reducing its bias. This is one of the major differences between Bagging and Boosting.
Boosting is very simple. Just start with a weak learner, also known as the base learner. By weak learner I mean one of those models which you have tried and which has not given the desired accuracy. The next steps perform many iterations, assigning more weight to the observations that were classified wrongly in the previous iteration. In this way, we keep boosting the model at each step. It is also possible that observations which were correctly classified in previous steps get misclassified in the next step. However, this is taken care of once you have performed many iterations, combined all the weak learners, and made the final prediction.
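To make the reweighting idea concrete, here is a minimal sketch of a single boosting round. It assumes the AdaBoost update rule (this post does not name a specific variant), and the function name `reweight` and the five-observation example are made up for illustration.

```python
import math

def reweight(weights, misclassified):
    """One boosting round: raise the weight of misclassified observations.

    Assumes the AdaBoost update rule and a weighted error strictly
    between 0 and 1.
    """
    # Weighted error of the current weak learner.
    error = sum(w for w, wrong in zip(weights, misclassified) if wrong)
    # Learner weight: larger when the weighted error is smaller.
    alpha = 0.5 * math.log((1 - error) / error)
    # Scale misclassified observations up and correct ones down.
    updated = [w * math.exp(alpha if wrong else -alpha)
               for w, wrong in zip(weights, misclassified)]
    # Renormalise so the weights form a distribution again.
    total = sum(updated)
    return [w / total for w in updated], alpha

weights = [0.2] * 5                                  # equal weight to start
misclassified = [False, False, True, False, False]   # third observation was wrong
new_weights, alpha = reweight(weights, misclassified)
print([round(w, 3) for w in new_weights])  # prints [0.125, 0.125, 0.5, 0.125, 0.125]
```

The single misclassified observation goes from a weight of 0.2 to 0.5, so the next weak learner is pushed to get it right.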
I hope you now have the basic idea of what Boosting is, and why and when to use it. So let’s move ahead and discuss each of the points associated with Boosting in detail.
- Boosting helps to improve the accuracy of any given machine learning algorithm.
- It is algorithm independent, so we can apply it with any learning algorithm.
- It is not used to reduce the model variance.
- Iterations happen sequentially, not in parallel.
- Boosting involves many iterations to strengthen the model accuracy, hence it becomes computationally costly, so it is important to understand when to apply it.
So I would say apply it only when your base model is not giving the desired level of accuracy. By the desired level I mean that you need to know when to stop training. For example, if you are getting 90% accuracy and the business case is fine with 90%, you can neglect the 10% of observations that are misclassified. This will become clearer once you have enough experience working in the ML domain with business understanding. So if you find it a little difficult to understand, you can safely ignore it for now. It will not have much effect on your understanding of the algorithm.
- Start with one ML algorithm and train a model on the training data. In the diagram above, weak classifier 1 is trained on training data in which the weights are uniformly distributed across observations. This means every observation in the training data has equal importance.
- Check which observations (also known as rows / items) are misclassified.
- Assign more weight to the misclassified observations. This means that, the next time we train, the algorithm will pay more attention to the observations which were misclassified in the previous step.
- Train the model again on the new distribution. This generates weak classifier 2. Refer to the diagram above.
- Iterate steps 2 to 4 a sufficient number of times. This number is not fixed, and hence it is one of the hyperparameters of the boosting algorithm.
- Finally, combine all these weak classifiers to generate the final prediction. After a certain number of iterations, we combine them and check the accuracy: if the desired accuracy is met we stop, otherwise we generate more weak classifiers.
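The steps above can be sketched in plain Python. This is only an illustration under stated assumptions: it uses the AdaBoost weight update (the post does not commit to a variant), one-dimensional decision stumps as the weak classifier, labels coded as +1 / -1, and a made-up toy dataset. The names `train_stump`, `boost`, `max_rounds` and `target_accuracy` are hypothetical.

```python
import math

def stump_predict(x, threshold, sign):
    """Weak classifier: predict `sign` when x <= threshold, else `-sign`."""
    return sign if x <= threshold else -sign

def train_stump(xs, ys, weights):
    """Pick the threshold and direction with the lowest weighted error."""
    best = None
    for threshold in sorted(set(xs)):
        for sign in (1, -1):
            error = sum(w for w, x, y in zip(weights, xs, ys)
                        if stump_predict(x, threshold, sign) != y)
            if best is None or error < best[0]:
                best = (error, threshold, sign)
    return best

def predict(ensemble, x):
    """Weighted majority vote over all weak classifiers."""
    score = sum(a * stump_predict(x, t, s) for a, t, s in ensemble)
    return 1 if score >= 0 else -1

def accuracy(ensemble, xs, ys):
    return sum(predict(ensemble, x) == y for x, y in zip(xs, ys)) / len(xs)

def boost(xs, ys, max_rounds=20, target_accuracy=1.0):
    n = len(xs)
    weights = [1.0 / n] * n                  # uniform weights to start
    ensemble = []                            # (alpha, threshold, sign) per round
    for _ in range(max_rounds):              # rounds run one after another
        error, threshold, sign = train_stump(xs, ys, weights)
        error = min(max(error, 1e-10), 1 - 1e-10)   # guard against log(0)
        alpha = 0.5 * math.log((1 - error) / error)
        ensemble.append((alpha, threshold, sign))
        # Misclassified observations (y * prediction == -1) gain weight.
        weights = [w * math.exp(-alpha * y * stump_predict(x, threshold, sign))
                   for w, x, y in zip(weights, xs, ys)]
        total = sum(weights)
        weights = [w / total for w in weights]
        # Combine the weak classifiers and stop once accuracy is good enough.
        if accuracy(ensemble, xs, ys) >= target_accuracy:
            break
    return ensemble

# Toy data (hypothetical) that no single stump can separate.
xs = list(range(10))
ys = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]
ensemble = boost(xs, ys)
print(len(ensemble), accuracy(ensemble, xs, ys))
```

Here `max_rounds` plays the role of the number-of-iterations hyperparameter, and `target_accuracy` is the "desired level" at which we stop generating more weak classifiers. Accuracy is measured on the training data only to keep the sketch short; in practice you would check it on held-out data.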
How should the weak rules be combined into a single rule?
Taking a (weighted) majority vote of their predictions is a natural and effective way of combining them to get the final prediction.
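As a small illustration of why the weights matter, the sketch below compares a simple majority vote with a weighted one for a single observation. It assumes predictions coded as +1 / -1, and the classifier weights in `alphas` are made-up numbers, not values from any trained model.

```python
def simple_vote(predictions):
    """Plain majority vote over +1 / -1 predictions."""
    return 1 if sum(predictions) >= 0 else -1

def weighted_vote(predictions, alphas):
    """Majority vote where each weak classifier counts in proportion to its weight."""
    score = sum(a * p for a, p in zip(alphas, predictions))
    return 1 if score >= 0 else -1

predictions = [1, -1, -1]      # three weak classifiers disagree on one observation
alphas = [1.2, 0.4, 0.5]       # hypothetical weights; the first classifier is most reliable

print(simple_vote(predictions))            # prints -1
print(weighted_vote(predictions, alphas))  # prints 1
```

Two of the three classifiers vote -1, but the more reliable first classifier outweighs them (1.2 against 0.4 + 0.5), so the weighted vote returns +1.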
A more intuitive visual representation of how Boosting works.
This is a concept-level description of what Boosting is and how it helps to improve model accuracy. Please share your thoughts on the post in the comment section.
If you like my posts on Machine Learning, please connect with me:
Follow my blog: https://ashutoshtripathi.com/
Medium Articles: https://email@example.com