Guys, I have consolidated all my ML and DS articles. In case you missed them, here are the links in one place.
Category Archives: Data Science
In the last post, we discussed the Bagging technique and learned how it helps reduce model variance. In this post, we will learn one more Ensemble learning technique: Boosting. So let me ask you a question. Suppose you have tried all the possible models and none of them performs as expected. What will you do now? I would go with Boosting. Got the point?
In general, in any machine learning problem we try to find the best possible model for the given problem. That means finding the best model within a given model family, for example the best possible decision tree or the best possible KNN model. If we have more time, we can try all the available model families and come up with the best possible regression model, the best possible KNN model, the best possible SVM model, and so on. Among these we again select the best one, which will be KNN, SVM or any other.
It does not matter how much experience you have; anybody can start in or switch to data science and machine learning. The only important thing is how eager you are for it and what it means to you. If you are really keen to work in this field, nobody can stop you. There may be some short-term hurdles, but if you are focused enough and know where you want to see yourself after a few years, you will definitely overcome those hurdles.
Bayes' theorem is an extension of conditional probability. Conditional probability helps us determine the probability of A given B, denoted by P(A|B). Bayes' theorem says that if we know P(A|B), then we can determine P(B|A), provided that P(A) and P(B) are known to us.
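As a small illustration of the statement above, here is a minimal sketch that computes P(B|A) from P(A|B), P(A) and P(B). The function name `bayes_posterior` and all the numbers are hypothetical and chosen only for illustration; they do not come from the post.

```python
def bayes_posterior(p_a_given_b: float, p_b: float, p_a: float) -> float:
    """Return P(B|A) using Bayes' theorem: P(B|A) = P(A|B) * P(B) / P(A)."""
    if p_a <= 0:
        raise ValueError("P(A) must be positive")
    return p_a_given_b * p_b / p_a


# Hypothetical probabilities (assumptions for this example only).
p_a_given_b = 0.9  # P(A|B)
p_b = 0.2          # P(B)
p_a = 0.3          # P(A)

p_b_given_a = bayes_posterior(p_a_given_b, p_b, p_a)
print(f"P(B|A) = {p_b_given_a:.3f}")
```

The guard on P(A) is there because the formula is undefined when the conditioning event can never happen.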
As the name suggests, conditional probability is the probability of an event under some given condition. The condition reduces our sample space to only those outcomes that satisfy it.
For example, find the probability of a person subscribing to insurance given that he has taken a house loan. Here the sample space is restricted to the persons who have taken a house loan.
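The insurance example can be sketched in a few lines. The records below are made up for illustration (they are not data from the post); the point is only that the denominator counts loan holders, not everyone.

```python
# Hypothetical records: (has_house_loan, has_insurance)
people = [
    (True, True), (True, False), (True, True), (True, True),
    (False, False), (False, True), (False, False), (False, False),
]

# Restrict the sample space to persons who have taken a house loan.
loan_holders = [p for p in people if p[0]]
insured_loan_holders = [p for p in loan_holders if p[1]]

# P(insurance | house loan) = insured loan holders / all loan holders
p_insurance_given_loan = len(insured_loan_holders) / len(loan_holders)

# For comparison: the unconditional probability uses the full sample space.
p_insurance = sum(1 for p in people if p[1]) / len(people)

print(p_insurance_given_loan, p_insurance)
```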
Probability is a huge topic in itself. Its applications are found everywhere: medical science, share market trading, sports, the gaming industry and many more. In this post, however, my focus is on explaining the topics needed to understand data science and machine learning concepts.
Variance and standard deviation are the most commonly used measures of variability, or spread. They describe how much the data varies around the mean. Variance is the average squared distance of the data points from the mean. Because variance is expressed in squared units (explained below in the post), we take its square root to get a value in the same units as the data points; this is called the standard deviation.
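A minimal sketch of these two definitions, assuming we treat the data as a whole population (so we divide by n rather than n - 1). The sample values are hypothetical; the result is cross-checked against Python's standard `statistics` module.

```python
import math
import statistics

# Hypothetical data points (assumption for this example only).
data = [2, 4, 4, 4, 5, 5, 7, 9]

mean = sum(data) / len(data)

# Population variance: average of the squared distances from the mean.
variance = sum((x - mean) ** 2 for x in data) / len(data)

# Standard deviation: square root of the variance, back in the data's units.
std_dev = math.sqrt(variance)

# Cross-check against the standard library's population formulas.
lib_variance = statistics.pvariance(data)
lib_std_dev = statistics.pstdev(data)

print(mean, variance, std_dev)
```

For a sample rather than a full population, `statistics.variance` and `statistics.stdev` divide by n - 1 instead.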
Principal Component Analysis, or PCA, is used for dimensionality reduction of large data sets. In my previous post, A Complete Guide to Principal Component Analysis – PCA in Machine Learning, I explained what PCA is and the complete concept behind the technique. This post continues from that one. If you already have a basic understanding of how PCA works, you may continue; otherwise it is highly recommended to go through the above-mentioned post first.
Principal Component Analysis, or PCA, is a widely used technique for dimensionality reduction of large data sets. Reducing the number of components or features costs some accuracy, but it makes a large data set simpler and easier to explore and visualize. It also reduces the computational complexity of the model, which makes machine learning algorithms run faster. How much accuracy is sacrificed to get a less complex, lower-dimensional data set is always debatable. We don't have a fixed answer for this; however, we try to keep most of the variance while choosing the final set of components.
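To make "keeping most of the variance" concrete, here is a rough sketch for the simplest case of two features. It computes the covariance matrix by hand and uses the closed-form eigenvalues of a symmetric 2x2 matrix, so it needs no external library. The data points are hypothetical, and a real project would more likely use a library implementation of PCA.

```python
import math

# Hypothetical 2-D data set (assumption for this example only).
points = [(2.5, 2.4), (0.5, 0.7), (2.2, 2.9), (1.9, 2.2), (3.1, 3.0),
          (2.3, 2.7), (2.0, 1.6), (1.0, 1.1), (1.5, 1.6), (1.1, 0.9)]

n = len(points)
mean_x = sum(x for x, _ in points) / n
mean_y = sum(y for _, y in points) / n

# Entries of the sample covariance matrix [[var_x, cov_xy], [cov_xy, var_y]].
var_x = sum((x - mean_x) ** 2 for x, _ in points) / (n - 1)
var_y = sum((y - mean_y) ** 2 for _, y in points) / (n - 1)
cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in points) / (n - 1)

# Closed-form eigenvalues of a symmetric 2x2 matrix.
half_trace = (var_x + var_y) / 2
radius = math.sqrt(((var_x - var_y) / 2) ** 2 + cov_xy ** 2)
eig_1, eig_2 = half_trace + radius, half_trace - radius

# Share of the total variance captured by each principal component.
ratio_1 = eig_1 / (eig_1 + eig_2)
ratio_2 = eig_2 / (eig_1 + eig_2)

print(f"PC1 keeps {ratio_1:.1%} of the variance, PC2 keeps {ratio_2:.1%}")
```

If the first ratio is close to 1, dropping the second component loses little information, which is the trade-off described above.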
Statistics is the science of collecting, organizing, presenting, analyzing and interpreting data. It is one of the most important disciplines for getting a deeper insight into data. Statistical analysis is used to manipulate, summarize and investigate data so that useful information can be obtained.
Takeaways from this post:
- Types of Statistics: Descriptive vs Inferential
- Basic terminology like Population vs Sample
- Types of Variables: Numerical vs Categorical
- Measures of central tendency: Mean, Median and Mode and their specific use cases
- Measures of dispersion/spread: Variance, standard deviation etc.
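As a quick taste of the central tendency measures listed above, the sketch below computes mean, median and mode with Python's standard `statistics` module. The values are hypothetical and include one deliberate outlier to show why the median is often preferred for skewed data.

```python
import statistics

# Hypothetical values with one large outlier (assumption for this example only).
values = [30, 32, 35, 35, 38, 40, 250]

mean_value = statistics.mean(values)      # pulled upward by the outlier
median_value = statistics.median(values)  # middle value, robust to the outlier
mode_value = statistics.mode(values)      # most frequent value

print(mean_value, median_value, mode_value)
```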