DataScience – Data Science Duniya

Conditional Probability with examples For Data Science

August 15, 2019 Ashutosh Tripathi

Conditional Probability helps Data Scientists to get better results from the given data set and for Machine Learning Engineers, it helps in building more accurate models for predictions.

Probability Basics for Data Science

August 12, 2019 Ashutosh Tripathi

Probability is used to predict the likelihood of the future event.

Statistics is used to analyse the past events

Also,

Probability tells us what will happen in a given ideal world?

While Statistics tells about how ideal is the world?

Probability is the basics of Inferential Statistics.

A Complete Guide to K-Nearest Neighbors Algorithm – KNN using Python

August 5, 2019 Ashutosh Tripathi

k-Nearest Neighbors or kNN algorithm is very easy and powerful Machine Learning algorithm. It can be used for both classification as well as regression that is predicting a continuous value. The very basic idea behind kNN is that it starts with finding out the k-nearest data points known as neighbors of the new data point for which we need to make the prediction. And then if it is regression then take the conditional mean of the neighbors y-value and that is the predicted value for new data point. If it is classification then it takes the mode (majority value) of the neighbors y value and that becomes the predicted class of the new data point.

A Complete Guide to Principal Component Analysis – PCA in Machine Learning

July 11, 2019 Ashutosh Tripathi

Principal Component Analysis or PCA is a widely used technique for dimensionality reduction of the large data set. Reducing the number of components or features costs some accuracy and on the other hand, it makes the large data set simpler, easy to explore and visualize. Also, it reduces the computational complexity of the model which makes machine learning algorithms run faster. It is always a question and debatable how much accuracy it is sacrificing to get less complex and reduced dimensions data set. we don’t have a fixed answer for this however we try to keep most of the variance while choosing the final set of components.