Data Science and Machine Learning Articles | Yearly round-up 2019

# Category: Data Science

## What is Boosting in Ensemble Learning

Boosting helps to improve the accuracy of any given machine learning algorithm. It is algorithm independent, so we can apply it with any learning algorithm. Note that its primary aim is to reduce bias, not model variance.

Boosting involves many sequential iterations to strengthen the model's accuracy, which makes it computationally costly.
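As a minimal sketch (assuming scikit-learn is available; the article names no library), AdaBoost illustrates this sequential idea: each new weak learner concentrates on the examples the previous learners misclassified.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

# Toy classification data standing in for a real data set
X, y = make_classification(n_samples=500, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Each of the (up to) 50 sequential iterations adds a weak learner that
# focuses on the examples the previous learners got wrong.
model = AdaBoostClassifier(n_estimators=50, random_state=42)
model.fit(X_train, y_train)
print(model.score(X_test, y_test))
```

The sequential fitting of those 50 weak learners is exactly why boosting cannot be parallelized as easily as bagging, and why it costs more compute.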

## What is Bagging in Ensemble Learning

Ensemble learning asks: if we can build multiple models, why select only the best one? Why not the top 2, the top 3, or even the top 10? If you find the top 10 models, deploy all of them. When new data arrives, get a prediction from each of the 10 models, combine those predictions, and make a joint prediction. This is the key idea of ensemble learning. Bagging (bootstrap aggregating) applies this idea by training each model on a random bootstrap sample of the training data and then aggregating their predictions.
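The "deploy 10 models and combine their predictions" idea can be sketched with scikit-learn's `BaggingClassifier` (an assumption; the article does not name a library): 10 decision trees are each trained on a bootstrap sample, and their predictions are combined by majority vote.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Toy classification data standing in for a real data set
X, y = make_classification(n_samples=500, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Train 10 trees, each on a different bootstrap sample of the training
# data; at prediction time their votes are combined into one answer.
bag = BaggingClassifier(DecisionTreeClassifier(), n_estimators=10,
                        random_state=0)
bag.fit(X_train, y_train)
print(bag.score(X_test, y_test))
```

Because each tree trains on an independent bootstrap sample, the 10 fits can run in parallel, unlike the sequential iterations of boosting.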

## How to start career in Data Science and Machine Learning

It does not matter how much experience you have; anybody can start in or switch to data science and machine learning. The only thing that matters is how eager you are for it and what it means to you. If you are keen to work in this field, nobody can stop you. There might be some short-term hurdles, but if you stay focused and know where you want to see yourself after a certain number of years, you will definitely overcome them.

## Bayes’ Theorem with Example for Data Science Professionals

Bayes' Theorem is an extension of conditional probability. Conditional probability helps us determine the probability of A given B, denoted P(A|B). Bayes' theorem says that if we know P(A|B), then we can determine P(B|A) as P(B|A) = P(A|B) · P(B) / P(A), given that P(A) and P(B) are known to us.
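A short worked example with hypothetical numbers (a made-up diagnostic test, not taken from the article) shows how knowing P(A|B), P(A) and P(B) gives us P(B|A):

```python
# B = person has the disease, A = test comes back positive.
p_b = 0.01          # P(B): prior probability of the disease (assumed)
p_a_given_b = 0.95  # P(A|B): probability of a positive test given disease
p_a = 0.06          # P(A): overall probability of a positive test (assumed)

# Bayes' theorem: P(B|A) = P(A|B) * P(B) / P(A)
p_b_given_a = p_a_given_b * p_b / p_a
print(round(p_b_given_a, 4))
```

Even with a 95% accurate test, the resulting P(disease | positive) is only about 16% here, because the disease itself is rare; this is the kind of inversion Bayes' theorem makes precise.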

## Conditional Probability with examples For Data Science

Conditional probability helps Data Scientists get better results from a given data set, and it helps Machine Learning Engineers build more accurate models for prediction.
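A small worked example with hypothetical counts (a made-up spam-filter scenario) shows the definition P(A|B) = P(A and B) / P(B) in action:

```python
# Out of 100 emails (hypothetical counts): 20 contain the word "offer",
# and 12 of those 20 are spam.
p_offer = 20 / 100           # P(B): email contains "offer"
p_spam_and_offer = 12 / 100  # P(A and B): email is spam AND contains "offer"

# Conditional probability: P(spam | "offer") = P(spam and "offer") / P("offer")
p_spam_given_offer = p_spam_and_offer / p_offer
print(round(p_spam_given_offer, 2))
```

This kind of conditioning on observed evidence is the building block behind classifiers such as Naive Bayes.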

## Probability Basics for Data Science

Probability is used to predict the likelihood of a future event, while statistics is used to analyse past events.

Put another way: probability tells us what will happen in an ideal world, while statistics tells us how ideal the world actually is.

Probability is the basis of inferential statistics.

## Variance, Standard Deviation and Other Measures of Variability and Spread

Variance and standard deviation are the most commonly used measures of variability and spread. Variability and spread describe how much the data varies around the mean.
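A minimal sketch using Python's standard `statistics` module, on a small made-up data set:

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]

# Mean: (2+4+4+4+5+5+7+9) / 8 = 5
mean = statistics.mean(data)

# Population variance: mean of squared deviations from the mean,
# (9+1+1+1+0+0+4+16) / 8 = 4
var = statistics.pvariance(data)

# Population standard deviation: square root of the variance, sqrt(4) = 2
std = statistics.pstdev(data)

print(mean, var, std)
```

The standard deviation is usually preferred for interpretation because it is in the same units as the data, whereas the variance is in squared units.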

## Step by Step Approach to Principal Component Analysis using Python

Principal Component Analysis, or PCA, is used for dimensionality reduction of large data sets. Using PCA we can speed up ML algorithms by reducing the feature space.

## A Complete Guide to Principal Component Analysis – PCA in Machine Learning

Principal Component Analysis, or PCA, is a widely used technique for dimensionality reduction of large data sets. Reducing the number of components or features costs some accuracy; on the other hand, it makes a large data set simpler and easier to explore and visualize, and it reduces the computational complexity of the model, which makes machine learning algorithms run faster. How much accuracy to sacrifice for a less complex, lower-dimensional data set is always debatable. There is no fixed answer, but in practice we try to keep most of the variance while choosing the final set of components.
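A minimal sketch (assuming scikit-learn and its bundled Iris data) showing PCA reducing the feature space while reporting how much variance the kept components retain:

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X = load_iris().data  # 150 samples, 4 original features

# Standardize first so no single feature dominates the variance
X_scaled = StandardScaler().fit_transform(X)

# Keep 2 of the 4 dimensions
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X_scaled)

print(X_reduced.shape)                      # now 2 features per sample
print(pca.explained_variance_ratio_.sum())  # fraction of variance retained
```

Inspecting `explained_variance_ratio_` is exactly how one judges the accuracy-versus-simplicity trade-off discussed above: keep adding components until the cumulative ratio is high enough for your use case.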

## Basic Statistics for Data Science – Part 1

- Types of Statistics: descriptive vs inferential
- Basic terminology: population vs sample
- Types of variables: numerical vs categorical
- Measures of central tendency: mean, median and mode, and their specific use cases
- Measures of dispersion/spread: variance, standard deviation, etc.
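The measures of central tendency listed above can be computed directly with Python's standard `statistics` module, for example:

```python
import statistics

data = [1, 2, 2, 3, 4, 7, 9]

mean = statistics.mean(data)      # (1+2+2+3+4+7+9) / 7 = 4
median = statistics.median(data)  # middle value of the sorted data: 3
mode = statistics.mode(data)      # most frequent value: 2

print(mean, median, mode)
```

Note how the three measures disagree on this skewed data: the mean is pulled up by the large values 7 and 9, which is one of the "specific use cases" where the median is the more robust summary.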