Machine Learning Frequently Asked Interview Questions and Answers: Part 2
This post is part 2 in the series of frequently asked Machine Learning interview questions and answers. It covers the following questions:
- What is Feature Scaling, and why and where is it needed?
- Normalization vs Standardization
- What is the bias-variance trade-off?
- What is the overfitting problem and why does it occur?
- What are the methods to avoid overfitting in ML?
1. What is Feature Scaling and why is it needed?
- In general, a dataset contains different types of variables.
- A significant issue is that these variables can differ widely in their ranges of values.
- A feature with a large range of values will start dominating the other variables.
- Models can then become biased towards those high-range features.
- Feature scaling is applied to overcome this problem.
- The goal of feature scaling is to bring all features onto roughly the same scale, so that each feature is equally important and easier for most ML algorithms to process.
| # | Country | Age | Salary | Purchased |
|---|---------|-----|--------|-----------|
| 1 | France  | 44  | 73000  | No        |
| 2 | Spain   | 27  | 47000  | Yes       |
| 3 | Germany | 30  | 53000  | Yes       |
| 4 | Spain   | 38  | 62000  | No        |
| 5 | Germany | 40  | 57000  | No        |
| 6 | France  | 35  | 53000  | No        |
| 7 | Spain   | 48  | 78000  | Yes       |

Age range: 27-48; Salary range: 47000-78000

| Algorithm | Reason for applying feature scaling |
|-----------|-------------------------------------|
| K-means | Uses a Euclidean distance measure |
| K-nearest neighbors | Measures the distance between pairs of samples, and these distances are influenced by the measurement units |
| Principal Component Analysis (PCA) | Seeks the directions of maximum variance, so unscaled high-range features dominate the components |
| Artificial Neural Networks | Trained with gradient descent |
| Gradient Descent | Theta calculation becomes faster after feature scaling, and the learning rate in the stochastic gradient descent update equation is the same for every parameter |
Note: If an algorithm is not distance-based, feature scaling is usually unimportant; this includes Naive Bayes, Linear Discriminant Analysis, and tree-based models (gradient boosting, random forest, etc.).
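As a quick illustration of why distance-based algorithms need scaling, here is a minimal sketch (using NumPy and the Age/Salary values from the sample dataset above, not code from the original post) that compares a Euclidean distance computed on the raw values with the same distance after min-max scaling. On the raw data, Salary dominates the distance almost entirely.

```python
import numpy as np

# Age and Salary from the sample dataset above
X = np.array([
    [44, 73000],
    [27, 47000],
    [30, 53000],
    [38, 62000],
    [40, 57000],
    [35, 53000],
    [48, 78000],
], dtype=float)

# Euclidean distance between the first two rows on the raw data:
# the Salary difference (26000) completely swamps the Age difference (17).
raw_dist = np.linalg.norm(X[0] - X[1])

# Min-max scale each column to [0, 1] and recompute the distance.
X_scaled = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))
scaled_dist = np.linalg.norm(X_scaled[0] - X_scaled[1])

print(f"raw distance:    {raw_dist:.2f}")     # ~26000, driven almost entirely by Salary
print(f"scaled distance: {scaled_dist:.3f}")  # both features now contribute comparably
```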
Methods of feature scaling:
- Normalization
- Standardization
2. Normalization and Standardization
2.1 Normalization
- Normalization is also known as min-max normalization or min-max scaling.
- Normalization rescales values into the range 0 to 1, using x' = (x - min) / (max - min).

| Age | Normalized Age | Salary | Normalized Salary |
|-----|----------------|--------|-------------------|
| 44  | 0.80952381     | 73000  | 0.838709677       |
| 27  | 0              | 47000  | 0                 |
| 30  | 0.142857143    | 53000  | 0.193548387       |
| 38  | 0.523809524    | 62000  | 0.483870968       |
| 40  | 0.619047619    | 57000  | 0.322580645       |
| 35  | 0.380952381    | 53000  | 0.193548387       |
| 48  | 1              | 78000  | 1                 |
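As a minimal sketch (assuming scikit-learn is installed), MinMaxScaler applies exactly this x' = (x - min) / (max - min) formula per column and should reproduce the values in the table above:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

ages = np.array([44, 27, 30, 38, 40, 35, 48], dtype=float).reshape(-1, 1)
salaries = np.array([73000, 47000, 53000, 62000, 57000, 53000, 78000], dtype=float).reshape(-1, 1)

scaler = MinMaxScaler()  # rescales each column to the [0, 1] range
print(scaler.fit_transform(ages).ravel())      # e.g. 44 -> (44 - 27) / (48 - 27) ≈ 0.8095
print(scaler.fit_transform(salaries).ravel())  # e.g. 73000 -> (73000 - 47000) / (78000 - 47000) ≈ 0.8387
```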
2.2 Standardization
- Standardization is also known as z-score Normalization.
- In standardization, features are rescaled so that they have zero mean and unit standard deviation, using z = (x - mean) / std.
- It means that after standardization, each feature has mean = 0 and standard deviation = 1.

| Age | Standardized Age | Salary | Standardized Salary |
|-----|------------------|--------|---------------------|
| 44  | 0.954611636      | 73000  | 1.197306616         |
| 27  | -1.514927162     | 47000  | -1.278941158        |
| 30  | -1.079126198     | 53000  | -0.707499364        |
| 38  | 0.083009708      | 62000  | 0.149663327         |
| 40  | 0.373543684      | 57000  | -0.326538168        |
| 35  | -0.352791257     | 53000  | -0.707499364        |
| 48  | 1.535679589      | 78000  | 1.673508111         |
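The same can be done with scikit-learn's StandardScaler (a minimal sketch, assuming scikit-learn); it uses the population standard deviation, which matches the values in the table above:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

ages = np.array([44, 27, 30, 38, 40, 35, 48], dtype=float).reshape(-1, 1)

scaler = StandardScaler()  # zero mean, unit standard deviation per column
z = scaler.fit_transform(ages).ravel()
print(z)           # e.g. 44 -> (44 - 37.43) / 6.88 ≈ 0.9546
print(z.mean())    # ~0
print(z.std())     # ~1
```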
Normalization vs Standardization
- If a feature (column) contains outliers, normalization squeezes most of the data into a small sub-interval of [0, 1], because the outliers define the min and max; min-max scaling therefore does not handle outliers well.
- Standardization is more robust to outliers, and in many cases it is preferable to min-max normalization.
- Normalization is a good choice when you know that your data does not follow a Gaussian distribution. It is useful for algorithms that make no assumption about the distribution of the data, such as K-Nearest Neighbors and Neural Networks.
- Standardization, on the other hand, is helpful when the data follows a Gaussian distribution, although this does not have to be strictly true. Unlike normalization, standardization has no bounding range, so outliers do not force the remaining values into a fixed interval.
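To make the outlier point concrete, here is a small hypothetical example (not from the original post): a salary column with one extreme value. Min-max normalization pins the outlier to 1 and squeezes every other value into a narrow band near 0, while standardization has no fixed bounds, so the outlier simply receives a large z-score.

```python
import numpy as np

salaries = np.array([47000, 53000, 53000, 57000, 62000, 73000, 1_000_000], dtype=float)

# Min-max normalization: the outlier becomes 1, and the six ordinary
# salaries are all squeezed below roughly 0.03.
normalized = (salaries - salaries.min()) / (salaries.max() - salaries.min())
print(np.round(normalized, 4))

# Standardization: no [0, 1] bound; the outlier gets a large positive
# z-score (roughly 2.4) while the other values sit around -0.4.
standardized = (salaries - salaries.mean()) / salaries.std()
print(np.round(standardized, 2))
```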
3. What is the bias-variance trade-off?
3.1 Bias Error
- Bias is the set of simplifying assumptions made by a model to make the target function easier to learn.
- Generally, linear algorithms have a high bias, which makes them fast to learn and easy to understand, but less flexible. In turn, they have lower predictive performance on complex problems that fail to meet the simplifying assumptions of the algorithm's bias.
- Low Bias: Suggests fewer assumptions about the form of the target function.
- High Bias: Suggests more assumptions about the form of the target function.
- Examples of low-bias machine learning algorithms: Decision Trees, k-Nearest Neighbors and Support Vector Machines.
- Examples of high-bias machine learning algorithms: Linear Regression, Linear Discriminant Analysis and Logistic Regression.
3.2 Variance Error
- Variance error is the amount by which the estimate of the target function changes when the training data changes.
- Every model will have some variance; however, the estimate should not change too much from one training dataset to another.
- A good model picks out the hidden underlying mapping between the input and output variables rather than the quirks of a particular sample.
- ML algorithms with high variance are strongly influenced by the specifics of the training data.
- Low Variance: Suggests small changes to the estimate of the target function with changes to the training data set. Examples: Linear Regression, Linear Discriminant Analysis and Logistic Regression.
- High Variance: Suggests large changes to the estimate of the target function with changes to the training data set. Examples: Decision Trees, k-Nearest Neighbors and Support Vector Machines.
- Generally, nonlinear machine learning algorithms that have a lot of flexibility have a high variance. For example, decision trees have a high variance, that is even higher if the trees are not pruned before use.
Bias-Variance Trade-Off
- As discussed above, to have good predictive power and good estimates of the target, an algorithm should have low bias and low variance.
- Linear machine learning algorithms often have a high bias but a low variance.
- Nonlinear machine learning algorithms often have a low bias but a high variance.
- The parameterization of machine learning algorithms is often a battle to balance out bias and variance.
- If you increase the bias, the variance will start to decrease.
- If you decrease the variance of the model, the bias will increase.
- Generally, low bias and low variance are preferred.
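To make the trade-off concrete, here is a minimal sketch using scikit-learn on synthetic data (a hypothetical example, not from the original post). It fits a high-bias linear model and a high-variance unpruned decision tree to the same noisy nonlinear data: the linear model shows similar error on training and test data but cannot capture the nonlinearity, while the tree fits the training data almost perfectly and then loses accuracy on the test set.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_squared_error

# Synthetic nonlinear data with noise
rng = np.random.RandomState(42)
X = rng.uniform(-3, 3, size=(300, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.3, size=300)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

for name, model in [
    ("Linear Regression (high bias)", LinearRegression()),
    ("Unpruned Decision Tree (high variance)", DecisionTreeRegressor(random_state=42)),
]:
    model.fit(X_train, y_train)
    train_mse = mean_squared_error(y_train, model.predict(X_train))
    test_mse = mean_squared_error(y_test, model.predict(X_test))
    print(f"{name}: train MSE = {train_mse:.3f}, test MSE = {test_mse:.3f}")
```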


4. What is overfitting and why does it occur?
- In one line: when a trained model is not able to generalize its learned behavior to unseen data, we say we have encountered the overfitting problem.
- So what does it mean to generalize the learned behavior?
- In machine learning, when a model performs well on training data but does not reach nearly the same level of accuracy on test data, that is the indication of an overfitted model (illustrated in the sketch after these bullets).
- Another case: your model performs well on both training and test data, but not on live data after deployment.
- You can think of this as an advanced form of overfitting. It happens because, when we split the original data into train and test sets, both splits share similar characteristics, so the model performs almost the same on each. After deployment the real data differs in its characteristics, and because of the overfitting the model fails to perform well.
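A small synthetic example (assuming scikit-learn, not code from the original post) of what this indication looks like in practice: an unpruned decision tree trained on noisy data scores near 100% on the training set but noticeably lower on the test set.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Noisy synthetic classification data (20% of labels flipped)
X, y = make_classification(n_samples=500, n_features=20, n_informative=5,
                           flip_y=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# An unpruned decision tree can memorize the training set, noise included.
tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
print(f"train accuracy: {tree.score(X_train, y_train):.2f}")  # typically 1.00
print(f"test accuracy:  {tree.score(X_test, y_test):.2f}")    # noticeably lower: overfitting
```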

Reasons why Overfitting happens
- Overfitting occurs when a statistical model describes random error or noise instead of the underlying relationship.
- Overfitting is normally observed when a model is excessively complex, i.e. it has too many parameters relative to the number of training observations.
- In simple terms, overfitting occurs when the model starts learning the noise instead of the actual characteristics of the data.
- The possibility of overfitting exists because the criterion used for training the model is not the same as the criterion used to judge its efficacy.
5. Different ways to avoid or overcome Overfitting
1. Cross Validation
Split the training data into k different folds, train the model on k-1 folds and test on the k-th fold. Repeat this for all k folds, tuning the model in each iteration (see the sketch after this list).
2. Feature Selection
Apply feature selection (or dimensionality reduction techniques such as PCA) to remove irrelevant features and hence reduce the noise, which is one of the causes of overfitting.
3. Early Stopping
While training and tuning your model, keep evaluating it on held-out data and check whether it generalizes well. Stop at the point where the error on the held-out data starts increasing.
4. Regularization
Regularization adds a penalty term to the cost function that shrinks the weights of less important features, discouraging the model from fitting noise (see the Ridge example in the sketch after this list).
5. Ensemble Techniques
- Bagging
- Boosting
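As a minimal sketch of two of these techniques together (assuming scikit-learn), the example below uses 5-fold cross-validation to compare an unregularized linear model with a Ridge model, whose L2 penalty term is a common form of regularization. The dataset and parameter values are illustrative choices, not from the original post.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import cross_val_score

# Synthetic data with many noisy, partly irrelevant features: a setting prone to overfitting.
X, y = make_regression(n_samples=100, n_features=50, n_informative=10,
                       noise=20.0, random_state=0)

for name, model in [
    ("Linear Regression (no regularization)", LinearRegression()),
    ("Ridge (L2 regularization)", Ridge(alpha=10.0)),
]:
    # 5-fold cross-validation: train on 4 folds, validate on the held-out fold, repeat 5 times.
    scores = cross_val_score(model, X, y, cv=5, scoring="r2")
    print(f"{name}: mean CV R^2 = {scores.mean():.3f}")
```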