Machine Learning Interview Questions and Answers Part 2 | ML FAQ

This post is part 2 in the series of frequently asked Machine Learning interview questions and answers. It covers the following questions:

  1. What is Feature Scaling, and why and where is it needed?
  2. Normalization vs Standardization
  3. What is the bias-variance trade-off?
  4. What is the Overfitting problem and why does it occur?
  5. What are the methods to avoid Overfitting in ML?

1. What is Feature Scaling and why is it needed?

  • In general, a data set contains different types of variables.
  • A significant issue with these variables is that they often differ widely in their range of values.
  • A feature with a large range of values will start dominating the other variables (see the distance sketch after the example table below).
  • Models can therefore become biased towards those high-range features.
  • To overcome this problem, we apply feature scaling.
  • The goal of feature scaling is to bring all features onto roughly the same scale, so that each feature is equally important and easier for most ML algorithms to process.
# | Country | Age | Salary | Purchased
1 | France  | 44  | 73000  | No
2 | Spain   | 27  | 47000  | Yes
3 | Germany | 30  | 53000  | Yes
4 | Spain   | 38  | 62000  | No
5 | Germany | 40  | 57000  | No
6 | France  | 35  | 53000  | No
7 | Spain   | 48  | 78000  | Yes
Data Set for Feature Scaling

Age Range: 27-48, Salary Range: 47000-78000
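To see why the Salary column would dominate a distance-based algorithm on this data, here is a minimal sketch (assuming NumPy is available; the numbers come from the table above):

```python
import numpy as np

# Two rows from the table above: (Age, Salary)
row_1 = np.array([44, 73000])   # row 1: France, did not purchase
row_2 = np.array([27, 47000])   # row 2: Spain, purchased

# Euclidean distance on the raw features: the salary difference (26000)
# completely swamps the age difference (17), so Age barely matters.
raw_distance = np.linalg.norm(row_1 - row_2)
print(raw_distance)             # ~26000.0

# After min-max scaling both columns to [0, 1] (ranges from above:
# Age 27-48, Salary 47000-78000), both features contribute comparably.
mins = np.array([27, 47000])
ranges = np.array([48 - 27, 78000 - 47000])
scaled_distance = np.linalg.norm((row_1 - mins) / ranges - (row_2 - mins) / ranges)
print(scaled_distance)          # ~1.17, Age and Salary now carry similar weight
```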

Algorithm | Reason for applying feature scaling
K-means | Uses the Euclidean distance measure
K-nearest neighbors | Measures the distance between pairs of samples, and these distances are influenced by the measurement units
Principal Component Analysis (PCA) | Looks for the features with maximum variance
Artificial Neural Network | Applies gradient descent
Gradient Descent | Theta calculation becomes faster after feature scaling, and the learning rate in the update equation of stochastic gradient descent is the same for every parameter

Note: If an algorithm is not distance-based, feature scaling is unimportant; examples include Naive Bayes, Linear Discriminant Analysis, and tree-based models (gradient boosting, random forest, etc.).

Methods of feature scaling:

  1. Normalization
  2. Standardization

2. Normalization and Standardization

2.1 Normalization

  • Normalization is also known as min-max normalization or min-max scaling.
  • Normalization rescales values into the range 0-1 using x' = (x - min) / (max - min), as shown in the sketch after the table below.
Age | Normalized Age | Salary | Normalized Salary
44  | 0.80952381     | 73000  | 0.838709677
27  | 0              | 47000  | 0
30  | 0.142857143    | 53000  | 0.193548387
38  | 0.523809524    | 62000  | 0.483870968
40  | 0.619047619    | 57000  | 0.322580645
35  | 0.380952381    | 53000  | 0.193548387
48  | 1              | 78000  | 1
Normalized Age and Salary Data
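A minimal sketch reproducing the normalized values above with scikit-learn's MinMaxScaler (assuming scikit-learn is installed); the same result can be computed by hand as (x - min) / (max - min):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Age and Salary columns from the example data set above
X = np.array([[44, 73000],
              [27, 47000],
              [30, 53000],
              [38, 62000],
              [40, 57000],
              [35, 53000],
              [48, 78000]], dtype=float)

# MinMaxScaler rescales each column to [0, 1]: (x - min) / (max - min)
scaler = MinMaxScaler()
X_normalized = scaler.fit_transform(X)
print(X_normalized)   # first row: [0.80952381, 0.83870968], matching the table
```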

2.2 Standardization

  • Standardization is also known as z-score normalization.
  • In standardization, each feature is rescaled as z = (x - mean) / (standard deviation).
  • After standardization, each feature has mean = 0 and standard deviation = 1 (see the sketch after the table below).
Age | Standardized Age | Salary | Standardized Salary
44  | 0.954611636      | 73000  | 1.197306616
27  | -1.514927162     | 47000  | -1.278941158
30  | -1.079126198     | 53000  | -0.707499364
38  | 0.083009708      | 62000  | 0.149663327
40  | 0.373543684      | 57000  | -0.326538168
35  | -0.352791257     | 53000  | -0.707499364
48  | 1.535679589      | 78000  | 1.673508111
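Similarly, a minimal sketch reproducing the standardized values with scikit-learn's StandardScaler (assuming scikit-learn is installed); note that it uses the population standard deviation, which is what the table above was computed with:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Same Age and Salary columns as before
X = np.array([[44, 73000],
              [27, 47000],
              [30, 53000],
              [38, 62000],
              [40, 57000],
              [35, 53000],
              [48, 78000]], dtype=float)

# StandardScaler applies z = (x - mean) / std to each column
scaler = StandardScaler()
X_standardized = scaler.fit_transform(X)
print(X_standardized)               # first row: [0.95461164, 1.19730662], matching the table
print(X_standardized.mean(axis=0))  # ~[0, 0]
print(X_standardized.std(axis=0))   # [1, 1]
```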

Normalization vs Standardization

  • If a feature (column) contains outliers, min-max normalization squashes most of the remaining values into a small sub-interval of the 0-1 range, so it does not handle outliers well.
  • Standardization is more robust to outliers, and in many cases it is preferable over min-max normalization.
  • Normalization is a good choice when you know the distribution of your data does not follow a Gaussian distribution. This can be useful for algorithms that do not assume any particular distribution of the data, such as K-Nearest Neighbors and neural networks.
  • Standardization, on the other hand, can be helpful when the data follows a Gaussian distribution, although this does not have to be strictly true. Unlike normalization, standardization has no bounding range, so outliers are not compressed into a narrow interval.

3. What is the bias-variance trade-off?

3.1 Bias Error

  • Bias refers to the simplifying assumptions a model makes in order to make the target function easier to learn.
  • Generally, linear algorithms have a high bias, which makes them fast to learn and easy to understand, but usually less flexible. In turn, they have lower predictive performance on complex problems that do not meet the simplifying assumptions of the algorithm's bias.
  • Low bias: suggests fewer assumptions about the form of the target function.
  • High bias: suggests more assumptions about the form of the target function.
  • Examples of low-bias machine learning algorithms: Decision Trees, k-Nearest Neighbors and Support Vector Machines.
  • Examples of high-bias machine learning algorithms: Linear Regression, Linear Discriminant Analysis and Logistic Regression.

3.2 Variance Error

  • Variance error is the amount by which the estimate of the target function changes when the training data changes.
  • Logically, any model will have some variance, but its estimate should not change too much from one training data set to another.
  • That means the model should be good at picking out the hidden underlying mapping between the input and output variables.
  • ML algorithms with high variance are strongly influenced by the specifics of the training data.
  • Low variance: suggests small changes to the estimate of the target function with changes to the training data set. Examples: Linear Regression, Linear Discriminant Analysis and Logistic Regression.
  • High variance: suggests large changes to the estimate of the target function with changes to the training data set. Examples: Decision Trees, k-Nearest Neighbors and Support Vector Machines.
  • Generally, nonlinear machine learning algorithms with a lot of flexibility have high variance. For example, decision trees have high variance, which is even higher if the trees are not pruned before use.

Bias-Variance Trade-Off

  • As discussed above, to have good predictive power or good estimates of the target feature, an algorithm should have low bias and low variance.
  • Linear machine learning algorithms often have a high bias but a low variance.
  • Nonlinear machine learning algorithms often have a low bias but a high variance.
  • The parameterization of machine learning algorithms is often a battle to balance bias and variance.
  • If you increase the bias, the variance will start to decrease.
  • If you decrease the variance of the model, the bias will increase.

Generally, low bias and low variance are preferred.
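As a rough illustration of this trade-off, the sketch below (assuming scikit-learn is installed, using a synthetic data set) compares a high-bias linear model with a high-variance unpruned decision tree. The exact numbers will vary, but the linear model typically underfits, while the tree fits the training data almost perfectly and does noticeably worse on the test data:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import train_test_split

# Noisy nonlinear data: y = sin(x) + noise
rng = np.random.RandomState(0)
X = np.sort(rng.uniform(0, 6, size=(200, 1)), axis=0)
y = np.sin(X).ravel() + rng.normal(scale=0.3, size=200)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# High bias, low variance: a straight line cannot capture the sine shape
linear = LinearRegression().fit(X_train, y_train)

# Low bias, high variance: an unpruned tree memorizes the training noise
tree = DecisionTreeRegressor(random_state=0).fit(X_train, y_train)

for name, model in [("linear", linear), ("tree", tree)]:
    print(name,
          "train R2:", round(model.score(X_train, y_train), 2),
          "test R2:", round(model.score(X_test, y_test), 2))
```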

Figure: bias-variance trade-off
Source: http://scott.fortmann-roe.com/docs/BiasVariance.html

4. What is overfitting and why does it occur?

  • In one line: when a trained model is not able to generalize its learned behavior to unseen data, we have encountered the overfitting problem.
  • So what does it mean to generalize the learned behavior?
  • In machine learning, when a model performs well on the training data but does not reach nearly the same level of accuracy on the test data, this is an indication of an overfitted model.
  • Another case: the model performs well on both the training and the test data, but does not perform well on live data after deployment.
  • You can think of this as an advanced form of overfitting. It happens because when we split the original data into train and test sets, both parts share similar characteristics, so the model performs almost the same on both. After deployment, the real data differs in its characteristics, and due to overfitting the model cannot perform well.

Reasons why Overfitting happens

  • Overfitting occurs when a statistical model describes random error or noise instead of the underlying relationship.
  • Overfitting is normally observed when a model is excessively complex, i.e. it has too many parameters relative to the number of training observations.
  • In simple terms, overfitting occurs when the model starts learning the noise instead of the actual characteristics of the data.
  • The possibility of overfitting exists because the criterion used for training the model is not the same as the criterion used to judge its efficacy.

5. Different ways to avoid or overcome Overfitting

1. Cross Validation

Split the training data into k different folds, train the model on k-1 folds and test it on the remaining k-th fold. Repeat this for all k folds and tune the model in each iteration, as in the sketch below.
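A minimal k-fold cross-validation sketch with scikit-learn (assuming it is installed, using a synthetic classification data set):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# 5-fold CV: train on 4 folds, validate on the remaining fold, repeat 5 times
model = LogisticRegression(max_iter=1000)
cv = KFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(model, X, y, cv=cv)

print(scores)         # one accuracy score per fold
print(scores.mean())  # averaged estimate of how well the model generalizes
```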

2. Feature Selection

Apply feature selection or dimensionality reduction techniques such as PCA to remove irrelevant features and hence reduce noise, which is one of the causes of overfitting (see the sketch below).
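A minimal PCA sketch (assuming scikit-learn is installed). Strictly speaking, PCA is a dimensionality-reduction rather than a feature-selection technique, but the effect here is the same: fewer, less noisy inputs for the model.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X, y = load_digits(return_X_y=True)      # 64 pixel features per image

# Scale first (PCA looks for directions of maximum variance),
# then keep enough components to explain 95% of the variance.
X_scaled = StandardScaler().fit_transform(X)
pca = PCA(n_components=0.95)
X_reduced = pca.fit_transform(X_scaled)

print(X.shape, "->", X_reduced.shape)    # e.g. (1797, 64) -> (1797, ~40)
```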

3. Early Stopping

While tuning your model, keep evaluating it against a held-out validation set and check whether it still generalizes well. Stop training at the point where the error on the validation data starts increasing, as in the sketch below.
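An illustrative early-stopping loop, assuming scikit-learn is installed and using a synthetic data set. A linear classifier is trained incrementally, and training stops once the validation score has not improved for a fixed number of epochs (the `patience` value here is an arbitrary choice for the sketch):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=0)

model = SGDClassifier(random_state=0)
best_score, patience, epochs_without_improvement = -np.inf, 5, 0

for epoch in range(200):
    # one pass over the training data
    model.partial_fit(X_train, y_train, classes=np.unique(y))
    score = model.score(X_val, y_val)           # validation accuracy
    if score > best_score:
        best_score, epochs_without_improvement = score, 0
    else:
        epochs_without_improvement += 1
    if epochs_without_improvement >= patience:  # stop before overfitting sets in
        print(f"Stopping at epoch {epoch}, best validation accuracy {best_score:.3f}")
        break
```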

4. Regularization

Regularization works by shrinking the weights assigned to features according to their importance and impact on the problem being solved. It adds a penalty term to the cost function that discourages overly complex models (see the sketch below).
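A minimal sketch contrasting plain linear regression with L2 (Ridge) and L1 (Lasso) regularization, assuming scikit-learn is installed; the `alpha` parameter controls the strength of the penalty term added to the cost function:

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression, Ridge, Lasso
from sklearn.model_selection import train_test_split

# Few samples, many features: an easy setup in which to overfit
X, y = make_regression(n_samples=60, n_features=40, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for name, model in [("no regularization", LinearRegression()),
                    ("ridge (L2 penalty)", Ridge(alpha=1.0)),
                    ("lasso (L1 penalty)", Lasso(alpha=1.0))]:
    model.fit(X_train, y_train)
    print(name,
          "train R2:", round(model.score(X_train, y_train), 2),
          "test R2:", round(model.score(X_test, y_test), 2))
```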

5. Ensemble Techniques

  • Bagging
  • Boosting
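A minimal sketch of both ideas with scikit-learn (assuming it is installed, with a synthetic data set): bagging averages many trees trained on bootstrap samples to reduce variance, while boosting adds shallow trees sequentially, each correcting the errors of the previous ones.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# Bagging: a random forest is bagged decision trees (plus random feature subsets)
bagging = RandomForestClassifier(n_estimators=200, random_state=0)

# Boosting: shallow trees added one after another on the remaining errors
boosting = GradientBoostingClassifier(n_estimators=200, random_state=0)

for name, model in [("bagging (random forest)", bagging),
                    ("boosting (gradient boosting)", boosting)]:
    print(name, round(cross_val_score(model, X, y, cv=5).mean(), 3))
```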
