Feature selection is a way to reduce the number of features and hence reduce the computational complexity of the model. It is often very useful for overcoming overfitting. Feature selection helps us determine the smallest set of features needed to predict the response variable with high accuracy. We can ask: does adding new features necessarily increase model performance significantly? If not, why add features that will only increase model complexity?
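To make that question concrete, here is a minimal NumPy sketch on synthetic data. The variables `x_useful` and `x_noise` and the helper functions are hypothetical, chosen only for illustration. It fits ordinary least squares with and without a feature that is pure noise. In-sample R² can never go down when a feature is added, so plain R² alone cannot answer the question; adjusted R², which penalises the number of predictors, is one common way to check whether an extra feature earns its place.

```python
import numpy as np

def r2_score(y, y_hat):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - ss_res / ss_tot

def fit_ols(X, y):
    """Ordinary least squares with an intercept column; returns fitted values."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return A @ coef

def adjusted_r2(r2, n, p):
    """Penalise R^2 for the number of predictors p (intercept not counted)."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)

rng = np.random.default_rng(0)
n = 100
x_useful = rng.normal(size=n)   # hypothetical feature that drives y
x_noise = rng.normal(size=n)    # hypothetical feature unrelated to y
y = 3.0 * x_useful + rng.normal(scale=0.5, size=n)

r2_small = r2_score(y, fit_ols(x_useful.reshape(-1, 1), y))
r2_big = r2_score(y, fit_ols(np.column_stack([x_useful, x_noise]), y))

print(f"1 feature:  R^2 = {r2_small:.4f}, adjusted = {adjusted_r2(r2_small, n, 1):.4f}")
print(f"2 features: R^2 = {r2_big:.4f}, adjusted = {adjusted_r2(r2_big, n, 2):.4f}")
```

Whether adjusted R² actually drops for the noise feature depends on the random draw; the point is that plain R² will not flag the useless feature, while penalised or out-of-sample measures can.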
The Coefficient of Determination (R²) is the proportion of the variance in the response variable ‘y’ that can be explained by the predictor variable ‘x’. It is one of the most common ways to measure how well a regression model fits the data.
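R² is usually computed as R² = 1 - SS_res / SS_tot, where SS_res is the sum of squared residuals (observed minus predicted) and SS_tot is the total sum of squares around the mean of y. A minimal NumPy sketch, using four made-up observations and predictions:

```python
import numpy as np

def r_squared(y, y_hat):
    """R^2 = 1 - SS_res / SS_tot."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    ss_res = np.sum((y - y_hat) ** 2)      # variation the model does not explain
    ss_tot = np.sum((y - y.mean()) ** 2)   # total variation around the mean of y
    return 1.0 - ss_res / ss_tot

y = [1.0, 2.0, 3.0, 4.0]        # made-up observed values
y_hat = [1.1, 1.9, 3.2, 3.8]    # made-up model predictions
r2 = r_squared(y, y_hat)
print(round(r2, 4))  # → 0.98
```

Here SS_res = 0.10 and SS_tot = 5.0, so R² = 1 - 0.02 = 0.98: the predictions account for 98% of the variance in y.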
Linear Regression is a statistical method for modelling the relationship between two continuous variables, known as the Predictor and Response variables (in its simplest form, with one predictor). The Predictor variable is most often denoted as x and is also known as the Independent variable. The Response variable is most often denoted as y and is also known as the Dependent variable.
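With a single predictor, the least-squares line y = b0 + b1·x has a closed form: b1 = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)² and b0 = ȳ - b1·x̄. A minimal NumPy sketch on five made-up points, cross-checked against `np.polyfit`:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])    # predictor (made-up data)
y = np.array([3.1, 4.9, 7.2, 8.8, 11.0])   # response (made-up data)

x_bar, y_bar = x.mean(), y.mean()
b1 = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)  # slope
b0 = y_bar - b1 * x_bar                                            # intercept

print(f"y = {b0:.2f} + {b1:.2f} * x")  # → y = 1.09 + 1.97 * x

# np.polyfit with deg=1 returns [slope, intercept] for the same least-squares fit
slope, intercept = np.polyfit(x, y, 1)
```

The slope says that, on this data, each one-unit increase in x is associated with roughly a 1.97-unit increase in y.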
Covariance and Correlation are very helpful for understanding the relationship between two continuous variables. Covariance tells whether both variables vary in the same direction (positive covariance) or in opposite directions (negative covariance), but its magnitude depends on the units of the variables. Correlation standardises the covariance to a value between -1 and 1, so it describes both the direction and the strength of the linear relationship, independently of units.
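A minimal NumPy sketch on made-up data: `np.cov` returns the sample covariance matrix and `np.corrcoef` the Pearson correlation matrix, so the off-diagonal entry `[0, 1]` is the value for the pair. Rescaling x (for example, from metres to centimetres) multiplies the covariance by 100 but leaves the correlation unchanged, which is why correlation is easier to compare across different pairs of variables.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])    # made-up data
y = np.array([3.1, 4.9, 7.2, 8.8, 11.0])

cov_xy = np.cov(x, y)[0, 1]        # sample covariance (divides by n - 1)
corr_xy = np.corrcoef(x, y)[0, 1]  # Pearson correlation, always in [-1, 1]

# Change the units of x, e.g. metres -> centimetres
x_cm = 100 * x
cov_cm = np.cov(x_cm, y)[0, 1]        # scales by 100
corr_cm = np.corrcoef(x_cm, y)[0, 1]  # unchanged

# Flipping the sign of y makes the covariance negative (opposite directions)
cov_neg = np.cov(x, -y)[0, 1]

print(f"covariance:  {cov_xy:.3f} -> {cov_cm:.3f}")
print(f"correlation: {corr_xy:.4f} -> {corr_cm:.4f}")
print(f"covariance with -y: {cov_neg:.3f}")
```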
In any business there are some easy-to-measure variables, such as age, gender, income and education level, and there are some difficult to measure