When working with regression analysis, you should be familiar with a few basic but high-impact concepts. In machine learning interviews, you can almost always expect questions on regression analysis. Regression analysis also builds a basic understanding of machine learning model building, since most of us start our machine learning journey with it.
The ID3 algorithm can be used to construct a decision tree for regression problems by replacing Information Gain with Standard Deviation Reduction (SDR).
A decision tree is built top-down from a root node and involves partitioning the data into subsets that contain instances with similar values (that is, homogeneous data).
Here, standard deviation is used to calculate the homogeneity of a numerical sample (target variable).
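As a rough illustration of the idea, the sketch below computes the standard deviation reduction for one candidate split. The function name `sdr`, the use of the population standard deviation, and the sample numbers are assumptions made for this example, not part of any particular library.

```python
from statistics import pstdev

def sdr(target, groups):
    """Standard deviation reduction obtained by splitting `target` into `groups`.

    SDR = sd(target) - sum over groups of (group size / total size) * sd(group)
    """
    n = len(target)
    weighted_sd = sum(len(g) / n * pstdev(g) for g in groups)
    return pstdev(target) - weighted_sd

# Hypothetical target values, and the same values partitioned by some feature.
target = [10, 12, 11, 30, 32, 31]
split = [[10, 12, 11], [30, 32, 31]]

reduction = sdr(target, split)      # large: each subset is far more homogeneous
no_split = sdr(target, [target])    # zero: nothing was partitioned
```

The split with the largest reduction would be the one chosen at a node, which mirrors how Information Gain is used in the classification version of ID3.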
What is covariance?
Covariance tells you whether two random variables vary together and, if they do, whether they move in the same direction or in opposite directions. If both variables tend to increase or decrease together, the covariance is positive; if one tends to increase while the other decreases, the covariance is negative. A covariance near zero indicates no linear relationship between them.
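The sign behaviour described above can be checked with a small sketch. The `covariance` helper and the data below are made up for illustration; it computes the sample covariance (dividing by n - 1).

```python
def covariance(xs, ys):
    """Sample covariance of two equal-length sequences (divides by n - 1)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)

# Hypothetical data: study hours, a score that rises with hours,
# and an error count that falls with hours.
hours = [1, 2, 3, 4, 5]
score = [52, 57, 61, 68, 72]
errors = [9, 7, 6, 4, 2]

cov_up = covariance(hours, score)     # positive: both move in the same direction
cov_down = covariance(hours, errors)  # negative: they move in opposite directions
```

Note that the magnitude of covariance depends on the units of the variables, so only the sign is easy to interpret directly.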
The ROC curve helps you choose the decision threshold for binary classification problems in machine learning. Classifiers typically output a probability, and 0.5 is not always the right threshold; the appropriate value depends on the type and domain of the problem. For example, in a legal case you want the false positive rate to be as low as possible, so the threshold would be set very high. The term AUC, the area under the ROC curve, summarizes how well the model separates the two classes across all thresholds. It is used to compare different classifiers and identify which one performs better.
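To make the threshold discussion concrete, here is a minimal pure-Python sketch that builds ROC points, computes the AUC with the trapezoidal rule, and picks a threshold under a cap on the false positive rate. The helper names (`roc_points`, `auc`, `pick_threshold`), the labels, and the scores are all assumptions for illustration; in practice a library routine would normally be used instead.

```python
def roc_points(y_true, scores):
    """Return (threshold, fpr, tpr) for each distinct score, highest threshold first."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    points = []
    for t in sorted(set(scores), reverse=True):
        # Predict positive when the score is at or above the threshold.
        tp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 0)
        points.append((t, fp / neg, tp / pos))
    return points

def auc(points):
    """Area under the ROC curve by the trapezoidal rule, starting from (0, 0)."""
    area, prev_fpr, prev_tpr = 0.0, 0.0, 0.0
    for _, fpr, tpr in points:
        area += (fpr - prev_fpr) * (tpr + prev_tpr) / 2
        prev_fpr, prev_tpr = fpr, tpr
    return area

def pick_threshold(points, max_fpr):
    """Threshold with the best TPR among those whose FPR stays within max_fpr."""
    allowed = [p for p in points if p[1] <= max_fpr]
    if not allowed:
        return None  # no threshold satisfies the constraint
    return max(allowed, key=lambda p: p[2])[0]

# Hypothetical labels (1 = positive) and predicted probabilities.
y_true = [0, 0, 1, 0, 1, 1, 0, 1]
scores = [0.10, 0.30, 0.35, 0.40, 0.60, 0.70, 0.80, 0.90]

points = roc_points(y_true, scores)
area = auc(points)
strict_threshold = pick_threshold(points, max_fpr=0.25)
```

Tightening `max_fpr` pushes the chosen threshold higher, which is the trade-off the legal example describes: fewer false positives at the cost of missing more true positives.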
In machine learning, logistic regression remains a popular choice for classification problems. It is a simple yet efficient algorithm that produces accurate models in many cases. In its basic form, it uses the logistic function to calculate a probability score, which is used to assign a binary dependent variable to its class. Logistic regression can be viewed as linear regression passed through the logistic (sigmoid) transformation. In this post I explain the end-to-end steps involved in a classification problem using logistic regression, and I analyze the model output in detail using various performance metrics.
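The relationship between the linear part and the probability score can be sketched as follows. The weights, bias, and inputs are hypothetical values chosen for illustration rather than a fitted model, and the function names are not from any library.

```python
import math

def sigmoid(z):
    """Logistic function, written to avoid overflow for large negative z."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_proba(x, weights, bias):
    """Probability of class 1: sigmoid of the linear combination bias + w . x."""
    z = bias + sum(w * xi for w, xi in zip(weights, x))
    return sigmoid(z)

def predict(x, weights, bias, threshold=0.5):
    """Assign class 1 when the probability reaches the threshold, else class 0."""
    return 1 if predict_proba(x, weights, bias) >= threshold else 0

# Hypothetical coefficients for a two-feature model.
weights = [0.8, -0.4]
bias = -1.0

p_a = predict_proba([3.0, 1.0], weights, bias)   # linear score z = 1.0
p_b = predict_proba([0.5, 2.0], weights, bias)   # linear score z = -1.4
label_a = predict([3.0, 1.0], weights, bias)
label_b = predict([0.5, 2.0], weights, bias)
```

The `threshold` argument defaults to 0.5 here only for convenience; it can be raised or lowered depending on the problem domain.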
Logistic regression is used for binary classification problems, which have only two classes to predict. However, with a small extension it can also be used for multi-class classification problems. In this post I will explain binary classification. I will also explain the reason for maximizing the log-likelihood function.
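As a small preview of the log-likelihood idea, the sketch below scores two sets of predicted probabilities against the same labels. The function name and the numbers are assumptions for illustration. Taking the log turns the product of per-example probabilities into a sum, and because the log is monotonic, the parameters that maximize the log-likelihood also maximize the likelihood.

```python
import math

def log_likelihood(y_true, probs):
    """Bernoulli log-likelihood: sum of y*log(p) + (1 - y)*log(1 - p)."""
    return sum(
        y * math.log(p) + (1 - y) * math.log(1 - p)
        for y, p in zip(y_true, probs)
    )

# Hypothetical labels and two candidate sets of predicted probabilities.
y_true = [1, 0, 1]
confident = [0.9, 0.2, 0.8]   # close to the true labels
hesitant = [0.6, 0.5, 0.4]    # further from the true labels

ll_confident = log_likelihood(y_true, confident)
ll_hesitant = log_likelihood(y_true, hesitant)
```

Predictions closer to the true labels give a higher (less negative) log-likelihood, which is why fitting the model amounts to searching for the coefficients that maximize it. Probabilities of exactly 0 or 1 would need clipping before taking the log, which this sketch does not do.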