Variance and standard deviation are the most commonly used measures of variability, or spread. Variability and spread describe how far the data points deviate from the mean.
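The idea can be seen directly with Python's standard library, a minimal sketch on a made-up sample:

```python
# Minimal sketch: variance and standard deviation of a toy sample
# using only Python's standard library. The data is illustrative.
import statistics

data = [4, 8, 6, 5, 3, 7]

mean = statistics.mean(data)          # the central point of the data
variance = statistics.variance(data)  # sample variance (n - 1 denominator)
std_dev = statistics.stdev(data)      # square root of the variance

print(mean, variance, std_dev)  # -> 5.5 3.5 1.8708...
```

The standard deviation is in the same units as the data itself, which is why it is usually preferred over variance when reporting spread.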
Principal Component Analysis (PCA) is a widely used technique for dimensionality reduction of large data sets. Reducing the number of components or features costs some accuracy, but in return it makes a large data set simpler and easier to explore and visualize. It also reduces the computational complexity of the model, which makes machine learning algorithms run faster. How much accuracy to sacrifice for a less complex, lower-dimensional data set is always debatable; there is no fixed answer, but in practice we try to retain most of the variance when choosing the final set of components.
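A minimal sketch of this trade-off, using scikit-learn's PCA on made-up data (the data set and the 95% variance threshold are assumptions for illustration):

```python
# Illustrative sketch: dimensionality reduction with scikit-learn's PCA,
# keeping just enough components to retain 95% of the variance.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))   # toy data set: 100 samples, 10 features

pca = PCA(n_components=0.95)     # float: retain >= 95% of total variance
X_reduced = pca.fit_transform(X)

print(X_reduced.shape)                       # fewer columns than the original 10
print(pca.explained_variance_ratio_.sum())   # fraction of variance actually kept
```

Passing a float to `n_components` is how scikit-learn expresses the "keep most of the variance" rule directly, instead of hand-picking a fixed number of components.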
NLP can organize unstructured data and perform several automated tasks such as automatic summarization, sentiment analysis, speech recognition, etc.
Types of Statistics: Descriptive vs Inferential
Basic terminology like Population vs Sample
Types of Variables: Numerical vs Categorical
Measures of central tendency: Mean, Median and Mode and their specific use cases
Measures of dispersion/spread: Variance, standard deviation, etc.
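The measures of central tendency listed above can be computed directly with Python's standard library; a quick sketch on a toy sample:

```python
# Minimal sketch: mean, median and mode of a toy sample
# via the standard-library statistics module.
import statistics

data = [2, 3, 3, 5, 7, 10]

m = statistics.mean(data)     # arithmetic average
med = statistics.median(data) # middle value of the sorted data
mo = statistics.mode(data)    # most frequent value

print(m, med, mo)  # -> 5 4.0 3
```

Each measure has its use case: the mean is sensitive to outliers, the median is robust to them, and the mode is the only one that applies to categorical data.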
The coefficient of determination (R²) measures the proportion of variance in the response variable y that can be explained by the predictor variable x. It is the most common way to measure the strength of a regression model.
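R² can be computed by hand from its definition; a minimal sketch where the data and the fitted line are made up for illustration:

```python
# Illustrative sketch: coefficient of determination (R^2) computed
# from its definition, using only built-ins. Data is made up.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

# Hypothetical fitted model for the example: y_hat = 2 * x
y_hat = [2 * xi for xi in x]

mean_y = sum(y) / len(y)
ss_res = sum((yi - yhi) ** 2 for yi, yhi in zip(y, y_hat))  # residual sum of squares
ss_tot = sum((yi - mean_y) ** 2 for yi in y)                # total sum of squares

r_squared = 1 - ss_res / ss_tot
print(r_squared)  # close to 1, so x explains y well
```

An R² near 1 means the predictor explains almost all of the variance in the response; an R² near 0 means the model does little better than simply predicting the mean of y.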
Storytelling, or presenting insights, is the most important part of data analytics. It is the selling point of all your hard work: no matter how much effort you have put into building an analytical model, it counts for little unless you can capture the attention of your target audience. In this particular article, my focus is on how we can use attractive graphs to show insights about the employee attrition rate from the IBM HR Attrition data. After all, a picture is worth a thousand words.