Data visualization is one of the most important activities we perform during Exploratory Data Analysis. It supports important tasks such as preparing business reports, building visual dashboards, and storytelling. In this post I explain how to ask questions of the data and get self-explanatory graphs in return. You will learn to use various Python libraries such as plotly, matplotlib, seaborn, and squarify to plot those graphs.
Data Science and Machine Learning Articles | Yearly round-up 2019
Variance and standard deviation are the most commonly used measures of variability and spread, which describe how far the data points deviate from the mean.
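A quick sketch of both measures using Python's standard `statistics` module (the sample values are made up for illustration):

```python
# Variance and standard deviation measure spread around the mean.
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up sample

mean = statistics.mean(data)          # 5
pop_var = statistics.pvariance(data)  # population variance: 4
pop_std = statistics.pstdev(data)     # population std dev: 2.0
```

Note that `statistics.variance`/`statistics.stdev` divide by n-1 instead of n and would give slightly larger values for a sample drawn from a larger population.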
Principal Component Analysis (PCA) is a widely used technique for dimensionality reduction of large data sets. Reducing the number of components or features costs some accuracy, but in return it makes a large data set simpler and easier to explore and visualize. It also reduces the computational complexity of the model, which makes machine learning algorithms run faster. How much accuracy to sacrifice for a less complex, lower-dimensional data set is always debatable; there is no fixed answer, but we try to retain most of the variance when choosing the final set of components.
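The trade-off above can be sketched directly with NumPy (the original post may use a library such as scikit-learn; this hand-rolled version is only an illustration on synthetic data with one deliberately redundant feature):

```python
# PCA sketch: center the data, eigendecompose the covariance matrix,
# and project onto the top-k components.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))          # toy data: 100 samples, 3 features
X[:, 2] = X[:, 0] + 0.1 * X[:, 1]      # third feature is redundant

Xc = X - X.mean(axis=0)                # 1. center each feature
cov = np.cov(Xc, rowvar=False)         # 2. covariance matrix (3 x 3)
eigvals, eigvecs = np.linalg.eigh(cov) # 3. eigendecomposition
order = np.argsort(eigvals)[::-1]      # sort components by variance
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

k = 2                                  # 4. keep the top-2 components
X_reduced = Xc @ eigvecs[:, :k]
explained = eigvals[:k].sum() / eigvals.sum()
```

Because the third feature is a linear combination of the first two, two components retain essentially all of the variance here, which is exactly the "keep most of the variance" criterion described above.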
NLP can organize unstructured data and perform several automated tasks such as automatic summarization, sentiment analysis, and speech recognition.
Types of Statistics: Descriptive vs Inferential
Basic terminology like Population vs Sample
Types of Variables: Numerical vs Categorical
Measures of central tendencies: Mean, Median and Mode and their specific use cases
Measures of dispersion/spread: Variance, standard deviation etc.
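The central-tendency topics in the list above can be illustrated with the standard `statistics` module; the income figures below are made up, with one extreme outlier to show the specific use case where the median beats the mean:

```python
# Mean vs median vs mode on a sample with one extreme outlier.
import statistics

incomes = [30, 32, 35, 35, 38, 40, 400]  # made-up data, one outlier

mean = statistics.mean(incomes)     # pulled far up by the outlier
median = statistics.median(incomes) # 35, unaffected by the outlier
mode = statistics.mode(incomes)     # 35, the most frequent value
```

This is why the median is preferred for skewed data such as incomes, while the mode is the natural choice for categorical variables.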