In the field of Machine Learning, logistic regression is still the top choice for classification problems. It is simple yet efficient algorithm which produces accurate models in most of the cases. In its basic form, it uses the logistic function to calculate the probability score which helps to classify the binary dependent variable to its respective class. Logistic regression is the transformed form of the linear regression. In this post I have explained the end to end step involved in the classification machine learning problems using the logistic regression and also performed the detailed analysis of the model output with various performance parameters.
In this post, we will develop a classification model where we’ll try to classify the movie reviews on positive and negative classes. I have used different machine learning algorithm to train the model and compared the accuracy of those models at the end. you can keep this post as a template to use various machine learning algorithms in python for text classification.
At the end we will validate the model by passing a random review to the trained model and understand the output class predicted by the model. You will learn how to create and use the pipeline for numerical feature extraction and model training together as a one function.
Word vectors – also called word embeddings – are mathematical descriptions of individual words such that words that appear frequently together in the language will have similar values. In this way we can mathematically derive context. As mentioned above, the word vector for “lion” will be closer in value to “cat” than to “dandelion”.
Data Visualization is one of the important activity we perform when doing Exploratory Data Analysis. It helps in preparing business reports, visual dashboards, story telling etc important tasks. In this post I have explained how to ask questions from the data and in return get the self explanatory graphs. In this You will learn the use of various python libraries like plotly, matplotlib, seaborn, squarify etc to plot those graphs.
Data science and big data are no longer strange terms that are limited to techie vocabulary. In fact, the continuous expansion of the digital world has made data crucial for businesses to grow and succeed. This is why both terms have become more commonly used in industries other than IT.
Sentence Segmentation or Sentence Tokenization is the process of identifying different sentences among group of words. Spacy library designed for Natural Language Processing, perform the sentence segmentation with much higher accuracy. Spacy provides different models for different languages. In this post we’ll learn how sentence segmentation works, and how to set user defined segmentation rules.