Sentence Segmentation or Sentence Tokenization is the process of identifying different sentences among group of words. Spacy library designed for Natural Language Processing, perform the sentence segmentation with much higher accuracy. Spacy provides different models for different languages. In this post we’ll learn how sentence segmentation works, and how to set user defined segmentation rules.
Named Entity Recognition NER works by locating and identifying the named entities present in unstructured text into the standard categories such as person names, locations, organizations, time expressions, quantities, monetary values, percentage, codes etc. Spacy comes with an extremely fast statistical entity recognition system that assigns labels to contiguous spans of tokens.
Parts of Speech tagging is the next step of the tokenization. Once we have done tokenization, spaCy can parse and tag a given Doc. spaCy is pre-trained using statistical modelling. This model consists of binary data and is trained on enough examples to make predictions that generalize across the language. Example, a word following “the” in English is most likely a noun.
spaCy is designed specifically for production use. It helps you build applications that process and “understand” large volumes of text. It can be used to build information extraction or natural language understanding systems, or to pre-process text for deep learning. In this article you will learn about Tokenization, Lemmatization, Stop Words and Phrase Matching operations using spaCy.
spaCy is an open-source Python library that parses and “understands” large volumes of text.
spaCy is the best way to prepare text for deep learning.
It interoperates seamlessly with TensorFlow, PyTorch, scikit-learn, Gensim and the rest of Python’s awesome AI ecosystem.
With spaCy, you can easily construct linguistically sophisticated statistical models for a variety of NLP problems.
Data Science and Machine Learning Articles | Yearly round-up 2019
Boosting helps to improve the accuracy of any given machine learning algorithm. It is algorithm independent so we can apply it with any learning algorithms. It is not used to reduce the model variance.
Boosting involves many sequential
iterations to strengthen the model accuracy, hence it becomes computationally costly.
Ensemble Learning says, if we can build multiple models then why to select the best one why not top 2, again why not top 3 and why not top 10. Then if you find top 10 deploy all 10 models. And when new data comes, make a prediction from all 10 models and combine the predictions and finally make a joint prediction. This is the key idea of ensemble learning.
It does not matter how much subject knowledge you have, until you do not get a chance to show case it. It will be hidden somewhere within you. If you cannot market yourself well then you will always be lost in the crowd. So why am I talking about all these things. Imagine you are searching for a job and you have all the credentials required, however you are not getting shortlisted even. So what to do? Where is the problem? Are you marketing yourself well? Let’s discuss all these points in details.
It does not matter how much experience you have, actually anybody can start or switch to data science and machine learning. The only important this is, how much eager you are for it. What it means to you. If you are very much keen to work in this field then nobody can stop you. There might be some short term hurdles however if you are focused enough and know your goals regarding where you want to see yourself after certain years, then you will definitely be successful in overcoming those hurdles.
Lot of research is being done in medical field, where researchers are working to develop AI models which can even develop the “Sense of smell”.It will help medical field to detect illness by smelling the human’s breath.They have achieved great success in detecting chemicals called aldehydes. Aldehydes are associated with human illnesses and stress.
Bayes Theorem is the extension of Conditional probability. Conditional probability helps us to determine the probability of A given B, denoted by P(A|B). So Bayes’ theorem says if we know P(A|B) then we can determine P(B|A), given that P(A) and P(B) are known to us.
Conditional Probability helps Data Scientists to get better results from the given data set and for Machine Learning Engineers, it helps in building more accurate models for predictions.
Probability is used to predict the likelihood of the future event.
Statistics is used to analyse the past events
Probability tells us what will happen in a given ideal world?
While Statistics tells about how ideal is the world?
Probability is the basics of Inferential Statistics.
Variance and Standard Deviation are the most commonly used measures of variability and spread. Variability and spread are nothing but the process to know how much data is being varying from the mean point.
k-Nearest Neighbors or kNN algorithm is very easy and powerful Machine Learning algorithm. It can be used for both classification as well as regression that is predicting a continuous value. The very basic idea behind kNN is that it starts with finding out the k-nearest data points known as neighbors of the new data point for which we need to make the prediction. And then if it is regression then take the conditional mean of the neighbors y-value and that is the predicted value for new data point. If it is classification then it takes the mode (majority value) of the neighbors y value and that becomes the predicted class of the new data point.
Principal Component Analysis or PCA is used for dimensionality reduction of the large data set. Using PCA we can speed-up the ML algorithms by reducing the feature spaces.
Principal Component Analysis or PCA is a widely used technique for dimensionality reduction of the large data set. Reducing the number of components or features costs some accuracy and on the other hand, it makes the large data set simpler, easy to explore and visualize. Also, it reduces the computational complexity of the model which makes machine learning algorithms run faster. It is always a question and debatable how much accuracy it is sacrificing to get less complex and reduced dimensions data set. we don’t have a fixed answer for this however we try to keep most of the variance while choosing the final set of components.
Logistic regression is used for binary classification problem which has only two classes to predict. However with little extension and some human brain, it can easily be used for multi class classification problem. In this post I will be explaining about binary classification. I will also explain about the reason behind maximizing log likelihood function.
Multicollinearity occurs in a multi linear model where we have more than one predictor variables. So Multicollinearity exist when we can linearly predict one predictor variable (note not the target variable) from other predictor variables with significant degree of accuracy. It means two or more predictor variables are highly correlated. But not the vice versa means if there is low correlation among predictors then also multicollinearity may exist.