Author Archives: Ashutosh Tripathi

5 Ways Data Science Can Fuel Business Growth

Data science and big data are no longer obscure terms confined to techie vocabulary. In fact, the continuous expansion of the digital world has made data crucial for businesses to grow and succeed, which is why both terms are now commonly used in industries beyond IT.

Read more

How to Perform Sentence Segmentation or Sentence Tokenization using spaCy | NLP Series | Part 5

Sentence Segmentation, or Sentence Tokenization, is the process of identifying individual sentences within a body of text. The spaCy library, designed for Natural Language Processing, performs sentence segmentation with high accuracy. However, let's first talk about how we as humans identify the start and end of a sentence. Mostly with the help of punctuation, right? And in most cases we say a sentence ends with a dot '.' character. With this basic idea, we might say we can split the string on dots and get the individual sentences. Do you think this logic would be enough to get all sentence tokens?
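As a minimal sketch of the idea (assuming spaCy is installed), a blank pipeline plus the rule-based `sentencizer` component splits on sentence-final punctuation; a full trained pipeline such as `en_core_web_sm` does this statistically and handles trickier cases:

```python
import spacy

# blank English pipeline with the rule-based sentencizer component
nlp = spacy.blank("en")
nlp.add_pipe("sentencizer")

text = "Hello World. This is spaCy. It splits sentences on punctuation."
doc = nlp(text)

# doc.sents yields one Span per detected sentence
for sent in doc.sents:
    print(sent.text)
```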

Read more

Named Entity Recognition NER using spaCy | NLP | Part 4

Named Entity Recognition (NER) is the most important step, or I would say the starting step, in Information Retrieval. Information Retrieval is the technique of extracting important and useful information from unstructured raw text documents. NER works by locating the named entities present in unstructured text and classifying them into standard categories such as person names, locations, organizations, time expressions, quantities, monetary values, percentages, and codes. spaCy comes with an extremely fast statistical entity recognition system that assigns labels to contiguous spans of tokens.
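To illustrate the output format, here is a sketch where the entity spans are set by hand; in practice the trained `ner` component of a downloaded pipeline such as `en_core_web_sm` fills `doc.ents` for you:

```python
import spacy
from spacy.tokens import Span

nlp = spacy.blank("en")
doc = nlp("Apple is looking at buying a startup for $1 billion")

# In a real pipeline the statistical ner component fills doc.ents;
# here we set the spans manually just to show the annotation format.
doc.ents = [
    Span(doc, 0, 1, label="ORG"),     # "Apple"
    Span(doc, 8, 11, label="MONEY"),  # "$1 billion"
]

for ent in doc.ents:
    print(ent.text, ent.label_)
```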

Read more

Parts of Speech Tagging and Dependency Parsing using spaCy | NLP | Part 3

Parts of Speech tagging is the next step after tokenization. Once we have done tokenization, spaCy can parse and tag a given Doc. spaCy's tagger is a pre-trained statistical model. This model consists of binary data and is trained on enough examples to make predictions that generalize across the language. For example, a word following "the" in English is most likely a noun.

Read more

A Quick Guide to Tokenization, Lemmatization, Stop Words, and Phrase Matching using spaCy | NLP | Part 2

spaCy is designed specifically for production use. It helps you build applications that process and "understand" large volumes of text. It can be used to build information extraction or natural language understanding systems, or to pre-process text for deep learning. In this article you will learn about Tokenization, Lemmatization, Stop Words, and Phrase Matching operations using spaCy.
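As a quick sketch (assuming spaCy is installed), tokenization, stop-word checks, and phrase matching all work with a blank pipeline; lemmatization additionally needs a trained pipeline, so it is omitted here:

```python
import spacy
from spacy.matcher import PhraseMatcher

nlp = spacy.blank("en")  # tokenizer-only pipeline, no trained components
doc = nlp("Natural language processing with spaCy is fun")

# tokenization: a Doc is a sequence of Token objects
tokens = [t.text for t in doc]
print(tokens)

# stop words: spaCy ships a built-in English stop-word list
print([t.text for t in doc if t.is_stop])

# phrase matching: find occurrences of a multi-word phrase, case-insensitively
matcher = PhraseMatcher(nlp.vocab, attr="LOWER")
matcher.add("NLP", [nlp("natural language processing")])
for match_id, start, end in matcher(doc):
    print(doc[start:end].text)
```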

Read more

spaCy Installation and Basic Operations | NLP Text Processing Library | Part 1

What is spaCy?

spaCy is an open-source Python library that parses and “understands” large volumes of text.
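A minimal first run might look like this (the install commands are shown as comments; a blank pipeline works without downloading any trained model):

```python
# Install in a terminal first:
#   pip install spacy
#   python -m spacy download en_core_web_sm   # optional trained English pipeline
import spacy

nlp = spacy.blank("en")  # blank pipeline: tokenizer only, no model download needed
doc = nlp("spaCy parses large volumes of text.")
print(len(doc), [t.text for t in doc])
```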

(You can download the complete Notebook from here.)

Read more

Data Science and Machine Learning Articles | Yearly round-up 2019

Guys, I have consolidated all my ML and DS articles. In case you missed them, here are the links in one place.

Principal Component Analysis (PCA) in Machine Learning
Read more

What is Boosting in Ensemble Learning

In the last post, we discussed the Bagging technique and learnt how Bagging helps us reduce model variance. In this post, we will learn one more Ensemble learning technique: Boosting. So let me ask you a question. Suppose you have tried all the possible models and none of them performs as expected. What will you do now? I would go with Boosting. Got the point?
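As a minimal sketch with scikit-learn (an assumption; any boosting library would do), AdaBoost fits shallow "weak" trees sequentially, each one focusing on the examples its predecessors got wrong:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# synthetic binary classification data
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# a single depth-1 stump, the default weak learner of AdaBoost
stump = DecisionTreeClassifier(max_depth=1).fit(X_tr, y_tr)
# 100 stumps combined by boosting
boosted = AdaBoostClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)

print("single stump:", stump.score(X_te, y_te))
print("boosted     :", boosted.score(X_te, y_te))
```

The boosted ensemble typically clearly outperforms any one of its weak learners.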

Read more

What is Bagging in Ensemble Learning

In general, in any machine learning problem we try to find the best possible model for the task. That means finding the best model within a given model family, for example the best possible decision tree or the best possible kNN model. And if we have more time, we can try all available model families and come up with the best possible regression model, the best possible kNN model, the best possible SVM model, etc., and among these again select the best one, which will be either kNN, SVM, or some other family.
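A minimal sketch with scikit-learn (an assumption on my part): bagging trains many deep trees on bootstrap samples and averages their votes, which reduces the variance of any single tree:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# synthetic binary classification data
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

tree = DecisionTreeClassifier(random_state=0)          # one high-variance tree
bagged = BaggingClassifier(n_estimators=50, random_state=0)  # 50 bootstrapped trees

tree_score = cross_val_score(tree, X, y).mean()
bag_score = cross_val_score(bagged, X, y).mean()
print("single tree:", tree_score)
print("bagged     :", bag_score)
```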

Read more

How to create an Impressive Resume | Make it Simple

It does not matter how much subject knowledge you have; until you get a chance to showcase it, it will remain hidden within you. If you cannot market yourself well, you will always be lost in the crowd. So why am I talking about all this? Imagine you are searching for a job and you have all the required credentials, yet you are not even getting shortlisted. So what to do? Where is the problem? Are you marketing yourself well? Let's discuss all these points in detail.

Read more

How to start a career in Data Science and Machine Learning

It does not matter how much experience you have; anybody can start in or switch to data science and machine learning. The only important thing is how eager you are for it and what it means to you. If you are keen to work in this field, nobody can stop you. There might be some short-term hurdles, but if you are focused and know where you want to see yourself after a certain number of years, you will definitely overcome them.

Read more

Did you know AI can develop a “Sense of Smell” that can Detect Illnesses From Human Breath?

A lot of research is being done in the medical field, where researchers are working to develop AI models that can even acquire a “sense of smell”.

This will help the medical field detect illness by smelling a patient's breath. Researchers have achieved great success in detecting chemicals called aldehydes, which are associated with human illness and stress. The approach is also helpful in detecting cancer, diabetes, and brain injuries, and can pick up the “woody, musky odor” emitted in Parkinson’s disease even before other symptoms are identified. Artificially intelligent bots could identify gas leaks or other caustic chemicals as well. IBM is even using AI to develop new perfumes.

For the complete article, please refer to the link.

If you are an aspiring data scientist or an experienced professional trying to build a career in Data Science, you should visit E-network, where we focus on high-quality interactive mock interview sessions and help you quick-start your Data Science and Machine Learning journey by preparing a learning road-map, providing study material, suggesting the best training institutes, and supplying practice problems with their solutions, and much more.

Feel free to contact us for more details and discussions.

Bayes’ Theorem with Example for Data Science Professionals

Bayes’ Theorem is an extension of conditional probability. Conditional probability helps us determine the probability of A given B, denoted by P(A|B). Bayes’ theorem says that if we know P(A|B), then we can determine P(B|A), given that P(A) and P(B) are known to us.
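A small worked example with made-up illustrative numbers: a test detects a disease with 95% sensitivity, has a 5% false-positive rate, and the disease affects 1% of the population. Bayes' theorem inverts P(positive|disease) into P(disease|positive):

```python
p_disease = 0.01            # prior: P(disease)
p_pos_given_disease = 0.95  # sensitivity: P(positive | disease)
p_pos_given_healthy = 0.05  # false-positive rate: P(positive | healthy)

# total probability of a positive test, P(positive)
p_pos = (p_pos_given_disease * p_disease
         + p_pos_given_healthy * (1 - p_disease))

# Bayes' theorem: P(disease | positive) = P(positive | disease) * P(disease) / P(positive)
p_disease_given_pos = p_pos_given_disease * p_disease / p_pos
print(round(p_disease_given_pos, 3))  # about 0.161
```

Even with a seemingly accurate test, the low prior means a positive result only implies roughly a 16% chance of disease.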

Read more

Conditional Probability with examples For Data Science

As the name suggests, Conditional Probability is the probability of an event under some given condition, and based on that condition our sample space is reduced to the conditioning event.

For example, find the probability of a person subscribing to insurance given that he has taken a house loan. Here the sample space is restricted to the people who have taken a house loan.
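That restriction of the sample space can be sketched directly in code (the customer records below are hypothetical, purely for illustration):

```python
# Hypothetical records of (has_house_loan, subscribed_to_insurance)
customers = [
    (True, True), (True, False), (True, True), (True, False), (True, True),
    (False, True), (False, False), (False, False), (False, False), (False, True),
]

# Restrict the sample space to customers with a house loan...
loan_holders = [c for c in customers if c[0]]
# ...then P(subscribe | loan) is the fraction of those who subscribed
p_subscribe_given_loan = sum(1 for c in loan_holders if c[1]) / len(loan_holders)
print(p_subscribe_given_loan)  # 3 of the 5 loan holders subscribed -> 0.6
```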

Read more

Probability Basics for Data Science

Probability is in itself a huge topic of study. Applications of probability are found everywhere, whether in medical science, stock market trading, sports, the gaming industry, and many more. In this post, however, my focus is on explaining the topics needed to understand data science and machine learning concepts.

Read more

Variance, Standard Deviation and Other Measures of Variability and Spread

Variance and Standard Deviation are the most commonly used measures of variability and spread. Variability and spread describe how much the data varies around the mean. Variance tells us the average squared distance of all data points from the mean. Standard deviation is just the square root of the variance: since variance is calculated in squared units (explained below in the post), we take its square root to obtain a value in the same units as the data points, and this is called the Standard Deviation.
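The definitions above can be checked in a few lines with Python's standard library (the sample data here is arbitrary):

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]
mean = statistics.fmean(data)  # 5.0

# population variance: average squared distance from the mean
variance = sum((x - mean) ** 2 for x in data) / len(data)
# square root brings the value back to the data's own units
std_dev = variance ** 0.5

print(mean, variance, std_dev)  # 5.0 4.0 2.0
```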

Read more

A Complete Guide to K-Nearest Neighbors Algorithm – KNN using Python

k-Nearest Neighbors, or the kNN algorithm, is a very simple yet powerful Machine Learning algorithm. It can be used both for classification and for regression, that is, predicting a continuous value. The basic idea behind kNN is to find the k nearest data points, known as neighbors, to the new data point for which we need to make a prediction. For regression, the prediction is the mean of the neighbors' y-values; for classification, it is the mode (majority value) of the neighbors' y-values, which becomes the predicted class of the new data point.
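The classification branch of that idea can be sketched in plain Python (the toy data and function name are mine, for illustration; the regression variant would return the mean of the neighbour labels instead of the mode):

```python
import math
from collections import Counter

def knn_predict(train, new_point, k=3):
    """Classify new_point by majority vote among its k nearest neighbours."""
    # sort training points by Euclidean distance to the new point
    neighbours = sorted(train, key=lambda p: math.dist(p[0], new_point))[:k]
    labels = [label for _, label in neighbours]
    return Counter(labels).most_common(1)[0][0]  # mode of neighbour labels

# toy 2-D data: (features, class)
train = [((1, 1), "red"), ((1, 2), "red"), ((2, 1), "red"),
         ((6, 6), "blue"), ((7, 6), "blue"), ((6, 7), "blue")]

print(knn_predict(train, (2, 2)))  # nearest three points are all "red"
print(knn_predict(train, (6, 5)))  # nearest three points are all "blue"
```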

Read more

Step by Step Approach to Principal Component Analysis using Python

Principal Component Analysis, or PCA, is used for dimensionality reduction of large data sets. In my previous post, A Complete Guide to Principal Component Analysis – PCA in Machine Learning, I explained what PCA is and the complete concept behind the technique. This post is a continuation of that one; if you have a basic understanding of how PCA works you may continue, otherwise it is highly recommended to go through the above-mentioned post first.
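The steps can be sketched with NumPy (an assumption; the data here is synthetic, with one feature made deliberately redundant so that two components capture almost all the variance):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
X[:, 2] = X[:, 0] + 0.1 * rng.normal(size=100)  # make feature 2 nearly redundant

# 1. centre the data
Xc = X - X.mean(axis=0)
# 2. covariance matrix of the features
cov = np.cov(Xc, rowvar=False)
# 3. eigen-decomposition: eigenvectors are the principal components
eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]           # sort by explained variance
eigvals, eigvecs = eigvals[order], eigvecs[:, order]
# 4. project onto the top-2 components
X_reduced = Xc @ eigvecs[:, :2]

explained = eigvals / eigvals.sum()
print(X_reduced.shape, explained)
```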

Read more

A Complete Guide to Principal Component Analysis – PCA in Machine Learning

Principal Component Analysis, or PCA, is a widely used technique for dimensionality reduction of large data sets. Reducing the number of components or features costs some accuracy; on the other hand, it makes a large data set simpler and easier to explore and visualize. It also reduces the computational complexity of the model, which makes machine learning algorithms run faster. How much accuracy to sacrifice for a less complex, lower-dimensional data set is always a question and is debatable. We don't have a fixed answer for this; instead, we try to retain most of the variance while choosing the final set of components.
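That "keep most of the variance" rule is directly expressible in scikit-learn (an assumption; shown on the built-in Iris data), where passing a fraction as `n_components` keeps just enough components to retain that share of variance:

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X = load_iris().data  # 150 samples, 4 features

# keep enough components to retain 95% of the variance
pca = PCA(n_components=0.95)
X_reduced = pca.fit_transform(X)

print("kept components :", pca.n_components_)
print("variance ratios :", pca.explained_variance_ratio_)
```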

Read more

What is Logistic Regression?

Logistic regression is one of the most widely used machine learning algorithms for classification problems. In its original form it is used for binary classification, where there are only two classes to predict. However, with a little extension and some ingenuity, logistic regression can easily be applied to multi-class classification. In this post I will explain binary classification, and I will also explain the reasoning behind maximizing the log-likelihood function.
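The log-likelihood maximization can be sketched in plain Python (the toy 1-D data and learning rate are my own illustrative choices): gradient ascent pushes the sigmoid's parameters toward values that make the observed labels most probable.

```python
import math

# toy 1-D data: small x -> class 0, large x -> class 1
xs = [0.5, 1.0, 1.5, 3.5, 4.0, 4.5]
ys = [0, 0, 0, 1, 1, 1]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# maximise the log-likelihood  sum_i [ y_i log p_i + (1 - y_i) log(1 - p_i) ]
# by gradient ascent on the weight w and bias b
w, b, lr = 0.0, 0.0, 0.1
for _ in range(5000):
    grad_w = sum((y - sigmoid(w * x + b)) * x for x, y in zip(xs, ys))
    grad_b = sum(y - sigmoid(w * x + b) for x, y in zip(xs, ys))
    w += lr * grad_w
    b += lr * grad_b

# threshold the fitted probabilities at 0.5
predictions = [round(sigmoid(w * x + b)) for x in xs]
print(predictions)  # recovers the training labels [0, 0, 0, 1, 1, 1]
```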

Read more