Parts of Speech tagging is the next step of the Tokenization. Once we have done Tokenization, spaCy can parse and tag a given Doc. spaCy is pre-trained using statistical modelling. This model consists of binary data and is trained on enough examples to make predictions that generalize across the language. Example, a word following “the” in English is most likely a noun.
It is always challenging to find the correct parts of speech due to the following reasons:
- Enabling machine to understand and process raw text is not easy.
- Same word plays differently in different context of a sentence.
- Sometime words which are completely different, tells almost the same meaning.
- Even splitting text into useful word-like units can be difficult in many languages.
- While it’s possible to solve some problems starting from only the raw characters, it’s usually better to use linguistic knowledge to add useful information.
- That’s exactly what spaCy is designed to do: you put in raw text, and get back a Doc object, that comes with a variety of annotations.
Reference spacy.io/
This is the Part 3 of NLP spaCy Series of articles. you can find the first two parts in the below links:
Part 1: spacy-installation-and-basic-operations-nlp-text-processing-library
Part 2: guide-to-tokenization-lemmatization-stop-words-and-phrase-matching-using-spacy
In this section we’ll cover coarse POS tags (noun, verb, adjective), fine-grained tags (plural noun, past-tense verb, superlative adjective and Dependency Parsing and Visualization of dependency Tree.
Coarse-grained Part-of-speech Tags
Every token is assigned a POS Tag from the following list:
POS | DESCRIPTION | EXAMPLES |
---|---|---|
ADJ | adjective | big, old, green, incomprehensible, first |
ADP | adposition | in, to, during |
ADV | adverb | very, tomorrow, down, where, there |
AUX | auxiliary | is, has (done), will (do), should (do) |
CONJ | conjunction | and, or, but |
CCONJ | coordinating conjunction | and, or, but |
DET | determiner | a, an, the |
INTJ | interjection | psst, ouch, bravo, hello |
NOUN | noun | girl, cat, tree, air, beauty |
NUM | numeral | 1, 2017, one, seventy-seven, IV, MMXIV |
PART | particle | ‘s, not, |
PRON | pronoun | I, you, he, she, myself, themselves, somebody |
PROPN | proper noun | Mary, John, London, NATO, HBO |
PUNCT | punctuation | ., (, ), ? |
SCONJ | subordinating conjunction | if, while, that |
SYM | symbol | $, %, §, ©, +, −, ×, ÷, =, :), 😝 |
VERB | verb | run, runs, running, eat, ate, eating |
X | other | sfpksdpsxmsa |
SPACE | space |
Fine-grained Part-of-speech Tags
Tokens are subsequently given a fine-grained tag as determined by morphology:
POS | Description | Fine-grained Tag | Description | Morphology |
---|---|---|---|---|
ADJ | adjective | AFX | affix | Hyph=yes |
ADJ | JJ | adjective | Degree=pos | |
ADJ | JJR | adjective, comparative | Degree=comp | |
ADJ | JJS | adjective, superlative | Degree=sup | |
ADJ | PDT | predeterminer | AdjType=pdt PronType=prn | |
ADJ | PRP$ | pronoun, possessive | PronType=prs Poss=yes | |
ADJ | WDT | wh-determiner | PronType=int rel | |
ADJ | WP$ | wh-pronoun, possessive | Poss=yes PronType=int rel | |
ADP | adposition | IN | conjunction, subordinating or preposition | |
ADV | adverb | EX | existential there | AdvType=ex |
ADV | RB | adverb | Degree=pos | |
ADV | RBR | adverb, comparative | Degree=comp | |
ADV | RBS | adverb, superlative | Degree=sup | |
ADV | WRB | wh-adverb | PronType=int rel | |
CONJ | conjunction | CC | conjunction, coordinating | ConjType=coor |
DET | determiner | DT | determiner | |
INTJ | interjection | UH | interjection | |
NOUN | noun | NN | noun, singular or mass | Number=sing |
NOUN | NNS | noun, plural | Number=plur | |
NOUN | WP | wh-pronoun, personal | PronType=int rel | |
NUM | numeral | CD | cardinal number | NumType=card |
PART | particle | POS | possessive ending | Poss=yes |
PART | RP | adverb, particle | ||
PART | TO | infinitival to | PartType=inf VerbForm=inf | |
PRON | pronoun | PRP | pronoun, personal | PronType=prs |
PROPN | proper noun | NNP | noun, proper singular | NounType=prop Number=sign |
PROPN | NNPS | noun, proper plural | NounType=prop Number=plur | |
PUNCT | punctuation | -LRB- | left round bracket | PunctType=brck PunctSide=ini |
PUNCT | -RRB- | right round bracket | PunctType=brck PunctSide=fin | |
PUNCT | , | punctuation mark, comma | PunctType=comm | |
PUNCT | : | punctuation mark, colon or ellipsis | ||
PUNCT | . | punctuation mark, sentence closer | PunctType=peri | |
PUNCT | ” | closing quotation mark | PunctType=quot PunctSide=fin | |
PUNCT | “” | closing quotation mark | PunctType=quot PunctSide=fin | |
PUNCT | opening quotation mark | PunctType=quot PunctSide=ini | ||
PUNCT | HYPH | punctuation mark, hyphen | PunctType=dash | |
PUNCT | LS | list item marker | NumType=ord | |
PUNCT | NFP | superfluous punctuation | ||
SYM | symbol | # | symbol, number sign | SymType=numbersign |
SYM | $ | symbol, currency | SymType=currency | |
SYM | SYM | symbol | ||
VERB | verb | BES | auxiliary “be” | |
VERB | HVS | forms of “have” | ||
VERB | MD | verb, modal auxiliary | VerbType=mod | |
VERB | VB | verb, base form | VerbForm=inf | |
VERB | VBD | verb, past tense | VerbForm=fin Tense=past | |
VERB | VBG | verb, gerund or present participle | VerbForm=part Tense=pres Aspect=prog | |
VERB | VBN | verb, past participle | VerbForm=part Tense=past Aspect=perf | |
VERB | VBP | verb, non-3rd person singular present | VerbForm=fin Tense=pres | |
VERB | VBZ | verb, 3rd person singular present | VerbForm=fin Tense=pres Number=sing Person=3 | |
X | other | ADD | ||
X | FW | foreign word | Foreign=yes | |
X | GW | additional word in multi-word expression | ||
X | XX | unknown | ||
SPACE | space | _SP | space | |
NIL | missing tag |

View token tags
Recall Tokenization We can obtain a particular token by its index position.
- To view the coarse POS tag use token.pos_
- To view the fine-grained tag use token.tag_
- To view the description of either type of tag use spacy.explain(tag)
spaCy encodes all strings to hash values to reduce memory usage and improve efficiency. So to get the readable string representation of an attribute, we need to add an underscore _ to its name: Note that token.pos and token.tag return integer hash values; by adding the underscores we get the text equivalent that lives in doc.vocab.

Note: In the above example to format the representation I have added: {10} this is nothing but to give spacing between each token. Just to have better look and feel. No other specific reason. This count start from the first character of the token. You can add any number instead of {10} to have spacing as you wish.
Working with POS Tags
- In the English language, it is very common that the same string of characters can have different meanings, even within the same sentence.
- For this reason, morphology is important.
- spaCy uses machine learning algorithms to best predict the use of a token in a sentence.
- Is “I read books on NLP” present or past tense?
- Is wind a verb or a noun?
Let’s understand all this with the help of below examples.

In the first example, spaCy assumed that read was Present Tense.
In the second example the present tense form would be I am reading a book, so spaCy assigned the past tense.
Counting POS Tags
The Doc.count_by()
method accepts a specific token attribute as its argument, and returns a frequency count of the given attribute as a dictionary object. Keys in the dictionary are the integer values of the given attribute ID, and values are the frequency. Counts of zero are not included.

It means tag which has key as 96 is appeared only once and ta with key as 83 has appeared three times in the sentence. This isn’t very helpful until you decode the attribute ID:

Create a frequency list of POS tags from the entire document
Since POS_counts
returns a dictionary, we can obtain a list of keys with POS_counts.items()
.
By sorting the list we have access to the tag and its count, in order.

k contains the key number of the tag and v contains the frequency number.
Counting fine-grained Tag

Why did the ID numbers get so big? In spaCy, certain text values are hardcoded into Doc.vocab and take up the first several hundred ID numbers. Strings like ‘NOUN’ and ‘VERB’ are used frequently by internal operations. Others, like fine-grained tags, are assigned hash values as needed.
Why don’t SPACE tags appear? In spaCy, only strings of spaces (two or more) are assigned tokens. Single spaces are not.
Fine-grained POS Tag Examples¶
These are some grammatical examples (shown in bold) of specific fine-grained tags. We’ve removed punctuation and rarely used tags:
POS | TAG | DESCRIPTION | EXAMPLE |
---|---|---|---|
ADJ | AFX | affix | The Flintstones were a pre-historic family. |
ADJ | JJ | adjective | This is a good sentence. |
ADJ | JJR | adjective, comparative | This is a better sentence. |
ADJ | JJS | adjective, superlative | This is the best sentence. |
ADJ | PDT | predeterminer | Waking up is half the battle. |
ADJ | PRP$ | pronoun, possessive | His arm hurts. |
ADJ | WDT | wh-determiner | It’s blue, which is odd. |
ADJ | WP$ | wh-pronoun, possessive | We don’t know whose it is. |
ADP | IN | conjunction, subordinating or preposition | It arrived in a box. |
ADV | EX | existential there | There is cake. |
ADV | RB | adverb | He ran quickly. |
ADV | RBR | adverb, comparative | He ran quicker. |
ADV | RBS | adverb, superlative | He ran fastest. |
ADV | WRB | wh-adverb | When was that? |
CONJ | CC | conjunction, coordinating | The balloon popped and everyone jumped. |
DET | DT | determiner | This is a sentence. |
INTJ | UH | interjection | Um, I don’t know. |
NOUN | NN | noun, singular or mass | This is a sentence. |
NOUN | NNS | noun, plural | These are words. |
NOUN | WP | wh-pronoun, personal | Who was that? |
NUM | CD | cardinal number | I want three things. |
PART | POS | possessive ending | Fred‘s name is short. |
PART | RP | adverb, particle | Put it back! |
PART | TO | infinitival to | I want to go. |
PRON | PRP | pronoun, personal | I want you to go. |
PROPN | NNP | noun, proper singular | Kilroy was here. |
PROPN | NNPS | noun, proper plural | The Flintstones were a pre-historic family. |
VERB | MD | verb, modal auxiliary | This could work. |
VERB | VB | verb, base form | I want to go. |
VERB | VBD | verb, past tense | This was a sentence. |
VERB | VBG | verb, gerund or present participle | I am going. |
VERB | VBN | verb, past participle | The treasure was lost. |
VERB | VBP | verb, non-3rd person singular present | I want to go. |
VERB | VBZ | verb, 3rd person singular present | He wants to go. |
Dependency Parsing
- Dependency parsing is the process of extracting the dependencies of a sentence to represent its grammatical structure.
- It defines the dependency relationship between headwords and their dependents.
- The head of a sentence has no dependency and is called the root of the sentence.
- The verb is usually the head of the sentence. All other words are linked to the headword.
The dependencies can be mapped in a directed graph representation:
- Words are the nodes.
- The grammatical relationships are the edges.
- Dependency parsing helps you know what role a word plays in the text and how different words relate to each other.
- It’s also used in shallow parsing and named entity recognition.

Here we’ve shown spacy.attrs.POS
, spacy.attrs.TAG
and spacy.attrs.DEP
.
Visualizing Parts of Speech
spaCy offers an outstanding visualizer called displaCy:


The dependency parse shows the coarse POS tag for each token, as well as the dependency tag if given:

Handling Large Text
displacy.serve()
accepts a single Doc or list of Doc objects. Since large texts are difficult to view in one line, you may want to pass a list of spans instead. Each span will appear on its own line:

Customizing the Appearance
Besides setting the distance between tokens, you can pass other arguments to the options parameter:
NAME | TYPE | DESCRIPTION | DEFAULT |
compact | bool | “Compact mode” with square arrows that takes up less space. | False |
color | unicode | Text color (HEX, RGB or color names). | #000000 |
bg | unicode | Background color (HEX, RGB or color names). | #ffffff |
font | unicode | Font name or font family for all text. | Arial |
For a full list of options visit https://spacy.io/api/top-level#displacy_options

This is all about text Parts of Speech Tagging using spaCy. Hope you enjoyed the post.
Thank You!
Post Credit: Jose Portila Udemy Video
4 comments