Getting Started with spaCy

Introduction to NLP

Natural language processing (NLP) is a subfield of artificial intelligence (AI) that analyzes text, speech, and other forms of human-generated language data. NLP makes it possible for computers to extract keywords and phrases, understand the intent behind a piece of language, translate it into another language, or generate a response.

NLP can be used to build search engines, chatbots, speech recognition applications, and sentiment analysis applications.

Why Python for NLP?

Python is typically used to work with text data because:

  • Python is easy to learn.
  • It makes it easy to prototype NLP applications quickly.
  • Python is one of the most popular languages. It has huge community support, and installing new libraries with pip is effortless.
  • A significant number of open-source NLP libraries are available in Python. Many machine learning and data processing frameworks, such as PyTorch, TensorFlow, and Apache Spark, also provide Python APIs.
  • String and file operations with Python are effortless and straightforward.

A high-level overview of spaCy

spaCy is an open-source Python library for modern NLP. It supports 60+ languages and provides pre-trained pipelines, some of which include word vectors, for a subset of them. spaCy is designed specifically for production use and helps you build applications that process and “understand” large volumes of text. It can be used to build information extraction or natural language understanding systems, or to pre-process text for deep learning.

Installing spaCy

You can install spaCy with the following command using pip:

$ pip install spacy

Once installed, you can import spaCy in your Python code:

import spacy

Installing language models

spaCy’s trained pipelines can be installed as Python packages. This means that they’re a component of your application, just like any other module. They’re versioned and can be defined as a dependency in your requirements.txt. Trained pipelines can be installed from a download URL or a local directory, manually or via pip. Their data can be located anywhere on your file system. For example, the following command downloads the medium English pipeline:

$ python -m spacy download en_core_web_md

To download an exact package version, pass the full package name together with the --direct flag:

$ python -m spacy download en_core_web_sm-3.0.0 --direct
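
Because a trained pipeline is an ordinary Python package, you can check whether it is installed before trying to load it. The snippet below is a minimal sketch that relies only on the standard library; the helper name is_pipeline_installed is an assumption made for illustration and is not part of spaCy.

```python
from importlib.util import find_spec


def is_pipeline_installed(package_name: str) -> bool:
    """Return True if a top-level package with this name can be imported.

    Hypothetical helper: spaCy pipelines are regular Python packages,
    so the standard import machinery can locate them.
    """
    return find_spec(package_name) is not None


for name in ("en_core_web_sm", "en_core_web_md"):
    print(name, "installed:", is_pipeline_installed(name))
```

If a pipeline is reported as missing, the download commands above install it.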

To load the model and process a sentence, use the following code:

import spacy
nlp = spacy.load('en_core_web_md')
doc = nlp('I have a ginger cat.')
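
The Doc object returned by nlp is a sequence of tokens, each carrying the annotations that the pipeline produced. The following sketch shows one way to inspect them. It assumes spaCy v3; the load_pipeline helper is hypothetical and falls back to a blank English tokenizer when en_core_web_md is not installed, in which case the part-of-speech and dependency columns are empty.

```python
import spacy


def load_pipeline(name="en_core_web_md"):
    """Load a trained pipeline, or fall back to a blank English tokenizer."""
    try:
        return spacy.load(name)
    except OSError:
        # spacy.load raises OSError when the pipeline package cannot be found.
        return spacy.blank("en")


nlp = load_pipeline()
doc = nlp("I have a ginger cat.")

# Tokenization alone does not need a trained model.
tokens = [token.text for token in doc]
print(tokens)  # ['I', 'have', 'a', 'ginger', 'cat', '.']

for token in doc:
    # pos_ and dep_ are empty strings when the blank fallback is in use.
    print(token.text, token.pos_, token.dep_)
```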

Getting started with displaCy

Visualizing a dependency parse or named entities in a text is not only a fun NLP demo; it can also speed up development and help you debug your code and training process.

Visit https://explosion.ai/demos/displacy to use the interactive demo. The visualizer runs POS tagging and a dependency parse on the submitted text to visualize the sentence’s syntactic structure.

displaCy in Jupyter notebooks

In a Jupyter notebook, the following code renders the dependency parse of doc inline:

from spacy import displacy
displacy.render(doc, style='dep')
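
Outside a notebook, displacy.render can return the markup as a string, which you can then write to a file. The sketch below assumes spaCy v3. To stay runnable without a downloaded model, it builds the parse by hand through the Doc constructor; the heads and dependency labels are illustrative values chosen for this sentence, not the output of a trained pipeline, and the output file name is arbitrary.

```python
from pathlib import Path

import spacy
from spacy import displacy
from spacy.tokens import Doc

# A blank English pipeline supplies the vocabulary; no model download is needed.
vocab = spacy.blank("en").vocab

# Hand-written parse (an assumption for illustration): heads are token indices.
parsed = Doc(
    vocab,
    words=["I", "have", "a", "ginger", "cat", "."],
    heads=[1, 1, 4, 4, 1, 1],
    deps=["nsubj", "ROOT", "det", "amod", "dobj", "punct"],
)

# jupyter=False makes render return the SVG markup instead of displaying it.
svg = displacy.render(parsed, style="dep", jupyter=False)

output_path = Path("ginger_cat_dep.svg")
output_path.write_text(svg, encoding="utf-8")
print("Wrote", output_path, "with", len(svg), "characters")
```

With a trained pipeline loaded, you would pass the doc returned by nlp instead of the hand-built one.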

Entity visualizer

displaCy’s entity visualizer highlights the named entities in your text. Visit https://explosion.ai/demos/displacy-ent for the online demo.

Use the following code to highlight the entities in your text in a Jupyter notebook:

doc2 = nlp('Bill Gates is the CEO of Microsoft')
displacy.render(doc2, style='ent')
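
The highlighted spans are also available programmatically through doc.ents. The following sketch assumes spaCy v3 and sets the entities by hand on a blank pipeline so that it runs without a model download; with a trained pipeline such as en_core_web_md, doc.ents is filled in by the named entity recognizer and its predictions may differ.

```python
import spacy
from spacy import displacy
from spacy.tokens import Span

blank_nlp = spacy.blank("en")
doc2 = blank_nlp("Bill Gates is the CEO of Microsoft")

# Hand-annotated entities (an assumption for illustration): token offsets,
# end index exclusive, with the labels a trained model would typically use.
doc2.ents = [
    Span(doc2, 0, 2, label="PERSON"),
    Span(doc2, 6, 7, label="ORG"),
]

entities = [(ent.text, ent.label_) for ent in doc2.ents]
print(entities)  # [('Bill Gates', 'PERSON'), ('Microsoft', 'ORG')]

# jupyter=False returns the HTML markup as a string instead of displaying it.
html = displacy.render(doc2, style="ent", jupyter=False)
```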

GitHub

You can find the Jupyter notebook for this article at the following link: https://github.com/jabirjamal/jabirjamal.com/tree/main/NLP/NLP-01
