spaCy is an open-source software library for advanced natural language processing in Python. It provides tools for tokenization, part-of-speech tagging, named entity recognition, dependency parsing, text classification, and more. spaCy is known for its speed, accuracy, and ease of use.
Here’s an example of how to use spaCy for basic natural language processing tasks:
import spacy

# load a language model (install it first with: python -m spacy download en_core_web_sm)
nlp = spacy.load("en_core_web_sm")
# process a text string
doc = nlp("Apple is looking at buying U.K. startup for $1 billion")
# print the entities found in the text
for ent in doc.ents:
    print(ent.text, ent.label_)
# print the part-of-speech tags for each word in the text
for token in doc:
    print(token.text, token.pos_)
In this code snippet, we load the English language model using the spacy.load function. Then we process a text string using the nlp object, which creates a Doc object containing information about the text. We can then access various properties of the Doc object, such as the named entities and part-of-speech tags.
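To make the Doc object more concrete, here is a minimal sketch that inspects one without any trained components. It assumes spaCy v3 and uses spacy.blank("en"), which builds a pipeline containing only the rule-based tokenizer, so no model download is needed. The variable names are illustrative only.

```python
import spacy

# A blank English pipeline has a tokenizer but no tagger, parser, or NER.
nlp = spacy.blank("en")
doc = nlp("Apple is looking at buying U.K. startup for $1 billion")

# A Doc behaves like a sequence of Token objects.
tokens = [token.text for token in doc]
print(tokens)

# Slicing a Doc returns a Span, a view over a run of tokens.
first_two = doc[0:2]
print(first_two.text)  # Apple is

# Without trained components, no entities are predicted.
print(len(doc.ents))  # 0
```

Because the blank pipeline makes no predictions, doc.ents is empty and token.pos_ is an empty string here. Those properties are only filled in by a trained pipeline such as en_core_web_sm.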
Here are the entity lines the code above might output (the exact labels depend on the model version):
Apple ORG
U.K. GPE
$1 billion MONEY
For this sentence, spaCy identifies “Apple” and “U.K.” as named entities, alongside the “$1 billion” amount, and tags “looking” and “buying” as verbs. Every other token in the text receives a part-of-speech tag as well.
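One way to check results like these programmatically is to collect the entities and verbs from a Doc into plain Python data. The sketch below assumes spaCy v3; the summarize helper is a hypothetical name, not part of spaCy. So that it runs without a downloaded model, the Doc is built by hand with manually assigned annotations rather than model predictions.

```python
from spacy.tokens import Doc
from spacy.vocab import Vocab


def summarize(doc):
    """Collect (text, label) entity pairs and verb tokens from a Doc."""
    return {
        "entities": [(ent.text, ent.label_) for ent in doc.ents],
        "verbs": [token.text for token in doc if token.pos_ == "VERB"],
    }


# Hand-written annotations for illustration only (not model output).
words = ["Apple", "is", "looking", "at", "buying", "U.K.", "startup"]
pos = ["PROPN", "AUX", "VERB", "ADP", "VERB", "PROPN", "NOUN"]
ents = ["B-ORG", "O", "O", "O", "O", "B-GPE", "O"]
doc = Doc(Vocab(), words=words, pos=pos, ents=ents)

summary = summarize(doc)
print(summary["entities"])
print(summary["verbs"])
```

In practice you would pass the result of nlp(text) from a trained pipeline to summarize instead of a hand-built Doc.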