spaCy is an open-source software library for advanced natural language processing in Python. It provides tools for tokenization, part-of-speech tagging, named entity recognition, dependency parsing, text classification, and more. spaCy is known for its speed, accuracy, and ease of use.
Here’s an example of how to use spaCy for basic natural language processing tasks:
import spacy

# load a language model
# (install it first with: python -m spacy download en_core_web_sm)
nlp = spacy.load("en_core_web_sm")

# process a text string
doc = nlp("Apple is looking at buying U.K. startup for $1 billion")

# print the entities found in the text
for ent in doc.ents:
    print(ent.text, ent.label_)

# print the part-of-speech tags for each word in the text
for token in doc:
    print(token.text, token.pos_)
In this code snippet, we load the English language model using the spacy.load function. Then we process a text string using the nlp object, which creates a Doc object containing information about the text. We can then access various properties of the Doc object, such as the named entities and part-of-speech tags.
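To make the Doc object a little more concrete, here is a minimal sketch of how it behaves as a sequence of tokens. It assumes a blank English pipeline created with spacy.blank("en"), which only tokenizes and needs no model download; that choice is for illustration and is not part of the original example. The variable name money is likewise just illustrative.

```python
import spacy

# Assumption: a blank English pipeline (tokenizer only, no trained components).
nlp = spacy.blank("en")
doc = nlp("Apple is looking at buying U.K. startup for $1 billion")

# A Doc behaves like a sequence of Token objects.
print(len(doc))      # number of tokens
print(doc[0].text)   # first token
print(doc[5].text)   # "U.K." is kept as a single token

# Slicing a Doc returns a Span that preserves the original spacing.
money = doc[8:11]
print(money.text)

# Token attributes that do not depend on a trained model.
for token in doc:
    print(token.i, token.text, token.is_alpha, token.is_punct)
```

Because the blank pipeline has no tagger or entity recognizer, token.pos_ and doc.ents stay empty here; those come from a trained pipeline such as en_core_web_sm.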
Here is what the code above might output (the exact labels can vary with the model version):
Apple ORG
U.K. GPE
$1 billion MONEY
Apple PROPN
is AUX
looking VERB
at ADP
buying VERB
U.K. PROPN
startup NOUN
for ADP
$ SYM
1 NUM
billion NUM
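As you can see, spaCy identifies “Apple”, “U.K.”, and “$1 billion” as named entities (an organization, a geopolitical entity, and a monetary amount), and it tags “looking” and “buying” as verbs. The remaining tokens also receive plausible part-of-speech tags, such as PROPN for “Apple” and SYM for “$”.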
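Once entities are available on a Doc, a common next step is to group them by label. The sketch below shows one way to do that. The helper entities_by_label is hypothetical (it is not part of spaCy), and the entities are set by hand on a blank pipeline so the example runs without a model download; with a trained pipeline, doc.ents would be filled in by the model instead.

```python
from collections import defaultdict

import spacy
from spacy.tokens import Span


def entities_by_label(doc):
    """Group entity texts in a Doc by label (hypothetical helper, not a spaCy API)."""
    grouped = defaultdict(list)
    for ent in doc.ents:
        grouped[ent.label_].append(ent.text)
    return dict(grouped)


# Assumption: a blank pipeline with hand-set entities stands in for a trained model.
nlp = spacy.blank("en")
doc = nlp("Apple is looking at buying U.K. startup for $1 billion")
doc.ents = [
    Span(doc, 0, 1, label="ORG"),     # "Apple"
    Span(doc, 5, 6, label="GPE"),     # "U.K."
    Span(doc, 8, 11, label="MONEY"),  # "$1 billion"
]

print(entities_by_label(doc))
```

The same helper can be called on a Doc produced by spacy.load("en_core_web_sm") without changes, since it only reads doc.ents.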