Decode with Python APIs

Once ELIT is installed, NLP components can be used to decode raw text into NLP structures.

Create NLP Tools

The following shows how to create NLP tools for the six core components:

from elit.component import EnglishTokenizer
from elit.component import EnglishMorphAnalyzer
from elit.component import POSFlairTagger
from elit.component import NERFlairTagger
from elit.component import DEPBiaffineParser
from elit.component import SDPBiaffineParser

tok = EnglishTokenizer()
morph = EnglishMorphAnalyzer()
pos = POSFlairTagger()
ner = NERFlairTagger()
dep = DEPBiaffineParser()
sdp = SDPBiaffineParser()

tools = [tok, pos, morph, ner, dep, sdp]

Take a look at the individual tool pages for more details about the available components.

Import Models

All pre-trained models are publicly available in ELIT's S3 bucket. The following shows how to import models for pos, ner, dep, and sdp:

from elit.resources.pre_trained_models import ELIT_POS_FLAIR_EN_MIXED
from elit.resources.pre_trained_models import ELIT_NER_FLAIR_EN_ONTONOTES
from elit.resources.pre_trained_models import ELIT_DEP_BIAFFINE_EN_MIXED
from elit.resources.pre_trained_models import ELIT_SDP_BIAFFINE_EN_MIXED

pos.load(ELIT_POS_FLAIR_EN_MIXED)
ner.load(ELIT_NER_FLAIR_EN_ONTONOTES)
dep.load(ELIT_DEP_BIAFFINE_EN_MIXED)
sdp.load(ELIT_SDP_BIAFFINE_EN_MIXED)

The load function takes two parameters, model_path and model_root:

  • model_path indicates either the name of the model (e.g., elit_pos_flair_en_mixed_20190626) or a public URL to the model file compressed in the zip format (e.g., https://elit-models.s3-us-west-2.amazonaws.com/elit_pos_flair_en_mixed_20190626.zip).
  • model_root indicates the root directory on the local machine where all models are saved; it defaults to ~/.elit/models/.
  • If model_path points to a URL, this function downloads the remote file and unzips it under the directory indicated by model_root, creating a directory with the same model name (e.g., ~/.elit/models/elit_pos_flair_en_mixed_20190626/).
  • Each model directory has a configuration file, config.json, that may indicate dependencies on other models, in which case the function recursively downloads all necessary models and unzips them under model_root (see Train with CLI for more details about how models are saved).
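The name-to-directory mapping described above can be sketched in plain Python. This is an illustrative sketch of the resolution rule, not ELIT's actual implementation; the helper name resolve_model_dir is hypothetical:

```python
import os

def resolve_model_dir(model_path, model_root='~/.elit/models/'):
    """Hypothetical sketch: map a model name or zip URL to its local directory.

    A URL ending in .zip maps to a directory named after the zip file;
    a plain model name maps directly to a directory under model_root.
    """
    root = os.path.expanduser(model_root)
    name = model_path.rsplit('/', 1)[-1]   # last path segment of a URL, or the name itself
    if name.endswith('.zip'):
        name = name[:-4]                   # drop the .zip extension
    return os.path.join(root, name)
```

For example, both the model name and the S3 URL from the bullets above resolve to the same directory under ~/.elit/models/.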

Prepare Raw Text

The following shows how to prepare raw text for decoding:

docs = [
    'Emory University is a private research university in Atlanta, Georgia. The university is ranked 21st nationally according to U.S. News.',
    'Emory University was founded in 1836 by the Methodist Episcopal Church. It was named in honor of John Emory who was a Methodist bishop.']

ELIT accepts a list of strings as input, where each string represents a document; thus, docs above contains two documents.
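In practice, the input often comes from a file rather than a hard-coded list. A minimal sketch (not part of ELIT), assuming one document per line of a plain-text file:

```python
def read_documents(path):
    """Read one document per non-empty line, yielding the list of strings
    that ELIT expects as input (illustrative helper, not an ELIT API)."""
    with open(path, encoding='utf-8') as f:
        return [line.strip() for line in f if line.strip()]
```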

Decode with NLP Tools

Finally, the following shows how to decode the raw text with the NLP tools:

for tool in tools:
    docs = tool.decode(docs)

The decode function of the tokenizer takes a list of strings and returns a list of Document objects, whereas the decode functions of the other components take a list of Document objects and return the same list with the decoding results added as distinct fields (see the NLP Output below).
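The contract described above can be illustrated with plain-Python mocks. These mock classes are purely illustrative and are not ELIT's actual classes or output format; they only show the pattern of the tokenizer creating documents and later tools adding fields:

```python
class MockTokenizer:
    """Illustrative stand-in: turns each raw string into a document dict."""
    def decode(self, docs):
        return [{'tok': text.split()} for text in docs]

class MockTagger:
    """Illustrative stand-in: adds one field to each existing document."""
    def __init__(self, field, tag):
        self.field, self.tag = field, tag
    def decode(self, docs):
        for doc in docs:
            doc[self.field] = [self.tag] * len(doc['tok'])
        return docs

docs = ['Emory University is in Atlanta .']
for tool in [MockTokenizer(), MockTagger('pos', 'X')]:
    docs = tool.decode(docs)
# docs[0] now carries both the 'tok' field and the added 'pos' field
```

This is why the tokenizer must run first in the tools list: every later component reads the documents it creates.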

All Together

The following puts all the code together:

from elit.component import EnglishTokenizer
from elit.component import EnglishMorphAnalyzer
from elit.component import POSFlairTagger
from elit.component import NERFlairTagger
from elit.component import DEPBiaffineParser
from elit.component import SDPBiaffineParser

from elit.resources.pre_trained_models import ELIT_POS_FLAIR_EN_MIXED
from elit.resources.pre_trained_models import ELIT_NER_FLAIR_EN_ONTONOTES
from elit.resources.pre_trained_models import ELIT_DEP_BIAFFINE_EN_MIXED
from elit.resources.pre_trained_models import ELIT_SDP_BIAFFINE_EN_MIXED

tok = EnglishTokenizer()
morph = EnglishMorphAnalyzer()
pos = POSFlairTagger().load(ELIT_POS_FLAIR_EN_MIXED)
ner = NERFlairTagger().load(ELIT_NER_FLAIR_EN_ONTONOTES)
dep = DEPBiaffineParser().load(ELIT_DEP_BIAFFINE_EN_MIXED)
sdp = SDPBiaffineParser().load(ELIT_SDP_BIAFFINE_EN_MIXED)

tools = [tok, pos, morph, ner, dep, sdp]

docs = [
    'Emory University is a private research university in Atlanta, Georgia. The university is ranked 21st nationally according to U.S. News.',
    'Emory University was founded in 1836 by the Methodist Episcopal Church. It was named in honor of John Emory who was a Methodist bishop.']

for tool in tools:
    docs = tool.decode(docs)

print(docs)

NLP Output

The following shows the printed output of the above code:

To be filled

See the Formats page for more details about how the decoding results are added to Document.