Morphological Analysis

English Analyzer

The English Analyzer takes a token and its Penn Treebank part-of-speech tag, and splits the token into morphemes using inflection, derivation, and prefix rules.

  • Associated models: elit_morph_lexrule_en
  • API reference: EnglishMorphAnalyzer
  • Supplementary documentation
  • Decode parameters:
    • derivation: True (apply derivation rules; default) or False
    • prefix: 0 (no prefix analysis; default), 1 (shortest prefix preferred), 2 (longest prefix preferred)

Web API

{"model": "elit_morph_lexrule_en", "args": {"derivation": true, "prefix": 0}}
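
As a minimal sketch, the request body above can be assembled and validated in Python before it is sent. The helper name build_morph_request is hypothetical and not part of ELIT; the allowed values are taken from the decode parameters listed earlier, and no endpoint URL is assumed here.

```python
import json


def build_morph_request(derivation=True, prefix=0):
    """Build the JSON request body for the English analyzer.

    Hypothetical helper for illustration; it is not part of ELIT.
    """
    if not isinstance(derivation, bool):
        raise TypeError("derivation must be True or False")
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(prefix, bool) or prefix not in (0, 1, 2):
        raise ValueError("prefix must be 0, 1, or 2")
    payload = {
        "model": "elit_morph_lexrule_en",
        "args": {"derivation": derivation, "prefix": prefix},
    }
    # json.dumps renders Python True/False as JSON true/false.
    return json.dumps(payload)


print(build_morph_request())
# {"model": "elit_morph_lexrule_en", "args": {"derivation": true, "prefix": 0}}
```

Validating the arguments locally keeps an out-of-range prefix value from reaching the service.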

Python API

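from elit.structure import Document, Sentence, TOK, POS, MORPH
from elit.component import EnglishMorphAnalyzer

# Each token needs a Penn Treebank part-of-speech tag at the same index.
tokens = ['dramatized', 'ownerships', 'environmentalists', 'certifiable', 'realistically']
postags = ['VBD', 'NNS', 'NNS', 'JJ', 'RB']

# Wrap the tokens and tags in a one-sentence document.
doc = Document()
doc.add_sentence(Sentence({TOK: tokens, POS: postags}))

# Analyze with derivation rules enabled and prefix analysis disabled (the defaults).
morph = EnglishMorphAnalyzer()
morph.decode([doc], derivation=True, prefix=0)

# The analysis is stored on each sentence under the MORPH key.
print(doc.sentences[0][MORPH])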

Output

[
  [["drama", "NN"], ["+tic", "J_IC"], ["+ize", "V_IZE"], ["+d", "I_PST"]], 
  [["own", "VB"], ["+er", "N_ER"], ["+ship", "N_SHIP"], ["+s", "I_PLR"]], 
  [["environ", "VB"], ["+ment", "N_MENT"], ["+al", "J_AL"], ["+ist", "N_IST"], ["+s", "I_PLR"]], 
  [["cert", "NN"], ["+ify", "V_FY"], ["+iable", "J_ABLE"]], 
  [["real", "NN"], ["+ize", "V_IZE"], ["+stic", "J_IC"], ["+ally", "R_LY"]]
]
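
Each token's analysis in the output above is a list of [form, tag] pairs. The sketch below shows one way to post-process it in plain Python, without ELIT installed. It assumes prefix=0, so that the first pair is the base form, and it assumes that tags beginning with I_ mark inflectional suffixes, as I_PST and I_PLR suggest. split_analysis is a hypothetical helper, not an ELIT function.

```python
def split_analysis(morphemes):
    """Split one token's analysis into (base, derivational, inflectional).

    Assumes the first pair is the base form (prefix analysis disabled) and
    that tags starting with "I_" denote inflectional suffixes.
    """
    base = morphemes[0][0]
    suffixes = morphemes[1:]
    derivational = [form for form, tag in suffixes if not tag.startswith("I_")]
    inflectional = [form for form, tag in suffixes if tag.startswith("I_")]
    return base, derivational, inflectional


# The first two analyses, copied from the output above.
analysis = [
    [["drama", "NN"], ["+tic", "J_IC"], ["+ize", "V_IZE"], ["+d", "I_PST"]],
    [["own", "VB"], ["+er", "N_ER"], ["+ship", "N_SHIP"], ["+s", "I_PLR"]],
]

for token in analysis:
    print(split_analysis(token))
# ('drama', ['+tic', '+ize'], ['+d'])
# ('own', ['+er', '+ship'], ['+s'])
```

If prefix analysis is enabled, the first pair may no longer be the base form, so the indexing would need to be adapted.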