This week I continued work on better-looking output for the parsing program. In English, contractions are fairly easy to recognize because of the apostrophe, but in other languages, such as Spanish, a contraction simply becomes a single word. For example, when the masculine form of "the" (el) directly follows the word for "of" or "from" (de), Spanish writes del instead of de el. The parsing program can be trained to recognize this, but its output for these words is different, so I had to modify the script that turns the output into the brat annotation from last week. After running an example sentence through the program and converting the result to the brat annotation, the words de and el are highlighted, which shows that the parser was able to break the contraction apart into its base words and can then determine the lemma of each. This matters whenever the computer needs to do something with the input sentences, such as translating them into another language.
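To make the conversion step more concrete, here is a minimal sketch of how a script might handle a contraction. It is not the actual script from this project. It assumes the parser writes CoNLL-U style rows, where a contraction appears as a range row (such as 2-3 for del) followed by one row per base word. The sample sentence, the function name `conllu_to_brat`, and the choice to store the lemma in a brat note are all made up for illustration.

```python
# Hypothetical sample, simplified to the first four CoNLL-U columns
# (ID, FORM, LEMMA, UPOS) for the sentence "Vengo del mercado".
# The 2-3 row is the surface contraction "del"; rows 2 and 3 are the
# base words it expands to.
SAMPLE = """\
1\tVengo\tvenir\tVERB
2-3\tdel\t_\t_
2\tde\tde\tADP
3\tel\tel\tDET
4\tmercado\tmercado\tNOUN
"""


def conllu_to_brat(conllu):
    """Return (text, annotations) with contractions expanded to base words."""
    words = []
    annotations = []
    offset = 0
    for line in conllu.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and sentence-level comments
        cols = line.split("\t")
        token_id, form, lemma, upos = cols[0], cols[1], cols[2], cols[3]
        if "-" in token_id or "." in token_id:
            # Range rows (contractions) and empty nodes are skipped here;
            # the base words on the following rows are annotated instead.
            continue
        start, end = offset, offset + len(form)
        n = len(words) + 1
        # brat text-bound annotation: ID, then "Type start end", then the text
        annotations.append(f"T{n}\t{upos} {start} {end}\t{form}")
        # Assumption: keep the lemma as an annotator note attached to the word
        annotations.append(f"#{n}\tAnnotatorNotes T{n}\tlemma={lemma}")
        words.append(form)
        offset = end + 1  # words are joined with a single space
    return " ".join(words), annotations


text, annotations = conllu_to_brat(SAMPLE)
print(text)
print("\n".join(annotations))
```

In this sketch the text handed to brat is rebuilt from the base words, so del shows up as de el and each word gets its own highlight. A real script could instead keep the original sentence and point both annotations at the span of del.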