This week I started looking at language groups to prepare for the surprise languages we will be given later in the project. Specifically, I looked at the relations between words within each language group. These relations follow the Universal Dependencies (UD) format. Some of them are familiar, like direct object and indirect object, while others are ones you probably didn't learn in school, like discourse elements or xcomp (open clausal complement).

I looked at Romance, Germanic, and Slavic languages. In the Romance languages, for instance, the case relation was especially frequent. Case is generally used for prepositions, possessive markers, and similar function words that mark the role of a nominal element. One possible reason for its frequency is that the Romance languages have words that show possession and also act as prepositions, like "de" in Spanish.

The goal of all this is to better handle languages we don't get training data for. If we can group such a language with others that share its characteristics, we should be able to parse it much more accurately. Next week I will continue this work and look at which parts of speech are linked by these relations.
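To make "the case relation was especially frequent" and "grouping languages with similar characteristics" a bit more concrete, here is a minimal sketch of one way to do it. It is not necessarily how the numbers above were computed. It assumes treebanks in the CoNLL-U format that UD uses, where the relation label (DEPREL) is the 8th tab-separated column. The function names, the two toy sentences, and the use of cosine similarity to compare languages are all my own illustrative choices, not part of the project.

```python
from collections import Counter
import math

def relation_profile(conllu_text):
    """Relative frequency of each UD relation (DEPREL, 8th column) in CoNLL-U text."""
    counts = Counter()
    for line in conllu_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blank lines and comments
            continue
        cols = line.split("\t")
        # Skip multiword-token ranges ("1-2") and empty nodes ("3.1")
        if "-" in cols[0] or "." in cols[0]:
            continue
        # Keep only the universal part of subtyped labels, e.g. "nmod:poss" -> "nmod"
        counts[cols[7].split(":")[0]] += 1
    total = sum(counts.values())
    # Relative frequencies make treebanks of different sizes comparable
    return {rel: n / total for rel, n in counts.items()}

def cosine_similarity(p, q):
    """Cosine similarity between two relation-frequency profiles."""
    rels = set(p) | set(q)
    dot = sum(p.get(r, 0.0) * q.get(r, 0.0) for r in rels)
    norm_p = math.sqrt(sum(v * v for v in p.values()))
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    return dot / (norm_p * norm_q)

# Hypothetical toy sentences, hand-annotated for illustration only.
# Columns: ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC
SPANISH = "\n".join("\t".join(row) for row in [
    ("1", "Vi", "ver", "VERB", "_", "_", "0", "root", "_", "_"),
    ("2", "la", "el", "DET", "_", "_", "3", "det", "_", "_"),
    ("3", "casa", "casa", "NOUN", "_", "_", "1", "obj", "_", "_"),
    ("4", "de", "de", "ADP", "_", "_", "5", "case", "_", "_"),
    ("5", "Juan", "Juan", "PROPN", "_", "_", "3", "nmod", "_", "_"),
    ("6", ".", ".", "PUNCT", "_", "_", "1", "punct", "_", "_"),
])
ENGLISH = "\n".join("\t".join(row) for row in [
    ("1", "I", "I", "PRON", "_", "_", "2", "nsubj", "_", "_"),
    ("2", "saw", "see", "VERB", "_", "_", "0", "root", "_", "_"),
    ("3", "John", "John", "PROPN", "_", "_", "5", "nmod:poss", "_", "_"),
    ("4", "'s", "'s", "PART", "_", "_", "3", "case", "_", "_"),
    ("5", "house", "house", "NOUN", "_", "_", "2", "obj", "_", "_"),
    ("6", ".", ".", "PUNCT", "_", "_", "2", "punct", "_", "_"),
])

es = relation_profile(SPANISH)
en = relation_profile(ENGLISH)
print(round(es["case"], 3))                 # 0.167
print(round(cosine_similarity(es, en), 3))  # 0.833
```

With real data you would read whole treebank files (for example with `open(path, encoding="utf-8").read()`) instead of toy strings. A language without training data could then be compared against each group's profile, though whether relation frequencies alone are a good enough signal for grouping is an open question.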