Supporting Universal Dependencies in Tree Editor TrEd
Keywords:
NLP, treebank, Universal Dependencies
Abstract
The paper presents the tree editor TrEd and related tools that can be used to create, modify, browse, and search treebanks, i.e. large language corpora annotated with syntactic and/or semantic structure information. The annotation might include not only phrase structure or dependencies, but also coreference, discourse analysis, and even inter-sentence relations. The project started in 2000, and the tools have been in continuous use since then at various institutions all over the world. Most of the tools are written in Perl, which makes them available on all major operating systems. For searching the treebanks, a query language was developed that describes sets of tree nodes and the relations between them. It also supports aggregation to produce quantitative outputs. There are two different implementations: one translates the queries into SQL statements, the other searches the data directly in the editor. Originally, TrEd supported only the PML data format used for the Prague Dependency Treebank. To process data in a different format, one first needed to convert the data into the PML format (and possibly convert the modified data back to the initial format). Later, a versatile extension system was added to TrEd, which made it possible to support other data formats directly. We show how this works using Universal Dependencies (UD) as an example. UD is a framework for grammar annotation across different human languages. The described extension allows TrEd (and some other tools) to open files in the original UD format natively, building the internal representation on the fly, and to serialise them back after editing.
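To make the idea of opening UD files natively more concrete, the following is a minimal sketch of a round trip over one sentence in the CoNLL-U format used by UD: the ten tab-separated columns are read into an in-memory tree and written back unchanged. It is not the TrEd extension itself (which, like most of the tools, is written in Perl); Python is used only for illustration, and the names `Node`, `Sentence`, `parse_sentence`, and `serialise_sentence` are hypothetical.

```python
from dataclasses import dataclass, field

# The ten CoNLL-U columns, in file order.
FIELDS = ("id", "form", "lemma", "upos", "xpos",
          "feats", "head", "deprel", "deps", "misc")


@dataclass
class Node:
    columns: dict                                  # column name -> raw string value
    children: list = field(default_factory=list)   # dependent nodes


@dataclass
class Sentence:
    comments: list   # "# ..." lines preceding the token rows
    rows: list       # all rows in file order, kept for faithful serialisation
    root: Node       # artificial root node with id "0"


def parse_sentence(block: str) -> Sentence:
    """Build a dependency tree from one CoNLL-U sentence block."""
    comments, rows = [], []
    for line in block.strip().split("\n"):
        if line.startswith("#"):
            comments.append(line)
        else:
            rows.append(Node(dict(zip(FIELDS, line.split("\t")))))

    root = Node({"id": "0"})
    by_id = {"0": root}
    by_id.update({node.columns["id"]: node for node in rows})
    for node in rows:
        head = node.columns["head"]
        # Multiword-token ranges ("1-2") and empty nodes ("1.1") carry "_"
        # in HEAD, so they are kept in rows but not attached to the tree.
        if head != "_":
            by_id[head].children.append(node)
    return Sentence(comments, rows, root)


def serialise_sentence(sentence: Sentence) -> str:
    """Write the sentence back as CoNLL-U text (without the blank separator line)."""
    lines = list(sentence.comments)
    lines += ["\t".join(node.columns[name] for name in FIELDS)
              for node in sentence.rows]
    return "\n".join(lines) + "\n"


SAMPLE = (
    "# text = Dogs bark.\n"
    "1\tDogs\tdog\tNOUN\t_\tNumber=Plur\t2\tnsubj\t_\t_\n"
    "2\tbark\tbark\tVERB\t_\t_\t0\troot\t_\tSpaceAfter=No\n"
    "3\t.\t.\tPUNCT\t_\t_\t2\tpunct\t_\t_\n"
)

sentence = parse_sentence(SAMPLE)
main_verb = sentence.root.children[0]
dependents = [child.columns["form"] for child in main_verb.children]
round_trip = serialise_sentence(sentence)
```

Because the original rows are kept alongside the tree, an edit to a node (for example, changing its `deprel` value) shows up in the serialised output, while untouched columns are written back exactly as they were read. This design choice is an assumption of the sketch, not a description of how TrEd stores UD data internally.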
License
Copyright (c) 2024 Jan Štěpánek (Author)

This work is licensed under a Creative Commons Attribution 4.0 International License.
This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.