Introduction to the UNL

A Gift for a Millennium

The Universal Networking Language (UNL) project is an international enterprise under the auspices of the United Nations University (UNU) in Tokyo. The project was set up in 1996 at the Institute of Advanced Studies (IAS) of the UNU under the leadership of Prof. Tarcisio Della Senta, the director of the IAS, and Dr. Hiroshi Uchida, UNL project director. The mission of the project is to provide the methods and tools for overcoming the language barrier on the World Wide Web in a systematic way.

To this end, the UNL has been defined as a digital metalanguage for describing, summarizing, refining, storing and disseminating information in a machine-independent and human-language-neutral form. Other, comparable systems exist for annotating sentence meaning, but they vary greatly in detail and conception, in many cases depending on the human language they were developed for. This leads to fundamental incompatibilities among them and to inadequacies in dealing with other languages. A language-neutral metalanguage can circumvent this problem, permitting the coding, storage, dissemination and retrieval of information independently of the original language in which it was expressed.

UNL can be seen as a kind of mark-up language that represents not the formatting but the core information of a text. As such, it can be embedded in the eXtensible Mark-up Language (XML). Just as HTML/XML annotations can already be realized differently in the context of different applications, machines, displays, etc., UNL expressions can have different realizations in different human languages.

The guiding principle of the UNL representational system is its concept orientation, anchored in three basic mechanisms:

Labeled links (binary relations);
Universal words (UWs);
Attributes.

The UNL represents information and meaning sentence by sentence for a given text. Sentence information is represented as a list of interrelated, labeled (semantic) links, each between two of the concepts present in the sentence. Concepts are represented as character strings called UWs, which are used to index the UNL knowledge base (KB). UWs can be annotated with attributes that provide further information about how the concept is being used in the specific environment of the sentence. The UNL KB is dynamic in the sense that it evolves as information is added. In human-language texts, the semantic links that build structures out of concepts are signalled by grammatical means such as word order, agreement, suffixes, etc., which differ from language to language. The links can also be interrelated in complex ways to represent very complex relations between concepts or groups of concepts, for example, coordinated structures. Representations across sentence boundaries are therefore possible in principle, but they are not yet part of the UNL.
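The three mechanisms described above can be illustrated with a small data-structure sketch. This is not the UNL notation or any official UNL software: the class names, the relation labels ("agent", "object") and the attribute names ("entry", "present", "definite") are hypothetical, chosen only to show how a sentence such as "The cat eats fish" might be held as a list of labeled binary links between attributed concepts.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List, Tuple


@dataclass(frozen=True)
class UniversalWord:
    """A concept, identified by a character string, with usage attributes."""
    headword: str
    attributes: FrozenSet[str] = frozenset()


@dataclass(frozen=True)
class Link:
    """A labeled binary relation between two concepts."""
    label: str
    source: UniversalWord
    target: UniversalWord


@dataclass
class SentenceGraph:
    """One sentence, stored as a list of labeled links."""
    links: List[Link] = field(default_factory=list)

    def add(self, label: str, source: UniversalWord, target: UniversalWord) -> None:
        self.links.append(Link(label, source, target))

    def concepts(self) -> List[UniversalWord]:
        """Return each concept once, in order of first appearance."""
        ordered: List[UniversalWord] = []
        for link in self.links:
            for uw in (link.source, link.target):
                if uw not in ordered:
                    ordered.append(uw)
        return ordered

    def outgoing(self, uw: UniversalWord) -> List[Tuple[str, UniversalWord]]:
        """Return (label, target) pairs for links that start at `uw`."""
        return [(l.label, l.target) for l in self.links if l.source == uw]


# Hypothetical encoding of "The cat eats fish" (labels are illustrative only).
eat = UniversalWord("eat", frozenset({"entry", "present"}))
cat = UniversalWord("cat", frozenset({"definite"}))
fish = UniversalWord("fish")

sentence = SentenceGraph()
sentence.add("agent", eat, cat)
sentence.add("object", eat, fish)

for label, target in sentence.outgoing(eat):
    print(f"{label}({eat.headword}, {target.headword})")
```

Because the structure records only concepts, attributes and labeled links, nothing in it depends on the word order or morphology of the source language; a generator for another language would read the same links and choose its own grammatical means to express them.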

The UNL system and its core software applications are being designed by the UNL Center at the Institute of Advanced Studies of UNU. This includes: (1) the Universal Networking Language system; (2) enconversion and deconversion software; and (3) technical specifications and guidelines for developing native-language enconversion and deconversion modules. Conversion software modules for each native language are being developed in partnership with research institutes, universities, and R&D groups under contract with UNU/IAS.

Launched in 1996, the UNL project has already demonstrated both R&D vigor and a capacity to motivate and obtain commitments from the best-known research centers in computational linguistics. It has also produced tangible results that are encouraging the further expansion of commitments.

The period for the full development of the UNL project is ten years. The first four years (1996-1999) are devoted to creating the UNL core system and the conversion modules for a dozen natural languages, including the six official languages of the United Nations. The UNL Center is fully operational and has begun implementing an experimental software system in collaboration with research institutes and R&D companies all over the world, which conduct research in several languages: Arabic, Chinese, French, German, Hindi, Indonesian, Italian, Japanese, Mongolian, Portuguese, Russian, and Spanish. The remaining seven years (1999-2005) will be applied to the development of modules for the native languages of the other member states and to improving system performance and quality.