Welcome to the COSMAT Project
The COSMAT project aims to give access to scientific resources through an adapted machine translation service.
The scientific objective of this project is to improve the core machine translation engines so that they can support a collaborative application. The particular characteristics of this application pose very interesting challenges to the underlying technology, for instance translation speed versus accuracy, task adaptation versus genericity, improved statistical models and/or integration of linguistic knowledge, and whole-document coherence. The machine translation engines will be based on a statistical approach and will integrate linguistic knowledge.
An important part of the fundamental research, development and engineering work will go into building a collaborative internet translation service for scientific documents that will be smoothly integrated into the open document archive (HAL) operated by INRIA and CNRS. All existing and new documents will be automatically translated between French and English. Through a Web 2.0 interface, users will be able to edit and correct the automatic translations, judge their quality or suggest alternative translations, maintain domain-specific terminology in collaboration with peers, and customize the system to their needs. The system will automatically learn from these user interventions and improve continually. All major document formats of scientific publications will be supported. Note that this system will be accessible free of charge.
We will work on the automatic acquisition of new bilingual data using either user interaction or unsupervised techniques. In particular, we will use the rule-based translation engine from SYSTRAN to search for parallel texts using information retrieval techniques. Fundamental research is also needed to improve the global coherence of the translated documents, i.e. the system must guarantee that a word with the same meaning has the same translation throughout the entire text. The current fundamental equations of statistical machine translation only model translation one sentence at a time.
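The coherence requirement can be illustrated with a minimal sketch. It is not the project's method: it assumes word-aligned sentence pairs are already available, and the function name find_inconsistent_terms and the sample data are hypothetical. It only flags source words that received more than one translation within a document, and it ignores word sense, so a word that legitimately has two meanings would also be reported.

```python
from collections import defaultdict


def find_inconsistent_terms(aligned_sentences):
    """Return source words translated in more than one way in a document.

    aligned_sentences: list of sentences, each a list of
    (source_word, target_word) pairs. The alignment itself is assumed
    to come from elsewhere (hypothetical input format).
    """
    translations = defaultdict(set)
    for pairs in aligned_sentences:
        for src, tgt in pairs:
            # Compare case-insensitively so "Network" and "network" match.
            translations[src.lower()].add(tgt.lower())
    # Keep only the words with at least two distinct translations.
    return {src: sorted(tgts)
            for src, tgts in translations.items()
            if len(tgts) > 1}


# Hypothetical two-sentence French-to-English document.
doc = [
    [("réseau", "network"), ("neuronal", "neural")],
    [("réseau", "net"), ("profond", "deep")],
]
print(find_inconsistent_terms(doc))
```

Here "réseau" is reported because it was rendered as both "network" and "net". A sentence-by-sentence engine has no way to detect this on its own, since each sentence is translated without reference to the others.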
SYSTRAN is the worldwide leader in rule-based translation technology, and we will work to develop the best techniques to incorporate its resources into the statistical translation engine. Statistical and rule-based approaches to machine translation are often considered two different worlds that will continue to develop independently. This project aims to use the best of both worlds.
COSMAT is a scientific project funded by the ANR (Agence Nationale de la Recherche).
Project reference: ANR-09-CORD-004
