Open-source Portuguese–Spanish machine translation
Carme Armentano-Oller, Rafael C Carrasco, Antonio M Corbí-Bellot, Mikel L Forcada, Mireia Ginestí-Rosell, Sergio Ortiz-Rojas, Juan Antonio Pérez-Ortiz, Gema Ramírez-Sánchez, Felipe Sánchez-Martínez, Miriam A Scalco
This paper describes the current status of development of an open-source shallow-transfer machine translation (MT) system for the [European] Portuguese–Spanish language pair, developed using the OpenTrad Apertium MT toolbox (www.apertium.org). Apertium uses finite-state transducers for lexical processing, hidden Markov models for part-of-speech tagging, and finite-state-based chunking for structural transfer, and is based on a simple rationale: to produce fast, reasonably intelligible and easily correctable translations between related languages, it suffices to use an MT strategy which uses shallow parsing techniques to refine word-for-word MT. This paper briefly describes the MT engine, the formats it uses for linguistic data, and the compilers that convert these data into an efficient format used by the engine, and then goes on to describe in more detail the pilot Portuguese–Spanish linguistic data.
Prompsit’s submission to WMT 2018 parallel corpus filtering shared task
Víctor M Sánchez-Cartagena, Marta Bañón, Sergio Ortiz-Rojas, Gema Ramírez‐Sánchez
This paper describes Prompsit Language Engineering’s submissions to the WMT 2018 parallel corpus filtering shared task. Our four submissions were based on an automatic classifier for identifying pairs of sentences that are mutual translations. A set of hand-crafted hard rules for discarding sentences with evident flaws was applied before the classifier. We explored different strategies for achieving a training corpus with diverse vocabulary and fluent sentences: language model scoring, an active-learning-inspired data selection algorithm and n-gram saturation. Our submissions were very competitive in comparison with other participants on the 100 million word training corpus.
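As a rough illustration of what "hard rules" applied before a classifier might look like, the sketch below discards sentence pairs with evident flaws. The specific rules, the function name `passes_hard_rules` and the thresholds are assumptions made for this example; the abstract does not list the rules actually used in the submissions.

```python
def passes_hard_rules(src: str, tgt: str, max_len_ratio: float = 3.0) -> bool:
    """Return True if a sentence pair survives a few hypothetical hard rules.

    The rules and thresholds are illustrative assumptions, not the ones
    used in the paper.
    """
    src, tgt = src.strip(), tgt.strip()

    # Rule 1: neither side may be empty.
    if not src or not tgt:
        return False

    # Rule 2: identical source and target are unlikely to be a translation.
    if src == tgt:
        return False

    # Rule 3: token counts must not differ by more than max_len_ratio.
    src_len, tgt_len = len(src.split()), len(tgt.split())
    if max(src_len, tgt_len) / min(src_len, tgt_len) > max_len_ratio:
        return False

    # Rule 4: each side must consist mostly of alphabetic characters.
    for sentence in (src, tgt):
        letters = sum(ch.isalpha() for ch in sentence)
        if letters / len(sentence) < 0.5:
            return False

    return True
```

Pairs that pass such a filter would then be scored by the classifier; cheap rules of this kind keep obviously bad pairs from reaching the more expensive step.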
The Apertium machine translation platform: five years on
Mikel L Forcada, Francis Tyers, Gema Ramírez‐Sánchez
This paper describes Apertium: a free/open-source machine translation platform (engine, toolbox and data), its history, its philosophy of design, its technology, the community of developers, the research and business based on it, and its prospects and challenges, now that it is five years old.
Documentation of the open-source shallow-transfer machine translation platform Apertium
Mikel L Forcada, Boyan Ivanov Bonev, S Ortiz Rojas, JA Pérez Ortiz, G Ramírez Sánchez, F Sánchez Martínez, Carme Armentano-Oller, Marco A Montava, Francis M Tyers
This documentation describes the Apertium platform, one of the open-source machine translation systems which originated within the project “Open-Source Machine Translation for the Languages of Spain” (“Traducción automática de código abierto para las lenguas del estado español”). It is a shallow-transfer machine translation system, initially designed for translation between related language pairs, although some of its components have also been used in the deep-transfer architecture (Matxin) that has been developed in the same project for the pair Spanish–Basque. Apertium can translate at present between the pairs Spanish–Galician, Spanish–Catalan, Catalan–Occitan, Catalan–French, and can be used to build translators between other related language pairs, such as Danish–Swedish, Czech–Slovak, etc. Existing machine translation systems available at present for the pairs es–ca and es–gl are mostly commercial or use proprietary technologies, which makes them very hard to adapt to new usages; furthermore, they use different technologies across language pairs, which makes it very difficult to integrate them in a single multilingual content management system.
Opentrad Apertium open-source machine translation system: an opportunity for business and research
Gema Ramírez Sánchez, Felipe Sánchez-Martínez, Sergio Ortiz Rojas, Juan Antonio Pérez-Ortiz, Mikel L Forcada
Most successful machine translation systems built until now use proprietary software and data, and are either distributed as commercial products or are accessible on the net with some restrictions. Machine translation systems of this kind are regarded by most professional translators and researchers as closed and static products which cannot be adapted or enhanced for a particular purpose. In contrast to these systems, we present Opentrad Apertium, an open-source shallow-transfer machine translation engine originally intended for related-language pairs but currently being extended to deal with not so similar pairs. The opportunities offered by open-source software are very interesting in new research projects but they are also promising in business and particularly as a business model for innovative companies. Questions such as which license is most suitable for releasing open-source software, or how to make it visible, are discussed in this paper. Real opportunities for research and business derived from the Apertium machine translation system are presented as well.
Construcción y minimización eficiente de transductores de letras a partir de diccionarios con paradigmas [Efficient construction and minimization of letter transducers from dictionaries with paradigms]
Sergio Ortiz Rojas, Mikel L Forcada, Gema Ramírez Sánchez
This article presents a paradigm-based dictionary management model for building lexical processors. First, some examples are shown that illustrate the expressive power of the model and the wide range of languages to which the system can be applied. Next, a method is explained for efficiently building letter transducers from the dictionaries by exploiting the use of paradigms. Finally, the results obtained with the implemented system are presented.
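To make the idea of a paradigm-based dictionary concrete, the sketch below naively expands toy entries (stem plus paradigm) into a letter-keyed trie that maps surface forms to analyses. It is only an illustration: the data, the tag notation and the function names are hypothetical, and the full expansion shown here is precisely the costly step that an efficient construction exploiting paradigms would try to avoid. No minimization is performed.

```python
# Hypothetical toy data. Each paradigm lists pairs of
# (surface suffix, analysis tail appended to the stem).
PARADIGMS = {
    "house__n": [("", "<n><sg>"), ("s", "<n><pl>")],
    "wolf__n": [("f", "f<n><sg>"), ("ves", "f<n><pl>")],
}

# Dictionary entries: (stem, paradigm name).
ENTRIES = [("house", "house__n"), ("wol", "wolf__n"), ("cat", "house__n")]


def build_trie(entries, paradigms):
    """Expand every entry through its paradigm into a letter-keyed trie."""
    root = {}
    for stem, paradigm_name in entries:
        for suffix, analysis_tail in paradigms[paradigm_name]:
            node = root
            for letter in stem + suffix:
                node = node.setdefault(letter, {})
            # The key None marks a final state and stores its analyses.
            node.setdefault(None, []).append(stem + analysis_tail)
    return root


def analyse(trie, word):
    """Follow the word letter by letter; return its analyses, if any."""
    node = trie
    for letter in word:
        node = node.get(letter)
        if node is None:
            return []
    return node.get(None, [])


TRIE = build_trie(ENTRIES, PARADIGMS)
```

With this toy data, `analyse(TRIE, "wolves")` yields the single analysis `wolf<n><pl>`, and an unknown word such as `dog` yields an empty list.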
Apertium, una plataforma de código abierto para el desarrollo de sistemas de traducción automática [Apertium, an open-source platform for developing machine translation systems]
Carme Armentano Oller, Antonio Miguel Corbí Bellot, Mikel L Forcada, Mireia Ginestí Rosell, Marco A Montava Belda, Sergio Ortiz Rojas, Juan Antonio Pérez-Ortiz, Gema Ramírez Sánchez, Felipe Sánchez-Martínez
One of the main challenges for computing in the coming decades is the development of systems capable of effectively processing natural (or human) language. Within this field, machine translation systems, which translate a text written in one language into an equivalent version in another language, receive special attention given, for example, the multilingual character of societies such as Europe's. Automating this process is particularly complex because programs must deal with features of natural language, such as ambiguity, whose algorithmic treatment is not feasible, so that a mere approximation or partial automation of the process is already considered a success. Machine translation programs have traditionally been closed systems, but in recent times the trend set by free software has also reached this field. In this article we describe Apertium, apertium.org, an advanced open-source platform, licensed under the GNU GPL, which, thanks to the decoupling it offers between data and programs, makes it easy to develop new machine translators. The Apertium platform has been developed by the Transducens research group at the Universitat d’Alacant within the framework of several collaborative projects with universities and companies in Spain in which, in addition to the programs that make up the translation engine, open linguistic data have been created for machine translation between Catalan–Spanish, Galician–Spanish, Portuguese–Spanish, French–Catalan, English–Catalan and …
Development of a free Basque to Spanish machine translation system
Mireia Ginestí-Rosell, Gema Ramírez-Sánchez, Sergio Ortiz-Rojas, Francis M Tyers, Mikel L Forcada
This paper presents a free (or open-source) rule-based machine translation system between Basque and Spanish, based on the Apertium machine translation platform aimed at assimilation, that is, as a help for the understanding of texts written in Basque. The development process and current status are described and an evaluation is given of the utility of the output.
Extrinsic evaluation of web-crawlers in machine translation: a study on Croatian-English for the tourism domain
Antonio Toral, Raphael Rubino, Miquel Espla-Gomis, Tommi A Pirinen, Andy Way, Gema Ramírez‐Sánchez
We present an extrinsic evaluation of crawlers of parallel corpora from multilingual web sites in machine translation (MT). Our case study is on Croatian to English translation in the tourism domain. Given two crawlers, we build phrase-based statistical MT systems on the datasets produced by each crawler using different settings. We also combine the best datasets produced by each crawler (union and intersection) to build additional MT systems. Finally we combine the best of the previous systems (union) with general-domain data. This last system outperforms all the previous systems built on crawled data as well as two baselines (a system built on general-domain data and a well known online MT system).
Collaborative development of a rule-based machine translator between Croatian and Serbian
Filip Klubička, Gema Ramírez‐Sánchez, Nikola Ljubešić
This paper describes the development and current state of a bidirectional Croatian-Serbian machine translation system based on the open-source Apertium platform. It has been created inside the Abu-MaTran project with the aims of creating free linguistic resources as well as having non-experts and experts work together. We describe the collaborative way of collecting the necessary data to build our system, which outperforms other available systems.