Publications

Nikolay Bogoychev, Jelmer van der Linde, Graeme Nail, Barry Haddow, Jaume Zaragoza-Bernabeu, Gema Ramírez-Sánchez, Lukas Weymann, Tudor Nicolae Mateiu, Jindřich Helcl, Mikko Aulamo

Developing high-quality machine translation systems is a labour-intensive, challenging and confusing process for newcomers to the field. We present a pair of tools, OpusCleaner and OpusTrainer, that aim to simplify the process, redu...

Mikko Aulamo, Nikolay Bogoychev, Shaoxiong Ji, Graeme Nail, Gema Ramírez‐Sánchez, Jörg Tiedemann, Jelmer Van Der Linde, Jaume Zaragoza

Proceedings of the 24th Annual Conference of the European Association for Machine Translation

We describe the High Performance Language Technologies project (HPLT), a 3-year EU-funded project started in September 2022. HPLT will build a space combining petabytes of natural language data with large-scale model training. It ...

Miquel Esplà-Gomis, Mikel L Forcada, Taja Kuzman, Nikola Ljubešić, Rik van Noord, Gema Ramírez‐Sánchez, Jörg Tiedemann, Antonio Toral

Proceedings of the 1st Workshop on Open Community-Driven Machine Translation

Proceedings of the 1st Workshop on Open Community-Driven Machine Translation, June 15, 2023, Tampere, Finland. Edited by Miquel Esplà-Gomis (Universi...

Gema Ramírez‐Sánchez

Proceedings of the 1st Workshop on Open Community-Driven Machine Translation

We present MutNMT, an open-source web application for educational purposes to introduce non-experts to NMT. The tool, developed within the MultiTraiNMT project along with other training materials (a book and activities), gath...

Jaume Zaragoza-Bernabeu, Gema Ramírez‐Sánchez, Marta Bañón, Sergio Ortiz-Rojas

Proceedings of the Thirteenth Language Resources and Evaluation Conference

This paper describes the experiments carried out during the development of the latest version of Bicleaner, named Bicleaner AI, a tool that aims at detecting noisy sentences in parallel corpora. The tool, which now implements a ne...

Marta Bañón, Miquel Espla-Gomis, Mikel L Forcada, Cristian García-Romero, Taja Kuzman, Nikola Ljubešić, Rik van Noord, Leopoldo Pla Sempere, Gema Ramírez-Sánchez, Peter Rupnik, Vít Suchomel, Antonio Toral, Tobias van der Werff, Jaume Zaragoza

23rd Annual Conference of the European Association for Machine Translation, EAMT 2022

We introduce the project MaCoCu: Massive collection and curation of monolingual and bilingual data: focus on under-resourced languages, funded by the Connecting Europe Facility, which is aimed at building monolingual and parallel ...

Proceedings of the 2nd Workshop on Human Evaluation of NLP Systems (HumEval)

Jožef Stefan Institute

Quality assessment has been an ongoing activity of the series of ParaCrawl efforts to crawl massive amounts of parallel data from multilingual websites for 29 languages. The goal of ParaCrawl is to get parallel data that is good f...

Gema Ramírez-Sánchez

Machine translation for everyone: Empowering users in the age of artificial intelligence

This chapter gives an overview of the theoretical and practical implications of customizing machine translation (MT) to make it fit for a particular purpose. The chapter is written for readers who have just a basic knowledge of MT...

Mikel L Forcada, Pilar Sánchez-Gijón, Dorothy Kenny, Felipe Sánchez‐Martínez, Juan Antonio Pérez-Ortiz, Riccardo Superbo, Gema Ramírez‐Sánchez, Olga Torres-Hostench, Caroline Rossi

Proceedings of the 23rd Annual Conference of the European Association for Machine Translation

The MultiTraiNMT Erasmus+ project has developed an open innovative syllabus in machine translation, focusing on neural machine translation (NMT) and targeting both language learners and translators. The training materials include...

Gema Ramírez-Sánchez, Juan Antonio Pérez-Ortiz, Felipe Sánchez-Martínez, Caroline Rossi, Dorothy Kenny, Riccardo Superbo, Pilar Sánchez-Gijón, Olga Torres-Hostench

TRITON 2021 (Translation and Interpreting Technology Online)

The aim of the MultiTraiNMT Erasmus+ project is to develop an open innovative syllabus in neural machine translation (NMT) for language learners and translators as multilingual citizens. Machine translation is seen as a resource t...

Gema Ramírez‐Sánchez, Jaume Zaragoza-Bernabeu, Marta Bañón, Sergio Ortiz-Rojas

Proceedings of the 22nd Annual Conference of the European Association for Machine Translation

This paper shows the utility of two open-source tools designed for parallel data cleaning: Bifixer and Bicleaner. Already used to clean highly noisy parallel content from crawled multilingual websites, we evaluate their performanc...

Miquel Espla-Gomis, Víctor M Sánchez-Cartagena, Jaume Zaragoza-Bernabeu, Felipe Sánchez‐Martínez

Proceedings of the Fifth Conference on Machine Translation

This paper describes the joint submission of Universitat d’Alacant and Prompsit Language Engineering to the WMT 2020 shared task on parallel corpus filtering. Our submission, based on the free/open-source tool Bicleaner, enhances ...

Marta Bañón, Sergio Ortiz Rojas, Gema Ramírez-Sánchez, Jaume Zaragoza-Bernabeu

Proceedings of the 22nd Annual Conference of the European Association for Machine Translation (2020)

This paper shows the utility of two open-source tools designed for parallel data cleaning: Bifixer and Bicleaner. Already used to clean highly noisy parallel content from crawled multilingual websites, we evaluate their performanc...

Miquel Esplà-Gomis, Mikel L Forcada, Gema Ramírez‐Sánchez, Hieu Hoang

Proceedings of Machine Translation Summit XVII: Translator, Project and User Tracks

We describe two projects funded by the Connecting Europe Facility, Provision of Web-Scale Parallel Corpora for Official European Languages (2016-EU-IA-0114, completed) and Broader Web-Scale Provision of Parallel Corpora for Europe...

Jaume Zaragoza Bernabeu

Universitat Politècnica de València

We understand paraphrasing as the act of rewriting a text in different words while preserving its meaning. Paraphrasing has many applications, such as rewriting words while a text is being written, providing transl...

Sheila Castilho, Natália Resende, Federico Gaspari, Andy Way, Tony O’Dowd, Marek Mazur, Manuel Herranz, Alex Helle, Gema Ramírez‐Sánchez, Víctor Sánchez Cartagena, Mārcis Pinnis, Valters Šics

Proceedings of Machine Translation Summit XVII: Translator, Project and User Tracks

This paper reports the results of an in-depth evaluation of 34 state-of-the-art domain-adapted machine translation (MT) systems that were built by four leading MT companies as part of the EU-funded iADAATPA project. These systems s...

Víctor M Sánchez-Cartagena, Marta Bañón, Sergio Ortiz-Rojas, Gema Ramírez‐Sánchez

Proceedings of the third conference on machine translation: shared task papers

This paper describes Prompsit Language Engineering’s submissions to the WMT 2018 parallel corpus filtering shared task. Our four submissions were based on an automatic classifier for identifying pairs of sentences that are mutual ...

Mikel L Forcada, Francis Tyers, Gema Ramírez‐Sánchez

Proceedings of the First International Workshop on Free/Open-Source Rule-Based Machine Translation

This paper describes Apertium: a free/open-source machine translation platform (engine, toolbox and data), its history, its philosophy of design, its technology, the community of developers, the research and business based on it, ...

Gema Ramírez‐Sánchez

Proceedings of the 13th Conference of the Association for Machine Translation in the Americas (Volume 2: User Track)

How does AltLang work? The basics: it automatically and quickly replaces differences between two variants of the same language (nice for dynamic content), performs only controlled changes (no or low risk), and is highly customisable...

MTradumàtica: Free Statistical Machine Translation Customisation for Translators

2017

Gökhan Doğru, Adrià Martín-Mor, Sergio Ortiz-Rojas

Annual Conference of the European Association for Machine Translation

MTradumàtica is a free, Moses-based web platform for training and using statistical machine translation systems with a user-friendly graphical interface. Its goal is to offer translators a free tool to customise their own statisti...

Filip Klubička, Gema Ramírez‐Sánchez, Nikola Ljubešić

Proceedings of the 19th Annual Conference of the European Association for Machine Translation

This paper describes the development and current state of a bidirectional Croatian-Serbian machine translation system based on the open-source Apertium platform. It has been created inside the Abu-MaTran project with the aims of c...

Antonio Toral, Raphael Rubino, Gema Ramírez‐Sánchez

Proceedings of the 19th Annual Conference of the European Association for Machine Translation

We re-assess the impact brought by a set of widely-used SMT models and techniques by means of human evaluation. These include different types of development sets (crowdsourced vs translated professionally), reordering, operation s...

Antonio Toral, Sergio Ortiz-Rojas, Mikel Forcada, Nikola Ljubesic, Prokopis Prokopidis

Baltic Journal of Modern Computing

We present the current status of Abu-MaTran (http://www.abumatran.eu), a 4-year project (January 2013-December 2016) on rapid development of machine translation for under-resourced languages. It is funded under Marie Curie's Indu...

Gema Ramírez-Sánchez

EAMT (Projects/Products)

AltLang is a rule-based automatic converter for language varieties. It deals with differences in spelling, lexicon and local grammar along with numeric, style and punctuation conventions. It is available for varieties of English, ...

Ilknur Durgar El-Kahlout, Mehmed Özkan, Felipe Sánchez‐Martínez, Gema Ramírez‐Sánchez, Fred Hollowood, Andy Way

Proceedings of the 18th Annual Conference of the European Association for Machine Translation

It has been a huge honour for me to serve as president of the European Association for Machine Translation (EAMT) over the past six years. As I step down from office, I am delighted that the last EAMT annual conference under my pr...

Nikola Ljubešić, Miquel Esplà-Gomis, Antonio Toral, Sergio Ortiz-Rojas, Filip Klubička

Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

This paper presents an approach for building large monolingual corpora and, at the same time, extracting parallel data by crawling the top-level domain of a given language of interest. For gathering linguistically relevant data fr...

CloudLM: a cloud-based language model for machine translation

2016

Jorge Ferrández-Tordera, Sergio Ortiz-Rojas, Antonio Toral

Prague Bulletin of Mathematical Linguistics

Language models (LMs) are an essential element in statistical approaches to natural language processing for tasks such as speech recognition and machine translation (MT). The advent of big data leads to the availability of massive...

Víctor M Sánchez-Cartagena, Marta Bañón, Sergio Ortiz Rojas, Gema Ramírez-Sánchez

Proceedings of the Third Conference on Machine Translation: Shared Task Papers

This paper describes Prompsit Language Engineering’s submissions to the WMT 2018 parallel corpus filtering shared task. Our four submissions were based on an automatic classifier for identifying pairs of sentences that are mutual ...

Raphael Rubino, Tommi A Pirinen, Miquel Espla-Gomis, Nikola Ljubešić, Sergio Ortiz-Rojas, Vassilis Papavassiliou, Prokopis Prokopidis, Antonio Toral

Proceedings of the Tenth Workshop on Statistical Machine Translation

This paper presents the machine translation systems submitted by the Abu-MaTran project for the Finnish–English language pair at the WMT 2015 translation task. We tackle the lack of resources and complex morphology of the Finnish ...

İlknur Durgar El-Kahlout, Mehmed Özkan, Felipe Sánchez-Martínez, Gema Ramírez-Sánchez, Fred Hollowood, Andy Way

Antalya, Turkey

This paper presents the work done to port a deep-transfer rule-based machine translation system to translate from a different source language by maximizing the exploitation of existing resources and by limiting the development wor...

Antonio Toral, Raphael Rubino, Miquel Espla-Gomis, Tommi A Pirinen, Andy Way, Gema Ramírez‐Sánchez

Proceedings of the 17th Annual conference of the European Association for Machine Translation

We present an extrinsic evaluation of crawlers of parallel corpora from multilingual web sites in machine translation (MT). Our case study is on Croatian to English translation in the tourism domain. Given two crawlers, we build p...

Raphael Rubino, Antonio Toral, Victor M Sánchez-Cartagena, Jorge Ferrández-Tordera, Sergio Ortiz-Rojas, Gema Ramírez‐Sánchez, Felipe Sánchez‐Martínez, Andy Way

Proceedings of the ninth workshop on statistical machine translation

This paper presents the machine translation systems submitted by the Abu-MaTran project to the WMT 2014 translation task. The language pair concerned is English–French with a focus on French as the target language. The French to E...

Miquel Espla-Gomis, Filip Klubička, Nikola Ljubešić, Sergio Ortiz-Rojas, Vassilis Papavassiliou, Prokopis Prokopidis

European Language Resources Association (ELRA)

In this paper we compare two tools for automatically harvesting bitexts from multilingual websites: bitextor and ILSP-FC. We used both tools for crawling 21 multilingual websites from the tourism domain to build a domain-specific ...

Antonio Toral, Guillermo Latour, Stanislav Gurevich, Mikel Forcada, Gema Ramírez-Sánchez

Procesamiento del Lenguaje Natural

aims to establish a linguistic Olympiad in Spain. We introduce the Linguistic Olympiads, our rationale and objectives for setting up OLE as well as our implementation plan for. We foresee our work to be useful for other countries ...

Antonio Toral, Guillermo Latour, Stanislav Gurevich, Mikel Forcada, Gema Ramírez-Sánchez

Raphael Rubino, Antonio Toral, Nikola Ljubešić, Gema Ramírez-Sánchez

This paper presents a novel approach for parallel data generation using machine translation and quality estimation. Our study focuses on pivot-based machine translation from English to Croatian through Slovene. We generate an Engl...

Boyan Bonev, Gema Ramírez-Sánchez, Sergio Ortiz Rojas

arXiv preprint arXiv:1303.0446

The classification of opinion texts into positive and negative is becoming a subject of great interest in sentiment analysis. The existence of many labeled opinions motivates the use of statistical and machine-learning methods. Firs...

Automatic acquisition of machine translation resources in the Abu-MaTran project

2013

Miquel Esplà-Gomis, Nikola Ljubesic, Filip Klubicka, Prokopis Prokopidis, Vassilis Papavassiliou, Antonio Toral, Tommi Pirinen, Andy Way, Raphael Rubino, Gema Ramírez-Sánchez, Sergio Ortiz-Rojas, Víctor Sánchez-Cartagena, Jorge Ferrández-Tordera, Mikel Forcada

Procesamiento del Lenguaje Natural, Sociedad Española para el Procesamiento del Lenguaje Natural

This paper provides an overview of the research and development activities carried out to alleviate the language resources’ bottleneck in machine translation within the Abu-MaTran project. We have developed a range of tools for th...

Jordi Duran, Lluís Villarejo, Mireia Farrús, Sergio Ortiz, Gema Ramírez

Springer Berlin Heidelberg

The Universitat Oberta de Catalunya (Open University of Catalonia, UOC) is a public university based in Barcelona. The UOC is characterised by three main factors: (a) it is a virtual university based on an e-Learning model, (b) i...

Boyan Bonev, Gema Ramírez‐Sánchez, Sergio Ortiz-Rojas

Proceedings of the 3rd Workshop in Computational Approaches to Subjectivity and Sentiment Analysis

The classification of opinion texts into positive and negative can be tackled by evaluating separate key words, but this is a very limited approach. We propose an approach based on the order of the words without using any syntactic a...

Mikel L Forcada, Mireia Ginestí-Rosell, Jacob Nordfalk, Jim O’Regan, Sergio Ortiz-Rojas, Juan Antonio Pérez-Ortiz, Felipe Sánchez-Martínez, Gema Ramírez-Sánchez, Francis M Tyers

Machine translation - Springer Netherlands

Apertium is a free/open-source platform for rule-based machine translation. It is being widely used to build machine translation systems for a variety of language pairs, especially in those cases (mainly with related-language pair...

Using Apertium linguistic data for tokenization to improve Moses SMT performance

2011

Sergio Ortiz Rojas, Santiago Cortés Vaíllo, UMH Campus, Edificio Quorum III

LIHMT 2011

This paper describes a new method to tokenize texts, both to train a Moses SMT system and to be used during the translation process. The new method involves reusing the morphological analyser and part-of-speech tagger of the Apert...

Francis M Tyers, Felipe Sánchez-Martínez, Sergio Ortiz Rojas, Mikel L Forcada

Charles University in Prague. Institute of Formal and Applied Linguistics

This paper describes the resources available in the Apertium platform, a free/open-source framework for creating rule-based machine translation systems. Resources within the platform take the form of finite-state morphologies for ...

Miquel Espla-Gomis, Mikel L Forcada, Sergio Ortiz-Rojas, Jorge Ferrández-Tordera

Proceedings of the First Conference on Machine Translation: Volume 2, Shared Task Papers

This paper describes the participation of Prompsit Language Engineering and the Universitat d’Alacant in the shared task on document alignment at the First Conference on Machine Translation (WMT 2016). Two systems have been submit...

Boyan Bonev, Gema Ramírez‐Sánchez, Sergio Ortiz-Rojas

Francois Masselot, Petra Ribiczey, Gema Ramírez‐Sánchez

We present a user case of the free/open-source Spanish↔Brazilian Portuguese Apertium machine translation system inside the localization workflow of Autodesk. This system, initially developed to perform general-domain translations,...

Luis Villarejo, Mireia Farrús, Sergio Ortiz, Gema Ramírez

Proceedings of the International Multiconference on Computer Science and Information Technology

In this paper, we describe the adaptation process of Apertium, a free/open-source rule-based machine translation platform which is operating in a number of different real-life contexts, to the linguistic needs of the Universitat O...

Development of a free Basque to Spanish machine translation system

2009

Mireia Ginestí-Rosell, Gema Ramírez-Sánchez, Sergio Ortiz-Rojas, Francis M Tyers, Mikel L Forcada

Sociedad Española para el Procesamiento del Lenguaje Natural

This paper presents a free (or open-source) rule-based machine translation system between Basque and Spanish, based on the Apertium machine translation platform aimed at assimilation, that is, as a help for the understanding of te...

Mireia Ginestí-Rosell, Gema Ramírez-Sánchez, Sergio Ortiz-Rojas, Francis M Tyers, Mikel L Forcada

Procesamiento del Lenguaje Natural

This article presents a free (open-source) rule-based machine translation system between Basque and Spanish, built on the Apertium machine translation platform and intended for assimilation, that is, ...

Luis Villarejo Muñoz, Sergio Ortiz-Rojas, Mireia Ginestí-Rosell

Proceedings of the First International Workshop on Free/Open-Source Rule-based Machine Translation

This article describes the needs of UOC regarding translation and how these needs are satisfied by Prompsit further developing a free rule-based machine translation system: Apertium. We initially describe the general framework reg...

Documentation of the open-source shallow-transfer machine translation platform Apertium

2007

Mikel L Forcada, Boyan Ivanov Bonev, S Ortiz Rojas, JA Pérez Ortiz, G Ramírez Sánchez, F Sánchez Martínez, Carme Armentano-Oller, Marco A Montava, Francis M Tyers

Departament de Llenguatges i Sistemes Informàtics, Universitat d’Alacant

This documentation describes the Apertium platform, one of the open-source machine translation systems which originated within the project "Open-Source Machine Translation for the Languages of Spain" ("Traducción automática de cód...

Carme Armentano Oller, Antonio Miguel Corbí Bellot, Mikel L Forcada, Mireia Ginestí Rosell, Marco A Montava Belda, Sergio Ortiz Rojas, Juan Antonio Pérez-Ortiz, Gema Ramírez Sánchez, Felipe Sánchez-Martínez

Universidad de Cádiz. Servicio de Publicaciones

One of the main challenges for computer science in the coming decades is the development of systems capable of processing natural (or human) language effectively. Within this field, machine translation systems...

Carme Armentano-Oller, Rafael C Carrasco, Antonio M Corbí-Bellot, Mikel L Forcada, Mireia Ginestí-Rosell, Sergio Ortiz-Rojas, Juan Antonio Pérez-Ortiz, Gema Ramírez-Sánchez, Felipe Sánchez-Martínez, Miriam A Scalco

International Workshop on Computational Processing of the Portuguese Language

This paper describes the current status of development of an open-source shallow-transfer machine translation (MT) system for the [European] Portuguese–Spanish language pair, developed using the OpenTrad Apertium MT toolbox (www.a...

Gema Ramírez Sánchez, Felipe Sánchez-Martínez, Sergio Ortiz Rojas, Juan Antonio Pérez-Ortiz, Mikel L Forcada

Aslib

Most successful machine translation systems built until now use proprietary software and data, and are either distributed as commercial products or are accessible on the net with some restrictions. This kind of machine translation...

Sergio Ortiz Rojas, Mikel L Forcada, Gema Ramírez Sánchez

Sociedad Española para el Procesamiento del Lenguaje Natural

This article presents a paradigm-based dictionary management model for building lexical processors. To that end, it first shows some examples that highlight the expressive power of the...

Enrique Sánchez-Villamil, Susana Santos-Antón, Sergio Ortiz-Rojas, Mikel L Forcada

Advances in Natural Language Processing: 5th International Conference on NLP, FinTAL 2006 Turku, Finland, August 23-25, 2006 Proceedings

The Internet constitutes a potential huge store of parallel text that may be collected to be exploited by many applications such as multilingual information retrieval, machine translation, etc. These applications usually require a...