
2014

Comparing two acquisition systems for automatically building an english-croatian parallel corpus from multilingual websites

Miquel Espla-Gomis, Filip Klubička, Nikola Ljubešić, Sergio Ortiz-Rojas, Vassilis Papavassiliou, Prokopis Prokopidis
In this paper we compare two tools for automatically harvesting bitexts from multilingual websites: bitextor and ILSP-FC. We used both tools for crawling 21 multilingual websites from the tourism domain to build a domain-specific English–Croatian parallel corpus. Different settings were tried for both tools and 10,662 unique document pairs were obtained. A sample of about 10% of them was manually examined and the success rate was computed on the collection of pairs of documents detected by each setting. We compare the performance of the settings and the amount of different corpora detected by each setting. In addition, we describe the resource obtained, both by the settings and through the human evaluation, which has been released as a high-quality parallel corpus.
2016

Re-assessing the Impact of SMT Techniques with Human Evaluation: a Case Study on English—Croatian

Antonio Toral, Raphael Rubino, Gema Ramírez‐Sánchez
We re-assess the impact brought by a set of widely-used SMT models and techniques by means of human evaluation. These include different types of development sets (crowdsourced vs translated professionally), reordering, operation sequence and bilingual neural language models as well as common approaches to data selection and combination. In some cases our results corroborate previous findings found in the literature, when those approaches were evaluated in terms of automatic metrics, but in some other cases they do not.
2022

Construcción rápida de un sistema de traducción automática español-portugués partiendo de un sistema español-catalán

Gema Ramírez-Sánchez
This chapter gives an overview of the theoretical and practical implications of customizing machine translation (MT) to make it fit for a particular purpose. The chapter is written for readers who have just a basic knowledge of MT, but experts who are seeking new ways of explaining MT to non-experts may also find it useful. The MT paradigm assumed in the chapter is that of neural MT.
2019

Large-scale machine translation evaluation of the iADAATPA Project

Sheila Castilho, Natália Resende, Federico Gaspari, Andy Way, Tony O’Dowd, Marek Mazur, Manuel Herranz, Alex Helle, Gema Ramírez‐Sánchez, Víctor Sánchez Cartagena, Mārcis Pinnis, Valters Šics
This paper reports the results of an in-depth evaluation of 34 state-of-the-art domain-adapted machine translation (MT) systems that were built by four leading MT companies as part of the EU-funded iADAATPA project. These systems support a wide variety of languages for several domains. The evaluation combined automatic metrics and human methods, namely assessments of adequacy, fluency, and comparative ranking. The paper also discusses the most effective techniques to build domain-adapted MT systems for the relevant language combinations and domains.
2012

Opinum: statistical sentiment analysis for opinion classification

Boyan Bonev, Gema Ramírez‐Sánchez, Sergio Ortíz-Rojas
The classification of opinion texts in positive and negative can be tackled by evaluating separate key words but this is a very limited approach. We propose an approach based on the order of the words without using any syntactic and semantic information. It consists of building one probabilistic model for the positive and another one for the negative opinions. Then the test opinions are compared to both models and a decision and confidence measure are calculated. In order to reduce the complexity of the training corpus we first lemmatize the texts and we replace most named entities with wildcards. We present an accuracy above 81% for Spanish opinions in the financial products domain.
2010

Using the Apertium Spanish-Brazilian Portuguese machine translation system for localization

Boyan Bonev, Gema Ramírez‐Sánchez, Sergio Ortíz-Rojas
We present a user case of the free/open-source Spanish↔Brazilian Portuguese Apertium machine translation system inside the localization workflow of Autodesk. This system, initially developed to perform general-domain translations, has been customized by Prompsit to fit with Autodesk needs and by respecting the localization workflow as much as possible. This original scenario shows that post-edited machine translation can generate immediately significant productivity gains with publication-ready linguistic quality.
2021

MultiTraiNMT: training materials to approach neural machine translation from scratch

Gema Ramírez-Sánchez, Juan Antonio Pérez-Ortiz, Felipe Sánchez-Martínez, Caroline Rossi, Dorothy Kenny, Riccardo Superbo, Pilar Sánchez-Gijón, Olga Torres-Hostench
The aim of the MultiTraiNMT Erasmus+ project is to develop an open innovative syllabus in neural machine translation (NMT) for language learners and translators as multilingual citizens. Machine translation is seen as a resource that can provide support for citizens when trying to acquire and develop language skills, provided they are given informed and critical training. Machine translation would thus help tackle the mismatch between the EU aim of having multilingual citizens who speak at least two foreign languages and the current situation in which citizens generally fall far short of this objective. The training materials consist of an open-access coursebook, an open-source NMT web application called MutNMT for training purposes and corresponding activities.
2010

A web-based translation service at the UOC based on Apertium

Luis Villarejo, Mireia Farrús, Sergio Ortiz, Gema Ramírez
In this paper, we describe the adaptation process of Apertium, a free/open-source rule-based machine translation platform which is operating in a number of different real-life contexts, to the linguistic needs of the Universitat Oberta de Catalunya (Open University of Catalonia, UOC), a private e-learning university based in Barcelona where linguistic and cultural diversity is a crucial factor. This paper describes the main features of the Apertium platform and the practical developments required to fully adapt it to UOC's linguistic needs. The setting up of a translation service at UOC based on Apertium shows the growing interest of large institutions with translation needs for open-source solutions in which their investment is oriented toward adding value to the available features to offer the best possible adapted service to their user community.
2022

MultitraiNMT Erasmus+ project: Machine Translation Training for multilingual citizens (multitrainmt.eu)

Mikel L Forcada, Pilar Sánchez-Gijón, Dorothy Kenny, Felipe Sánchez‐Martínez, Juan Antonio Pérez-Ortiz, Riccardo Superbo, Gema Ramírez‐Sánchez, Olga Torres-Hostench, Caroline Rossi
The MultitraiNMT Erasmus+ project has developed an open innovative syllabus in machine translation, focusing on neural machine translation (NMT) and targeting both language learners and translators. The training materials include an open access coursebook with more than 250 activities and a pedagogical NMT interface called MutNMT that allows users to learn how neural machine translation works. These materials will allow students to develop the technical and ethical skills and competences required to become informed, critical users of machine translation in their own language learning and translation practice. The project started in July 2019 and it will end in July 2022.
2013

Statistical sentiment analysis performance in Opinum

Boyan Bonev, Gema Ramírez-Sánchez, Sergio Ortiz Rojas
The classification of opinion texts in positive and negative is becoming a subject of great interest in sentiment analysis. The existence of many labeled opinions motivates the use of statistical and machine-learning methods. First-order statistics have proven to be very limited in this field. The Opinum approach is based on the order of the words without using any syntactic and semantic information. It consists of building one probabilistic model for the positive and another one for the negative opinions. Then the test opinions are compared to both models and a decision and confidence measure are calculated. In order to reduce the complexity of the training corpus we first lemmatize the texts and we replace most named entities with wildcards. Opinum presents an accuracy above 81% for Spanish opinions in the financial products domain. In this work we discuss which are the most important factors that have impact on the classification performance.