data biases

Reducing Disambiguation Biases in NMT by Leveraging Explicit Word Sense Information

We propose a novel approach for automatically creating high-precision sense-annotated parallel corpora, together with a tailored fine-tuning strategy that exploits these sense annotations during training. The strategy improves word sense disambiguation in machine translation without introducing any additional requirements at inference time.

Wino-X: Multilingual Winograd Schemas for Commonsense Reasoning and Coreference Resolution

We introduce the novel Wino-X benchmark to investigate whether translation models can perform coreference resolution that requires commonsense knowledge and whether multilingual language models are capable of commonsense reasoning across multiple languages. Our findings indicate that models are prone to biases and often fail to identify disambiguating information.

Detecting Word Sense Disambiguation Biases in Machine Translation for Model-Agnostic Adversarial Attacks

We introduce a method for predicting disambiguation errors from statistical properties of the data, and develop a simple adversarial attack strategy that minimally perturbs sentences to elicit such errors, further probing the robustness of translation models.