
Injecting Domain Knowledge in Language Models for Task-oriented Dialogue Systems

We utilize lightweight adapters that can be easily integrated with pre-trained language models (PLMs) and serve as a repository for facts learned from different knowledge bases (KBs), and we introduce Knowledge Probing using Response Selection (KPRS) – a probe designed specifically for task-oriented dialogue (TOD) models.
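
The abstract does not spell out the adapter architecture. Below is a minimal sketch, assuming a standard bottleneck adapter inserted into a frozen PLM layer and one adapter per knowledge base; the class name, dimensions, and KB names are illustrative, not taken from the paper.

```python
import torch
import torch.nn as nn

class BottleneckAdapter(nn.Module):
    """Minimal bottleneck adapter: down-project, non-linearity, up-project,
    plus a residual connection. When injecting knowledge, only these
    parameters are trained while the PLM weights stay frozen."""

    def __init__(self, hidden_size: int = 768, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_size, bottleneck)
        self.up = nn.Linear(bottleneck, hidden_size)
        self.act = nn.ReLU()

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        return hidden_states + self.up(self.act(self.down(hidden_states)))

# Illustrative use: one adapter per knowledge base, applied to a layer's output.
adapters = {"hotel_kb": BottleneckAdapter(), "restaurant_kb": BottleneckAdapter()}
layer_output = torch.randn(2, 10, 768)         # (batch, sequence, hidden)
adapted = adapters["hotel_kb"](layer_output)   # inject facts from one KB
```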

Reducing Disambiguation Biases in NMT by Leveraging Explicit Word Sense Information

We present a novel approach for automatically creating high-precision sense-annotated parallel corpora, together with a tailored fine-tuning strategy that exploits these sense annotations during training, improving word sense disambiguation in machine translation without introducing any additional requirements at inference time.
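
One common way to expose sense annotations only at training time is to add a learned sense embedding to the token embedding of annotated words; the sketch below illustrates that idea with invented names and sizes and is not necessarily the paper's exact recipe.

```python
import torch
import torch.nn as nn

class SenseAugmentedEmbedding(nn.Module):
    """Token embedding that adds a sense embedding for annotated tokens.
    When no sense ids are supplied (inference), it reduces to a plain
    embedding lookup, so no extra requirement is introduced at test time."""

    def __init__(self, vocab_size: int, num_senses: int, dim: int = 512):
        super().__init__()
        self.tok = nn.Embedding(vocab_size, dim)
        self.sense = nn.Embedding(num_senses + 1, dim, padding_idx=0)  # id 0 = "no sense"

    def forward(self, token_ids, sense_ids=None):
        emb = self.tok(token_ids)
        if sense_ids is not None:  # fine-tuning on sense-annotated corpora
            emb = emb + self.sense(sense_ids)
        return emb

embed = SenseAugmentedEmbedding(vocab_size=32000, num_senses=2000)
tokens = torch.tensor([[15, 873, 42]])
senses = torch.tensor([[0, 117, 0]])   # only the ambiguous token carries a sense id
train_emb = embed(tokens, senses)      # training: annotations available
test_emb = embed(tokens)               # inference: plain text only
```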

Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

We introduce the Beyond the Imitation Game benchmark (BIG-bench) to inform future research into (large-scale) language modeling, prepare for disruptive new model capabilities, and ameliorate socially harmful effects. A thorough evaluation of state-of-the-art language models illustrates the challenging nature of BIG-bench.

Wino-X: Multilingual Winograd Schemas for Commonsense Reasoning and Coreference Resolution

We introduce the novel Wino-X benchmark to investigate whether translation models can perform coreference resolution that requires commonsense knowledge and whether multilingual language models are capable of commonsense reasoning across multiple languages. Our findings indicate that models are prone to biases and often fail to identify disambiguating information.
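
As an illustration of what such a schema probes (the example below is invented for exposition and is not taken from the dataset), the English pronoun is ambiguous while the German translation must commit to a gendered antecedent:

```python
# Invented, illustrative instance in the spirit of Wino-X (not from the dataset):
# the English "it" is ambiguous, but German forces a choice between "sie"
# (feminine, the trophy) and "er" (masculine, the suitcase).
instance = {
    "source_en": "The trophy didn't fit into the suitcase because it was too big.",
    "translation_1_de": "Die Trophäe passte nicht in den Koffer, weil sie zu groß war.",
    "translation_2_de": "Die Trophäe passte nicht in den Koffer, weil er zu groß war.",
    "answer": "translation_1_de",  # commonsense: the thing that is too big is the trophy
}
```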

Detecting Word Sense Disambiguation Biases in Machine Translation for Model-Agnostic Adversarial Attacks

We introduce a method for predicting disambiguation errors based on statistical data properties, and develop a simple adversarial attack strategy that minimally perturbs sentences in order to elicit disambiguation errors, further probing the robustness of translation models.
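
A minimal sketch of such an attack, assuming single-word substitutions in the context of the ambiguous word and placeholder translate / get_sense_translation callables (all names here are illustrative, not the paper's implementation):

```python
def minimal_perturbation_attack(sentence, ambiguous_word, substitutions,
                                translate, get_sense_translation):
    """Try single-word substitutions around `ambiguous_word` and return the
    first perturbed sentence for which the translation of the ambiguous word
    changes, i.e. a disambiguation error is elicited. The substitution
    candidates would be derived from statistical data properties (e.g. words
    that frequently co-occur with a competing sense)."""
    original = get_sense_translation(translate(sentence), ambiguous_word)
    tokens = sentence.split()
    for i, tok in enumerate(tokens):
        if tok == ambiguous_word:
            continue
        for sub in substitutions.get(tok, []):
            perturbed = " ".join(tokens[:i] + [sub] + tokens[i + 1:])
            if get_sense_translation(translate(perturbed), ambiguous_word) != original:
                return perturbed   # minimally perturbed adversarial input
    return None                    # no single-word substitution flips the sense
```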

Widening the Representation Bottleneck in Neural Machine Translation with Lexical Shortcuts

We argue that the need to represent and propagate lexical features in each layer limits the transformer’s capacity for learning and representing contextual information. To alleviate this bottleneck, we introduce gated shortcut connections between the embedding layer and each subsequent layer within the encoder and decoder, which enables the model to access relevant lexical content dynamically, without expending limited resources on storing it within intermediate states.
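
A simplified sketch of such a gated shortcut (the paper applies the mechanism inside attention; here it is reduced to a per-layer gate between embeddings and hidden states, with illustrative dimensions):

```python
import torch
import torch.nn as nn

class LexicalShortcut(nn.Module):
    """Gated shortcut from the embedding layer to a deeper layer's input.
    A sigmoid gate decides, per position and feature, how much lexical
    content from the embeddings to re-inject into the hidden state."""

    def __init__(self, dim: int = 512):
        super().__init__()
        self.gate = nn.Linear(2 * dim, dim)

    def forward(self, hidden: torch.Tensor, embedded: torch.Tensor) -> torch.Tensor:
        g = torch.sigmoid(self.gate(torch.cat([hidden, embedded], dim=-1)))
        return g * embedded + (1.0 - g) * hidden

shortcut = LexicalShortcut()
embedded = torch.randn(2, 10, 512)   # output of the embedding layer
hidden = torch.randn(2, 10, 512)     # state entering layer l
layer_input = shortcut(hidden, embedded)
```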

The University of Edinburgh's Submissions to the WMT18 News Translation Task

The University of Edinburgh made submissions to all 14 language pairs in the news translation task, with strong performances in most pairs. We introduce a new RNN variant, mixed RNN/Transformer ensembles, data selection and weighting, and extensions to back-translation.
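
For reference, back-translation pairs target-language monolingual text with machine translations into the source language produced by a reverse model; a minimal sketch with a placeholder reverse_translate callable (illustrative, not the submission's actual pipeline):

```python
def back_translate(monolingual_target_sentences, reverse_translate):
    """Create synthetic parallel data: pair each target-language sentence with
    a machine translation of it into the source language, produced by a
    target-to-source model. The source-to-target system is then trained on
    the union of genuine and synthetic pairs."""
    return [(reverse_translate(tgt), tgt) for tgt in monolingual_target_sentences]

# Illustrative use with a dummy reverse model:
pairs = back_translate(["Das ist ein Test."], reverse_translate=lambda s: "This is a test.")
```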