We introduce a method for predicting disambiguation errors from statistical properties of the data, and develop a simple adversarial attack strategy that minimally perturbs sentences to elicit such errors, further probing the robustness of translation models.
We argue that the need to represent and propagate lexical features in each layer limits the transformer’s capacity for learning and representing contextual information. To alleviate this bottleneck, we introduce gated shortcut connections between the embedding layer and each subsequent layer within the encoder and decoder. These connections let the model access relevant lexical content dynamically, without expending limited resources on storing it within intermediate states.
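The idea of a gated shortcut can be illustrated with a minimal sketch. The abstract does not specify the gate's exact form, so everything below is an assumption made for illustration: a per-dimension sigmoid gate computed from the embedding and the hidden state, and a convex mix of the two as output. The names `gated_shortcut`, `w_embed`, `w_hidden`, and `bias` are hypothetical and are not taken from the paper.

```python
import math


def sigmoid(x):
    """Logistic function, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def gated_shortcut(embedding, hidden, w_embed, w_hidden, bias):
    """Mix a token embedding into a layer's hidden state through a gate.

    Assumed form (not confirmed by the source):
        gate   = sigmoid(w_embed @ embedding + w_hidden @ hidden + bias)
        output = gate * embedding + (1 - gate) * hidden
    All vectors share one dimensionality; the gate acts per dimension.
    """
    from_embedding = matvec(w_embed, embedding)
    from_hidden = matvec(w_hidden, hidden)
    gate = [
        sigmoid(e + h + b)
        for e, h, b in zip(from_embedding, from_hidden, bias)
    ]
    # A gate near 1 passes lexical (embedding) content through;
    # a gate near 0 keeps the contextual hidden state instead.
    return [
        g * e + (1.0 - g) * h
        for g, e, h in zip(gate, embedding, hidden)
    ]
```

In this sketch, a strongly positive gate bias makes the layer read mostly from the embedding, while a strongly negative one leaves the hidden state nearly unchanged. In a trained model the gate parameters would be learned per layer, so each layer could decide how much lexical content to pull in.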
The University of Edinburgh made submissions to all 14 language pairs in the news translation task, with strong performances in most pairs. We introduce a new RNN variant, mixed RNN/Transformer ensembles, data selection and weighting, and extensions to back-translation.