Detecting Word Sense Disambiguation Biases in Machine Translation for Model-Agnostic Adversarial Attacks

We introduce a method for predicting disambiguation errors from statistical properties of the data, and develop a simple adversarial attack strategy that minimally perturbs sentences to elicit such errors, further probing the robustness of translation models.