Machine translation, also known as automatic translation, is the process of using a computer to convert text in one natural language (the source language) into another natural language (the target language). It is a branch of computational linguistics, one of the long-standing goals of artificial intelligence, and an area of significant scientific research value.

Knowledge Collection

Machine Translation (机器翻译): 专知 Curated Collection

Getting Started

Surveys

Advanced Papers

1997

  1. Neco, R. P., & Forcada, M. L. (1997). Asynchronous translations with recurrent neural nets. In Proceedings of the International Conference on Neural Networks (Vol. 4, pp. 2535-2540). IEEE.
    [http://ieeexplore.ieee.org/document/614693/]

2003

  1. Bengio, Y., Ducharme, R., Vincent, P., & Jauvin, C. (2003). A neural probabilistic language model. Journal of Machine Learning Research, 3(Feb), 1137-1155.
    [http://www.jmlr.org/papers/volume3/bengio03a/bengio03a.pdf]

2010

  1. Sudoh, K., Duh, K., Tsukada, H., Hirao, T., & Nagata, M. (2010). Divide and translate: improving long distance reordering in statistical machine translation. In Proceedings of the Joint Fifth Workshop on Statistical Machine Translation and MetricsMATR (pp. 418-427). Association for Computational Linguistics.
    [https://dl.acm.org/citation.cfm?id=1868912]

2013

  1. Pascanu, R., Mikolov, T., & Bengio, Y. (2013). On the difficulty of training recurrent neural networks. In International Conference on Machine Learning (pp. 1310-1318).
    [http://arxiv.org/abs/1211.5063]
  2. Kalchbrenner, N., & Blunsom, P. (2013). Recurrent Continuous Translation Models. In EMNLP (Vol. 3, No. 39, p. 413).
    [https://www.researchgate.net/publication/289758666_Recurrent_continuous_translation_models]

2014

  1. Mnih, V., Heess, N., & Graves, A. (2014). Recurrent models of visual attention. In Advances in neural information processing systems (pp. 2204-2212).
    [http://arxiv.org/abs/1406.6247]
  2. Sutskever, I., Vinyals, O., & Le, Q. V. (2014). Sequence to sequence learning with neural networks. In Advances in neural information processing systems (pp. 3104-3112).
    [https://arxiv.org/abs/1409.3215]
  3. Cho, K., Van Merriënboer, B., Gulcehre, C., Bahdanau, D., Bougares, F., Schwenk, H., & Bengio, Y. (2014). Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv preprint arXiv:1406.1078.
    [http://arxiv.org/abs/1406.1078]
  4. Bahdanau, D., Cho, K., & Bengio, Y. (2014). Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473.
    [https://arxiv.org/abs/1409.0473]
  5. Jean, S., Cho, K., Memisevic, R., & Bengio, Y. (2014). On using very large target vocabulary for neural machine translation. arXiv preprint arXiv:1412.2007.
    [http://arxiv.org/abs/1412.2007]
  6. Luong, M. T., Sutskever, I., Le, Q. V., Vinyals, O., & Zaremba, W. (2014). Addressing the rare word problem in neural machine translation. arXiv preprint arXiv:1410.8206.
    [http://arxiv.org/abs/1410.8206]
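The 2014 entries above (Sutskever et al.'s sequence-to-sequence learning, Cho et al.'s RNN encoder-decoder, and Bahdanau et al.'s jointly learned alignment) established the attention-based encoder-decoder paradigm. As a rough illustration of the additive ("Bahdanau-style") attention those papers introduce, here is a minimal NumPy sketch; the weight matrices and states are random placeholders, not trained parameters:

```python
import numpy as np

def additive_attention(s, H, W_s, W_h, v):
    """Bahdanau-style additive attention (arXiv:1409.0473).

    s: decoder state, shape (d,); H: encoder states, shape (T, d).
    Returns the context vector and the attention weights.
    """
    # Alignment scores e_j = v^T tanh(W_s s + W_h h_j)
    scores = np.tanh(s @ W_s.T + H @ W_h.T) @ v         # (T,)
    # Softmax over source positions (shifted for numerical stability)
    scores = scores - scores.max()
    weights = np.exp(scores) / np.exp(scores).sum()     # (T,)
    # Context = expected encoder state under the attention weights
    context = weights @ H                               # (d,)
    return context, weights

rng = np.random.default_rng(0)
d, T = 4, 6
s = rng.normal(size=d)            # placeholder decoder state
H = rng.normal(size=(T, d))       # placeholder encoder states
W_s = rng.normal(size=(d, d))
W_h = rng.normal(size=(d, d))
v = rng.normal(size=d)
context, weights = additive_attention(s, H, W_s, W_h, v)
```

The weights form a distribution over source positions, so the context vector is a convex combination of encoder states.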

2015

  1. Sennrich, R., Haddow, B., & Birch, A. (2015). Improving neural machine translation models with monolingual data. arXiv preprint arXiv:1511.06709.
    [http://arxiv.org/abs/1511.06709]
  2. Dong, D., Wu, H., He, W., Yu, D., & Wang, H. (2015). Multi-Task Learning for Multiple Language Translation. In ACL (1) (pp. 1723-1732).
    [http://www.anthology.aclweb.org/P/P15/P15-1166.pdf]
  3. Shen, S., Cheng, Y., He, Z., He, W., Wu, H., Sun, M., & Liu, Y. (2015). Minimum risk training for neural machine translation. arXiv preprint arXiv:1512.02433.
    [https://arxiv.org/abs/1512.02433]
  4. Bojar, O., Chatterjee, R., Federmann, C., et al. (2015). Findings of the 2015 Workshop on Statistical Machine Translation. In Proceedings of the Tenth Workshop on Statistical Machine Translation.
    [https://www-test.pure.ed.ac.uk/portal/files/23139669/W15_3001.pdf]

2016

  1. Wu, Y., Schuster, M., Chen, Z., Le, Q. V., Norouzi, M., Macherey, W., … & Klingner, J. (2016). Google's neural machine translation system: Bridging the gap between human and machine translation. arXiv preprint arXiv:1609.08144.
    [https://arxiv.org/abs/1609.08144v1]
  2. Gehring, J., Auli, M., Grangier, D., & Dauphin, Y. N. (2016). A convolutional encoder model for neural machine translation. arXiv preprint arXiv:1611.02344.
    [https://arxiv.org/abs/1611.02344]
  3. Cheng, Y., Xu, W., He, Z., He, W., Wu, H., Sun, M., & Liu, Y. (2016). Semi-supervised learning for neural machine translation. arXiv preprint arXiv:1606.04596.
    [http://arxiv.org/abs/1606.04596]
  4. Wang, M., Lu, Z., Li, H., & Liu, Q. (2016). Memory-enhanced decoder for neural machine translation. arXiv preprint arXiv:1606.02003.
    [https://arxiv.org/abs/1606.02003]
  5. Sennrich, R., & Haddow, B. (2016). Linguistic input features improve neural machine translation. arXiv preprint arXiv:1606.02892.
    [http://arxiv.org/abs/1606.02892]
  6. Tu, Z., Lu, Z., Liu, Y., Liu, X., & Li, H. (2016). Modeling coverage for neural machine translation. arXiv preprint arXiv:1601.04811.
    [http://arxiv.org/abs/1601.04811]
  7. Cohn, T., Hoang, C. D. V., Vymolova, E., Yao, K., Dyer, C., & Haffari, G. (2016). Incorporating structural alignment biases into an attentional neural translation model. arXiv preprint arXiv:1601.01085.
    [http://www.m-mitchell.com/NAACL-2016/NAACL-HLT2016/pdf/N16-1102.pdf]
  8. Hitschler, J., Schamoni, S., & Riezler, S. (2016). Multimodal pivots for image caption translation. arXiv preprint arXiv:1601.03916.
    [https://arxiv.org/abs/1601.03916]
  9. Junczys-Dowmunt, M., Dwojak, T., & Hoang, H. (2016). Is neural machine translation ready for deployment? A case study on 30 translation directions. arXiv preprint arXiv:1610.01108.
    [https://arxiv.org/abs/1610.01108]
  10. Johnson, M., Schuster, M., Le, Q. V., Krikun, M., Wu, Y., Chen, Z., … & Hughes, M. (2016). Google's multilingual neural machine translation system: enabling zero-shot translation. arXiv preprint arXiv:1611.04558.
    [https://arxiv.org/abs/1611.04558]
  11. Bartolome, D., & Ramirez, G. (2016). "Beyond the Hype of Neural Machine Translation." MIT Technology Review (May 23, 2016), bit.ly/2aG4bvR.
    [https://www.slideshare.net/TAUS/beyond-the-hype-of-neural-machine-translation-diego-bartolome-tauyou-and-gema-ramirez-prompsit-language-engineering]
  12. Crego, J., Kim, J., Klein, G., Rebollo, A., Yang, K., Senellart, J., … & Enoue, S. (2016). SYSTRAN's Pure Neural Machine Translation Systems. arXiv preprint arXiv:1610.05540.
    [https://arxiv.org/abs/1610.05540]

2017

  1. Huang, P. S., Wang, C., Zhou, D., & Deng, L. (2017). Neural Phrase-based Machine Translation. arXiv preprint arXiv:1706.05565. (Microsoft)
    [http://arxiv.org/abs/1706.05565]
  2. A Neural Network for Machine Translation, at Production Scale. (2017). Google Research Blog. Retrieved 26 July 2017, from [https://research.googleblog.com/2016/09/a-neural-network-for-machine.html]
    [http://www.googblogs.com/a-neural-network-for-machine-translation-at-production-scale/]
  3. Gehring, J., Auli, M., Grangier, D., Yarats, D., & Dauphin, Y. N. (2017). Convolutional Sequence to Sequence Learning. arXiv preprint arXiv:1705.03122. (Facebook)
    [https://arxiv.org/abs/1705.03122]
  4. Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., … & Polosukhin, I. (2017). Attention Is All You Need. arXiv preprint arXiv:1706.03762. (Google)
    [https://arxiv.org/abs/1706.03762]
  5. Train Neural Machine Translation Models with Sockeye. (2017). Amazon Web Services Blog. Retrieved 26 July 2017, from
    [https://aws.amazon.com/blogs/ai/train-neural-machine-translation-models-with-sockeye/]
  6. Dandekar, N. (2017). How does an attention mechanism work in deep learning for natural language processing? Quora. Retrieved 26 July 2017, from
    [https://www.quora.com/How-does-an-attention-mechanism-work-in-deep-learning-for-natural-language-processing]
  7. Microsoft Translator launching Neural Network based translations for all its speech languages. (2017). Microsoft Translator Blog. Retrieved 27 July 2017, from
    [https://blogs.msdn.microsoft.com/translation/2016/11/15/microsoft-translator-launching-neural-network-based-translations-for-all-its-speech-languages/]
  8. ACL 2017. (2017). Accepted Papers, Demonstrations and TACL Articles for ACL 2017. [online] Available at:
    [https://chairs-blog.acl2017.org/2017/04/05/accepted-papers-and-demonstrations/] [Accessed 7 Aug. 2017].
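The Transformer of "Attention Is All You Need" replaces recurrence with scaled dot-product attention, Attention(Q, K, V) = softmax(QKᵀ/√d_k)V. A minimal NumPy sketch with random placeholder inputs (no multi-head projection or masking):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V,
    as defined in 'Attention Is All You Need' (arXiv:1706.03762)."""
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                      # (n_q, n_k)
    # Row-wise softmax over the keys
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V, weights

rng = np.random.default_rng(1)
Q = rng.normal(size=(3, 8))    # 3 queries, d_k = 8 (placeholder values)
K = rng.normal(size=(5, 8))    # 5 keys
V = rng.normal(size=(5, 16))   # 5 values, d_v = 16
out, w = scaled_dot_product_attention(Q, K, V)
```

The 1/√d_k scaling keeps the dot products from saturating the softmax as d_k grows, which is the paper's stated motivation for it.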

2018

  1. Miguel Domingo, Álvaro Peris and Francisco Casacuberta. 2018. Segment-based interactive-predictive machine translation. Machine Translation. [https://www.researchgate.net/publication/322275484_Segment-based_interactive-predictive_machine_translation] (Citation: 2)

  2. Xin Wang, Wenhu Chen, Yuan-Fang Wang, and William Yang Wang. 2018. No Metrics Are Perfect: Adversarial Reward Learning for Visual Storytelling. In Proceedings of ACL 2018. [http://aclweb.org/anthology/P18-1083] (Citation: 10)

  3. Arun Tejasvi Chaganty, Stephen Mussman, and Percy Liang. 2018. The price of debiasing automatic metrics in natural language evaluation. In Proceedings of ACL 2018. [https://arxiv.org/pdf/1807.02202]

  4. Lukasz Kaiser, Aidan N. Gomez, and Francois Chollet. 2018. Depthwise Separable Convolutions for Neural Machine Translation. In Proceedings of ICLR 2018. (Citation: 27)

  5. Yanyao Shen, Xu Tan, Di He, Tao Qin, and Tie-Yan Liu. 2018. Dense Information Flow for Neural Machine Translation. In Proceedings of NAACL 2018. (Citation: 3)

  6. Wenhu Chen, Guanlin Li, Shuo Ren, Shujie Liu, Zhirui Zhang, Mu Li, and Ming Zhou. 2018. Generative Bridging Network for Neural Sequence Prediction. In Proceedings of NAACL 2018. (Citation: 3)

  7. Mia Xu Chen, Orhan Firat, Ankur Bapna, Melvin Johnson, Wolfgang Macherey, George Foster, Llion Jones, Mike Schuster, Noam Shazeer, Niki Parmar, Ashish Vaswani, Jakob Uszkoreit, Lukasz Kaiser, Zhifeng Chen, Yonghui Wu, and Macduff Hughes. 2018. The Best of Both Worlds: Combining Recent Advances in Neural Machine Translation. In Proceedings of ACL 2018. (Citation: 22)

  8. Weiyue Wang, Derui Zhu, Tamer Alkhouli, Zixuan Gan, and Hermann Ney. 2018. Neural Hidden Markov Model for Machine Translation. In Proceedings of ACL 2018. (Citation: 3)

  9. Jingjing Gong, Xipeng Qiu, Shaojing Wang, and Xuanjing Huang. 2018. Information Aggregation via Dynamic Routing for Sequence Encoding. In Proceedings of COLING 2018.

  10. Qiang Wang, Fuxue Li, Tong Xiao, Yanyang Li, Yinqiao Li, and Jingbo Zhu. 2018. Multi-layer Representation Fusion for Neural Machine Translation. In Proceedings of COLING 2018.

  11. Yachao Li, Junhui Li, and Min Zhang. 2018. Adaptive Weighting for Neural Machine Translation. In Proceedings of COLING 2018.

  12. Kaitao Song, Xu Tan, Di He, Jianfeng Lu, Tao Qin, and Tie-Yan Liu. 2018. Double Path Networks for Sequence to Sequence Learning. In Proceedings of COLING 2018.

  13. Zi-Yi Dou, Zhaopeng Tu, Xing Wang, Shuming Shi, and Tong Zhang. 2018. Exploiting Deep Representations for Neural Machine Translation. In Proceedings of EMNLP 2018. (Citation: 1)

  14. Biao Zhang, Deyi Xiong, Jinsong Su, Qian Lin, and Huiji Zhang. 2018. Simplifying Neural Machine Translation with Addition-Subtraction Twin-Gated Recurrent Networks. In Proceedings of EMNLP 2018.

  15. Gongbo Tang, Mathias Müller, Annette Rios, and Rico Sennrich. 2018. Why Self-Attention? A Targeted Evaluation of Neural Machine Translation Architectures. In Proceedings of EMNLP 2018. (Citation: 6)

  16. Ke Tran, Arianna Bisazza, and Christof Monz. 2018. The Importance of Being Recurrent for Modeling Hierarchical Structure. In Proceedings of EMNLP 2018. (Citation: 6)

  17. Parnia Bahar, Christopher Brix, and Hermann Ney. 2018. Towards Two-Dimensional Sequence to Sequence Model in Neural Machine Translation. In Proceedings of EMNLP 2018. (Citation: 1)

  18. Tianyu He, Xu Tan, Yingce Xia, Di He, Tao Qin, Zhibo Chen, and Tie-Yan Liu. 2018. Layer-Wise Coordination between Encoder and Decoder for Neural Machine Translation. In Proceedings of NeurIPS 2018. (Citation: 2)

  19. Harshil Shah and David Barber. 2018. Generative Neural Machine Translation. In Proceedings of NeurIPS 2018.

  20. Hany Hassan, Anthony Aue, Chang Chen, Vishal Chowdhary, Jonathan Clark, Christian Federmann, Xuedong Huang, Marcin Junczys-Dowmunt, William Lewis, Mu Li, Shujie Liu, Tie-Yan Liu, Renqian Luo, Arul Menezes, Tao Qin, Frank Seide, Xu Tan, Fei Tian, Lijun Wu, Shuangzhi Wu, Yingce Xia, Dongdong Zhang, Zhirui Zhang, and Ming Zhou. 2018. Achieving Human Parity on Automatic Chinese to English News Translation. Technical report. Microsoft AI & Research. (Citation: 41)

  21. Tao Shen, Tianyi Zhou, Guodong Long, Jing Jiang, Shirui Pan, and Chengqi Zhang. 2018. DiSAN: Directional Self-Attention Network for RNN/CNN-Free Language Understanding. In Proceedings of AAAI 2018. (Citation: 60)

  22. Tao Shen, Tianyi Zhou, Guodong Long, Jing Jiang, and Chengqi Zhang. 2018. Bi-directional Block Self-attention for Fast and Memory-efficient Sequence Modeling. In Proceedings of ICLR 2018. (Citation: 13)

  23. Tao Shen, Tianyi Zhou, Guodong Long, Jing Jiang, Sen Wang, and Chengqi Zhang. 2018. Reinforced Self-Attention Network: a Hybrid of Hard and Soft Attention for Sequence Modeling. In Proceedings of IJCAI 2018. (Citation: 18)

  24. Peter Shaw, Jakob Uszkoreit, and Ashish Vaswani. 2018. Self-Attention with Relative Position Representations. In Proceedings of NAACL 2018. (Citation: 24)

  25. Lesly Miculicich Werlen, Nikolaos Pappas, Dhananjay Ram, and Andrei Popescu-Belis. 2018. Self-Attentive Residual Decoder for Neural Machine Translation. In Proceedings of NAACL 2018. (Citation: 3)

  26. Xintong Li, Lemao Liu, Zhaopeng Tu, Shuming Shi, and Max Meng. 2018. Target Foresight Based Attention for Neural Machine Translation. In Proceedings of NAACL 2018.

  27. Biao Zhang, Deyi Xiong, and Jinsong Su. 2018. Accelerating Neural Transformer via an Average Attention Network. In Proceedings of ACL 2018. (Citation: 5)

  28. Tobias Domhan. 2018. How Much Attention Do You Need? A Granular Analysis of Neural Machine Translation Architectures. In Proceedings of ACL 2018. (Citation: 3)

  29. Shaohui Kuang, Junhui Li, António Branco, Weihua Luo, and Deyi Xiong. 2018. Attention Focusing for Neural Machine Translation by Bridging Source and Target Embeddings. In Proceedings of ACL 2018. (Citation: 1)

  30. Chaitanya Malaviya, Pedro Ferreira, and André F. T. Martins. 2018. Sparse and Constrained Attention for Neural Machine Translation. In Proceedings of ACL 2018. (Citation: 4)

  31. Jian Li, Zhaopeng Tu, Baosong Yang, Michael R. Lyu, and Tong Zhang. 2018. Multi-Head Attention with Disagreement Regularization. In Proceedings of EMNLP 2018. (Citation: 1)

  32. Wei Wu, Houfeng Wang, Tianyu Liu, and Shuming Ma. 2018. Phrase-level Self-Attention Networks for Universal Sentence Encoding. In Proceedings of EMNLP 2018.

  33. Baosong Yang, Zhaopeng Tu, Derek F. Wong, Fandong Meng, Lidia S. Chao, and Tong Zhang. 2018. Modeling Localness for Self-Attention Networks. In Proceedings of EMNLP 2018. (Citation: 2)

  34. Junyang Lin, Xu Sun, Xuancheng Ren, Muyu Li, and Qi Su. 2018. Learning When to Concentrate or Divert Attention: Self-Adaptive Attention Temperature for Neural Machine Translation. In Proceedings of EMNLP 2018.

  35. Shiv Shankar, Siddhant Garg, and Sunita Sarawagi. 2018. Surprisingly Easy Hard-Attention for Sequence to Sequence Learning. In Proceedings of EMNLP 2018.

  36. Ankur Bapna, Mia Chen, Orhan Firat, Yuan Cao, and Yonghui Wu. 2018. Training Deeper Neural Machine Translation Models with Transparent Attention. In Proceedings of EMNLP 2018.

  37. Hareesh Bahuleyan, Lili Mou, Olga Vechtomova, and Pascal Poupart. 2018. Variational Attention for Sequence-to-Sequence Models. In Proceedings of COLING 2018. (Citation: 14)

  38. Maha Elbayad, Laurent Besacier, and Jakob Verbeek. 2018. Pervasive Attention: 2D Convolutional Neural Networks for Sequence-to-Sequence Prediction. In Proceedings of CoNLL 2018. (Citation: 4)

  39. Yuntian Deng, Yoon Kim, Justin Chiu, Demi Guo, and Alexander M. Rush. 2018. Latent Alignment and Variational Attention. In Proceedings of NeurIPS 2018.

  40. Peyman Passban, Qun Liu, and Andy Way. 2018. Improving Character-Based Decoding Using Target-Side Morphological Information for Neural Machine Translation. In Proceedings of NAACL 2018. (Citation: 5)

  41. Huadong Chen, Shujian Huang, David Chiang, Xinyu Dai, and Jiajun Chen. 2018. Combining Character and Word Information in Neural Machine Translation Using a Multi-Level Attention. In Proceedings of NAACL 2018.

  42. Frederick Liu, Han Lu, and Graham Neubig. 2018. Handling Homographs in Neural Machine Translation. In Proceedings of NAACL 2018. (Citation: 8)

  43. Taku Kudo. 2018. Subword Regularization: Improving Neural Network Translation Models with Multiple Subword Candidates. In Proceedings of ACL 2018. (Citation: 17)

  44. Makoto Morishita, Jun Suzuki, and Masaaki Nagata. 2018. Improving Neural Machine Translation by Incorporating Hierarchical Subword Features. In Proceedings of COLING 2018.

  45. Yang Zhao, Jiajun Zhang, Zhongjun He, Chengqing Zong, and Hua Wu. 2018. Addressing Troublesome Words in Neural Machine Translation. In Proceedings of EMNLP 2018.

  46. Colin Cherry, George Foster, Ankur Bapna, Orhan Firat, and Wolfgang Macherey. 2018. Revisiting Character-Based Neural Machine Translation with Capacity and Compression. In Proceedings of EMNLP 2018. (Citation: 1)

  47. Rebecca Knowles and Philipp Koehn. 2018. Context and Copying in Neural Machine Translation. In Proceedings of EMNLP 2018.

  48. Sergey Edunov, Myle Ott, Michael Auli, David Grangier, and Marc’Aurelio Ranzato. 2018. Classical Structured Prediction Losses for Sequence to Sequence Learning. In Proceedings of NAACL 2018. (Citation: 20)

  49. Zihang Dai, Qizhe Xie, and Eduard Hovy. 2018. From Credit Assignment to Entropy Regularization: Two New Algorithms for Neural Sequence Prediction. In Proceedings of ACL 2018. (Citation: 1)

  50. Zhen Yang, Wei Chen, Feng Wang, and Bo Xu. 2018. Improving Neural Machine Translation with Conditional Sequence Generative Adversarial Nets. In Proceedings of NAACL 2018. (Citation: 43)

  51. Kevin Clark, Minh-Thang Luong, Christopher D. Manning, and Quoc Le. 2018. Semi-Supervised Sequence Modeling with Cross-View Training. In Proceedings of EMNLP 2018.

  52. Lijun Wu, Fei Tian, Tao Qin, Jianhuang Lai, and Tie-Yan Liu. 2018. A Study of Reinforcement Learning for Neural Machine Translation. In Proceedings of EMNLP 2018. (Citation: 2)

  53. Jason Lee, Elman Mansimov, and Kyunghyun Cho. 2018. Deterministic Non-Autoregressive Neural Sequence Modeling by Iterative Refinement. In Proceedings of EMNLP 2018.

  54. Semih Yavuz, Chung-Cheng Chiu, Patrick Nguyen, and Yonghui Wu. 2018. CaLcs: Continuously Approximating Longest Common Subsequence for Sequence Level Optimization. In Proceedings of EMNLP 2018.

  55. Lijun Wu, Fei Tian, Yingce Xia, Yang Fan, Tao Qin, Jianhuang Lai, and Tie-Yan Liu. 2018. Learning to Teach with Dynamic Loss Functions. In Proceedings of NeurIPS 2018.

  56. Jiatao Gu, James Bradbury, Caiming Xiong, Victor O.K. Li, and Richard Socher. 2018. Non-Autoregressive Neural Machine Translation. In Proceedings of ICLR 2018. (Citation: 23)

  57. Łukasz Kaiser, Aurko Roy, Ashish Vaswani, Niki Parmar, Samy Bengio, Jakob Uszkoreit, and Noam Shazeer. 2018. Fast Decoding in Sequence Models Using Discrete Latent Variables. In Proceedings of ICML 2018. (Citation: 3)

  58. Xiangwen Zhang, Jinsong Su, Yue Qin, Yang Liu, Rongrong Ji, and Hongji Wang. 2018. Asynchronous Bidirectional Decoding for Neural Machine Translation. In Proceedings of AAAI 2018. (Citation: 10)

  59. Jiatao Gu, Daniel Jiwoong Im, and Victor O.K. Li. 2018. Neural machine translation with gumbel-greedy decoding. In Proceedings of AAAI 2018. (Citation: 5)

  60. Philip Schulz, Wilker Aziz, and Trevor Cohn. 2018. A Stochastic Decoder for Neural Machine Translation. In Proceedings of ACL 2018. (Citation: 3)

  61. Raphael Shu and Hideki Nakayama. 2018. Improving Beam Search by Removing Monotonic Constraint for Neural Machine Translation. In Proceedings of ACL 2018.

  62. Junyang Lin, Xu Sun, Xuancheng Ren, Shuming Ma, Jinsong Su, and Qi Su. 2018. Deconvolution-Based Global Decoding for Neural Machine Translation. In Proceedings of COLING 2018. (Citation: 2)

  63. Chunqi Wang, Ji Zhang, and Haiqing Chen. 2018. Semi-Autoregressive Neural Machine Translation. In Proceedings of EMNLP 2018.

  64. Xinwei Geng, Xiaocheng Feng, Bing Qin, and Ting Liu. 2018. Adaptive Multi-pass Decoder for Neural Machine Translation. In Proceedings of EMNLP 2018.

  65. Wen Zhang, Liang Huang, Yang Feng, Lei Shen, and Qun Liu. 2018. Speeding Up Neural Machine Translation Decoding by Cube Pruning. In Proceedings of EMNLP 2018.

  66. Xinyi Wang, Hieu Pham, Pengcheng Yin, and Graham Neubig. 2018. A Tree-based Decoder for Neural Machine Translation. In Proceedings of EMNLP 2018. (Citation: 1)

  67. Chenze Shao, Xilin Chen, and Yang Feng. 2018. Greedy Search with Probabilistic N-gram Matching for Neural Machine Translation. In Proceedings of EMNLP 2018.

  68. Zhisong Zhang, Rui Wang, Masao Utiyama, Eiichiro Sumita, and Hai Zhao. 2018. Exploring Recombination for Efficient Decoding of Neural Machine Translation. In Proceedings of EMNLP 2018.

  69. Jetic Gū, Hassan S. Shavarani, and Anoop Sarkar. 2018. Top-down Tree Structured Decoding with Syntactic Connections for Neural Machine Translation and Parsing. In Proceedings of EMNLP 2018.

  70. Yilin Yang, Liang Huang, and Mingbo Ma. 2018. Breaking the Beam Search Curse: A Study of (Re-)Scoring Methods and Stopping Criteria for Neural Machine Translation. In Proceedings of EMNLP 2018. (Citation: 3)

  71. Yun Chen, Victor O.K. Li, Kyunghyun Cho, and Samuel R. Bowman. 2018. A Stable and Effective Learning Strategy for Trainable Greedy Decoding. In Proceedings of EMNLP 2018.
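Many of the 2018 decoding papers above (cube pruning, recombination, beam-search rescoring and stopping criteria) modify the same baseline algorithm: beam search. Here is a toy Python sketch of plain beam search; the step function is a hypothetical stand-in for a real NMT decoder, not any particular system:

```python
import math

def beam_search(step_logprobs, beam_size, max_len, eos=0):
    """Plain beam search. step_logprobs(prefix) returns a dict mapping
    next token -> log-probability; eos marks a finished hypothesis."""
    beams = [((), 0.0)]                    # (token tuple, cumulative log-prob)
    finished = []
    for _ in range(max_len):
        # Expand every live hypothesis by every candidate next token
        candidates = []
        for prefix, score in beams:
            for tok, lp in step_logprobs(prefix).items():
                candidates.append((prefix + (tok,), score + lp))
        candidates.sort(key=lambda c: c[1], reverse=True)
        # Keep the top hypotheses, setting finished ones aside
        beams = []
        for prefix, score in candidates[: beam_size * 2]:
            if prefix[-1] == eos:
                finished.append((prefix, score))
            else:
                beams.append((prefix, score))
            if len(beams) == beam_size:
                break
        if not beams:
            break
    return max(finished + beams, key=lambda c: c[1])

# Hypothetical toy distribution: prefers token 1, forces EOS after 3 tokens.
def toy_step(prefix):
    if len(prefix) >= 3:
        return {0: math.log(1.0)}
    return {1: math.log(0.6), 2: math.log(0.3), 0: math.log(0.1)}

best, score = beam_search(toy_step, beam_size=2, max_len=5)
```

With this toy distribution the search returns the greedy-optimal sequence; the 2018 papers above study the many ways this breaks down with real NMT models (e.g. the "beam search curse" of larger beams yielding worse translations).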

2019

  1. Graham Neubig, Zi-Yi Dou, Junjie Hu, Paul Michel, Danish Pruthi, and Xinyi Wang. 2019. compare-mt: A Tool for Holistic Comparison of Language Generation Systems. In Proceedings of NAACL 2019.
  2. Robert Schwarzenberg, David Harbecke, Vivien Macketanz, Eleftherios Avramidis, and Sebastian Möller. 2019. Train, Sort, Explain: Learning to Diagnose Translation Models. In Proceedings of NAACL 2019.
  3. Nitika Mathur, Timothy Baldwin, and Trevor Cohn. 2019. Putting Evaluation in Context: Contextual Embeddings Improve Machine Translation Evaluation. In Proceedings of ACL 2019.
  4. Prathyusha Jwalapuram, Shafiq Joty, Irina Temnikova, and Preslav Nakov. 2019. Evaluating Pronominal Anaphora in Machine Translation: An Evaluation Measure and a Test Suite. In Proceedings of ACL 2019.
  5. Yikang Shen, Shawn Tan, Alessandro Sordoni, and Aaron Courville. 2019. Ordered Neurons: Integrating Tree Structures into Recurrent Neural Networks. In Proceedings of ICLR 2019.
  6. Felix Wu, Angela Fan, Alexei Baevski, Yann Dauphin, and Michael Auli. 2019. Pay Less Attention with Lightweight and Dynamic Convolutions. In Proceedings of ICLR 2019. (Citation: 1)
  7. Mostafa Dehghani, Stephan Gouws, Oriol Vinyals, Jakob Uszkoreit, and Lukasz Kaiser. 2019. Universal Transformers. In Proceedings of ICLR 2019. (Citation: 12)
  8. Zi-Yi Dou, Zhaopeng Tu, Xing Wang, Longyue Wang, Shuming Shi, and Tong Zhang. 2019. Dynamic Layer Aggregation for Neural Machine Translation with Routing-by-Agreement. In Proceedings of AAAI 2019.
  9. Zihang Dai, Zhilin Yang, Yiming Yang, Jaime Carbonell, Quoc V. Le, and Ruslan Salakhutdinov. 2019. Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context. In Proceedings of ACL 2019. (Citation: 8)
  10. Wenpeng Yin and Hinrich Schütze. 2019. Attentive Convolution: Equipping CNNs with RNN-style Attention Mechanisms. Transactions of the Association for Computational Linguistics.
  11. Shiv Shankar and Sunita Sarawagi. 2019. Posterior Attention Models for Sequence to Sequence Learning. In Proceedings of ICLR 2019.
  12. Baosong Yang, Jian Li, Derek Wong, Lidia S. Chao, Xing Wang, and Zhaopeng Tu. 2019. Context-Aware Self-Attention Networks. In Proceedings of AAAI 2019.
  13. Reza Ghaeini, Xiaoli Z. Fern, Hamed Shahbazi, and Prasad Tadepalli. 2019. Saliency Learning: Teaching the Model Where to Pay Attention. In Proceedings of NAACL 2019.
  14. Sameen Maruf, André F. T. Martins, and Gholamreza Haffari. 2019. Selective Attention for Context-aware Neural Machine Translation. In Proceedings of NAACL 2019.
  15. Sainbayar Sukhbaatar, Edouard Grave, Piotr Bojanowski, and Armand Joulin. 2019. Adaptive Attention Span in Transformers. In Proceedings of ACL 2019.
  16. Yiren Wang, Yingce Xia, Tianyu He, Fei Tian, Tao Qin, ChengXiang Zhai, and Tie-Yan Liu. 2019. Multi-Agent Dual Learning. In Proceedings of ICLR 2019.
  17. Liqun Chen, Yizhe Zhang, Ruiyi Zhang, Chenyang Tao, Zhe Gan, Haichao Zhang, Bai Li, Dinghan Shen, Changyou Chen, and Lawrence Carin. 2019. Improving Sequence-to-Sequence Learning via Optimal Transport. In Proceedings of ICLR 2019.
  18. Sachin Kumar and Yulia Tsvetkov. 2019. Von Mises-Fisher Loss for Training Sequence to Sequence Models with Continuous Outputs. In Proceedings of ICLR 2019.
  19. Xing Niu, Weijia Xu, and Marine Carpuat. 2019. Bi-Directional Differentiable Input Reconstruction for Low-Resource Neural Machine Translation. In Proceedings of NAACL 2019.
  20. Weijia Xu, Xing Niu, and Marine Carpuat. 2019. Differentiable Sampling with Flexible Reference Word Order for Neural Machine Translation. In Proceedings of NAACL 2019.
  21. Junliang Guo, Xu Tan, Di He, Tao Qin, Linli Xu, and Tie-Yan Liu. 2019. Non-Autoregressive Neural Machine Translation with Enhanced Decoder Input. In Proceedings of AAAI 2019. (Citation: 2)
  22. Yiren Wang, Fei Tian, Di He, Tao Qin, ChengXiang Zhai, and Tie-Yan Liu. 2019. Non-Autoregressive Machine Translation with Auxiliary Regularization. In Proceedings of AAAI 2019.
  23. Wouter Kool, Herke van Hoof, and Max Welling. 2019. Stochastic Beams and Where to Find Them: The Gumbel-Top-k Trick for Sampling Sequences Without Replacement. In Proceedings of ICML 2019.
  24. Ashwin Kalyan, Peter Anderson, Stefan Lee, and Dhruv Batra. 2019. Trainable Decoding of Sets of Sequences for Neural Sequence Models. In Proceedings of ICML 2019.
  25. Eldan Cohen and Christopher Beck. 2019. Empirical Analysis of Beam Search Performance Degradation in Neural Sequence Models. In Proceedings of ICML 2019.
  26. Kartik Goyal, Chris Dyer, and Taylor Berg-Kirkpatrick. 2019. An Empirical Investigation of Global and Local Normalization for Recurrent Neural Sequence Models Using a Continuous Relaxation to Beam Search. In Proceedings of NAACL 2019.
  27. Rico Sennrich and Biao Zhang. 2019. Revisiting Low-Resource Neural Machine Translation: A Case Study. In Proceedings of ACL 2019.
  28. Shuo Wang, Yang Liu, Chao Wang, Huanbo Luan, and Maosong Sun. 2019. Improving Back-Translation with Uncertainty-based Confidence Estimation. In Proceedings of EMNLP 2019.
  29. Jiawei Wu, Xin Wang, and William Yang Wang. 2019. Extract and Edit: An Alternative to Back-Translation for Unsupervised Neural Machine Translation. In Proceedings of NAACL 2019.
  30. Nima Pourdamghani, Nada Aldarrab, Marjan Ghazvininejad, Kevin Knight, and Jonathan May. 2019. Translating Translationese: A Two-Step Approach to Unsupervised Machine Translation. In Proceedings of ACL 2019.
  31. Jiaming Luo, Yuan Cao, and Regina Barzilay. 2019. Neural Decipherment via Minimum-Cost Flow: from Ugaritic to Linear B. In Proceedings of ACL 2019.
  32. Yichong Leng, Xu Tan, Tao Qin, Xiang-Yang Li, and Tie-Yan Liu. 2019. Unsupervised Pivot Translation for Distant Languages. In Proceedings of ACL 2019.
  33. Mikel Artetxe, Gorka Labaka, and Eneko Agirre. 2019. An Effective Approach to Unsupervised Machine Translation. In Proceedings of ACL 2019.
  34. Mengzhou Xia, Xiang Kong, Antonios Anastasopoulos, and Graham Neubig. 2019. Generalized Data Augmentation for Low-Resource Translation. In Proceedings of ACL 2019.
  35. Jinhua Zhu, Fei Gao, Lijun Wu, Yingce Xia, Tao Qin, Wengang Zhou, Xueqi Cheng, and Tie-Yan Liu. 2019. Soft Contextual Data Augmentation for Neural Machine Translation. In Proceedings of ACL 2019.
  36. Chunting Zhou, Xuezhe Ma, Junjie Hu, and Graham Neubig. 2019. Handling Syntactic Divergence in Low-resource Machine Translation. In Proceedings of EMNLP 2019.
  37. Yuanpeng Li, Liang Zhao, Jianyu Wang, and Joel Hestness. 2019. Compositional Generalization for Primitive Substitutions. In Proceedings of EMNLP 2019.
  38. Yunsu Kim, Petre Petrov, Pavel Petrushkov, Shahram Khadivi, and Hermann Ney. 2019. Pivot-based Transfer Learning for Neural Machine Translation between Non-English Languages. In Proceedings of EMNLP 2019.
  39. Yunsu Kim, Yingbo Gao, and Hermann Ney. 2019. Effective Cross-lingual Transfer of Neural Machine Translation Models without Shared Vocabularies. In Proceedings of ACL 2019.
  40. Xu Tan, Yi Ren, Di He, Tao Qin, Zhou Zhao, and Tie-Yan Liu. 2019. Multilingual Neural Machine Translation with Knowledge Distillation. In Proceedings of ICLR 2019.
  41. Xinyi Wang, Hieu Pham, Philip Arthur, and Graham Neubig. 2019. Multilingual Neural Machine Translation With Soft Decoupled Encoding. In Proceedings of ICLR 2019.
  42. Maruan Al-Shedivat and Ankur P. Parikh. 2019. Consistency by Agreement in Zero-shot Neural Machine Translation. In Proceedings of NAACL 2019.
  43. Roee Aharoni, Melvin Johnson, and Orhan Firat. 2019. Massively Multilingual Neural Machine Translation. In Proceedings of NAACL 2019.
  44. Inigo Jauregi Unanue, Ehsan Zare Borzeshi, Nazanin Esmaili, and Massimo Piccardi. ReWE: Regressing Word Embeddings for Regularization of Neural Machine Translation Systems. In Proceedings of NAACL 2019 .
  45. Kai Song, Yue Zhang, Heng Yu, Weihua Luo, Kun Wang, and Min Zhang. 2019. Code-Switching for Enhancing NMT with Pre-Specified Translation. In Proceedings of NAACL 2019 .
  46. Xuebo Liu, Derek F. Wong, Yang Liu, Lidia S. Chao, Tong Xiao, and Jingbo Zhu. 2019. Shared-Private Bilingual Word Embeddings for Neural Machine Translation. In Proceedings of ACL 2019 .
  47. Georgiana Dinu, Prashant Mathur, Marcello Federico, and Yaser Al-Onaizan. 2019. Training Neural Machine Translation to Apply Terminology Constraints. In Proceedings of ACL 2019 .
  48. Rudra Murthy V, Anoop Kunchukuttan, and Pushpak Bhattacharyya. 2019. Addressing word-order Divergence in Multilingual Neural Machine Translation for extremely Low Resource Languages. In Proceedings of NAACL 2019 .
  49. Meishan Zhang, Zhenghua Li, Guohong Fu, and Min Zhang. 2019. Syntax-Enhanced Neural Machine Translation with Syntax-Aware Word Representations. In Proceedings of NAACL 2019 .
  50. Linfeng Song, Daniel Gildea, Yue Zhang, Zhiguo Wang, and Jinsong Su. 2019. Semantic Neural Machine Translation Using AMR. Transactions of the Association for Computational Linguistics .
  51. Nader Akoury, Kalpesh Krishna, and Mohit Iyyer. 2019. Syntactically Supervised Transformers for Faster Neural Machine Translation. In Proceedings of ACL 2019 .
  52. Antonios Anastasopoulos, Alison Lui, Toan Nguyen, and David Chiang. 2019. Neural Machine Translation of Text from Non-Native Speakers. In Proceedings of NAACL 2019 .
  53. Paul Michel, Xian Li, Graham Neubig, and Juan Miguel Pino. 2019. On Evaluation of Adversarial Perturbations for Sequence-to-Sequence Models. In Proceedings of NAACL 2019 .
  54. Vaibhav Vaibhav, Sumeet Singh, Craig Stewart, and Graham Neubig. 2019. Improving Robustness of Machine Translation with Synthetic Noise. In Proceedings of NAACL 2019 .
  55. Yong Cheng, Lu Jiang, and Wolfgang Macherey. 2019. Robust Neural Machine Translation with Doubly Adversarial Inputs. In Proceedings of ACL 2019 .
  56. Hairong Liu, Mingbo Ma, Liang Huang, Hao Xiong, and Zhongjun He. 2019. Robust Neural Machine Translation with Joint Textual and Phonetic Embedding. In Proceedings of ACL 2019 .
  57. Yonatan Belinkov and James Glass. 2019. Analysis Methods in Neural Language Processing: A Survey. Transactions of the Association for Computational Linguistics.
  58. Sofia Serrano and Noah A. Smith. 2019. Is Attention Interpretable?. In Proceedings of ACL 2019 .
  59. Elena Voita, David Talbot, Fedor Moiseev, Rico Sennrich, and Ivan Titov. 2019. Analyzing Multi-Head Self-Attention: Specialized Heads Do the Heavy Lifting, the Rest Can Be Pruned. In Proceedings of ACL 2019 .
  60. Joris Baan, Jana Leible, Mitja Nikolaus, David Rau, Dennis Ulmer, Tim Baumgärtner, Dieuwke Hupkes, and Elia Bruni. 2019. On the Realization of Compositionality in Neural Networks. In Proceedings of ACL 2019 .
  61. Jesse Vig and Yonatan Belinkov. 2019. Analyzing the Structure of Attention in a Transformer Language Model. In Proceedings of ACL 2019 .
  62. Ashwin Kalyan, Peter Anderson, Stefan Lee, and Dhruv Batra. 2019. Trainable Decoding of Sets of Sequences for Neural Sequence Models. In Proceedings of ICML 2019 .
  63. Tianxiao Shen, Myle Ott, Michael Auli, and Marc’Aurelio Ranzato. 2019. Mixture Models for Diverse Machine Translation: Tricks of the Trade. In Proceedings of ICML 2019 .
  64. Wouter Kool, Herke van Hoof, and Max Welling. 2019. Stochastic Beams and Where to Find Them: The Gumbel-Top-k Trick for Sampling Sequences Without Replacement. In Proceedings of ICML 2019 .
  65. Won Ik Cho, Ji Won Kim, Seok Min Kim, and Nam Soo Kim. 2019. On Measuring Gender Bias in Translation of Gender-neutral Pronouns. In Proceedings of ACL 2019 .
  66. Gabriel Stanovsky, Noah A. Smith, and Luke Zettlemoyer. 2019. Evaluating Gender Bias in Machine Translation. In Proceedings of ACL 2019 .
  67. Guillaume Lample and Alexis Conneau. 2019. Cross-lingual Language Model Pretraining. arXiv:1901.07291 . (Citation: 3)
  68. Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of NAACL 2019 . (Citation: 292)
  69. Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, and Ilya Sutskever. 2019. Language Models are Unsupervised Multitask Learners. Technical Report, OpenAI. (Citation: 9)
  70. Sergey Edunov, Alexei Baevski, and Michael Auli. 2019. Pre-trained Language Model Representations for Language Generation. In Proceedings of NAACL 2019. (Citation: 1)
  71. Kaitao Song, Xu Tan, Tao Qin, Jianfeng Lu, and Tie-Yan Liu. 2019. MASS: Masked Sequence to Sequence Pre-training for Language Generation. In Proceedings of ICML 2019.
  72. Zhilin Yang, Zihang Dai, Yiming Yang, Jaime Carbonell, Ruslan Salakhutdinov, and Quoc V. Le. 2019. XLNet: Generalized Autoregressive Pretraining for Language Understanding. arXiv:1906.08237.
  73. Mitchell Stern, William Chan, Jamie Kiros, and Jakob Uszkoreit. 2019. Insertion Transformer: Flexible Sequence Generation via Insertion Operations. In Proceedings of ICML 2019.
  74. Sameer Bansal, Herman Kamper, Karen Livescu, Adam Lopez, and Sharon Goldwater. 2019. Pre-training on high-resource speech recognition improves low-resource speech-to-text translation. In Proceedings of NAACL 2019.
  75. Nikolai Vogler, Craig Stewart, and Graham Neubig. 2019. Lost in Interpretation: Predicting Untranslated Terminology in Simultaneous Interpretation. In Proceedings of NAACL 2019.
  76. Elizabeth Salesky, Matthias Sperber, and Alex Waibel. 2019. Fluent Translations from Disfluent Speech in End-to-End Speech Translation. In Proceedings of NAACL 2019.
  77. Naveen Arivazhagan, Colin Cherry, Wolfgang Macherey, Chung-Cheng Chiu, Semih Yavuz, Ruoming Pang, Wei Li, and Colin Raffel. 2019. Monotonic Infinite Lookback Attention for Simultaneous Machine Translation. In Proceedings of ACL 2019.

Tutorial

  1. ACL 2016 Tutorial -- Neural Machine Translation. The tutorial given by Thang Luong (Lmthang) at ACL 2016. [http://nlp.stanford.edu/projects/nmt/Luong-Cho-Manning-NMT-ACL2016-v4.pdf]
  2. Advances in Neural Machine Translation. A talk given by Yang Liu (Tsinghua University) at the 12th China Workshop on Machine Translation (CWMT 2016, Urumqi, August 2016). [http://nlp.csai.tsinghua.edu.cn/~ly/talks/cwmt2016_ly_v3_160826.pptx]
  3. CCL 2016 | T1B: Deep Learning and Machine Translation. A tutorial at the 15th China National Conference on Computational Linguistics (CCL 2016). [http://www.cips-cl.org/static/CCL2016/tutorialsT1B.html]
  4. Neural Machine Translation. [http://statmt.org/mtma16/uploads/mtma16-neural.pdf]
  5. The ACL 2016 tutorial site by Thang Luong, Kyunghyun Cho, and Christopher Manning. [https://sites.google.com/site/acl16nmt/]
  6. Kyunghyun Cho's talk "New Territory of Machine Translation", covering the NMT problems Cho himself focuses on. [https://drive.google.com/file/d/0B16RwCMQqrtdRVotWlQ3T2ZXTmM/view]

Video Tutorials

  1. CS224d: Neural Machine Translation. [https://cs224d.stanford.edu/lectures/CS224d-Lecture15.pdf] [https://www.youtube.com/watch?v=IxQtK2SjWWM&index=11&list=PL3FW7Lu3i5Jsnh1rnUwq_TcylNr7EkRe6]
  2. Yang Liu (Tsinghua University): Deep Learning Based Machine Translation.
  3. A Practical Guide to Neural Machine Translation. [https://www.youtube.com/watch?v=vxibD6VaOfI]

Code

  1. seq2seq: an implementation of Google's seq2seq model, built on TensorFlow. [https://github.com/tensorflow/tensorflow]
  2. nmt.matlab: open-sourced by Thang Luong (Lmthang) of Stanford; written in Matlab. [https://github.com/lmthang/nmt.matlab]
  3. GroundHog: an attention-based neural machine translation model from Bengio's research group, built on Theano. [https://github.com/lisa-groundhog/GroundHog]
  4. NMT-Coverage: a coverage-based neural machine translation model from Hang Li's team at Huawei Noah's Ark Lab, built on Theano. [https://github.com/tuzhaopeng/NMT-Coverage]
  5. OpenNMT: an industrial-strength neural machine translation toolkit open-sourced by the Harvard NLP group, built on Torch. [http://opennmt.net/]
  6. EUREKA-MangoNMT: developed by Jiajun Zhang of the Institute of Automation, Chinese Academy of Sciences; written in C++. [https://github.com/jiajunzhangnlp/EUREKA-MangoNMT]
  7. dl4mt-tutorial: built on Theano. [https://github.com/nyu-dl/dl4mt-tutorial]
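All of the toolkits above implement variants of the attention-based encoder-decoder. As a framework-free illustration of the core computation they share (a simplified dot-product attention step, not code from any of the listed projects), consider:

```python
import math

def softmax(xs):
    # numerically stable softmax over a list of scores
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention_context(decoder_state, encoder_states):
    """Dot-product attention (simplified): score each encoder state
    against the current decoder state, then return the attention-weighted
    sum of encoder states (the 'context vector') plus the weights."""
    scores = [sum(d * e for d, e in zip(decoder_state, h)) for h in encoder_states]
    weights = softmax(scores)
    dim = len(encoder_states[0])
    context = [sum(w * h[i] for w, h in zip(weights, encoder_states)) for i in range(dim)]
    return context, weights

# toy example: three 2-dimensional encoder states, one decoder query
enc = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
dec = [1.0, 0.0]
ctx, w = attention_context(dec, enc)
print(ctx, w)
```

The weights sum to one, and encoder states more similar to the decoder query receive more mass; real systems compute this with learned projections and batched tensor operations.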

Domain Experts

  1. Université de Montréal: Yoshua Bengio, Dzmitry Bahdanau
  2. New York University: Kyunghyun Cho
  3. Stanford University: Christopher Manning, Thang Luong (Lmthang)
  4. Google: Ilya Sutskever, Quoc V. Le
  5. Institute of Computing Technology, CAS: Qun Liu
  6. Northeastern University: Jingbo Zhu
  7. Tsinghua University: Yang Liu
  8. Institute of Automation, CAS: Chengqing Zong, Jiajun Zhang
  9. Soochow University: Deyi Xiong, Min Zhang
  10. Huawei Noah's Ark Lab: Hang Li, Zhaopeng Tu
  11. Baidu: Haifeng Wang, Hua Wu

This is a preliminary version with limited coverage; if you find errors or omissions, suggestions and additions are welcome, and the list will be kept up to date. It is original content by the Zhuanzhi content team and may not be reproduced without permission; for reprint requests, email fangquanyi@gmail.com or contact the Zhuanzhi assistant on WeChat (Rancho_Fang).

Follow http://www.zhuanzhi.ai and the Zhuanzhi WeChat account for first-hand AI-related knowledge.


[Overview] ACL-IJCNLP 2021 is a CCF Class-A conference and the most authoritative international conference in the natural language processing (NLP) area of artificial intelligence. ACL 2021 is scheduled to be held online from August 1 to August 6 this year. Recently, Lei Li, formerly director of the ByteDance AI Lab, returned to academia as an assistant professor at the University of California, Santa Barbara. Together with Mingxuan Wang, he gave a tutorial on machine translation in the era of pre-training, which is well worth attention!

Pre-training is the dominant paradigm in natural language processing (NLP) [28,8,20], computer vision (CV) [12,34], and automatic speech recognition (ASR) [3,6,24]. Typically, a model is first pre-trained on a large amount of unlabeled data to capture rich representations of the input, and is then applied to downstream tasks either by providing context-aware input representations or by initializing the parameters of the downstream model for fine-tuning. Recently, the paradigm of self-supervised pre-training followed by task-specific fine-tuning has finally reached neural machine translation (NMT) [37,35,5].
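The masked-prediction objectives behind BERT- and MASS-style pre-training can be illustrated without any ML framework: corrupt an unlabeled token sequence and ask the model to recover what was hidden. A minimal sketch of MASS-style span corruption (the `mask_span` helper is a hypothetical illustration, not the authors' code):

```python
import random

MASK = "[MASK]"

def mask_span(tokens, span_len, rng):
    """MASS-style corruption: hide one contiguous span of tokens.
    Returns (corrupted input, target span to predict, span start index)."""
    start = rng.randrange(0, len(tokens) - span_len + 1)
    corrupted = tokens[:start] + [MASK] * span_len + tokens[start + span_len:]
    target = tokens[start:start + span_len]
    return corrupted, target, start

rng = random.Random(0)  # fixed seed for reproducibility
sentence = "the cat sat on the mat".split()
corrupted, target, start = mask_span(sentence, 2, rng)
print(corrupted, target, start)
```

During pre-training the encoder reads the corrupted sequence and the decoder is trained to emit the hidden span; no parallel data is required at this stage.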

Despite this success, incorporating a general-purpose pre-trained model into NMT is non-trivial and does not necessarily yield promising results, especially in resource-rich settings. Unique challenges remain in several respects. First, the objectives of most pre-training methods differ from the downstream NMT task. For example, BERT [8], a popular pre-trained model, is designed for language understanding using only a Transformer encoder, whereas an NMT model usually consists of an encoder and a decoder that perform cross-lingual generation. This gap makes applying pre-training to NMT [30] less straightforward. Moreover, machine translation is inherently a multilingual problem, yet general NLP pre-training methods such as BERT and GPT focus mainly on English corpora. Given the success of transfer learning in multilingual machine translation, multilingual pre-training for NMT [7] is very attractive. Finally, speech translation has received increasing attention in recent years, while most pre-training methods focus on textual representations; how to exploit pre-training to improve spoken language translation remains a new challenge.

This tutorial provides a comprehensive guide to making full use of pre-training for neural machine translation. First, we briefly introduce the background of NMT and of pre-training methods, and point out the main challenges in applying pre-training to NMT. We then focus on analyzing the role of pre-training in improving NMT performance, how to design better pre-training models for specific NMT tasks, and how to better integrate pre-trained models into NMT systems. In each part, we provide examples, discuss training tricks, and analyze what is transferred when pre-training is applied.

The first topic is monolingual pre-training for NMT, one of the most intensively studied areas. Monolingual text representations such as ELMo, GPT, MASS, and BERT have proved advantageous, significantly improving performance on various NLP tasks [25,8,28,30]. However, NMT has several distinctive characteristics, such as the availability of large training sets (10 million sentence pairs or more) and the high capacity of baseline NMT models, which call for careful design of the pre-training. In this part, we introduce different pre-training methods and analyze best practices for applying them in different machine translation scenarios, such as unsupervised NMT, low-resource NMT, and resource-rich NMT [37,35]. We also cover techniques for fine-tuning pre-trained models with various strategies, such as knowledge distillation and adapters [4,16].
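Adapter-based fine-tuning, mentioned above, inserts small bottleneck layers into a frozen pre-trained network and updates only the adapter weights. The layer itself is just a down-projection, a nonlinearity, an up-projection, and a residual connection; a minimal plain-Python sketch (toy dimensions and random weights, no training loop — illustrative only):

```python
import random

def matvec(W, x):
    # multiply matrix W (list of rows) by vector x
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in W]

def relu(x):
    return [max(0.0, v) for v in x]

def adapter(h, W_down, W_up):
    """Bottleneck adapter (simplified): project the hidden state down,
    apply a nonlinearity, project back up, and add the result to the
    input via a residual connection."""
    z = relu(matvec(W_down, h))
    out = matvec(W_up, z)
    return [hi + oi for hi, oi in zip(h, out)]

# toy sizes: hidden dimension 4, bottleneck dimension 2
rng = random.Random(0)
W_down = [[rng.uniform(-0.1, 0.1) for _ in range(4)] for _ in range(2)]
W_up = [[rng.uniform(-0.1, 0.1) for _ in range(2)] for _ in range(4)]
h = [1.0, -0.5, 0.3, 0.0]
h_adapted = adapter(h, W_down, W_up)
print(h_adapted)
```

Because the up-projection is typically initialized near zero, the adapter starts close to an identity function, so inserting it does not disturb the frozen pre-trained model at the beginning of fine-tuning.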

The next topic is multilingual pre-training for NMT. Here we aim to alleviate the English-centric bias and argue that universal representations across languages can be built to improve massively multilingual NMT. In this part, we discuss general representations for different languages and analyze how knowledge transfers across languages, which helps in designing better multilingual pre-training, especially for zero-shot transfer to non-English language pairs [15,27,7,26,13,17,19,23,18].
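A common ingredient of multilingual NMT is sharing a single model across language pairs by prepending a target-language tag to each source sentence (the approach popularized by Google's multilingual NMT system). A toy sketch of this data preparation (the exact tag format varies between systems and is illustrative here):

```python
def add_lang_tag(src_sentence, tgt_lang):
    """Prepend a target-language token so one shared model knows
    which language to translate into."""
    return f"<2{tgt_lang}> {src_sentence}"

# the same model can then be asked for different output languages
examples = [("hola mundo", "en"), ("hello world", "fr")]
tagged = [add_lang_tag(s, l) for s, l in examples]
print(tagged)
```

With this trick, zero-shot directions become possible: at inference time one simply requests a target language the pair was never trained on directly.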

The final technical part of this tutorial concerns pre-training for speech translation. In particular, we focus on exploiting weakly supervised or unsupervised training data to improve speech translation. In this part, we discuss the possibility of building a general representation across speech and text, and show how text or audio pre-training can bootstrap text generation for speech translation [33,21,32,14,22,10,9,11,36].

At the end of the tutorial, we point out best practices for applying pre-training to NMT. These topics cover various pre-training methods for different NMT scenarios. After this tutorial, the audience will understand why pre-training for NMT differs from pre-training for other tasks and how to make full use of it. Importantly, we will analyze in depth how and why pre-training works in NMT, which we hope will inspire the design of NMT-specific pre-training paradigms in the future.

https://sites.cs.ucsb.edu/~lilei/TALKS/2021-ACL/

Speakers:

Lei Li is an assistant professor at the University of California, Santa Barbara, and was previously director of the ByteDance AI Lab. He received his bachelor's degree from Shanghai Jiao Tong University and his Ph.D. in computer science from Carnegie Mellon University. He has worked as a postdoctoral researcher at UC Berkeley and as a Young Scientist at Baidu's US Deep Learning Lab. He was runner-up for the 2012 ACM SIGKDD Best Doctoral Dissertation Award, received the second prize of the 2017 Wu Wenjun AI Technology Invention Award, was named a 2017 CCF Distinguished Speaker, and won the 2019 CCF Qingzhu Award. He has published more than 100 papers at top international conferences in machine learning, data mining, and natural language processing, and holds more than 20 patents. He serves on the CCF NLP technical committee and as an organizing committee member and area chair for conferences including EMNLP, NeurIPS, AAAI, IJCAI, and KDD.

Mingxuan Wang is a senior researcher at the ByteDance AI Lab. He received his Ph.D. from the Institute of Computing Technology, Chinese Academy of Sciences, and his main research area is machine translation. He led the development of the VolcTrans translation system, which serves over a hundred million users worldwide, and his team has won first place multiple times in the WMT machine translation evaluations. He has published more than 30 papers at venues such as ACL, EMNLP, and NAACL, and serves on the CCF NLP technical committee and on the organizing committees of several conferences.


Latest Papers

We conduct an empirical study of neural machine translation (NMT) for truly low-resource languages, and propose a training curriculum fit for cases when both parallel training data and compute resources are lacking, reflecting the reality of most of the world's languages and the researchers working on them. Previously, unsupervised NMT, which employs back-translation (BT) and auto-encoding (AE) tasks, has been shown to be barren for low-resource languages. We demonstrate that leveraging comparable data and code-switching as weak supervision, combined with BT and AE objectives, results in remarkable improvements for low-resource languages even when using only modest compute resources. The training curriculum proposed in this work achieves BLEU scores that improve over supervised NMT trained on the same backbone architecture by +12.2 BLEU for English to Gujarati and +3.7 BLEU for English to Kazakh, showcasing the potential of weakly-supervised NMT for low-resource languages. When trained on supervised data, our training curriculum achieves a new state-of-the-art result on the Somali dataset (BLEU of 29.3 for Somali to English). We also observe that adding more time and GPUs to training can further improve performance, which underscores the importance of reporting compute resource usage in MT research.
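The back-translation (BT) objective referred to in the abstract turns target-side monolingual text into synthetic parallel data: a reverse-direction model translates each target sentence into the source language, and the forward model then trains on the resulting synthetic pairs. A schematic sketch (`reverse_translate` is a hypothetical stand-in for a real trained reverse model):

```python
def reverse_translate(tgt_sentence):
    # stand-in for a target->source NMT model; a real system would
    # decode here with a trained reverse-direction model
    return ["<src>"] + tgt_sentence.split()

def back_translate(monolingual_tgt):
    """Build synthetic (source, target) training pairs from target-side
    monolingual data, in the spirit of Sennrich et al. (2016)."""
    return [(reverse_translate(t), t) for t in monolingual_tgt]

mono = ["guten tag", "wie geht es"]
synthetic = back_translate(mono)
for src, tgt in synthetic:
    print(src, "->", tgt)
```

The synthetic pairs are mixed with any genuine parallel data; because the target side is clean human text, the forward model still learns fluent generation even when the synthetic source side is noisy.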
