THUMT: An Open Source Toolkit for Neural Machine Translation

Introduction

Machine translation (MT) is a natural language processing task that aims to translate between natural languages automatically using computers. Recent years have witnessed the rapid development of end-to-end neural machine translation, which has become the new mainstream method in practical MT systems.

THUMT is an open-source toolkit for neural machine translation developed by the Natural Language Processing Group at Tsinghua University.

Online Demo

Please click here to play with the online demo.

Implementation

THUMT currently has two main implementations: THUMT-TensorFlow and THUMT-Theano.

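The following table summarizes the features of the two implementations:

Implementation | Model | Criterion | Optimizer | LRP
Theano | RNNsearch | MLE, MRT, SST | SGD, Adadelta, Adam | RNNsearch
TensorFlow | Seq2Seq, RNNsearch, Transformer | MLE | Adam | RNNsearch, Transformer

Abbreviations: MLE, maximum likelihood estimation; MRT, minimum risk training; SST, semi-supervised training; LRP, layer-wise relevance propagation. The LRP column lists the models for which relevance propagation is supported.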

We recommend using THUMT-TensorFlow, which delivers better translation performance than THUMT-Theano. We will keep adding new features to THUMT-TensorFlow.

Downloads

Stable version

Link | Size | Description | Date
THUMT-TensorFlow-v1.1.tar.gz | 286K | The package contains the source code and documentation of the TensorFlow implementation | 2018-08-04
THUMT-Theano-v1.02.tar.gz | 533K | The package contains the source code and documentation of the Theano implementation | 2018-01-15

Latest version

Please visit GitHub to obtain the latest version.

History

TensorFlow Implementation:

Version | Size | Updates | Date
v1.1 | 286K | Add layer-wise relevance propagation | 2018-08-04
v1.0 | 269K | First version | 2018-01-15

Theano Implementation:

Version | Size | Updates | Date
v1.02 | 533K | Add learning rate decay policies; minor bug fix | 2018-01-15
v1.01 | 530K | Bug fix | 2017-07-02
v1.0 | 529K | First version | 2017-06-20

License

The source code is dual licensed. The open-source license is BSD-3-Clause, which allows free use for research purposes. For commercial licensing, please email thumt17@gmail.com.

Citation

Please cite the following paper:
Jiacheng Zhang, Yanzhuo Ding, Shiqi Shen, Yong Cheng, Maosong Sun, Huanbo Luan, Yang Liu. 2017. THUMT: An Open Source Toolkit for Neural Machine Translation. arXiv:1706.06415.

Development Team

Project leaders: Maosong Sun, Yang Liu, Huanbo Luan

Project members: Jiacheng Zhang, Yanzhuo Ding, Shiqi Shen, Yong Cheng

Contributors

Contact

If you have questions, suggestions, or bug reports, please email thumt17@gmail.com.



© 2018 Natural Language Processing and Computational Social Science Lab, Tsinghua University