THUMT: An Open Source Toolkit for Neural Machine Translation

Introduction

Machine translation is a natural language processing task that aims to translate natural languages automatically using computers. Recent years have witnessed the rapid development of end-to-end neural machine translation, which has become the new mainstream method in practical MT systems.

THUMT is an open-source toolkit for neural machine translation developed by the Natural Language Processing Group at Tsinghua University.

Implementation

THUMT currently has two main implementations, THUMT-TensorFlow and THUMT-Theano.

The following table summarizes the features of the two implementations:

Implementation | Model | Criterion | Optimizer | LRP
Theano | RNNsearch | MLE, MRT, SST | SGD, Adadelta, Adam | RNNsearch
TensorFlow | Seq2Seq, RNNsearch, Transformer | MLE | Adam | n/a

We recommend using THUMT-TensorFlow, which delivers better translation performance than THUMT-Theano. We will keep adding new features to THUMT-TensorFlow.

Downloads

Stable version

Link | Size | Description | Date
THUMT-TensorFlow-v1.0.tar.gz | 269K | The package contains the source code and documentation of the TensorFlow implementation | 2018-01-15
THUMT-Theano-v1.02.tar.gz | 533K | The package contains the source code and documentation of the Theano implementation | 2018-01-15

Latest version

Please visit GitHub to obtain the latest version.

History

TensorFlow Implementation:

Version | Size | Updates | Date
v1.0 | 269K | First version | 2018-01-15

Theano Implementation:

Version | Size | Updates | Date
v1.02 | 533K | Add learning rate decay policies. Minor bug fix. | 2018-01-15
v1.01 | 530K | Bug fix | 2017-07-02
v1.0 | 529K | First version | 2017-06-20

License

The source code is dual licensed. The open-source license is BSD-3-Clause, which allows free use for research purposes. For commercial licensing, please email thumt17@gmail.com.

Citation

Please cite the following paper:
Jiacheng Zhang, Yanzhuo Ding, Shiqi Shen, Yong Cheng, Maosong Sun, Huanbo Luan, Yang Liu. 2017. THUMT: An Open Source Toolkit for Neural Machine Translation. arXiv:1706.06415.

Development Team

Project leaders: Maosong Sun, Yang Liu, Huanbo Luan

Project members: Jiacheng Zhang, Yanzhuo Ding, Shiqi Shen, Yong Cheng

Contributors

Contact

If you have questions, suggestions, or bug reports, please email thumt17@gmail.com.



© 2018 Natural Language Processing and Computational Social Science Lab, Tsinghua University