THUMT: An Open Source Toolkit for Neural Machine Translation



THUMT is a data-driven machine translation system developed by the Natural Language Processing Group at Tsinghua University.

Machine translation is a natural language processing task that aims to translate between natural languages automatically using computers. Recent years have witnessed the rapid development of end-to-end neural machine translation, which has become the new mainstream method in practical MT systems.

Built on top of Theano, THUMT is an open-source toolkit for neural machine translation.

User Manual

This user manual describes how to install and use THUMT.


This documentation provides detailed information about the functions in THUMT.


Stable version

Link | Size | Description | Date
THUMT-v1.01.tar.gz | 530K | The package contains the source code of the system and example datasets | 2017-07-02
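
Once the archive listed above has been downloaded, it can be unpacked with any standard tar tool. As a minimal sketch, the Python snippet below does the same with the standard library. The function name `extract_package` and the destination directory are illustrative assumptions, not part of THUMT, and the archive is assumed to already be in the current directory.

```python
import tarfile
from pathlib import Path


def extract_package(archive_path, dest_dir="."):
    """Extract a .tar.gz archive into dest_dir and return its member names.

    Hypothetical helper for illustration; it is not shipped with THUMT.
    """
    dest = Path(dest_dir).resolve()
    with tarfile.open(archive_path, "r:gz") as tar:
        members = tar.getmembers()
        for member in members:
            target = (dest / member.name).resolve()
            # Refuse entries that would be written outside dest_dir.
            if target != dest and dest not in target.parents:
                raise ValueError("unsafe path in archive: " + member.name)
        tar.extractall(dest)
    return sorted(member.name for member in members)


if __name__ == "__main__":
    archive = Path("THUMT-v1.01.tar.gz")  # assumed to be downloaded already
    if archive.exists():
        for name in extract_package(archive, "."):
            print(name)
    else:
        print("Archive not found; download it first.")
```

The path check is a precaution that applies to any archive obtained over the network, not something specific to this package.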

Latest version

Please visit GitHub to obtain the latest version.


Version | Size | Updates | Date
v1.01 | 530K | Bug fix | 2017-07-02
v1.0 | 529K | First version | 2017-06-20


The source code is dual-licensed. Open-source licensing is under the BSD-3-Clause license, which allows free use for research purposes. For commercial licensing, please email


Please cite the following paper:
Jiacheng Zhang, Yanzhuo Ding, Shiqi Shen, Yong Cheng, Maosong Sun, Huanbo Luan, Yang Liu. 2017. THUMT: An Open Source Toolkit for Neural Machine Translation. arXiv:1706.06415.

Development Team

Project leaders: Maosong Sun, Yang Liu, Huanbo Luan

Project members: Jiacheng Zhang, Yanzhuo Ding, Shiqi Shen, Yong Cheng


If you have questions, suggestions, or bug reports, please email


Q: Does THUMT support the latest version of Theano?
A: Yes. THUMT also supports Theano 0.9.0, released on 2017-03-20. We have noticed a minor error when building the optimizer with this version. Fortunately, the error does not prevent THUMT from running, and we are working on a fix.
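
To see which Theano version is installed before comparing it with the answer above, a small check like the following may help. It is only a sketch: it assumes Python 3.8 or later (for `importlib.metadata`) and that Theano was installed under its PyPI distribution name `Theano`. The helper name `installed_version` is hypothetical.

```python
from importlib import metadata


def installed_version(dist_name="Theano"):
    """Return the installed version string of dist_name, or None if absent.

    Hypothetical helper for illustration; it is not part of THUMT or Theano.
    """
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version()
    if version is None:
        print("Theano does not appear to be installed.")
    else:
        print("Installed Theano version:", version)
```

Reading the package metadata avoids importing Theano itself, so the check runs quickly and does not trigger any compilation.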

© 2017 Natural Language Processing and Computational Social Science Lab, Tsinghua University