Ng Wai Foong
1 min read · May 22, 2020

When you download the pre-trained model via python download_model.py 117M (replace 117M with the name of the model you prefer), the download contains the following files:

  1. checkpoint
  2. encoder.json
  3. hparams.json
  4. model.ckpt.data-00000-of-00001
  5. model.ckpt.index
  6. model.ckpt.meta
  7. vocab.bpe
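
As a quick sanity check after downloading, the list above can be turned into a small script. This is only a minimal sketch: the function name missing_files is made up for illustration, and the models/117M path is an assumption about where the download script saved the files, so adjust it to your own setup.

```python
import os

# File names taken from the list above.
EXPECTED_FILES = [
    "checkpoint",
    "encoder.json",
    "hparams.json",
    "model.ckpt.data-00000-of-00001",
    "model.ckpt.index",
    "model.ckpt.meta",
    "vocab.bpe",
]


def missing_files(model_dir):
    """Return the expected file names that are not present in model_dir."""
    return [
        name
        for name in EXPECTED_FILES
        if not os.path.isfile(os.path.join(model_dir, name))
    ]


if __name__ == "__main__":
    # Assumption: the model was saved under models/117M.
    model_dir = os.path.join("models", "117M")
    missing = missing_files(model_dir)
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        print("All expected files are present in", model_dir)
```

If anything is reported as missing, the download was probably interrupted and is worth re-running.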

For multi-language support, it is advisable to use a model that has already been pre-trained on the target language. Otherwise, you need to modify quite a lot of things, especially the tokenization and the vocabulary. I have never tried training it for another language. You can search for an existing model using gpt2 <language> as the keyword in your browser.
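
To give a rough idea of why the vocabulary is tied to the rest of the model, here is a minimal sketch that compares the vocabulary size recorded in hparams.json with the number of tokens in encoder.json. It assumes hparams.json has an n_vocab key and encoder.json is a single JSON object mapping token strings to ids, as in the released English models; the function names are hypothetical. If you swap in a vocabulary for another language, the two numbers would need to stay in sync.

```python
import json
import os


def vocab_sizes(model_dir):
    """Return (n_vocab from hparams.json, number of entries in encoder.json)."""
    with open(os.path.join(model_dir, "hparams.json"), encoding="utf-8") as f:
        hparams = json.load(f)
    with open(os.path.join(model_dir, "encoder.json"), encoding="utf-8") as f:
        encoder = json.load(f)
    # Assumption: "n_vocab" is the key that stores the vocabulary size.
    return hparams["n_vocab"], len(encoder)


def vocab_is_consistent(model_dir):
    """True when the model's declared vocab size matches the token table."""
    n_vocab, n_tokens = vocab_sizes(model_dir)
    return n_vocab == n_tokens
```

For example, vocab_is_consistent(os.path.join("models", "117M")) could be called after downloading, assuming that is where the files were saved.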

I found the following for Chinese and Japanese.

Hope it helps you. Have a great day ahead!

Written by Ng Wai Foong

Senior AI Engineer@Yoozoo | Content Writer #NLP #datascience #programming #machinelearning | Linkedin: https://www.linkedin.com/in/wai-foong-ng-694619185/
