Hi,
1. The script encodes your dataset and saves it as a compressed file; you will get an .npz file after the encoding. You can check the source code for the file here (there is also a rough sketch of what it does after this list):
https://github.com/nshepperd/gpt-2/blob/finetuning/encode.py
2. Unfortunately, I have not trained it on a TPU, so I can't offer any advice on this issue. You can raise an issue at the following GitHub repo:
https://github.com/nshepperd/gpt-2/issues
3. Training on a GPU is difficult unless your hardware has enough memory to avoid out-of-memory errors. For CPU training, it seems someone was able to train the full-sized model, using Adam, on an Amazon r4.4xlarge EC2 instance (16 vCPUs, 122 GB RAM); it took roughly 30-40 seconds per step with a batch size of 1 (a hypothetical launch along those lines is sketched after this list).
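For reference, here is a minimal sketch of the idea behind the encoding step, not the actual encode.py source; it assumes the repo's src/ directory is on PYTHONPATH so the encoder module imports, and that a model such as 117M has been downloaded into models/. See the linked file for the real implementation.

```python
# Rough sketch of the encoding step (see encode.py in the repo for the real code).
# Assumes PYTHONPATH includes the gpt-2 repo's src/ directory and that
# models/117M/ contains encoder.json and vocab.bpe.
import numpy as np
import encoder  # BPE encoder module from the repo's src/ directory

enc = encoder.get_encoder("117M")  # signature may differ between forks

with open("dataset.txt", encoding="utf-8") as f:
    text = f.read()

tokens = np.array(enc.encode(text), dtype=np.int32)  # text -> BPE token ids
np.savez_compressed("dataset.npz", tokens)           # the compressed .npz output
```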
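And as a rough illustration of a low-memory run, here is a hypothetical way to launch the fork's train.py with a batch size of 1; the flag names are assumptions based on my reading of the README, so check python train.py --help in your checkout for the exact options.

```python
# Hypothetical fine-tuning launch with a batch size of 1 to keep memory usage low.
# Flag names are assumptions; verify them with `python train.py --help`.
import subprocess

subprocess.run(
    [
        "python", "train.py",
        "--dataset", "dataset.npz",  # compressed file produced by the encoding step
        "--model_name", "117M",      # smallest released model, easiest to fit in memory
        "--batch_size", "1",         # single-example steps, as in the CPU anecdote above
    ],
    check=True,
)
```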
Also, this tutorial is quite old and some of it might be outdated. Consider checking the official repo: