Hi,
The official GPT-2 GitHub repository has modified the encoder.py file: get_encoder now accepts an extra parameter, models_dir. They have also removed the encode.py file.
    def get_encoder(model_name, models_dir):
        with open(os.path.join(models_dir, model_name, 'encoder.json'), 'r') as f:
            encoder = json.load(f)
        with open(os.path.join(models_dir, model_name, 'vocab.bpe'), 'r', encoding="utf-8") as f:
            bpe_data = f.read()
        bpe_merges = [tuple(merge_str.split()) for merge_str in bpe_data.split('\n')[1:-1]]
        return Encoder(
            encoder=encoder,
            bpe_merges=bpe_merges,
        )
This is the version that I am using, which takes only model_name and always reads from the 'models' folder:
    def get_encoder(model_name):
        with open(os.path.join('models', model_name, 'encoder.json'), 'r') as f:
            encoder = json.load(f)
        with open(os.path.join('models', model_name, 'vocab.bpe'), 'r', encoding="utf-8") as f:
            bpe_data = f.read()
        bpe_merges = [tuple(merge_str.split()) for merge_str in bpe_data.split('\n')[1:-1]]
        return Encoder(
            encoder=encoder,
            bpe_merges=bpe_merges,
        )
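If you want one function that works with both call styles, a rough sketch is to give models_dir a default value of 'models'. Note that load_encoder_files is a made-up name for illustration and is not part of either repository. It also returns the raw vocabulary and merge list instead of building the Encoder object, so that the snippet runs on its own without encoder.py.

```python
import json
import os


def load_encoder_files(model_name, models_dir="models"):
    """Hypothetical helper (not in either repo): load encoder.json and vocab.bpe.

    models_dir defaults to "models", so a one-argument call reads from the
    same place as the older get_encoder(model_name).
    """
    with open(os.path.join(models_dir, model_name, "encoder.json"), "r") as f:
        encoder = json.load(f)
    with open(os.path.join(models_dir, model_name, "vocab.bpe"), "r", encoding="utf-8") as f:
        bpe_data = f.read()
    # Same slicing as the original: drop the first line and the trailing
    # empty string left by the final newline.
    bpe_merges = [tuple(merge_str.split()) for merge_str in bpe_data.split("\n")[1:-1]]
    return encoder, bpe_merges
```

Calling load_encoder_files(model_name) looks under ./models, while load_encoder_files(model_name, some_dir) looks under some_dir, which matches the newer two-parameter signature. The two returned values are the same ones that get_encoder passes to Encoder as encoder and bpe_merges.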
It is advisable to just clone the whole project from this GitHub link instead of the official one.