Hi,
There seems to be a misunderstanding here. For your first question, the purposes of the three files are as follows.
Train: This dataset is used to train the model.
Dev: This dataset is used to evaluate the performance of the model.
Test: This dataset is the one the model makes predictions on. This is why I added the “Mapping results to the respective classes” part. The accuracy reported during the training process reflects only the model's performance on train.tsv and dev.tsv.
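To illustrate what mapping results to classes can look like, here is a minimal sketch. It assumes the predictions are saved as a tab-separated file with one row per test example and one probability per class. The file name test_results.tsv, the label list, and the helper names are my own placeholders, so adjust them to your setup.

```python
import csv


def read_probabilities(path):
    """Read a tab-separated file of class probabilities, one row per example."""
    with open(path, newline="") as f:
        return [row for row in csv.reader(f, delimiter="\t") if row]


def map_to_classes(rows, labels):
    """Return, for each row, the label whose probability is the highest."""
    predictions = []
    for row in rows:
        probs = [float(p) for p in row]
        # Index of the largest probability in this row.
        best = max(range(len(probs)), key=probs.__getitem__)
        predictions.append(labels[best])
    return predictions


if __name__ == "__main__":
    # Hypothetical labels; they must be in the same order used during training.
    labels = ["negative", "positive"]
    # Hypothetical rows, standing in for read_probabilities("test_results.tsv").
    sample_rows = [["0.91", "0.09"], ["0.20", "0.80"]]
    print(map_to_classes(sample_rows, labels))
```

The order of the labels matters: each column is matched to a label by position, so a different order would silently give wrong classes.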
As for your second question, I have never tried BERT with 4 modes. I will look around and let you know if I find any repository with such an implementation.
I am not sure whether my explanation is clear enough. If you need further clarification, feel free to send another message and I will do my best to help.