GPT-NEO is a series of language models from EleutherAI that aims to replicate OpenAI’s GPT-3. EleutherAI’s current models (1.3 billion and 2.7 billion parameters) are not yet as large as OpenAI’s biggest GPT-3 model, Davinci (175 billion parameters). But unlike OpenAI’s models, they are freely available to try out and finetune.
Finetuning large language models like GPT-NEO is often difficult, as these models are usually too big to fit on a single GPU.
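To see why a single GPU is usually not enough, here is a rough back-of-the-envelope estimate (an illustration with simplified assumptions, not exact figures): holding the 2.7B fp32 weights, their gradients, and Adam’s two extra fp32 buffers already requires far more than a typical GPU’s memory, before activations are even counted.

```python
def estimate_memory_gb(n_params: float) -> dict:
    """Rough memory estimate for full-precision training with Adam."""
    bytes_per_fp32 = 4
    weights = n_params * bytes_per_fp32        # model weights in fp32
    gradients = n_params * bytes_per_fp32      # one fp32 gradient per weight
    # Adam keeps two extra fp32 buffers (momentum and variance) per weight
    optimizer = n_params * bytes_per_fp32 * 2
    total = weights + gradients + optimizer
    return {
        "weights_gb": weights / 1e9,
        "gradients_gb": gradients / 1e9,
        "optimizer_gb": optimizer / 1e9,
        "total_gb": total / 1e9,
    }

# GPT-NEO 2.7B: ~10.8 GB weights, ~43 GB total -- well beyond a 16 GB GPU
print(estimate_memory_gb(2.7e9))
```

This is exactly the memory pressure that DeepSpeed’s optimizer-state partitioning and gradient checkpointing are designed to relieve.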
This guide explains how to finetune GPT-NEO (2.7B parameters) on a single GPU with just one command, using the Huggingface Transformers library.
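As a rough sketch, such a command might look like the following. This is a hypothetical invocation: the script name and flags follow the Huggingface Transformers language-modeling example (`run_clm.py`) and its DeepSpeed integration, and may differ between library versions; `ds_config.json` and `train.txt` stand in for your own DeepSpeed config and training data.

```shell
# Hypothetical invocation -- flag names follow the Transformers
# run_clm.py example and may vary by version.
deepspeed run_clm.py \
  --deepspeed ds_config.json \
  --model_name_or_path EleutherAI/gpt-neo-2.7B \
  --train_file train.txt \
  --output_dir finetuned \
  --num_train_epochs 1 \
  --per_device_train_batch_size 1 \
  --fp16
```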
This is made possible by using DeepSpeed and gradient checkpointing.
I needed to finetune the GPT-2 1.5 billion parameter model for a project, but the model didn’t fit on my GPU. So I figured out how to run it with DeepSpeed and gradient checkpointing, which reduce the required GPU memory.
I hope this guide helps people who also want to finetune GPT-2 but don’t want to set up distributed training.
You can find the repo with the most recent version of the guide here.
Note: the model runs on any server with a GPU that has at least 16 GB of VRAM, and at least 70 GB of system RAM.