Image from and credit to “Language Models are Few-Shot Learners” from Brown et al., 2020

GPT-NEO is a series of language models from EleutherAI that tries to replicate OpenAI’s GPT-3 language model. EleutherAI’s current models (1.3 billion and 2.7 billion parameters) are not yet as big as OpenAI’s biggest GPT-3 model, Davinci (175 billion parameters). But unlike OpenAI’s models, they are freely available to try out and finetune.

Finetuning large language models like GPT-NEO is often difficult, as these models are usually too big to fit on a single GPU.

This guide explains how to finetune GPT-NEO (2.7B parameters) with just one command of the Hugging Face Transformers library on a single GPU.
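The memory savings come from DeepSpeed, which the Transformers Trainer reads from a JSON configuration file. As a rough sketch (the file name and all values here are illustrative assumptions, not the exact configuration from this guide), a config enabling fp16 and ZeRO stage 2 with optimizer offloading to CPU could look like:

```json
{
  "fp16": {
    "enabled": true
  },
  "zero_optimization": {
    "stage": 2,
    "offload_optimizer": {
      "device": "cpu"
    }
  },
  "train_micro_batch_size_per_gpu": "auto",
  "gradient_accumulation_steps": "auto"
}
```

The `"auto"` values are a feature of the Hugging Face integration: the Trainer fills them in from its own command-line arguments, so the two configurations cannot drift apart.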

This is made…

I needed to finetune the GPT-2 1.5 billion parameter model for a project, but the model didn’t fit on my GPU. So I figured out how to run it with DeepSpeed and gradient checkpointing, which reduce the required GPU memory.
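Gradient checkpointing trades compute for memory: instead of storing every intermediate activation for the backward pass, the model discards them and re-runs the relevant part of the forward pass during backpropagation. A minimal sketch of the idea with PyTorch’s `torch.utils.checkpoint` (a toy two-layer block standing in for a transformer layer, not the actual GPT-NEO code):

```python
import torch
from torch import nn
from torch.utils.checkpoint import checkpoint

# Toy stand-in for a transformer block.
block = nn.Sequential(nn.Linear(64, 64), nn.GELU(), nn.Linear(64, 64))

x = torch.randn(8, 64, requires_grad=True)

# Checkpointed forward: the activations inside `block` are not stored.
y = checkpoint(block, x, use_reentrant=False)

loss = y.sum()
loss.backward()  # `block` is re-run here to rebuild the activations

print(x.grad.shape)  # gradients still flow as usual
```

In Transformers this is a one-liner (`model.gradient_checkpointing_enable()`), and combined with DeepSpeed’s optimizer-state offloading it is what lets a 2.7B-parameter model train on a single 16 GB GPU.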

I hope this guide helps people who also want to finetune GPT-2 but don’t want to set up distributed training.

You can find the repo with the most recent version of the guide here.

1. (Optional) Set up a VM with a V100 in Google Compute Engine

Note: The model runs on any server with a GPU that has at least 16 GB of VRAM and 70 GB of RAM.


  1. Install the Google Cloud…

Peter Albert
