
I ran Llama 3 8B locally on CPU

Igor Kasianenko · 2 min read · Apr 20, 2024

If you have not been living under a rock, you might have noticed the release of Llama 3, the new LLM from Meta AI. Even Meta CEO Mark Zuckerberg made an Instagram video about it, and some LinkedIn posts go into detail about how many thousands of GPUs and billions of dollars Meta spent on it. Now everyone can enjoy it for free. Big win!

Official Llama3 GitHub repo looking cute

So I downloaded the official repo and ran the code locally on my MacBook Pro M1. Actually, I wanted to run it, but I ran out of memory even for the 8B model, so I had to go back to my 32 GB RAM Asus :) I also ran into several errors, like a hard-coded .cuda() in the model-loading code, and distributed computing errors like:

    67 if not torch.distributed.is_initialized():
---> 68 torch.distributed.init_process_group("nccl")
...
ValueError: Error initializing torch.distributed using env:// rendezvous: environment variable RANK expected, but not set

But don’t worry: I made a couple of adjustments, pushed them to my fork, and below I will go through the details of what I changed in the original code.

To solve the error above, we need a workaround for distributed torch computing. It can be done by creating a localhost environment with a single process:

import os
import torch
import torch.distributed as dist

# pretend to be a one-machine, one-process "cluster"
os.environ['MASTER_ADDR'] = 'localhost'
os.environ['MASTER_PORT'] = '12355'  # whatever port you like, as long as it is free
os.environ['RANK'] = '0'
os.environ['WORLD_SIZE'] = '1'

# initialize a single-process group with the CPU-friendly gloo backend
dist.init_process_group(backend='gloo', init_method='env://')
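This works because of the guard visible in the traceback above: the repo only calls torch.distributed.init_process_group("nccl") when no process group is initialized yet, so initializing a single-process group ourselves beforehand means that nccl call never runs. The gloo backend matters here because nccl only works with NVIDIA GPUs, while gloo supports CPU tensors.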

Next, I removed .cuda() in two places in llama/model.py to avoid errors on machines without CUDA. Similar adjustments are needed in llama/generation.py, like commenting out torch.set_default_tensor_type(torch.cuda.BFloat16Tensor) and replacing it with torch.set_default_device('cpu'). For the full changes, please refer to my GitHub fork.
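If you want to keep the code working on both kinds of machines instead of deleting the GPU path, a small device check does the trick. Here is a minimal sketch of that idea (the exact surrounding code in llama/generation.py differs; this only illustrates the substitution):

import torch

if torch.cuda.is_available():
    # original behaviour: default tensors live on the GPU in bfloat16
    torch.set_default_tensor_type(torch.cuda.BFloat16Tensor)
else:
    # CPU fallback: keep default tensors on the CPU
    torch.set_default_device('cpu')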

In the end, running code based on example_text_completion.py, I got impressive results. It was a little slow, taking about 90 seconds to generate completions for 4 prompts, but it is still very impressive and quite useful for getting started with a state-of-the-art model for free.
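For reference, here is a minimal sketch of what the run boils down to, modelled on example_text_completion.py from the official repo; the checkpoint and tokenizer paths below are placeholders for wherever you downloaded the weights:

from llama import Llama

# placeholder paths: point these at your local Meta-Llama-3-8B download
generator = Llama.build(
    ckpt_dir="Meta-Llama-3-8B/",
    tokenizer_path="Meta-Llama-3-8B/tokenizer.model",
    max_seq_len=128,
    max_batch_size=4,
)

prompts = ["I believe the meaning of life is"]
results = generator.text_completion(
    prompts,
    max_gen_len=64,
    temperature=0.6,
    top_p=0.9,
)
for prompt, result in zip(prompts, results):
    print(prompt + result["generation"])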

Final thoughts:

  1. English text completion is impressive, and the English-to-French translation from the example works well.
  2. Asking the model to translate into a less widespread language yields some errors; for example, my native Ukrainian got roughly 9 out of 10 answers correct. See the output below (the model’s “translation” of strawberry, for instance, actually means “tropical fruit drink”):
Translate English to Ukrainian:

sea otter => морська видра
peppermint => перцева м‘ята
plush girafe => плюшева жирафа
cheese =>
> сир
vanilla => ваніль
chocolate => шоколад
strawberry => тропічне фруктове напій
honey => мед
caramel => карамель
coconut => кокос
lemon => лимон
banana => банан
cherry => ви

==================================

Kudos to all the free educators on YouTube for teaching me DL and inspiring me!
What a time to be alive! (c)
