Llama 2 System Requirements

Use the charts below to choose a model that fits your system.

8-bit Model Requirements for LLaMA

| Model | VRAM Used | Minimum Total VRAM | Card examples | RAM/Swap to Load* |
|---|---|---|---|---|
| LLaMA-7B | 9.2 GB | 10 GB | 3060 12GB, 3080 10GB | 24 GB |
| LLaMA-13B | 16.3 GB | 20 GB | 3090, 3090 Ti, 4090 | 32 GB |
| LLaMA-30B | 36 GB | 40 GB | A6000 48GB, A100 40GB | 64 GB |
| LLaMA-65B | 74 GB | 80 GB | A100 80GB | 128 GB |

*System RAM (not VRAM) required to load the model, in addition to having enough VRAM. It is not needed while running the model, and swap space can substitute if you do not have enough RAM.
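
As a rule of thumb, the VRAM figures above track parameter count × bytes per parameter, plus overhead for activations and buffers. A minimal sketch of that estimate (the 20% overhead factor here is an assumption for illustration, not a measured value):

```python
def estimate_vram_gb(n_params_billion: float, bits_per_weight: int, overhead: float = 1.2) -> float:
    """Rough VRAM estimate: weight size in GB, inflated by an assumed overhead factor."""
    bytes_per_param = bits_per_weight / 8
    weights_gb = n_params_billion * bytes_per_param  # 1B params at 1 byte/param ≈ 1 GB
    return round(weights_gb * overhead, 1)

# LLaMA-7B at 8-bit: ~8.4 GB, in the same ballpark as the 9.2 GB in the table
print(estimate_vram_gb(7, 8))   # 8.4
print(estimate_vram_gb(13, 8))  # 15.6
```

The estimate runs slightly below the measured table values, which also include framework and context-buffer overhead.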

4-bit Model Requirements for LLaMA

| Model | Minimum Total VRAM | Card examples | RAM/Swap to Load* |
|---|---|---|---|
| LLaMA-7B | 6 GB | GTX 1660, 2060, AMD 5700 XT, RTX 3050, 3060 | 6 GB |
| LLaMA-13B | 10 GB | AMD 6900 XT, RTX 2060 12GB, 3060 12GB, 3080, A2000 | 12 GB |
| LLaMA-30B | 20 GB | RTX 3080 20GB, A4500, A5000, 3090, 4090, 6000, Tesla V100 | 32 GB |
| LLaMA-65B | 40 GB | A100 40GB, 2×3090, 2×4090, A40, RTX A6000, 8000 | 64 GB |

*System RAM (not VRAM) required to load the model, in addition to having enough VRAM. It is not needed while running the model, and swap space can substitute if you do not have enough RAM.

llama.cpp Requirements

| Model | Original Size | Quantized Size (4-bit) |
|---|---|---|
| 7B | 13 GB | 3.9 GB |
| 13B | 24 GB | 7.8 GB |
| 30B | 60 GB | 19.5 GB |
| 65B | 120 GB | 38.5 GB |

Because the models are currently loaded fully into memory, you need enough disk space to store them and enough RAM to load them. At the moment, the memory and disk requirements are the same.
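The roughly 3.3× shrink in the table corresponds to a bit under 5 effective bits per weight rather than exactly 4, because 4-bit formats also store per-block scale factors. A back-of-the-envelope sketch; the 4.8 effective-bits figure is an assumption chosen to match the 7B row, not a spec value:

```python
def quantized_size_gb(original_fp16_gb: float, effective_bits: float = 4.8) -> float:
    """Estimate the 4-bit quantized file size from the fp16 size.
    effective_bits > 4 accounts for per-block scale factors (assumed value)."""
    return round(original_fp16_gb * effective_bits / 16, 1)

# Compare with the table: 7B 13 GB -> 3.9 GB (exact match); other rows land close
print(quantized_size_gb(13))   # 3.9
print(quantized_size_gb(120))  # 36.0
```

The estimate drifts slightly from the measured sizes for the larger models, since the effective bit rate varies a little with the quantization format.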


Current Popular Choices

"Popular choices" are the models that are currently the most widely used and generally offer the best quality for most tasks.

Base LLaMA Models

| Model | Download |
|---|---|
| LLaMA 7B | Source – GPTQ – ggml |
| LLaMA 13B | Source – GPTQ – ggml |
| LLaMA 30B | Source – GPTQ – ggml |
| LLaMA 65B | Source – GPTQ – ggml |

Llama 2 Models

| Model | Download |
|---|---|
| Llama 2 7B | Source – HF – GPTQ – ggml |
| Llama 2 7B Chat | Source – HF – GPTQ – ggml |
| Llama 2 13B | Source – HF – GPTQ – ggml |
| Llama 2 13B Chat | Source – HF – GPTQ – ggml |
| Llama 2 70B | Source – HF – GPTQ |
| Llama 2 70B Chat | Source – GPTQ |
Source: LLAMA2 Download Reddit Links
