TL;DR Quick Fix
One cause, multiple exits: bitsandbytes couldn't find a CUDA-enabled GPU. Your next move depends on your setup:
- GPU present but CUDA not detected → reinstall PyTorch with a CUDA build, then reinstall bitsandbytes
- CPU-only machine → skip 8-bit entirely and use GGUF/llama.cpp instead
- Windows → install bitsandbytes-windows, or use WSL2 with CUDA passthrough
The Full Error
RuntimeError: bitsandbytes was compiled without GPU support. 8-bit optimizers and quantization require a GPU to function.
Typically triggered by code like this:
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
import torch
# Requesting 8-bit loading is what pulls in the bitsandbytes GPU kernels
bnb_config = BitsAndBytesConfig(load_in_8bit=True)
model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1",
    quantization_config=bnb_config,
)
Root Cause
bitsandbytes links against CUDA kernels at load time. No CUDA runtime? No kernels. It's that simple.
Three situations lead here:
- You installed a CPU-only torch build (for example, the default PyPI wheel on Windows) instead of one from the CUDA-specific index
- Your machine has no NVIDIA GPU (cloud CPU instance, Mac, CI/CD server)
- CUDA is installed system-wide, but nvcc or the CUDA runtime isn't on PATH and PyTorch can't see it
Step 1: Diagnose Your Setup
Start here before touching anything:
python -c "import torch; print(torch.cuda.is_available(), torch.version.cuda)"
nvcc --version
nvidia-smi
torch.cuda.is_available() returning False is the smoking gun. bitsandbytes won't load GPU kernels no matter how it was compiled, so fix PyTorch first.
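The three checks above can be folded into one small decision helper. This is a minimal sketch, not part of bitsandbytes or PyTorch: the function names and the returned labels are hypothetical, chosen to mirror the three root causes listed earlier. It relies on torch.version.cuda being None for CPU-only PyTorch builds.

```python
import shutil


def diagnose(torch_cuda_build, cuda_available, nvidia_smi_found):
    """Map the diagnostic facts to a likely cause (labels are illustrative)."""
    if cuda_available:
        return "ok"
    if not nvidia_smi_found:
        # No driver tooling on PATH: most likely no NVIDIA GPU or no driver
        return "no-gpu-or-driver"
    if torch_cuda_build is None:
        # torch.version.cuda is None on CPU-only PyTorch builds
        return "cpu-only-torch"
    # CUDA build and driver tooling present, yet PyTorch sees no device
    return "cuda-not-visible"


def collect_facts():
    """Gather the inputs for diagnose(); tolerates a missing torch install."""
    try:
        import torch
        build, available = torch.version.cuda, torch.cuda.is_available()
    except ImportError:
        build, available = None, False
    return build, available, shutil.which("nvidia-smi") is not None
```

Calling diagnose(*collect_facts()) on the affected machine gives a rough pointer: "cpu-only-torch" and "cuda-not-visible" lead to Fix A, while "no-gpu-or-driver" leads to Fix C.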
Fix A: Reinstall bitsandbytes with CUDA (GPU Available)
Got an NVIDIA GPU? The problem is almost always a mismatched PyTorch build. Reinstall in order:
1. Reinstall PyTorch with CUDA support:
# For CUDA 12.1 (check yours with: nvcc --version)
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
# For CUDA 11.8
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
2. Reinstall bitsandbytes:
pip uninstall bitsandbytes -y
pip install bitsandbytes
3. Confirm it's working:
python -c "import bitsandbytes as bnb; print(bnb.__version__)"
python -c "import torch; print('CUDA:', torch.cuda.is_available())"
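The two index URLs in step 1 follow a visible pattern: cu plus the CUDA major and minor version. A small sketch of that mapping is below. The helper names are hypothetical, and the pattern is only inferred from the two URLs shown here; PyTorch does not publish a wheel index for every CUDA version, so check that the resulting URL exists before relying on it.

```python
def torch_index_url(cuda_version):
    """Build a PyTorch wheel index URL from a CUDA version such as '12.1'."""
    parts = cuda_version.strip().split(".")
    if len(parts) < 2 or not (parts[0].isdigit() and parts[1].isdigit()):
        raise ValueError(f"expected a version like '12.1', got {cuda_version!r}")
    # Only major and minor matter: '12.1.105' also maps to cu121
    return f"https://download.pytorch.org/whl/cu{parts[0]}{parts[1]}"


def pip_command(cuda_version):
    """Return the install command from step 1 for the given CUDA version."""
    url = torch_index_url(cuda_version)
    return f"pip install torch torchvision torchaudio --index-url {url}"
```

For example, pip_command("11.8") reproduces the CUDA 11.8 command from step 1.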
Fix B: Windows-Specific
bitsandbytes releases before 0.43 did not ship Windows builds. If you are stuck on one of those, use the community-maintained fork instead:
pip uninstall bitsandbytes -y
pip install bitsandbytes-windows
For serious LLM work on Windows, WSL2 with CUDA passthrough is more stable long-term:
# Inside WSL2 (Ubuntu)
pip install torch --index-url https://download.pytorch.org/whl/cu121
pip install bitsandbytes
Fix C: CPU-Only Machine (No GPU)
Hard limit: bitsandbytes 8-bit quantization requires a CUDA GPU. No CPU fallback exists in the library. You have three practical options:
Option 1: llama.cpp / GGUF (recommended for CPU)
This is the go-to for running LLMs on CPU. A Q4_K_M quantized Mistral 7B fits in roughly 6 GB of RAM:
pip install llama-cpp-python
from llama_cpp import Llama
# Downloads the GGUF file from the Hugging Face Hub on first use
llm = Llama.from_pretrained(
    repo_id="TheBloke/Mistral-7B-Instruct-v0.1-GGUF",
    filename="mistral-7b-instruct-v0.1.Q4_K_M.gguf",
    n_ctx=2048,
)
output = llm("What is quantization?", max_tokens=200)
Option 2: Hugging Face without quantization
Drop the BitsAndBytesConfig entirely and load at full precision:
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch
# No quantization_config here, so bitsandbytes is never loaded
model = AutoModelForCausalLM.from_pretrained(
    "microsoft/phi-2",
    torch_dtype=torch.float32,
    device_map="cpu",
)
tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-2")
1–3B models are usable on CPU. A 7B model at float32 needs 28+ GB RAM and will be painfully slow.
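The 28 GB figure is parameter count times bytes per parameter. The sketch below shows that arithmetic for a few precisions. The function name and the dtype labels are illustrative, and the estimate covers weights only: activations, the KV cache, and framework overhead come on top, which is why the text says 28+.

```python
# Bytes needed to store one parameter at each precision (weights only)
BYTES_PER_PARAM = {"float32": 4.0, "float16": 2.0, "int8": 1.0, "4bit": 0.5}


def weight_memory_gb(n_params, dtype):
    """Estimate weight memory in GB (1 GB = 1e9 bytes) for a given precision."""
    return n_params * BYTES_PER_PARAM[dtype] / 1e9


for dtype in BYTES_PER_PARAM:
    print(f"7B @ {dtype}: {weight_memory_gb(7e9, dtype):.1f} GB")
```

The same arithmetic explains why 1–3B models are workable at full precision: a 2.7B model needs about 10.8 GB for float32 weights.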
Option 3: ctransformers (C++ backend)
pip install ctransformers
from ctransformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained(
    "TheBloke/Mistral-7B-Instruct-v0.1-GGUF",
    model_file="mistral-7b-instruct-v0.1.Q4_K_M.gguf",
    model_type="mistral",
)
Fix D: Google Colab / Cloud Notebooks
Getting this error on Colab despite selecting a GPU runtime? First, verify the GPU was actually assigned:
import subprocess
result = subprocess.run(['nvidia-smi'], capture_output=True, text=True)
print(result.stdout)
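If nvidia-smi fails, go to Runtime → Change runtime type → T4 GPU and reconnect. Then force-reinstall bitsandbytes:
# --no-deps keeps pip from replacing the PyTorch build that is already installed
!pip install -q --force-reinstall --no-deps bitsandbytes
# If bitsandbytes was already imported in this session, restarting the runtime is more reliable than a reload
import importlib, bitsandbytes
importlib.reload(bitsandbytes)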
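One caveat with the nvidia-smi check in this section: subprocess.run raises FileNotFoundError when the binary is missing, which is exactly what happens on a CPU runtime. A more defensive variant might look like the sketch below. The function name gpu_visible is hypothetical, and a zero exit code from nvidia-smi is treated as "a GPU is visible", which is an assumption rather than a guarantee.

```python
import shutil
import subprocess


def gpu_visible(command="nvidia-smi"):
    """Return True if the command exists and exits cleanly, False otherwise."""
    if shutil.which(command) is None:
        # Binary not on PATH: typical for CPU-only runtimes
        return False
    try:
        result = subprocess.run(
            [command], capture_output=True, text=True, timeout=30
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


if not gpu_visible():
    print("No GPU visible: switch the runtime type before reinstalling anything")
```

Running this before any pip command avoids reinstalling packages on a runtime that has no GPU attached in the first place.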
Verification: Confirm the Fix Worked
import torch
import bitsandbytes as bnb
print(f"PyTorch: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
print(f"CUDA version: {torch.version.cuda}")
print(f"bitsandbytes: {bnb.__version__}")
if torch.cuda.is_available():
    print(f"GPU: {torch.cuda.get_device_name(0)}")
    # Building an 8-bit layer and moving it to the GPU exercises the CUDA kernels
    layer_8bit = bnb.nn.Linear8bitLt(64, 64, has_fp16_weights=False).cuda()
    print("8-bit layer created successfully")
After the PyTorch version line, you're looking for something like this (exact versions and GPU name will differ):
CUDA available: True
CUDA version: 12.1
bitsandbytes: 0.43.1
GPU: NVIDIA GeForce RTX 3090
8-bit layer created successfully
Version Compatibility
- bitsandbytes >= 0.41.0: supports CUDA 11.7, 11.8, 12.0, 12.1+
- bitsandbytes 0.37–0.40: CUDA 11.x only, will break on CUDA 12.x
- Match bitsandbytes to PyTorch's CUDA version, not the system CUDA version
Check which CUDA PyTorch was actually built against, since this is the number that matters for compatibility:
import torch
print(torch.version.cuda)
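The compatibility rules in this section can be written down as a quick check. The sketch below encodes only the ranges stated here and is not an official compatibility matrix. Both function names are hypothetical, and versions outside the listed ranges return None instead of a guess.

```python
import re


def parse_version(text):
    """Turn '0.43.1' or '12.1' into a tuple of ints, stopping at suffixes like 'dev0'."""
    numbers = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        numbers.append(int(match.group()))
    return tuple(numbers)


def bnb_supports_cuda(bnb_version, cuda_version):
    """Apply the rules listed in this section; None means 'not covered here'."""
    bnb = parse_version(bnb_version)[:2]
    cuda = parse_version(cuda_version)[:2]
    if bnb >= (0, 41):
        # 0.41.0 and later: CUDA 11.7, 11.8, 12.0, 12.1+
        return cuda >= (11, 7)
    if (0, 37) <= bnb <= (0, 40):
        # 0.37 through 0.40: CUDA 11.x only
        return cuda[0] == 11
    return None
```

Pass torch.version.cuda as the second argument, since that is the CUDA version PyTorch was built against. It is None on CPU-only builds, so check for that first.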

