
How to Run DeepSeek Locally

DeepSeek is an open-source LLM (Large Language Model) project that’s gaining attention for its strong performance and accessibility. If you’re looking to test or fine-tune DeepSeek on your local machine, this guide will walk you through the setup process.

Why Run DeepSeek Locally?

Running DeepSeek locally gives you full control over the model without relying on cloud services. It’s great for:

  • Testing custom prompts and workflows
  • Ensuring data privacy
  • Saving on API costs
  • Fine-tuning or experimenting with model behavior

Prerequisites

Before diving in, make sure your machine meets the following requirements (a quick GPU check script follows the list):

  • Operating System: Linux or macOS (Windows users should use WSL)
  • GPU: NVIDIA GPU with at least 16GB VRAM (for decent performance)
  • Python: 3.10 or later
  • CUDA & cuDNN: Installed and configured
  • Disk Space: 50GB+ (model files can be large)
  • Git and pip: Installed
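
To sanity-check the GPU side of the setup, a short PyTorch snippet like the one below can confirm that CUDA is visible and report available VRAM. This assumes PyTorch is already installed; the exact numbers will vary by machine.

import torch

if torch.cuda.is_available():
    name = torch.cuda.get_device_name(0)
    vram_gb = torch.cuda.get_device_properties(0).total_memory / 1024**3
    print(f"GPU detected: {name} ({vram_gb:.1f} GB VRAM)")
else:
    print("No CUDA-capable GPU detected - check your drivers and CUDA install")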

Step 1: Clone the DeepSeek Repository

Open a terminal and run:

git clone https://github.com/deepseek-ai/DeepSeek-V2.git
cd DeepSeek-V2

If you’re using a different version (e.g., DeepSeek Coder), be sure to clone the appropriate repo.


Step 2: Set Up a Python Environment

Create a virtual environment to keep dependencies clean:

python3 -m venv venv
source venv/bin/activate

Then install dependencies:

pip install -r requirements.txt

Step 3: Download the Model Weights

Model weights are usually hosted on Hugging Face. Visit https://huggingface.co/deepseek-ai and choose the variant you want (7B or 67B, base or chat).

Download using git-lfs:

git lfs install
git clone https://huggingface.co/deepseek-ai/deepseek-llm-7b-base

Move the downloaded weights into a models/ folder inside your DeepSeek directory.
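
If you'd rather not use git-lfs, the huggingface_hub Python package offers an alternative download path. Here's a minimal sketch, assuming you've run pip install huggingface_hub; the local_dir path is just an example that matches the models/ layout above:

from huggingface_hub import snapshot_download

# Download the full model repo into the local models/ folder
snapshot_download(
    repo_id="deepseek-ai/deepseek-llm-7b-base",
    local_dir="models/deepseek-llm-7b-base",
)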


Step 4: Run the Model with Transformers

You can now load the model using Hugging Face’s transformers library:

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Path to the weights you downloaded in Step 3
model_name = "./models/deepseek-llm-7b-base"

tokenizer = AutoTokenizer.from_pretrained(model_name)
# Load in half precision so the 7B model fits comfortably on a 16GB GPU
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.bfloat16, device_map="auto"
)

# Tokenize a test prompt and move it to the same device as the model
inputs = tokenizer("Hello, DeepSeek!", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))

This will output a basic completion. You’re up and running!
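
If you grabbed the chat variant instead of the base model, it generally responds better when prompts are wrapped in its chat template. Here's a rough sketch using transformers' apply_chat_template, assuming the deepseek-llm-7b-chat weights sit in the same models/ folder:

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_name = "./models/deepseek-llm-7b-chat"  # chat variant (assumed path)
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.bfloat16, device_map="auto"
)

# Wrap the user message in the model's chat template
messages = [{"role": "user", "content": "Explain what DeepSeek is in one sentence."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=100)
# Strip the prompt tokens and print only the generated reply
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))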


Optional: Use Text Generation WebUI

For a user-friendly interface, consider running DeepSeek via oobabooga/text-generation-webui. It supports multiple LLMs, including DeepSeek, and provides a web interface with GPU acceleration.


Troubleshooting Tips

  • Out of Memory Errors? Try using 4-bit or 8-bit quantized models via bitsandbytes (a sketch follows this list).
  • CUDA Errors? Make sure your GPU drivers and CUDA toolkit versions are compatible.
  • Slow Load Times? Use torch.compile() or an inference acceleration library such as vLLM if supported.
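
For the out-of-memory case, here's a rough sketch of 4-bit loading with bitsandbytes and transformers' BitsAndBytesConfig, assuming pip install bitsandbytes is done; the parameter choices are illustrative, not tuned:

from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
import torch

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,                     # quantize weights to 4 bits on load
    bnb_4bit_compute_dtype=torch.float16,  # run matmuls in fp16
)

model_name = "./models/deepseek-llm-7b-base"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=quant_config,
    device_map="auto",
)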

Final Thoughts

DeepSeek is a powerful LLM, and running it locally opens up a lot of flexibility for custom development and private experimentation. With a capable GPU and the right setup, you can have a local AI assistant that doesn’t rely on the cloud.

Let me know in the comments if you run into any issues or want a tutorial on fine-tuning DeepSeek!

