The time is now
After seeing the huge news about DeepSeek-R1's release last month and its effect on Nvidia's stock, and after dabbling in running LLMs locally last year for a research project, aaaaand after I had just deployed my first server for my home lab and was looking for a project to run on it, I figured now was a better time than ever to get started!
Technology Stack
After some research for this project, I decided on the following technology stack:
| Technology | Product/Platform | Reason |
|---|---|---|
| Server | Windows 11 | Modern OS and flexible for other projects down the road. |
| Virtual Machine Platform | Windows Subsystem for Linux (WSL) | Native to Windows and supports GPU acceleration, with a Docker daemon and images shared between Linux and Windows. |
| Containerization Platform | Docker Desktop | Very popular, built on the open-source Docker Engine, and I wanted to expand my skills with the platform. |
| Large Language Model (LLM) | Llama 2 | Made by Meta and lightweight (7B), good for quick testing and deployments. |
| LLM Engine | Ollama | Free, open source, and a new tool I wanted to explore! |
| Web Server and Graphical User Interface (GUI) | Open WebUI | Open source and conveniently publishes a bundled image with Ollama. |
| Virtual Private Network (VPN) | Tailscale | Free, mostly open source, and helps streamline networking. |
| Network Serving | Tailscale Serve | Lets you privately share a locally hosted service (not internet facing). |
Configuration
- I first configured my server with an installation of Windows 11.
- Installed and configured Tailscale.
- Enabled the WSL Windows feature and installed a fresh Debian distribution (see the WSL commands after this list).
- Installed Docker Desktop and configured it to use the WSL 2 based engine.
- Pulled the combined Ollama + Open WebUI Docker image down to my Debian instance and installed the necessary GPU drivers (see the Docker sketch below).
- Once everything was deployed locally, I went into the Tailscale admin console to enable HTTPS certificates, then configured Tailscale Serve (see the Serve example below).
- With Tailscale Serve configured, I was able to open Open WebUI on my iPhone and start talking to the LLM! The experience was very similar to using the official ChatGPT app (outside of it being a 7 billion parameter model versus GPT-4o's reportedly ~200 billion; my RTX 2070 Super stays strong!).
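For reference, here's roughly what the WSL setup looked like. A minimal sketch from an elevated PowerShell prompt; exact flags and output can vary by Windows build:

```powershell
# Enable the WSL feature and install a Debian distribution in one step.
wsl --install -d Debian

# After rebooting, confirm Debian is running under WSL 2, which
# Docker Desktop's WSL 2 based engine requires.
wsl --list --verbose
wsl --set-version Debian 2   # only needed if it reports version 1
```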
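The combined image is published by the Open WebUI project; the `:ollama` tag bundles Ollama into the same container, and `--gpus=all` passes the GPU through Docker Desktop's WSL 2 backend. A sketch of the run command (host port 3000 is my own choice, pick whatever you like):

```bash
# Run from the Debian shell (or any shell sharing the Docker daemon).
docker run -d \
  -p 3000:8080 \
  --gpus=all \
  -v ollama:/root/.ollama \
  -v open-webui:/app/backend/data \
  --name open-webui \
  --restart always \
  ghcr.io/open-webui/open-webui:ollama

# Pull the llama2 model through the bundled Ollama instance
# (assuming the ollama binary is on PATH inside the image; the model
# can also be pulled from the Open WebUI settings page).
docker exec -it open-webui ollama pull llama2
```

The named volumes keep downloaded models and chat history around if the container is ever recreated.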
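And the Tailscale side: after turning on HTTPS certificates in the admin console, something like the following shares the UI with devices on my tailnet only. Serve's flag syntax has changed across Tailscale releases, so treat this as a sketch and check `tailscale serve --help` on your version:

```bash
# Proxy https://<machine-name>.<tailnet>.ts.net to the local Open WebUI
# port for tailnet devices only (not internet facing). --bg keeps the
# proxy running in the background across sessions.
tailscale serve --bg 3000

# Confirm what is currently being served.
tailscale serve status
```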
Resources