The time is now
After seeing the huge news about DeepSeek-R1's release last month and its effect on Nvidia's stock, and after dabbling in running LLMs locally last year for a research project, aaaaand after I had just deployed my first server for my home lab and was looking for a project to run on it, I figured now was a better time than ever to get started!
Technology Stack
After some research for this project, I decided on the following technology stack:
| Technology | Product/Platform | Reason |
|---|---|---|
| Server | Windows 11 | Modern OS and flexible for other projects down the road. |
| Virtual Machine Platform | Windows Subsystem for Linux (WSL) | Native to Windows and supports GPU acceleration, with a Docker daemon and images shared between Linux and Windows. |
| Containerization Platform | Docker Desktop | Very popular, built on the open-source Docker Engine, and I wanted to expand my skills with the platform. |
| Large Language Model (LLM) | Llama 2 | Made by Meta and lightweight (7B), good for quick testing and deployments. |
| LLM Engine | Ollama | Free, open source, and a new tool I wanted to explore! |
| Web Server and Graphical User Interface (GUI) | Open WebUI | Open source and conveniently publishes a bundled image with Ollama. |
| Virtual Private Network (VPN) | Tailscale | Free, mostly open source, and helps streamline networking. |
| Network Serving | Tailscale Serve | Lets you privately share a locally hosted service (not internet facing). |
Configuration
- I first configured my server with an installation of Windows 11.
- Installed and configured Tailscale.
- Enabled the WSL Windows feature and installed a fresh Debian distribution (see the WSL commands after this list).
- Installed Docker Desktop and configured it to use the WSL 2 based engine.
- Pulled the combined Ollama + Open WebUI Docker image down to my Debian instance and installed the necessary GPU drivers (see the Docker sketch below).
- Once everything was deployed locally, I went into the Tailscale admin console to enable HTTPS certificates, then configured Tailscale Serve (see the Serve example below).
- With Tailscale Serve configured, I was able to open Open WebUI on my iPhone and start talking to the LLM! The experience was very similar to using the official ChatGPT app (outside of it being a 7 billion parameter model versus GPT-4o's reportedly ~200 billion; my RTX 2070 Super stays strong!).
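For reference, here's roughly what the WSL setup looked like. A minimal sketch from an elevated PowerShell prompt; exact flags and output can vary by Windows build:

```powershell
# Enable the WSL feature and install a Debian distribution in one step.
wsl --install -d Debian

# After rebooting, confirm Debian is running under WSL 2, which
# Docker Desktop's WSL 2 based engine requires.
wsl --list --verbose
wsl --set-version Debian 2   # only needed if it reports version 1
```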
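The combined image is published by the Open WebUI project; the `:ollama` tag bundles Ollama into the same container, and `--gpus=all` passes the GPU through Docker Desktop's WSL 2 backend. A sketch of the run command (host port 3000 is my own choice, pick whatever you like):

```bash
# Run from the Debian shell (or any shell sharing the Docker daemon).
docker run -d \
  -p 3000:8080 \
  --gpus=all \
  -v ollama:/root/.ollama \
  -v open-webui:/app/backend/data \
  --name open-webui \
  --restart always \
  ghcr.io/open-webui/open-webui:ollama

# Pull the llama2 model through the bundled Ollama instance
# (assuming the ollama binary is on PATH inside the image; the model
# can also be pulled from the Open WebUI settings page).
docker exec -it open-webui ollama pull llama2
```

The named volumes keep downloaded models and chat history around if the container is ever recreated.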
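And the Tailscale side: after turning on HTTPS certificates in the admin console, something like the following shares the UI with devices on my tailnet only. Serve's flag syntax has changed across Tailscale releases, so treat this as a sketch and check `tailscale serve --help` on your version:

```bash
# Proxy https://<machine-name>.<tailnet>.ts.net to the local Open WebUI
# port for tailnet devices only (not internet facing). --bg keeps the
# proxy running in the background across sessions.
tailscale serve --bg 3000

# Confirm what is currently being served.
tailscale serve status
```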
Resources