I’ve recently got into learning about AI and thought I’d share my dev environment for doing experiments. My goal was to be able to follow along with various online learning resources that center around Jupyter notebooks using Python and PyTorch. Rather than just use these notebooks directly, I wanted a full IDE (VS Code) with Copilot integrated so that I could ask questions about the code I’m writing and solicit help. I cannot recommend Copilot X enough for that. The economics region of my brain is broken, so I also wanted to develop against two Nvidia 4090 GPUs I bought recently.
I’ve broken this post down into a hardware section and a software section, but here’s the TL;DR if you’re just curious and don’t care about the details:
I’ve got two desktop computers sitting under my desk (well, more, but two that are relevant here). One, which I’ll call the ML computer, has two beefy GPUs in it and the other, which I’ll call the desktop computer, is a bare-bones machine used to render my desktop environment.
Cloud is the way to go for a hobbyist if you are rational about money. It takes quite a few hours of machine learning to justify a dedicated GPU and even full-time students are unlikely to justify it. I’m more of an emotional spender and I really can’t bring myself to pay by the hour for anything, so I went ahead and did something absurd: I bought two Nvidia 4090 GPUs. At least I’ll be able to finally try ray tracing in games.
If you’re thinking of doing the same, first read this blog: Which GPU(s) to Get for Deep Learning. It’s very thorough. TL;DR: if you care about VRAM, the 4090 is currently the best bang for the buck, especially if you want to do stuff with 8-bit weights.
I went with two because I wanted the extra VRAM for larger models, though the 4090s don’t support SLI and instead need to be manually managed by the software I write. It’s kind of a pain in the ass and I haven’t had much luck in consistently getting newbie-friendly projects working across both, but I’m learning. PiPPy is one promising avenue that I intend to explore further for this. Having two also means I’ll be able to sort out multi-GPU problems locally even if I end up doing the heavy lifting in the cloud. Honestly, though, it would have been better to start with just one GPU.
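To give a feel for what “manually managed” means in practice, here’s a minimal sketch of naive model parallelism in PyTorch: split the model into stages, pin each stage to a GPU, and move activations between devices by hand. The layer sizes are made up for illustration, and the sketch falls back to CPU when two GPUs aren’t available:

```python
import torch
import torch.nn as nn

# Pin each stage to its own GPU; fall back to CPU so the sketch
# still runs on machines without two GPUs.
two_gpus = torch.cuda.device_count() >= 2
dev0 = torch.device("cuda:0" if two_gpus else "cpu")
dev1 = torch.device("cuda:1" if two_gpus else "cpu")

# Two made-up stages of a model, one per device
stage1 = nn.Linear(16, 32).to(dev0)
stage2 = nn.Linear(32, 4).to(dev1)

x = torch.randn(8, 16, device=dev0)
h = stage1(x)           # runs on the first device
y = stage2(h.to(dev1))  # activations moved to the second device by hand
print(tuple(y.shape))   # (8, 4)
```

PiPPy automates exactly this kind of staging (plus pipelining of micro-batches across the stages), which is why it’s on my list to explore.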
Other GPU-related tips and notes:
I wasn’t planning to go this route at first, but it quickly became apparent that my desktop environment hates it when the GPU unexpectedly runs out of VRAM, which turns out to be a common occurrence with ML workloads: the environment crashes or hangs, forcing a restart each time. I’m running Arch Linux with Cinnamon, though I suspect this is true for just about any OS. Ultimately, I decided to set up my primary ML computer to dual-boot to a console-only version of Arch and got a second computer whose only job is to run a desktop environment and connect over SSH to my ML computer. I’d recommend an Intel NUC or similar for the desktop computer, though I happened to already have an old computer I could use.
Having my desktop environment on the second computer means nothing bad will happen if I run out of VRAM on the main computer – at worst, I’ll just have to start training or inference over again with different parameters. Fortunately, everything else I wanted to do works pretty easily across the two machines too.
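When an out-of-VRAM error does happen, it at least arrives as a catchable exception in PyTorch, so “start over with different parameters” can be automated. Here’s a rough sketch of the retry pattern; train_step is a hypothetical function standing in for whatever your training loop does with a given batch size, and torch.cuda.OutOfMemoryError requires a recent PyTorch (older versions raise a plain RuntimeError):

```python
import torch

def first_fitting_batch_size(train_step, batch_sizes):
    """Try batch sizes from largest to smallest until one fits in VRAM."""
    for bs in batch_sizes:
        try:
            train_step(bs)
            return bs
        except torch.cuda.OutOfMemoryError:
            # Release cached allocations before retrying with a smaller batch
            torch.cuda.empty_cache()
    raise RuntimeError("even the smallest batch size ran out of VRAM")
```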
Some other hardware specs for the ML computer:
Once I got the hardware installed, I set up Arch Linux on both machines. The ML computer boots to runlevel 3 (systemd’s multi-user target, i.e. no desktop environment) when I’m doing ML stuff. The desktop computer runs Cinnamon. If you’re not a Linux nerd, I recommend Debian for both; I prefer Arch because I like configuring everything myself.
I have an ~/ai directory on both computers, where I put all my code, models, and so on. On the desktop computer, it’s empty and I mount the ML computer’s directory onto it over SSHFS:

$ sshfs primary:ai ~/ai

This way, I have convenient access to all of the files from both devices. It essentially treats the ML computer’s directory as if it were local to the desktop computer. I also keep a couple of SSH terminals open to run commands and monitor GPU VRAM on the ML computer (with nvidia-smi).
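If you’d rather watch VRAM from a script than eyeball nvidia-smi, its query mode is easy to wrap. A sketch (the helper names are mine, and it assumes nvidia-smi is on the PATH):

```python
import subprocess

def parse_memory_used(csv_text):
    """Parse nvidia-smi's csv,noheader,nounits output into per-GPU MiB values."""
    return [int(line) for line in csv_text.splitlines() if line.strip()]

def gpu_memory_used_mib():
    """Return VRAM usage in MiB for each GPU, via nvidia-smi's query mode."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.used",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_memory_used(out)
```

Run over SSH, something like `ssh primary python3 vram.py` gives a quick snapshot without keeping a terminal dedicated to it.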
Python is what everybody in AI uses for ML. In particular, Jupyter notebooks seem to be popular among AI researchers. Jupyter comes with a web interface that is decent, but it leaves much to be desired if you’re used to a full IDE (and GitHub Copilot…). Fortunately, VS Code supports Jupyter notebooks, including remote ones, which means you can run the IDE on the desktop computer and the Jupyter environment on the ML computer. I’m running the Insiders edition of VS Code to get access to Copilot X, though there is also a Genie GPT plugin as an alternative.
Unfortunately, not everything works right out of the box when using VS Code with Jupyter notebooks. In particular, I’ve discovered that some of the fast.ai widgets don’t render correctly without some tweaks (and even then, I’ve found that ImageClassifierCleaner never works).
To get progress bars to render, run this in a cell:
from IPython.display import clear_output, DisplayHandle

# Define a function that updates an existing display object
def update_patch(self, obj):
    # Clear any outputs in the current IPython cell,
    # but wait until new output arrives before doing so
    clear_output(wait=True)
    # Update the display with the new object
    self.display(obj)

# Replace DisplayHandle's 'update' method with update_patch so that
# progress bars re-render in place instead of stacking up
DisplayHandle.update = update_patch
Rather than mess with pip, pipenv, conda, etc. to keep my various projects isolated, I run Docker from my project directory. It’s more hermetic than the alternatives and provides a modicum of security should I pull in a third-party package with some kind of vulnerability in it (though this isn’t a perfect solution).
For Jupyter notebooks, this is the command I use:
$ docker run \
    --shm-size=2G \
    --gpus all \
    --detach \
    --interactive \
    --tty \
    --publish 8848:8888 \
    --publish 8080:8080 \
    --volume "$(pwd)":/home/jovyan \
    --env GRANT_SUDO=yes \
    --env JUPYTER_ENABLE_LAB=yes \
    --env JUPYTER_TOKEN="secrettoken" \
    --user root \
    cschranz/gpu-jupyter:v1.5_cuda-11.6_ubuntu-20.04_python-only

Briefly: --shm-size=2G sets the shared memory size to 2GB; --gpus all passes through all available GPUs; --detach, --interactive, and --tty run the container in the background while keeping STDIN open and allocating a pseudo-TTY; the --publish flags map the container’s ports 8888 (Jupyter) and 8080 to the host’s ports 8848 and 8080; --volume mounts the current directory as the container’s home directory; and the remaining flags enable JupyterLab and sudo, set the access token, and run as root.
Change secrettoken to something only you know; it acts as a password of sorts for connecting. When you open a notebook file with VS Code, it’ll ask you to pick the Python environment. There’s an option to connect to a remote server; enter the ML computer’s IP and append ?token=secrettoken to connect. I often hit a bug where the server doesn’t show up after adding it until I restart VS Code, however.
I arrived at this Docker command through trial and error; if you use it, be sure that you understand everything it does first – even Docker cannot guarantee perfect isolation.
On the desktop computer, I also use Pipenv so that VS Code knows about the various APIs I pull in and can give me hints about them.