Tags: docker · overlayfs · ssd · performance · linux · debugging

Fifty Cents a Day Doesn't Sound Like Much

Until you do the math.

My solid-state drive cost $600. An SSD is the storage in modern computers: unlike old hard drives with spinning metal platters, SSDs use memory chips. They're fast, silent, and have one fatal flaw: every single write wears the chips out slightly. Think of it like a pencil eraser — works great for thousands of uses, but eventually there's nothing left. Manufacturers rate them with a "total bytes written" guarantee; once you exceed it, you're living on borrowed time. Mine is rated for 1,000 terabytes of total writes — one petabyte. A terabyte is roughly 1,000 copies of the entire English Wikipedia, so a petabyte is a million Wikipedias this drive can absorb before it wears out. That sounds like a lot, and for normal use, it is. You'd have to try pretty hard to wear it out — unless, say, you're accidentally writing 840 Wikipedias' worth every single day.

I was trying very hard. I just didn't know it.

Every day, my system was writing over a terabyte of unnecessary data to this drive — from half a dozen different sources I hadn't thought to measure. At the worst point, the drive was hitting 4,380 megabytes per second of writes. That's nearly its physical maximum. The drive was screaming.

At that rate, I was grinding through $180 worth of drive life every year, burning 30% of the SSD's total lifespan annually, turning a drive that should last a decade into a three-year countdown.
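The arithmetic behind those numbers is simple enough to sketch. This is just the back-of-envelope math, assuming the drive's 1,000 TB endurance rating and the 840 GB/day write volume measured later in this post (it lands slightly above the rounded $180 figure):

```python
# wear_math.py — back-of-envelope SSD wear cost, from the figures in this post.
drive_cost = 600        # dollars
endurance_tb = 1000     # rated total bytes written, in TB (1 petabyte)
daily_gb = 840          # measured unnecessary writes per day

cost_per_tb = drive_cost / endurance_tb                 # $0.60 per TB written
daily_cost = daily_gb / 1000 * cost_per_tb              # ~ $0.50 per day
annual_cost = daily_cost * 365                          # ~ $184 per year
annual_lifespan_pct = daily_gb / 1000 * 365 / endurance_tb * 100  # ~ 30% per year

print(round(daily_cost, 2), round(annual_cost), round(annual_lifespan_pct, 1))
```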

Here's what that looks like:

[Interactive: "$600 SSD — Lifespan Drain." A $600 SSD rated for 1 petabyte, with live counters for days elapsed, GB written to the SSD, and drive life burned in dollars. Watch what 840 GB of daily writes — $0.50 a day in wear — does to it over a year.]

This is the story of how I found six different ways my system was thrashing an SSD — and how the worst offender turned out to be a script I wrote to protect it.

What I Was Building

I run an AI coding benchmark. Think of it as a standardized test for AI coding assistants: you give each agent the same set of real bugs from production open-source projects, let them try to fix each one, then grade the results. Did the fix work? Did the tests pass? It's how you know whether an AI tool actually helps or just sounds confident. (The answer varies more than you'd expect.) Mine covers 500 bug-fix tasks and 200 feature-building tasks, across different tools, different models, different configurations.

Each task gets its own sealed sandbox — a Docker container. Docker creates isolated environments called "containers": imagine a sealed room with everything a program needs — its own operating system, its own files, its own tools. The program inside can't see or touch anything outside the room, and when it finishes, you demolish the room. Spin one up, let the agent work for 30 minutes, capture results, tear it down. About 600 of these per campaign. A full campaign takes 20 hours. Campaigns run daily.

600 containers — spun up and torn down every campaign
20 hrs per campaign — running daily, 7 days a week
$600 SSD — rated for 1 petabyte of total writes

The machine is built for this: a GPU running AI models locally — an NVIDIA RTX PRO 6000 with 96 GB of dedicated memory and a 600-watt appetite, the same card that had its own drama with a firmware bug. A Graphics Processing Unit was originally designed to render video game graphics; it does thousands of math operations simultaneously, which happens to be exactly what AI models need, so it's now the engine behind AI. It draws more power than a space heater, and it means I can run models locally instead of paying cloud providers per query. Add dozens of processes managing container lifecycles and constant disk activity. Heavy workload.

So when I first noticed the SSD taking a beating, the reaction was: "Yeah, that tracks."

It didn't track. Not even close.

The Monitoring Script That Changed Everything

I'd built a small monitoring tool — 53 lines of code that read the operating system's disk statistics twice per second, calculated write speed and drive utilization, and logged the numbers. Linux tracks every read and write operation in a file called /proc/diskstats. It's like a mileage counter for your hard drive, updated in real time by the kernel: read it twice a second, subtract the old numbers from the new ones, and you get live write speed. Free, built into every Linux system, and more useful than any expensive monitoring tool I've ever used. Under the hood, my script was just reading /proc/diskstats and doing subtraction. Took about as long to write as making a sandwich.

That sandwich-effort script revealed a disaster I'd been blind to for weeks.
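For flavor, here's a minimal sketch of that kind of monitor. This is not my original 53-line script; the device name nvme0n1 and the helper names are my own choices, but the mechanics (field 10 of /proc/diskstats counts 512-byte sectors written) are standard:

```python
# diskstats_sketch.py — minimal write-rate monitor in the spirit of the
# 53-line script described above. Device name "nvme0n1" is an assumption.
import time

SECTOR_BYTES = 512  # /proc/diskstats counts 512-byte sectors regardless of hardware


def sectors_written(diskstats_text: str, device: str) -> int:
    """Cumulative sectors written for one device, parsed from /proc/diskstats text."""
    for line in diskstats_text.splitlines():
        fields = line.split()
        if len(fields) > 9 and fields[2] == device:
            return int(fields[9])  # field 10: sectors written since boot
    raise ValueError(f"device {device!r} not found")


def write_rate_mb_s(device: str = "nvme0n1", interval: float = 0.5) -> float:
    """Sample twice, subtract, convert the delta to MB/s."""
    with open("/proc/diskstats") as f:
        before = sectors_written(f.read(), device)
    time.sleep(interval)
    with open("/proc/diskstats") as f:
        after = sectors_written(f.read(), device)
    return (after - before) * SECTOR_BYTES / interval / 1e6

# On a Linux host with an NVMe drive: print(write_rate_mb_s())  # live MB/s
```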

I kicked off a batch of 16 containers and watched the monitor:

Caught in the Act — 16 Containers Starting Up (real data)

My 53-line monitoring script captured this. Each sample shows write load as a percentage of the drive's physical maximum of 5,000 MB/s:

12:37:37 — 81% · 12:37:38 — 87% · 12:37:39 — 85% · 12:37:40 — 88% · 12:37:42 — 90% · 12:37:45 — 91% · 12:37:50 — 93% · 12:38:00 — 94% · 12:38:05 — 94%

For 40 seconds, the SSD was pinned at 88–94% of its physical limit — not from AI agents doing real work, but from my "protection" script setting up RAM overlays. The AI agents hadn't even started yet.

The SSD was pegged. 4,380 megabytes per second of writes. 94% drive utilization. And the AI agents hadn't even started yet — the writes were happening during container setup, before a single line of AI code ran.

The monitoring script told the real story in one hour. Weeks of code review had missed it entirely.

Six Ways to Thrash an SSD

Once I started measuring, I found the writes were coming from everywhere. Not one big leak — six of them, stacked on top of each other.

The Damage Report — 6 Sources Found

1. Docker container logs (0.6–6 GB / batch) — Docker's default log driver writes every line of container output to a JSON file on the SSD. No rotation, no cap.
2. Host temp files (variable) — Python's tempfile calls landing on the SSD by default, while 63 GB of empty RAM at /dev/shm sat right there, unused.
3. SQLite fsyncs (600 / batch) — 600 sync-to-disk operations per batch for tracking results. Each one forces the SSD to flush its write cache.
4. pip package installs (96 GB / day) — 160 MB per container × 600 containers. Every container installed Python packages, and every install wrote to the SSD.
5. Container lifecycle (~2.4 GB / create+destroy) — Docker's own overhead for spinning containers up and tearing them down. 600 cycles a day adds up fast.
6. My "protection" script (840+ GB / day) — a script I wrote to protect the SSD was actually the single largest source of writes on the entire machine.
That last one dwarfs everything else combined. The script I wrote to protect the SSD was responsible for more writes than all other sources put together.

The Easy Wins: Silencing the Noise

Before tackling the monster, I went after the low-hanging fruit. Four changes, each one simple, each one measurable:

Docker logs → silenced. One flag — --log-driver=none — tells Docker to stop writing container output to disk. I was already capturing output through other means, so the log files were pure waste. Savings: up to 6 GB per batch.

docker run --log-driver=none ...

Host temp files → redirected to RAM. Python was creating temporary files on the SSD by default. /dev/shm is a directory that lives entirely in RAM: files written there never touch the SSD — they're stored in memory and vanish when the machine restarts. Most Linux systems have it, most developers never think about it, and most programs write temporary files to the SSD instead. On my system, 63 GB of /dev/shm was sitting empty while the SSD was getting hammered. Pointing the default temp directory there (/dev/shm/trw-eval) was a one-line configuration change.
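In Python, that redirect can be as small as this. A sketch, not my exact change: the /dev/shm/trw-eval path is the one from my setup, setting tempfile.tempdir is one of several equivalent ways (exporting TMPDIR is another), and the fallback line is only there so the snippet runs on machines without /dev/shm:

```python
# tmp_to_ram.py — redirect Python's default temp directory to RAM (tmpfs).
import os
import tempfile

# /dev/shm is a RAM-backed tmpfs on most Linux systems; fall back elsewhere.
base = "/dev/shm" if os.path.isdir("/dev/shm") else tempfile.gettempdir()
ram_tmp = os.path.join(base, "trw-eval")
os.makedirs(ram_tmp, exist_ok=True)

tempfile.tempdir = ram_tmp  # every later tempfile.* call now lands here

with tempfile.NamedTemporaryFile() as f:
    in_ram = f.name.startswith(ram_tmp)  # temp files no longer hit the SSD
```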

pip installs → RAM-backed. Every container installed Python packages to the SSD. Docker's --tmpfs flag redirects those writes to RAM. 160 MB per container, 600 containers per day — that's 96 GB of SSD writes eliminated with one flag.

docker run --tmpfs /tmp/trw-pip:exec,size=256m ...

SQLite → batched writes. Write-Ahead Logging (WAL) is a database mode where changes get written to a separate log file first, then batched together before being flushed to the main database file. Instead of 600 individual "write this to disk RIGHT NOW" operations, the database collects them and writes once. Fewer fsyncs = fewer SSD writes = longer drive life. Switching SQLite to WAL with relaxed sync settings cut 600 individual disk flushes down to batched writes.
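From Python, the switch is two PRAGMA statements. A sketch under my own assumptions (the table name and the 600-row batch are illustrative, not my production schema):

```python
# sqlite_wal.py — WAL mode plus relaxed sync, so 600 inserts become one
# batched write instead of 600 individual flushes.
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "results.db")
db = sqlite3.connect(path)

mode = db.execute("PRAGMA journal_mode=WAL").fetchone()[0]  # returns 'wal'
db.execute("PRAGMA synchronous=NORMAL")  # fsync at WAL checkpoints, not every commit

db.execute("CREATE TABLE results (task TEXT, passed INTEGER)")
with db:  # one transaction: the 600 rows land together, not one flush each
    db.executemany(
        "INSERT INTO results VALUES (?, ?)",
        [(f"task-{i}", i % 2) for i in range(600)],
    )
count = db.execute("SELECT COUNT(*) FROM results").fetchone()[0]
```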

These were the appetizers. The main course was still destroying the drive.

Docker Already Has a Safety Net

Here's the thing I should have understood before building anything on top of it: Docker was already handling write protection.

Docker stores your application as a stack of frozen, read-only layers — like transparencies stacked on an overhead projector. When code inside a container needs to change a file, Docker copies just that one file to a thin writable layer on top. Everything else stays frozen. This is called copy-on-write (explained well in the Docker storage driver docs), and it's beautifully efficient. Copy-on-write means "don't bother copying anything until someone actually changes it." Imagine a library where every visitor reads the same physical book — but the moment someone wants to write in the margins, the library photocopies just that page for them. Everyone else still reads the original. That's Docker: 600 containers can share the same base files without making 600 copies. Brilliant system. I decided to outsmart it. Spoiler: I did not outsmart it.

[Diagram, 3 slides: Docker stores your application like a stack of transparencies on an overhead projector. Base operating system (Ubuntu 22.04) on the bottom, then Python + packages (pip, conda, dependencies), then your code (/testbed — the code each agent works on) on top. These layers are frozen — nothing can change them, and all 600 containers share the same frozen copy. Docker's writable layer sits on top: only changed files land on the SSD — 50 KB per agent, not gigabytes. My "improvement" added a RAM layer that caught writes perfectly — but its setup rename copied EVERYTHING to the SSD first.]

An AI agent that edits 10 files generates about 50 kilobytes of SSD writes. The frozen layers never get touched. Docker handles it.

But I didn't trust it. "What if there are writes Docker doesn't catch? What if the overhead is worse than I think?" I had concerns. I had theories. What I didn't have was data.

So I built a workaround.

My "Shield" Was Swinging the Sword

The idea sounded bulletproof: intercept every single write inside the container and redirect it to RAM instead of the SSD. RAM (Random Access Memory) is your computer's short-term memory: blazing fast, it doesn't wear out from writing, and it vanishes the moment you lose power. Think of it as a whiteboard — perfect for temporary work, terrible for permanent storage. I had 128 GB of it, and I figured: why let containers write to the SSD when I can catch everything in RAM? Writes to RAM don't touch the drive at all. No wear, no problem.

I wrote a setup script — about a page and a half of shell commands. It ran inside every single container before the AI agent started. Here's the core of it:

# Rename the code directory so we can mount a RAM layer on top
mv /testbed /testbed.ro

# Set up a RAM-based overlay to catch all writes
# (overlayfs also requires a scratch workdir on the same filesystem as upperdir)
mount -t overlay overlay \
  -o lowerdir=/testbed.ro,upperdir=/ramtmp/testbed-upper,workdir=/ramtmp/testbed-work \
  /testbed

Then 50+ more lines of setup: creating mount points, configuring timeouts, handling edge cases. Every container needed --privileged to run these filesystem operations. The --privileged flag is Docker's "I trust this container with everything" switch: normally, containers are locked down — they can't access the host machine's hardware, mount filesystems, or do anything dangerous — and --privileged rips all those guardrails off. It's necessary for some legitimate use cases, but security teams hate it, because a compromised privileged container can do anything to the host machine. My workaround required it. My fix didn't. There was even a verification document confirming everything was "properly configured."

It was thorough. It was documented. And it was the single biggest source of SSD writes on the entire machine.

The Trap: Copy-Up

Remember the first line of the setup script?

mv /testbed /testbed.ro

Looks harmless. Just renaming a folder. On a normal computer, renaming is instant — the operating system just updates a label.

But inside a Docker container, renaming triggers something called copy-up — Docker's hidden trap door. Remember those frozen read-only layers? When you rename a file that lives in a frozen layer, Docker can't just update a label, because the frozen layer is immutable. Instead, the kernel has to physically copy every single file from the frozen layer into the writable layer before it can rename anything. A "rename" that should take milliseconds can copy hundreds of megabytes to your SSD. The kicker: you'd never know unless you measured, because the command still feels instantaneous from the user's perspective.

[Diagram: The Copy-Up Trap. The theory: a RAM overlay sits above the frozen, read-only image layers — /testbed (340 MB) plus /opt (1–12 GB) — and all agent writes go up to RAM. Fast, no SSD wear, zero SSD writes, $0 in drive wear, the $600 SSD untouched. That was the theory.]

When I renamed /testbed, the kernel physically copied every file in that directory from the frozen layer to the writable layer. On the SSD. 340 megabytes per container.

And /testbed wasn't even the big one. The script also reorganized /opt — home of Python and all its packages. That copy-up ranged from 1 to 12 gigabytes. Per container.

Then the RAM overlay mounted on top, and it worked perfectly. Every subsequent write went to RAM. Exactly as designed. The irony is almost poetic: the protection system worked flawlessly — after it had already caused all the damage it was designed to prevent.

600 containers × 1.4 GB of copy-up overhead = 840 GB. Every day. All of it unnecessary.

Feel the Scale

Drag the slider. Watch the numbers climb.

[Interactive calculator: drag from 600 up to 1,600 containers per day. At 1.4 GB of copy-up per container, 600 containers means 840 GB of total SSD writes daily — about $184/year in wear on a $600 drive. Before the fix, my "protection" script triggered a copy-up on every container startup: 340 MB of code plus gigabytes of Python packages, all copied to the SSD before the AI agent even started. From infrastructure, not from work.]

Four Attempts, Three Failures, One Deletion

Once I understood the copy-up trap, fixing it still wasn't a straight line. I tried four different approaches over three days. Each one taught me something — mostly about hubris.

The Fix Journey
1. The "brilliant" RAM overlay — MADE IT WORSE.
Plan: intercept every container write and redirect it to RAM. Shield the SSD completely. It sounded bulletproof.
Result: the shield was the sledgehammer. The setup renamed a directory, which copied everything to the SSD first. My protection was the #1 source of damage.

2. Just skip the big folders — BAND-AID.
Plan: protect small directories, skip the big ones. Cut the worst copy-ups.
Result: writes dropped a bit, but this was classic symptom-chasing. The code folder still copied 340 MB per container. Root cause: still there.

3. Move all of Docker to RAM — REVERTED.
Plan: the nuclear option — run the entire Docker storage in RAM. Zero SSD writes. Technically perfect. Operationally insane.
Result: every reboot re-downloaded every image, with two Docker systems running at once. Lasted 2 hours before I hit revert.

4. Delete everything I built — SHIPPED.
Plan: remove the entire protection system. Trust Docker to do what Docker was designed to do.
Result: a page and a half of code became 2 lines. No security permissions needed. SSD wear dropped from $180/year to $4.

The first attempt was the protection system itself — the one causing the problem. The second was a band-aid that reduced writes but left the root cause untouched. The third was the nuclear option: I spun up a second Docker daemon — the process that manages all containers — with its storage on /dev/shm, in RAM. Every container would write to RAM, not the SSD. Zero SSD writes. Technically perfect. But /dev/shm is temporary memory: everything disappears when the machine restarts, so every reboot meant re-downloading every container image, with two Docker systems running side by side. Impossible to operate. Reverted that one in two hours.

The fourth attempt was the hardest one to accept: delete everything I'd built.

The Fix Was Deletion

The final solution wasn't a better version of the protection system. It was accepting that Docker's copy-on-write was already doing the job, and my entire protection layer was unnecessary overhead with a devastating side effect.

Before — a page and a half of setup, elevated security permissions, mount points, timeout polling:

mkdir -p /ramtmp && mount -t tmpfs -o size=12g tmpfs /ramtmp &&
mv /testbed /testbed.ro &&
mkdir -p /testbed /ramtmp/testbed-upper /ramtmp/testbed-work &&
mount -t overlay overlay -o lowerdir=/testbed.ro,... /testbed;
# ... 50 more lines of filesystem acrobatics ...

After — 2 lines:

touch /tmp/.overlayfs-ready && exec sleep {timeout}

That's it. Let Docker do what Docker does. An agent that edits 10 files generates 50 KB of writes. Not 1.4 GB of overhead to "prevent writes."

For the few things that genuinely write a lot, Docker's own --tmpfs flags handle it:

docker run \
  --log-driver=none \
  --tmpfs /tmp/trw-pip:exec,size=256m \
  --tmpfs /root:size=256m \
  ...

The --tmpfs flag tells Docker: "mount a chunk of RAM at this path." Any files written there go to memory, not the SSD. It's Docker's built-in version of exactly what my protection script was trying to do — except it works, it's one flag, and it doesn't require elevated permissions. Three of these flags replaced my entire protection system: a page and a half of code and a security hole.

One More Trick: Stop Destroying What You Can Reuse

The last optimization was the simplest conceptually: stop creating and destroying 600 containers when you can reuse them.

Docker generates about 2.4 GB of SSD writes every time it creates and tears down a container. Multiply by 600, and that's another 1,440 GB of daily writes. A container pool keeps containers alive between uses instead of destroying and recreating them: when one task finishes, the pool resets the container to a clean state (a cheap git checkout, plus deleting temp files) and hands it to the next task. The reset costs almost nothing in SSD writes; the full create/destroy cycle costs 2.4 GB. The pool cut container lifecycle writes roughly in half.
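The pool logic itself is tiny. Here's a toy Python sketch of the idea — not my production code: the class, the queue-based bookkeeping, and the docker exec git checkout reset command are all my illustrative assumptions about what "reset instead of recreate" looks like:

```python
# container_pool.py — toy sketch: reuse live containers, reset between tasks.
import queue
import subprocess


class ContainerPool:
    """Hand out live containers; reset on release instead of destroy+recreate."""

    def __init__(self, container_ids):
        self._idle = queue.Queue()
        for cid in container_ids:
            self._idle.put(cid)

    def acquire(self):
        return self._idle.get()  # blocks until a container is free

    def release(self, cid):
        self.reset(cid)          # ~0 GB of SSD writes vs ~2.4 GB per create/destroy
        self._idle.put(cid)

    def reset(self, cid):
        # Illustrative reset: discard the agent's edits inside the container.
        subprocess.run(
            ["docker", "exec", cid, "git", "-C", "/testbed", "checkout", "--", "."],
            check=True,
        )
```

The reset step is pluggable; anything that restores a known-clean state without rewriting gigabytes (git checkout, deleting a tmpfs, truncating scratch dirs) fits the same shape.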

The Results

Before vs After

Peak SSD write rate: 4,380 MB/s before the fix
Per-container writes: 1.4 GB → ~50 KB
pip install writes: 160 MB/container → 0 (RAM-backed via --tmpfs)
Docker log writes: 0.6–6 GB/batch → 0 (--log-driver=none)
Annual SSD wear cost: ~$180/year → ~$4/year
Drive lifespan at this rate: 3.3 years → outlasts the machine
Security permissions: --privileged → none
Lines of setup code: 62 → 2
The SSD will now outlast the computer, my career, and — at this rate — probably my relevance in the job market. Although honestly, the AI agents I'm benchmarking are coming for that anyway.

The Twist I Didn't Expect

Here's the part I didn't see coming: fixing the SSD wasn't just about hardware preservation. It was about data quality.

The SSD thrashing had been causing OOM kills — Out of Memory kills — in 14.3% of evaluation runs. An OOM kill is when the Linux kernel decides a process is using too much memory and sends it SIGKILL: no warning, no cleanup, no graceful shutdown (the killed process exits with status 137, which is 128 plus SIGKILL's signal number, 9). In my system, containers were being terminated by the kernel not because they actually ran out of memory, but because the I/O pressure from SSD thrashing was creating cascading resource exhaustion.

Those kills looked like the AI agent had failed to solve the problem. They hadn't. The infrastructure had killed them before they had a chance.

After fixing the SSD thrashing, baseline evaluation reliability jumped 34 percentage points. Not from improving any AI model. Not from better prompts. From fixing infrastructure. That 34-point improvement was the single largest accuracy gain in the entire project — larger than any model tuning, any prompt optimization, any architectural change.

I spent weeks tuning AI model configurations for a few percentage points of improvement. Then I fixed the SSD and got 34 points for free.

The lesson hit hard: infrastructure quality is research quality. You can't measure what your system is doing if the system itself is interfering with the measurement. I was trying to evaluate AI coding agents while the infrastructure was silently killing them and blaming them for the failure.

What I Took Away

Measure before you build. I had a verification document. I had code comments. I had a design review. None of it mattered because nobody measured the actual writes hitting the drive. A 53-line monitoring script, written in the time it takes to make a sandwich, found six sources of waste that weeks of review missed. Instruments don't lie. Documents can.

Don't outsmart your tools. Docker's copy-on-write exists for exactly this scenario. It already minimizes writes. Building a protection layer on top of Docker's built-in protection was like bringing a backup parachute — except this backup parachute was made of bricks. Understand what your tools already do before adding to them.

Death by a thousand writes. The SSD wasn't being killed by one big leak. It was six different sources, stacked on top of each other. Logs, temp files, pip installs, container lifecycles, the overlayfs copy-up trap — each one seemed minor in isolation. Together, they were writing over a terabyte per day. The lesson: when you're running at scale, "minor" writes multiply into major problems.

Infrastructure failures masquerade as application failures. 14.3% of my evaluation runs were "failing" because the infrastructure was killing them, not because the AI couldn't solve the problem. Without measuring at the infrastructure level, I was blaming the wrong layer. How many "flaky tests" in production are actually infrastructure problems wearing an application mask?

The best optimization is often deletion. The fix wasn't better code. It was recognizing that the code shouldn't exist at all, swallowing my pride, and deleting it. The hardest part wasn't the engineering. It was admitting the engineering was the problem.

Sometimes the most sophisticated decision an engineer can make is to stop engineering.

This post is part of my engineering blog where I write about the real debugging stories behind production systems. If this kind of infrastructure deep-dive interests you, check out how I diagnosed a GPU firmware bug on the same machine.

Frequently Asked Questions

What is Docker copy-up and why does it cause SSD writes?

Docker stores your application as frozen, read-only layers. When code inside a container needs to change a file, Docker copies just that file to a writable layer — efficient and lightweight. But if you rename or move an entire folder, Docker has to copy every file in that folder to the writable layer all at once, because the frozen layer can't be modified in place. A rename that feels instant actually triggers a massive copy operation to the SSD. In my case, renaming a single directory copied 340 MB per container — and I was running 600 containers a day.

How do you reduce Docker container SSD writes?

Trust Docker's built-in copy-on-write — it only writes files that actually change. Beyond that: use --log-driver=none if you capture output separately. Use --tmpfs flags for heavy-write paths like package installs. Redirect your host's temp directory to /dev/shm (RAM). Reuse containers with a pool instead of destroying and recreating them. And never run mv on large directories inside containers — it triggers a full copy-up that can write gigabytes to the SSD from a command that looks instantaneous.

How do you monitor SSD writes on Linux?

Linux tracks every disk operation in /proc/diskstats, updated in real-time. A short script that reads this file twice per second and calculates the difference gives you live write speed and drive utilization. It's free, it's built into every Linux system, and in my case it found over a terabyte per day of unnecessary writes that weeks of code review and a verification document completely missed.

How much does SSD thrashing actually cost?

A $600 enterprise SSD rated for 1,000 TBW costs about $0.60 per terabyte of writes. At 840 GB/day from the overlayfs copy-up alone, plus logs, pip installs, and container lifecycle overhead, I was spending roughly $0.50/day or $180/year in drive wear — burning 30% of the drive's lifespan annually. After optimization, the same daily workload costs about $4/year. The drive will now outlast the machine it's installed in.