Pioneering GPT-3 in 2022: Before Anyone Knew What ChatGPT Was
In Q2 2022, I shipped GPT-3 at ButcherBox for support automation — saving 2 hours/week/employee with an NPS of 57, six months before ChatGPT.
In Q2 2022, I Pitched a Technology Nobody Had Heard Of
I walked into a meeting at ButcherBox and told leadership we should use GPT-3 to automate customer support workflows. Nobody in the room had heard of it. ChatGPT wouldn't exist for another six months.
That's not hindsight talking. In mid-2022, large language models were research papers and Twitter threads. OpenAI had an API. A handful of developers were experimenting with it. Enterprise adoption? Almost nonexistent.
I'd been watching the space closely. GPT-3 wasn't perfect. But it was good enough for a specific class of problems — and I knew exactly which one. Here's what made the difference.
What Did the Pitch Look Like?
The pitch was simple: our support team spent hours on repetitive ticket processing that followed predictable patterns. GPT-3 could draft responses, categorize issues, and surface order data automatically. The ROI was measurable in hours saved per week, not some vague "AI transformation" promise.
The skepticism was real. This wasn't like pitching a new framework or a database migration. Nobody could Google "GPT-3 enterprise case study" and find examples. There were no best practices. No Stack Overflow answers. No vendor presentations with polished slides.
I framed it around the problem, not the technology. Support was processing a massive volume of tickets on a platform handling $2B+ in transactions. The patterns were repetitive. The responses were templated. The data the team needed was scattered across systems.
GPT-3 could sit in the middle. Draft a response. Pull the relevant order data. Categorize the ticket. A human still reviewed everything before it went out. But instead of building every response from scratch, the team started from an 80% draft.
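As a rough sketch, the shape of that draft-then-review workflow looks something like this. The names, fields, and helper function are illustrative, not the actual ButcherBox implementation:

```python
from dataclasses import dataclass

@dataclass
class TicketDraft:
    """An AI-drafted support response awaiting human review."""
    ticket_id: str
    category: str          # e.g. "shipping_delay", predicted by the model
    draft_text: str        # the ~80% starting point for the agent
    order_data: dict       # relevant order details pulled from internal systems
    status: str = "pending_review"  # nothing ships without human approval

def approve(draft: TicketDraft, edited_text: str) -> TicketDraft:
    """A human reviews, optionally edits, and approves the draft."""
    draft.draft_text = edited_text
    draft.status = "approved"
    return draft

# The model drafts; a person signs off before anything reaches a customer.
d = TicketDraft("T-1042", "shipping_delay", "Hi! Your box shipped...", {"order": 991})
assert d.status == "pending_review"
approve(d, "Hi! Your box shipped yesterday and arrives Friday.")
assert d.status == "approved"
```

The key design choice is that the default state is `pending_review`: the system can only propose, never send.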
Leadership didn't say yes to "AI." They said yes to "2 hours back per employee per week."
What GPT-3 Could and Couldn't Do in 2022
Here's what people forget about GPT-3 before ChatGPT made everything look easy: it was powerful and unreliable in equal measure.
It could generate coherent text. It could follow instructions if you structured your prompts carefully. It could extract information from context windows. That was enough for support workflows where the output was always reviewed by a person.
It couldn't reason reliably. It hallucinated. The fine-tuning tooling was far less mature than what exists today. There was no "chat" mode. You sent a completion request and hoped the prompt engineering held.
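For context, a mid-2022 call looked roughly like this: the pre-chat OpenAI Python SDK (the 0.x line) exposed `openai.Completion.create`, and everything — instructions, context, and output format — had to be packed into one flat prompt string. The model name, template, and parameter values below are illustrative, not the production configuration:

```python
# A sketch of a 2022-era completion request. With the 0.x SDK you would then
# call openai.Completion.create(**build_request(...)) and read
# response["choices"][0]["text"] — there was no messages array, no roles.

PROMPT_TEMPLATE = """You are drafting a customer support reply.
Ticket: {ticket}
Order data: {order}
Reply:"""

def build_request(ticket: str, order: str) -> dict:
    # One flat prompt string carries the instructions, the context, and the task.
    return {
        "engine": "text-davinci-002",  # a GPT-3 model available in mid-2022
        "prompt": PROMPT_TEMPLATE.format(ticket=ticket, order=order),
        "max_tokens": 256,             # hard token budget keeps cost per ticket predictable
        "temperature": 0.3,            # low temperature suits templated support replies
    }
```

If the prompt structure drifted even slightly, output quality did too — which is why everything downstream assumed the draft might be wrong.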
So I built guardrails.
Every output went through human review. The system drafted, it didn't send. Token budgets kept costs predictable. Content filtering caught the obvious failures. When the model returned low-confidence results, the system fell back to manual handling with no disruption.
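A minimal sketch of that routing logic might look like this — the threshold, budget, and blocklist values are assumptions for illustration, not the production numbers:

```python
REVIEW_THRESHOLD = 0.7    # below this, discard the draft (illustrative value)
MAX_PROMPT_TOKENS = 1500  # token budget so cost per ticket stays bounded
BLOCKLIST = ("refund guaranteed", "legal advice")  # crude content filter, illustrative

def route_draft(draft: str, confidence: float, prompt_tokens: int) -> str:
    """Decide whether an AI draft enters the human review queue.

    Returns "review" (a human edits the draft) or "manual" (the agent
    writes from scratch, exactly as before the system existed).
    """
    if prompt_tokens > MAX_PROMPT_TOKENS:
        return "manual"   # over budget: skip the model call entirely
    if confidence < REVIEW_THRESHOLD:
        return "manual"   # low confidence: fall back silently, no disruption
    if any(term in draft.lower() for term in BLOCKLIST):
        return "manual"   # content filter caught an obvious failure
    return "review"       # draft goes to a human, never directly to a customer

assert route_draft("Your box ships Friday.", 0.9, 400) == "review"
assert route_draft("Your box ships Friday.", 0.4, 400) == "manual"
```

Note that every branch degrades to `"manual"` — the pre-existing workflow — so a model failure never blocked a ticket.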
The bet wasn't that GPT-3 was ready to replace people. The bet was that it was ready to make people faster.
The Results Nobody Expected
The GPT-3 deployment saved approximately 2 hours per week per employee across the support team and achieved an NPS of 57. The system drafted responses and surfaced order data while humans retained final approval, proving that early LLM adoption could deliver measurable ROI months before ChatGPT made the technology mainstream.
Two hours per week per employee. That was the measurable win. Across the support team, it added up fast.
But the NPS of 57 surprised me more. The team didn't just tolerate the tool. They liked it. The common feedback was that it eliminated the most tedious part of their day — the mechanical ticket-processing work that didn't require judgment but still ate time.
The support team went from building every response manually to reviewing and editing AI-drafted responses. Same quality bar. Same human oversight. Less grind.
And then, in November 2022, ChatGPT launched. Suddenly everyone was talking about the technology we'd been running in production for months.
I wasn't ahead of a trend. I was ahead of a problem. The trend caught up.
What Early Adoption Taught Me
Since ButcherBox, I've built AI systems at increasing scale. Multi-agent workflows that generate 85-90% of migration code automatically. A profile-aware chatbot on this portfolio that answers questions using real resume data. The common thread is the same principle from 2022: find the problem first, then check if the model is ready.
The engineers who waited for ChatGPT to feel safe missed a year of learning. The ones who jumped on every AI demo without a clear problem wasted a different year. The gap between those two groups is where useful work gets done.
Tyler Wall calls this approach AI-directed development — using AI as a core part of the engineering process, not an afterthought. The GPT-3 deployment was the earliest expression of that philosophy. For context on where the field was in 2022, OpenAI's original GPT-3 paper remains a good reference point.
See the Trajectory
The GPT-3 work at ButcherBox was the start. The AI engineering profile shows where that trajectory led — multi-agent systems, automated code generation, and AI governance at enterprise scale. The chatbot on this portfolio is a direct descendant of the same philosophy: AI that does real work, with guardrails, reviewed by humans.
In This Series
- One Afternoon, 23 Backgrounds — The 23 canvas engines behind every page
- One Resume Is Not Enough — How YAML drives 16 portfolio variants
- Text Is Not Enough — The profile-aware AI chatbot
- Why Everything Is Glass — The glassmorphism design system
- Ask ChatGPT Who Tyler Wall Is — Infrastructure and AI discoverability
Frequently Asked Questions
What did GPT-3 actually do at ButcherBox?
GPT-3 automated customer support workflows at ButcherBox, handling routine requests that previously required manual processing. It drafted responses, categorized tickets, and surfaced relevant order data — saving approximately 2 hours per week per employee across the support team, with an NPS of 57.
Why was GPT-3 adoption in 2022 considered early?
ChatGPT launched in November 2022 and brought LLMs into mainstream awareness. My GPT-3 implementation at ButcherBox was running in production months before that, making it one of the earliest enterprise GPT deployments — at a time when most engineers had never heard of large language models.
What guardrails did you put on GPT-3 in production?
Every GPT-3 output went through human review before reaching a customer. The system drafted responses and surfaced recommendations, but a support team member always had final approval. Token budgets, content filtering, and fallback paths to manual handling kept the system safe when the model returned low-confidence results.