Most cloud cost problems aren't mysterious. They're not caused by a sudden traffic spike or an expensive new feature. They're caused by resources that nobody is actively using and nobody remembers creating.
You know the feeling. The bill comes in higher than last month. You ask around. Nobody has an obvious answer. So you move on, assume it'll sort itself out, and it never does. Meanwhile, somewhere in your AWS accounts, a database from a project that shipped two years ago is quietly running. A load balancer with zero active connections has been costing $40 a month since the team that built it disbanded. An EC2 instance that was supposed to be temporary is now in its eighteenth month of life.
Nobody created these resources maliciously. Nobody forgot to care. The environment just kept growing and nobody had a complete enough picture to know what to clean up, or whether it was safe to touch anything at all.
These are zombie resources -- and for most engineering teams they're not a rare edge case. They're background noise that compounds every billing cycle.
What Are Zombie Resources?
A zombie resource is any cloud resource that is no longer serving its original purpose but is still running and still costing money. The three most common types are:
- Orphaned resources were created by a team, project, or engineer that no longer exists or no longer owns them. Nobody knows who deployed them or why. They don't appear in any active project's infrastructure documentation.
- Idle resources are still technically "owned" but doing no useful work. An EC2 instance with 0% CPU utilization for 90 days. An RDS database with no recent connections. A load balancer routing traffic to nothing.
- Over-provisioned resources are doing real work, but they were allocated far more capacity than they need. A production database running on a 32-core instance because that's what the last engineer provisioned "just to be safe."
What makes zombies hard to deal with isn't identifying that they exist -- it's knowing whether it's safe to remove them. Without knowing what depends on a resource, removing it is a gamble. That's why most teams know they have waste and don't touch it anyway.
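To make the three categories concrete, here is a minimal sketch of how they could be written as rules. Every field name and threshold in it is an assumption chosen for illustration (the 1% CPU cutoff and the 4x capacity ratio in particular), not a standard definition. Note what it can't tell you: whether removal is safe.

```python
from dataclasses import dataclass
from typing import Optional


# All field names and thresholds here are illustrative assumptions.
@dataclass
class Resource:
    resource_id: str
    owner: Optional[str]        # None when no team or person can be identified
    avg_cpu_percent: float      # average utilization over the lookback window
    connections_last_90d: int   # connections or requests seen in the window
    provisioned_vcpus: int
    peak_vcpus_used: float


IDLE_CPU_PERCENT = 1.0      # assumed cutoff for "doing no useful work"
OVERPROVISION_RATIO = 4.0   # assumed: provisioned capacity is 4x the observed peak


def classify(r: Resource) -> str:
    """Return 'orphaned', 'idle', 'over-provisioned', or 'active'."""
    if r.owner is None:
        return "orphaned"
    if r.avg_cpu_percent < IDLE_CPU_PERCENT and r.connections_last_90d == 0:
        return "idle"
    if r.peak_vcpus_used > 0 and r.provisioned_vcpus / r.peak_vcpus_used >= OVERPROVISION_RATIO:
        return "over-provisioned"
    return "active"


# The 32-core database from the example above, peaking at 4 cores of real work.
big_db = Resource("db-prod", "payments", 20.0, 500, 32, 4.0)
print(classify(big_db))
```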
Why Zombie Resources Happen (And Why They Keep Happening)
Zombie resources are almost never the result of laziness or carelessness. They're what happens when teams move fast without a complete picture of what they're building on top of.
- Projects end, and infrastructure doesn't. When a feature gets deprecated or a project gets cancelled, the cloud resources that supported it rarely get cleaned up at the same time. The team moves on. The resources stay.
- Ownership drifts over time. Engineers leave. Teams reorganize. A service that was clearly owned by the payments team in 2023 might have three possible owners or none by 2026. When nobody owns something clearly, nobody feels responsible for cleaning it up.
- Temporary infrastructure becomes permanent. A staging environment that was supposed to last two weeks is still running eight months later because everyone assumed someone else was going to turn it off. They didn't.
- Cloud accounts multiply faster than governance does. A company that started with one AWS account now has forty. Resources spread across accounts are harder to find, harder to attribute, and much easier to forget.
Industry estimates put cloud waste at 20 to 30 percent of total cloud spend. For a team running a $1M/year AWS bill, that's $200K to $300K a year sitting in things that are either doing nothing or doing something nobody can explain anymore.
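The arithmetic is simple enough to run against your own bill. This sketch applies the 20 to 30 percent range quoted above; the function name and the monthly breakdown are additions for illustration, and the percentages are an industry estimate, not a measurement of your environment.

```python
def estimated_waste(annual_spend: float, low_pct: int = 20, high_pct: int = 30) -> tuple[float, float]:
    """Return the (low, high) estimated annual waste for a given annual spend."""
    return annual_spend * low_pct / 100, annual_spend * high_pct / 100


low, high = estimated_waste(1_000_000)
print(f"Estimated waste: ${low:,.0f} to ${high:,.0f} per year")
print(f"Roughly ${low / 12:,.0f} to ${high / 12:,.0f} per month")
```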
How to Clean Up Zombie Resources Without Breaking Things
Most cleanup efforts fail not because the team doesn't care but because they move too fast and break something. Here's how to do it without that 2am phone call.
Step 1: Build the inventory
Before you can remove anything, you need a complete list of what's running across all your accounts and regions. Not what your IaC (Infrastructure as Code) says should be running -- what's actually running. These two things are often different. Resources that were created manually, resources that were never added to Terraform, and resources that drifted from their defined state all show up in the live environment but not in your code.
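The gap between "what IaC says" and "what's actually running" is a set difference. The sketch below assumes you have already exported two sets of resource identifiers: one from the live environment and one from your IaC state. How you collect them depends on your tooling, and the ARNs shown are made up.

```python
def diff_inventory(live: set[str], declared: set[str]) -> dict[str, list[str]]:
    """Compare live resource IDs against the IDs declared in IaC."""
    return {
        # Running, but unknown to IaC: manual creations and forgotten experiments.
        "unmanaged": sorted(live - declared),
        # Declared in IaC, but not found live: deleted by hand or never applied.
        "missing": sorted(declared - live),
    }


# Hypothetical identifiers for illustration only.
live = {
    "arn:aws:ec2:us-east-1:111111111111:instance/i-aaa",
    "arn:aws:ec2:us-east-1:111111111111:instance/i-bbb",
    "arn:aws:rds:eu-west-1:222222222222:db:legacy-reports",
}
declared = {
    "arn:aws:ec2:us-east-1:111111111111:instance/i-aaa",
    "arn:aws:ec2:us-east-1:111111111111:instance/i-ccc",
}

report = diff_inventory(live, declared)
for arn in report["unmanaged"]:
    print("not in IaC:", arn)
```

The "unmanaged" list is where zombies tend to hide, so it is a reasonable place to start the ownership work in the next step.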
Step 2: Map ownership
For each resource, you need to know: who created this, what team does it belong to, and what project is it supporting? This is where most cleanup efforts stall, and honestly, where most teams realize their tagging discipline was never as good as they thought. If the tags don't tell you, you need to infer ownership from other signals: deployment history, IAM identity logs, IaC repository commits.
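One way to structure that inference is a fallback chain: trust the tag if it exists, otherwise walk through weaker signals in order. The signal names and their ordering below are assumptions for illustration, and you would populate them from your own deployment, IAM, and repository data.

```python
from typing import Optional

# Assumed order of trust, strongest first. Adjust to match your own environment.
SIGNAL_ORDER = ["tag_owner", "last_deployer", "iam_creator", "iac_last_committer"]


def infer_owner(signals: dict[str, Optional[str]]) -> tuple[Optional[str], Optional[str]]:
    """Return (owner, signal_used), or (None, None) if no signal has a value."""
    for name in SIGNAL_ORDER:
        value = signals.get(name)
        if value:
            return value, name
    return None, None


# Hypothetical resource with no owner tag but a known IAM creator.
owner, source = infer_owner({
    "tag_owner": None,
    "last_deployer": None,
    "iam_creator": "alice@example.com",
    "iac_last_committer": "bob@example.com",
})
print(owner, "via", source)
```

Keeping the signal name alongside the owner matters: an owner inferred from a two-year-old commit deserves less confidence than one read from a current tag.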
Step 3: Check dependencies
This is the step that saves you. A resource that looks completely idle might be a shared dependency for something nobody thought to document. An EBS volume with no recent reads might be an emergency backup that someone very much still cares about. Blast radius analysis -- understanding what breaks if you remove something -- is what separates a cleanup from a cleanup that becomes an incident. You can read more about how OpsCanvas approaches blast radius mapping for cost optimization.
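In graph terms, the blast radius of a resource is everything that depends on it, directly or transitively. Here is a minimal sketch using a breadth-first walk over reversed dependency edges. It assumes you already have a "depends on" map, which is the hard part in practice. The resource names are invented, and this is a generic illustration rather than a description of how any particular product does it.

```python
from collections import deque


def blast_radius(depends_on: dict[str, set[str]], target: str) -> set[str]:
    """Return every resource that directly or transitively depends on target."""
    # Invert the edges: for each resource, who depends on it?
    dependents: dict[str, set[str]] = {}
    for resource, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, set()).add(resource)

    affected: set[str] = set()
    queue = deque([target])
    while queue:
        current = queue.popleft()
        for dependent in dependents.get(current, ()):
            if dependent != target and dependent not in affected:
                affected.add(dependent)
                queue.append(dependent)
    return affected


# Hypothetical dependency map: each key depends on the resources in its set.
graph = {
    "api": {"db"},
    "worker": {"db", "queue"},
    "dashboard": {"api"},
    "db": set(),
    "queue": set(),
}
print(sorted(blast_radius(graph, "db")))
```

An empty result is only as trustworthy as the map itself. An undocumented dependency won't appear here, which is why the stop-and-wait approach in Step 5 still matters.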
Step 4: Classify and prioritize
Not all waste is equal. Prioritize by cost impact first (start with what's costing the most), then by confidence (start with resources you're certain are unused), then by rollback safety (anything uncertain gets a soft delete, meaning stopped or disabled but not removed, before a hard delete).
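That ordering translates directly into a sort key. The sketch below reads "cost, then confidence, then rollback safety" as a strict tie-breaking order, which is one interpretation. The field names and the 0.9 confidence cutoff are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    resource_id: str
    monthly_cost: float
    confidence_unused: float  # 0.0 to 1.0, how sure you are it's unused
    reversible: bool          # True if it can be restored after removal


def prioritize(candidates: list[Candidate]) -> list[Candidate]:
    """Sort by highest cost, then highest confidence, then reversible first."""
    return sorted(
        candidates,
        key=lambda c: (-c.monthly_cost, -c.confidence_unused, not c.reversible),
    )


def needs_soft_delete(c: Candidate, min_confidence: float = 0.9) -> bool:
    """Flag anything uncertain or irreversible for a soft delete first."""
    return c.confidence_unused < min_confidence or not c.reversible


# Hypothetical cleanup candidates.
queue = prioritize([
    Candidate("lb-old", 40.0, 1.0, True),
    Candidate("db-legacy", 400.0, 0.5, False),
    Candidate("i-staging", 400.0, 0.95, True),
])
for c in queue:
    print(c.resource_id, "soft delete first" if needs_soft_delete(c) else "safe to remove")
```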
Step 5: Decommission, don't just delete
Stop the resource first, wait a week, and watch for any alerts or complaints. If nothing surfaces, remove it. For anything significant, capture the IaC lineage first so you can recreate it if needed.
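The stop, wait, then remove sequence is a small state machine. This sketch encodes the one-week observation window described above. The function name, the "restore" action, and the idea of counting complaints as an integer are assumptions added for illustration.

```python
from datetime import date, timedelta
from typing import Optional

OBSERVATION_PERIOD = timedelta(days=7)  # the one-week wait described above


def next_action(stopped_on: Optional[date], today: date, complaints: int) -> str:
    """Decide the next decommissioning step for a single resource."""
    if stopped_on is None:
        return "stop"      # never hard-delete something that is still running
    if complaints > 0:
        return "restore"   # someone noticed, so bring it back and investigate
    if today - stopped_on < OBSERVATION_PERIOD:
        return "wait"      # still inside the observation window
    return "delete"        # a quiet week has passed, so removal is reasonable


print(next_action(None, date(2026, 3, 1), 0))
print(next_action(date(2026, 3, 1), date(2026, 3, 4), 0))
print(next_action(date(2026, 3, 1), date(2026, 3, 8), 0))
```

Checking for complaints before checking the clock is deliberate: a complaint on day ten should still block deletion.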
Zombie resources aren't going away on their own. Every month you don't deal with them, the bill goes up a little more and the ownership picture gets a little blurrier.
But here's what the cleanup usually surfaces: zombie resources are a symptom of a deeper problem. Most engineering teams don't have a live, accurate picture of what's running in their cloud, who owns what, what depends on what, and what was built for a project that ended eighteen months ago. The cleanup matters. So does understanding why the mess accumulated in the first place. Without that picture, the waste comes back. The teams that stay ahead of it aren't the ones running cleanup scripts more often. They're the ones that stopped flying blind.
The hard part isn't finding the zombies. It's knowing which ones are safe to kill.
See what's actually running in your cloud.
Oscar builds a live context graph of your cloud in under 30 minutes -- mapping every resource, owner, dependency, and cost signal automatically. No tagging required. Free to download.