Most cloud cost problems aren't mysterious. They're not caused by a sudden traffic spike or an expensive new feature. They're caused by resources that nobody is actively using and nobody remembers creating.

You know the feeling. The bill comes in higher than last month. You ask around. Nobody has an obvious answer. So you move on, assume it'll sort itself out, and it never does. Meanwhile, somewhere in your AWS accounts, a database from a project that shipped two years ago is quietly running. A load balancer with zero active connections has been charging $40 a month since the team that built it disbanded. An EC2 instance that was supposed to be temporary is now in its eighteenth month of life.

Nobody created these resources maliciously. Nobody forgot to care. The environment just kept growing and nobody had a complete enough picture to know what to clean up, or whether it was safe to touch anything at all.

These are zombie resources -- and for most engineering teams they're not a rare edge case. They're background noise that compounds every billing cycle.

What Are Zombie Resources?

A zombie resource is any cloud resource that is no longer serving its original purpose but is still running and still costing money. The three most common types are orphaned resources, idle resources, and over-provisioned resources.

What makes zombies hard to deal with isn't identifying that they exist -- it's knowing whether it's safe to remove them. Without knowing what depends on a resource, removing it is a gamble. That's why most teams know they have waste and don't touch it anyway.

Why Zombie Resources Happen (And Why They Keep Happening)

Zombie resources are almost never the result of laziness or carelessness. They're what happens when teams move fast without a complete picture of what they're building on top of.

Industry estimates put cloud waste at 20 to 30 percent of total cloud spend. For a team running a $1M/year AWS bill, that's up to $300K sitting in things that are either doing nothing or doing something nobody can explain anymore.
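To put a number on that for your own bill, the arithmetic can be sketched as below. This is only a back-of-the-envelope helper: the function name is hypothetical, and the 20 and 30 percent defaults are simply the estimate quoted above, not a measurement of any particular environment.

```python
def estimated_waste_range(annual_spend: int, low_pct: int = 20, high_pct: int = 30) -> tuple[int, int]:
    """Return (low, high) estimated annual waste in whole dollars.

    Integer percentages keep the arithmetic exact for whole-dollar budgets.
    """
    if annual_spend < 0 or not (0 <= low_pct <= high_pct <= 100):
        raise ValueError("spend must be non-negative and 0 <= low_pct <= high_pct <= 100")
    return annual_spend * low_pct // 100, annual_spend * high_pct // 100


low, high = estimated_waste_range(1_000_000)
print(f"${low:,} to ${high:,}")  # prints "$200,000 to $300,000"
```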

How to Clean Up Zombie Resources Without Breaking Things

Most cleanup efforts fail not because the team doesn't care but because they move too fast and break something. Here's how to do it without that 2am phone call.

Step 1: Build the inventory

Before you can remove anything, you need a complete list of what's running across all your accounts and regions. Not what your IaC (Infrastructure as Code) says should be running -- what's actually running. These two things are often different. Resources that were created manually, resources that were never added to Terraform, and resources that drifted from their defined state all show up in the live environment but not in your code.
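The comparison between what is running and what your code declares can be sketched as a set difference. This is a minimal illustration, assuming you have already exported two lists of resource IDs (one from the live environment, one from your IaC state); the names and sample IDs below are hypothetical, and it does not detect attribute-level drift on resources that appear in both lists.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InventoryDiff:
    unmanaged: frozenset  # running in the cloud, but absent from IaC state
    missing: frozenset    # declared in IaC state, but not found running


def diff_inventory(live_ids, iac_ids) -> InventoryDiff:
    """Compare live resource IDs against the IDs tracked in IaC state."""
    live, declared = frozenset(live_ids), frozenset(iac_ids)
    return InventoryDiff(unmanaged=live - declared, missing=declared - live)


# Hypothetical sample data for illustration only.
live = ["i-0aaa", "i-0bbb", "db-legacy"]
declared = ["i-0aaa", "i-0ccc"]
diff = diff_inventory(live, declared)
print(sorted(diff.unmanaged))  # resources nobody added to the IaC repo
print(sorted(diff.missing))    # resources the code expects but that are gone
```

The `unmanaged` set is usually the more interesting one for cleanup, since it contains the manually created resources that no repository will ever remind you about.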

Step 2: Map ownership

For each resource, you need to know: who created this, what team does it belong to, and what project is it supporting? This is where most cleanup efforts stall, and honestly, where most teams realize their tagging discipline was never as good as they thought. If the tags don't tell you, you need to infer ownership from other signals: deployment history, IAM identity logs, IaC repository commits.
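The fallback from tags to other signals can be sketched as a simple priority lookup. This assumes you have already collected a candidate owner per signal for each resource. The signal names and their ordering after tags are assumptions made for illustration, not a prescribed ranking.

```python
from typing import Optional

# Assumed priority: explicit tags first, then the indirect signals.
SIGNAL_PRIORITY = ("tag", "deployment_history", "iam_identity_log", "iac_commit")


def infer_owner(signals: dict) -> tuple:
    """Return (owner, source) from the highest-priority signal that has a value.

    `signals` maps a signal name to a candidate owner, or to None/"" if unknown.
    Returns (None, None) when no signal identifies an owner.
    """
    for source in SIGNAL_PRIORITY:
        owner: Optional[str] = signals.get(source)
        if owner:
            return owner, source
    return None, None


# Hypothetical resource with no owner tag but a known deployer.
owner, source = infer_owner({"tag": None, "deployment_history": "payments-team"})
print(owner, source)  # prints "payments-team deployment_history"
```

Recording which signal produced the answer matters: an owner inferred from a two-year-old commit deserves less trust than one read from a current tag.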

Step 3: Check dependencies

This is the step that saves you. A resource that looks completely idle might be a shared dependency for something nobody thought to document. An EBS volume with no recent reads might be an emergency backup that someone very much still cares about. Blast radius analysis -- understanding what breaks if you remove something -- is what separates a cleanup from a cleanup that becomes an incident. You can read more about how OpsCanvas approaches blast radius mapping for cost optimization.
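As a rough illustration of what blast radius analysis computes, the sketch below walks a dependency map to find everything that directly or transitively depends on a resource. It assumes you already have such a map, which in practice is the hard part. The function and the sample graph are hypothetical and are not a description of how OpsCanvas implements it.

```python
from collections import deque


def blast_radius(depends_on: dict, target: str) -> set:
    """Return every resource that directly or transitively depends on `target`.

    `depends_on` maps a resource ID to the set of resource IDs it depends on.
    """
    # Invert the edges so we can ask "who depends on X?".
    dependents: dict = {}
    for resource, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, set()).add(resource)

    affected: set = set()
    queue = deque([target])
    while queue:
        current = queue.popleft()
        for dependent in dependents.get(current, ()):
            # Skip already-seen nodes so cycles cannot loop forever.
            if dependent != target and dependent not in affected:
                affected.add(dependent)
                queue.append(dependent)
    return affected


# Hypothetical dependency map.
graph = {
    "api": {"db"},
    "worker": {"db", "queue"},
    "dashboard": {"api"},
    "db": set(),
}
print(sorted(blast_radius(graph, "db")))  # prints ['api', 'dashboard', 'worker']
```

An empty result only means nothing in the map depends on the resource. Undocumented dependencies, like the emergency backup mentioned above, will not show up unless the map captures them.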

Step 4: Classify and prioritize

Not all waste is equal. Prioritize by cost impact first (focus on what's costing the most), then by confidence level (start with resources where you're certain they're unused), then by rollback safety (for anything uncertain, do a soft delete, such as stopping the resource, before a hard delete).
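That ordering can be sketched as a multi-key sort. The field names, the 0-to-1 confidence scale, and the 0.9 threshold are all assumptions chosen for illustration; substitute whatever your own inventory records.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    resource_id: str
    monthly_cost: float       # dollars per month
    confidence_unused: float  # 0.0 (no idea) to 1.0 (certain it is unused)
    easy_rollback: bool       # True if it can be recreated or restarted cheaply


def prioritize(candidates):
    """Sort by cost (highest first), then confidence (highest first), then rollback safety."""
    return sorted(
        candidates,
        key=lambda c: (-c.monthly_cost, -c.confidence_unused, not c.easy_rollback),
    )


def needs_soft_delete(c: Candidate, threshold: float = 0.9) -> bool:
    """Flag uncertain or hard-to-restore candidates for a soft delete first."""
    return c.confidence_unused < threshold or not c.easy_rollback


# Hypothetical candidates.
queue = prioritize([
    Candidate("old-lb", 40.0, 1.0, True),
    Candidate("legacy-db", 400.0, 0.6, False),
    Candidate("idle-cluster", 400.0, 0.95, True),
])
print([c.resource_id for c in queue])  # prints ['idle-cluster', 'legacy-db', 'old-lb']
```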

Step 5: Decommission, don't just delete

Stop the resource first, wait a week, and watch for any alerts or complaints. Only then remove it. For anything significant, capture the IaC lineage first so you can recreate the resource if needed.
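The stop, wait, then remove sequence can be sketched as a small decision function. The seven-day window comes from the step above. The action names, and the choice to restart and investigate when an alert fires, are assumptions made for illustration.

```python
from datetime import datetime, timedelta
from typing import Optional

OBSERVATION_WINDOW = timedelta(days=7)  # "wait a week"


def next_action(stopped_at: Optional[datetime], now: datetime, alerts_since_stop: int) -> str:
    """Decide the next decommissioning step for one resource."""
    if stopped_at is None:
        return "stop"                      # never hard-delete a running resource
    if alerts_since_stop > 0:
        return "restart_and_investigate"   # something still depended on it
    if now - stopped_at < OBSERVATION_WINDOW:
        return "wait"                      # observation window still open
    return "delete"                        # quiet for a full week


# Hypothetical timeline.
stopped = datetime(2024, 1, 1)
print(next_action(stopped, stopped + timedelta(days=3), alerts_since_stop=0))  # prints "wait"
print(next_action(stopped, stopped + timedelta(days=8), alerts_since_stop=0))  # prints "delete"
```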

Key Takeaways

Zombie resources are orphaned, idle, or over-provisioned cloud resources still running and costing money after they've outlived their purpose.
The hard part isn't finding them -- it's knowing which ones are safe to remove without breaking something downstream.
Ownership mapping and dependency checking are the two steps most teams skip, and the reason most cleanup efforts cause incidents.
A one-time cleanup isn't enough -- waste re-accumulates within 6 to 12 months without ongoing monitoring.
If you don't know what half your resources are for, you have a context problem -- not just a cleanup problem.

Frequently Asked Questions

How much waste do most companies really have?
Industry research consistently puts cloud waste at 20 to 35 percent of total cloud spend for most mid-size companies. For a team spending $1M a year on AWS, that's $200K to $350K sitting in resources that aren't earning their keep.

Can't I just use AWS Cost Explorer or a FinOps tool to find zombie resources?
Cost tools show you that waste exists. What they don't tell you is who created the resource, what it was for, what depends on it, or whether it's safe to remove. You need cost data plus ownership data plus dependency data in the same place to act confidently. That's the gap the OpsCanvas Context Graph is built to close.

What's the difference between a zombie resource and an underutilized resource?
An underutilized resource is still serving a purpose -- it's just oversized for what it's doing. A zombie resource is serving no purpose at all, or a purpose that no longer exists. Both cost money, but they require different remediation: right-sizing for underutilized, decommissioning for zombies.
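The distinction can be sketched as a two-question check: does the resource still serve a purpose, and if so, is it oversized? The function name and the 30 percent utilization threshold are assumptions for illustration only. Whether a resource "serves a purpose" is a human judgment that comes from the ownership and dependency work, not from a metric.

```python
def remediation(serves_purpose: bool, peak_utilization: float, oversized_below: float = 0.3) -> str:
    """Pick a remediation path for one resource.

    peak_utilization is a fraction between 0.0 and 1.0 of provisioned capacity.
    """
    if not serves_purpose:
        return "decommission"  # zombie: no purpose left, regardless of load
    if peak_utilization < oversized_below:
        return "right-size"    # underutilized: useful, but too big
    return "keep"


print(remediation(serves_purpose=False, peak_utilization=0.0))  # prints "decommission"
print(remediation(serves_purpose=True, peak_utilization=0.1))   # prints "right-size"
```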

How often should we do zombie resource cleanup?
A one-time cleanup is better than nothing, but waste re-accumulates quickly. Teams that do a single audit typically find themselves back at the same waste level within 6 to 12 months. Ongoing monitoring -- regularly scanning for newly orphaned or idle resources -- is the only way to keep it from compounding.

What if we don't know what half our resources are for?
More common than anyone likes to admit. And it's not just a cleanup problem -- it's a sign your cloud environment has drifted significantly from what your documentation says. The zombie resources are the symptom. The real problem is that nobody has a current, accurate picture of what's running and why. Until you have that, any cleanup is just guessing.

Zombie resources aren't going away on their own. Every month you don't deal with them, the bill goes up a little more and the ownership picture gets a little blurrier.

But here's what the cleanup usually surfaces: zombie resources are a symptom of a deeper problem. Most engineering teams don't have a live, accurate picture of what's running in their cloud, who owns what, what depends on what, and what was built for a project that ended eighteen months ago. The cleanup matters. So does understanding why the mess accumulated in the first place. Without that picture, the waste comes back. The teams that stay ahead of it aren't the ones running cleanup scripts more often. They're the ones that stopped flying blind.

The hard part isn't finding the zombies. It's knowing which ones are safe to kill.

Try OpsCanvas

See what's actually running in your cloud.

Oscar builds a live context graph of your cloud in under 30 minutes -- mapping every resource, owner, dependency, and cost signal automatically. No tagging required. Free to download.

Brian Kathman
CEO and Co-Founder, OpsCanvas
Brian co-founded OpsCanvas after running Signal Vine, where firsthand experience with cloud sprawl, surprise bills, and ownership gaps shaped the platform. He writes about cloud operations, AI agents, and what it actually takes to run infrastructure at scale.