The Cloud Resources Nobody Owns (And Everyone Is Paying For)

What zombie resources are, why every cloud environment is full of them, and how to clean them up without breaking anything.

Most cloud cost problems aren't mysterious. They're not caused by a sudden traffic spike or an expensive new feature. They're caused by resources that nobody is actively using and nobody remembers creating.

You know the feeling. The bill comes in higher than last month. You ask around. Nobody has an obvious answer. So you move on, assume it'll sort itself out, and it never does. Meanwhile, somewhere in your AWS accounts, a database from a project that shipped two years ago is quietly running. A load balancer with zero active connections has been charging $40 a month since the team that built it disbanded. An EC2 instance that was supposed to be temporary is now in its eighteenth month of life.

Nobody created these resources maliciously. Nobody forgot to care. The environment just kept growing and nobody had a complete enough picture to know what to clean up, or whether it was safe to touch anything at all.

These are zombie resources -- and for most engineering teams they're not a rare edge case. They're background noise that compounds every billing cycle.

What Are Zombie Resources?

A zombie resource is any cloud resource that is no longer serving its original purpose but is still running and still costing money. The three most common types are:

Orphaned resources were created by a team, project, or engineer that no longer exists or no longer owns them. Nobody knows who deployed them or why. They don't appear in any active project's infrastructure documentation.
Idle resources are still technically "owned" but doing no useful work. An EC2 instance with 0% CPU utilization for 90 days. An RDS database with no recent connections. A load balancer routing traffic to nothing.
Over-provisioned resources are doing real work, but they were allocated far more capacity than they need. A production database running on a 32-core instance because that's what the last engineer provisioned "just to be safe."
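To make the three categories concrete, here is a minimal Python sketch of how a scan might label a single resource. The thresholds (2% average CPU, 90 days, 20% peak CPU), the `ResourceSnapshot` fields, and the `classify` function are all hypothetical choices for illustration, not values from any AWS tool. In practice the metrics would come from a monitoring source such as CloudWatch.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical thresholds -- tune these for your own environment.
IDLE_CPU_PERCENT = 2.0         # average CPU below this counts as "no useful work"
IDLE_LOOKBACK_DAYS = 90        # how long a resource must stay quiet to count as idle
OVERSIZED_PEAK_PERCENT = 20.0  # peak CPU below this suggests over-provisioning


@dataclass
class ResourceSnapshot:
    resource_id: str
    owner: Optional[str]       # None when no team or person can be identified
    days_observed: int
    avg_cpu_percent: float
    peak_cpu_percent: float
    recent_connections: int


def classify(r: ResourceSnapshot) -> str:
    """Return 'orphaned', 'idle', 'over-provisioned', or 'active'."""
    # No owner means nobody can vouch for it, whatever its metrics say.
    if r.owner is None:
        return "orphaned"
    # Owned, but quiet for the whole lookback window.
    quiet = (
        r.days_observed >= IDLE_LOOKBACK_DAYS
        and r.avg_cpu_percent < IDLE_CPU_PERCENT
        and r.recent_connections == 0
    )
    if quiet:
        return "idle"
    # Doing real work, but never coming close to its allocated capacity.
    if r.peak_cpu_percent < OVERSIZED_PEAK_PERCENT:
        return "over-provisioned"
    return "active"
```

The orphan check runs first on purpose: a resource with no owner needs investigation even if it looks busy.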

What makes zombies hard to deal with isn't identifying that they exist -- it's knowing whether it's safe to remove them. Without knowing what depends on a resource, removing it is a gamble. That's why most teams know they have waste and don't touch it anyway.

Why Zombie Resources Happen (And Why They Keep Happening)

Zombie resources are almost never the result of laziness or carelessness. They're what happens when teams move fast without a complete picture of what they're building on top of.

Projects end, and infrastructure doesn't. When a feature gets deprecated or a project gets cancelled, the cloud resources that supported it rarely get cleaned up at the same time. The team moves on. The resources stay.
Ownership drifts over time. Engineers leave. Teams reorganize. A service that was clearly owned by the payments team in 2023 might have three possible owners or none by 2026. When nobody owns something clearly, nobody feels responsible for cleaning it up.
Temporary infrastructure becomes permanent. A staging environment that was supposed to last two weeks is still running eight months later because everyone assumed someone else was going to turn it off. They didn't.
Cloud accounts multiply faster than governance does. A company that started with one AWS account now has forty. Resources spread across accounts are harder to find, harder to attribute, and much easier to forget.

Industry estimates put cloud waste at 20 to 30 percent of total cloud spend. For a team running a $1M/year AWS bill, that's up to $300K sitting in things that are either doing nothing or doing something nobody can explain anymore.

How to Clean Up Zombie Resources Without Breaking Things

Most cleanup efforts fail not because the team doesn't care but because they move too fast and break something. Here's how to do it without that 2am phone call.

Step 1: Build the inventory

Before you can remove anything, you need a complete list of what's running across all your accounts and regions. Not what your IaC (Infrastructure as Code) says should be running -- what's actually running. These two things are often different. Resources that were created manually, resources that were never added to Terraform, and resources that drifted from their defined state all show up in the live environment but not in your code.
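One way to see the gap between code and reality is a set difference: everything running in the account minus everything your Terraform state knows about. The sketch below assumes the JSON layout of a version 4 Terraform state file (a top-level `resources` list whose `instances` each carry an `attributes.id`) and takes the live IDs as a plain set. Collecting those IDs across every account and region (for example with the EC2 `DescribeInstances` API) is left out, and the function names are hypothetical. Note that this only catches unmanaged resources, not attribute-level drift.

```python
import json


def managed_ids_from_tfstate(tfstate_text: str) -> set[str]:
    """Collect the `id` attribute of every managed resource in a Terraform state file."""
    state = json.loads(tfstate_text)
    ids = set()
    for resource in state.get("resources", []):
        # Data sources are read-only lookups, not infrastructure Terraform owns.
        if resource.get("mode") != "managed":
            continue
        for instance in resource.get("instances", []):
            resource_id = instance.get("attributes", {}).get("id")
            if resource_id:
                ids.add(resource_id)
    return ids


def unmanaged(live_ids: set[str], tfstate_text: str) -> set[str]:
    """Resources running in the account that no Terraform state accounts for."""
    return live_ids - managed_ids_from_tfstate(tfstate_text)
```

With several state files (one per workspace or repository), you would union their managed IDs before subtracting.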

Step 2: Map ownership

For each resource, you need to know: who created it, which team it belongs to, and what project it supports. This is where most cleanup efforts stall, and honestly, where most teams realize their tagging discipline was never as good as they thought. If the tags don't tell you, you need to infer ownership from other signals: deployment history, CloudTrail records of which IAM identity created the resource, and commits in your IaC repository.
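That fallback order can be sketched as a function that tries the strongest signal first and records which signal it used. Everything here is an assumption for illustration: the `Owner`/`Team` tag keys, the `infer_owner` name, and the input shapes. Creation events are assumed to be already extracted (for example from CloudTrail) into simplified dicts with `resource_id` and `creator_arn` keys, which is not the raw CloudTrail record format.

```python
from typing import Optional

# Hypothetical tag keys -- substitute your organization's convention.
OWNER_TAG_KEYS = ("Owner", "Team")


def infer_owner(
    resource_id: str,
    tags: dict[str, str],
    creation_events: list[dict],
    iac_commit_authors: dict[str, str],
) -> tuple[Optional[str], str]:
    """Return (owner, evidence) using the strongest ownership signal available.

    creation_events: simplified records like {"resource_id": ..., "creator_arn": ...}.
    iac_commit_authors: resource_id -> author of the commit that introduced it.
    """
    # 1. An explicit tag is the most direct statement of ownership.
    for key in OWNER_TAG_KEYS:
        if tags.get(key):
            return tags[key], f"tag {key}"
    # 2. Who created it, according to the audit trail.
    for event in creation_events:
        if event.get("resource_id") == resource_id:
            return event["creator_arn"], "creation event"
    # 3. Who added it to the IaC repository.
    if resource_id in iac_commit_authors:
        return iac_commit_authors[resource_id], "IaC commit"
    # 4. No signal at all: a candidate orphan that needs a human to investigate.
    return None, "unknown"
```

Keeping the evidence string alongside the owner matters: "tagged by the team" and "created by an engineer who left last year" deserve different levels of trust.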

Step 3: Check dependencies

This is the step that saves you. A resource that looks completely idle might be a shared dependency for something nobody thought to document. An EBS volume with no recent reads might be an emergency backup that someone very much still cares about. Blast radius analysis -- understanding what breaks if you remove something -- is what separates a cleanup from a cleanup that becomes an incident. You can read more about how OpsCanvas approaches blast radius mapping for cost optimization.
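At its core, blast radius is a reachability question on a dependency graph. This minimal sketch assumes you already have a `depends_on` map (hypothetical, built from whatever dependency data you trust). It inverts the edges and walks them with a breadth-first search to find everything that would break, directly or transitively, if one resource disappeared.

```python
from collections import deque


def blast_radius(depends_on: dict[str, list[str]], target: str) -> set[str]:
    """Everything that directly or transitively depends on `target`.

    depends_on maps each resource to the resources it uses, e.g.
    {"api": ["orders-db"]} means "api" breaks if "orders-db" goes away.
    """
    # Invert the edges: for each resource, who depends on it?
    dependents: dict[str, list[str]] = {}
    for consumer, providers in depends_on.items():
        for provider in providers:
            dependents.setdefault(provider, []).append(consumer)

    # Breadth-first walk outward from the resource we want to remove.
    affected: set[str] = set()
    queue = deque([target])
    while queue:
        current = queue.popleft()
        for consumer in dependents.get(current, []):
            if consumer != target and consumer not in affected:
                affected.add(consumer)
                queue.append(consumer)
    return affected
```

An empty result is only as trustworthy as the graph: an undocumented dependency will not show up here, which is why the stop-and-wait step still matters.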

Step 4: Classify and prioritize

Not all waste is equal. Prioritize by cost impact first (focus on what's costing the most), then by confidence level (start with resources you're certain are unused), then by rollback safety (anything you're unsure about gets a soft delete, such as stopping it, before any hard delete).
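That ordering maps naturally onto a sort key. In this sketch, `Candidate`, `prioritize`, and the 0.9 confidence cutoff are hypothetical, and confidence is assumed to be a score you assign from the ownership and dependency checks.

```python
from dataclasses import dataclass

SOFT_DELETE_BELOW = 0.9  # hypothetical cutoff: below this, stop before deleting


@dataclass
class Candidate:
    resource_id: str
    monthly_cost: float  # USD per month
    confidence: float    # 0.0-1.0: how sure we are that it's unused
    reversible: bool     # can it be stopped or snapshotted and brought back?


def prioritize(candidates: list[Candidate]) -> list[tuple[str, str]]:
    """Order by cost, then confidence, then rollback safety; pick a first action."""
    ranked = sorted(
        candidates,
        # Highest cost first, then highest confidence, then reversible before
        # irreversible (False sorts before True, so `not reversible` does that).
        key=lambda c: (-c.monthly_cost, -c.confidence, not c.reversible),
    )
    plan = []
    for c in ranked:
        action = "soft delete" if c.confidence < SOFT_DELETE_BELOW else "decommission"
        plan.append((c.resource_id, action))
    return plan
```

An expensive resource you're unsure about still lands at the top of the list; it just gets a soft delete instead of a straight decommission.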

Step 5: Decommission, don't just delete

Stop the resource first, wait a week, watch for any alerts or complaints, then remove it. For anything significant, capture the IaC lineage so you could recreate it if needed. Decommission -- don't just delete.
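The stop, wait, and remove sequence behaves like a small state machine. Here is a minimal sketch of the decision for one resource, assuming you track when it was stopped, how many alerts or complaints arrived since, and whether its lineage was captured. The `next_step` name and its return labels are hypothetical; the seven-day soak comes from the "wait a week" advice above.

```python
from datetime import date, timedelta
from typing import Optional

SOAK_PERIOD = timedelta(days=7)  # "wait a week" before the final delete


def next_step(
    stopped_on: Optional[date],
    today: date,
    complaints_since_stop: int,
    lineage_captured: bool,
) -> str:
    """Decide the next decommissioning action for one resource."""
    # Record how to rebuild it (IaC lineage, snapshot) before changing anything.
    if not lineage_captured:
        return "capture lineage"
    # Stop rather than delete: a stopped resource can still be brought back.
    if stopped_on is None:
        return "stop"
    # Any alert or complaint during the soak means something still depends on it.
    if complaints_since_stop > 0:
        return "restart and investigate"
    # A full week gives, for example, weekly jobs a chance to run and fail loudly.
    if today - stopped_on < SOAK_PERIOD:
        return "wait"
    return "delete"
```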

Key Takeaways

✓ Zombie resources are orphaned, idle, or over-provisioned cloud resources still running and costing money after they've outlived their purpose.

✓ The hard part isn't finding them -- it's knowing which ones are safe to remove without breaking something downstream.

✓ Ownership mapping and dependency checking are the two steps most teams skip, and skipping them is why most cleanup efforts cause incidents.

✓ A one-time cleanup isn't enough -- waste re-accumulates within 6 to 12 months without ongoing monitoring.

✗ If you don't know what half your resources are for, you have a context problem -- not just a cleanup problem.

Frequently Asked Questions

How much waste do most companies really have?

Industry research consistently puts cloud waste at 20 to 30 percent of total cloud spend for most mid-size companies. For a team spending $1M a year on AWS, that's $200K to $300K sitting in resources that aren't earning their keep.

Can't I just use AWS Cost Explorer or a FinOps tool to find zombie resources?

Cost tools show you that waste exists. What they don't tell you is who created the resource, what it was for, what depends on it, or whether it's safe to remove. You need cost data plus ownership data plus dependency data in the same place to act confidently. That's the gap the OpsCanvas Context Graph is built to close.

What's the difference between a zombie resource and an underutilized resource?

An underutilized resource is still serving a purpose -- it's just oversized for what it's doing. A zombie resource is serving no purpose at all, or a purpose that no longer exists. Both cost money, but they require different remediation: right-sizing for underutilized, decommissioning for zombies.

How often should we do zombie resource cleanup?

A one-time cleanup is better than nothing, but waste re-accumulates quickly. Teams that do a single audit typically find themselves back at the same waste level within 6 to 12 months. Ongoing monitoring -- regularly scanning for newly orphaned or idle resources -- is the only way to keep it from compounding.
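Ongoing monitoring can start as simply as comparing each scan with the previous one and alerting only on what changed. This sketch assumes each scan is a hypothetical map from resource ID to a classification label ("orphaned", "idle", "over-provisioned", or "active"); the `new_zombies` name is also illustrative.

```python
# Labels that mean "probably serving no purpose" and should trigger a review.
ZOMBIE_LABELS = {"orphaned", "idle"}


def new_zombies(previous: dict[str, str], current: dict[str, str]) -> dict[str, str]:
    """Resources that became zombie candidates since the last scan.

    Each scan maps resource_id -> classification label.
    """
    return {
        resource_id: label
        for resource_id, label in current.items()
        # Flag it if it's a zombie now but wasn't one (or didn't exist) last time.
        if label in ZOMBIE_LABELS and previous.get(resource_id) not in ZOMBIE_LABELS
    }
```

Reporting only the delta keeps the weekly review short, so the same long-standing items don't drown out resources that were orphaned this week.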

What if we don't know what half our resources are for?

More common than anyone likes to admit. And it's not just a cleanup problem -- it's a sign your cloud environment has drifted significantly from what your documentation says. The zombie resources are the symptom. The real problem is that nobody has a current, accurate picture of what's running and why. Until you have that, any cleanup is just guessing.

Zombie resources aren't going away on their own. Every month you don't deal with them, the bill goes up a little more and the ownership picture gets a little blurrier.

But here's what the cleanup usually surfaces: zombie resources are a symptom of a deeper problem. Most engineering teams don't have a live, accurate picture of what's running in their cloud, who owns what, what depends on what, and what was built for a project that ended eighteen months ago. The cleanup matters. So does understanding why the mess accumulated in the first place. Without that picture, the waste comes back. The teams that stay ahead of it aren't the ones running cleanup scripts more often. They're the ones that stopped flying blind.

The hard part isn't finding the zombies. It's knowing which ones are safe to kill.

Try OpsCanvas

See what's actually running in your cloud.

Oscar builds a live context graph of your cloud in under 30 minutes -- mapping every resource, owner, dependency, and cost signal automatically. No tagging required. Free to download.


Brian Kathman

CEO and Co-Founder, OpsCanvas

Brian co-founded OpsCanvas after running Signal Vine, where firsthand experience with cloud sprawl, surprise bills, and ownership gaps shaped the platform. He writes about cloud operations, AI agents, and what it actually takes to run infrastructure at scale.

