Soft Delete vs Hard Delete: The Trade-offs Nobody Tells You About
Soft deletes sound safe -- just set a flag and the data is 'gone.' But they quietly introduce query complexity, unique constraint headaches, and storage bloat. Here's a clear-eyed look at both strategies and when each one actually makes sense.
Two Disasters, One Delete Button
Picture this. It is 2 AM and your on-call engineer runs a cleanup script against production. The script hard-deletes 12,000 user accounts that were flagged as "inactive." Turns out the flag was wrong -- those users were active paying customers. The data is gone. No backup covers the last 6 hours. The recovery effort takes three days and makes the front page of Hacker News for all the wrong reasons.
Now picture the opposite scenario. Your app uses soft deletes. A user named Alice signs up with alice@example.com, then deletes her account. Three months later, a different Alice tries to sign up with the same email. She gets: "Email already in use." Your support team can't figure out why. The original Alice's row is still sitting in the users table with is_deleted = true, and the unique constraint on email doesn't care about that flag.
Both of these are real patterns that real teams hit. The database deletion strategy you choose shapes your application's reliability, compliance posture, and day-to-day query complexity in ways that don't show up in the "getting started" guide.
Let's break down both approaches, the trade-offs that actually matter, and a decision framework you can use on your next project.
What Each Strategy Actually Means
At the highest level, the difference between soft delete and hard delete is simple: one removes the row, the other pretends to.
Hard delete means exactly what it sounds like: you issue a DELETE statement and the row is physically removed from the table. Once the transaction commits and your backups rotate, that data is unrecoverable through normal means.
Soft delete (also called a logical delete) means you mark the row as deleted without actually removing it. The row stays in the table. Your application code is responsible for filtering it out of every query.
The soft delete pattern appeals to teams because it feels safer. You can "undo" a delete. You have an audit trail. You never lose data. But that safety comes with a cost that compounds over time.
The Soft Delete Pattern: Implementation Basics
There are two common ways to implement soft deletes, and the choice between them matters more than you might think.
The Boolean Flag: is_deleted
The simplest approach. Add a boolean column:
```sql
ALTER TABLE users ADD COLUMN is_deleted BOOLEAN NOT NULL DEFAULT FALSE;

-- "Deleting" a row is now an UPDATE:
UPDATE users SET is_deleted = TRUE WHERE id = 3;

SELECT id, email, is_deleted FROM users WHERE is_deleted = FALSE;
--  id |       email        | is_deleted
-- ----+--------------------+------------
--   1 | bob@example.com    | false
--   7 | carol@example.com  | false
```
This tells you whether something was deleted but not when. That missing timestamp turns out to be important for debugging, compliance reporting, and data retention policies.
The Timestamp: deleted_at
The better variant. Use a nullable timestamp instead:
```sql
ALTER TABLE users ADD COLUMN deleted_at TIMESTAMPTZ;

-- "Deleting" a row also records when it happened:
UPDATE users SET deleted_at = NOW() WHERE id = 3;

SELECT id, email, deleted_at FROM users WHERE deleted_at IS NULL;
--  id |       email        | deleted_at
-- ----+--------------------+------------
--   1 | bob@example.com    | [null]
--   7 | carol@example.com  | [null]
```
Prefer deleted_at over is_deleted. A nullable timestamp gives you everything the boolean does (NULL = active, NOT NULL = deleted) plus a free audit trail of when the deletion happened. You can use it for time-based retention policies, compliance reporting, and debugging. The boolean flag is a strict subset of what the timestamp provides.
With either approach, your application layer needs to enforce the filter. Every query that touches a soft-deletable table must include the WHERE deleted_at IS NULL (or WHERE is_deleted = FALSE) predicate. Miss one, and you've got ghost data leaking into your application.
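To make the failure mode concrete, here is a small sketch using an in-memory SQLite database (the table layout and emails are illustrative, not from any real schema). One query forgets the predicate and resurrects the ghost row; the other applies it correctly:

```python
import sqlite3

# Illustrative soft-delete setup in SQLite.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, deleted_at TEXT)")
conn.executemany(
    "INSERT INTO users (id, email) VALUES (?, ?)",
    [(1, "bob@example.com"), (2, "alice@example.com")],
)

# "Deleting" Alice just stamps her row.
conn.execute("UPDATE users SET deleted_at = datetime('now') WHERE id = 2")

# Forget the predicate and the ghost row leaks back out...
leaky = conn.execute("SELECT email FROM users").fetchall()

# ...remember it and the deleted row stays hidden, as intended.
active = conn.execute("SELECT email FROM users WHERE deleted_at IS NULL").fetchall()

print(len(leaky), len(active))  # 2 1
```

Nothing in the database itself stops the first query; the filter lives entirely in application discipline.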
This is where the trouble starts.
The Hidden Costs of Soft Deletes
Soft deletes don't announce their costs upfront. They accumulate quietly until one day you're staring at a slow query plan and a table that's 60% dead rows.
Cost 1: Every Query Gets More Complex
The most immediate tax is query complexity. Every single query against a soft-deletable table needs a filter clause. This includes:
- Direct queries from your application
- JOIN conditions (you need to filter both sides)
- Aggregations and reports
- Background jobs and cron tasks
- Third-party integrations reading your database
```sql
-- Simple query without soft deletes
SELECT * FROM orders WHERE user_id = 42;

-- Same query with soft deletes on both tables
SELECT *
FROM orders
WHERE user_id = 42
  AND orders.deleted_at IS NULL
  AND EXISTS (
    SELECT 1 FROM users
    WHERE users.id = orders.user_id
      AND users.deleted_at IS NULL
  );

-- Multiply this by every query in your codebase
```
ORMs and middleware can help automate this, but they add their own layer of indirection. And they can be bypassed -- raw queries, database migrations, reporting tools, and admin consoles all skip your ORM's default scopes.
Cost 2: Storage and Index Bloat
Soft-deleted rows never leave. They sit in your table, consume disk space, bloat your indexes, and slow down sequential scans. On a high-churn table, the ratio of dead-to-live rows can get alarming.
The performance hit comes from two places. First, the table is physically larger, so full scans touch more pages. Second, if deleted_at isn't indexed (or isn't part of a partial index), the database has to evaluate that filter condition against every row.
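One mitigation is a partial index that covers only the live rows, so the hot-path filter stays cheap and the index never accumulates dead entries. A sketch, using SQLite (which supports partial indexes, like PostgreSQL) and an illustrative orders table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, deleted_at TEXT)")

# Partial index: soft-deleted rows never enter it, so it stays small.
conn.execute(
    "CREATE INDEX idx_orders_active ON orders (user_id) WHERE deleted_at IS NULL"
)

conn.executemany(
    "INSERT INTO orders (user_id, deleted_at) VALUES (?, ?)",
    [(42, None), (42, "2024-01-01"), (7, None)],
)

# Queries that repeat the index's predicate can be served by the smaller index.
rows = conn.execute(
    "SELECT id FROM orders WHERE user_id = 42 AND deleted_at IS NULL"
).fetchall()
print(len(rows))  # 1
```

The index only helps queries whose WHERE clause implies the index's predicate, which is one more thing the whole team has to remember.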
Cost 3: Foreign Key Confusion
If a parent row is soft-deleted but its children aren't, your data model is in an inconsistent state. Hard deletes with ON DELETE CASCADE handle this automatically. Soft deletes require you to cascade the soft delete manually through every related table -- and remember to do so every time you add a new relationship.
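For contrast, here is what the hard delete path looks like with a cascading foreign key. This is a SQLite sketch with an illustrative schema; note that SQLite requires foreign key enforcement to be switched on per connection:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite: FK enforcement is off by default
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY)")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, "
    "user_id INTEGER REFERENCES users(id) ON DELETE CASCADE)"
)
conn.execute("INSERT INTO users (id) VALUES (1)")
conn.execute("INSERT INTO orders (user_id) VALUES (1)")

# One statement removes the parent and its children together.
conn.execute("DELETE FROM users WHERE id = 1")
orphans = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
print(orphans)  # 0
```

With soft deletes, every table in that chain needs its own stamped UPDATE, issued by your application, in the right order, every time.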
The Unique Constraint Problem
This deserves its own section because it catches almost every team that adopts soft deletes, and the fix isn't obvious.
Here's the scenario: your users table has a unique constraint on email. User Alice (alice@example.com) deletes her account. You soft-delete her row. Now a new user tries to sign up with alice@example.com. The database rejects the insert because the unique constraint sees the soft-deleted row and says "that email already exists."
Standard unique constraints don't distinguish between active and soft-deleted rows. A soft-deleted row with email = 'alice@example.com' blocks any new row with the same email, even though the original row is logically "gone." This is one of the most common soft delete gotchas, and it requires a database-specific workaround.
There are two common solutions:
Partial index (PostgreSQL):
```sql
CREATE UNIQUE INDEX idx_users_email_active
  ON users (email)
  WHERE deleted_at IS NULL;
```

This index only enforces uniqueness among non-deleted rows. It's clean and efficient, but it's a PostgreSQL-specific feature. MySQL doesn't support partial indexes natively.
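Here's a runnable sketch of the same fix. SQLite also supports partial indexes, so it stands in for PostgreSQL here; the schema and emails are illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, deleted_at TEXT)")
conn.execute(
    "CREATE UNIQUE INDEX idx_users_email_active "
    "ON users (email) WHERE deleted_at IS NULL"
)

# Alice signs up, then soft-deletes her account.
conn.execute("INSERT INTO users (email) VALUES ('alice@example.com')")
conn.execute("UPDATE users SET deleted_at = datetime('now') WHERE id = 1")

# A new Alice can register: the soft-deleted row is outside the index.
conn.execute("INSERT INTO users (email) VALUES ('alice@example.com')")

# But a duplicate among *active* rows is still rejected.
try:
    conn.execute("INSERT INTO users (email) VALUES ('alice@example.com')")
    duplicate_allowed = True
except sqlite3.IntegrityError:
    duplicate_allowed = False

print(duplicate_allowed)  # False
```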
Composite unique constraint:

```sql
-- Include deleted_at in the unique constraint
ALTER TABLE users ADD CONSTRAINT uq_users_email
  UNIQUE (email, deleted_at);
```

This works because each soft-deleted row gets a different deleted_at timestamp, making the combination unique. But there's a catch: most database engines, PostgreSQL included, treat NULL values as distinct in unique constraints. That means two active rows (both with deleted_at = NULL) would not conflict, which defeats the constraint exactly where you need it. PostgreSQL 15+ addresses this with UNIQUE NULLS NOT DISTINCT; on other engines, test the NULL behavior before relying on this approach.
Neither solution is free. Both add complexity that you wouldn't need with hard deletes.
When Hard Deletes Are Perfectly Fine
Hard deletes get a bad reputation because they sound scary and permanent. But for many applications, they're the right choice.
Hard deletes work well when:
- You don't need historical data. Session tokens, temporary files, expired OTP codes, shopping cart items -- nobody needs to recover these.
- The data is easily reproducible. Cached computations, derived tables, materialized views -- you can regenerate them.
- Regulatory requirements demand actual deletion. GDPR's "right to erasure" may require you to actually remove personal data, not just flag it (more on this below).
- Your tables are high-churn. If rows are created and deleted frequently (think message queues, job tables, notification logs), soft deletes cause rapid bloat.
- You have proper backups. A solid backup and point-in-time recovery strategy covers the "oops" case far better than soft deletes, because it covers all tables, not just the ones you remembered to add `deleted_at` to.
For most standard CRUD applications, hard deletes combined with a good backup strategy are simpler, faster, and entirely sufficient.
GDPR and the Compliance Trap
🔴 Soft Delete May Not Satisfy 'Right to Erasure'
GDPR Article 17 grants users the "right to erasure" -- the right to have their personal data deleted. A soft delete that keeps the data in your database with a flag may not satisfy this requirement. Regulators and courts have generally interpreted "erasure" to mean the data should be rendered irrecoverable, not merely hidden from the application UI. If you operate in the EU or handle EU citizens' data, consult legal counsel before relying on soft deletes as your deletion mechanism for personal data.
This is the compliance trap that catches teams off guard. They adopt soft deletes thinking "we'll keep everything just in case" and then discover that their data retention practices conflict with privacy regulations.
The tension is real: your legal team wants data gone, your product team wants data recoverable, and your compliance team wants an audit trail. This three-way conflict is exactly what leads teams to the hybrid approach.
The Third Option: Audit Log + Hard Delete
There's a pattern that gives you the best properties of both approaches: log the deletion event, then hard-delete the actual data.
The idea is simple. Before you delete a row, you write a record to a separate audit log table capturing what was deleted, when, by whom, and why. Then you hard-delete the original row.
The audit log table might look like this:
```sql
CREATE TABLE audit_log (
  id           BIGSERIAL PRIMARY KEY,
  table_name   TEXT NOT NULL,
  record_id    BIGINT NOT NULL,
  action       TEXT NOT NULL,             -- 'DELETE', 'UPDATE', etc.
  old_data     JSONB,                     -- snapshot of the deleted row
  performed_by BIGINT,                    -- who did it
  performed_at TIMESTAMPTZ DEFAULT NOW()
);
```

This gives you:
- Clean main tables: no dead rows, no filter clauses, no bloated indexes
- Full audit trail: you know what was deleted, when, and by whom
- Data recovery option: the `old_data` JSONB column holds a snapshot you can restore from
- Compliance flexibility: you can purge the audit log on a schedule to satisfy data retention policies, or redact PII from the JSONB while keeping the event metadata
The trade-off is additional write overhead (one extra INSERT per delete) and a separate table to manage. But for most applications, this is a small price for the clean separation of concerns.
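At the application level, the pattern is a single transaction: snapshot the row into the audit table, then hard-delete it, so the log entry and the delete succeed or fail together. A sketch using SQLite and an illustrative schema (the `delete_user` helper is hypothetical):

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
conn.execute(
    "CREATE TABLE audit_log (id INTEGER PRIMARY KEY, table_name TEXT, "
    "record_id INTEGER, action TEXT, old_data TEXT, "
    "performed_at TEXT DEFAULT CURRENT_TIMESTAMP)"
)
conn.execute("INSERT INTO users (id, email) VALUES (1, 'bob@example.com')")

def delete_user(user_id):
    # `with conn` wraps the block in one transaction: log + delete are atomic.
    with conn:
        row = conn.execute(
            "SELECT id, email FROM users WHERE id = ?", (user_id,)
        ).fetchone()
        conn.execute(
            "INSERT INTO audit_log (table_name, record_id, action, old_data) "
            "VALUES ('users', ?, 'DELETE', ?)",
            (row[0], json.dumps({"id": row[0], "email": row[1]})),
        )
        conn.execute("DELETE FROM users WHERE id = ?", (user_id,))

delete_user(1)
remaining = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
snapshot = json.loads(conn.execute("SELECT old_data FROM audit_log").fetchone()[0])
print(remaining, snapshot["email"])  # 0 bob@example.com
```

The main table stays clean, and the snapshot survives in the log.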
✅ Use database triggers for reliability
If you go with the audit log pattern, consider implementing it as a database trigger rather than application code. A trigger fires regardless of whether the delete comes from your app, an admin console, or a migration script. It's a safety net that doesn't depend on every caller remembering to log first.
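A sketch of the trigger approach, again in SQLite for portability (PostgreSQL would use a PL/pgSQL trigger function instead; the JSON snapshot is built by string concatenation here to keep the example dependency-free):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
conn.execute(
    "CREATE TABLE audit_log (id INTEGER PRIMARY KEY, table_name TEXT, "
    "record_id INTEGER, action TEXT, old_data TEXT, "
    "performed_at TEXT DEFAULT CURRENT_TIMESTAMP)"
)
conn.execute("""
    CREATE TRIGGER users_audit_delete BEFORE DELETE ON users
    BEGIN
        INSERT INTO audit_log (table_name, record_id, action, old_data)
        VALUES ('users', OLD.id, 'DELETE',
                '{"id": ' || OLD.id || ', "email": "' || OLD.email || '"}');
    END
""")
conn.execute("INSERT INTO users (id, email) VALUES (1, 'bob@example.com')")

# Any DELETE -- app code, admin console, migration -- now fires the trigger.
conn.execute("DELETE FROM users WHERE id = 1")
logged = conn.execute("SELECT record_id, action FROM audit_log").fetchone()
print(logged)  # (1, 'DELETE')
```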
Should You Soft Delete? A Decision Framework
The answer depends on your specific constraints. Walk through this:
- Do you need to recover individual deleted records frequently? If yes, soft deletes (or the audit log hybrid) earn their overhead. If recovery is rare, backups and point-in-time recovery cover it.
- Do you handle personal data subject to erasure requirements? If yes, lean toward hard deletes, or the hybrid with a purgeable or redactable audit log.
- Is the table high-churn? If yes, avoid soft deletes; dead rows will bloat it quickly.
- Does the table carry unique constraints? If yes, budget for partial indexes or composite constraints before committing to soft deletes.
Comparison: All Three Approaches Side by Side
| Characteristic | Hard Delete | Soft Delete | Audit Log + Hard Delete |
|---|---|---|---|
| Data recovery | Only via backups | Trivial (flip the flag) | Possible from audit log JSONB |
| Query complexity | No added filters | Every query needs WHERE clause | No added filters on main tables |
| Unique constraints | Work naturally | Require workarounds (partial indexes) | Work naturally |
| Storage efficiency | Optimal (dead rows removed) | Grows over time with dead rows | Main tables optimal, audit log grows separately |
| Index performance | Clean indexes | Indexes include dead rows (bloat) | Clean indexes on main tables |
| Audit trail | None (data is gone) | Implicit (deleted rows still exist) | Explicit (structured event log) |
| GDPR compliance | Satisfies right to erasure | May NOT satisfy right to erasure | Flexible (purge audit log on schedule) |
| Implementation effort | Minimal | Moderate (ORM scopes, filters, cascading) | Moderate (audit table, triggers/hooks) |
| Cascade handling | ON DELETE CASCADE works | Must cascade soft delete manually | ON DELETE CASCADE works |
| Third-party tool safety | Safe (no hidden rows) | Risky (tools may not filter) | Safe (no hidden rows) |
Soft Delete Best Practices (If You Go That Route)
If after weighing the trade-offs you decide soft deletes are the right fit, here are the practices that prevent the most pain:
- Use a `deleted_at` timestamp, not a boolean. You get the "when" for free, and `WHERE deleted_at IS NULL` is just as efficient to index as `WHERE is_deleted = FALSE`.
- Add partial indexes for unique constraints. In PostgreSQL, `CREATE UNIQUE INDEX ... WHERE deleted_at IS NULL` solves the duplicate constraint problem cleanly.
- Set up a purge job. Soft-deleted rows older than your retention period (30 days, 90 days, whatever your policy dictates) should be hard-deleted on a schedule. Soft delete is a grace period, not a forever archive.
- Use database views or ORM default scopes. Create a view like `active_users` that pre-applies the `WHERE deleted_at IS NULL` filter, and use that view as the default interface. This reduces the chance of forgetting the filter.
- Test with realistic data ratios. If your production table is 70% soft-deleted rows, your development database should reflect that. Soft delete performance problems don't show up on small datasets.
- Document which tables use soft deletes. Not every table needs this pattern. Be explicit about which tables are soft-deletable and which aren't.
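The purge job from the list above can be a single scheduled statement. A sketch, with an illustrative table and an assumed 90-day retention window:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, deleted_at TEXT)")
conn.executemany(
    "INSERT INTO users (deleted_at) VALUES (?)",
    [
        (None,),                    # active row: never touched by the purge
        ("2000-01-01 00:00:00",),   # soft-deleted long past retention: purged
    ],
)

# Hard-delete anything soft-deleted before the retention cutoff.
cur = conn.execute(
    "DELETE FROM users "
    "WHERE deleted_at IS NOT NULL AND deleted_at < datetime('now', '-90 days')"
)
print(cur.rowcount)  # 1
```

Run it from cron, a job queue, or pg_cron; the important part is that it runs at all.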
Key Takeaways
- Soft deletes are not free. They add query complexity, break unique constraints, cause index bloat, and may not satisfy GDPR requirements. Adopt them deliberately, not by default.
- Hard deletes are underrated. For most CRUD applications, hard deletes plus a solid backup strategy are simpler, faster, and entirely sufficient.
- The audit log + hard delete hybrid gives you the best of both worlds: clean tables, full audit trail, and compliance flexibility. It's often the right answer for systems that need traceability without the baggage of soft deletes.
- If you do use soft deletes, use `deleted_at` timestamps (not booleans), add partial indexes, set up a purge schedule, and make sure your entire team knows which tables are affected.
- There is no universal answer. The right database deletion strategy depends on your recovery needs, compliance requirements, query patterns, and team discipline. Use the decision framework above and choose per-table, not per-application.
References
- Soft Deletion Probably Isn't Worth It -- Brandur Leach's influential essay on why soft deletes cause more problems than they solve
- PostgreSQL Partial Indexes -- Official PostgreSQL documentation on creating indexes with WHERE clauses
- GDPR Article 17: Right to Erasure -- Full text of the "right to be forgotten" regulation
- PostgreSQL NULLS NOT DISTINCT -- PostgreSQL 15 feature for unique constraints treating NULLs as equal
- Audit Logging with PostgreSQL Triggers -- Community guide to implementing audit triggers in PostgreSQL