Database Design & Internals

Soft Delete vs Hard Delete: The Trade-offs Nobody Tells You About

Soft deletes sound safe -- just set a flag and the data is 'gone.' But they quietly introduce query complexity, unique constraint headaches, and storage bloat. Here's a clear-eyed look at both strategies and when each one actually makes sense.

16 min read

Two Disasters, One Delete Button

Picture this: it's 2 AM and your on-call engineer runs a cleanup script against production. The script hard-deletes 12,000 user accounts that were flagged as "inactive." Turns out the flag was wrong -- those users were active paying customers. The data is gone, and the most recent backup is six hours stale. The recovery effort takes three days and makes the front page of Hacker News for all the wrong reasons.

Now picture the opposite scenario. Your app uses soft deletes. A user named Alice signs up with alice@example.com, then deletes her account. Three months later, a different Alice tries to sign up with the same email. She gets: "Email already in use." Your support team can't figure out why. The original Alice's row is still sitting in the users table with is_deleted = true, and the unique constraint on email doesn't care about that flag.

Both of these are real patterns that real teams hit. The database deletion strategy you choose shapes your application's reliability, compliance posture, and day-to-day query complexity in ways that don't show up in the "getting started" guide.

Let's break down both approaches, the trade-offs that actually matter, and a decision framework you can use on your next project.


What Each Strategy Actually Means

At the highest level, the difference between soft delete and hard delete is simple: one removes the row, the other pretends to.

[Diagram: Hard Delete vs Soft Delete Flow. A delete request from the application takes one of two paths: hard delete issues DELETE FROM table and the row is physically removed (data gone forever); soft delete issues UPDATE SET deleted_at and the row is merely flagged (data still exists).]

Hard delete means exactly what it sounds like: you issue a DELETE statement and the row is physically removed from the table. Once the transaction commits and your backups rotate, that data is unrecoverable through normal means.

Soft delete (also called a logical delete) means you mark the row as deleted without actually removing it. The row stays in the table. Your application code is responsible for filtering it out of every query.

The soft delete pattern appeals to teams because it feels safer. You can "undo" a delete. You have an audit trail. You never lose data. But that safety comes with a cost that compounds over time.


The Soft Delete Pattern: Implementation Basics

There are two common ways to implement soft deletes, and the choice between them matters more than you might think.

The Boolean Flag: is_deleted

The simplest approach. Add a boolean column:

Boolean Soft Delete

-- Schema
ALTER TABLE users ADD COLUMN is_deleted BOOLEAN DEFAULT FALSE;

-- Delete a user
UPDATE users SET is_deleted = TRUE WHERE id = 42;

-- Query active users
SELECT * FROM users WHERE is_deleted = FALSE;

Output:

ALTER TABLE
UPDATE 1
 id | email              | is_deleted
----+--------------------+-----------
  1 | bob@example.com    | false
  7 | carol@example.com  | false

This tells you whether something was deleted but not when. That missing timestamp turns out to be important for debugging, compliance reporting, and data retention policies.

The Timestamp: deleted_at

The better variant. Use a nullable timestamp instead:

Timestamp Soft Delete

-- Schema
ALTER TABLE users ADD COLUMN deleted_at TIMESTAMPTZ DEFAULT NULL;

-- Delete a user
UPDATE users SET deleted_at = NOW() WHERE id = 42;

-- Query active users
SELECT * FROM users WHERE deleted_at IS NULL;

Output:

ALTER TABLE
UPDATE 1
 id | email              | deleted_at
----+--------------------+-----------
  1 | bob@example.com    | [null]
  7 | carol@example.com  | [null]
📌 deleted_at vs is_deleted

Prefer deleted_at over is_deleted. A nullable timestamp gives you everything the boolean does (NULL = active, NOT NULL = deleted) plus a free audit trail of when the deletion happened. You can use it for time-based retention policies, compliance reporting, and debugging. The boolean flag is a strict subset of what the timestamp provides.

With either approach, your application layer needs to enforce the filter. Every query that touches a soft-deletable table must include the WHERE deleted_at IS NULL (or WHERE is_deleted = FALSE) predicate. Miss one, and you've got ghost data leaking into your application.

This is where the trouble starts.


The Hidden Costs of Soft Deletes

Soft deletes don't announce their costs upfront. They accumulate quietly until one day you're staring at a slow query plan and a table that's 60% dead rows.

Cost 1: Every Query Gets More Complex

The most immediate tax is query complexity. Every single query against a soft-deletable table needs a filter clause. This includes:

  • Direct queries from your application
  • JOIN conditions (you need to filter both sides)
  • Aggregations and reports
  • Background jobs and cron tasks
  • Third-party integrations reading your database
1. Simple query without soft deletes:

   SELECT * FROM orders WHERE user_id = 42;

2. The same query with soft deletes on both tables:

   SELECT *
   FROM orders
   WHERE user_id = 42
     AND orders.deleted_at IS NULL
     AND EXISTS (
         SELECT 1 FROM users
         WHERE users.id = orders.user_id
           AND users.deleted_at IS NULL
     );

3. Now multiply this by every query in your codebase.
ORMs and middleware can help automate this, but they add their own layer of indirection. And they can be bypassed -- raw queries, database migrations, reporting tools, and admin consoles all skip your ORM's default scopes.
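To make the failure mode concrete, here's a minimal sketch using Python's built-in sqlite3 module (the table and values are invented for illustration). The same query run with and without the soft-delete predicate returns different row counts, and nothing warns you which one is wrong:

```python
# Demonstrates the "forgotten filter" failure mode against an in-memory
# SQLite database. Schema and data are illustrative.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL,
        deleted_at TEXT DEFAULT NULL   -- NULL means "active"
    )
""")
conn.executemany(
    "INSERT INTO orders (id, user_id, deleted_at) VALUES (?, ?, ?)",
    [(1, 42, None), (2, 42, "2024-01-05 10:00:00"), (3, 42, None)],
)

# A query that remembers the soft-delete filter: two live orders.
live = conn.execute(
    "SELECT COUNT(*) FROM orders WHERE user_id = 42 AND deleted_at IS NULL"
).fetchone()[0]

# The same query with the filter forgotten: the deleted order leaks in.
leaky = conn.execute(
    "SELECT COUNT(*) FROM orders WHERE user_id = 42"
).fetchone()[0]

print(live, leaky)  # 2 3 -- one ghost row, and no error anywhere
```

Both queries run without complaint; only the counts disagree. That silence is exactly why a single missed predicate can live in production for months.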

Cost 2: Storage and Index Bloat

Soft-deleted rows never leave. They sit in your table, consume disk space, bloat your indexes, and slow down sequential scans. On a high-churn table, the ratio of dead-to-live rows can get alarming.

Table Scan Performance: 1M Total Rows

  • Hard delete (1M live rows): 120 ms
  • Soft delete (200K live, 800K dead): 450 ms
  • Soft delete, unindexed filter: 890 ms

The performance hit comes from two places. First, the table is physically larger, so full scans touch more pages. Second, if deleted_at isn't indexed (or isn't part of a partial index), the database has to evaluate that filter condition against every row.

Cost 3: Foreign Key Confusion

If a parent row is soft-deleted but its children aren't, your data model is in an inconsistent state. Hard deletes with ON DELETE CASCADE handle this automatically. Soft deletes require you to cascade the soft delete manually through every related table -- and remember to do so every time you add a new relationship.
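A small sketch of the difference, again with an in-memory SQLite database (schema invented for illustration). Note that SQLite requires foreign key enforcement to be switched on per connection:

```python
# Contrast ON DELETE CASCADE (hard delete) with the manual cascade that a
# soft delete requires. In-memory SQLite; names are illustrative.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite needs this opt-in
conn.executescript("""
    CREATE TABLE users (
        id INTEGER PRIMARY KEY,
        deleted_at TEXT DEFAULT NULL
    );
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL REFERENCES users(id) ON DELETE CASCADE,
        deleted_at TEXT DEFAULT NULL
    );
    INSERT INTO users (id) VALUES (1), (2);
    INSERT INTO orders (id, user_id) VALUES (10, 1), (11, 1), (12, 2);
""")

# Hard delete: the database cascades to the child rows automatically.
conn.execute("DELETE FROM users WHERE id = 1")
remaining = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]  # 1

# Soft delete: flagging the parent does nothing to the children. Every
# related table must be updated by hand, ideally in one transaction --
# and you must remember to extend this block for each new relationship.
with conn:
    conn.execute("UPDATE users  SET deleted_at = datetime('now') WHERE id = 2")
    conn.execute("UPDATE orders SET deleted_at = datetime('now') WHERE user_id = 2")

orphans = conn.execute(
    "SELECT COUNT(*) FROM orders WHERE deleted_at IS NULL"
).fetchone()[0]  # 0 -- but only because we remembered the second UPDATE
```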


The Unique Constraint Problem

This deserves its own section because it catches almost every team that adopts soft deletes, and the fix isn't obvious.

Here's the scenario: your users table has a unique constraint on email. User Alice (alice@example.com) deletes her account. You soft-delete her row. Now a new user tries to sign up with alice@example.com. The database rejects the insert because the unique constraint sees the soft-deleted row and says "that email already exists."

📌 Unique Constraint + Soft Delete Conflict

Standard unique constraints don't distinguish between active and soft-deleted rows. A soft-deleted row with email = 'alice@example.com' blocks any new row with the same email, even though the original row is logically "gone." This is one of the most common soft delete gotchas, and it requires a database-specific workaround.

There are two common solutions:

Partial index (PostgreSQL):

CREATE UNIQUE INDEX idx_users_email_active
ON users (email)
WHERE deleted_at IS NULL;

This index only enforces uniqueness among non-deleted rows. It's clean and efficient, but it isn't universally available: PostgreSQL and SQLite support partial indexes, and SQL Server offers the equivalent filtered indexes, while MySQL doesn't support them natively.
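Although partial indexes are most often associated with PostgreSQL, SQLite ships them with the same syntax, which makes the pattern easy to demonstrate in-process. A sketch with illustrative names:

```python
# The partial unique index pattern, demonstrated with SQLite's in-memory
# database. The syntax matches the PostgreSQL example above.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (
        id INTEGER PRIMARY KEY,
        email TEXT NOT NULL,
        deleted_at TEXT DEFAULT NULL
    );
    -- Uniqueness is enforced only among active (non-deleted) rows.
    CREATE UNIQUE INDEX idx_users_email_active
        ON users (email) WHERE deleted_at IS NULL;
""")

conn.execute("INSERT INTO users (email) VALUES ('alice@example.com')")

# A second ACTIVE row with the same email is rejected...
try:
    conn.execute("INSERT INTO users (email) VALUES ('alice@example.com')")
    duplicate_blocked = False
except sqlite3.IntegrityError:
    duplicate_blocked = True

# ...but once the original is soft-deleted, the email is free again.
conn.execute(
    "UPDATE users SET deleted_at = datetime('now') "
    "WHERE email = 'alice@example.com' AND deleted_at IS NULL"
)
conn.execute("INSERT INTO users (email) VALUES ('alice@example.com')")
resignup_ok = True  # no exception: the second Alice can sign up
```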

Composite unique constraint:

-- Include deleted_at in the unique constraint
ALTER TABLE users ADD CONSTRAINT uq_users_email
UNIQUE (email, deleted_at);

This works because each soft-deleted row gets a different deleted_at timestamp, making the combination unique. But there's a catch: most engines (PostgreSQL, MySQL, SQLite) treat NULLs as distinct in unique constraints, so two active rows -- both with deleted_at = NULL -- don't conflict at all, and the constraint silently fails to protect exactly the rows you care about. PostgreSQL 15+ can close this hole with UNIQUE NULLS NOT DISTINCT; SQL Server, by contrast, treats NULLs as equal. Test your specific database engine.
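Here's that footgun in miniature, using SQLite (which, like PostgreSQL and MySQL, treats NULLs as distinct in unique constraints). Names are illustrative:

```python
# The composite unique constraint silently fails to protect active rows,
# because (email, NULL) never equals (email, NULL) in the constraint's eyes.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE users (
        id INTEGER PRIMARY KEY,
        email TEXT NOT NULL,
        deleted_at TEXT DEFAULT NULL,
        UNIQUE (email, deleted_at)
    )
""")

# Two ACTIVE rows with the same email -- both inserts succeed.
conn.execute("INSERT INTO users (email) VALUES ('alice@example.com')")
conn.execute("INSERT INTO users (email) VALUES ('alice@example.com')")

active_dupes = conn.execute(
    "SELECT COUNT(*) FROM users "
    "WHERE email = 'alice@example.com' AND deleted_at IS NULL"
).fetchone()[0]
print(active_dupes)  # 2 -- the constraint protected nothing
```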

Neither solution is free. Both add complexity that you wouldn't need with hard deletes.


When Hard Deletes Are Perfectly Fine

Hard deletes get a bad reputation because they sound scary and permanent. But for many applications, they're the right choice.

Hard deletes work well when:

  • You don't need historical data. Session tokens, temporary files, expired OTP codes, shopping cart items -- nobody needs to recover these.
  • The data is easily reproducible. Cached computations, derived tables, materialized views -- you can regenerate them.
  • Regulatory requirements demand actual deletion. GDPR's "right to erasure" may require you to actually remove personal data, not just flag it (more on this below).
  • Your tables are high-churn. If rows are created and deleted frequently (think message queues, job tables, notification logs), soft deletes cause rapid bloat.
  • You have proper backups. A solid backup and point-in-time recovery strategy covers the "oops" case far better than soft deletes, because it covers all tables, not just the ones you remembered to add deleted_at to.

For most standard CRUD applications, hard deletes combined with a good backup strategy are simpler, faster, and entirely sufficient.


GDPR and the Compliance Trap

🔴 Soft Delete May Not Satisfy 'Right to Erasure'

GDPR Article 17 grants users the "right to erasure" -- the right to have their personal data deleted. A soft delete that keeps the data in your database with a flag may not satisfy this requirement. Regulators and courts have generally interpreted "erasure" to mean the data should be rendered irrecoverable, not merely hidden from the application UI. If you operate in the EU or handle EU citizens' data, consult legal counsel before relying on soft deletes as your deletion mechanism for personal data.

This is the compliance trap that catches teams off guard. They adopt soft deletes thinking "we'll keep everything just in case" and then discover that their data retention practices conflict with privacy regulations.

The tension is real: your legal team wants data gone, your product team wants data recoverable, and your compliance team wants an audit trail. This three-way conflict is exactly what leads teams to the hybrid approach.


The Third Option: Audit Log + Hard Delete

There's a pattern that gives you the best properties of both approaches: log the deletion event, then hard-delete the actual data.

The idea is simple. Before you delete a row, you write a record to a separate audit log table capturing what was deleted, when, by whom, and why. Then you hard-delete the original row.

Audit Log + Hard Delete Pattern
[Diagram: the application's delete request first (1) writes an immutable event -- who, what, when -- to the audit_log table, which is retained permanently, then (2) issues a DELETE against the users table, which holds only live rows.]

The audit log table might look like this:

CREATE TABLE audit_log (
    id          BIGSERIAL PRIMARY KEY,
    table_name  TEXT NOT NULL,
    record_id   BIGINT NOT NULL,
    action      TEXT NOT NULL,       -- 'DELETE', 'UPDATE', etc.
    old_data    JSONB,               -- snapshot of the deleted row
    performed_by BIGINT,             -- who did it
    performed_at TIMESTAMPTZ DEFAULT NOW()
);

This gives you:

  • Clean main tables: no dead rows, no filter clauses, no bloated indexes
  • Full audit trail: you know what was deleted, when, and by whom
  • Data recovery option: the old_data JSONB column holds a snapshot you can restore from
  • Compliance flexibility: you can purge the audit log on a schedule to satisfy data retention policies, or redact PII from the JSONB while keeping the event metadata

The trade-off is additional write overhead (one extra INSERT per delete) and a separate table to manage. But for most applications, this is a small price for the clean separation of concerns.

✅ Use database triggers for reliability

If you go with the audit log pattern, consider implementing it as a database trigger rather than application code. A trigger fires regardless of whether the delete comes from your app, an admin console, or a migration script. It's a safety net that doesn't depend on every caller remembering to log first.
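As a sketch of the idea, here's the trigger implemented in SQLite; a PostgreSQL version would use a PL/pgSQL trigger function with row_to_json() instead of SQLite's json_object() (from its JSON1 extension, bundled in modern builds), but the shape is the same. The schema follows the audit_log table above, trimmed to what the demo needs:

```python
# Trigger-based audit logging: the snapshot is taken by the database
# itself, so it fires no matter who issues the DELETE.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (
        id INTEGER PRIMARY KEY,
        email TEXT NOT NULL
    );
    CREATE TABLE audit_log (
        id           INTEGER PRIMARY KEY,
        table_name   TEXT NOT NULL,
        record_id    INTEGER NOT NULL,
        action       TEXT NOT NULL,
        old_data     TEXT,              -- JSON snapshot of the deleted row
        performed_at TEXT DEFAULT (datetime('now'))
    );

    -- Fires for EVERY delete: app code, admin console, migration script.
    CREATE TRIGGER users_audit_delete
    BEFORE DELETE ON users
    BEGIN
        INSERT INTO audit_log (table_name, record_id, action, old_data)
        VALUES ('users', OLD.id, 'DELETE',
                json_object('id', OLD.id, 'email', OLD.email));
    END;

    INSERT INTO users (id, email) VALUES (42, 'alice@example.com');
""")

conn.execute("DELETE FROM users WHERE id = 42")

live = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
snapshot = conn.execute("SELECT old_data FROM audit_log").fetchone()[0]
print(live, snapshot)  # main table is clean; the audit log kept a copy
```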


Should You Soft Delete? A Decision Framework

The answer depends on your specific constraints. Walk through this:

Should You Use Soft Deletes?

  • Do you need to recover individual deleted records frequently? If not, hard delete plus a solid backup strategy is simpler and sufficient.
  • Does a regulation like GDPR require actual erasure of this data? If so, soft delete alone can't be your deletion mechanism.
  • Is the table high-churn? If so, soft deletes will bloat it quickly; hard delete (or a short grace period with aggressive purging) fits better.
  • Do you mainly need traceability rather than in-place recovery? If so, the audit log + hard delete hybrid gives you the trail without the baggage.
  • Only when you need frequent, fast, in-place restores -- and can absorb the query, constraint, and storage costs -- do soft deletes earn their keep.

Comparison: All Three Approaches Side by Side

| Characteristic | Hard Delete | Soft Delete | Audit Log + Hard Delete |
|---|---|---|---|
| Data recovery | Only via backups | Trivial (flip the flag) | Possible from audit log JSONB |
| Query complexity | No added filters | Every query needs WHERE clause | No added filters on main tables |
| Unique constraints | Work naturally | Require workarounds (partial indexes) | Work naturally |
| Storage efficiency | Optimal (dead rows removed) | Grows over time with dead rows | Main tables optimal, audit log grows separately |
| Index performance | Clean indexes | Indexes include dead rows (bloat) | Clean indexes on main tables |
| Audit trail | None (data is gone) | Implicit (deleted rows still exist) | Explicit (structured event log) |
| GDPR compliance | Satisfies right to erasure | May NOT satisfy right to erasure | Flexible (purge audit log on schedule) |
| Implementation effort | Minimal | Moderate (ORM scopes, filters, cascading) | Moderate (audit table, triggers/hooks) |
| Cascade handling | ON DELETE CASCADE works | Must cascade soft delete manually | ON DELETE CASCADE works |
| Third-party tool safety | Safe (no hidden rows) | Risky (tools may not filter) | Safe (no hidden rows) |

Soft Delete Best Practices (If You Go That Route)

If after weighing the trade-offs you decide soft deletes are the right fit, here are the practices that prevent the most pain:

  1. Use deleted_at timestamp, not a boolean. You get the "when" for free, and WHERE deleted_at IS NULL is just as efficient to index as WHERE is_deleted = FALSE.

  2. Add partial indexes for unique constraints. In PostgreSQL, CREATE UNIQUE INDEX ... WHERE deleted_at IS NULL solves the duplicate constraint problem cleanly.

  3. Set up a purge job. Soft-deleted rows older than your retention period (30 days, 90 days, whatever your policy dictates) should be hard-deleted on a schedule. Soft delete is a grace period, not a forever archive.

  4. Use database views or ORM default scopes. Create a view like active_users that pre-applies the WHERE deleted_at IS NULL filter, and use that view as the default interface. This reduces the chance of forgetting the filter.

  5. Test with realistic data ratios. If your production table is 70% soft-deleted rows, your development database should reflect that. Soft delete performance problems don't show up on small datasets.

  6. Document which tables use soft deletes. Not every table needs this pattern. Be explicit about which tables are soft-deletable and which aren't.
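Practices 3 and 4 can be sketched together in a few lines of SQL, here driven from Python against an in-memory SQLite database (the 90-day window is an example policy, not a recommendation):

```python
# A view as the default read interface, plus a scheduled purge of rows
# that have aged past the retention window. Names are illustrative.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (
        id INTEGER PRIMARY KEY,
        email TEXT NOT NULL,
        deleted_at TEXT DEFAULT NULL
    );
    -- Practice 4: readers hit the view and get the filter for free.
    CREATE VIEW active_users AS
        SELECT id, email FROM users WHERE deleted_at IS NULL;

    INSERT INTO users (id, email, deleted_at) VALUES
        (1, 'bob@example.com',    NULL),
        (2, 'old@example.com',    '2020-01-01 00:00:00'),  -- long past retention
        (3, 'recent@example.com', datetime('now'));        -- inside grace period
""")

# Practice 3: the purge job hard-deletes soft-deleted rows older than the
# retention period. Run it on a schedule (cron, pg_cron, a worker, etc.).
conn.execute(
    "DELETE FROM users "
    "WHERE deleted_at IS NOT NULL AND deleted_at < datetime('now', '-90 days')"
)

active = conn.execute("SELECT COUNT(*) FROM active_users").fetchone()[0]
total = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
print(active, total)  # 1 2 -- the 2020 row is purged, the recent one retained
```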


Key Takeaways

✅ Key Takeaways

  • Soft deletes are not free. They add query complexity, break unique constraints, cause index bloat, and may not satisfy GDPR requirements. Adopt them deliberately, not by default.

  • Hard deletes are underrated. For most CRUD applications, hard deletes plus a solid backup strategy are simpler, faster, and entirely sufficient.

  • The audit log + hard delete hybrid gives you the best of both worlds: clean tables, full audit trail, and compliance flexibility. It's often the right answer for systems that need traceability without the baggage of soft deletes.

  • If you do use soft deletes, use deleted_at timestamps (not booleans), add partial indexes, set up a purge schedule, and make sure your entire team knows which tables are affected.

  • There is no universal answer. The right database deletion strategy depends on your recovery needs, compliance requirements, query patterns, and team discipline. Use the decision framework above and choose per-table, not per-application.

