Most servers have some form of backup running. Most teams have never tested restoring from one. That gap — between having backups and trusting backups — is where disasters actually happen.
This post walks through building a backup strategy that holds up under pressure: picking the right backup types, setting realistic recovery objectives, scheduling intelligently, and making restore testing a normal part of operations instead of a last-minute scramble.
Start with RPO and RTO — Not Backup Software
Before you configure a single cron job, answer two questions:
- RPO (Recovery Point Objective): How much data loss is acceptable? If your database is backed up once a day and a failure hits at 11 PM, you could lose nearly 24 hours of data. Is that acceptable for your application?
- RTO (Recovery Time Objective): How long can your service be down during recovery? An e-commerce site might tolerate 30 minutes. A corporate intranet might tolerate 4 hours.
These two numbers dictate everything downstream — backup frequency, retention policy, storage location, and whether you need hot standby or cold restore. Define them with your stakeholders before touching configuration.
A common mistake: teams set a daily backup schedule because it's the default, then later discover their RPO is actually 4 hours. The backup schedule should be a deliberate answer to the RPO question, not an afterthought.
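Once the RPO is written down, it's worth monitoring it rather than trusting the schedule. Here's a minimal sketch, assuming GNU coreutils stat and an illustrative backup directory and 4-hour threshold, that alerts when the newest backup is older than your RPO:

```shell
# Sketch: alert when the newest backup is older than your RPO.
# BACKUP_DIR and RPO_SECONDS are illustrative defaults -- set your own.
BACKUP_DIR="${BACKUP_DIR:-/backups/daily}"
RPO_SECONDS="${RPO_SECONDS:-14400}"   # 4-hour RPO

check_rpo() {
  local newest age
  newest=$(ls -t "$BACKUP_DIR" 2>/dev/null | head -n 1)
  if [ -z "$newest" ]; then
    echo "RPO breach: no backups found in $BACKUP_DIR"
    return 1
  fi
  # Age of the newest backup in seconds (stat -c %Y is GNU coreutils)
  age=$(( $(date +%s) - $(stat -c %Y "$BACKUP_DIR/$newest") ))
  if [ "$age" -gt "$RPO_SECONDS" ]; then
    echo "RPO breach: newest backup is ${age}s old"
    return 1
  fi
  echo "OK: newest backup is ${age}s old"
}
```

Wire a check like this into your existing monitoring and an RPO drift gets noticed in minutes, not during an incident.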
Full, Incremental, and Differential Backups
Understanding backup types lets you build efficient schedules without ballooning storage costs.
Full backups
A full backup copies everything — all files, all databases, all configuration. It's the slowest and most storage-intensive type, but restoration is straightforward: one archive, one restore operation.
Full backups are typically run once daily at a low-traffic window (commonly midnight). On managed hosting, the first daily backup is always a full backup — that's the anchor point for everything else.
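As a sketch of the idea (paths are illustrative, and a real job would also cover databases and config), a full backup can be as simple as one dated, compressed tarball:

```shell
# Minimal full-backup sketch: one dated tarball of a source directory.
full_backup() {
  local src=$1 dest_dir=$2
  mkdir -p "$dest_dir"
  # -C keeps archive paths relative to the source's parent directory
  tar -czf "$dest_dir/full-$(date +%F).tar.gz" \
    -C "$(dirname "$src")" "$(basename "$src")"
}
# Example: full_backup /var/www/html /backups/full
```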
Incremental backups
An incremental backup only captures what changed since the last backup — whether that was a full or another incremental. They're fast and small, which makes them ideal for high-frequency schedules. The tradeoff: restoration requires applying the full backup first, then every incremental in sequence. More moving parts, more potential for a missing piece.
Here's a simple rsync-based incremental backup approach using hard links:
```bash
#!/bin/bash
# Timestamped snapshot directory plus a "latest" symlink for the next run
DATE=$(date +%Y-%m-%d_%H-%M)
DEST="/backups/incremental/$DATE"
LINK="/backups/incremental/latest"

# Hard-link unchanged files from the previous snapshot; copy only changes
rsync -az --link-dest="$LINK" /var/www/html/ "$DEST/"

# Point "latest" at the snapshot we just created
rm -f "$LINK"
ln -s "$DEST" "$LINK"
```

The --link-dest flag tells rsync to hard-link unchanged files from the previous backup, so identical files don't consume extra disk space across snapshots.
Differential backups
A differential backup captures everything changed since the last full backup. They grow larger over time as more changes accumulate, but restoration only needs two pieces: the full backup plus the most recent differential. Simpler than incremental chains, but more storage than pure incrementals.
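A differential can be sketched with rsync as well, using --compare-dest instead of --link-dest. Paths here are illustrative, and the full-backup path must be absolute for --compare-dest to resolve correctly:

```shell
# Differential sketch: copy only files changed since the last FULL backup.
differential_backup() {
  local src=$1 full=$2 dest=$3
  mkdir -p "$dest"
  # --compare-dest skips files identical to those in the full backup,
  # so only changes since the full land in $dest
  rsync -az --compare-dest="$full" "$src/" "$dest/"
}
# Example: differential_backup /var/www/html /backups/full/latest \
#            /backups/differential/$(date +%F_%H-%M)
```

Restoring is then the full backup plus one overlay of the most recent differential.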
Scheduling for Your RPO
Once you've mapped your RPO, translate it directly into a schedule. A few practical patterns:
- RPO of 24 hours: One full backup per day. Simple, low overhead. Suitable for low-write applications like marketing sites or documentation.
- RPO of 6–8 hours: One daily full + 2–3 additional backups (incremental or partial). You cover the full day in four windows without multiplying storage costs.
- RPO of 1–2 hours: Combine scheduled backups with database-level streaming replication or WAL archiving (for PostgreSQL) or binary log shipping (for MySQL).
For teams running high-write workloads — WooCommerce stores, SaaS apps, anything with frequent transactions — it's worth increasing backup frequency well beyond the default daily snapshot. We let servers run up to four automatic backups per day, with configurable partial backups (files-only or database-only) for the mid-day windows, so you're not storing full copies of unchanged static assets every six hours.
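For a roughly 6-hour RPO, the pattern above might translate into a crontab like this (the script names are placeholders for your own full and partial backup jobs):

```
# Full backup at midnight; lighter partial backups through the day
0 0       * * *  /usr/local/bin/backup-full.sh
0 6,12,18 * * *  /usr/local/bin/backup-partial.sh
```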
Off-Server Storage Is Non-Negotiable
A backup stored on the same physical server as your data is not a backup — it's a copy that will fail at exactly the same moment your data does. Every backup strategy needs at least one off-server destination.
Practical options:
- Object storage (S3, Backblaze B2, Wasabi): Cheap, durable, and easy to automate. Use lifecycle policies to expire old backups automatically.
- A separate VPS or dedicated server: Useful when you need backups accessible for rapid restore without downloading from cloud storage.
- Offsite physical media: Relevant for compliance-heavy environments. Usually paired with cloud, not instead of it.
Automating an upload to S3 after each local backup is straightforward with the AWS CLI:
```bash
aws s3 sync /backups/daily/ s3://your-bucket/server-backups/ \
  --storage-class STANDARD_IA \
  --delete
```

STANDARD_IA (Infrequent Access) cuts storage costs significantly for backups you rarely retrieve. The --delete flag mirrors deletions, so your S3 bucket reflects your local retention policy.
Retention Policy: How Long to Keep What
More retention is not always better. Keeping 90 days of daily full backups for an active server is expensive and rarely useful — most restores reach back no further than 7–14 days from when an incident is noticed.
A tiered retention model works well for most production systems:
- Daily backups: keep 7–14 days
- Weekly snapshots: keep one per week, retain 4–8 weeks
- Monthly snapshots: retain 6–12 months (for compliance or long-tail rollbacks)
S3 lifecycle rules can enforce this automatically without manual cleanup:
```bash
aws s3api put-bucket-lifecycle-configuration \
  --bucket your-bucket \
  --lifecycle-configuration file://lifecycle.json
```

Where lifecycle.json defines transition and expiration rules per prefix. AWS documentation covers the full schema — it's worth spending an hour getting this right once rather than paying for years of accumulated snapshots.
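As a sketch only (the prefix names and day counts are assumptions, not recommendations), a lifecycle.json implementing tiered retention might look like:

```json
{
  "Rules": [
    {
      "ID": "expire-daily-backups",
      "Filter": { "Prefix": "server-backups/daily/" },
      "Status": "Enabled",
      "Expiration": { "Days": 14 }
    },
    {
      "ID": "archive-monthly-backups",
      "Filter": { "Prefix": "server-backups/monthly/" },
      "Status": "Enabled",
      "Transitions": [ { "Days": 30, "StorageClass": "GLACIER" } ],
      "Expiration": { "Days": 365 }
    }
  ]
}
```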
Testing Restores: The Part Everyone Skips
An untested backup is a hypothesis. You think it will work. You don't know it will work.
Restore testing should be scheduled, documented, and treated like any other operational procedure. Here's a minimal restore test checklist:
- Restore files to an isolated staging environment — never production
- Verify file integrity with checksums if your backup tool supports them
- Restore the database and run a sanity check query — row counts, last-modified timestamps
- Confirm the application boots and serves requests correctly from restored data
- Log the time taken from restore initiation to verified working state — this is your actual RTO
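The database steps of that checklist can be scripted. This is a sketch only — the staging host, database name, and orders table are hypothetical, so substitute your own, and point it at staging, never production:

```shell
# Restore-drill sketch: load the newest dump into staging and time it.
restore_drill() {
  local dump_dir=$1
  local start latest
  start=$(date +%s)
  latest=$(ls -t "$dump_dir"/*.sql.gz | head -n 1)

  # Load the newest dump into an isolated staging database (hypothetical
  # host "staging-db" and database "staging_app")
  gunzip -c "$latest" | mysql --host=staging-db staging_app

  # Sanity check: row count and most recent write on a hypothetical table
  mysql --host=staging-db staging_app \
    -e "SELECT COUNT(*) AS row_count, MAX(updated_at) AS last_write FROM orders;"

  # Measured wall-clock restore time -- this is your actual RTO
  echo "Restore verified in $(( $(date +%s) - start ))s"
}
```

Logging that final number every quarter turns "we think restores work" into a trend line you can compare against your target RTO.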
Run this drill quarterly at minimum. Run it once after any significant infrastructure change. If your actual restore time consistently exceeds your target RTO, that's the signal to invest in faster storage, better tooling, or a warm standby environment.
The goal isn't a perfect backup system — it's a backup system you've proven works before you actually need it.
Database Backups Deserve Special Attention
File-system-level backups of a running database can produce inconsistent snapshots — you might catch a write mid-transaction. Use database-native tools instead:
For MySQL/MariaDB:
```bash
mysqldump --single-transaction --routines --triggers \
  --all-databases | gzip > /backups/db/$(date +%Y-%m-%d_%H-%M).sql.gz
```

--single-transaction uses a consistent snapshot for InnoDB tables without locking. Skip it and you risk an inconsistent dump on a busy database.
For PostgreSQL:
```bash
pg_dumpall | gzip > /backups/db/$(date +%Y-%m-%d_%H-%M).sql.gz
```

For large databases where mysqldump or pg_dumpall is too slow, look at Percona XtraBackup (MySQL) or pg_basebackup with WAL archiving (PostgreSQL). Both support hot backups with minimal performance impact.
The One Thing That Actually Matters
You can have the most sophisticated backup pipeline in the world — multi-region replication, encrypted archives, versioned object storage — and still lose data if you've never verified a restore works end-to-end.
Build the schedule. Automate the offsite transfer. Set a quarterly calendar reminder to test a restore on staging. That last step is the one that separates teams that recover quickly from teams that spend 72 hours in a war room trying to reconstruct a database from application logs.
Backups are insurance. Test them before you need to file a claim.