From Legacy to Immutable: A Practical IaC Maturity Journey

By Anas Semesmieh · March 24, 2026 · IaC, Cloud Architecture

Immutable infrastructure is easy to praise and easy to misapply. In practice, most teams do not jump straight from long-lived mutable servers to golden images, short-lived nodes, and full replacement rollouts. They inherit live systems, handwritten fixes, environment drift, and a backlog of undocumented assumptions. The real journey is not from bad to perfect. It is from undocumented mutation to controlled replacement.

A system becomes meaningfully more immutable when three things are true: desired state is small and explicit, restore is automated, and operational changes flow back through versioned code instead of staying trapped on the host.

I have found that the best way to explain the idea is with real code rather than architecture slogans. In my public homelab repo, the goal is not to pretend the host is stateless. The goal is to make the host rebuildable: configurations are tracked, secrets are intentionally excluded, restore steps are scripted, and service bring-up follows a documented order. That is already much closer to immutable operations than "SSH in and fix it later."

1. Start by shrinking the definition of desired state

The first mistake teams make is trying to version-control the whole machine. That usually fails because runtime state, logs, downloaded media, database files, and ad hoc secrets do not belong in the same lifecycle as declarative config. If everything is "state," nothing is safely replaceable.

The homelab repo solves that with an explicit allowlist in inventory/include-paths.txt. Only restore-worthy configuration files are tracked. That is a small design choice with big consequences: it forces a clean boundary between declarative state and runtime state.

setup-repo.sh
arrstack/docker-compose.yml
homeassistant/config/configuration.yaml
homepage/config/services.yaml
traefik/docker-compose.yml
traefik/dynamic/tls.yaml
vaultwarden/docker-compose.yml
vaultwarden/backup-to-nas.sh

This is the kind of detail that makes immutability practical. A machine is not immutable because you say it is. It becomes replaceable when you can point to the specific files that define its behavior and say: these are the inputs required to rebuild service configuration, and everything else is either generated, cached, or backed up through a different path.

That is also why the repository README explicitly excludes media, logs, databases, raw credentials, and other runtime artifacts. Developers often skip that boundary and then wonder why their so-called infrastructure repo is impossible to restore cleanly.
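
An allowlist only helps if it stays accurate. As a minimal sketch, assuming the file lists one relative path per line, a small shell function can report entries that no longer exist under a given root. The function name check_allowlist and the convention of skipping blank lines and "#" comments are illustrative assumptions, not something the repo is confirmed to do.

```shell
#!/bin/sh
# Hypothetical helper: report allowlist entries that are missing under a root.
# Usage: check_allowlist <include-file> <root-dir>
# Prints one "[missing] <path>" line per absent entry; returns 1 if any are missing.
check_allowlist() {
  include_file=$1
  root=$2
  status=0
  # The "|| [ -n ... ]" keeps a final line that lacks a trailing newline.
  while IFS= read -r rel || [ -n "$rel" ]; do
    # Assumed convention: blank lines and "#" comments are ignored.
    case $rel in
      ''|'#'*) continue ;;
    esac
    if [ ! -f "$root/$rel" ]; then
      echo "[missing] $rel"
      status=1
    fi
  done < "$include_file"
  return "$status"
}
```

Calling something like check_allowlist inventory/include-paths.txt "$HOME" before each backup would surface renamed or deleted files early, instead of during a restore.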

2. The restore path is the real immutability test

A surprising amount of "immutable infrastructure" discussion never shows the rebuild path. That is a problem. If you cannot recreate the system from a repo plus a bounded set of secrets, you do not have an immutable practice. You have a theory.

In this repo, restore is not a wiki-only procedure. It is a containerized workflow. The helper stack in restore/docker-compose.restore.yml runs a purpose-built restore job against the tracked config set:

services:
  restore-configs:
    image: alpine:3.20
    user: "0:0"
    working_dir: /repo
    command: ["/bin/sh", "/repo/restore/apply-configs.sh"]
    environment:
      TARGET_HOME: ${TARGET_HOME:-/home/anas}
      DRY_RUN: ${DRY_RUN:-true}
      OVERWRITE: ${OVERWRITE:-false}
    volumes:
      - ..:/repo:ro
      - /:/host

That is developer-friendly for two reasons. First, the repo is mounted read-only, which protects the source of truth during restore. Second, the runtime behavior is controlled by explicit flags. The default mode is a dry run, not a destructive copy. Because the host filesystem is mounted writable at /host, that default matters: nothing is written until DRY_RUN is explicitly set to false. That is exactly the kind of safety rail mature IaC workflows need.

The restore script itself, restore/apply-configs.sh, is deliberately simple, and that simplicity is a strength:

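# Excerpt: assumes SRC_ROOT and INCLUDE_FILE are already set.
echo "[info] restore job starting"
echo "[info] target home: ${TARGET_HOME}"
echo "[info] dry run: ${DRY_RUN}"

# Walk the allowlist, one relative path per line.
while IFS= read -r rel; do
  src="${SRC_ROOT}/${rel}"
  dest="/host${TARGET_HOME}/${rel}"

  # Preview only: report the planned copy and change nothing.
  if [ "${DRY_RUN}" = "true" ]; then
    echo "[dry-run] copy ${src} -> ${dest}"
    continue
  fi

  # Apply: create the parent directory, then copy the tracked file into place.
  mkdir -p "$(dirname "${dest}")"
  cp -f "${src}" "${dest}"
done < "${INCLUDE_FILE}"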

The important part is not the shell syntax. It is the control flow. The script iterates only over the allowlisted file set, derives a deterministic destination, and previews before applying. Overwrite behavior is exposed as an explicit flag (OVERWRITE in the compose file), although the excerpt above omits that check. That is the kind of operational contract developers can trust during recovery.

cd ~/homelab-backup/restore
docker compose -f docker-compose.restore.yml run --rm \
  -e TARGET_HOME=/home/anas \
  -e DRY_RUN=true \
  restore-configs

If you can run that preview on a fresh host and see exactly what will be restored, you are no longer depending on tribal memory. You are operating from codified intent.
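
The excerpt does not show how OVERWRITE is consumed. One plausible reading, sketched here with a hypothetical restore_one function, is that existing files are left alone unless OVERWRITE=true. Those semantics, the function name, and the "[skip]" and "[copied]" log lines are assumptions for illustration, not the repo's confirmed behavior.

```shell
#!/bin/sh
# Hypothetical helper: copy one allowlisted file, honoring DRY_RUN and OVERWRITE.
# Usage: restore_one <src> <dest>
restore_one() {
  src=$1
  dest=$2
  if [ ! -f "$src" ]; then
    echo "[skip] missing source $src"
    return 0
  fi
  # Same safe default as the compose file: preview unless told otherwise.
  if [ "${DRY_RUN:-true}" = "true" ]; then
    echo "[dry-run] copy $src -> $dest"
    return 0
  fi
  # Assumed semantics: without OVERWRITE=true, existing files are left alone.
  if [ -e "$dest" ] && [ "${OVERWRITE:-false}" != "true" ]; then
    echo "[skip] exists $dest"
    return 0
  fi
  mkdir -p "$(dirname "$dest")"
  cp -f "$src" "$dest"
  echo "[copied] $dest"
}
```

With this shape, a real apply needs two deliberate choices (DRY_RUN=false and, for existing files, OVERWRITE=true), which keeps an accidental invocation harmless.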

3. Mutable changes still happen, so capture them before they become snowflakes

This is where many teams get stuck. They know the right answer is to rebuild rather than patch in place, but real systems still need updates, token refreshes, config changes, and operational fixes. If those changes happen on the host and never flow back into the repo, the infrastructure starts drifting immediately.

The backup pipeline in scripts/backup-and-push.sh is a useful bridge pattern between mutable reality and immutable goals:

LOCK_FILE="${REPO_ROOT}/.backup.lock"

./scripts/sync-configs.sh
./scripts/redact-secrets.sh
./scripts/scan-secrets.sh

git add -A

if git diff --cached --quiet; then
  echo "[info] no changes to commit"
  exit 0
fi

commit_msg="backup: automated snapshot $(date -u +'%Y-%m-%dT%H:%M:%SZ')"
git commit -m "${commit_msg}"
git pull --rebase --autostash origin main
git push origin main

There are several strong engineering choices in that short script:

A lock file prevents overlapping backup jobs from corrupting the workflow.
Config sync happens before redaction and secret scanning, so repo state is sanitized before commit.
The job is a no-op when nothing changed: it exits before committing, so repeated runs never create empty commits.
The rebase step reduces divergence between local automation and remote history.

No, that is not full image-based immutability. But it is a disciplined anti-drift mechanism. It shortens the gap between "the server changed" and "the desired state repo reflects the change," which is exactly how teams mature toward more replaceable systems.
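
The script excerpt defines LOCK_FILE but does not show how the lock is taken. As a sketch of one portable option, the helpers below use a lock directory rather than a lock file, because mkdir either creates the directory or fails in a single atomic step. This is a swapped-in technique for illustration. The repo may use flock or something else entirely, and the names acquire_lock and release_lock are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers: a portable lock built on mkdir, which is atomic
# (only one process can create a given directory).
# Usage: acquire_lock <lock-dir>   (returns 1 if another job holds the lock)
acquire_lock() {
  if mkdir "$1" 2>/dev/null; then
    echo "$$" > "$1/pid"   # record the owning PID for debugging
    return 0
  fi
  echo "[warn] lock held: $1" >&2
  return 1
}

# Usage: release_lock <lock-dir>
release_lock() {
  rm -f "$1/pid"
  rmdir "$1" 2>/dev/null
}
```

A backup job could call acquire_lock at the top, exit quietly if it fails, and register release_lock in an EXIT trap so the lock is removed even when a later step errors out.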

4. Secrets and data are where immature immutable stories usually fall apart

Another common failure mode is pretending secrets and persistent data will somehow solve themselves later. They will not. Strong immutable design depends on being explicit about what gets rebuilt, what gets restored, and what gets rehydrated.

The restore runbook in docs/RESTORE.md makes this boundary visible. The repo is cloned, an optional weekly snapshot tag can be checked out, then secrets are re-injected into the sanitized configs before any services are brought online. For example, redacted placeholders in tracked files are replaced only on the recovery target:

source ~/.homelab-secrets.env

sed -i "s|WIREGUARD_PRIVATE_KEY: REDACTED|WIREGUARD_PRIVATE_KEY: $WIREGUARD_PRIVATE_KEY|g" \
  configs/arrstack/docker-compose.yml

sed -i "s|^AUTHENTIK_SECRET_KEY=REDACTED|AUTHENTIK_SECRET_KEY=$AUTHENTIK_SECRET_KEY|" \
  configs/authentik/.env

That pattern is worth calling out because it demonstrates real maturity. The repo is safe to publish and clone, but still precise enough to restore the system once local secrets are rehydrated. Immutable infrastructure is not about ignoring sensitive state. It is about making the boundary between tracked configuration and sensitive runtime inputs explicit and automatable.

The same principle applies to persistent data. Databases and media are not treated as if they belong in the config snapshot. They use separate backup flows. That separation is what keeps the config restore path simple and predictable.
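
One detail of the secret re-injection step deserves care: a secret that contains "|", "&", or a backslash would be misread by sed when interpolated directly into the replacement. A minimal sketch of a safer variant follows, assuming single-line secrets and KEY=REDACTED placeholders at the start of a line. The helper names escape_sed_replacement and inject_secret are hypothetical and not part of the runbook.

```shell
#!/bin/sh
# Hypothetical helper: escape a value for use as a sed replacement when "|"
# is the delimiter. Backslash, "&", and "|" are the characters sed would
# otherwise interpret. Assumes the value has no newlines.
escape_sed_replacement() {
  printf '%s' "$1" | sed -e 's/[\\&|]/\\&/g'
}

# Hypothetical helper: replace "KEY=REDACTED" at the start of a line.
# Usage: inject_secret <file> <KEY> <value>
inject_secret() {
  file=$1 key=$2
  escaped=$(escape_sed_replacement "$3")
  tmp=$(mktemp)   # mktemp creates the file readable only by its owner
  if sed "s|^${key}=REDACTED|${key}=${escaped}|" "$file" > "$tmp"; then
    cat "$tmp" > "$file"   # rewrite in place so the file keeps its permissions
    rc=$?
  else
    rc=1
  fi
  rm -f "$tmp"
  return "$rc"
}
```

For example, inject_secret configs/authentik/.env AUTHENTIK_SECRET_KEY "$AUTHENTIK_SECRET_KEY" would behave like the second sed command above while tolerating special characters in the value.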

5. Rebuild-first thinking is more important than any one tool

The phrase immutable infrastructure often gets reduced to tooling debates: Packer versus Docker, Terraform versus Pulumi, VM images versus Kubernetes, cloud-init versus Ansible. Those choices matter, but they are not the first maturity milestone. The first milestone is whether engineers optimize for in-place repair or for reproducible replacement.

A practical maturity ladder looks more like this:

Stage	Behavior	Risk
Mutable Ops	SSH, patch, edit live files, hope nothing important was forgotten	High drift, weak recovery, poor auditability
Tracked Config	Version-controlled compose files, config files, and setup scripts	Better visibility, but rebuild path may still be manual
Scripted Restore	Dry-run restore jobs and documented bring-up order	Lower recovery risk and clearer change boundaries
Rebuild First	Replace hosts or workloads from codified inputs whenever practical	Low drift and stronger rollback confidence

That is why I like the term journey in the title. Most environments are hybrid for a long time. Some parts are rebuilt from code, some are restored from snapshots, and some legacy components still need careful in-place handling. The win is not ideological purity. The win is steadily reducing the amount of infrastructure whose behavior depends on undocumented mutation.

6. What developers should take from this

If you are building or operating services, the practical question is not "are we immutable yet?" The better question is: what would I need, in Git, to recreate this safely on a new host? That question tends to force the right engineering decisions:

Separate config from runtime data.
Track only the files that define service behavior.
Make restore executable and previewable.
Rehydrate secrets outside the public source of truth.
Continuously pull host changes back into version control until rebuild-only workflows are possible.

That is how teams move from "pets" to something closer to cattle without pretending their current systems are already there. Good immutable practice is usually a series of smaller engineering choices that make replacement safer than repair.
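
The last item in that list, pulling host changes back into version control, is easier to sustain when drift is visible. As a rough sketch, assuming the same one-path-per-line allowlist, a function like the hypothetical report_drift below compares each tracked file with its copy on the host. Files whose repo copy has been redacted will always differ from the live copy, so a check like this is only meaningful for files that carry no secrets, or when run against unredacted working copies.

```shell
#!/bin/sh
# Hypothetical helper: list allowlisted files whose host copy differs from the repo copy.
# Usage: report_drift <include-file> <repo-root> <host-root>
# Returns 1 when any file is absent on the host or differs from the repo.
report_drift() {
  include_file=$1 repo_root=$2 host_root=$3
  drift=0
  while IFS= read -r rel || [ -n "$rel" ]; do
    # Assumed convention: blank lines and "#" comments are ignored.
    case $rel in
      ''|'#'*) continue ;;
    esac
    if [ ! -f "$host_root/$rel" ]; then
      echo "[absent-on-host] $rel"
      drift=1
    elif ! cmp -s "$repo_root/$rel" "$host_root/$rel"; then
      # cmp -s is silent and exits non-zero when the files differ.
      echo "[drift] $rel"
      drift=1
    fi
  done < "$include_file"
  return "$drift"
}
```

Run on a schedule, a non-zero exit could trigger the backup job or simply raise an alert that someone edited a live file.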

Closing thought

Immutable infrastructure is not a slogan about never logging into servers. It is a discipline of reducing hidden state, codifying restore, and making replacement the default recovery path. The more your system can be rebuilt from a small, explicit, versioned definition plus a controlled secret set, the more operational confidence you gain.