
Restore and migration

This page covers restoring from Velero backups and migrating to a new cluster. For restoring the Ziti database from a local snapshot (without Velero), use nf-restore-snapshot — see Backup overview.

Restoring from a Velero backup

The included restore script walks you through selecting and restoring from a Velero backup:

./velero/velero_restore.sh

The script will:

  1. Verify AWS credentials are available.
  2. Install the Velero plugin if not already present.
  3. Display available backups and prompt you to select one.
  4. Restore the selected backup.
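The backup-selection step (3) can be sketched in pure shell. This is a hedged sketch, not the actual script: the backup names below are hypothetical placeholders for the output of `velero backup get`, and the real script would read the choice interactively.

```shell
#!/bin/sh
# Sketch of the selection prompt: number the available backups and pick one
# by index. A real script would populate this list from `velero backup get`;
# these names are hypothetical.
backups="daily-2024-06-01
daily-2024-06-02
weekly-2024-05-26"

# Print a numbered menu.
echo "$backups" | awk '{ printf "%d) %s\n", NR, $0 }'

# select_backup <index>: read backup names on stdin, print the chosen one.
select_backup() {
  sed -n "${1}p"
}

choice=2   # in the real script this would come from `read`
selected=$(echo "$backups" | select_backup "$choice")
echo "Restoring from: $selected"
```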
Restoring the Ziti controller PVC

For Velero to restore the Ziti controller PVC from a backup, it must first delete the existing PVC. The restore script prompts for this choice. If you answer n, the script skips restoring the PVC but restores all other resources. By default, Velero skips restoring any resource that already exists. See the Velero restore reference documentation for more information.

Restores can also be run manually if you need to use specific Velero flags:

velero restore create --from-backup <backup-name>
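A few commonly useful variations (these are standard Velero CLI flags; the backup and restore names are placeholders):

```
# List available backups first
velero backup get

# Restore and wait for completion
velero restore create --from-backup <backup-name> --wait

# Restore only the ziti namespace
velero restore create --from-backup <backup-name> --include-namespaces ziti

# Inspect a restore's progress and any warnings or errors
velero restore describe <restore-name>
```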

Migrating to a new cluster

Migration uses the same backup and restore workflow to move a NetFoundry Self-Hosted installation from one cluster to another.

Preserve your controller advertise address

The controller's advertise address must remain the same after migration. The controller's TLS certificates are issued for this DNS name, and every Ziti client, router, and identity is configured to reach the controller at this address. If the DNS name changes, certificates will be invalid and all clients will lose connectivity.

When migrating, update your DNS records to point the same advertise address at the new cluster's Load Balancer or node IP. Do not change the advertise address itself.
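Before cutting traffic over, it is worth confirming that the advertise address now resolves to the new cluster. A minimal sketch, where the hostname and IPs are hypothetical placeholders:

```shell
#!/bin/sh
# Sketch: compare the resolved IP of the advertise address against the new
# cluster's expected IP. Hostname and addresses below are hypothetical.
check_advertise_dns() {
  # usage: check_advertise_dns <resolved-ip> <expected-ip>
  if [ "$1" = "$2" ]; then
    echo "DNS OK"
  else
    echo "DNS MISMATCH: got $1, want $2"
  fi
}

# In practice the first argument would come from a live lookup, e.g.:
#   check_advertise_dns "$(getent hosts ctrl.example.com | awk '{print $1; exit}')" 203.0.113.10
result=$(check_advertise_dns 203.0.113.10 203.0.113.10)
echo "$result"
```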

Step 1: Back up the existing cluster

  1. Ensure AWS credentials are loaded into the environment or saved to the credentials file.

  2. Install Velero if not already present:

    K3s:

    velero install --provider aws --plugins velero/velero-plugin-for-aws:v1.12.2 \
    --bucket <S3_BUCKET_NAME> --features=EnableRestic --default-volumes-to-fs-backup --use-node-agent \
    --backup-location-config region=us-east-1 --snapshot-location-config region=us-east-1 \
    --secret-file <credentials-file>

    EKS / multi-node:

    velero install --provider aws --plugins velero/velero-plugin-for-aws:v1.12.2 \
    --bucket <S3_BUCKET_NAME> --features=EnableCSI --use-volume-snapshots=true \
    --backup-location-config region=us-east-1 --snapshot-location-config region=us-east-1 \
    --secret-file <credentials-file>
  3. Back up all resources including persistent volumes:

    velero backup create <backup-name> --include-cluster-resources
  4. Destroy the existing cluster.
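Before destroying the old cluster, confirm the backup actually reached Completed. A sketch of checking the STATUS column; the sample output below is hypothetical, and in practice you would pipe in the live output of `velero backup get`:

```shell
#!/bin/sh
# Sketch: parse `velero backup get`-style output and confirm the backup
# completed before tearing anything down. The sample is hypothetical.
sample='NAME            STATUS      ERRORS   WARNINGS
pre-migration   Completed   0        0'

status=$(echo "$sample" | awk '$1 == "pre-migration" { print $2 }')
if [ "$status" = "Completed" ]; then
  echo "Backup completed; safe to destroy the old cluster."
else
  echo "Backup status is '$status'; do not destroy the old cluster yet." >&2
fi
```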

Step 2: Restore to the new cluster

  1. Create the new cluster.

  2. Load AWS credentials and install Velero (same commands as above).

  3. Run the restore script:

    ./velero/velero_restore.sh
Migration-specific notes
  • EKS: The new cluster will have new Load Balancer addresses. Update your DNS records so that the existing controller and router advertise addresses point to the new Load Balancer endpoints. Do not change the advertise addresses themselves.
  • K3s: Update your DNS records so that the existing controller and router advertise addresses point to the new node's IP. The new cluster should use the same node configuration and default storage class.

Verifying the restore

Run nf-status to confirm all deployments are healthy:

nf-status

All deployments should show the expected replica count in the READY column. For more detail:

kubectl get pods -n ziti
kubectl get pods -n cert-manager
kubectl get pods -n support
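The READY check can be automated. A sketch that flags any pod whose ready count is short of its expected count; the sample below is hypothetical `kubectl get pods` output, and in practice you would pipe in the live command:

```shell
#!/bin/sh
# Sketch: print the name of any pod whose READY column (e.g. 0/1) is not
# full. Feed it `kubectl get pods -n <namespace>`; this sample is hypothetical.
not_ready() {
  awk 'NR > 1 { split($2, r, "/"); if (r[1] != r[2]) print $1 }'
}

sample='NAME                READY   STATUS    RESTARTS
ziti-controller-0   1/1     Running   0
ziti-router-1       0/1     Pending   0'

bad=$(echo "$sample" | not_ready)
echo "${bad:-all pods ready}"
```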

Known issues after restore

Common

  • The ziti-edge-tunnel deployment in the support namespace may need to be restarted, since the tunneler can come back online before the Ziti controller is ready:

    kubectl rollout restart deployment ziti-edge-tunnel -n support
  • If the DNS record for the controller or router advertise address changes (for example, to point at a new Load Balancer or node IP), it may take a few minutes for clients to reconnect. Restarting hosting routers or identities speeds up recovery.
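After restarting the tunneler, you can wait for the rollout to finish before checking connectivity (standard kubectl commands):

```
kubectl rollout status deployment ziti-edge-tunnel -n support
```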

EKS

  • Load Balancer addresses will likely change after restoring from backup. Update the DNS entries for the controller and router advertise addresses. The ziti-router-1 deployment will not come back online until it can reach the controller over its advertise address — this is expected during a restore.

K3s

  • The trust-manager deployment in cert-manager can fail with:

    Error: container has runAsNonRoot and image has non-numeric user (cnb), cannot verify user is non-root

    To fix, edit the deployment and add runAsUser: 1000 under the securityContext block:

    kubectl edit deployment/trust-manager -n cert-manager

        securityContext:
          runAsUser: 1000  # add this line

    Then restart:

    kubectl rollout restart deployment trust-manager -n cert-manager
  • The elasticsearch-es-elastic-nodes statefulset can fail to start, causing Kibana to show "Kibana server is not ready yet." To fix:

    kubectl rollout restart statefulset elasticsearch-es-elastic-nodes -n support
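For the trust-manager runAsNonRoot error above, a non-interactive alternative to `kubectl edit` is a JSON patch. This sketch assumes trust-manager is the first container in the pod spec and that its securityContext block already exists:

```
kubectl patch deployment trust-manager -n cert-manager --type=json \
  -p '[{"op":"add","path":"/spec/template/spec/containers/0/securityContext/runAsUser","value":1000}]'
```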

Stalled restore jobs

If the restore appears to have succeeded but the restore job hangs and never completes, delete the stuck restore object:

kubectl delete restore -n velero <restore-name>
# If the above command hangs, cancel it and run:
kubectl rollout restart deployment velero -n velero