Backing Up and Restoring Dash Enterprise (Multi-node)


This guide can help you create a backup of your Dash Enterprise instance and use it to restore Dash Enterprise to a previous state on a different cluster.

You’ll back up everything inside your dedicated Dash Enterprise cluster—including Dash apps deployed by members of your organization, user information, your Dash Enterprise license, and more—so that Dash Enterprise is ready shortly after a restore.

The officially supported way to back up Dash Enterprise is with Velero, an open-source set of backup and restore tools. In this guide, we refer to the cluster you are currently using as your source cluster. In the event that you need to restore from a backup of the source cluster, you perform the restore on a destination cluster. Restoring Dash Enterprise on the same cluster where you backed it up from is not currently supported.

This guide assumes that you will be storing your backups using the storage solution corresponding to the cloud provider your organization selected when creating your source cluster.

Prerequisites

Setting Up Velero

In order to use Velero, you’ll need to install the Velero CLI as well as the Velero server components.

The server components need to be configured with a location to store the backups, so you’ll first create this storage. To allow Velero to store data in this location, you’ll create an entity for Velero in your cloud provider and attach the appropriate policy. You can consider this entity a service account—that is, it exists for the Velero service and doesn’t belong to any particular individual in your organization.

Creating a Storage Location for Your Backups

Creating and Configuring a Service Account for Velero

Installing the Velero CLI

In this step, you’ll install the Velero CLI so that you can run Velero commands.

Velero CLI version 1.10.3 is required for compatibility with the Velero server components version that you’ll install in a later step.

If your source cluster is airgapped, you’ll also prepare the Velero images in your private container registry.

Installing the Velero Server Components on the Source Cluster

In this step, you’ll use the Velero CLI to install the Velero server components on your cluster.

To verify that the Velero installation was successful, check for running pods in the velero namespace: kubectl get pods -n velero.
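
A healthy installation shows the Velero pods in the Running state. The output below is only an illustration; pod names, counts, and the presence of node-agent pods depend on your configuration:

```shell
kubectl get pods -n velero
# NAME                      READY   STATUS    RESTARTS   AGE
# velero-7d9c448bc5-abcde   1/1     Running   0          2m
# node-agent-xyz12          1/1     Running   0          2m
```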

Managing Backups

With Velero, you can create single backups and configure backup creation on a schedule. All your backups are stored in your dedicated storage location.

By default, backups expire after 30 days, and Velero deletes all resources associated with expired backups, including the files in your storage location. You can change when backups expire by setting their TTL (time to live)
with the --ttl <duration> flag on the backup creation command. Replace <duration> with the TTL that you want to set, using the format 24h0m0s.
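
For example, the following keeps a backup for three days instead of 30; the backup name de-short-lived is only an illustration:

```shell
velero backup create de-short-lived --ttl 72h0m0s --include-namespaces dash-apps,dash-services,plotly-system --exclude-resources validatingwebhookconfiguration,mutatingwebhookconfiguration,endpointslices
```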

Important: Rotating the nodes in the cluster can sometimes interfere with Velero backups. If your organization periodically rotates nodes, we recommend verifying that scheduled backups still succeed after each node rotation. If you experience errors with individual or scheduled backups after a node rotation, contact us for support.

Creating a Backup

To create a backup for Dash Enterprise:

  1. Verify that your backup location is available:
    velero backup-location get

If PHASE is Available, your storage location is ready for new backups.

If PHASE is Unavailable, review your network settings. If your source cluster is behind a firewall, additional firewall rules may be needed to allow Velero to create backups.
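
The output is similar to the following; the provider and bucket are placeholders for the storage you configured earlier:

```
NAME      PROVIDER   BUCKET/PREFIX   PHASE       LAST VALIDATED                  ACCESS MODE   DEFAULT
default   aws        <bucket-name>   Available   2023-06-16 15:00:00 -0400 EDT   ReadWrite     true
```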

  2. Create the backup:
    velero backup create <backup-name> --include-namespaces dash-apps,dash-services,plotly-system --exclude-resources validatingwebhookconfiguration,mutatingwebhookconfiguration,endpointslices
    where <backup-name> is the name you want to give to this backup.
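
For example, with a hypothetical backup name of de-backup-1, the command and Velero's acknowledgment look similar to:

```shell
velero backup create de-backup-1 --include-namespaces dash-apps,dash-services,plotly-system --exclude-resources validatingwebhookconfiguration,mutatingwebhookconfiguration,endpointslices
# Backup request "de-backup-1" submitted successfully.
# Run `velero backup describe de-backup-1` or `velero backup logs de-backup-1` for more details.
```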

Scheduling Backups

To schedule Dash Enterprise backups:

  1. Verify that your backup location is available:
    velero backup-location get

If PHASE is Available, your storage location is ready for new backups.

If PHASE is Unavailable, review your network settings. If your source cluster is behind a firewall, additional firewall rules may be needed to allow Velero to create backups.

  2. Create the backup schedule:
    velero schedule create <backup-name> --schedule="* * * * *" --include-namespaces dash-apps,dash-services,plotly-system --exclude-resources validatingwebhookconfiguration,mutatingwebhookconfiguration,endpointslices
    where <backup-name> is the name you want to give to the scheduled backups and * * * * * is a cron expression that defines the schedule.
    For example, 0 0 * * * creates a backup every day at midnight.
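
For instance, the following creates a backup every night at midnight and keeps each backup for 90 days; the schedule name de-nightly and the TTL are illustrative:

```shell
velero schedule create de-nightly --schedule="0 0 * * *" --ttl 2160h0m0s --include-namespaces dash-apps,dash-services,plotly-system --exclude-resources validatingwebhookconfiguration,mutatingwebhookconfiguration,endpointslices
```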

For more information, refer to the Velero backup reference.

Velero does not alert you if a scheduled backup fails to create. We recommend setting up monitoring or alerts for your storage location using your cloud provider’s official services.
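
As a lightweight complement to provider-side alerts, you can also check from the CLI for any backup that did not complete; this one-liner is only a sketch:

```shell
# Print backups whose status is anything other than Completed (no output means all backups completed)
velero backup get | grep -vE 'NAME|Completed'
```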

Examining Your Backups

To list your Dash Enterprise backups:
    velero backup get

The output looks similar to:
```
NAME      STATUS      ERRORS   WARNINGS   CREATED                         EXPIRES   STORAGE LOCATION   SELECTOR
backup1   Completed   0        0          2023-04-21 15:39:00 -0400 EDT   24d       default            <none>
backup2   Completed   0        0          2023-04-21 15:47:48 -0400 EDT   24d       default            <none>
backup3   Completed   0        0          2023-04-20 16:02:52 -0400 EDT   23d       default            <none>
```

To view the logs for a specific backup:
    velero backup logs <backup-name>
where <backup-name> is the name of the backup whose logs you want to view.

To view more information about a specific backup:
    velero backup describe <backup-name>
where <backup-name> is the name of the backup you want to inspect.

The output looks similar to:
```
Name: backup4
Namespace: velero
Labels: velero.io/storage-location=default
Annotations: velero.io/source-cluster-k8s-gitversion=v1.24.9
velero.io/source-cluster-k8s-major-version=1
velero.io/source-cluster-k8s-minor-version=24

Phase: Completed

Errors: 0
Warnings: 0

Namespaces:
Included: dash-apps, dash-services, plotly-system
Excluded: <none>

Resources:
Included: *
Excluded: validatingwebhookconfiguration, mutatingwebhookconfiguration, endpointslices
Cluster-scoped: auto

Label selector: <none>

Storage Location: default

Velero-Native Snapshot PVs: auto

TTL: 720h0m0s

Hooks: <none>

Backup Format Version: 1.1.0

Started: 2023-06-16 15:10:15 -0400 EDT
Completed: 2023-06-16 15:11:21 -0400 EDT

Expiration: 2023-07-16 15:10:15 -0400 EDT

Total items to be backed up: 574
Items backed up: 574

Velero-Native Snapshots: 10 of 10 snapshots completed successfully (specify --details for more information)

restic Backups (specify --details for more information):
Completed: 7
```

If the Phase is Completed and there are no errors, it is safe to restore Dash Enterprise using this backup.

When you’re ready to perform a restore, you can choose from any available backup.

Preparing Your Destination Cluster

Dash Enterprise cannot be restored on the same cluster that you backed it up from. In this step, you’ll provision a new cluster,
called the destination cluster, to use for the restore. You’ll install the Velero server components on it so that you can
run the restore command.

The destination cluster has the following requirements:

Provisioning a New Cluster

Installing the Velero Server Components on the Destination Cluster

In this step, you’ll install the Velero server components on your destination cluster and configure it to have access to the
storage location where your backups are stored.

To verify that the Velero installation was successful, check for running pods in the velero namespace: kubectl get pods -n velero.

Airgapped Troubleshooting

If at any point you need to uninstall Velero server components and your destination cluster is airgapped, make sure to run the following command to check that a Velero config map exists before you reinstall Velero:

kubectl -n velero get configmap fs-restore-action-config

If a config map doesn’t exist, run the following commands to create one, and then reinstall Velero.

```shell
kubectl create namespace velero --dry-run=client -o yaml | kubectl apply -f -
cat << EOF | kubectl apply -f -
apiVersion: v1
kind: ConfigMap
metadata:
  name: fs-restore-action-config
  namespace: velero
  labels:
    velero.io/plugin-config: ""
    velero.io/pod-volume-restore: RestoreItemAction
data:
  image: <container-registry>/velero-restore-helper:v1.10.3
EOF
```

where <container-registry> is your private container registry hostname.

Restoring Dash Enterprise on the Destination Cluster

Performing a restore involves DNS changes. Plan your restore accordingly.

Removing Your DNS

Before creating a restore, remove the DNS entry for your Dash Enterprise base domain and wait for the change to propagate. This is to prevent the system from pulling images from your source cluster when restoring apps.

Creating a Restore

To create a restore on the destination cluster:

  1. Change your backup storage to read-only mode (this prevents backup objects from being created or deleted in the backup storage location during the restore process):
    kubectl patch backupstoragelocation default --namespace velero --type merge --patch '{"spec":{"accessMode":"ReadOnly"}}'

  2. Create the restore:
    velero restore create <restore-name> --from-backup <backup-name>
    where <restore-name> is the name you want to give to the restore and <backup-name> is the name of the backup you want to use.

The output is similar to:
```
Restore request "restore1" submitted successfully.
Run velero restore describe restore1 or velero restore logs restore1 for more details.
```

If the Phase is Completed when describing your restore with velero restore describe, the restore was successful. Note that some warnings similar to “could not restore… already exists” and
“the in-cluster version is different than the backed-up version” are expected.

  3. Return your backup storage to read-write mode:
    kubectl patch backupstoragelocation default --namespace velero --type merge --patch '{"spec":{"accessMode":"ReadWrite"}}'
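
If you want to confirm the access mode before or after these patches, you can read it back from the BackupStorageLocation; this check is optional and not part of the documented procedure:

```shell
kubectl get backupstoragelocation default -n velero -o jsonpath='{.spec.accessMode}'
```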

Updating Your DNS

In this step, you’ll re-add your base domain entry using the new load balancer hostname (multi-node). This causes https://<your-dash-enterprise-server> to serve your restored instance of Dash Enterprise.

Rotating Resources and Redeploying

In this step, you’ll reapply metadata that Velero didn’t persist with the restore and rotate parts of the Dash Enterprise core system. Then, you’ll use the KOTS Admin Console to redeploy Dash Enterprise. The KOTS Admin Console is not available at its usual URL
due to changes to the DNS and core system, so you’ll access it via port-forwarding.

To rotate resources and redeploy Dash Enterprise:

  1. Reapply metadata and rotate core system resources:

Note: The commands below use environment variables with Unix syntax. If your workstation uses Windows, run these commands in a terminal like Git Bash or adapt the environment variable syntax.

```shell
# Scale up cert-injection-webhook (if present)
if kubectl get deploy -n cert-injection-webhook | grep -q cert-injection-webhook; then
  kubectl scale deployment -n cert-injection-webhook --replicas=1 --all
fi

# Extract owner reference UIDs
DE_UID=$(kubectl get dashenterprise dash-enterprise -n plotly-system -o jsonpath='{.metadata.uid}')
BUILDSTACK_UID=$(kubectl get buildstack de-buildstack -n plotly-system -o jsonpath='{.metadata.uid}')
HARBOR_UID=$(kubectl get harborcluster de -n plotly-system -o jsonpath='{.metadata.uid}')
REDPANDA_UID=$(kubectl get cluster redpanda -n plotly-system -o jsonpath='{.metadata.uid}')

# Apply owner references
kubectl patch appstack de-appstack -n plotly-system --type merge --patch '{"metadata": {"ownerReferences": [{"apiVersion": "dash.plotly.com/v1alpha1", "kind": "DashEnterprise", "blockOwnerDeletion": true, "controller": true, "name": "dash-enterprise", "uid": "'"$DE_UID"'"}]}}'
kubectl patch buildstack de-buildstack -n plotly-system --type merge --patch '{"metadata": {"finalizers": ["csc.dash.plotly.com/finalizer"], "ownerReferences": [{"apiVersion": "dash.plotly.com/v1alpha1", "kind": "DashEnterprise", "blockOwnerDeletion": true, "controller": true, "name": "dash-enterprise", "uid": "'"$DE_UID"'"}]}}'
kubectl patch gitea gitea-cluster -n plotly-system --type merge --patch '{"metadata": {"ownerReferences": [{"apiVersion": "dash.plotly.com/v1alpha1", "kind": "Buildstack", "blockOwnerDeletion": true, "controller": true, "name": "de-buildstack", "uid": "'"$BUILDSTACK_UID"'"}]}}'
kubectl patch harborcluster de -n plotly-system --type merge --patch '{"metadata": {"ownerReferences": [{"apiVersion": "dash.plotly.com/v1alpha1", "kind": "Buildstack", "blockOwnerDeletion": true, "controller": true, "name": "de-buildstack", "uid": "'"$BUILDSTACK_UID"'"}]}}'
kubectl patch harbor de-harbor -n plotly-system --type merge --patch '{"metadata": {"ownerReferences": [{"apiVersion": "goharbor.io/v1beta1", "kind": "HarborCluster", "blockOwnerDeletion": true, "controller": true, "name": "de", "uid": "'"$HARBOR_UID"'"}]}}'
kubectl patch statefulset redpanda -n plotly-system --type merge --patch '{"metadata": {"ownerReferences": [{"apiVersion": "redpanda.vectorized.io/v1alpha1", "kind": "Cluster", "blockOwnerDeletion": true, "controller": true, "name": "redpanda", "uid": "'"$REDPANDA_UID"'"}]}}'

# Clean up resources
kubectl delete builder dash-app-builder -n plotly-system
kubectl delete appstack de-appstack -n plotly-system
kubectl delete deployment ingress-nginx-controller -n plotly-system
```
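
If you want to spot-check that the owner references were applied before moving on, one option (not part of the documented procedure) is to read one of them back; for example, the buildstack should now reference the DashEnterprise UID captured above:

```shell
# Should print the same value as $DE_UID
kubectl get buildstack de-buildstack -n plotly-system -o jsonpath='{.metadata.ownerReferences[0].uid}'
```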

  2. Port-forward the KOTS Admin Console:
    kubectl port-forward -n plotly-system svc/kotsadm 8800:3000

  3. Go to http://localhost:8800.

  4. Enter the KOTS Admin Console password that was stored as part of your Dash Enterprise installation.

  5. Next to the Dash Enterprise version labelled “Currently deployed version,” select Redeploy.


  6. If your cluster is airgapped, wait for the imageswap pods to be up and running. You can check with the following command. If your cluster is internet-connected, skip to step 8.

    kubectl get deploy imageswap -n imageswap-system

Once the imageswap pods are running, the output looks similar to:

```
NAME       READY   UP-TO-DATE   AVAILABLE   AGE
imageswap  2/2     2            2           3h9m
```

  7. If your cluster is airgapped, rotate Harbor pods by running the following commands in order:

```shell
kubectl -n plotly-system delete pod -l core.goharbor.io/name=de-harbor-harbor-core
kubectl -n plotly-system get pod -l core.goharbor.io/name=de-harbor-harbor-core

kubectl -n plotly-system delete pod -l portal.goharbor.io/name=de-harbor-harbor-portal
kubectl -n plotly-system get pod -l portal.goharbor.io/name=de-harbor-harbor-portal

kubectl -n plotly-system delete pod -l registry.goharbor.io/name=de-harbor-harbor-registry
kubectl -n plotly-system get pod -l registry.goharbor.io/name=de-harbor-harbor-registry

kubectl -n plotly-system delete pod -l jobservice.goharbor.io/name=de-harbor-harbor-jobservice
kubectl -n plotly-system get pod -l jobservice.goharbor.io/name=de-harbor-harbor-jobservice
```

  8. Wait for the status to change to Ready.

Known issue: If the status does not change to Ready after 10 minutes, the dash-app-builder may have been unable to resolve the registry domain name after the DNS was updated.

To check the status of the dash-app-builder, run kubectl get builder -n plotly-system in a new terminal. If READY is False, run kubectl delete builder dash-app-builder -n plotly-system
to trigger a reconnection to the registry.

  9. Go to https://<your-dash-enterprise-server> to access your restored instance of Dash Enterprise.

  10. Verify that your restored instance behaves the way you’d expect before deleting your source cluster. We recommend checking the health of the core system with:
    kubectl get dashenterprise -n plotly-system

If both Appstack and Buildstack are healthy, this indicates that Dash Enterprise is running normally.

A provisioning status indicates that the appstack or buildstack is in the process of provisioning what it needs to be in a healthy state.

A pending status indicates that the appstack or buildstack is waiting for another resource before it can change to a provisioning or healthy state.
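
For illustration only (the columns reported by kubectl get for this resource depend on your Dash Enterprise version), healthy output resembles:

```shell
kubectl get dashenterprise -n plotly-system
# NAME              APPSTACK   BUILDSTACK   ...
# dash-enterprise   healthy    healthy      ...
```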

Updating the credentials-velero File

If the contents of your credentials-velero file become outdated (for example, after key rotation), backups will fail.

To refresh the credentials-velero file and apply it to Dash Enterprise:

  1. Make the necessary change in the credentials-velero file.
  2. From the directory containing your updated credentials-velero file, update the secret on the cluster to use the new values:
    kubectl patch -n velero secret cloud-credentials -p '{"data": {"cloud": "'$(base64 -w 0 credentials-velero)'"}}'
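
To confirm that the secret now matches your updated file, you can decode it and compare it with your local copy; this check is optional and assumes you are still in the directory containing credentials-velero:

```shell
kubectl get secret cloud-credentials -n velero -o jsonpath='{.data.cloud}' | base64 -d | diff - credentials-velero && echo "secret matches credentials-velero"
```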

Uninstalling Velero

We recommend keeping the Velero server components installed, but if you need to uninstall them from your source or destination cluster, use:

kubectl delete namespace/velero clusterrolebinding/velero
kubectl delete crds -l component=velero