There are two ways to scale your app: vertically, by increasing the number of gunicorn workers for your app’s web process, and horizontally, by adding replicas of the web process.
Scaling your app horizontally by adding replicas is more resource-intensive, but has many advantages. You can think of a replica as a copy of your Dash app. More replicas for your app’s web process let more requests and callbacks be handled at the same time. For example, consider the following app, where each graph is updated by a callback that takes two seconds to complete:
```
import datetime
import random
import time

import plotly.express as px
from dash import Input, Output, callback

# Callbacks for an app whose layout contains a button ('btn') and four graphs.


@callback(Output('graph-1', 'figure'), Input('btn', 'n_clicks'))
def update_graph_1(_):
    time.sleep(2)  # This simulates a long-running callback
    now = datetime.datetime.now()
    return px.bar(x=[now - datetime.timedelta(minutes=1), now], y=[random.random(), random.random()])


@callback(Output('graph-2', 'figure'), Input('btn', 'n_clicks'))
def update_graph_2(_):
    time.sleep(2)  # This simulates a long-running callback
    now = datetime.datetime.now()
    return px.bar(x=[now - datetime.timedelta(minutes=1), now], y=[random.random(), random.random()])


@callback(Output('graph-3', 'figure'), Input('btn', 'n_clicks'))
def update_graph_3(_):
    time.sleep(2)  # This simulates a long-running callback
    now = datetime.datetime.now()
    return px.bar(x=[now - datetime.timedelta(minutes=1), now], y=[random.random(), random.random()])


@callback(Output('graph-4', 'figure'), Input('btn', 'n_clicks'))
def update_graph_4(_):
    time.sleep(2)  # This simulates a long-running callback
    now = datetime.datetime.now()
    return px.bar(x=[now - datetime.timedelta(minutes=1), now], y=[random.random(), random.random()])
```
With a single worker on a single replica, these four two-second callbacks are handled one at a time, so triggering them all at once takes roughly eight seconds; with more workers or replicas, they can run in parallel and finish in roughly two seconds.
Replicas from a system perspective
For each new replica, an additional pod is created for the process. New pods are scheduled under the same rules as any other pod on the system, meaning they are not necessarily spread across different nodes (if the cluster has multiple nodes).
Remember that the Kubernetes cluster that Dash Enterprise is installed on has a pod limit. A high number of replicas causes Dash Enterprise to reach the pod limit faster.
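If you are an administrator with access to the Kubernetes cluster, one way to check which nodes an app’s replica pods were scheduled on is with kubectl (this uses the same dash-apps namespace as the commands later on this page):
kubectl get pods -n dash-apps -o wide
The NODE column in the output shows where each pod is running.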
You can manage replicas in the Scale tab of your App Info. Each process can be scaled up to a maximum of 10 replicas.
By default, each process type defined in your Procfile runs as one replica. For example, with the following Procfile, you’ll see a single Web process card in the Scale tab:
web: gunicorn app:server --workers 4
With this Procfile, you’ll see one card for the Web process and one card for the Worker process:
web: gunicorn app:server --workers 4
worker: celery -A app:celery_instance worker
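The worker line above assumes the app module exposes a Celery instance named celery_instance. As a rough, illustrative sketch only (not taken from this page), such an instance might be set up for Dash background callbacks along these lines, assuming a Redis broker reachable through a REDIS_URL environment variable:
```
# Illustrative sketch of the celery_instance referenced by the worker process above.
# Assumes a Redis broker available via the REDIS_URL environment variable.
import os

from celery import Celery
from dash import Dash, CeleryManager

celery_instance = Celery(
    __name__,
    broker=os.environ["REDIS_URL"],
    backend=os.environ["REDIS_URL"],
)

# Background callbacks submitted by the app run in the Celery worker process.
background_callback_manager = CeleryManager(celery_instance)

app = Dash(__name__, background_callback_manager=background_callback_manager)
server = app.server  # served by "gunicorn app:server" in the web line
```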
Scaling for the Web process and non-Web processes works differently:
* Web process: By default, Dash Enterprise automatically adds and removes replicas (or autoscales) according to your app’s traffic. Learn more about customizing this autoscaling behavior in the next section.
* Non-Web processes: Manually set the number of replicas that you want for these processes by updating the replica count in the Scale tab.
A higher replica count increases your app’s overall memory usage. When memory usage on Dash Enterprise reaches 80% of the licensed memory limit, autoscaling and manually editing replica counts for all processes are disabled. If memory usage has risen past 80% because an app is running multiple replicas and you need to reduce that app’s replica count, stop the app first.
See Memory Usage Considerations for more information on scaling and memory.
Autoscaling is the process by which Dash Enterprise adds or removes replicas according to the demand for your app. The higher the demand, the more replicas are added. Demand is measured by:
* The number of users visiting your app. (See also: App Viewer Analytics)
* The number of callbacks executed by app users, especially computationally expensive callbacks.
Autoscaling is only available for your app’s Web process. It leverages the Kubernetes-native HorizontalPodAutoscaler (HPA).
Autoscaling is advantageous both for your app users, who experience reduced loading times, and for the system, which is able to free up memory when demand for your app decreases.
You can define the minimum and maximum replica counts that Dash Enterprise uses when autoscaling your app. For new apps, the minimum replica count is 1 and the maximum is 10 by default.
If autoscaling is not appropriate for your app, you can disable it by setting the same number of replicas for both the minimum and maximum. To update minimum and maximum replica counts, go to the Scale tab and select Edit Resources.
Tip: Autoscaling can take a few seconds or minutes (depending on the app code) to add a new replica when the scale-up condition is met. If you expect traffic on your app to suddenly increase, such as during a large interactive presentation, we recommend increasing the minimum replica count ahead of time. This ensures that users are able to start using your app with little to no wait.
Dash Enterprise does not display the current number of replicas for your app. However, if you are an administrator with access to the Kubernetes cluster that Dash Enterprise is installed on, you can obtain this information with kubectl.
To view the current replicas for an app:
kubectl get -n dash-apps hpa/<app-name>-----web
where <app-name> is the name of the app whose replica count you want to view.
The output is similar to:
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
clinical-trial-----web Deployment/clinical-trial-----web 25%/500% 1 10 1 46s
In the example above, there is currently one replica (REPLICAS column), and autoscaling is triggered when the target value in the TARGETS column is met.
To watch for all app autoscaling on the cluster:
kubectl get hpa -n dash-apps --watch
Dash Enterprise does not autoscale your app below one replica (that is, your app is not automatically stopped when there is no traffic).
You can vertically scale your app by increasing the number of workers for the Web process defined in your Procfile. To do this, modify the value for the --workers flag, and then redeploy your app.
web: gunicorn app:server --workers 4
We recommend 4 workers for most production Dash apps. Note that a very high number of workers can lead to resource thrashing issues at the system level.
See the App Structure page for more details on the Procfile.
By default, each replica and workspace has a memory usage limit of 8 GiB, and administrators can configure this default.
To monitor your app’s overall memory usage, use the Memory Usage information in the Scale tab. This information displays the current memory usage for your app’s combined processes and workspace. The maximum memory represents the sum of the memory limits for your app’s replicas and workspace. For example, with the default 8 GiB memory limit, an app with one web process that is scaled to two replicas and one workspace has an overall maximum memory limit of 24 GiB. Memory usage for managed Redis and Postgres databases is not shown.
Known issue: The value for maximum memory only takes one replica per process type into account.
If your app reaches its memory limit, your app is stopped. You have two options:
* Reduce your app’s memory usage. Strategies include:
  * Adding the --preload flag to the gunicorn command in your Procfile. For example: web: gunicorn app:server --workers 4 --preload. Do not use the --preload flag if you are using shared database connection pools (see Database Connections). For more information on preloading, refer to the Gunicorn docs.
  * Reducing the number of workers defined in your Procfile.
  * Using a file type for your dataset that supports memory mapping, like Arrow, and reading it with libraries like Vaex (see the sketch after this list).
  * Performing data queries and aggregations on a database layer instead of in memory with Python.
* Ask your administrator to increase your app’s memory limit. If they set a custom memory limit, your app is restarted. If your app uses the default memory limit and your administrator opts to instead increase this default, restart your app using the Start button in the Overview tab.
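As a minimal, illustrative sketch of the memory-mapping strategy above (the file name dataset.arrow and the column name value are placeholders, and pyarrow is used here as one example of an Arrow reader):
```
import pyarrow as pa
import pyarrow.ipc

# Illustrative only: open an Arrow IPC file through a memory map so the OS pages
# data in on demand instead of loading the whole file into each replica's memory.
source = pa.memory_map("dataset.arrow", "r")   # placeholder file name
table = pa.ipc.open_file(source).read_all()    # buffers reference the mapped file

# Data is brought into RAM only when it is materialized, for example when a
# column is converted for plotting:
y_values = table.column("value").to_pandas()   # placeholder column name
```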