Persistent Filesystem

Dash Enterprise’s persistent filesystem feature provides fast, persistent storage for the data files used by your apps and workspaces.

In many cases, it is faster to store a copy of your data close to your app than to download it on the fly
within your app code. When these datasets are large, it isn’t always possible to store them in Git or in memory.

Dash Enterprise’s persistent filesystem allows you to store these datasets in a filesystem that is shared
between your workspace and your deployed app (including every process declared in your Procfile).

Use a persistent filesystem to store:
- Large datasets that are too big to store in Git or hold in memory, such as HDF5 or Parquet files analyzed with Vaex (a file-based, out-of-memory dataframe library).
- Cached files that persist across deploys (see the caching sketch after this list).
- Dynamic datasets that are updated by periodic background processes and read by your app.
- Manually updated datasets that you update in your workspace without redeploying your app.
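
If you use a caching library, you can point a filesystem cache at the persistent filesystem so that cached results survive deploys. Below is a minimal sketch using Flask-Caching (an extra dependency you’d add to requirements.txt yourself); the ../mount/cache folder and expensive_query function are hypothetical names:

from dash import Dash
from flask_caching import Cache

app = Dash(__name__)

# Store cached results in the persistent filesystem so they survive deploys
cache = Cache(app.server, config={
    "CACHE_TYPE": "FileSystemCache",  # Flask-Caching 2.x filesystem backend
    "CACHE_DIR": "../mount/cache",    # folder inside the persistent filesystem
})

@cache.memoize(timeout=3600)
def expensive_query(param):
    # Replace with your real data-loading logic
    ...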

Each app and its workspace share a persistent filesystem. The persistent filesystem is retained even if you delete the workspace.

In summary, files stored in the persistent filesystem folder are:
- Shared between the deployed app and the workspace
- Shared between every container process listed in the Procfile
- Not version controlled in Git
- Located at ~/mount or at ../mount relative to the app folder
- Limited to 25 GB in total
- Unique to the app and workspace. Not shared with other apps or workspaces.
- Persisted to a persistent volume internally

This page describes how to enable and disable the persistent filesystem using the App Manager, but you can also use the Dash Enterprise CLI.

Enabling a Persistent Filesystem

To enable a persistent filesystem:

  1. Find and select the app in the App Manager.

  2. Go to the Persistent Filesystem tab.

  3. Select Edit Persistent Filesystem.

  4. Select Enabled and then Save.

The persistent filesystem becomes available to your app and its workspace.
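
To confirm that the folder is mounted, you can list it from a Python console in the workspace (a newly enabled filesystem starts with a README.md file):

$ python
>>> import os
>>> os.listdir(os.path.expanduser('~/mount'))
['README.md']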

Disabling a Persistent Filesystem

When you disable a persistent filesystem, all files stored in it are removed and are not recoverable.

To disable a persistent filesystem:

  1. Find and select the app in the App Manager.

  2. Go to the Persistent Filesystem tab.

  3. Select Edit Persistent Filesystem.

  4. Select Disabled and then Save.

Adding Files to a Persistent Filesystem

Files can be added to a persistent filesystem manually within the workspace, programmatically on deploy, or from a background task in your app code.

Manually Creating Files in the Workspace UI

To add files to the persistent filesystem in the workspace UI:

  1. Select the persistent filesystem from the menu.

  2. Drag files from your device to the file explorer (you’ll see a README.md file already there).

You can also add files by going to File > Upload Files… and selecting files to upload.

Manually Creating Files in the Workspace Terminal or Python Console

The persistent filesystem is available at ../mount.

To add files to the persistent filesystem from the workspace terminal, you can run commands like curl
to retrieve data from a URL and save it as a CSV file in the persistent filesystem:

curl -o ../mount/1962_2006_walmart_store_openings.csv https://raw.githubusercontent.com/plotly/datasets/master/1962_2006_walmart_store_openings.csv

You can also open a Python or IPython console within the workspace and write code that downloads data and writes it to a file:

$ python
>>> import pandas as pd
>>> pd.DataFrame({'a': [1, 2, 3], 'b': [3, 1, 2]}).to_csv('../mount/data.csv')

Or run a script or Jupyter notebook within the workspace.

Updating Files on Deploy

Instead of updating files manually within the workspace, you can update these files programmatically on deploy.

On App Boot

In your app code, your app can read files from or write files to the ../mount folder when it boots:

app.py

from dash import Dash, html
import pandas as pd

# Write and/or read files from mount on app start
pd.DataFrame({'a': [1, 2, 3], 'b': [3, 1, 2]}).to_csv('../mount/data.csv')
df = pd.read_csv('../mount/data.csv')

app = Dash(__name__)

# ...

In Predeploy Script

Alternatively, you can run a Bash script before the web command starts by creating
a predeploy script that contains the commands to fetch the data
and referencing that script in a project.toml file.

Here, we add the command we ran in the terminal to a file fetchdata.sh, and then reference that file
under predeploy in our project.toml file. The system runs fetchdata.sh as a Bash script
before our app is deployed.

fetchdata.sh

curl -o ../mount/gapminderDataFiveYear.csv https://raw.githubusercontent.com/plotly/datasets/master/gapminderDataFiveYear.csv

project.toml

[scripts]
predeploy = "fetchdata.sh"

Adding or Updating Files While Your App Is Running

Update files while the app is running if your datasets change over time.

To add or update a file when an app is running:

  1. Write a script that creates or updates a file in the persistent filesystem.

  2. Add the script in your Procfile.

Here, we write data periodically to a CSV file in the persistent filesystem in task.py.
The app, app.py, reads the data when an app user selects the Get Data button.
The Procfile has a line worker: python task.py that runs the task.py script in the background.

app.py

from dash import Dash, html, dcc, Input, Output, callback
import plotly.express as px
import pandas as pd
from pathlib import Path

app = Dash(__name__)
server = app.server

app.layout = html.Div(
    children=[
        html.Button("Get Data", id="get-data", n_clicks=0),
        dcc.Graph(
            id="graph",
        )
    ]
)

@callback(
    Output("graph", "figure"),
    Input("get-data", "n_clicks"),
)
def update_output(n_clicks):
    data_path = Path.cwd().parent / 'mount' / 'data.csv'
    data = pd.read_csv(data_path)
    figure = px.scatter(data, x="x", y="y")
    return figure


if __name__ == "__main__":
    app.run(debug=True)

task.py

# Writes random numbers to data.csv every 2 seconds

import random
import time
from pathlib import Path

data_path = Path.cwd().parent / 'mount' / 'data.csv'

# Write the CSV header once, then append a new row every 2 seconds
with open(data_path, 'w') as f:
    f.write('x,y\n')

while True:
    with open(data_path, 'a') as f:
        x = random.randint(0, 9)
        y = random.randint(0, 100)
        f.write(f'{x},{y}\n')
    time.sleep(2)

Procfile

web: gunicorn app:server --workers 4
worker: python task.py

Note that your app code needs to read the file on the fly, in callback functions or in a layout function,
whenever it uses the data. If you instead load the data once at app boot, it is only read into memory
when the app starts, and your app won’t see the updated file until it is restarted or redeployed.
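
For example, assigning a function (rather than a static layout) to app.layout makes Dash call that function on every page load, so the file is re-read each time someone opens the app. A minimal sketch, reusing the data.csv file and columns from the example above:

from dash import Dash, html, dcc
import plotly.express as px
import pandas as pd
from pathlib import Path

app = Dash(__name__)
server = app.server

def serve_layout():
    # Re-read the file on every page load so rows written by task.py appear
    data = pd.read_csv(Path.cwd().parent / 'mount' / 'data.csv')
    return html.Div([dcc.Graph(figure=px.scatter(data, x="x", y="y"))])

# Assign the function itself (no parentheses) so Dash calls it on each page load
app.layout = serve_layout

if __name__ == "__main__":
    app.run(debug=True)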

Accessing Files in Your App Code

The persistent filesystem is available at ~/mount or ../mount relative to your app folder.

To access a file in the persistent filesystem, refer to it by path the same way you would any other file in Python.

These are all valid ways to read files from the persistent filesystem:

import pandas as pd

df = pd.read_csv('../mount/gapminderDataFiveYear.csv')

or using pathlib to construct the path of the file:

from pathlib import Path
import pandas as pd

gapminder_path = Path.cwd().parent / 'mount' / 'gapminderDataFiveYear.csv'

df = pd.read_csv(gapminder_path)

Vaex

If your dataset is too large to store in Git, it may also be too large to fit in memory.

In this case, the Vaex library can be a good alternative to pandas.

Store your datasets as HDF5 files in the ../mount folder and use vaex instead of pandas
to process the data. Vaex processes the data “row by row” instead of reading the file
all at once into memory.
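
A minimal sketch, assuming you have already converted your data to an HDF5 file in the persistent filesystem (the file name and column names below are placeholders; Vaex can also convert a CSV with vaex.from_csv(path, convert=True)):

import vaex

# Memory-maps the HDF5 file instead of loading it into RAM
df = vaex.open('../mount/data.hdf5')

# Filters and aggregations are evaluated lazily, out of core
subset = df[df.x > 0]
print(subset.y.mean())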

Local Development

The mount folder is not meant to be version controlled and so should not be part of
your app project folder. To run your app locally, you will need to
mimic the mount folder structure of Dash Enterprise by creating a mount folder
that is a sibling to your app project folder.

Without a ../mount folder, your project code might look like this:

└── my-project
    ├── Procfile
    ├── app.py
    ├── data.csv
    └── requirements.txt

where the root of my-project is where you run git commands (the entire folder is version controlled).

To use the ../mount folder, move the contents of my-project into a folder called app/ that
is on the same level as the new mount/ folder:

└── my-project
    ├── app
    │   ├── Procfile
    │   ├── app.py
    │   └── requirements.txt
    └── mount
        └── data.csv

With this structure, you run git commands in app/, ensuring that the app files
remain version controlled but mount/ is not.

Alternatively, you can keep the my-project/ folder name and create a new parent folder like
my-parent-project:

└── my-parent-project
    ├── my-project
    |   ├── Procfile
    |   ├── app.py
    |   └── requirements.txt
    └── mount
        └── data.csv

With this structure, you run git commands in my-project/, ensuring that the app files
remain version controlled but mount/ is not.

The folder containing the app code (app/ in the first example, my-project/ in the second example)
isn’t referenced in your code, so you can name it anything.

The mount/ folder’s name is referenced in code, so it must be called mount.
Its position is also referenced (app.py accesses it at ../mount), so it must always be a sibling of your app folder.
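
When developing locally, the ../mount path is resolved relative to your working directory, so run your app from inside the app folder. Alternatively, here is a minimal sketch of one way to build the path relative to the code file instead (assuming app.py sits directly inside the app folder; this differs from the Path.cwd() approach used above):

from pathlib import Path

# Resolve the mount folder relative to this file rather than the working
# directory, so the same code works locally and on Dash Enterprise
MOUNT_DIR = Path(__file__).resolve().parent.parent / 'mount'

data_path = MOUNT_DIR / 'data.csv'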

App Code vs Persistent Filesystem Files

The files in the app folder that you commit and push to Dash Enterprise are not persisted between deploys.
On each deploy, Dash Enterprise creates a new Docker image using the most recent commit’s files
and discards the previous image, so any changes made to the app folder while the previous deploy was running are lost.

The files in the persistent filesystem, by contrast, are not version controlled (you only run git commands against
your app folder) and therefore persist across app deployments.

Storage Class

The persistent filesystem is created using a persistent volume.

This persistent volume is provisioned on the cloud provider that Dash Enterprise runs on (AWS, Azure, or GCP).

The persistent volume uses the default storage class of the cloud provider:
- AWS: EBS gp2 general purpose solid state drive (SSD)
- Azure: Azure Disk Standard solid state drive (SSD)
- GCP: pd-standard hard disk (HDD)

Limitations