DataTable Filtering

As discussed in the interactivity chapter,
DataTable includes filtering capabilities.
Set filter_action='native' for clientside (front-end) filtering
or filter_action='custom' to perform your own filtering in Python.

filter_action='native' will work well up to 10,000-100,000 rows.
After which, you may want to use filter_action='custom' so that your
app sends less data over the network into the browser.

from dash import Dash, dash_table
import datetime

import pandas as pd


df = pd.read_csv('https://raw.githubusercontent.com/plotly/datasets/master/gapminder2007.csv')
df = df[['continent', 'country', 'pop', 'lifeExp']]  # prune columns for example
df['Mock Date'] = [
    datetime.datetime(2020, 1, 1, 0, 0, 0) + i * datetime.timedelta(hours=13)
    for i in range(len(df))
]

app = Dash()

app.layout = dash_table.DataTable(
    columns=[
        {'name': 'Continent', 'id': 'continent', 'type': 'numeric'},
        {'name': 'Country', 'id': 'country', 'type': 'text'},
        {'name': 'Population', 'id': 'pop', 'type': 'numeric'},
        {'name': 'Life Expectancy', 'id': 'lifeExp', 'type': 'numeric'},
        {'name': 'Mock Dates', 'id': 'Mock Date', 'type': 'datetime'}
    ],
    data=df.to_dict('records'),
    filter_action='native',

    style_table={
        'height': 400,
    },
    style_data={
        'width': '150px', 'minWidth': '150px', 'maxWidth': '150px',
        'overflow': 'hidden',
        'textOverflow': 'ellipsis',
    }
)


if __name__ == '__main__':
    app.run(debug=True)

Notes:
- As above, we recommend fixing column widths with filtering. Otherwise, the column
widths will grow or shrink depending on how wide the data is within the columns.
- There is a bug with fixed_rows that prevents horizontal scroll
when no filter results are returned. Until this bug is fixed, we recommend
avoiding fixed_rows. For updates, see plotly/dash-table#746
- The default filtering behavior will depend on the data type of the column (see below).
Data types are not inferred, so you have to set them manually.

The example below determines the datatype of the column automatically with Pandas:

from dash import Dash, dash_table
import datetime
import sys
import pandas as pd


df = pd.read_csv('https://raw.githubusercontent.com/plotly/datasets/master/gapminder2007.csv')
df = df[['continent', 'country', 'pop', 'lifeExp']]  # prune columns for example
df['Mock Date'] = [
    datetime.datetime(2020, 1, 1, 0, 0, 0) + i * datetime.timedelta(hours=13)
    for i in range(len(df))
]

app = Dash()

def table_type(df_column):
    # Note - this only works with Pandas >= 1.0.0

    if sys.version_info < (3, 0):  # Pandas 1.0.0 does not support Python 2
        return 'any'

    if isinstance(df_column.dtype, pd.DatetimeTZDtype):
        return 'datetime',
    elif (isinstance(df_column.dtype, pd.StringDtype) or
            isinstance(df_column.dtype, pd.BooleanDtype) or
            isinstance(df_column.dtype, pd.CategoricalDtype) or
            isinstance(df_column.dtype, pd.PeriodDtype)):
        return 'text'
    elif (isinstance(df_column.dtype, pd.SparseDtype) or
            isinstance(df_column.dtype, pd.IntervalDtype) or
            isinstance(df_column.dtype, pd.Int8Dtype) or
            isinstance(df_column.dtype, pd.Int16Dtype) or
            isinstance(df_column.dtype, pd.Int32Dtype) or
            isinstance(df_column.dtype, pd.Int64Dtype)):
        return 'numeric'
    else:
        return 'any'

app.layout = dash_table.DataTable(
    columns=[
        {'name': i, 'id': i, 'type': table_type(df[i])} for i in df.columns
    ],
    data=df.to_dict('records'),
    filter_action='native',

    css=[{
        'selector': 'table',
        'rule': 'table-layout: fixed'  # note - this does not work with fixed_rows
    }],
    style_table={'height': 400},
    style_data={
        'width': '{}%'.format(100. / len(df.columns)),
        'textOverflow': 'hidden'
    }
)


if __name__ == '__main__':
    app.run(debug=True)

Filtering Operators

The filtering syntax is data-type specific.
Data types are not inferred, they must be set manually.
If a type is not specified, then we assume it is a string (text).

Text & String Filtering

United
= United
United States
"United States"
= United States
= "United States"
contains United
> United
>= United
< United
<= United

By default, the columns with the “text” type use the
contains operator. So, searching United is the same as
contains United

For legacy purposes, eq can also be substituted for =.

>, >=, <, and <= compare strings in dictionary order,
with numbers and most symbols coming before letters,
and uppercase coming before lowercase.

If you have quotes in the string, you can use a different quote, or
escape the quote character. So eq 'Say "Yes!"' and
="Say \\"Yes!\\"" are the same.

Numeric Filtering

43.828
= 43.828
> 43.828
>= 43.828
< 43.828
<= 43.828

By default, columns with the numeric type use the = operator.
So, searching 43.828 is the same as = 43.828.

Datetime Filtering

2020
2020-01
2020-01-01
2020-01-01 04:01
2020-01-01 04:01:10
datestartswith 2020
datestartswith 2020-01
datestartswith 2020-01-01
datestartswith 2020-01-01 04:01
datestartswith 2020-01-01 04:01:10
> 2020-01
> 2020-01-20
>= 2020-01
>= 2020-01-20
< 2020-01
< 2020-01-20
<= 2020-01
<= 2020-01-20

Operators

Many operators have two forms: a symbol (=) and a word (eq) that
can be used interchangeably.

<table>
<tr>
<td><h4><p><code>=<code> <code>eq<code><p><h4>
Default operator for <code>number<code> columns<td>
<td>Are the two numbers equal? Regardless of type, will first try to
convert both sides to numbers and compare the numbers. If either
cannot be converted to a number, looks for an exact match.<td>
<tr>
<tr>
<td><h4><p><code>contains<code><p><h4>
Default operator for <code>text<code> and <code>any<code> columns<td>
<td>Does the text value contain the requested substring?
May match the beginning, end, or anywhere in the middle. The match
is case-sensitive and exact.<td>
<tr>
<tr>
<td><h4><p><code>datestartswith<code><p><h4>
Default operator for <code>datetime<code> columns’<td>
<td>Does the datetime start with the given parts? Enter a partial
datetime, this will match any date that has at least as much
precision and starts with the same pieces. For example,
<code>datestartswith ‘2018-03-01’<code> will match <code>‘2018-03-01 12:59’<code> but
not <code>‘2018-03’<code> even though we interpret <code>‘2018-03-01’<code> and
<code>‘2018-03’<code> both to mean the first instant of March, 2018.<td>
<tr>
<tr>
<td><h4><p><code>><code> <code>gt<code> <code><<code> <code>lt<code><br>
<code>>=<code> <code>ge<code> <code><=<code> <code>le<code><br>
<code>!=<code> <code>ne<code><p><h4><td>
<td>Comparison: greater than, less than, greater or equal, less or
equal, and not equal. Two strings compare by their dictionary
order, with numbers and most symbols coming before letters, and
uppercase coming before lowercase.<td>
<tr>
<table>

Back-end Filtering

For large dataframes, you can perform the filtering in Python instead
of the default clientside filtering. You can find more information on
performing operations in python in the
Python Callbacks chapter.

The syntax is (now) the same as front-end filtering, but it’s up to the
developer to implement the logic to apply these filters on the Python
side.
In the future we may accept any filter strings, to allow you to
write your own expression query language.

Example:

from dash import Dash, dash_table, Input, Output, callback
import pandas as pd


app = Dash()

df = pd.read_csv('https://raw.githubusercontent.com/plotly/datasets/master/gapminder2007.csv')

app.layout = dash_table.DataTable(
    id='table-filtering-be',
    columns=[
        {"name": i, "id": i} for i in sorted(df.columns)
    ],

    filter_action='custom',
    filter_query=''
)

operators = [['ge ', '>='],
             ['le ', '<='],
             ['lt ', '<'],
             ['gt ', '>'],
             ['ne ', '!='],
             ['eq ', '='],
             ['contains '],
             ['datestartswith ']]


def split_filter_part(filter_part):
    for operator_type in operators:
        for operator in operator_type:
            if operator in filter_part:
                name_part, value_part = filter_part.split(operator, 1)
                name = name_part[name_part.find('{') + 1: name_part.rfind('}')]

                value_part = value_part.strip()
                v0 = value_part[0]
                if (v0 == value_part[-1] and v0 in ("'", '"', '`')):
                    value = value_part[1: -1].replace('\\' + v0, v0)
                else:
                    try:
                        value = float(value_part)
                    except ValueError:
                        value = value_part

                # word operators need spaces after them in the filter string,
                # but we don't want these later
                return name, operator_type[0].strip(), value

    return [None] * 3


@callback(
    Output('table-filtering-be', "data"),
    Input('table-filtering-be', "filter_query"))
def update_table(filter):
    filtering_expressions = filter.split(' && ')
    dff = df
    for filter_part in filtering_expressions:
        col_name, operator, filter_value = split_filter_part(filter_part)

        if operator in ('eq', 'ne', 'lt', 'le', 'gt', 'ge'):
            # these operators match pandas series operator method names
            dff = dff.loc[getattr(dff[col_name], operator)(filter_value)]
        elif operator == 'contains':
            dff = dff.loc[dff[col_name].str.contains(filter_value)]
        elif operator == 'datestartswith':
            # this is a simplification of the front-end filtering logic,
            # only works with complete fields in standard format
            dff = dff.loc[dff[col_name].str.startswith(filter_value)]

    return dff.to_dict('records')


if __name__ == '__main__':
    app.run(debug=True)

Advanced filter usage

Filter queries can be as simple or as complicated as you want
them to be. When something is typed into a column filter, it
is automatically converted to a filter query on that column
only.

from dash import Dash, dcc, html, Input, Output, dash_table, callback
import pandas as pd
import json

df = pd.read_csv('https://raw.githubusercontent.com/plotly/datasets/master/gapminder2007.csv')

df['id'] = df['country']
df.set_index('id', inplace=True, drop=False)

app = Dash()

app.layout = html.Div([

    dcc.RadioItems(
        [{'label': 'Read filter_query', 'value': 'read'}, {'label': 'Write to filter_query', 'value': 'write'}],
        'read',
        id='filter-query-read-write',
    ),

    html.Br(),

    dcc.Input(id='filter-query-input', placeholder='Enter filter query'),

    html.Div(id='filter-query-output'),

    html.Hr(),

    dash_table.DataTable(
        id='datatable-advanced-filtering',
        columns=[
            {'name': i, 'id': i, 'deletable': True} for i in df.columns
            # omit the id column
            if i != 'id'
        ],
        data=df.to_dict('records'),
        editable=True,
        page_action='native',
        page_size=10,
        filter_action="native"
    ),
    html.Hr(),
    html.Div(id='datatable-query-structure', style={'whitespace': 'pre'})
])


@callback(
    Output('filter-query-input', 'style'),
    Output('filter-query-output', 'style'),
    Input('filter-query-read-write', 'value')
)
def query_input_output(val):
    input_style = {'width': '100%'}
    output_style = {}
    if val == 'read':
        input_style.update(display='none')
        output_style.update(display='inline-block')
    else:
        input_style.update(display='inline-block')
        output_style.update(display='none')
    return input_style, output_style


@callback(
    Output('datatable-advanced-filtering', 'filter_query'),
    Input('filter-query-input', 'value')
)
def write_query(query):
    if query is None:
        return ''
    return query


@callback(
    Output('filter-query-output', 'children'),
    Input('datatable-advanced-filtering', 'filter_query')
)
def read_query(query):
    if query is None:
        return "No filter query"
    return dcc.Markdown('`filter_query = "{}"`'.format(query))


@callback(
    Output('datatable-query-structure', 'children'),
    Input('datatable-advanced-filtering', 'derived_filter_query_structure')
)
def display_query(query):
    if query is None:
        return ''
    return html.Details([
        html.Summary('Derived filter query structure'),
        html.Div(dcc.Markdown('''```json
{}
```'''.format(json.dumps(query, indent=4))))
    ])


if __name__ == '__main__':
    app.run(debug=True)

The filter_query property is written to when the user
filters the data by using the column filters. For example, if
a user types ge 100000000 in the pop column filter, and
Asia in the continent column filter, filter_query will
look like this:

{pop} ge 100000000 && {continent} contains "Asia"

Try typing those values into the column filters in the app
above, and ensure that the “Read filter_query” option is
selected.

The filter_query property can also be written to. This might
be useful when performing more complex filtering,
like if you want to filter a column based on two (or more)
conditions. For instance, say that we want countries with a
population greater than 100 million, but less than 500
million. Then our filter_query would be as follows:

{pop} ge 100000000 and {pop} le 500000000

Select the “Write to filter_query” option in the app above,
and try it out by copying and pasting the filter query above
into the input box.

Say that we now want to get a bit more advanced, and
cross-filter between columns; for instance, we only want the
results that are located in Asia. Now, our filter query
becomes:

{pop} ge 100000000 and {pop} le 500000000 and {continent} eq "Asia"

We can make the expression even more complex. For example,
let’s say we want all of those countries with the populations
that fall within our boundaries and that are in Asia, but for
some reason we also want to include Singapore. This results in
a filter query that is a little more long-winded:

(({pop} ge 100000000 and {pop} le 500000000) or {country} eq "Singapore") and {continent} eq "Asia"

Note that we’ve grouped expressions together using
parentheses. This is part of the filtering syntax. Just as is
true in mathematical expressions, the expressions in the
innermost parentheses are evaluated first.

Symbol-based versus letter-based operators

An important thing to notice is that the two types of
relational operators that can be used in the column filters
(symbol-based, like >=, and letter-based, like ge) are not
converted into one another when filter_query is being
constructed from the values in the column filters. Therefore,
if using filter_query to implement backend filtering, it’s
necessary to take both of these forms of the
“greater-than-or-equal-to” operator into account when parsing
the query string (or ensure that the user only uses the ones
that the backend can parse).

However, in the case of the logical operator and/&&, when
the table is constructing the query string, the symbol-based
representation will always be used.

Derived filter query structure

The derived_filter_query_structure prop is a dictionary
representation of the query syntax tree. You can use the value
of this property to implement backend filtering.

For a query that describes a relationship between two values,
there are three components: the operation, the left-hand side,
and the right-hand side. For instance, take the following
query:

{pop} ge 100000000

The operation here is ge (i.e., >=), the left-hand side is
the field pop (corresponding to the column pop), and the
right-hand side is the value 100000000. As the queries
become increasingly complex, so do the query structures. Try
it out by expanding the “Derived filter query structure” in
the example app above.

Note that for all operators, there are two keys subType and
value that correspond to, respectively, the symbol-based
representation and the originally inputted representation of
the operator. So, in the case of the query above, subType
will be >= and value will be ge; if our query string
were {pop} >= 100000000 instead, both subType and value
will be >=.

Backend filtering with `pandas` and `derived_filter_query_structure`

It’s likely that your data are already in a pandas
dataframe. Using the derived_filter_query_structure in
conjunction with pandas filters can enable you to do some
pretty heavy lifting with the table! You can see an example of
this below.

from dash import Dash, Input, Output, callback
from dash.dash_table import DataTable

import json
import pandas as pd

types = {
    'id': 'numeric',
    'Complaint_ID': 'numeric',
    'ZIP_code': 'numeric',
    'Date_received': 'datetime',
    'Date_sent_to_company': 'datetime',
}

df = pd.read_csv('https://github.com/plotly/datasets/raw/master/26k-consumer-complaints.csv')
df['id'] = df['Unnamed: 0']
df = df.drop(['Unnamed: 0'], axis=1)
df = df.reindex(columns=['id']+df.columns[:-1].tolist())
df.columns = [column.replace(" ", "_") for column in df.columns]
df.columns = [column.replace("-", "_") for column in df.columns]

app = Dash()
app.scripts.config.serve_locally = True

app.layout = DataTable(
    id='demo-table',
    data=df.to_dict('records'),
    columns=[{ 'id': i, 'name': i, 'type': types.get(i, 'any')} for i in df.columns],
    filter_action='custom',
    page_action='native',
    page_size=15,
    virtualization=True,
    style_cell={
        'min-width': '100px'
    },
    css=[
        { 'selector': '.row-1', 'rule': 'min-height: 500px;' }
    ]
)


def to_string(filter):
    operator_type = filter.get('type')
    operator_subtype = filter.get('subType')

    if operator_type == 'relational-operator':
        if operator_subtype == '=':
            return '=='
        else:
            return operator_subtype
    elif operator_type == 'logical-operator':
        if operator_subtype == '&&':
            return '&'
        else:
            return '|'
    elif operator_type == 'expression' and operator_subtype == 'value' and type(filter.get('value')) == str:
        return '"{}"'.format(filter.get('value'))
    else:
        return filter.get('value')


def construct_filter(derived_query_structure, df, complexOperator=None):

    # there is no query; return an empty filter string and the
    # original dataframe
    if derived_query_structure is None:
        return ('', df)

    # the operator typed in by the user; can be both word-based or
    # symbol-based
    operator_type = derived_query_structure.get('type')

    # the symbol-based representation of the operator
    operator_subtype = derived_query_structure.get('subType')

    # the LHS and RHS of the query, which are both queries themselves
    left = derived_query_structure.get('left', None)
    right = derived_query_structure.get('right', None)

    # the base case
    if left is None and right is None:
        return (to_string(derived_query_structure), df)

    # recursively apply the filter on the LHS of the query to the
    # dataframe to generate a new dataframe
    (left_query, left_df) = construct_filter(left, df)

    # apply the filter on the RHS of the query to this new dataframe
    (right_query, right_df) = construct_filter(right, left_df)

    # 'datestartswith' and 'contains' can't be used within a pandas
    # filter string, so we have to do this filtering ourselves
    if complexOperator is not None:
        right_query = right.get('value')
        # perform the filtering to generate a new dataframe
        if complexOperator == 'datestartswith':
            return ('', right_df[right_df[left_query].astype(str).str.startswith(right_query)])
        elif complexOperator == 'contains':
            return ('', right_df[right_df[left_query].astype(str).str.contains(right_query)])

    if operator_type == 'relational-operator' and operator_subtype in ['contains', 'datestartswith']:
        return construct_filter(derived_query_structure, df, complexOperator=operator_subtype)

    # construct the query string; return it and the filtered dataframe
    return ('{} {} {}'.format(
        left_query,
        to_string(derived_query_structure) if left_query != '' and right_query != '' else '',
        right_query
    ).strip(), right_df)


@callback(
    Output("demo-table", "data"),
    Input("demo-table", "derived_filter_query_structure")
)
def onFilterUpdate(derived_query_structure):
    (pd_query_string, df_filtered) = construct_filter(derived_query_structure, df)

    if pd_query_string != '':
        df_filtered = df_filtered.query(pd_query_string)

    return df_filtered.to_dict('records')


if __name__ == "__main__":
    app.run(debug=True)