As discussed in the interactivity chapter,
DataTable
includes filtering capabilities.
Set filter_action='native'
for clientside (front-end) filtering
or filter_action='custom'
to perform your own filtering in Python.
filter_action='native'
will work well up to 10,000-100,000 rows.
After which, you may want to use filter_action='custom'
so that your
app sends less data over the network into the browser.
from dash import Dash, dash_table
import datetime
import pandas as pd
df = pd.read_csv('https://raw.githubusercontent.com/plotly/datasets/master/gapminder2007.csv')
df = df[['continent', 'country', 'pop', 'lifeExp']] # prune columns for example
df['Mock Date'] = [
datetime.datetime(2020, 1, 1, 0, 0, 0) + i * datetime.timedelta(hours=13)
for i in range(len(df))
]
app = Dash(__name__)
app.layout = dash_table.DataTable(
columns=[
{'name': 'Continent', 'id': 'continent', 'type': 'numeric'},
{'name': 'Country', 'id': 'country', 'type': 'text'},
{'name': 'Population', 'id': 'pop', 'type': 'numeric'},
{'name': 'Life Expectancy', 'id': 'lifeExp', 'type': 'numeric'},
{'name': 'Mock Dates', 'id': 'Mock Date', 'type': 'datetime'}
],
data=df.to_dict('records'),
filter_action='native',
style_table={
'height': 400,
},
style_data={
'width': '150px', 'minWidth': '150px', 'maxWidth': '150px',
'overflow': 'hidden',
'textOverflow': 'ellipsis',
}
)
if __name__ == '__main__':
app.run(debug=True)
Notes:
- As above, we recommend fixing column widths with filtering. Otherwise, the column
widths will grow or shrink depending on how wide the data is within the columns.
- There is a bug with fixed_rows
that prevents horizontal scroll
when no filter results are returned. Until this bug is fixed, we recommend
avoiding fixed_rows
. For updates, see plotly/dash-table#746
- The default filtering behavior will depend on the data type of the column (see below).
Data types are not inferred, so you have to set them manually.
The example below determines the datatype of the column automatically with Pandas:
from dash import Dash, dash_table
import datetime
import sys
import pandas as pd
df = pd.read_csv('https://raw.githubusercontent.com/plotly/datasets/master/gapminder2007.csv')
df = df[['continent', 'country', 'pop', 'lifeExp']] # prune columns for example
df['Mock Date'] = [
datetime.datetime(2020, 1, 1, 0, 0, 0) + i * datetime.timedelta(hours=13)
for i in range(len(df))
]
app = Dash(__name__)
def table_type(df_column):
# Note - this only works with Pandas >= 1.0.0
if sys.version_info < (3, 0): # Pandas 1.0.0 does not support Python 2
return 'any'
if isinstance(df_column.dtype, pd.DatetimeTZDtype):
return 'datetime',
elif (isinstance(df_column.dtype, pd.StringDtype) or
isinstance(df_column.dtype, pd.BooleanDtype) or
isinstance(df_column.dtype, pd.CategoricalDtype) or
isinstance(df_column.dtype, pd.PeriodDtype)):
return 'text'
elif (isinstance(df_column.dtype, pd.SparseDtype) or
isinstance(df_column.dtype, pd.IntervalDtype) or
isinstance(df_column.dtype, pd.Int8Dtype) or
isinstance(df_column.dtype, pd.Int16Dtype) or
isinstance(df_column.dtype, pd.Int32Dtype) or
isinstance(df_column.dtype, pd.Int64Dtype)):
return 'numeric'
else:
return 'any'
app.layout = dash_table.DataTable(
columns=[
{'name': i, 'id': i, 'type': table_type(df[i])} for i in df.columns
],
data=df.to_dict('records'),
filter_action='native',
css=[{
'selector': 'table',
'rule': 'table-layout: fixed' # note - this does not work with fixed_rows
}],
style_table={'height': 400},
style_data={
'width': '{}%'.format(100. / len(df.columns)),
'textOverflow': 'hidden'
}
)
if __name__ == '__main__':
app.run(debug=True)
The filtering syntax is data-type specific.
Data types are not inferred, they must be set manually.
If a type is not specified, then we assume it is a string (text).
Text & String Filtering
United
= United
United States
"United States"
= United States
= "United States"
contains United
> United
>= United
< United
<= United
By default, the columns with the “text” type use the
contains
operator. So, searching United
is the same as
contains United
For legacy purposes, eq
can also be substituted for =
.
>
, >=
, <
, and <=
compare strings in dictionary order,
with numbers and most symbols coming before letters,
and uppercase coming before lowercase.
If you have quotes in the string, you can use a different quote, or
escape the quote character. So eq 'Say "Yes!"'
and
="Say \\"Yes!\\""
are the same.
Numeric Filtering
43.828
= 43.828
> 43.828
>= 43.828
< 43.828
<= 43.828
By default, columns with the numeric
type use the =
operator.
So, searching 43.828
is the same as = 43.828
.
Datetime Filtering
2020
2020-01
2020-01-01
2020-01-01 04:01
2020-01-01 04:01:10
datestartswith 2020
datestartswith 2020-01
datestartswith 2020-01-01
datestartswith 2020-01-01 04:01
datestartswith 2020-01-01 04:01:10
> 2020-01
> 2020-01-20
>= 2020-01
>= 2020-01-20
< 2020-01
< 2020-01-20
<= 2020-01
<= 2020-01-20
Many operators have two forms: a symbol (=
) and a word (eq
) that
can be used interchangeably.
<table>
<tr>
<td><h4><p><code>=<code> <code>eq<code><p><h4>
Default operator for <code>number<code> columns<td>
<td>Are the two numbers equal? Regardless of type, will first try to
convert both sides to numbers and compare the numbers. If either
cannot be converted to a number, looks for an exact match.<td>
<tr>
<tr>
<td><h4><p><code>contains<code><p><h4>
Default operator for <code>text<code> and <code>any<code> columns<td>
<td>Does the text value contain the requested substring?
May match the beginning, end, or anywhere in the middle. The match
is case-sensitive and exact.<td>
<tr>
<tr>
<td><h4><p><code>datestartswith<code><p><h4>
Default operator for <code>datetime<code> columns’<td>
<td>Does the datetime start with the given parts? Enter a partial
datetime, this will match any date that has at least as much
precision and starts with the same pieces. For example,
<code>datestartswith ‘2018-03-01’<code> will match <code>‘2018-03-01 12:59’<code> but
not <code>‘2018-03’<code> even though we interpret <code>‘2018-03-01’<code> and
<code>‘2018-03’<code> both to mean the first instant of March, 2018.<td>
<tr>
<tr>
<td><h4><p><code>><code> <code>gt<code> <code><<code> <code>lt<code><br>
<code>>=<code> <code>ge<code> <code><=<code> <code>le<code><br>
<code>!=<code> <code>ne<code><p><h4><td>
<td>Comparison: greater than, less than, greater or equal, less or
equal, and not equal. Two strings compare by their dictionary
order, with numbers and most symbols coming before letters, and
uppercase coming before lowercase.<td>
<tr>
<table>
For large dataframes, you can perform the filtering in Python instead
of the default clientside filtering. You can find more information on
performing operations in python in the
Python Callbacks chapter.
The syntax is (now) the same as front-end filtering, but it’s up to the
developer to implement the logic to apply these filters on the Python
side.
In the future we may accept any filter strings, to allow you to
write your own expression query language.
Example:
from dash import Dash, dash_table, Input, Output, callback
import pandas as pd
app = Dash(__name__)
df = pd.read_csv('https://raw.githubusercontent.com/plotly/datasets/master/gapminder2007.csv')
app.layout = dash_table.DataTable(
id='table-filtering-be',
columns=[
{"name": i, "id": i} for i in sorted(df.columns)
],
filter_action='custom',
filter_query=''
)
operators = [['ge ', '>='],
['le ', '<='],
['lt ', '<'],
['gt ', '>'],
['ne ', '!='],
['eq ', '='],
['contains '],
['datestartswith ']]
def split_filter_part(filter_part):
for operator_type in operators:
for operator in operator_type:
if operator in filter_part:
name_part, value_part = filter_part.split(operator, 1)
name = name_part[name_part.find('{') + 1: name_part.rfind('}')]
value_part = value_part.strip()
v0 = value_part[0]
if (v0 == value_part[-1] and v0 in ("'", '"', '`')):
value = value_part[1: -1].replace('\\' + v0, v0)
else:
try:
value = float(value_part)
except ValueError:
value = value_part
# word operators need spaces after them in the filter string,
# but we don't want these later
return name, operator_type[0].strip(), value
return [None] * 3
@callback(
Output('table-filtering-be', "data"),
Input('table-filtering-be', "filter_query"))
def update_table(filter):
filtering_expressions = filter.split(' && ')
dff = df
for filter_part in filtering_expressions:
col_name, operator, filter_value = split_filter_part(filter_part)
if operator in ('eq', 'ne', 'lt', 'le', 'gt', 'ge'):
# these operators match pandas series operator method names
dff = dff.loc[getattr(dff[col_name], operator)(filter_value)]
elif operator == 'contains':
dff = dff.loc[dff[col_name].str.contains(filter_value)]
elif operator == 'datestartswith':
# this is a simplification of the front-end filtering logic,
# only works with complete fields in standard format
dff = dff.loc[dff[col_name].str.startswith(filter_value)]
return dff.to_dict('records')
if __name__ == '__main__':
app.run(debug=True)
Filter queries can be as simple or as complicated as you want
them to be. When something is typed into a column filter, it
is automatically converted to a filter query on that column
only.
from dash import Dash, dcc, html, Input, Output, dash_table, callback
import pandas as pd
import json
df = pd.read_csv('https://raw.githubusercontent.com/plotly/datasets/master/gapminder2007.csv')
df['id'] = df['country']
df.set_index('id', inplace=True, drop=False)
app = Dash(__name__)
app.layout = html.Div([
dcc.RadioItems(
[{'label': 'Read filter_query', 'value': 'read'}, {'label': 'Write to filter_query', 'value': 'write'}],
'read',
id='filter-query-read-write',
),
html.Br(),
dcc.Input(id='filter-query-input', placeholder='Enter filter query'),
html.Div(id='filter-query-output'),
html.Hr(),
dash_table.DataTable(
id='datatable-advanced-filtering',
columns=[
{'name': i, 'id': i, 'deletable': True} for i in df.columns
# omit the id column
if i != 'id'
],
data=df.to_dict('records'),
editable=True,
page_action='native',
page_size=10,
filter_action="native"
),
html.Hr(),
html.Div(id='datatable-query-structure', style={'whitespace': 'pre'})
])
@callback(
Output('filter-query-input', 'style'),
Output('filter-query-output', 'style'),
Input('filter-query-read-write', 'value')
)
def query_input_output(val):
input_style = {'width': '100%'}
output_style = {}
if val == 'read':
input_style.update(display='none')
output_style.update(display='inline-block')
else:
input_style.update(display='inline-block')
output_style.update(display='none')
return input_style, output_style
@callback(
Output('datatable-advanced-filtering', 'filter_query'),
Input('filter-query-input', 'value')
)
def write_query(query):
if query is None:
return ''
return query
@callback(
Output('filter-query-output', 'children'),
Input('datatable-advanced-filtering', 'filter_query')
)
def read_query(query):
if query is None:
return "No filter query"
return dcc.Markdown('`filter_query = "{}"`'.format(query))
@callback(
Output('datatable-query-structure', 'children'),
Input('datatable-advanced-filtering', 'derived_filter_query_structure')
)
def display_query(query):
if query is None:
return ''
return html.Details([
html.Summary('Derived filter query structure'),
html.Div(dcc.Markdown('''```json
{}
```'''.format(json.dumps(query, indent=4))))
])
if __name__ == '__main__':
app.run(debug=True)
The filter_query
property is written to when the user
filters the data by using the column filters. For example, if
a user types ge 100000000
in the pop
column filter, and
Asia
in the continent
column filter, filter_query
will
look like this:
{pop} ge 100000000 && {continent} contains "Asia"
Try typing those values into the column filters in the app
above, and ensure that the “Read filter_query” option is
selected.
The filter_query
property can also be written to. This might
be useful when performing more complex filtering,
like if you want to filter a column based on two (or more)
conditions. For instance, say that we want countries with a
population greater than 100 million, but less than 500
million. Then our filter_query
would be as follows:
{pop} ge 100000000 and {pop} le 500000000
Select the “Write to filter_query” option in the app above,
and try it out by copying and pasting the filter query above
into the input box.
Say that we now want to get a bit more advanced, and
cross-filter between columns; for instance, we only want the
results that are located in Asia. Now, our filter query
becomes:
{pop} ge 100000000 and {pop} le 500000000 and {continent} eq "Asia"
We can make the expression even more complex. For example,
let’s say we want all of those countries with the populations
that fall within our boundaries and that are in Asia, but for
some reason we also want to include Singapore. This results in
a filter query that is a little more long-winded:
(({pop} ge 100000000 and {pop} le 500000000) or {country} eq "Singapore") and {continent} eq "Asia"
Note that we’ve grouped expressions together using
parentheses. This is part of the filtering syntax. Just as is
true in mathematical expressions, the expressions in the
innermost parentheses are evaluated first.
An important thing to notice is that the two types of
relational operators that can be used in the column filters
(symbol-based, like >=
, and letter-based, like ge
) are not
converted into one another when filter_query
is being
constructed from the values in the column filters. Therefore,
if using filter_query
to implement backend filtering, it’s
necessary to take both of these forms of the
“greater-than-or-equal-to” operator into account when parsing
the query string (or ensure that the user only uses the ones
that the backend can parse).
However, in the case of the logical operator and/&&
, when
the table is constructing the query string, the symbol-based
representation will always be used.
The derived_filter_query_structure
prop is a dictionary
representation of the query syntax tree. You can use the value
of this property to implement backend filtering.
For a query that describes a relationship between two values,
there are three components: the operation, the left-hand side,
and the right-hand side. For instance, take the following
query:
{pop} ge 100000000
The operation here is ge
(i.e., >=
), the left-hand side is
the field pop
(corresponding to the column pop
), and the
right-hand side is the value 100000000
. As the queries
become increasingly complex, so do the query structures. Try
it out by expanding the “Derived filter query structure” in
the example app above.
Note that for all operators, there are two keys subType
and
value
that correspond to, respectively, the symbol-based
representation and the originally inputted representation of
the operator. So, in the case of the query above, subType
will be >=
and value
will be ge
; if our query string
were {pop} >= 100000000
instead, both subType
and value
will be >=
.
pandas
and derived_filter_query_structure
It’s likely that your data are already in a pandas
dataframe. Using the derived_filter_query_structure
in
conjunction with pandas
filters can enable you to do some
pretty heavy lifting with the table! You can see an example of
this below.
from dash import Dash, Input, Output, callback
from dash.dash_table import DataTable
import json
import pandas as pd
types = {
'id': 'numeric',
'Complaint_ID': 'numeric',
'ZIP_code': 'numeric',
'Date_received': 'datetime',
'Date_sent_to_company': 'datetime',
}
df = pd.read_csv('https://github.com/plotly/datasets/raw/master/26k-consumer-complaints.csv')
df['id'] = df['Unnamed: 0']
df = df.drop(['Unnamed: 0'], axis=1)
df = df.reindex(columns=['id']+df.columns[:-1].tolist())
df.columns = [column.replace(" ", "_") for column in df.columns]
df.columns = [column.replace("-", "_") for column in df.columns]
app = Dash()
app.scripts.config.serve_locally = True
app.layout = DataTable(
id='demo-table',
data=df.to_dict('records'),
columns=[{ 'id': i, 'name': i, 'type': types.get(i, 'any')} for i in df.columns],
filter_action='custom',
page_action='native',
page_size=15,
virtualization=True,
style_cell={
'min-width': '100px'
},
css=[
{ 'selector': '.row-1', 'rule': 'min-height: 500px;' }
]
)
def to_string(filter):
operator_type = filter.get('type')
operator_subtype = filter.get('subType')
if operator_type == 'relational-operator':
if operator_subtype == '=':
return '=='
else:
return operator_subtype
elif operator_type == 'logical-operator':
if operator_subtype == '&&':
return '&'
else:
return '|'
elif operator_type == 'expression' and operator_subtype == 'value' and type(filter.get('value')) == str:
return '"{}"'.format(filter.get('value'))
else:
return filter.get('value')
def construct_filter(derived_query_structure, df, complexOperator=None):
# there is no query; return an empty filter string and the
# original dataframe
if derived_query_structure is None:
return ('', df)
# the operator typed in by the user; can be both word-based or
# symbol-based
operator_type = derived_query_structure.get('type')
# the symbol-based representation of the operator
operator_subtype = derived_query_structure.get('subType')
# the LHS and RHS of the query, which are both queries themselves
left = derived_query_structure.get('left', None)
right = derived_query_structure.get('right', None)
# the base case
if left is None and right is None:
return (to_string(derived_query_structure), df)
# recursively apply the filter on the LHS of the query to the
# dataframe to generate a new dataframe
(left_query, left_df) = construct_filter(left, df)
# apply the filter on the RHS of the query to this new dataframe
(right_query, right_df) = construct_filter(right, left_df)
# 'datestartswith' and 'contains' can't be used within a pandas
# filter string, so we have to do this filtering ourselves
if complexOperator is not None:
right_query = right.get('value')
# perform the filtering to generate a new dataframe
if complexOperator == 'datestartswith':
return ('', right_df[right_df[left_query].astype(str).str.startswith(right_query)])
elif complexOperator == 'contains':
return ('', right_df[right_df[left_query].astype(str).str.contains(right_query)])
if operator_type == 'relational-operator' and operator_subtype in ['contains', 'datestartswith']:
return construct_filter(derived_query_structure, df, complexOperator=operator_subtype)
# construct the query string; return it and the filtered dataframe
return ('{} {} {}'.format(
left_query,
to_string(derived_query_structure) if left_query != '' and right_query != '' else '',
right_query
).strip(), right_df)
@callback(
Output("demo-table", "data"),
Input("demo-table", "derived_filter_query_structure")
)
def onFilterUpdate(derived_query_structure):
(pd_query_string, df_filtered) = construct_filter(derived_query_structure, df)
if pd_query_string != '':
df_filtered = df_filtered.query(pd_query_string)
return df_filtered.to_dict('records')
if __name__ == "__main__":
app.run(debug=True)