plotly module#

The plotly module in the visualization_toolkit contains higher-order functions to generate plotly charts that match company branding and visual best practices. Core charts such as line, bar, area, and combo charts can be created using this module.

While good defaults are used to style charts, many options exist to customize the visualizations to craft effective visual content. The styling defaults are stored in the visualization_toolkit.helpers.plotly.theme module.

Core Charting Functions#

Function-based building blocks to create a majority of charting content.

visualization_toolkit.helpers.plotly.chart(df, chart_series, x_axis, y1_axis=None, y2_axis=None, annotations=None, shaded_regions=None, include_logo=False, theme=None, custom_options=None)[source]#

Standard function to generate a plotly-based chart with standard company styling. The chart function works closely with the other toolkit building blocks for charts: axis, series, and annotations.

The input data to the chart can be passed in as a spark dataframe, pandas dataframe, or a list of dictionaries. The input data is normalized to handle the remaining operations to be supplied to plotly to render the chart.
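The normalization step is not documented in detail. As a rough illustration only, the sketch below shows one way the three input types could be reduced to a common list-of-dictionaries form using duck typing. normalize_records is a hypothetical helper and not part of the toolkit; it relies on toPandas (PySpark) and to_dict(orient="records") (pandas) when those methods are present.

```python
from typing import Any


def normalize_records(data: Any) -> list:
    """Hypothetical helper: reduce supported inputs to a list of row dicts."""
    # Spark dataframes expose toPandas(); convert them to pandas first.
    if hasattr(data, "toPandas"):
        data = data.toPandas()
    # pandas dataframes expose to_dict(orient="records").
    if hasattr(data, "to_dict"):
        return data.to_dict(orient="records")
    # Otherwise assume an iterable of mapping-like rows.
    return [dict(row) for row in data]


rows = normalize_records([{"year": 1952, "australia": 69.1}])
```

Whatever the toolkit does internally, the practical consequence is the same: column names in the input data are what the axis and series building blocks refer to.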

Parameters:
  • df (pyspark.sql.DataFrame | pandas.DataFrame | list[dict[str, Any]]) – Input data to visualize. The columns from this data will be used for the chart_series and various axes.

  • chart_series (list[visualization_toolkit.helpers.plotly.charts.core.series.Series]) – A list of series instances that will be plotted on this chart figure.

  • x_axis (visualization_toolkit.helpers.plotly.charts.core.axis.Axis) – The axis configuration for the x-axis of this chart figure.

  • y1_axis (Optional[visualization_toolkit.helpers.plotly.charts.core.axis.Axis]) – The axis configuration for the y1-axis of this chart figure.

  • y2_axis (Optional[visualization_toolkit.helpers.plotly.charts.core.axis.Axis]) – The axis configuration for the y2-axis of this chart figure. By default, no y2 axis is included.

  • annotations (Optional[list[visualization_toolkit.helpers.plotly.charts.core.annotation.Annotation]]) – A list of annotations to include on the chart.

  • shaded_regions (Optional[list[visualization_toolkit.helpers.plotly.charts.core.shading.ShadeX | visualization_toolkit.helpers.plotly.charts.core.shading.ShadeY]]) – A list of shade_x and/or shade_y to include on the chart to shade regions.

  • include_logo (bool) – Optionally include the YipitData logo at the bottom right of the chart. Default is False (no logo added).

  • theme (Optional[dict]) – Optionally changes the theme used to format the chart. Defaults to the YD_CLASSIC_THEME if not set.

  • custom_options (Optional[dict]) – Optionally include any options to pass into figure.update_layout. This is an escape hatch for final adjustments to a chart.

Returns:

Plotly figure object that can be displayed in databricks or supplied to a dash app as a property of the dcc.Graph component.

Return type:

plotly.graph_objects.Figure

Examples#

Common example of creating multiple series on a chart. Notice that each series corresponds to a different column on the dataset.#
from visualization_toolkit.helpers.plotly import chart, axis, series, annotation

fig = chart(
    df,
    x_axis=axis(
        column_name="year",
        label="Year",
    ),
    chart_series=[
        series(
            column_name="australia",
            label="Australia",
        ),
        series(
            column_name="new_zealand",
            label="New Zealand",
        ),
    ],
    y1_axis=axis(
        label="Life Expectancy",
        axis_type="number",
    ),
)

display(fig)
visualization_toolkit.helpers.plotly.axis(column_name=None, location='y1', label=None, axis_type=None, tick_format=None, tick_values=None, tick_labels=None, tick_interval=None, number_of_ticks=None, tick_angle=-45, axis_min=None, axis_max=None, currency_symbol='$', start_at_non_null_values=False, extra_options=<factory>)[source]#

Control axis behavior by using this function. Options are available to control the ticks, title, tick format, and overall range. Define an axis for each relevant axis of the chart (ex: one x-axis and one y-axis require two axis function calls).

Axis will attempt to look as good as possible with minimum customization needed. The function will identify the minimum and maximum bounds of the input data to ensure all data is inside the plotted frame.

In addition, the axis will try to generate tick values that are as evenly spaced as possible given the dataset. If specific tick steps are preferred, specify the tick_interval and axis_min.
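To make the interaction between tick_interval and axis_min concrete, here is a minimal sketch of evenly stepped ticks that cover the data range. even_ticks is a hypothetical helper written for illustration; the toolkit's actual tick algorithm is not shown in this reference and may differ.

```python
import math


def even_ticks(data_min, data_max, tick_interval, axis_min=None):
    """Hypothetical helper: ticks stepping by tick_interval that cover the data."""
    # Start from the explicit axis_min, or snap the data minimum down
    # to the nearest multiple of the interval.
    if axis_min is not None:
        start = axis_min
    else:
        start = math.floor(data_min / tick_interval) * tick_interval
    # Number of steps needed so the last tick is at or above the data maximum.
    steps = math.ceil((data_max - start) / tick_interval)
    return [start + i * tick_interval for i in range(steps + 1)]


auto_ticks = even_ticks(data_min=52.3, data_max=81.2, tick_interval=10)
zero_based_ticks = even_ticks(data_min=52.3, data_max=81.2, tick_interval=10, axis_min=0)
```

In this sketch, setting axis_min changes where the steps begin, which is why the two values are usually specified together.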

Parameters:
  • column_name (str, default: None) – The column of the input data of a chart function call to use.

  • location (Literal['x', 'y1', 'y2'], default: 'y1') – Controls whether this axis is the x, y1, or y2 axis of the chart. This does not need to be specified, as the chart function will set this value automatically.

  • label (str, default: None) – Set an axis title for the chart. Default is no title is added.

  • axis_type (Literal['date', 'category', 'number', 'currency', 'percent'], default: None) – The numerical type of the data on this axis. It is important to specify this value, as it will control the default axis behavior and formatting.

  • tick_format (str, default: None) – The numerical format to style tick values on the axis. If not specified, a default will be chosen based on the axis_type parameter.

  • tick_interval (int | float | relativedelta, default: None) – Tick values will be incremented by a standard value starting from the axis_min or axis_max.

  • axis_min (int | float | date, default: None) – The lowest point of the axis will be this value. Default is this value is automatically determined by the input data on this axis.

  • axis_max (int | float | date, default: None) – The greatest point of the axis will be this value. Default is this value is automatically determined by the input data on this axis.

  • number_of_ticks (Optional[int], default: None) – The total number of ticks will be this value. Default is 6 for y-axes and 12 for the x-axis.

  • tick_angle (int, default: -45) – Controls the angle at which tick labels are placed on the axis. Only applies to the x-axis.

  • currency_symbol (str, default: '$') – The currency symbol prefixed to tick labels. Only applied if the axis_type=='currency' and no custom tick_format is used.

  • start_at_non_null_values (bool, default: False) – Used for x-axes. When set to True, the data is filtered to start at the first data point that has at least one non-null value among the series plotted on the chart.

  • extra_options (dict, default: <factory>) – Additional options to pass to the axis figure object in Plotly.

  • tick_values (list[int | float | str])

  • tick_labels (list[str])
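The effect of start_at_non_null_values can be pictured with a small sketch. trim_leading_nulls is a hypothetical helper, and the rows are assumed to be already sorted by the x-axis column; it only illustrates the filtering rule described above, not the toolkit's implementation.

```python
def trim_leading_nulls(rows, series_columns):
    """Hypothetical helper: drop leading rows where every plotted column is null."""
    for index, row in enumerate(rows):
        # Keep everything from the first row with at least one non-null series value.
        if any(row.get(column) is not None for column in series_columns):
            return rows[index:]
    return []


rows = [
    {"year": 1950, "australia": None, "new_zealand": None},
    {"year": 1951, "australia": None, "new_zealand": 69.4},
    {"year": 1952, "australia": 69.1, "new_zealand": 69.4},
]
trimmed = trim_leading_nulls(rows, ["australia", "new_zealand"])
```

Here the 1950 row is dropped because both plotted columns are null, while 1951 is kept because one of them has a value.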

Return type:

None

Examples#

Example of creating axes and using in a chart function. Notice each axis is assigned to x/y1/y2 position in the chart function.#
from visualization_toolkit.helpers.plotly import chart, axis, series, annotation

fig = chart(
    df,
    x_axis=axis(
        column_name="year",
        label="Year",
    ),
    chart_series=[
        series(
            column_name="lifeExp",
            pivot_column_name="country",
            include_all_categories=True,
            location="y1",
        ),
    ],
    y1_axis=axis(
        label="Life Expectancy",
        axis_type="number",
    ),
)

display(fig)
visualization_toolkit.helpers.plotly.series(column_name=None, category_name=None, color=None, color_scale='default', location='y1', label=None, mode='lines', shape=None, hover_format=None, y_data=None, is_stacked=False, connect_gaps=False, show_in_legend=True, shade_series=None, pivot_column_name=None, include_all_categories=False, category_sort_column_name=None, color_mapping=None, extra_options=<factory>)[source]#

A series is used to define a line, bar, or area plot on a graph. Each series represents one column of the dataset. If multiple pivoted “series” exist on a column, then the category_name argument can be used to generate or specify each pivoted series.

Each series will be colored based on the company colors if not otherwise specified. Series are by default line plots but can be customized via the mode attribute.

Series will always be on the x-axis and one of the y-axes, specified by location.
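For readers unfamiliar with pivoted data, the sketch below shows the idea behind pivot_column_name: long-format rows are split into one set of points per category. pivot_series is a hypothetical helper and the values are illustrative; the toolkit's own pivoting logic is not shown in this reference.

```python
from collections import defaultdict


def pivot_series(rows, x_column, y_column, pivot_column):
    """Hypothetical helper: split long-format rows into one (x, y) list per category."""
    grouped = defaultdict(list)
    for row in rows:
        # The pivot column value decides which series the point belongs to.
        grouped[row[pivot_column]].append((row[x_column], row[y_column]))
    return dict(grouped)


rows = [
    {"year": 1952, "country": "Australia", "lifeExp": 69.1},
    {"year": 1952, "country": "New Zealand", "lifeExp": 69.4},
    {"year": 1957, "country": "Australia", "lifeExp": 70.3},
    {"year": 1957, "country": "New Zealand", "lifeExp": 70.3},
]
by_country = pivot_series(rows, "year", "lifeExp", "country")
```

Each key of by_country corresponds to what a category_name would select, and all keys together correspond to include_all_categories=True.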

Parameters:
  • column_name (str, default: None) – The column name for the series y-values based on the input data for the chart function.

  • category_name (str, default: None) – The column name for a categorical value within pivoted chart data to be used for this specific series. By default, this value is None, and should only be used if pivoting is enabled via pivot_column_name and include_all_categories=False.

  • color (str, default: None) – The color of the plot for this series. Default is that the color is automatically determined based on company colors.

  • location (Literal['y1', 'y2'], default: 'y1') – The Y-axis location of the series. Default is the Y1 axis.

  • label (str, default: None) – The series legend label. Default is the column_name for the series. Legends are only displayed when multiple series exist on a chart.

  • mode (Literal['lines', 'bar', 'area', 'lines+markers', 'markers', 'clustered_bar', 'line', 'line+marker', 'marker', 'scatter'], default: 'lines') – The type of plot for plotly (ex: lines, bar, area, lines+markers, markers) used for the series. Default is “lines” for a line chart.

  • shape (Optional[Literal['dash', 'spline', 'dot', 'stripe']], default: None) – The shape (ex: dashed, dotted, striped fill) of the plot, which behaves differently based on the mode. Default is a solid series plot.

  • hover_format (str, default: None) – Control the data label format when hovering over this series. Default is a standard format based on the corresponding axis’ axis_type.

  • is_stacked (bool, default: False) – Flag to control if pivoted chart data should be used to generate multiple series dynamically in the chart function. (Default is False and should not be used unless pivoting is used in the chart function.)

  • connect_gaps (bool, default: False) – If True, then the series will be plotted for null or missing data. Default is False, i.e. series is not plotted for null values.

  • shade_series (Optional[ShadeSeries], default: None) – If specified, a shaded region will be applied to the series

  • pivot_column_name (str, default: None) – Column name of the input df to pivot, where each category in the pivot column can then be used as a separate series of the chart.

  • include_all_categories (bool, default: False) – If True, all categories will be expanded as separate series automatically instead of needing to specify series individually. This is meant to be a convenience flag, but will limit the format customization options available. The default is False.

  • category_sort_column_name (Optional[str], default: None) – If specified, the column name will be used to aggregate the unique categories and sort them in the legend. To specify a descending order, prefix the column name with a -, ex: category_sort_column_name="-yd_income_bucket". Default behavior is alphabetical sorting based on category name. This field is only used when include_all_categories=True. When False, the series are ordered by their position in the chart_series list passed to the chart function.

  • color_mapping (Optional[dict[str, str]], default: None) – Optional color mapping of key names to color labels, hexcodes, or rgba values. The keys should be either the column_name or category_name of the series.

  • extra_options (dict, default: <factory>) – Additional options to be passed to the Plotly figure object to format this series.

  • color_scale (Literal['default', 50, 100, 200, 300, 400, 500, 600, 700, 800, 900])

  • y_data (Series)

  • show_in_legend (bool)
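The "-" prefix convention for category_sort_column_name can be sketched as follows. sort_categories is a hypothetical helper, the income_label column is made up for illustration, and summing is an assumed aggregate: this reference does not say which aggregation the toolkit applies.

```python
from collections import defaultdict


def sort_categories(rows, pivot_column, category_sort_column_name):
    """Hypothetical helper: order categories by an aggregated sort column."""
    # A leading "-" requests descending order; the rest is the column name.
    descending = category_sort_column_name.startswith("-")
    sort_column = category_sort_column_name[1:] if descending else category_sort_column_name
    totals = defaultdict(float)
    for row in rows:
        # Assumption: the aggregate is a sum; the toolkit may use another function.
        totals[row[pivot_column]] += row[sort_column]
    return sorted(totals, key=totals.get, reverse=descending)


rows = [
    {"income_label": "Low", "yd_income_bucket": 1},
    {"income_label": "High", "yd_income_bucket": 3},
    {"income_label": "Middle", "yd_income_bucket": 2},
]
ascending = sort_categories(rows, "income_label", "yd_income_bucket")
descending = sort_categories(rows, "income_label", "-yd_income_bucket")
```

Without a sort column, the documented default (alphabetical by category name) would place "High" first, which is rarely the order wanted for ordinal buckets.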

Return type:

None

Examples#

Common example of creating multiple series on a chart. Notice that each series corresponds to a different column on the dataset.#
from visualization_toolkit.helpers.plotly import chart, axis, series, annotation

fig = chart(
    df,
    x_axis=axis(
        column_name="year",
        label="Year",
    ),
    chart_series=[
        series(
            column_name="australia",
            label="Australia",
        ),
        series(
            column_name="new_zealand",
            label="New Zealand",
        ),
    ],
    y1_axis=axis(
        label="Life Expectancy",
        axis_type="number",
    ),
)

display(fig)
Alternative example of creating series from pivoted data using pivot_column_name and category_name. In this case each listed country in the country column is plotted as its own series.#
from visualization_toolkit.helpers.plotly import chart, axis, series, annotation

fig = chart(
    df,
    x_axis=axis(
        column_name="year",
        label="Year",
    ),
    chart_series=[
        series(
            column_name="lifeExp",
            pivot_column_name="country",
            category_name="New Zealand",
        ),
        series(
            column_name="lifeExp",
            pivot_column_name="country",
            category_name="Australia",
        ),
    ],
    y1_axis=axis(
        label="Life Expectancy",
        axis_type="number",
    ),
)

display(fig)
visualization_toolkit.helpers.plotly.shade_series(boundary_column_names, label=None, color=None, opacity=0.15)[source]#

Adds shaded area for an existing series instance for a chart. The shaded area is based on two columns that need to be present in the input dataframe and are specified via the boundary_column_names argument as a tuple.

Parameters:
  • boundary_column_names ((str, str)) – Tuple of two columns that indicate the shaded boundary for the series. Should be in the format of (lower column, upper column).

  • label (str, default: None) – Label for the shaded range that should be used in the legend and while hovering. When hovering, the label will be suffixed with Lower Bound and Upper Bound to indicate the range of the shaded area.

  • color (str, default: None) – Optional fill color of the shaded region. Defaults to parent series color.

  • opacity (float, default: 0.15) – Optional opacity of the shaded region fill color. Defaults to standard theme opacity.

Return type:

None

Examples#

Example of adding a shaded area around a line chart. The columns yy, lower_bound and upper_bound must all exist on the input dataframe.#
from visualization_toolkit.helpers.plotly import chart, axis, series, shade_series

fig = chart(
    df,
    x_axis=axis(column_name="fiscal_qy", label="Fiscal Quarter"),
    y1_axis=axis(label="Downloads Growth Rate", axis_type="percent"),
    chart_series=[
        series(
            column_name="yy",
            label="Y/Y Growth",
            color="dark-blue",
            shade_series=shade_series(
                boundary_column_names=("lower_bound", "upper_bound"),
                label="Margin of Error",
                color="light-blue",
                opacity=.3,
            ),
        ),
    ],
)

display(fig)
visualization_toolkit.helpers.plotly.annotation(text, value, axis_location=None, position='top right', color=None, line_shape='dot', extra_options=<factory>)[source]#

Define text-based annotations to be rendered on charts. Depending on the axis_location set, the annotation will include a horizontal or vertical line. The position of the annotation is controlled by value, which is the value on the corresponding axis where the annotation is desired.

The annotation uses either the Plotly hline, vline, or annotation object depending on the axis_location specified.
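The mapping from axis_location to a Plotly object can be sketched as a small dispatch function. annotation_call is a hypothetical helper that only returns the name of the plotly Figure method (add_vline, add_hline, or add_annotation) and the arguments it would receive; how the toolkit wires this up internally is an assumption.

```python
def annotation_call(text, value, axis_location=None):
    """Hypothetical helper: pick the plotly Figure method and arguments for an annotation."""
    if axis_location == "x":
        # A value on the x-axis produces a vertical line.
        return "add_vline", {"x": value, "annotation_text": text}
    if axis_location in ("y1", "y2"):
        # A value on a y-axis produces a horizontal line.
        return "add_hline", {"y": value, "annotation_text": text}
    # No axis_location: a floating annotation placed at an (x, y) pair.
    x, y = value
    return "add_annotation", {"x": x, "y": y, "text": text}


vertical = annotation_call("Example Annotation", 1970, axis_location="x")
floating = annotation_call("Peak", (1970, 71.0))
```

With a real plotly figure, the result could be applied as getattr(fig, method)(**kwargs), which is one reason the value argument changes shape (scalar versus tuple) depending on axis_location.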

Parameters:
  • text (str) – Text that will be rendered on the annotation

  • value (Union[int, float, date, datetime, Tuple[int | float | date | datetime, int | float | date | datetime]]) – The value on the corresponding axis where the annotation will be placed. If no axis_location is set, a floating annotation position can be specified by a tuple (x,y) pair for this argument.

  • axis_location (Optional[Literal['x', 'y1', 'y2']], default: None) – Specify whether the annotation should be oriented by the x axis, y1 / y2 axis, or if None a floating annotation is used. Default None.

  • position (Literal['bottom right', 'bottom center', 'bottom left', 'middle right', 'middle center', 'middle left', 'top right', 'top center', 'top left'], default: 'top right') – The position of the annotation text relative to the drawn line for the annotation. If a floating annotation is desired, this value has no effect. Default is top-right of the annotation line.

  • color (str, default: None) – The color of the annotation line and text. If None, the standard theme is used. Default None.

  • line_shape (Optional[Literal['dash', 'dot', 'dashdot']], default: 'dot') – Control if the annotation line is drawn with dashes, dots, or both. Default is dot.

  • extra_options (dict, default: <factory>) – Additional options to style the underlying Plotly figure object.

Return type:

None

Examples#

Add a vertical annotation on a line chart. The annotation would be placed where 1970 falls on the x-axis.#
from visualization_toolkit.helpers.plotly import chart, axis, series, annotation

fig = chart(
    df,
    x_axis=axis(
        column_name="year",
        label="Year",
    ),
    chart_series=[
        series(
            column_name="lifeExp",
            pivot_column_name="country",
            include_all_categories=True,
            location="y1",
        ),
    ],
    y1_axis=axis(
        label="Life Expectancy",
        axis_type="number",
    ),
    annotations=[
        annotation(
            text="Example Annotation",
            value=1970,
            axis_location="x",
        ),
    ],
)

display(fig)
visualization_toolkit.helpers.plotly.shade_x(start, end, color='#b8b8b8')[source]#

Add a shaded rectangle region that spans vertically on the chart between the start and end values on the x-axis.

Parameters:
  • start (int | float | date | datetime) – The starting value where the shaded region begins on the x-axis.

  • end (int | float | date | datetime) – The ending value where the shaded region ends on the x-axis.

  • color (str, default: '#b8b8b8') – Fill color of the shaded region. Defaults to light-grey.

Return type:

None

Examples#

Example of creating a chart with shaded vertical areas. Note that the start and end values correspond to the x-axis.#
from visualization_toolkit.helpers.plotly import chart, axis, series, annotation, shade_x

fig = chart(
    df,
    x_axis=axis(
        column_name="year",
        label="Year",
    ),
    chart_series=[
        series(
            column_name="lifeExp",
            pivot_column_name="country",
            include_all_categories=True,
            location="y1",
        ),
    ],
    y1_axis=axis(
        label="Life Expectancy",
        axis_type="number",
    ),
    shaded_regions=[
        shade_x(start=1950, end=1955, color="green"),
        shade_x(start=1970, end=1975, color="red"),
    ],
)

display(fig)
visualization_toolkit.helpers.plotly.shade_y(start, end, color='#b8b8b8', location='y1')[source]#

Add a shaded rectangle region that spans horizontally on the chart between the start and end values on the y-axis.

Parameters:
  • start (int | float | date | datetime) – The starting value where the shaded region begins on the y-axis.

  • end (int | float | date | datetime) – The ending value where the shaded region ends on the y-axis.

  • color (str, default: '#b8b8b8') – Fill color of the shaded region. Defaults to light-grey.

  • location (Literal['y1', 'y2'])

Return type:

None

Examples#

Example of creating a chart with shaded horizontal areas. Note that the start and end values correspond to the y-axis.#
from visualization_toolkit.helpers.plotly import chart, axis, series, annotation, shade_y

fig = chart(
    df,
    x_axis=axis(
        column_name="year",
        label="Year",
    ),
    chart_series=[
        series(
            column_name="lifeExp",
            pivot_column_name="country",
            include_all_categories=True,
            location="y1",
        ),
    ],
    y1_axis=axis(
        label="Life Expectancy",
        axis_type="number",
    ),
    shaded_regions=[
        shade_y(start=50, end=55, color="green"),
        shade_y(start=70, end=75, color="red"),
    ],
)

display(fig)

Advanced Charting Functions#

Specific functions to create custom chart types such as waterfall charts or heatmaps.

visualization_toolkit.helpers.plotly.heatmap_chart(df, category_column_name, z_axis, theme=YD_CLASSIC_THEME, colorscale=POSITIVE_COLORSCALE)[source]#

Create a heatmap visualization to compare two sets of categories in a table-like view.

Cells in the heatmap are colored based on their value with respect to the colorscale provided. Cell values are formatted based on the z_axis provided. A category_column_name indicates which column contains one set of categories to compare. It is assumed all other columns represent the opposite set of categories to be compared.

The data must have one categorical column with all other columns each representing a category to be compared. Values within each category pair will appear in the heatmap as a cell.
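The expected data shape can be illustrated by splitting wide rows into the x, y, and z inputs that a Plotly heatmap trace takes. heatmap_inputs is a hypothetical helper, the app_a and app_b columns are made up, and placing the categorical column on the y-axis is an assumption; this reference only says it is plotted on one axis.

```python
def heatmap_inputs(rows, category_column_name):
    """Hypothetical helper: split wide rows into x labels, y labels, and a z matrix."""
    # Every column other than the category column is treated as a compared category.
    x_labels = [key for key in rows[0] if key != category_column_name]
    # Assumption: the categorical column supplies the y-axis labels.
    y_labels = [row[category_column_name] for row in rows]
    # One row of z values per y label, one value per x label.
    z = [[row[key] for key in x_labels] for row in rows]
    return x_labels, y_labels, z


rows = [
    {"fiscal_qy": "1Q23", "app_a": 120, "app_b": 80},
    {"fiscal_qy": "2Q23", "app_a": 150, "app_b": 95},
]
x_labels, y_labels, z = heatmap_inputs(rows, "fiscal_qy")
```

Each entry of z is one cell of the heatmap, which is why all non-categorical columns need to hold numeric values.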

Parameters:
  • df (pyspark.sql.DataFrame | pandas.DataFrame | list[dict[str, Any]]) – Input data to visualize. Only one column can contain categorical values, while all other columns must be numerical in nature.

  • category_column_name (str) – Name of the categorical column in the df to plot on one axis of the heatmap.

  • z_axis (visualization_toolkit.helpers.plotly.charts.core.axis.Axis) – An axis object that defines how values within the heatmap should be formatted.

  • theme (dict) – Optionally changes the theme used to format the chart. Defaults to the YD_CLASSIC_THEME if not set.

  • colorscale (list[list[float, str]]) – Optionally changes the colorscale used to format heatmap cells. Defaults to POSITIVE_COLORSCALE of various shades of blue.

Returns:

Examples#

Example of creating a heatmap chart. Notice the z_axis is used to format cells in the heatmap with the axis_type argument.#
from visualization_toolkit.helpers.plotly import heatmap_chart, axis, series

fig = heatmap_chart(
    pdf,
    category_column_name="fiscal_qy",
    z_axis=axis(label="Downloads", axis_type="number"),
)

display(fig)
visualization_toolkit.helpers.plotly.waterfall_chart(df, y1_axis, totals=None, additions=None, subtractions=None, theme=YD_CLASSIC_THEME)[source]#

Create a waterfall chart that summarizes the effects of multiple positive or negative adjustments between a starting and an ending balance. The chart will automatically style the additions and subtractions series so that they bridge the starting value to the ending value in totals.

Data must be provided as a single row where the columns indicate the starting and ending totals along with any adjustments that are to be plotted.
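The arithmetic behind a waterfall can be sketched independently of the toolkit: each adjustment bar floats between the running balance before and after it, and the final bar shows the resulting total. waterfall_bars is a hypothetical helper and the numbers are illustrative.

```python
def waterfall_bars(start, adjustments):
    """Hypothetical helper: compute (label, base, top) for each waterfall bar."""
    bars = [("start", 0, start)]
    running = start
    for label, delta in adjustments:
        # Each adjustment bar spans the running balance before and after it.
        bars.append((label, running, running + delta))
        running += delta
    # The ending bar is anchored at zero and reaches the final balance.
    bars.append(("end", 0, running))
    return bars


bars = waterfall_bars(
    1000,
    [("expansion", 300), ("contraction", -150), ("churn", -50)],
)
```

A useful check before charting is that the starting total plus all adjustments equals the ending total in the input row; otherwise the bars will not line up with the ending balance.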

Parameters:
  • df (pyspark.sql.DataFrame | pandas.DataFrame | list[dict[str, Any]]) – Input data to visualize. The columns from this data will be used for the totals, additions, and subtractions series and the y1_axis.

  • y1_axis (visualization_toolkit.helpers.plotly.charts.core.axis.Axis) – The axis that the balances are plotted on

  • totals (list[visualization_toolkit.helpers.plotly.charts.core.series.Series]) – Two series containing the starting and ending balance for the data. The order matters so pass in the totals in the correct order.

  • additions (list[visualization_toolkit.helpers.plotly.charts.core.series.Series]) – Any number of series indicating positive adjustments to the starting balance.

  • subtractions (list[visualization_toolkit.helpers.plotly.charts.core.series.Series]) – Any number of series indicating negative adjustments to the starting balance.

  • theme (dict) – Optionally changes the theme used to format the chart. Defaults to the YD_CLASSIC_THEME if not set.

Returns:

Return type:

plotly.graph_objs.Figure

Examples#

Example of creating a waterfall chart. Notice the shape of data that is used. It is acceptable to pass data as a Spark dataframe, pandas dataframe, or a single-item list of dictionaries.#
from visualization_toolkit.helpers.plotly import waterfall_chart, axis, series

data = [
    {
        "prior_period": 186994,
        "expansion": 36691,
        "contraction": -36489,
        "churn": -40530,
        "ending_period": 146656,
    }
]

fig = waterfall_chart(
    data,
    y1_axis=axis(label="Spend", axis_type="currency"),
    totals=[
        series(
            column_name="prior_period",
            label="Prior Period",
        ),
        series(
            column_name="ending_period",
            label="Ending Period",
        ),
    ],
    additions=[
        series(
            column_name="expansion",
            label="Expansion",
        ),
    ],
    subtractions=[
        series(
            column_name="contraction",
            label="Contraction",
        ),
        series(
            column_name="churn",
            label="Churn",
        ),
    ],
)

display(fig)

Plotly Base Functions#

General utilities to build custom visualizations with plotly are found here. These include the lower-level functions that chart builds upon.

visualization_toolkit.helpers.plotly.generate_base_figure(height=None, theme=None)[source]#

Returns an empty Plotly figure that is pre-configured to match the YD branding and style. Traces added to this figure will generally respect the branding, but additional configuration may be needed.

Parameters:
  • height (int, default: None) – Optionally pass the starting height of the figure. Will default to standard layout height if not specified.

  • theme (Optional[dict], default: None) – Optionally set a standard theme from visualization_toolkit. Defaults to the classic YipitData report theme

Return type:

Figure

Examples#

Example of generating a base figure and then adding traces to it with plotly.#
from visualization_toolkit.helpers.plotly import generate_base_figure

# Generate figure and then add traces, shapes, etc. to the figure via plotly to create a visualization
fig = generate_base_figure()
fig.add_trace(...)
Use the ATLAS_THEME to customize styles for interactive visualizations on the portal.#
from visualization_toolkit.helpers.plotly import generate_base_figure
from visualization_toolkit.helpers.plotly.theme import ATLAS_THEME

# Generate figure and then add traces, shapes, etc. to the figure via plotly to create a visualization
fig = generate_base_figure(theme=ATLAS_THEME)
fig.add_trace(...)
visualization_toolkit.helpers.plotly.highlight_trace(fig, trace_index, selected_opacity=1, unselected_opacity=0.4)[source]#

Highlight a trace in a plotly figure. This works by setting the opacity of the selected trace to selected_opacity (default 1) and the opacity of all other traces to unselected_opacity (default 0.4).

Parameters:
  • fig (Figure) – Plotly figure to highlight a trace in

  • trace_index (int) – Index of the trace to highlight

  • selected_opacity (float, default: 1) – Opacity of the selected trace

  • unselected_opacity (float, default: 0.4) – Opacity of the unselected traces
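The opacity rule described above can be sketched without plotly installed. set_highlight is a hypothetical stand-in, and SimpleNamespace objects take the place of the traces found in a plotly figure's data; it illustrates the behavior rather than reproducing the toolkit's code.

```python
from types import SimpleNamespace


def set_highlight(traces, trace_index, selected_opacity=1, unselected_opacity=0.4):
    """Hypothetical stand-in: emphasize one trace by dimming the others."""
    for index, trace in enumerate(traces):
        # Only the trace at trace_index keeps the selected opacity.
        trace.opacity = selected_opacity if index == trace_index else unselected_opacity
    return traces


# SimpleNamespace objects stand in for the traces of a plotly figure.
traces = [SimpleNamespace(name=name, opacity=1) for name in ("a", "b", "c")]
set_highlight(traces, trace_index=1)
opacities = [trace.opacity for trace in traces]
```

Because trace_index is positional, it refers to the order in which traces were added to the figure.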

Returns:

Examples#

visualization_toolkit.helpers.plotly.disable_hover_labels(fig)[source]#

Disable hover labels for a plotly figure. This works by removing or hiding hover labels, spikelines, and hovertext. The hoverData parameter will still trigger dash callbacks.

Warning

When disabling hover labels, you will also need to set the following css attributes on a parent dash element:

& .hovertext {
    fill-opacity: 0;
    stroke-opacity: 0;
}
Parameters:

fig (Figure) – Plotly figure to disable hover labels for

Returns:

Examples#

Disable plotly hover labels for a figure#
from visualization_toolkit.helpers.plotly import disable_hover_labels, chart

# Assume a chart is created with series
fig = chart(...)

# Disable hover labels for the figure
disable_hover_labels(fig)

display(fig)