plotly module
The plotly module in the visualization_toolkit contains higher-order functions to generate
plotly charts that match company branding and visual best practices. Core charts such as line, bar, area, and
combo charts can be created using this module.
While good defaults are used to style charts, many options exist to customize the visualizations to craft effective visual content. The styling defaults are stored in the visualization_toolkit.helpers.plotly.theme module.
Core Charting Functions
Function-based building blocks for creating the majority of charting content.
- visualization_toolkit.helpers.plotly.chart(df, chart_series, x_axis, y1_axis=None, y2_axis=None, annotations=None, shaded_regions=None, include_logo=False, theme=None, custom_options=None)
Standard function to generate a plotly-based chart with standard company styling. The chart function works closely with the other toolkit building blocks for charts: axis, series, and annotations.
The input data to the chart can be passed in as a Spark dataframe, pandas dataframe, or a list of dictionaries. The input data is normalized before the remaining operations supply it to plotly to render the chart.
- Parameters:
df (pyspark.sql.DataFrame | pandas.DataFrame | list[dict[str, Any]]) – Input data to visualize. The columns from this data will be used for the chart_series and various axes.
chart_series (list[visualization_toolkit.helpers.plotly.charts.core.series.Series]) – A list of series instances that will be plotted on this chart figure.
x_axis (visualization_toolkit.helpers.plotly.charts.core.axis.Axis) – The axis configuration for the x-axis of this chart figure.
y1_axis (Optional[visualization_toolkit.helpers.plotly.charts.core.axis.Axis]) – The axis configuration for the y1-axis of this chart figure.
y2_axis (Optional[visualization_toolkit.helpers.plotly.charts.core.axis.Axis]) – The axis configuration for the y2-axis of this chart figure. By default, no y2 axis is included.
annotations (Optional[list[visualization_toolkit.helpers.plotly.charts.core.annotation.Annotation]]) – A list of annotations to include on the chart.
shaded_regions (Optional[list[visualization_toolkit.helpers.plotly.charts.core.shading.ShadeX | visualization_toolkit.helpers.plotly.charts.core.shading.ShadeY]]) – A list of shade_x and/or shade_y to include on the chart to shade regions.
include_logo (bool) – Optionally include the YipitData logo at the bottom right of the chart. Default is False (no logo added).
theme (Optional[dict]) – Optionally changes the theme used to format the chart. Defaults to the YD_CLASSIC_THEME if not set.
custom_options (Optional[dict]) – Optionally include any options to pass into figure.update_layout. This is an escape hatch for final adjustments to a chart.
- Returns:
Plotly figure object that can be displayed in databricks or supplied to a dash app as a property of the dcc.Graph component.
- Return type:
plotly.graph_objects.Figure
Examples
Common example of creating multiple series on a chart. Notice that each series corresponds to a different column on the dataset.

    from visualization_toolkit.helpers.plotly import chart, axis, series, annotation

    fig = chart(
        df,
        x_axis=axis(
            column_name="year",
            label="Year",
        ),
        chart_series=[
            series(
                column_name="australia",
                label="Australia",
            ),
            series(
                column_name="new_zealand",
                label="New Zealand",
            ),
        ],
        y1_axis=axis(
            label="Life Expectancy",
            axis_type="number",
        ),
    )
    display(fig)
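The description above says the input data is normalized before charting. The toolkit's actual normalization is not shown on this page, so the following is only a minimal sketch of one way it could work, assuming rows are reduced to a list of dictionaries. The helper names normalize_records and column_values are hypothetical; toPandas() and to_dict(orient="records") are the standard Spark and pandas conversion methods.

```python
from typing import Any


def normalize_records(data: Any) -> list[dict[str, Any]]:
    """Coerce supported inputs into a list of row dictionaries (hypothetical helper)."""
    # Spark DataFrames expose toPandas(); convert first so one code path remains.
    if hasattr(data, "toPandas"):
        data = data.toPandas()
    # pandas DataFrames expose to_dict(orient="records").
    if hasattr(data, "to_dict"):
        return data.to_dict(orient="records")
    # Otherwise expect an iterable of mappings; copy each row to avoid aliasing.
    return [dict(row) for row in data]


def column_values(records: list[dict[str, Any]], column_name: str) -> list[Any]:
    """Pull one column out of the normalized rows, using None for missing keys."""
    return [row.get(column_name) for row in records]


rows = normalize_records([
    {"year": 2002, "australia": 80.4},
    {"year": 2007, "australia": 81.2},
])
years = column_values(rows, "year")
```

Reducing every input to one shape means the series and axis logic only has to handle a single representation.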
- visualization_toolkit.helpers.plotly.axis(column_name=None, location='y1', label=None, axis_type=None, tick_format=None, tick_values=None, tick_labels=None, tick_interval=None, number_of_ticks=None, tick_angle=-45, axis_min=None, axis_max=None, currency_symbol='$', start_at_non_null_values=False, extra_options=<factory>)
Control axis behavior by using this function. Options are available to control the ticks, title, tick format, and overall range. Define an axis for each relevant axis of the chart (ex: one x-axis and one y-axis require two axis function calls).
Axis will attempt to look as good as possible with minimum customization needed. The function will identify the minimum and maximum bounds of the input data to ensure all data is inside the plotted frame.
In addition, the axis will try to generate tick values that are as even as possible given the dataset. If specific tick steps are preferred, specify the tick_interval and axis_min.
- Parameters:
column_name (str, default: None) – The column of the input data of a chart function call to use.
location (Literal['x', 'y1', 'y2'], default: 'y1') – Controls whether this axis is the x, y1, or y2 axis of the chart. This does not need to be specified, as the chart function will set this value automatically.
label (str, default: None) – Set an axis title for the chart. By default, no title is added.
axis_type (Literal['date', 'category', 'number', 'currency', 'percent'], default: None) – The type of the data on this axis. It is important to specify this value, as it will control the default axis behavior and formatting.
tick_format (str, default: None) – The numerical format to style tick values on the axis. If not specified, a default will be chosen based on the axis_type parameter.
tick_interval (int | float | relativedelta, default: None) – Tick values will be incremented by a standard value starting from the axis_min or axis_max.
axis_min (int | float | date, default: None) – The lowest point of the axis will be this value. By default, this value is automatically determined by the input data on this axis.
axis_max (int | float | date, default: None) – The greatest point of the axis will be this value. By default, this value is automatically determined by the input data on this axis.
number_of_ticks (Optional[int], default: None) – The total number of ticks will be this value. Default is 6 for y-axes and 12 for the x-axis.
tick_angle (int, default: -45) – Controls the angle at which ticks are placed on the axes. Only applies to the x-axis.
currency_symbol (str, default: '$') – The currency symbol prefixed to tick labels. Only applied if axis_type == 'currency' and no custom tick_format is used.
start_at_non_null_values (bool, default: False) – Used for x-axes. When set to True, filters the data to start at the first available data point that has at least one non-null column based on the series plotted for a chart.
extra_options (dict, default: <factory>) – Additional options to pass to the axis figure object in Plotly.
tick_values (list[int | float | str])
tick_labels (list[str])
- Return type:
None
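The axis description mentions generating evenly spaced tick values from the data bounds, optionally driven by tick_interval and axis_min. The toolkit's real algorithm is not documented here; the sketch below is one plausible reading for numeric axes only, and the function name even_ticks is hypothetical.

```python
import math


def even_ticks(data, number_of_ticks=6, tick_interval=None, axis_min=None):
    """Return evenly spaced ticks that cover the data (hypothetical sketch)."""
    low = min(data) if axis_min is None else axis_min
    high = max(data)
    if tick_interval is None:
        # Split the range into (number_of_ticks - 1) equal steps.
        tick_interval = (high - low) / (number_of_ticks - 1)
    if tick_interval <= 0:
        # Flat data: a single tick is all that can be produced.
        return [low]
    # Number of steps needed to reach or pass the data maximum; the small
    # epsilon guards against floating point noise on exact multiples.
    steps = math.ceil((high - low) / tick_interval - 1e-9)
    return [low + i * tick_interval for i in range(steps + 1)]


ticks_by_interval = even_ticks([3, 47], tick_interval=10, axis_min=0)
ticks_by_count = even_ticks([0, 100], number_of_ticks=6)
```

With an explicit interval the last tick may sit above the data maximum, which keeps every point inside the plotted frame.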
Examples
Example of creating axes and using them in a chart function. Notice each axis is assigned to the x/y1/y2 position in the chart function.

    from visualization_toolkit.helpers.plotly import chart, axis, series, annotation

    fig = chart(
        df,
        x_axis=axis(
            column_name="year",
            label="Year",
        ),
        chart_series=[
            series(
                column_name="lifeExp",
                category_name="country",
                location="y1",
            ),
        ],
        y1_axis=axis(
            label="Life Expectancy",
            axis_type="number",
        ),
    )
    display(fig)
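The start_at_non_null_values option is described as filtering data to the first point where at least one plotted column is non-null. As a rough illustration of that rule (not the toolkit's code), the hypothetical helper below trims leading rows from a list of row dictionaries.

```python
def trim_leading_nulls(rows, series_columns):
    """Drop leading rows where every plotted column is None (hypothetical helper)."""
    for index, row in enumerate(rows):
        # Keep everything from the first row with any non-null series value.
        if any(row.get(column) is not None for column in series_columns):
            return rows[index:]
    # No row had data for the plotted series.
    return []


sample = [
    {"year": 1950, "a": None, "b": None},
    {"year": 1955, "a": None, "b": 2.0},
    {"year": 1960, "a": None, "b": None},
]
trimmed = trim_leading_nulls(sample, ["a", "b"])
```

Only leading empty rows are removed in this sketch; later gaps stay in place.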
- visualization_toolkit.helpers.plotly.series(column_name=None, category_name=None, color=None, color_scale='default', location='y1', label=None, mode='lines', shape=None, hover_format=None, y_data=None, is_stacked=False, connect_gaps=False, show_in_legend=True, shade_series=None, pivot_column_name=None, include_all_categories=False, category_sort_column_name=None, color_mapping=None, extra_options=<factory>)
A series is used to define a line, bar, or area plot on a graph. Each series represents one column of the dataset. If multiple pivoted “series” exist on a column, then the category_name argument can be used to generate or specify each pivoted series.
Each series will be colored based on the company colors if not otherwise specified. Series are by default line plots but can be customized via the mode attribute.
Series will always be on the x-axis and one of the y-axes, specified by location.
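To make the idea of pivoted series concrete, here is a small sketch of how long-format rows could be split into one x/y series per category. This is an illustration only; the function name split_by_category and the returned structure are assumptions, not the toolkit's internals.

```python
from collections import defaultdict


def split_by_category(rows, x_column, y_column, pivot_column):
    """Group long-format rows into one x/y series per category (hypothetical)."""
    series_by_category = defaultdict(lambda: {"x": [], "y": []})
    for row in rows:
        # Each distinct value in the pivot column becomes its own series.
        trace = series_by_category[row[pivot_column]]
        trace["x"].append(row[x_column])
        trace["y"].append(row[y_column])
    return dict(series_by_category)


long_rows = [
    {"year": 2002, "country": "Australia", "lifeExp": 80.4},
    {"year": 2002, "country": "New Zealand", "lifeExp": 79.1},
    {"year": 2007, "country": "Australia", "lifeExp": 81.2},
]
pivoted = split_by_category(long_rows, "year", "lifeExp", "country")
```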
- Parameters:
column_name (str, default: None) – The column name for the series y-values based on the input data for the chart function.
category_name (str, default: None) – The column name for a categorical value within pivoted chart data to be used for this specific series. By default, this value is None, and should only be used if pivoting is enabled via pivot_column_name and include_all_categories=False.
color (str, default: None) – The color of the plot for this series. By default, the color is automatically determined based on company colors.
location (Literal['y1', 'y2'], default: 'y1') – The Y-axis location of the series. Default is the Y1 axis.
label (str, default: None) – The series legend label. Default is the column_name for the series. Legends are only displayed when multiple series exist on a chart.
mode (Literal['lines', 'bar', 'area', 'lines+markers', 'markers', 'clustered_bar', 'line', 'line+marker', 'marker', 'scatter'], default: 'lines') – The type of plot for plotly (ex: lines, bar, area, lines+markers, markers) used for the series. Default is “lines” for a line chart.
shape (Optional[Literal['dash', 'spline', 'dot', 'stripe']], default: None) – The shape (ex: dashed, dotted, striped fill) of the plot, which behaves differently based on the mode. Default is a solid series plot.
hover_format (str, default: None) – Control the data label format when hovering over this series. Default is a standard format based on the corresponding axis’ axis_type.
is_stacked (bool, default: False) – Flag to control if pivoted chart data should be used to generate multiple series dynamically in the chart function. Default is False; it should not be used unless pivoting is used in the chart function.
connect_gaps (bool, default: False) – If True, the series will be plotted across null or missing data. Default is False, i.e. the series is not plotted for null values.
shade_series (Optional[ShadeSeries], default: None) – If specified, a shaded region will be applied to the series.
pivot_column_name (str, default: None) – Column name of the input df to pivot, where each category in the pivot column can then be used as a separate series of the chart.
include_all_categories (bool, default: False) – If True, all categories will be expanded as separate series automatically instead of needing to specify series individually. This is meant to be a convenience flag, but will limit the format customization options available. The default is False.
category_sort_column_name (Optional[str], default: None) – If specified, the column name will be used to aggregate the unique categories and sort them in the legend. To specify a descending order, prefix with a -, ex: category_sort_column_name="-yd_income_bucket". Default behavior is alphabetical sorting based on category name. This field is only used when include_all_categories=True. When False, the series are ordered by their position in the chart_series list passed to the chart function.
color_mapping (Optional[dict[str, str]], default: None) – Optional color mapping of key names to color labels, hexcodes, or rgba values. The keys should be either the column_name or category_name of the series.
extra_options (dict, default: <factory>) – Additional options to be passed to the Plotly figure object to format this series.
color_scale (Literal['default', 50, 100, 200, 300, 400, 500, 600, 700, 800, 900])
y_data (Series)
show_in_legend (bool)
- Return type:
None
Examples
Common example of creating multiple series on a chart. Notice that each series corresponds to a different column on the dataset.

    from visualization_toolkit.helpers.plotly import chart, axis, series, annotation

    fig = chart(
        df,
        x_axis=axis(
            column_name="year",
            label="Year",
        ),
        chart_series=[
            series(
                column_name="australia",
                label="Australia",
            ),
            series(
                column_name="new_zealand",
                label="New Zealand",
            ),
        ],
        y1_axis=axis(
            label="Life Expectancy",
            axis_type="number",
        ),
    )
    display(fig)

Alternative example of creating series from pivoted data using category_name together with pivot_column_name. In this case each country of the input data is plotted as a separate series.

    from visualization_toolkit.helpers.plotly import chart, axis, series, annotation

    fig = chart(
        df,
        x_axis=axis(
            column_name="year",
            label="Year",
        ),
        chart_series=[
            series(
                column_name="lifeExp",
                category_name="New Zealand",
            ),
            series(
                column_name="lifeExp",
                category_name="Australia",
            ),
        ],
        y1_axis=axis(
            label="Life Expectancy",
            axis_type="number",
        ),
        pivot_column_name="country",
    )
    display(fig)
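The category_sort_column_name parameter describes sorting categories by an aggregated column, with a leading - for descending order and alphabetical order as the fallback. The aggregation the toolkit uses is not stated, so this sketch assumes a sum; the function name sorted_categories is hypothetical.

```python
from collections import defaultdict


def sorted_categories(rows, category_column, sort_column=None):
    """Order categories alphabetically or by a summed column (hypothetical).

    A leading "-" on sort_column requests descending order.
    """
    if sort_column is None:
        # Fallback: alphabetical order of the unique category names.
        return sorted({row[category_column] for row in rows})
    descending = sort_column.startswith("-")
    column = sort_column[1:] if descending else sort_column
    totals = defaultdict(float)
    for row in rows:
        # Assumption: categories are aggregated with a sum.
        totals[row[category_column]] += row[column]
    return sorted(totals, key=totals.get, reverse=descending)


bucket_rows = [
    {"bucket": "b", "rank": 1},
    {"bucket": "a", "rank": 5},
    {"bucket": "b", "rank": 1},
]
ascending = sorted_categories(bucket_rows, "bucket", "rank")
descending_order = sorted_categories(bucket_rows, "bucket", "-rank")
```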
- visualization_toolkit.helpers.plotly.shade_series(boundary_column_names, label=None, color=None, opacity=0.15)
Adds a shaded area to an existing series instance for a chart. The shaded area is based on two columns that need to be present in the input dataframe and are specified via the boundary_column_names argument as a tuple.
- Parameters:
boundary_column_names ((str, str)) – Tuple of two columns that indicate the shaded boundary for the series. Should be in the format of (lower column, upper column).
label (str, default: None) – Label for the shaded range that should be used in the legend and while hovering. When hovering, the label will be suffixed with Lower Bound and Upper Bound to indicate the range of the shaded area.
color (str, default: None) – Optional fill color of the shaded region. Defaults to parent series color.
opacity (float, default: 0.15) – Optional opacity of the shaded region fill color. Defaults to standard theme opacity.
- Return type:
None
Examples
Example of adding a shaded area around a line chart. The columns yy, lower_bound and upper_bound must all exist on the input dataframe.

    from visualization_toolkit.helpers.plotly import chart, axis, series, shade_series

    fig = chart(
        df,
        x_axis=axis(column_name="fiscal_qy", label="Fiscal Quarter"),
        y1_axis=axis(label="Downloads Growth Rate", axis_type="percent"),
        chart_series=[
            series(
                column_name="yy",
                label="Y/Y Growth",
                color="dark-blue",
                shade_series=shade_series(
                    boundary_column_names=("lower_bound", "upper_bound"),
                    label="Margin of Error",
                    color="light-blue",
                    opacity=.3,
                ),
            ),
        ],
    )
    display(fig)
- visualization_toolkit.helpers.plotly.annotation(text, value, axis_location=None, position='top right', color=None, line_shape='dot', extra_options=<factory>)
Define text-based annotations to be rendered on charts. Depending on the axis_location set, the annotation will include a horizontal or vertical line. The position of the annotation is controlled by value, which is the point on the corresponding axis where the annotation is desired.
The annotation uses either the Plotly hline, vline, or annotation object depending on the axis_location specified.
- Parameters:
text (str) – Text that will be rendered on the annotation.
value (Union[int, float, date, datetime, Tuple[int | float | date | datetime, int | float | date | datetime]]) – The value on the corresponding axis where the annotation will be placed. If no axis_location is set, a floating annotation position can be specified by a tuple (x, y) pair for this argument.
axis_location (Optional[Literal['x', 'y1', 'y2']], default: None) – Specify whether the annotation should be oriented by the x axis or the y1 / y2 axis. If None, a floating annotation is used. Default None.
position (Literal['bottom right', 'bottom center', 'bottom left', 'middle right', 'middle center', 'middle left', 'top right', 'top center', 'top left'], default: 'top right') – The position of the annotation text relative to the drawn line for the annotation. If a floating annotation is desired, this value has no effect. Default is top-right of the annotation line.
color (str, default: None) – The color of the annotation line and text. If None, the standard theme is used. Default None.
line_shape (Optional[Literal['dash', 'dot', 'dashdot']], default: 'dot') – Control if the annotation line is drawn with dashes, dots, or both. Default is dot.
extra_options (dict, default: <factory>) – Additional options to style the underlying Plotly figure object.
- Return type:
None
Examples
Add a vertical annotation on a line chart. The annotation would be placed where 1970 falls on the x-axis.

    from visualization_toolkit.helpers.plotly import chart, axis, series, annotation

    fig = chart(
        df,
        x_axis=axis(
            column_name="year",
            label="Year",
        ),
        chart_series=[
            series(
                column_name="lifeExp",
                category_name="country",
                location="y1",
            ),
        ],
        y1_axis=axis(
            label="Life Expectancy",
            axis_type="number",
        ),
        annotations=[
            annotation(
                text="Example Annotation",
                value=1970,
                axis_location="x",
            ),
        ],
    )
    display(fig)
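The annotation description says the choice between a Plotly hline, vline, or floating annotation depends on axis_location. The sketch below shows one way that dispatch might look. It returns the name of a Plotly Figure method (add_vline, add_hline, and add_annotation are real Figure methods) plus keyword arguments; the function annotation_call itself and the exact keyword set are assumptions, and targeting the y2 axis is left out.

```python
def annotation_call(text, value, axis_location=None):
    """Pick a Plotly Figure method and keyword arguments (hypothetical mapping)."""
    if axis_location == "x":
        # A value on the x-axis becomes a vertical line.
        return "add_vline", {"x": value, "annotation_text": text}
    if axis_location in ("y1", "y2"):
        # A value on a y-axis becomes a horizontal line.
        return "add_hline", {"y": value, "annotation_text": text}
    # No axis: value is an (x, y) pair for a floating annotation.
    x, y = value
    return "add_annotation", {"x": x, "y": y, "text": text}


method_name, kwargs = annotation_call("Example Annotation", 1970, axis_location="x")
```

With a real figure, the result could be applied as getattr(fig, method_name)(**kwargs).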
- visualization_toolkit.helpers.plotly.shade_x(start, end, color='#b8b8b8')
Add a shaded rectangle region that spans vertically on the chart between the start and end values on the x-axis.
- Parameters:
start (int | float | date | datetime) – The starting value where the shaded region begins on the x-axis.
end (int | float | date | datetime) – The ending value where the shaded region ends on the x-axis.
color (str, default: '#b8b8b8') – Fill color of the shaded region. Defaults to light-grey.
- Return type:
None
Examples
Example of creating a chart with shaded vertical areas. Note that the start and end values correspond to the x-axis.

    from visualization_toolkit.helpers.plotly import chart, axis, series, annotation, shade_x

    fig = chart(
        df,
        x_axis=axis(
            column_name="year",
            label="Year",
        ),
        chart_series=[
            series(
                column_name="lifeExp",
                category_name="country",
                location="y1",
            ),
        ],
        y1_axis=axis(
            label="Life Expectancy",
            axis_type="number",
        ),
        shaded_regions=[
            shade_x(start=1950, end=1955, color="green"),
            shade_x(start=1970, end=1975, color="red"),
        ],
    )
    display(fig)
- visualization_toolkit.helpers.plotly.shade_y(start, end, color='#b8b8b8', location='y1')
Add a shaded rectangle region that spans horizontally on the chart between the start and end values on the y-axis.
- Parameters:
start (int | float | date | datetime) – The starting value where the shaded region begins on the y-axis.
end (int | float | date | datetime) – The ending value where the shaded region ends on the y-axis.
color (str, default: '#b8b8b8') – Fill color of the shaded region. Defaults to light-grey.
location (Literal['y1', 'y2'])
- Return type:
None
Examples
Example of creating a chart with shaded horizontal areas. Note that the start and end values correspond to the y-axis.

    from visualization_toolkit.helpers.plotly import chart, axis, series, annotation, shade_y

    fig = chart(
        df,
        x_axis=axis(
            column_name="year",
            label="Year",
        ),
        chart_series=[
            series(
                column_name="lifeExp",
                category_name="country",
                location="y1",
            ),
        ],
        y1_axis=axis(
            label="Life Expectancy",
            axis_type="number",
        ),
        shaded_regions=[
            shade_y(start=50, end=55, color="green"),
            shade_y(start=70, end=75, color="red"),
        ],
    )
    display(fig)
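For readers who want to see what a shaded region amounts to in Plotly terms, the sketch below builds layout shape dictionaries for the shade_x and shade_y cases. The keys (type, xref, yref, x0, x1, y0, y1, fillcolor, line, layer) are standard Plotly layout.shapes properties, but whether the toolkit builds its regions this way is an assumption, and the function names are hypothetical.

```python
def shade_x_shape(start, end, color="#b8b8b8"):
    """Rectangle spanning the full plot height between two x values (sketch)."""
    return {
        "type": "rect",
        "xref": "x", "x0": start, "x1": end,
        # "paper" coordinates run from 0 to 1 across the plotting area.
        "yref": "paper", "y0": 0, "y1": 1,
        "fillcolor": color, "line": {"width": 0}, "layer": "below",
    }


def shade_y_shape(start, end, color="#b8b8b8", location="y1"):
    """Rectangle spanning the full plot width between two y values (sketch)."""
    return {
        "type": "rect",
        "xref": "paper", "x0": 0, "x1": 1,
        # Plotly names the primary y-axis "y" and the secondary one "y2".
        "yref": "y2" if location == "y2" else "y", "y0": start, "y1": end,
        "fillcolor": color, "line": {"width": 0}, "layer": "below",
    }


vertical_band = shade_x_shape(1950, 1955, color="green")
horizontal_band = shade_y_shape(50, 55, color="red", location="y2")
```

Dictionaries of this shape can be passed to a Plotly figure with fig.add_shape(**shape).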
Advanced Charting Functions
Specific functions to create custom chart types such as waterfall charts or heatmaps.
- visualization_toolkit.helpers.plotly.heatmap_chart(df, category_column_name, z_axis, theme=YD_CLASSIC_THEME, colorscale=POSITIVE_COLORSCALE)
Create a heatmap visualization to compare two sets of categories in a table-like view.
Cells in the heatmap are colored based on their value with respect to the colorscale provided. Cell values are formatted based on the z_axis provided. A category_column_name indicates which column contains one set of categories to compare. It is assumed all other columns represent the opposite set of categories to be compared.
The data must have one categorical column with all other columns each representing a category to be compared. Values within each category pair will appear in the heatmap as a cell.
- Parameters:
df (pyspark.sql.DataFrame | pandas.DataFrame | list[dict[str, Any]]) – Input data to visualize. Only one column can include categorical values while all other columns must be numerical in nature.
category_column_name (str) – Name of the categorical column in the df to plot on one axis of the heatmap.
z_axis (visualization_toolkit.helpers.plotly.charts.core.axis.Axis) – An axis object that defines how values within the heatmap should be formatted.
theme (dict) – Optionally changes the theme used to format the chart. Defaults to the YD_CLASSIC_THEME if not set.
colorscale (list[list[float, str]]) – Optionally changes the colorscale used to format heatmap cells. Defaults to POSITIVE_COLORSCALE of various shades of blue.
- Returns:
Examples
Example of creating a heatmap chart. Notice the z_axis is used to format cells in the heatmap with the axis_type argument.

    from visualization_toolkit.helpers.plotly import heatmap_chart, axis, series

    fig = heatmap_chart(
        pdf,
        category_column_name="fiscal_qy",
        z_axis=axis(label="Downloads", axis_type="number"),
    )
    display(fig)
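The data layout heatmap_chart expects (one categorical column, every other column a category to compare) can be illustrated by splitting wide rows into two label lists and a matrix of cell values. This is a sketch of the data shape only; the helper name heatmap_matrix and the sample column names are hypothetical.

```python
def heatmap_matrix(rows, category_column_name):
    """Split wide rows into axis labels and a matrix of cell values (sketch)."""
    if not rows:
        return [], [], []
    # Every column except the categorical one is treated as a compared category.
    value_columns = [name for name in rows[0] if name != category_column_name]
    categories = [row[category_column_name] for row in rows]
    z = [[row[name] for name in value_columns] for row in rows]
    return categories, value_columns, z


wide_rows = [
    {"fiscal_qy": "1Q23", "app_a": 10, "app_b": 4},
    {"fiscal_qy": "2Q23", "app_a": 12, "app_b": 7},
]
row_labels, column_labels, cell_values = heatmap_matrix(wide_rows, "fiscal_qy")
```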
- visualization_toolkit.helpers.plotly.waterfall_chart(df, y1_axis, totals=None, additions=None, subtractions=None, theme=YD_CLASSIC_THEME)
Create a Waterfall chart that can summarize the effects of multiple positive or negative adjustments to a starting and ending balance. The chart will automatically style additions and subtractions series so that the starting value from totals equals the ending value of totals.
Data must be provided as a single row where the columns indicate the starting and ending totals along with any adjustments that are to be plotted.
- Parameters:
df (pyspark.sql.DataFrame | pandas.DataFrame | list[dict[str, Any]]) – Input data to visualize. The columns from this data will be used for the totals, additions, and subtractions series and the y1_axis.
y1_axis (visualization_toolkit.helpers.plotly.charts.core.axis.Axis) – The axis that the balances are plotted on.
totals (list[visualization_toolkit.helpers.plotly.charts.core.series.Series]) – Two series containing the starting and ending balance for the data. The order matters, so pass in the totals in the correct order.
additions (list[visualization_toolkit.helpers.plotly.charts.core.series.Series]) – Any number of series indicating positive adjustments to the starting balance.
subtractions (list[visualization_toolkit.helpers.plotly.charts.core.series.Series]) – Any number of series indicating negative adjustments to the starting balance.
theme (dict) – Optionally changes the theme used to format the chart. Defaults to the YD_CLASSIC_THEME if not set.
- Returns:
- Return type:
plotly.graph_objs.Figure
Examples
Example of creating a waterfall chart. Notice the shape of data that is used. It is acceptable to pass data as a Spark dataframe, pandas dataframe, or a single-item list of dictionaries.

    from visualization_toolkit.helpers.plotly import waterfall_chart, axis, series

    data = [
        {
            "prior_period": 186994,
            "expansion": 36691,
            "contraction": -36489,
            "churn": -40530,
            "ending_period": 146656,
        }
    ]
    fig = waterfall_chart(
        data,
        y1_axis=axis(label="Spend", axis_type="currency"),
        totals=[
            series(
                column_name="prior_period",
                label="Prior Period",
            ),
            series(
                column_name="ending_period",
                label="Ending Period",
            ),
        ],
        additions=[
            series(
                column_name="expansion",
                label="Expansion",
            ),
        ],
        subtractions=[
            series(
                column_name="contraction",
                label="Contraction",
            ),
            series(
                column_name="churn",
                label="Churn",
            ),
        ],
    )
    display(fig)
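The arithmetic behind a waterfall chart is a running balance: each adjustment bar starts where the previous one ended. The sketch below shows that bookkeeping with made-up numbers; the function name waterfall_steps and the tuple layout are hypothetical and not taken from the toolkit.

```python
def waterfall_steps(starting_balance, adjustments):
    """Return (label, base, delta) per adjustment plus the ending balance (sketch)."""
    steps = []
    balance = starting_balance
    for label, delta in adjustments:
        # Each bar floats from the running balance by its signed delta.
        steps.append((label, balance, delta))
        balance += delta
    return steps, balance


steps, ending_balance = waterfall_steps(
    100,
    [("Expansion", 30), ("Contraction", -20), ("Churn", -10)],
)
```

Comparing the computed ending balance with the supplied ending total is a quick sanity check on the input row.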
Plotly Base Functions
General utilities to build custom visualizations with plotly are found here. These include lower-level functions that chart builds upon.
- visualization_toolkit.helpers.plotly.generate_base_figure(height=None, theme=None)
Returns an empty Plotly figure that is pre-configured to match the YD branding and style. Traces added to this figure will generally respect the branding, but additional configuration may be needed.
- Parameters:
height (int, default: None) – Optionally pass the starting height of the figure. Will default to standard layout height if not specified.
theme (Optional[dict], default: None) – Optionally set a standard theme from visualization_toolkit. Defaults to the classic YipitData report theme.
- Return type:
Figure
Examples
Example of generating a base figure and then adding traces to it with plotly.

    from visualization_toolkit.helpers.plotly import generate_base_figure

    # Generate figure and then add traces, shapes, etc. to the figure via plotly to create a visualization
    fig = generate_base_figure()
    fig.add_trace(...)

Use the ATLAS_THEME to customize styles for interactive visualizations on the portal.

    from visualization_toolkit.helpers.plotly import generate_base_figure
    from visualization_toolkit.helpers.plotly.theme import ATLAS_THEME

    # Generate figure and then add traces, shapes, etc. to the figure via plotly to create a visualization
    fig = generate_base_figure(theme=ATLAS_THEME)
    fig.add_trace(...)
- visualization_toolkit.helpers.plotly.highlight_trace(fig, trace_index, selected_opacity=1, unselected_opacity=0.4)
Highlight a trace in a plotly figure. This works by setting the opacity of the selected trace to selected_opacity (1 by default) and the opacity of the other traces to unselected_opacity (0.4 by default).
- Parameters:
fig (Figure) – Plotly figure to highlight a trace in.
trace_index (int) – Index of the trace to highlight.
selected_opacity (float, default: 1) – Opacity of the selected trace.
unselected_opacity (float, default: 0.4) – Opacity of the unselected traces.
- Returns:
Examples
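No example is given for highlight_trace, so the following sketch only illustrates the opacity rule described above. Traces are modeled as plain dictionaries to keep the snippet dependency-free; with a real Plotly figure the traces live in fig.data and carry an opacity attribute. The function name apply_highlight is hypothetical.

```python
def apply_highlight(traces, trace_index, selected_opacity=1, unselected_opacity=0.4):
    """Return copies of the traces with opacity set by selection (sketch)."""
    return [
        # Only the trace at trace_index keeps the selected opacity.
        {**trace, "opacity": selected_opacity if index == trace_index else unselected_opacity}
        for index, trace in enumerate(traces)
    ]


example_traces = [{"name": "Australia"}, {"name": "New Zealand"}, {"name": "Fiji"}]
highlighted = apply_highlight(example_traces, trace_index=1)
```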
- visualization_toolkit.helpers.plotly.disable_hover_labels(fig)
Disable hover labels for a plotly figure. This works by removing or hiding hover labels, spikelines, and hovertext. The hoverData parameter will still trigger dash callbacks.
Warning
When disabling hover labels, you will also need to set the following css attributes on a parent dash element:
& .hovertext { fill-opacity: 0; stroke-opacity: 0; }
- Parameters:
fig (Figure) – Plotly figure to disable hover labels for.
- Returns:
Examples
Disable plotly hover labels for a figure.

    from visualization_toolkit.helpers.plotly import disable_hover_labels, chart

    # Assume a chart is created with series
    fig = chart(...)

    # Disable hover labels for the figure
    disable_hover_labels(fig)
    display(fig)