bar_chart_race API documentation

Package bar_chart_race

from ._make_chart import bar_chart_race, load_dataset, prepare_wide_data, prepare_long_data
__version__ = "0.0.7"

__all__ = ['bar_chart_race', 'load_dataset', 'prepare_wide_data', 'prepare_long_data']

Functions

def bar_chart_race(df, filename=None, orientation='h', sort='desc', n_bars=None, fixed_order=False, fixed_max=False, steps_per_period=10, period_length=500, interpolate_period=False, label_bars=True, bar_size=0.95, period_label=True, period_fmt=None, period_summary_func=None, perpendicular_bar_func=None, figsize=(6, 3.5), cmap='dark24', title=None, title_size=None, bar_label_size=7, tick_label_size=7, shared_fontdict=None, scale='linear', writer=None, fig=None, dpi=144, bar_kwargs=None, filter_column_colors=False)

Create an animated bar chart race using matplotlib. Data must be in 'wide' format where each row represents a single time period and each column represents a distinct category. Optionally, the index can label the time period.

Bar height and location change linearly from one time period to the next.

If no filename is given, an HTML string is returned; otherwise, the animation is saved to disk.

You must have ffmpeg installed on your machine to save files to disk. Get ffmpeg here: https://www.ffmpeg.org/download.html

To save .gif files you'll need to install ImageMagick.

This is resource intensive - Start with just a few rows of data to test.
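As a sketch of the 'wide' layout described above, the following builds a tiny DataFrame by hand with pandas. The category names and numbers are made up for illustration; only the shape matters: one row per period, one column per category, and an index that labels the period.

```python
import pandas as pd

# Hypothetical data: three periods (rows) and three categories (columns).
df = pd.DataFrame(
    {
        "A": [10, 25, 40],
        "B": [30, 20, 35],
        "C": [5, 15, 50],
    },
    index=pd.to_datetime(["2020-03-29", "2020-03-30", "2020-03-31"]),
)

# Each row is one time period; each column is one bar in the race.
n_periods, n_categories = df.shape
print(df)
```

A frame shaped like this could then be passed as df, starting with only a few rows as suggested above.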

Parameters

df : pandas DataFrame
Must be a 'wide' DataFrame where each row represents a single period of time. Each column contains the values of the bars for that category. Optionally, use the index to label each time period. The index can be of any type.
filename : None or str, default None
If None return animation as an HTML5 string. If a string, save animation to that filename location. Use .mp4, .gif, .html, .mpeg, .mov and any other extensions supported by ffmpeg or ImageMagick.
orientation : 'h' or 'v', default 'h'
Bar orientation - horizontal or vertical
sort : 'desc' or 'asc', default 'desc'
Choose how to sort the bars. Use 'desc' to put largest bars on top and 'asc' to place largest bars on bottom.
n_bars : int, default None
Choose the maximum number of bars to display on the graph. By default, use all bars. New bars entering the race will appear from the edge of the axes.
fixed_order : bool or list, default False
When False, bar order changes every time period to correspond with sort. When True, bars remain fixed according to their final value corresponding with sort. Otherwise, provide a list of the exact order of the categories for the entire duration.
fixed_max : bool, default False

Whether to fix the maximum value of the axis containing the values. When False, the axis for the values will have its maximum (xlim/ylim) just after the largest bar of the current time period. The axis maximum will change along with the data.

When True, the maximum axis value will remain constant for the duration of the animation. For example, in a horizontal bar chart, if the largest bar has a value of 100 for the first time period and 10,000 for the last time period, the xlim maximum will be 10,000 for each frame.
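The difference between the two settings can be sketched with plain pandas. The data is hypothetical, and the exact amount of padding the library places beyond the largest bar is not documented here, so only the raw maxima are computed.

```python
import pandas as pd

# Hypothetical wide data whose largest bar grows from 100 to 10,000.
df = pd.DataFrame({"A": [100, 2_000, 10_000], "B": [50, 3_000, 8_000]})

# fixed_max=False: the axis limit follows the largest bar of each period.
per_period_max = df.max(axis=1)

# fixed_max=True: one limit, the largest value anywhere in the data.
overall_max = df.to_numpy().max()

print(per_period_max.tolist())  # [100, 3000, 10000]
print(overall_max)              # 10000
```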

steps_per_period : int, default 10
The number of steps to go from one time period to the next. The bars will grow linearly between each period.
period_length : int, default 500
Number of milliseconds to animate each period (row). Default is 500ms (half of a second)
interpolate_period : bool, default False
Whether to interpolate the period. Only valid for datetime or numeric indexes. When set to True, for example, the two consecutive periods 2020-03-29 and 2020-03-30 with steps_per_period set to 4 would yield a new index of 2020-03-29 00:00:00, 2020-03-29 06:00:00, 2020-03-29 12:00:00, 2020-03-29 18:00:00, 2020-03-30 00:00:00.
label_bars : bool, default True
Whether to label the bars with their value on their right
bar_size : float, default .95
Height/width of bars for horizontal/vertical bar charts. Use a number between 0 and 1. It represents the fraction of space that each bar takes up. When equal to 1, no gap remains between the bars.
period_label : bool or dict, default True

If True or dict, use the index as a large text label on the axes whose value changes

Use a dictionary to supply the exact position of the period along with any valid parameters of the matplotlib text method. At a minimum, you must supply both 'x' and 'y' in axes coordinates

Example: { 'x': .99, 'y': .8, 'ha': 'right', 'va': 'center' }

If False - don't place label on axes

period_fmt : str, default None

Either a string with date directives or a new-style (Python 3.6+) formatted string

For a string with a date directive, find the complete list here https://docs.python.org/3/library/datetime.html#strftime-and-strptime-format-codes

Example of a string with date directives: '%B %d, %Y' will change 2020/03/29 to March 29, 2020.

For a new-style formatted string, use curly braces and the variable x, which will be passed the current period's index value. Example: 'Period {x:10.2f}'

Date directives will only be used for datetime indexes.
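Both kinds of format string can be tried out with the standard library alone. This is only a sketch of what each string produces; how the library picks between the two styles is not shown here, and the month name assumes an English locale.

```python
from datetime import datetime

period = datetime(2020, 3, 29)

# Date directives, as used for a datetime index.
date_label = period.strftime("%B %d, %Y")

# New-style format string; `x` stands for the current period's index value.
numeric_label = "Period {x:10.2f}".format(x=3.14159)

print(date_label)           # March 29, 2020
print(repr(numeric_label))  # 'Period       3.14'
```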

period_summary_func : function, default None

Custom text added to the axes each period. Create a user-defined function that accepts two pandas Series of the current time period's values and ranks. It must return a dictionary containing at a minimum the keys "x", "y", and "s" which will be passed to the matplotlib text method.

Example:

    def func(values, ranks):
        total = values.sum()
        s = f'Worldwide deaths: {total}'
        return {'x': .85, 'y': .2, 's': s, 'ha': 'right', 'size': 11}

perpendicular_bar_func : function or str, default None

Creates a single bar perpendicular to the main bars that spans the length of the axis.

Use either a string that the DataFrame agg method understands or a user-defined function.

DataFrame strings - 'mean', 'median', 'max', 'min', etc.

The function is passed two pandas Series of the current time period's data and ranks. It must return a single value.

    def func(values, ranks):
        return values.quantile(.75)
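To see what the two callbacks return, they can be called by hand on a sample period. The Series below are hypothetical stand-ins; the exact form of the ranks the library passes in is an assumption.

```python
import pandas as pd

def summary(values, ranks):
    # Return the keyword arguments for matplotlib's text method.
    total = values.sum()
    return {"x": .85, "y": .2, "s": f"Total: {total}", "ha": "right", "size": 11}

def upper_quartile(values, ranks):
    # Return a single number: where the perpendicular bar is drawn.
    return values.quantile(.75)

# Hypothetical values and ranks for one time period.
values = pd.Series({"A": 10, "B": 20, "C": 30, "D": 40})
ranks = values.rank(ascending=False)

print(summary(values, ranks)["s"])    # Total: 100
print(upper_quartile(values, ranks))  # 32.5
```

Functions like these would be passed as period_summary_func and perpendicular_bar_func respectively.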

figsize : two-item tuple of numbers, default (6, 3.5)
matplotlib figure size in inches. Will be overridden if figure supplied to fig.
cmap : str, matplotlib colormap instance, or list of colors, default 'dark24'
Colors to be used for the bars. All matplotlib and plotly colormaps are available by string name. Colors will repeat if there are more bars than colors.
title : str, default None
Title of plot
title_size : number or str, default plt.rcParams['axes.titlesize']
Size in points of title or relative size str. See Font Help below.
bar_label_size : number or str, default 7
Size in points or relative size str of numeric labels just outside of the bars. See Font Help below.
tick_label_size : number or str, default 7
Size in points of tick labels. See Font Help below.
shared_fontdict : dict, default None

Dictionary of font properties shared across the tick labels, bar labels, period labels, and title. The only property not shared is size. It will be ignored if you try to set it.

Possible keys are: 'family', 'weight', 'color', 'style', 'stretch', 'variant'

Here is an example dictionary: { 'family' : 'Helvetica', 'weight' : 'bold', 'color' : 'rebeccapurple' }

scale : 'linear' or 'log', default 'linear'
Type of scaling to use for the axis containing the values
writer : str or matplotlib Writer instance

This argument is passed to the matplotlib FuncAnimation.save method.

By default, the writer will be 'ffmpeg' unless creating a gif, then it will be 'imagemagick', or an html file, then it will be 'html'.

Find all of the available Writers:

    from matplotlib import animation
    animation.writers.list()

You must have ffmpeg or ImageMagick installed in order to use the corresponding writer.
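The default writer selection described above can be pictured as a lookup on the file extension. The helper below is hypothetical and only mirrors the documented defaults; it is not the library's own code.

```python
from pathlib import Path

def default_writer(filename):
    """Hypothetical helper mirroring the documented defaults."""
    suffix = Path(filename).suffix.lower()
    if suffix == ".gif":
        return "imagemagick"
    if suffix == ".html":
        return "html"
    return "ffmpeg"

print(default_writer("race.mp4"))   # ffmpeg
print(default_writer("race.gif"))   # imagemagick
print(default_writer("race.html"))  # html
```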

fig : matplotlib Figure, default None
For greater control over the aesthetics, supply your own figure.
dpi : int, default 144
Dots per Inch of the matplotlib figure
bar_kwargs : dict, default None (alpha=.8)
Other keyword arguments (within a dictionary) forwarded to the matplotlib barh/bar function. If no value for 'alpha' is given, then it is set to .8 by default. Some examples:
    ec - edgecolor - color of edge of bar. Default is 'white'
    lw - width of edge in points. Default is 1.5
    alpha - opacity of bars, 0 to 1
filter_column_colors : bool, default False

When setting n_bars, it's possible that some columns never appear in the animation. Regardless, all columns get assigned a color by default.

For instance, suppose you have 100 columns in your DataFrame, set n_bars to 10, and 15 different columns make at least one appearance in the animation. Even if your colormap has at least 15 colors, it's possible that many bars will be the same color, since each of the 100 columns is assigned one of the colormap's colors.

Setting this to True will map your colormap to just those columns that make an appearance in the animation, helping avoid duplication of colors.

Setting this to True will also have the (possibly unintended) consequence of changing the color of each column every time a new integer for n_bars is used.

EXPERIMENTAL This parameter is experimental and may be changed/removed in a later version.
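The color collision described above can be reproduced in a few lines of plain Python. The sketch assumes colors are handed out cyclically in column order, which is consistent with "colors will repeat" but is not confirmed by this page; the palette, column names, and the choice of appearing columns are all made up.

```python
n_colors = 24
palette = [f"color_{i}" for i in range(n_colors)]  # stand-in for a colormap
columns = [f"col_{i}" for i in range(100)]

# Columns that ever make the top n_bars (hypothetical selection of 15).
appearing = [f"col_{i}" for i in range(0, 90, 6)]

def assign(cols):
    # Cycle through the palette in column order (assumed behaviour).
    return {c: palette[i % n_colors] for i, c in enumerate(cols)}

# Default: every column gets a color, so visible bars can collide.
all_colors = assign(columns)
visible_default = {all_colors[c] for c in appearing}

# Filtered: only appearing columns get colors, so all 15 are distinct.
visible_filtered = set(assign(appearing).values())

print(len(visible_default), len(visible_filtered))  # 4 15
```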

Returns

When filename is left as None, an HTML5 video is returned as a string. Otherwise, a file of the animation is saved and None is returned.

Notes

It is possible for some bars to be out of order momentarily during a transition since both height and location change linearly and not directly with respect to their current value. This keeps all the transitions identical.
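A small worked example shows why a bar can be momentarily out of order. It interpolates value and rank independently and linearly, as described above; the two bars and their numbers are hypothetical.

```python
def lerp(start, end, t):
    # Linear interpolation between start and end for 0 <= t <= 1.
    return start + (end - start) * t

# Hypothetical bars: values and ranks (1 = largest) in two consecutive periods.
values = {"A": (10, 50), "B": (30, 40)}
ranks = {"A": (2, 1), "B": (1, 2)}

t = 0.6  # 60% of the way through the transition
value_now = {k: lerp(*v, t) for k, v in values.items()}
rank_now = {k: lerp(*r, t) for k, r in ranks.items()}

# A already sits ahead of B (lower rank) although its value is still smaller.
print(value_now)  # approximately {'A': 34.0, 'B': 36.0}
print(rank_now)   # approximately {'A': 1.4, 'B': 1.6}
```

The ranks cross halfway through the transition, while the values only cross two thirds of the way through, so between those two moments the bars are drawn in the "wrong" order.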

Examples

Use the load_dataset function to get an example dataset to create an animation.

    import bar_chart_race as bcr

    df = bcr.load_dataset('covid19')
    bcr.bar_chart_race(
        df=df,
        filename='covid19_horiz_desc.mp4',
        orientation='h',
        sort='desc',
        n_bars=8,
        fixed_order=False,
        fixed_max=True,
        steps_per_period=10,
        period_length=500,
        interpolate_period=False,
        label_bars=True,
        bar_size=.95,
        period_label={'x': .99, 'y': .8, 'ha': 'right', 'va': 'center'},
        period_fmt='%B %d, %Y',
        period_summary_func=lambda v, r: {'x': .85, 'y': .2,
                                          's': f'Total deaths: {v.sum()}',
                                          'ha': 'right', 'size': 11},
        perpendicular_bar_func='median',
        figsize=(5, 3),
        dpi=144,
        cmap='dark24',
        title='COVID-19 Deaths by Country',
        title_size='',
        bar_label_size=7,
        tick_label_size=7,
        shared_fontdict={'family' : 'Helvetica', 'weight' : 'bold', 'color' : '.1'},
        scale='linear',
        writer=None,
        fig=None,
        bar_kwargs={'alpha': .7},
        filter_column_colors=False)

Font Help

Font size can also be a string: 'xx-small', 'x-small', 'small', 'medium', 'large', 'x-large', 'xx-large', 'smaller', 'larger'. These sizes are relative to plt.rcParams['font.size'].

def bar_chart_race(df, filename=None, orientation='h', sort='desc', n_bars=None, 
                   fixed_order=False, fixed_max=False, steps_per_period=10, 
                   period_length=500, interpolate_period=False, label_bars=True, 
                   bar_size=.95, period_label=True, period_fmt=None, 
                   period_summary_func=None, perpendicular_bar_func=None, figsize=(6, 3.5),
                   cmap='dark24', title=None, title_size=None, bar_label_size=7, 
                   tick_label_size=7, shared_fontdict=None, scale='linear', writer=None, 
                   fig=None, dpi=144, bar_kwargs=None, filter_column_colors=False):
    '''
    Create an animated bar chart race using matplotlib. Data must be in 
    'wide' format where each row represents a single time period and each 
    column represents a distinct category. Optionally, the index can label 
    the time period.

Bar height and location change linearly from one time period to the next.

If no `filename` is given, an HTML string is returned, otherwise the 
animation is saved to disk.

You must have ffmpeg installed on your machine to save files to disk.
Get ffmpeg here: https://www.ffmpeg.org/download.html

To save .gif files you'll need to install ImageMagick.

This is resource intensive - Start with just a few rows of data to test.


Parameters
----------
df : pandas DataFrame
    Must be a 'wide' DataFrame where each row represents a single period 
    of time. Each column contains the values of the bars for that 
    category. Optionally, use the index to label each time period.
    The index can be of any type.

filename : `None` or str, default None
    If `None` return animation as an HTML5 string.
    If a string, save animation to that filename location. 
    Use .mp4, .gif, .html, .mpeg, .mov and any other extensions supported
    by ffmpeg or ImageMagick.

orientation : 'h' or 'v', default 'h'
    Bar orientation - horizontal or vertical

sort : 'desc' or 'asc', default 'desc'
    Choose how to sort the bars. Use 'desc' to put largest bars on top 
    and 'asc' to place largest bars on bottom.

n_bars : int, default None
    Choose the maximum number of bars to display on the graph. 
    By default, use all bars. New bars entering the race will appear 
    from the edge of the axes.

fixed_order : bool or list, default False
    When `False`, bar order changes every time period to correspond 
    with `sort`. When `True`, bars remain fixed according to their 
    final value corresponding with `sort`. Otherwise, provide a list 
    of the exact order of the categories for the entire duration.

fixed_max : bool, default False
    Whether to fix the maximum value of the axis containing the values.
    When `False`, the axis for the values will have its maximum (xlim/ylim)
    just after the largest bar of the current time period. 
    The axis maximum will change along with the data.

    When True, the maximum axis value will remain constant for the 
    duration of the animation. For example, in a horizontal bar chart, 
    if the largest bar has a value of 100 for the first time period and 
    10,000 for the last time period. The xlim maximum will be 10,000 
    for each frame.

steps_per_period : int, default 10
    The number of steps to go from one time period to the next. 
    The bars will grow linearly between each period.

period_length : int, default 500
    Number of milliseconds to animate each period (row). 
    Default is 500ms (half of a second)

interpolate_period : bool, default `False`
    Whether to interpolate the period. Only valid for datetime or
    numeric indexes. When set to `True`, for example, 
    the two consecutive periods 2020-03-29 and 2020-03-30 with 
    `steps_per_period` set to 4 would yield a new index of
    2020-03-29 00:00:00
    2020-03-29 06:00:00
    2020-03-29 12:00:00
    2020-03-29 18:00:00
    2020-03-30 00:00:00

label_bars : bool, default `True`
    Whether to label the bars with their value on their right

bar_size : float, default .95
    Height/width of bars for horizontal/vertical bar charts. 
    Use a number between 0 and 1
    Represents the fraction of space that each bar takes up. 
    When equal to 1, no gap remains between the bars.

period_label : bool or dict, default `True`
    If `True` or dict, use the index as a large text label
    on the axes whose value changes

    Use a dictionary to supply the exact position of the period
    along with any valid parameters of the matplotlib `text` method.
    At a minimum, you must supply both 'x' and 'y' in axes coordinates

    Example:
    {
        'x': .99,
        'y': .8,
        'ha': 'right',
        'va': 'center'
    }

    If `False` - don't place label on axes

period_fmt : str, default `None`
    Either a string with date directives or 
    a new-style (Python 3.6+) formatted string

    For a string with a date directive, find the complete list here
    https://docs.python.org/3/library/datetime.html#strftime-and-strptime-format-codes

    Example of string with date directives
        '%B %d, %Y'
    Will change 2020/03/29 to March 29, 2020

    For a new-style formatted string, use curly braces and the variable `x`, 
    which will be passed the current period's index value.
    Example:
        'Period {x:10.2f}'

    Date directives will only be used for datetime indexes.

period_summary_func : function, default None
    Custom text added to the axes each period.
    Create a user-defined function that accepts two pandas Series of the 
    current time period's values and ranks. It must return a dictionary 
    containing at a minimum the keys "x", "y", and "s" which will be 
    passed to the matplotlib `text` method.

    Example:
    def func(values, ranks):
        total = values.sum()
        s = f'Worldwide deaths: {total}'
        return {'x': .85, 'y': .2, 's': s, 'ha': 'right', 'size': 11}

perpendicular_bar_func : function or str, default None
    Creates a single bar perpendicular to the main bars that spans the 
    length of the axis.

    Use either a string that the DataFrame `agg` method understands or a 
    user-defined function.

    DataFrame strings - 'mean', 'median', 'max', 'min', etc.

    The function is passed two pandas Series of the current time period's
    data and ranks. It must return a single value.

    def func(values, ranks):
        return values.quantile(.75)

figsize : two-item tuple of numbers, default (6, 3.5)
    matplotlib figure size in inches. Will be overridden if figure 
    supplied to `fig`.

cmap : str, matplotlib colormap instance, or list of colors, default 'dark24'
    Colors to be used for the bars. All matplotlib and plotly colormaps are 
    available by string name. Colors will repeat if there are more bars than colors.

title : str, default None
    Title of plot

title_size : number or str, default plt.rcParams['axes.titlesize']
    Size in points of title or relative size str. See Font Help below.

bar_label_size : number or str, default 7
    Size in points or relative size str of numeric labels 
    just outside of the bars. See Font Help below.

tick_label_size : number or str, default 7
    Size in points of tick labels. See Font Help below.

shared_fontdict : dict, default None
    Dictionary of font properties shared across the tick labels, 
    bar labels, period labels, and title. The only property not shared 
    is `size`. It will be ignored if you try to set it.

    Possible keys are:
        'family', 'weight', 'color', 'style', 'stretch', 'variant'
    Here is an example dictionary:
    {
        'family' : 'Helvetica',
        'weight' : 'bold',
        'color' : 'rebeccapurple'
    }

scale : 'linear' or 'log', default 'linear'
    Type of scaling to use for the axis containing the values

writer : str or matplotlib Writer instance
    This argument is passed to the matplotlib FuncAnimation.save method.

    By default, the writer will be 'ffmpeg' unless creating a gif,
    then it will be 'imagemagick', or an html file, then it 
    will be 'html'.

    Find all of the available Writers:
    >>> from matplotlib import animation
    >>> animation.writers.list()

    You must have ffmpeg or ImageMagick installed in order to use
    the corresponding writer.

fig : matplotlib Figure, default None
    For greater control over the aesthetics, supply your own figure.

dpi : int, default 144
    Dots per Inch of the matplotlib figure

bar_kwargs : dict, default `None` (alpha=.8)
    Other keyword arguments (within a dictionary) forwarded to the 
    matplotlib `barh`/`bar` function. If no value for 'alpha' is given,
    then it is set to .8 by default.
    Some examples:
        `ec` - edgecolor - color of edge of bar. Default is 'white'
        `lw` - width of edge in points. Default is 1.5
        `alpha` - opacity of bars, 0 to 1

filter_column_colors : bool, default `False`
    When setting n_bars, it's possible that some columns never 
    appear in the animation. Regardless, all columns get assigned
    a color by default.

    For instance, suppose you have 100 columns 
    in your DataFrame, set n_bars to 10, and 15 different columns 
    make at least one appearance in the animation. Even if your 
    colormap has at least 15 colors, it's possible that many 
    bars will be the same color, since each of the 100 columns is
    assigned one of the colormap's colors.

    Setting this to `True` will map your colormap to just those 
    columns that make an appearance in the animation, helping
    avoid duplication of colors.

    Setting this to `True` will also have the (possibly unintended)
    consequence of changing the color of each column every time a 
    new integer for n_bars is used.

    EXPERIMENTAL
    This parameter is experimental and may be changed/removed
    in a later version.


Returns
-------
When `filename` is left as `None`, an HTML5 video is returned as a string.
Otherwise, a file of the animation is saved and `None` is returned.

Notes
-----
It is possible for some bars to be out of order momentarily during a 
transition since both height and location change linearly and not 
directly with respect to their current value. This keeps all the 
transitions identical.

Examples
--------
Use the `load_dataset` function to get an example dataset to 
create an animation.

df = bcr.load_dataset('covid19')
bcr.bar_chart_race(
    df=df, 
    filename='covid19_horiz_desc.mp4', 
    orientation='h', 
    sort='desc', 
    n_bars=8, 
    fixed_order=False, 
    fixed_max=True, 
    steps_per_period=10, 
    period_length=500, 
    interpolate_period=False, 
    label_bars=True, 
    bar_size=.95, 
    period_label={'x': .99, 'y': .8, 'ha': 'right', 'va': 'center'}, 
    period_fmt='%B %d, %Y', 
    period_summary_func=lambda v, r: {'x': .85, 'y': .2, 
                                      's': f'Total deaths: {v.sum()}', 
                                      'ha': 'right', 'size': 11}, 
    perpendicular_bar_func='median', 
    figsize=(5, 3), 
    dpi=144,
    cmap='dark24', 
    title='COVID-19 Deaths by Country', 
    title_size='', 
    bar_label_size=7, 
    tick_label_size=7, 
    shared_fontdict={'family' : 'Helvetica', 'weight' : 'bold', 'color' : '.1'}, 
    scale='linear', 
    writer=None, 
    fig=None, 
    bar_kwargs={'alpha': .7},
    filter_column_colors=False)

Font Help
---------
Font size can also be a string - 'xx-small', 'x-small', 'small',  
    'medium', 'large', 'x-large', 'xx-large', 'smaller', 'larger'
These sizes are relative to plt.rcParams['font.size'].
    '''
    bcr = _BarChartRace(df, filename, orientation, sort, n_bars, fixed_order, fixed_max,
                        steps_per_period, period_length, interpolate_period, label_bars, bar_size, 
                        period_label, period_fmt, period_summary_func, perpendicular_bar_func, 
                        figsize, cmap, title, title_size, bar_label_size, tick_label_size, 
                        shared_fontdict, scale, writer, fig, dpi, bar_kwargs, filter_column_colors)
    return bcr.make_animation()

def load_dataset(name='covid19')

Return a pandas DataFrame suitable for immediate use in bar_chart_race(). Requires an internet connection.

Parameters

name : str, default 'covid19'
Name of dataset to load. Either 'covid19' or 'urban_pop'

Returns

pandas DataFrame
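The returned frame has the period column as a parsed datetime index. The sketch below imitates that with an in-memory CSV so that it runs offline; the CSV text is a made-up stand-in and not the real dataset.

```python
import io
import pandas as pd

# Stand-in for the downloaded file; the real datasets are fetched over the network.
csv_text = "date,A,B\n2020-03-29,1,2\n2020-03-30,3,5\n"

# Use the period column as the index and parse it as dates.
df = pd.read_csv(io.StringIO(csv_text), index_col="date", parse_dates=["date"])

print(df)
```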
 
def load_dataset(name='covid19'):
    '''
    Return a pandas DataFrame suitable for immediate use in bar_chart_race.
    Must be connected to the internet

Parameters
----------
name : str, default 'covid19'
    Name of dataset to load. Either 'covid19' or 'urban_pop'

Returns
-------
pandas DataFrame
    '''
    url = f'https://raw.githubusercontent.com/dexplo/bar_chart_race/master/data/{name}.csv'

    index_dict = {'covid19_tutorial': 'date',
                  'covid19': 'date',
                  'urban_pop': 'year'}
    index_col = index_dict[name]
    return pd.read_csv(url, index_col=index_col, parse_dates=[index_col])

def prepare_long_data(df, index, columns, values, aggfunc='sum', orientation='h', sort='desc', n_bars=None, interpolate_period=False, steps_per_period=10, compute_ranks=True)

Prepares 'long' data for bar chart animation. Returns two DataFrames - the interpolated values and the interpolated ranks

You (currently) cannot pass long data to bar_chart_race() directly. Use this function to create your wide data first before passing it to bar_chart_race().

Parameters

df : pandas DataFrame

Must be a 'long' pandas DataFrame where one column contains the period, another the categories, and the third the values of each category for each period.

This DataFrame will be passed to the pivot_table method to turn it into a wide DataFrame. It will then be passed to the prepare_wide_data() function.

index : str
Name of column used for the time period. It will be placed in the index
columns : str
Name of column containing the categories for each time period. This column will get pivoted so that each unique value is a column.
values : str
Name of column holding the values for each time period of each category. This column will become the values of the resulting DataFrame
aggfunc : str or aggregation function, default 'sum'
String name of aggregation function ('sum', 'min', 'mean', 'max', etc.) or actual function (np.sum, np.min, etc.). Categories that have multiple values for the same time period must be aggregated for the animation to work.
orientation : 'h' or 'v', default 'h'
Bar orientation - horizontal or vertical
sort : 'desc' or 'asc', default 'desc'
Choose how to sort the bars. Use 'desc' to put largest bars on top and 'asc' to place largest bars on bottom.
n_bars : int, default None
Choose the maximum number of bars to display on the graph. By default, use all bars. New bars entering the race will appear from the bottom or top.
interpolate_period : bool, default False
Whether to interpolate the period. Only valid for datetime or numeric indexes. When set to True, for example, the two consecutive periods 2020-03-29 and 2020-03-30 with steps_per_period set to 4 would yield a new index of 2020-03-29 00:00:00, 2020-03-29 06:00:00, 2020-03-29 12:00:00, 2020-03-29 18:00:00, 2020-03-30 00:00:00.
steps_per_period : int, default 10
The number of steps to go from one time period to the next. The bars will grow linearly between each period.
compute_ranks : bool, default True
When True, return both the interpolated values and ranks DataFrames. Otherwise, return only the values.

Returns

A tuple of DataFrames. The first is the interpolated values and the second is the interpolated ranks.

Examples

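# 'date', 'country' and 'deaths' are placeholder column names for your long DataFrame
df_values, df_ranks = bcr.prepare_long_data(df, index='date', columns='country', values='deaths')
bcr.bar_chart_race(df_values, steps_per_period=1, period_length=50)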
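The long-to-wide step that this function performs with pivot_table can be tried on its own. The column names and numbers below are hypothetical; note how the two rows for the same date and category are combined by the aggregation function.

```python
import pandas as pd

# Hypothetical long data: one row per (period, category) observation.
long_df = pd.DataFrame(
    {
        "date": ["2020-03-29", "2020-03-29", "2020-03-29", "2020-03-30", "2020-03-30"],
        "country": ["X", "X", "Y", "X", "Y"],
        "deaths": [1, 2, 4, 5, 7],
    }
)

# Duplicate (date, country) pairs are combined by aggfunc, here 'sum'.
wide_df = long_df.pivot_table(
    index="date", columns="country", values="deaths", aggfunc="sum"
)
print(wide_df)
```

The result has one row per date and one column per country, which is the wide layout that the rest of the package expects.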

def prepare_long_data(df, index, columns, values, aggfunc='sum', orientation='h', 
                      sort='desc', n_bars=None, interpolate_period=False, 
                      steps_per_period=10, compute_ranks=True):
    '''
    Prepares 'long' data for bar chart animation. 
    Returns two DataFrames - the interpolated values and the interpolated ranks

You (currently) cannot pass long data to `bar_chart_race` directly. Use this function
to create your wide data first before passing it to `bar_chart_race`.

Parameters
----------
df : pandas DataFrame
    Must be a 'long' pandas DataFrame where one column contains 
    the period, another the categories, and the third the values 
    of each category for each period.

    This DataFrame will be passed to the `pivot_table` method to turn 
    it into a wide DataFrame. It will then be passed to the 
    `prepare_wide_data` function.

index : str
    Name of column used for the time period. It will be placed in the index

columns : str
    Name of column containing the categories for each time period. This column
    will get pivoted so that each unique value is a column.

values : str
    Name of column holding the values for each time period of each category.
    This column will become the values of the resulting DataFrame

aggfunc : str or aggregation function, default 'sum'
    String name of aggregation function ('sum', 'min', 'mean', 'max', etc...) 
    or actual function (np.sum, np.min, etc...). 
    Categories that have multiple values for the same time period must be 
    aggregated for the animation to work.

orientation : 'h' or 'v', default 'h'
    Bar orientation - horizontal or vertical

sort : 'desc' or 'asc', default 'desc'
    Choose how to sort the bars. Use 'desc' to put largest bars on 
    top and 'asc' to place largest bars on bottom.

n_bars : int, default None
    Choose the maximum number of bars to display on the graph.
    By default, use all bars. New bars entering the race will 
    appear from the bottom or top.

interpolate_period : bool, default `False`
    Whether to interpolate the period. Only valid for datetime or
    numeric indexes. When set to `True`, for example, 
    the two consecutive periods 2020-03-29 and 2020-03-30 with 
    `steps_per_period` set to 4 would yield a new index of
    2020-03-29 00:00:00
    2020-03-29 06:00:00
    2020-03-29 12:00:00
    2020-03-29 18:00:00
    2020-03-30 00:00:00

steps_per_period : int, default 10
    The number of steps to go from one time period to the next. 
    The bars will grow linearly between each period.

compute_ranks : bool, default True
    When `True` return both the interpolated values and ranks DataFrames
    Otherwise just return the values

Returns
-------
A tuple of DataFrames. The first is the interpolated values and the second
is the interpolated ranks.

Examples
--------
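# 'date', 'country' and 'deaths' are placeholder column names
df_values, df_ranks = bcr.prepare_long_data(df, index='date', columns='country', values='deaths')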
bcr.bar_chart_race(df_values, steps_per_period=1, period_length=50)
    '''
    df_wide = df.pivot_table(index=index, columns=columns, values=values, aggfunc=aggfunc)
    return prepare_wide_data(df_wide, orientation, sort, n_bars, interpolate_period,
                             steps_per_period, compute_ranks)

def prepare_wide_data(df, orientation='h', sort='desc', n_bars=None, interpolate_period=False, steps_per_period=10, compute_ranks=True)

Prepares 'wide' data for bar chart animation. Returns two DataFrames - the interpolated values and the interpolated ranks

There is no need to use this function directly to create the animation. You can pass your DataFrame directly to bar_chart_race().

This function is useful if you want to view the prepared data without creating an animation.

Parameters

df : pandas DataFrame
Must be a 'wide' pandas DataFrame where each row represents a single period of time. Each column contains the values of the bars for that category. Optionally, use the index to label each time period.
orientation : 'h' or 'v', default 'h'
Bar orientation - horizontal or vertical
sort : 'desc' or 'asc', default 'desc'
Choose how to sort the bars. Use 'desc' to put largest bars on top and 'asc' to place largest bars on bottom.
n_bars : int, default None
Choose the maximum number of bars to display on the graph. By default, use all bars. New bars entering the race will appear from the bottom or top.
interpolate_period : bool, default False
Whether to interpolate the period. Only valid for datetime or numeric indexes. When set to True, for example, the two consecutive periods 2020-03-29 and 2020-03-30 with steps_per_period set to 4 would yield a new index of 2020-03-29 00:00:00, 2020-03-29 06:00:00, 2020-03-29 12:00:00, 2020-03-29 18:00:00, 2020-03-30 00:00:00.
steps_per_period : int, default 10
The number of steps to go from one time period to the next. The bars will grow linearly between each period.
compute_ranks : bool, default True
When True, return both the interpolated values and ranks DataFrames. Otherwise, return only the values.

Returns

A tuple of DataFrames. The first is the interpolated values and the second is the interpolated ranks.

Examples

df_values, df_ranks = bcr.prepare_wide_data(df)
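To get a feel for what the prepared values look like, the sketch below walks through a simplified version of the preparation with plain pandas: stretch the rows apart, interpolate the datetime index, and fill the gaps linearly. It loosely follows the steps in the source listing for this function but is not a substitute for it, and the data is made up.

```python
import pandas as pd

steps_per_period = 4
df = pd.DataFrame(
    {"A": [0.0, 8.0], "B": [4.0, 0.0]},
    index=pd.to_datetime(["2020-03-29", "2020-03-30"]),
)

# Spread the original rows steps_per_period apart and add empty rows between them.
values = df.reset_index()
values.index = values.index * steps_per_period
values = values.reindex(range(values.index[-1] + 1))

# interpolate_period=True for a datetime index: evenly spaced timestamps.
first, last = df.index[0], df.index[-1]
values.index = pd.date_range(first, last, periods=len(values))
values = values.drop(columns=values.columns[0])

# Fill the gaps linearly.
values = values.interpolate()
print(values)
```

Two periods with four steps between them become five rows, six hours apart, with A rising from 0 to 8 and B falling from 4 to 0 in equal increments.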

def prepare_wide_data(df, orientation='h', sort='desc', n_bars=None, interpolate_period=False, 
                      steps_per_period=10, compute_ranks=True):
    '''
    Prepares 'wide' data for bar chart animation. 
    Returns two DataFrames - the interpolated values and the interpolated ranks

There is no need to use this function directly to create the animation. 
You can pass your DataFrame directly to `bar_chart_race`.

This function is useful if you want to view the prepared data without 
creating an animation.

Parameters
----------
df : pandas DataFrame
    Must be a 'wide' pandas DataFrame where each row represents a 
    single period of time. 
    Each column contains the values of the bars for that category. 
    Optionally, use the index to label each time period.

orientation : 'h' or 'v', default 'h'
    Bar orientation - horizontal or vertical

sort : 'desc' or 'asc', default 'desc'
    Choose how to sort the bars. Use 'desc' to put largest bars on 
    top and 'asc' to place largest bars on bottom.

n_bars : int, default None
    Choose the maximum number of bars to display on the graph.
    By default, use all bars. New bars entering the race will 
    appear from the bottom or top.

interpolate_period : bool, default `False`
    Whether to interpolate the period. Only valid for datetime or
    numeric indexes. When set to `True`, for example, 
    the two consecutive periods 2020-03-29 and 2020-03-30 with 
    `steps_per_period` set to 4 would yield a new index of
    2020-03-29 00:00:00
    2020-03-29 06:00:00
    2020-03-29 12:00:00
    2020-03-29 18:00:00
    2020-03-30 00:00:00

steps_per_period : int, default 10
    The number of steps to go from one time period to the next. 
    The bars will grow linearly between each period.

compute_ranks : bool, default True
    When `True` return both the interpolated values and ranks DataFrames
    Otherwise just return the values

Returns
-------
A tuple of DataFrames. The first is the interpolated values and the second
is the interpolated ranks.

Examples
--------
df_values, df_ranks = bcr.prepare_wide_data(df)
    '''
    if n_bars is None:
        n_bars = df.shape[1]

    df_values = df.reset_index()
    df_values.index = df_values.index * steps_per_period
    new_index = range(df_values.index[-1] + 1)
    df_values = df_values.reindex(new_index)
    if interpolate_period:
        if df_values.iloc[:, 0].dtype.kind == 'M':
            first, last = df_values.iloc[[0, -1], 0]
            dr = pd.date_range(first, last, periods=len(df_values))
            df_values.iloc[:, 0] = dr
        else:
            df_values.iloc[:, 0] = df_values.iloc[:, 0].interpolate()
    else:
        df_values.iloc[:, 0] = df_values.iloc[:, 0].fillna(method='ffill')

    df_values = df_values.set_index(df_values.columns[0])
    if compute_ranks:
        df_ranks = df_values.rank(axis=1, method='first', ascending=False).clip(upper=n_bars + 1)
        if (sort == 'desc' and orientation == 'h') or (sort == 'asc' and orientation == 'v'):
            df_ranks = n_bars + 1 - df_ranks
        df_ranks = df_ranks.interpolate()

    df_values = df_values.interpolate()
    if compute_ranks:
        return df_values, df_ranks
    return df_values