The reason I am using Altair for most of my visualization in Python


May 04, 2019 · Untitled

Sadly, in Python, we do not have a ggplot2.

Python’s go-to visualization library, matplotlib, is very powerful but has severe limitations. (matplotlib recently came into the spotlight again for being credited with the first black hole image.) At times its flexibility is a blessing, but it is easy to get frustrated when adding a small feature to your graph. Also, matplotlib’s dual object-oriented and state-based interface is confusing. I still don’t completely grasp it even though I have been using matplotlib for years. Lastly, it is not easy to make interactive charts.

Altair and the grammar of graphics

Enter Altair. Altair is a wrapper for Vega-Lite, a high-level JavaScript visualization library. One of Vega-Lite’s most important features is that its API is based on the grammar of graphics. (In the rest of the article, I will mainly refer to Altair, but Vega-Lite deserves as much credit, or more.)

Grammar of graphics may sound like an abstract feature, but it is the main difference between Altair and other Python visualization libraries. Altair matches the way we reason about visualizing data.

Altair only needs three main parameters:

Based on these, Altair will pick sensible defaults to display your data.

My favorite example of Altair’s sensibility is how it chooses colors. If you tell Altair to color a quantitative variable, then it will use a continuous color scale (light blue, blue, dark blue). If you tell Altair to color a categorical variable, then it will use a different color for each category (red, yellow, blue).

Vega-Lite has two types of categorical data: nominal and ordinal. Nominal are categories where the order doesn’t have meaning. For example, the continents, which are Europe, Asia, Africa, America, and Oceania (for me America is a continent, not the USA). Ordinal are categories where the order has meaning. For example, an Amazon review can be one, two, three, four, or five stars.

Let’s see a concrete example:

I made up 6 countries and population numbers. The data looks like this:

country_id  population  income
1           1           50
2           100         50
3           200         200
4           300         300
5           400         300
6           500         450

We will first plot the population data for each country:

Does this coloring make sense?

Altair picked a continuous color scale. That doesn’t make sense! The problem is that we defined the country_id as a quantitative variable, but it is really a categorical one.

This makes more sense! Each country should be represented by its own distinctive color!

We only changed the encoding of the variable country_id. Instead of using Q (Quantitative) we use N (Nominal). That’s enough for Altair to know that it shouldn’t use a continuous color scale.

Extending your graphs

Another beauty of Altair is that you can usually build up from an existing graph with little effort. For example, let’s say that now we want to add income to our graph. We simply tell Altair to map the y-axis to income:

Want to add tooltips? One line is all you need:

Is that all?

At first, I was skeptical of using a wrapper of another library as my main visualization tool. Wrappers are often a bad idea. For example, there are many wrappers for ggplot2 that haven’t been widely adopted by the Python community. It is hard to create one that is feature complete and up to date. But Altair is different:

Gif showing Altair interactivity
Combination of line, circle, and text marks. The output can easily be made interactive.

Altair’s main disadvantages

If this got you excited (or at least curious), I highly recommend Altair’s documentation. It is a concise and clear place to start. Don’t forget to check out the example gallery and the details of Altair’s internals.

Thanks to Ilya Altshteyn for comments on an earlier version.

Fernando Irarrázaval

Copyright, 2019