Data Science Workshops


Heuristics for Translating Ggplot2 Code to Plotnine Code

Jeroen Janssens
Dec 13, 2019 • 4 min read

Because ggplot2 is the de facto package for creating high-quality data visualizations in R, and has been for a long time, there exist many excellent resources for learning ggplot2, including:

Two days ago, I published the tutorial Plotnine: Grammar of Graphics for Python, which is a translation of the visualization chapters from “R for Data Science” to Python using plotnine and pandas. Code written with plotnine is bound to differ from ggplot2 code because Python and R have different syntax and mechanics. Moreover, since plotnine is still young (but actively being developed), some features are not yet implemented.

Does that mean we cannot make use of the above-mentioned resources? Of course not! First of all, the underlying grammar of graphics is still the same. Secondly, when it comes to the syntax, you can easily translate 95% of ggplot2 code to plotnine code if you take into account the heuristics listed below. But first, an example.

An example

This R and ggplot2 code:

library(ggplot2)

ggplot(mpg, aes(displ, hwy)) +
geom_point(aes(colour = class)) +
geom_smooth(se = FALSE, method = "lm") +
guides(colour = guide_legend(override.aes = list(size = 4)))

can be translated into the following Python and plotnine code:

from plotnine import *
from plotnine.data import mpg

ggplot(mpg, aes("displ", "hwy")) +\
geom_point(aes(colour="class")) +\
geom_smooth(se=False, method="lm") +\
guides(colour=guide_legend(override_aes={"size": 4}))

Simple replacements

  • Change boolean values, i.e., replace TRUE with True and FALSE with False.
  • Replace NULL with None.
  • Quote all column names, e.g., replace Species with "Species". Python unfortunately doesn’t have this thing called non-standard evaluation.
  • Remove spaces around equal signs, e.g., replace mapping = aes(...) with mapping=aes(...). Style is important: PEP 8, Python's style guide, recommends no spaces around = in keyword arguments.
  • Replace the assignment operator, i.e., <- with =.
  • Replace dots with underscores, e.g., replace show.legend with show_legend. In Python, names cannot contain dots.
  • Replace hjust and vjust with ha and va, respectively. This is inherited from matplotlib, which is used under the hood by plotnine.
  • If the code consists of multiple lines, add a continuation character, i.e., replace + with +\. Alternatively, wrap the entire expression in parentheses.

Miscellaneous

  • Quote inline expressions in their entirety, such as "factor(col)" and "col < 5".

  • Quote the facet specification in its entirety, such as facet_wrap("~ class") and facet_grid("drv ~ cyl").

  • To suppress labels, you cannot use labels=None; instead, you need to pass a list with as many empty strings as there are values. A helper function is useful here:

    def no_labels(values):
        return [""] * len(values)
  • To prevent text labels from overlapping in ggplot2, you would use the geom_text_repel() or geom_label_repel() functions from the ggrepel package. In plotnine, you simply use geom_text() or geom_label() and specify the adjust_text argument, which relies on the adjustText package being installed. For example: geom_label(adjust_text={'expand_points': (1.5, 1.5), 'arrowprops': {'arrowstyle': '-'}}).

Features not yet implemented

  • Unlike in ggplot2, in plotnine you cannot assign literal values to your aesthetics; all values need to refer to column names. For example, aes(color="blue") results in an error if blue is not a column in the DataFrame.
  • plotnine is currently missing the following functions: coord_quickmap() and coord_polar().
  • The function labs() does not support a subtitle or a caption.

Let me know if you think anything can be added to (or removed from!) this list of heuristics. Now go plot!

— Jeroen

About Jeroen

Jeroen Janssens, PhD, is a data science consultant and certified instructor. His expertise lies in visualizing data, implementing machine learning models, and building solutions using Python, R, JavaScript, and Bash. He’s passionate about helping and teaching others to do such things.

Since 2013, Jeroen has run Data Science Workshops, a training and coaching firm that organizes open enrollment workshops, in-company courses, inspiration sessions, hackathons, and meetups. Clients include Amazon, eHealth Africa, Schiphol Airport, The New York Times, and T-Mobile.

Previously, he was an assistant professor at Jheronimus Academy of Data Science and a data scientist at Elsevier in Amsterdam and various startups in New York City. He is the author of Data Science at the Command Line (O’Reilly Media, 2021). Jeroen holds a PhD in machine learning from Tilburg University and an MSc in artificial intelligence from Maastricht University.

He lives with his wife and two kids in Rotterdam, the Netherlands.
If you would like to know more about his services, fees, and availability, then please email Jeroen. You can also find him on Twitter, GitHub, and LinkedIn.
