The greatest value of a picture is when it forces us to notice what we never expected to see.
Replace ‘picture’ in the above quote with ‘data visualization’ and it will still ring true, maybe even more so. Providing valuable insights to package developers is exactly what Rperform strives to do through its visualization functions.
If you are new to Rperform, consider going through its GitHub README first.
In a nutshell, Rperform is an R package that allows package developers to track and visualize quantitative performance metrics of their code over time. It focuses on reporting changes in a package’s runtime and memory usage across different git versions and across git branches.
Visualizing package performance across two branches
As discussed in a previous blog post, data visualizations, UI and Travis-CI integration are the focal points of this year’s GSoC for Rperform.
One month into the GSoC period, most of my work so far has gone into improving Rperform’s visualization capabilities. A developer can now compare and visualize the performance of a package across two git branches using the plot_branchmetrics() function. Its two key parameters are branch1 and branch2. The function assumes that branch1 (for example, your development branch) is to be merged into branch2 (for example, the master branch), or that branch1 originated from branch2. The relationship between two such branches is depicted in the figure below.
The following example from the Rperform wiki shows plot_branchmetrics() used on the stringr package:
## Warning: Always set the current directory to be the root directory of the package to be tested.
Rperform::plot_branchmetrics(test_path = "tests/testthat/test-interp.r", metric = "memory", branch1 = "rperform_test", branch2 = "master", save_data = FALSE, save_plots = FALSE)
The commit on the left-hand side (LHS) of the vertical line in the above plot is the latest commit from the branch passed as branch2. The right-hand side (RHS) contains the commits from the branch passed as branch1, running from branch1’s latest commit back to the first commit it shares with branch2.
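To make the LHS/RHS split concrete, here is a minimal sketch in base R of how the commits for such a plot could be picked out. It is not Rperform’s implementation: the helper commits_to_plot() and the commit SHAs are made up for illustration, and I am assuming that the first shared commit is itself included on the RHS.

```r
# Hypothetical commit histories, newest first; these SHAs are made up.
branch1_log <- c("f3a9", "c71e", "9b02", "4d5a", "1e77")  # e.g. rperform_test
branch2_log <- c("8aa1", "6c3f", "4d5a", "1e77")          # e.g. master

# Helper written for this illustration; it is not part of Rperform.
commits_to_plot <- function(branch1_log, branch2_log) {
  # Positions in branch1's history that also appear in branch2's history.
  shared <- which(branch1_log %in% branch2_log)
  if (length(shared) == 0) {
    stop("The two branches share no commits.")
  }
  list(
    lhs = branch2_log[1],                  # latest commit on branch2
    rhs = branch1_log[seq_len(shared[1])]  # branch1 tip down to the first shared commit
  )
}

selection <- commits_to_plot(branch1_log, branch2_log)
print(selection)
```

With these made-up histories, the LHS holds the tip of branch2 and the RHS holds the three commits unique to branch1 followed by the commit where the two histories meet.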
To learn more about visualizing your package’s performance, check out this GitHub wiki.
Grammar of Graphics and Interactivity
Grammar of Graphics (GoG) is a framework that coherently ties together many aspects of designing, implementing, reading, and understanding a graphic. Created by Leland Wilkinson, it’s a systematic way of thinking about visualizations. ggplot2, developed by Hadley Wickham, is more or less an implementation of Wilkinson’s GoG framework. ggplot2 lets one build a visualization layer by layer by mapping data variables to visual properties (or aesthetics). This approach makes an astounding variety of visualizations possible. Rperform uses ggplot2 under the hood to create plots such as the one shown above.
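As a rough illustration of the layer-by-layer idea, the sketch below maps a small data frame to aesthetics and then adds layers one at a time. The data frame and its values are invented for this example, and the code is not taken from Rperform’s plotting functions; it only assumes that ggplot2 is installed.

```r
library(ggplot2)

# Hypothetical measurements, one row per commit; not real Rperform output.
shas <- c("a1b2c3", "d4e5f6", "0a1b2c", "3d4e5f")
metrics <- data.frame(
  commit = factor(shas, levels = shas),  # keep the commits in this order on the x axis
  branch = c("master", "rperform_test", "rperform_test", "rperform_test"),
  memory_mb = c(1.8, 2.1, 2.0, 1.9)
)

# Map data variables to aesthetics, then stack layers with `+`.
p <- ggplot(metrics, aes(x = commit, y = memory_mb, colour = branch)) +
  geom_point(size = 3) +                               # layer 1: one point per commit
  geom_vline(xintercept = 1.5, linetype = "dashed") +  # layer 2: line separating the two branches
  labs(x = "Commit", y = "Memory (MB)", colour = "Branch")

print(p)
```

Swapping geom_point() for another geom, or adding a facet, changes the picture without touching the data-to-aesthetic mapping, which is the part of the GoG approach I find most useful.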
When I started thinking about implementing interactivity in Rperform’s plots, I wanted a solution that would not give up ggplot2’s capabilities or the GoG philosophy. The animint package, developed by Toby Dylan Hocking (one of my mentors), has proven to be a good fit since it works in tandem with ggplot2 to create interactive plots. This bit is still a work in progress, but one can get a taste of how interactivity can be helpful through this example. Here, clicking on a point takes you to the GitHub page for the commit which the point represents.
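For a click to land on the right page, each point needs the URL of its commit. The sketch below builds such a column in base R using GitHub’s commit URL pattern. The helper commit_url(), the SHAs and the repository slug are placeholders chosen for illustration, and how the column gets wired into animint is deliberately left out, since those functions are still in progress.

```r
# Helper written for this illustration; it is not an Rperform or animint function.
commit_url <- function(repo, sha) {
  sprintf("https://github.com/%s/commit/%s", repo, sha)
}

# Hypothetical per-commit data; the SHAs and values are placeholders.
commit_points <- data.frame(
  sha = c("a1b2c3d", "d4e5f6a"),
  memory_mb = c(2.1, 1.9),
  stringsAsFactors = FALSE
)

# One link per point, ready to be attached to whatever handles the click.
commit_points$href <- commit_url("tidyverse/stringr", commit_points$sha)
print(commit_points$href)
```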
I will be writing another post after finishing implementation of the functions returning interactive visualizations. That’s all for now, folks!
If you are an R package developer, please try out Rperform on your code and provide feedback if possible. Drop me an email, or hit me up on Twitter, GitHub or Quora.
If any problem arises, please open an issue on GitHub.