Rperform started as a GSoC 2015 project with the aim “to provide a package with functions that make it easy for R package developers to track quantitative performance metrics of their code, over time.” Much of the required functionality was implemented over the course of last summer, including various performance visualization functions and integration with the Travis-CI workflow.
The project has been accepted into the GSoC program again, under the R Project for Statistical Computing organization. I will be working on it over the summer with my mentors, Toby Dylan Hocking and Joshua Ulrich.
Some of the most important questions that Rperform could answer are “does this Pull Request change the speed or memory usage of my R package?” and “which commit contains the bit of code that slowed down my R package?” Rperform needs to answer these questions as clearly as possible. Towards that end, the main focus of this year’s GSoC project is improving the visualization functions, improving integration with Travis-CI, and developing a useful UI. Some of these aspects are explored below.
As of now, R package developers using Travis-CI for continuous integration on GitHub can integrate Rperform into their workflow. This can be done by including some simple scripts in their repo and setting up a gh-page. This will allow them to generate reports (in the form of a webpage) featuring memory and run-time analysis of their code. More details can be found on the project’s wiki.
However, there’s something even cooler which we plan to implement: using Rperform to measure the impact an incoming pull request (PR) would have on a package’s performance, without having to merge it first!
Sounds slick, right?! As it happens, that’s one of the trickier parts of this project. I will be exploring this particular issue in a separate post. (Hopefully, it will have been resolved by then.)
Below is a sample visualization generated by Rperform. It measures the runtime performance of a unit test from the stringr package over the past 10 commits. stringr’s author, Hadley Wickham, was one of the project’s mentors last year.
One way to make the above more interactive is to show the user details (date, author, etc.) when hovering over a datapoint. This could be achieved with a library such as dygraphs. These are the kinds of additions I intend to make to Rperform’s existing visualization capabilities.
Developing a coherent and useful UI is essential to allow developers to interact meaningfully with the analysis results obtained for their packages’ code. This part of the project draws its inspiration from similar projects in other languages (primarily Python), such as asv and codespeed.
The end goal for the package remains the same as stated earlier. The work I intend to do in the next few months will hopefully result in the package becoming an integral part of the R developer toolbox.
If you are an R package developer, please try out Rperform on your code and provide feedback if possible. Drop me an email, or hit me up on Twitter, GitHub or Quora.
If any problem arises, please open an issue on GitHub.