
Five Things I Learned From My First R Programming Event | by Rory Spanton | May, 2023



Lessons about R, data science, and engaging an audience from SatRDays London

Photo by Teemu Paananen on Unsplash

Last month, I did something I’d never done before. I attended an in-person event all about data science — specifically, doing data science in the R programming language.

Conferences aren’t new to me. As a researcher in psychology, I’ve attended and presented at many conferences in my field. But, despite being a long-time R enthusiast, I hadn’t ever got the chance to go to a non-academic conference about data analysis. So, when I got an opportunity to attend SatRDays London, a one-day event all about R, I threw myself at it.

It was a great decision. I learned a lot about R and its uses in many industries, some of which I’d never even heard of before. I also got to talk to lots of other people who are as geeky about R as me, which was a real treat.

Here are five of the most powerful things I learned from SatRDays.

Think you know all the things R can do? Think again.

I sometimes hear data scientists write off R as a specialist language with only a few uses. They say if you’re not doing bioinformatics, academic research, or hardcore statistics, no one in your field uses R. Python is the general-purpose, “do everything” language you should use instead.

This is true to a certain extent — other languages like Python are more widely used than R and are valuable in their own right. But that doesn’t mean that R can’t tackle many important tasks in various sectors.

The range of talks at SatRDays was incredible. Every speaker was from a different company, presenting new and interesting ways they use R for data analysis in their industry.

I listened to data journalists, financial auditors, internet performance analysts, air quality practitioners, and many more share their experiences with R. As an academic, I hadn't even had half of these sectors on my radar before the conference. Hearing from these people opened my eyes to the wealth of uses for R in various business contexts.

Sure, there aren’t as many R jobs as Python jobs floating around. But let no one tell you that you’re backing yourself into a corner by learning R. It’s becoming clear that more and more companies are adopting it, and there are many new use cases to be excited about.

This is a tip I picked up from Russ Hyde’s talk on good coding practices. It might be familiar to more experienced developers, but it was new and useful to me.

When programming, it’s often best to compartmentalize your code into user-defined functions. This helps to avoid repetition in your scripts and ensures your code is reusable and easy to maintain. But, some ways of splitting your code into different functions are better than others.

For instance, consider a short script that reads in some data, cleans it up, and then writes it to a new file. This script consists of many steps, but we can classify each of them into two categories: calculations and actions.

library(readr)
library(dplyr)

read_csv("sales_data.csv") %>%
  select(date, transaction_id, category, item_price) %>%
  filter(date == "2023-05-10") %>%
  group_by(category) %>%
  summarise(day_turnover = sum(item_price)) %>%
  write_csv("sales_summaries/day_turnover_2023-05-10.csv")

Calculations are steps that will return the same output each time, given a certain input. Operations like filtering datasets and calculating summary statistics are good examples of this. Because they return predictable results without any side effects, it’s easy to write automated tests to confirm that they’re working properly.

Actions, by comparison, are much harder to test. They have side effects, such as writing data to a file, that are more difficult to contain. They may also be affected by RNG or other variables in the global environment. This makes testing them a challenge, often requiring dummy files or a controlled environment.
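To make this concrete, here's a minimal sketch of what testing an action can look like (the function and file names are hypothetical, not from the talk). The side effect is contained by writing to a temporary file, then reading it back to check the round trip:

```r
library(readr)

# A small action: writing a data frame to disk is a side effect
save_summary <- function(summary_df, path) {
  write_csv(summary_df, path)
  invisible(path)
}

# Contain the side effect in a temporary "dummy" file
tmp <- tempfile(fileext = ".csv")
save_summary(data.frame(category = "books", day_turnover = 15), tmp)

# Read the file back and check the round trip worked
reloaded <- read_csv(tmp, show_col_types = FALSE)
stopifnot(reloaded$day_turnover == 15)
unlink(tmp)  # clean up after the test
```

Even this simple check needs a scratch file and cleanup code, which is exactly the overhead that makes actions more awkward to test than pure calculations.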

The tip is this: whenever possible, separate actions from calculations.

Here’s an example using the code I introduced earlier. Rather than mixing actions and calculations in a single block, I separate them into two functions. All the actions that read and write data are contained in one function, and all my calculations are in another.

calculate_turnover <- function(sales_data, day) {
  sales_data %>%
    select(date, transaction_id, category, item_price) %>%
    filter(date == day) %>%
    group_by(category) %>%
    summarise(day_turnover = sum(item_price))
}

action_turnover <- function(day) {
  read_csv("sales_data.csv") %>%
    calculate_turnover(day) %>%
    write_csv(paste0("sales_summaries/day_turnover_", day, ".csv"))
}

action_turnover("2023-05-10")

This makes for neat, compartmentalized code. Testing the calculations is a breeze, as they’re tidied away from any side effects of the actions. This separation also makes my actions easier to test.
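To show just how little ceremony the calculation half needs, here's a minimal sketch of a test for `calculate_turnover` (reproduced from above so the sketch is self-contained; the toy data is my own, not from the talk). No files, no setup, just an in-memory input and plain assertions:

```r
library(dplyr)
library(tibble)

# The pure calculation from above, reproduced for a self-contained example
calculate_turnover <- function(sales_data, day) {
  sales_data %>%
    select(date, transaction_id, category, item_price) %>%
    filter(date == day) %>%
    group_by(category) %>%
    summarise(day_turnover = sum(item_price))
}

# A tiny in-memory dataset -- no side effects to manage
sales <- tibble(
  date = c("2023-05-10", "2023-05-10", "2023-05-11"),
  transaction_id = 1:3,
  category = c("books", "books", "games"),
  item_price = c(10, 5, 20)
)

result <- calculate_turnover(sales, "2023-05-10")

# The same input always yields the same output, so plain assertions suffice
stopifnot(nrow(result) == 1)
stopifnot(result$day_turnover == 15)
```

In a real project these checks would live in a test file (for example, using the testthat package), but the principle is the same: a pure calculation can be tested with nothing more than a toy input and an expected output.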

“Oh yeah, they do that all the time in game dev, too”, replied one of my colleagues when I told him about this approach. If you come from a software development background, you might already know about this. But folks in the data industry arrive from all sorts of places and sometimes never learn those kinds of tricks. While I’ve taught myself some good coding principles, this was a new one for me, and I’ll make more use of it going forward.

Speaking of testing, it was a common theme across the whole conference.

One talk by Vyara Apostolova and Laura Cole from the National Audit Office had a heavy focus on tests and checks. Their work involves taking important government models and reproducing them in R. Along the way, they meticulously test every assumption in each model, checking for errors and discrepancies.

This can take years from start to finish. But this painstaking work is vital for spotting costly errors. They revealed that their work has saved millions of pounds by catching mistakes in financial models before they led to unnecessary costs. All because they test, check, and evaluate everything they code within a rigorous framework.

Throughout the rest of the talks, audience questions often related to testing. If someone presented a cool new way of doing something with R, the first thing people wanted to know was how they’d test it.

Although R is built into our academic teaching and practice in my workplace, testing is highly undervalued there. Coming in with that outside perspective, I realized at SatRDays just how important it is for data professionals to have robust, automated procedures for checking their work.

Whether you’re already working with data or trying to break into the industry, testing can cement the value of your work. Coming out of SatRDays, I’m planning to improve my software testing chops before looking for a job in a few months.

I’ve written before about how fun learning R can be. But I had never laughed out loud at a data science talk before SatRDays.

If I had to pick a favourite talk from the whole day, it’d be Andrew Collier’s “Sidekicks of the Tidyverse”. In this presentation, Andrew walked through some lesser-known functions in the tidyverse, and how they complement more commonly used functions.

Explaining how code works in a talk is a tough task. You walk a tightrope, balancing between being informative and not losing your audience in technicalities. All the talks at SatRDays achieved this balance, but Andrew did so with stand-out humour and style.

His talk was peppered with pop-culture references and jokes that perfectly gauged the taste of the audience. Besides being funny, these worked to relate new concepts to information the audience already knew.

Using humour in a presentation means putting yourself out there. Getting some laughs out of your audience is a uniquely warm and validating experience. The feeling you get when your jokes echo around a silent room is the complete opposite. Such are the highs and lows of presenting.

That said, humour is a great tool in technical presentations, even if it doesn’t always work out. Rather than being an entertaining distraction, it can even make your explanations better if used well. Going forward, I’ll try to add a little more humour to my talks to make them stand out.

The R community is amazing.

This point shouldn’t come as a surprise to anyone who knows me well. I encourage community engagement when learning R and even when choosing which software packages to use.

But, I’d never occupied the same space as so many R users and professionals before. I didn’t know anyone at the start of the day, so I was banking on finding at least a couple of nice people to talk to. I was a little nervous, to tell the truth.

I needn’t have worried. Everyone I met was friendly, approachable, and interesting to talk to. I’m not an extrovert or a natural at networking, but I ended up staying on my feet and chatting with people during all the breaks. I know for a fact that’s not a given at all events — I’ve spent a few lonely lunches without company before, so I was especially glad of it here.

The R community is diversifying, too. Ella Kaye and Heather Turner talked about their ongoing work to get members of underrepresented groups involved in maintaining the R language. Right now, most major contributors to the language are Western men nearing the end of their careers. To keep R working and evolving with the times, it’s important to pass the torch to a more diverse set of contributors from around the world. Ella and Heather shared various initiatives and events they’re setting up to make this happen, all of which sounded like promising steps forward.

I was pleased to see this diversity reflected in the audience. Although I counted myself among many other white, bespectacled men in attendance, there were plenty of others with different genders, traits and backgrounds. I got a sense that everyone was included and that the organizers were actively facilitating this.

If there’s a moral here, it’s that you shouldn’t be afraid to connect with people, especially in the R community. They can give you invaluable insight into jobs, companies, and technologies that can move your career forward. And, at least in my experience, they’re really nice.

