Explaining my favourite #TidyTuesday Projects | by Isaac Arroyo | Sep, 2022

By Jessie Hobb On Sep 13, 2022

A behind-the-scenes of some of my favourite data visualizations: explaining my idea or thought process

Collage of my favourite #TidyTuesday data visualizations. All data visualizations and the collage are made by the author.

I love data visualization, so when I saw people share their DataViz from a different dataset every week… I felt intimidated. However, I wanted to give it a try, so for the past ten weeks, I’ve been contributing* to the #TidyTuesday “challenge”** as a personal project, like many other #RStats people.

* I like to say I’m contributing to the #TidyTuesday project (despite not sending any dataset) because I think that, by sharing my code, I’m helping others like me: people who want to learn how to create attractive and creative data visualizations.

** Tidy Tuesday is a project from the R for Data Science Online Learning Community (@R4DScommunity) aimed at the R ecosystem. Every week the community shares a new dataset, and the whole R community is invited to practice their data wrangling and data visualization skills, mainly using the tidyverse collection of packages. You can join by sharing a data visualization using the hashtag #TidyTuesday, and you can also see and engage with the work of other users.

I’m writing this to share some of my favourite contributions and briefly explain my idea or thought process behind them to let people know how I approached the design or why I approached it that way.

Without further ado, here are my favourite data visualizations for #TidyTuesday (in ascending order, a.k.a. from least favourite to most favourite) up to September 2022 (the month I finished writing this).

How could these data visualization examples be useful?

Creativity is a skill that should always be welcome, especially in data-related jobs. Whether you’re a Data Scientist, Data Analyst or Data Journalist, creativity helps you tell better stories, deliver a different experience and stand out from the rest. That’s why, for each visualization, I also share how the idea could be used in a real-world case.

Important note
All the visualizations were entirely made with R, ggplot2 and ggplot2 extensions.

#TidyTuesday Data Visualization for Week 33. Created by the author.

People are multidimensional, and boxing them into a single personality is complicated. The dataset from Week 33 compiles people’s opinions on the different personality traits that fictional characters could have.

I enjoyed creating this data visualization because of the idea I had in mind: to see the characters of my favourite shows and find who is closest to whom. So I performed a PCA to reduce ten personality traits to two principal components and plotted the characters on that two-dimensional plane; the rest is history (and by history, I mean you can see my code).
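The PCA step can be sketched as follows. This is a minimal illustration and not the code behind the chart above: the data is randomly generated, and the names `traits`, `coords` and `p_pca` are made up for the example. It assumes base R's `prcomp()` for the PCA and ggplot2 for the plot.

```r
library(ggplot2)

# Hypothetical data: 20 characters scored on ten made-up personality traits.
set.seed(42)
traits <- matrix(
  rnorm(20 * 10),
  nrow = 20,
  dimnames = list(paste0("character_", 1:20), paste0("trait_", 1:10))
)

# Centre and scale every trait, then run the PCA.
pca <- prcomp(traits, center = TRUE, scale. = TRUE)

# Share of the total variance captured by each component.
variance_share <- pca$sdev^2 / sum(pca$sdev^2)

# Keep the first two components as plotting coordinates.
coords <- data.frame(
  name = rownames(traits),
  PC1 = pca$x[, 1],
  PC2 = pca$x[, 2]
)

# Characters that sit close together have similar trait profiles.
p_pca <- ggplot(coords, aes(PC1, PC2, label = name)) +
  geom_point() +
  geom_text(vjust = -0.8, size = 3)
```

Looking at the first two entries of `variance_share` is a quick way to judge how much of the original information the two-dimensional view keeps.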

Example in a real-world case: Showing similarities between groups given a set of features (behaviour or physical attributes).

Remember that PCA gives you an easier way to visualize data, at the cost of trading the original, interpretable variables for “artificial” ones. You can represent data this way when your audience is already familiar with the original variables (so they don’t mind looking at “artificial variables”) and you want to highlight specific instances. Also, it doesn’t hurt to create some scatter plots of the original dataset to complement the story.

#TidyTuesday Data Visualization for Week 34. Created by the author.

I’m used to seeing scientific data due to the papers I had to read during my time at university (not so long ago). So, I saw the CHIPS dataset as an opportunity to be creative.

Time series data is usually plotted along a straight axis (horizontal or vertical). Still, I decided to take an unconventional approach by wrapping the x-axis into a circle.

The result? Circles arranged in a larger ring (you will see that my favourite geometries are curves).
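Wrapping the x-axis can be sketched in a few lines with ggplot2's `coord_polar()`. This is only an illustration of the general technique, not the code for the chart above: the monthly series is randomly generated, and `series`, `p_line` and `p_circle` are made-up names.

```r
library(ggplot2)

# Hypothetical monthly series; the values are made up for illustration.
set.seed(1)
series <- data.frame(
  month = seq(as.Date("2000-01-01"), by = "month", length.out = 120),
  value = 10 + cumsum(rnorm(120))
)

# The conventional version: time runs along a straight x-axis.
p_line <- ggplot(series, aes(month, value)) +
  geom_line()

# The same chart with the x-axis mapped to the angle of a circle.
p_circle <- p_line +
  coord_polar(theta = "x")
```

Printing `p_line` and `p_circle` side by side makes the trade-off visible: the circular version is more eye-catching, but values are harder to compare.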

Example in a real-world case: Time series data, such as in finance, economics and the environment.

Wrapping the x-axis into a circle is a creative way to stand out from other time series visualizations. However, it comes at the cost of readability: a circular layout makes it hard to estimate and compare y-axis values precisely. So, keep that in mind if you take this approach.

#TidyTuesday Data Visualization for Week 31. Created by the author.

Remember the famous TV show aired in the 2010s called “Gossip Girl”? I do, and I loved it. One of my favourite phrases was “SPOTTED” whenever a character was in a specific yet unexpected place.

So, when I read that “Week 31” was about the “Oregon Spotted Frog,” my mind could only think about the word SPOTTED, and I tried to imitate Gossip Girl as if she were into Data Visualization.

To create the Gossip Girl logo, I had to use some basic Adobe Illustrator skills; the rest was R.

Example in a real-world case: Part-of-a-whole data, such as the number of members of each group in a specific population (for example, political parties or gender identities).

By this time, you may have guessed that my favourite geometries are circles and curves. The reason is that, I think, points make people see the data as individuals and not just numbers. So, I recommend this approach when you want to highlight the relevance of the data; use it to represent people or any other living being.
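One way to draw one point per individual is sketched below. It is an illustration under made-up assumptions, not the code for the chart above: the group names and counts are hypothetical, and the points are simply laid out on a grid ten columns wide.

```r
library(ggplot2)

# Hypothetical counts per group; the numbers are made up for illustration.
counts <- c(group_a = 45, group_b = 30, group_c = 25)

# One row per individual, so every point stands for one member.
people <- data.frame(group = rep(names(counts), times = counts))

# Lay the individuals out on a grid that is ten columns wide.
index <- seq_len(nrow(people)) - 1
people$col <- index %% 10
people$row <- index %/% 10

p_dots <- ggplot(people, aes(col, row, colour = group)) +
  geom_point(size = 3) +
  coord_equal() +
  theme_void()
```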

#TidyTuesday Data Visualization for Week 35. Created by the author.

If I were someone who knows nothing about R and ggplot2, I would think that the data visualization was made with design software (like Illustrator or InDesign). However, it is entirely made with R.

I knew nothing about Pell Grants, Ivy League schools and the U.S. educational system (now I know a little more), so I thought about creating a visualization that people like me could understand.

This is my favourite data visualization by far. The reason? The amount of effort (and lines of code) I put into this data visualization is incredible. I had to tweak many parameters in order to achieve a report-looking data visualization that tells a story (or that’s what I think about it) and highlights what was interesting for me.

I struggled, but in the end, I was happy with the results.

Example in a real-world case: Show distributions and highlight their key elements.

I like representing distributions with more than one chart in the same place, hence the density curve combined with dots, some of them highlighted. It can tell the story of a few elements within a larger set.

I personally do not recommend creating a layout like this using code only: it takes a lot of time and is not reproducible with different data. Other software exists for that task, such as BI tools (Power BI or Tableau) for dashboards or design software (Adobe Illustrator and Adobe InDesign) for reports.
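The density-plus-dots idea can be sketched like this. It is a simplified illustration, not the code for the chart above: the values are randomly generated, and the names `schools`, `highlight` and `p_dist` are made up for the example.

```r
library(ggplot2)

# Hypothetical data: 200 made-up schools with one numeric value each.
set.seed(7)
schools <- data.frame(
  name = paste0("school_", 1:200),
  value = rnorm(200, mean = 50, sd = 12)
)

# Flag the handful of elements the story is about.
schools$highlight <- schools$name %in% c("school_3", "school_42", "school_117")

p_dist <- ggplot(schools, aes(x = value)) +
  # The density curve shows the overall shape of the distribution.
  geom_density(fill = "grey85", colour = NA) +
  # One dot per school, jittered slightly around the baseline.
  geom_point(
    aes(y = 0, colour = highlight),
    position = position_jitter(width = 0, height = 0.002, seed = 7),
    alpha = 0.6
  ) +
  # Grey for the crowd, a strong colour for the highlighted few.
  scale_colour_manual(
    values = c(`FALSE` = "grey40", `TRUE` = "firebrick"),
    guide = "none"
  )
```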

However, it is a good exercise if you want to explore the horizons of ggplot2 and its extensions.

#TidyTuesday allows me to explore my creativity, test my patience and learn new things that R (alongside ggplot2 and its extensions) can do.

This is not the end, and I plan to continue contributing to #TidyTuesday with my data visualizations (code included) for as long as possible. Perhaps in the future, I can share other favourite data visualizations.

Here’s my GitHub repository where you can find all the code for these data visualizations.

Hello! I’m Isaac, a (part-time) Data Visualization Designer/Specialist. I’m passionate about this field, and I’m constantly learning how to deliver and design better ways to visualize data.
I love collaborating and aspire to use my data analysis and visualization skills in other fields. For example, social sciences and human rights 🧑‍🤝‍🧑👬👭, the arts 🎨, public policy 🏛️ and the environment 🌱🍃.
I constantly share my projects and data visualizations on Twitter (@unisaacarroyov) and Behance (also as unisaacarroyov).
You can contact me via Twitter or LinkedIn.