Techno Blender
Digitally Yours.

How to Write a Custom Function to Generate Multiple Plots in R | by Vivian Peng | Apr, 2023


Here’s a breakdown of the logic for creating a custom function:

1. Start with creating one visual first
2. Understand which variable you want to use to create multiple plots
3. Change the graphing code into a function
4. Loop through your unique values to generate multiple plots

Let’s work with the adorable Palmer Penguins dataset from Allison Horst. This dataset has three unique species of penguins — Chinstrap, Gentoo, Adelie:

Artwork by @allison_horst

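Here’s how to load the data:

# Load libraries
library(palmerpenguins)
library(tidyverse)
library(plotly)  # provides plot_ly(), layout(), and subplot() used below

# List the datasets that ship with the package.
# `penguins` itself is already available once the package is attached.
data(package = 'palmerpenguins')
# Write penguins to a `df` variable.
# I'm doing this simply because it's easier to type `df` than `penguins` each time.
df <- penguins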

1. Start with creating one visual first

Let’s create a bar plot for the Adelie species to see their median body mass for each year.

# Create a summary table to calculate the median body mass by species and year
summary <- df %>%
  group_by(species, year) %>%
  summarise(median_body_mass = median(body_mass_g, na.rm = TRUE), .groups = "drop")

# Create a Plotly bar chart for the median body mass of the Adelie penguins
plot_ly(
  data = summary %>% filter(species == "Adelie"),
  x = ~year,
  y = ~median_body_mass,
  color = ~year,
  type = "bar",
  showlegend = FALSE) %>%
  layout(
    yaxis = list(title = 'Median Body Mass (g)'),
    xaxis = list(title = 'Year', tickvals = list(2007, 2008, 2009)),
    title = "Median Body Mass for Adelie Penguins") %>%
  hide_colorbar() %>%
  suppressWarnings()

A bar chart of the median body mass for Adelie Penguins for the years 2007, 2008, and 2009.

2. Understand which variable you want to use to create multiple plots

aka: what’s your `facet_wrap` variable?

Here’s the view of our summary table. We want to create the same bar graph for each species. In this example, our variable of interest is the species variable.

A view of our summary table that displays the median body mass for each penguin species — Adelie, Chinstrap, and Gentoo — for the years 2007, 2008, and 2009.
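Before generalizing the code, it can help to list the distinct values of the variable you plan to split on. The snippet below is a minimal sketch that reads the `species` column straight from the `penguins` data; the names `species_levels` and `n_plots` are introduced here for illustration only.

```r
library(palmerpenguins)

# Distinct values of the candidate faceting variable.
# `species` is stored as a factor, so levels() lists every category.
species_levels <- levels(penguins$species)
print(species_levels)

# One plot will be needed per distinct value
n_plots <- length(species_levels)
```

Each value in `species_levels` is a candidate argument for the plotting function built in the next step.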

3. Change the graphing code into a function

Identify the components in your graphing code that need to be generalized. Now, we will swap out any instance of the species name Adelie with a generalized variable:

Description of our Plotly code that shows which variables we will want to generalize. In this example, we want to swap out any instance of the species name “Adelie” with a generalized variable so we can create the plot for each new species.

Transform the graphing code into a function. This function takes one argument, species_name, which is entered as a string. See how every instance of the name Adelie has been replaced with the variable species_name:

# Note: plot_fx() reads the `summary` table created above
plot_fx <- function(species_name){
  plot_ly(
    data = summary %>% filter(species == species_name),
    x = ~year,
    y = ~median_body_mass,
    color = ~year,
    type = "bar",
    showlegend = FALSE) %>%
    layout(
      yaxis = list(title = 'Median Body Mass (g)'),
      xaxis = list(title = 'Year', tickvals = list(2007, 2008, 2009)),
      title = paste("Median Body Mass for", species_name, "Penguins")) %>%
    hide_colorbar() %>%
    suppressWarnings()
}

Here’s an example of how to run the function to generate your new plot. Let’s make the same bar chart for the species Chinstrap:

# Run function for species name "Chinstrap"
plot_fx("Chinstrap")

A bar chart of the median body mass for Chinstrap Penguins for the years 2007, 2008, and 2009. This was generated by the custom function we created in the post.
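One possible refinement, not part of the original walkthrough, is to pass the data frame in as an argument so the function does not depend on a global table. The sketch below assumes the `dplyr`, `plotly`, and `palmerpenguins` packages; the names `plot_species` and `mass_summary` are hypothetical, and the color mapping is left out to keep it short.

```r
library(dplyr)
library(plotly)
library(palmerpenguins)

# Summary table as in step 1 (named `mass_summary` here so it does not
# mask base::summary)
mass_summary <- penguins %>%
  group_by(species, year) %>%
  summarise(median_body_mass = median(body_mass_g, na.rm = TRUE),
            .groups = "drop")

# Hypothetical variant of plot_fx(): the data frame is an explicit argument
plot_species <- function(data, species_name) {
  # Fail early on a misspelled species instead of drawing an empty chart
  stopifnot(species_name %in% data$species)

  plot_data <- filter(data, species == species_name)

  plot_ly(plot_data, x = ~year, y = ~median_body_mass, type = "bar") %>%
    layout(
      title = paste("Median Body Mass for", species_name, "Penguins"),
      xaxis = list(title = "Year", tickvals = list(2007, 2008, 2009)),
      yaxis = list(title = "Median Body Mass (g)")
    )
}

gentoo_plot <- plot_species(mass_summary, "Gentoo")
```

Taking the data as an argument makes the function easier to reuse on a different summary table, and the early check turns a typo in the species name into an error rather than a blank plot.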

4. Loop through your unique values to generate multiple plots

From here, you need a list of all the unique species to loop through for your function. We get that with unique(summary$species).

Start by creating an empty list to store all your plots:

# Create an empty list for all your plots
plot_list = list()

Loop through the unique species values to generate a plot for each species, then add each plot to plot_list:

# Run the plotting function for all the species
for (i in unique(summary$species)){
  plot_list[[i]] = plot_fx(i)
}

# Now you have a list of three plots - one for each species.
# You can see the plots by changing the value within the square brackets from 1 to 3.
# The list is also named by species, so plot_list[["Adelie"]] works too.
plot_list[[1]]
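The same loop can also be written with `lapply()`, which builds the list in one step. To keep this sketch runnable without plotly, it uses a stand-in function, `make_title()`, that returns the plot title instead of a plot; `make_title`, `species_names`, and `title_list` are hypothetical names, and in practice you would pass `plot_fx` in its place.

```r
# Stand-in for plot_fx() so the sketch runs without plotly;
# it returns the plot title instead of a plot object
make_title <- function(species_name) {
  paste("Median Body Mass for", species_name, "Penguins")
}

# Assumed input: the species names as a character vector
species_names <- c("Adelie", "Chinstrap", "Gentoo")

# lapply() calls the function once per name; setNames() labels each result
title_list <- setNames(lapply(species_names, make_title), species_names)

# Elements can be retrieved by name or by position
title_list[["Chinstrap"]]
```

Compared with the `for` loop, this avoids creating an empty list first and growing it one element at a time.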

Now visualize all the plots in one grid with the subplot function in Plotly:

# Plot all three visuals in one grid
subplot(plot_list, nrows = 3, shareX = TRUE, shareY = FALSE)

Three bar charts of the median body mass for Adelie, Chinstrap, and Gentoo Penguins for the years 2007, 2008, and 2009. This was generated by looping through each unique species in our dataset for our custom graphing function.

We did it!

I know that’s a lot more work than using the facet_wrap function in ggplot2, but understanding how to create functions helps with automating reports and creating more dynamic dashboards and visuals!

Bonus Step! Adding Annotations to Get a Title for Each Plot

To get a title on each subplot in the last visual, you have to use annotations in Plotly.

# Create a list of annotations
# The x value is the horizontal position on the entire subplot grid
# The y value is the vertical position on the entire subplot grid

my_annotations = list(
  list(
    x = 0.1,
    y = 0.978,
    font = list(size = 16),
    text = unique(summary$species)[[1]],
    xref = "paper",
    yref = "paper",
    xanchor = "center",
    yanchor = "bottom",
    showarrow = FALSE
  ),
  list(
    x = 0.1,
    y = 0.615,
    font = list(size = 16),
    text = unique(summary$species)[[2]],
    xref = "paper",
    yref = "paper",
    xanchor = "center",
    yanchor = "bottom",
    showarrow = FALSE
  ),
  list(
    x = 0.1,
    y = 0.285,
    font = list(size = 16),
    text = unique(summary$species)[[3]],
    xref = "paper",
    yref = "paper",
    xanchor = "center",
    yanchor = "bottom",
    showarrow = FALSE
  ))
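The three entries above differ only in `y` and `text`, so the same structure can be generated with a small helper. This is a sketch in base R, not the approach used in the post; `make_annotation`, `subplot_titles`, `y_positions`, and `built_annotations` are hypothetical names, and the y positions are the same hand-tuned values as above.

```r
# Hypothetical helper: one annotation (a plain list) per subplot title
make_annotation <- function(label, y_pos) {
  list(
    x = 0.1,
    y = y_pos,
    font = list(size = 16),
    text = label,
    xref = "paper",
    yref = "paper",
    xanchor = "center",
    yanchor = "bottom",
    showarrow = FALSE
  )
}

subplot_titles <- c("Adelie", "Chinstrap", "Gentoo")
y_positions <- c(0.978, 0.615, 0.285)  # same hand-tuned values as above

# Map() pairs each title with its y position; unname() drops the names
# Map() adds, leaving an unnamed list of lists like my_annotations
built_annotations <- unname(Map(make_annotation, subplot_titles, y_positions))
```

The positions still have to be found by trial and error, but they now live in a single vector that is easy to adjust.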

This is kind of a messy, trial-and-error process, because you have to hard-code the positions. Here’s a breakdown of how to do so:

  1. Create a list of annotations for each subplot title: The annotations will be a list of lists. Each element is a list that includes all the information for one subplot. In our example, I want one title that displays the species name for each subplot, so I will have a list with 3 elements. Here’s what goes into each element:
Description of our annotations code that shows what the ‘x’, ‘y’, and ‘text’ variables correspond to.
  • x: This is a value between 0 and 1 and corresponds to the position for the entire graphic, with 0 at the left end and 1 at the right end.
  • y: This is a value between 0 and 1 and corresponds to the position for the entire graphic, with 0 at the bottom and 1 at the top.
  • text: This is the text you want to display for each of the subplot titles.
  • xref and yref: Set these to ‘paper’ to position the annotation in normalized coordinates of the whole plotting area, where 0 is the left edge (for x) or the bottom edge (for y) and 1 is the right edge or the top edge. Alternatively, you can reference an individual axis (for example ‘x2’) or that axis’s domain (‘x2 domain’) to position the text relative to a single subplot.
  • xanchor: Sets the text box’s horizontal position anchor. This anchor binds the `x` position to the “left”, “center” or “right” side of the annotation. Imagine where your point is based on your x and y coordinates, and how you want the text to align relative to the position.
Description on xanchor alignment for Plotly layout.
  • yanchor: Sets the text box’s vertical position anchor. This anchor binds the `y` position to the “top”, “middle” or “bottom” of the annotation. Imagine where your point is based on your x and y coordinates, and how you want the text to align relative to the position.
Description on yanchor alignment for Plotly layout.
  • showarrow: Set this to TRUE or FALSE to control whether Plotly draws an arrow pointing to the location of your annotation. This is helpful if you want to label a specific point on a scatter plot. Since we are just adding text labels onto each subplot, the arrow is unnecessary in this example.

2. Add the layout option to your subplot code: You can add layout options with the layout() function.

# Run the subplot line including a layout
subplot(plot_list, nrows = 3, shareX = TRUE, shareY = FALSE) %>%
  layout(annotations = my_annotations,
         title = "Median Body Mass for Palmer Penguins",
         xaxis = list(tickvals = list(2007, 2008, 2009)),
         xaxis2 = list(tickvals = list(2007, 2008, 2009)),
         xaxis3 = list(tickvals = list(2007, 2008, 2009)))

Here are some options you can specify:

  • annotations: The list of annotations you created that include all the information for the text and position of each label
  • title: This is the text for the title of the entire grid
  • xaxis, xaxis2, xaxis3: In Plotly, each subplot has its own x axis properties. xaxis refers to the first subplot, which in this example is the one for the Adelie penguin species. The remaining x axes can be referenced by numbering each one. Here I am specifying the tick values so that every subplot shows the same standardized years.
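With more subplots, typing out xaxis, xaxis2, xaxis3, and so on gets repetitive. The sketch below builds the same named settings in base R; `n_subplots`, `tick_years`, `axis_names`, and `axis_settings` are hypothetical names used only for this illustration.

```r
n_subplots <- 3
tick_years <- list(2007, 2008, 2009)

# Axis names follow the pattern "xaxis", "xaxis2", "xaxis3", ...
axis_names <- paste0("xaxis", c("", seq_len(n_subplots)[-1]))

# One identical settings list per axis, keyed by axis name
axis_settings <- setNames(
  rep(list(list(tickvals = tick_years)), n_subplots),
  axis_names
)

names(axis_settings)
```

One way to apply the result, assuming a subplot object `p`, would be `do.call(layout, c(list(p, annotations = my_annotations), axis_settings))`, which passes each element of `axis_settings` to `layout()` as a named argument.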

Conclusion

While this is a simple example, I hope it helps open up more possibilities for improving your data science workflow by using custom functions! You can take the steps we used here and generalize them to writing custom functions overall by:

  • Starting with a simplified example
  • Swapping out your specific value for a generalized variable
  • Applying the function to the rest of your data

Once you have the basics down, you can expand on this to ensure reproducibility of your work through automated reports, dashboards, and interactive visuals. Having this foundation also helps you become more proficient in both R and Python, because you can rebuild what works in one language in the other. In a world where R and Python are becoming increasingly interchangeable, this offers possibilities that are not limited to a specific language!


