Raise Awareness about Climate Change with ggplot2 | by Bruno Ponne | Apr, 2023


Photo by Ganapathy Kumar on Unsplash

Global warming isn’t a prediction. It is happening.
James Hansen

There is solid evidence that temperatures are rising on our planet. With climate change threatening the very existence of humanity, efforts to understand, research, and foster awareness of this critical issue are more crucial than ever before.

Whether you are a student or work for a government, a non-governmental organization, or a private company, it is essential that you show your colleagues your engagement with relevant global problems.

In this tutorial, you will learn where to find reliable and curated historical temperature data and visualize it with ggplot2. After you finish this post, you will:

  • know where to find curated datasets with historical weather data;
  • feel comfortable plotting historical weather data with ggplot2;
  • be able to customize your ggplot2 graphs to tell your story.

Step 1: Find and load the data

Data for this tutorial is available from the National Centers for Environmental Information (NCEI)*. The NCEI is the leading authority for environmental data in the USA and provides high-quality data about climate, ecosystems, and water resources. Its Global Summary of the Year (GSOY) dataset offers historical weather data by city and station. For this tutorial, we will use data from Berkeley, CA, but you can choose another city if you prefer. To use the same dataset as this tutorial, search for Berkeley and choose the file that contains data since 1893.

The file is loaded with read_csv(), whose only required argument is the file path. Directly after loading the data frame, we select only the DATE and TAVG variables. DATE contains the year the temperature was observed, and TAVG is the average annual temperature in degrees Celsius. To learn more about the available variables, please consult the dataset codebook.

library(readr)
library(dplyr)

df <- read_csv('USC00040693.csv') %>%
  select("DATE", "TAVG")

summary(df)

The R summary() function tells us that the data range from 1893 to 2019 and that the lowest average annual temperature observed in Berkeley, CA, in this period was 12.9 ºC. The highest average annual temperature was 15.93 ºC. It also shows that 33 temperature values are missing.
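If you want to see which years are affected rather than only how many, a minimal base R sketch looks like the one below. The data frame `toy` and its values are hypothetical and stand in for the Berkeley file. They are not real measurements.

```r
# Hypothetical toy data standing in for the Berkeley file (not real measurements)
toy <- data.frame(
  DATE = 1893:1898,
  TAVG = c(13.1, NA, 13.4, NA, NA, 14.0)
)

# Number of missing annual averages
n_missing <- sum(is.na(toy$TAVG))

# Years whose average temperature is missing
missing_years <- toy$DATE[is.na(toy$TAVG)]

n_missing
missing_years
```

Running the same two expressions on df instead of toy should list the years that the next step will fill in.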

Step 2: Imputing missing values with na_interpolation()

Given that we are working with a time series, we will fill in missing values with linear interpolation. This method assumes the data varied linearly during the missing period. In fact, when you plot a time series as a line plot, the intervals between observations are also filled in with a straight line connecting the two dots.

To perform linear interpolation, we will use the imputeTS package. After installing and loading the library, you can use na_interpolation() to fill in the missing values. You pass two arguments to it: first, the data frame column you would like to treat, and second, the method you wish to use to perform the imputation.

library(imputeTS)

df$TAVG <- na_interpolation(df$TAVG, option = "linear")
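To see what linear interpolation does to a gap, here is a small sketch that uses base R's approx() on a hypothetical vector. It illustrates the idea only and is not meant as a description of how imputeTS is implemented. The values in `tavg` are made up.

```r
# Hypothetical annual averages with gaps (not real measurements)
tavg <- c(13.0, NA, 14.0, NA, NA, 15.5)

idx <- seq_along(tavg)    # position of each year in the series
observed <- !is.na(tavg)  # TRUE where a value was recorded

# Connect neighbouring observed points with straight lines and
# read off the values at every position, including the gaps
filled <- approx(x = idx[observed], y = tavg[observed], xout = idx)$y

filled
```

Each missing value ends up on the straight line between its nearest observed neighbours, which is exactly the assumption described above.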

Step 3: Coding the first version of our plot

A ggplot2 visualization is built of layers. As shown in the figure below, each layer contains one geom object, that is, one element that you see in your graph (lines and dots, for instance).

Image created by the author

First, you need to pass a dataset to the ggplot() function. Second, you will map variables to aesthetics, the visual properties of a geom object. Aesthetics include the position on the x-axis, the position on the y-axis, color, and size, for instance. Below we also set the black-and-white ggplot2 theme for our plot. Without any geom objects, the plot will have only the two axes.

library(ggplot2)

theme_set(theme_bw())

axes <- ggplot(data = df, aes(x = DATE, y = TAVG))

axes

Image created by the author

Now you may add a second layer with dots indicating temperatures throughout time. Note that you can add this layer to the plot you made in the previous step using the “+” sign.

axes +
  geom_point()

Image created by the author

Finally, you may add a third layer containing the lines. It is important to highlight that some authors claim that the lines do not represent observed data and should be used carefully. For a complete discussion, please check chapter 13 of Fundamentals of Data Visualization by Claus O. Wilke.
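A minimal sketch of that third layer, assuming the same DATE and TAVG columns, could look like this. The data frame `toy` is hypothetical and is only there to make the snippet runnable on its own. With the tutorial data you would add geom_line() to the axes object instead.

```r
library(ggplot2)

# Hypothetical toy data standing in for the Berkeley series (not real measurements)
toy <- data.frame(
  DATE = 1893:1898,
  TAVG = c(13.1, 13.3, 13.2, 13.8, 14.1, 14.4)
)

# Layer 1: dots for the observations; layer 2: lines connecting them
p_lines <- ggplot(data = toy, aes(x = DATE, y = TAVG)) +
  geom_point() +
  geom_line()

p_lines
```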

Image created by the author

Step 4: Customizing your plot

In this section, you will learn how to customize your plot to make it clear, informative, and beautiful.

First, to make the increase in temperature more visible, we will map the color aesthetic of the dots to TAVG as well. Since it is a numeric variable, ggplot2 will use a gradient to represent continuous values as colors. You can choose which color will represent low temperatures as well as high temperatures with the scale_color_gradient() function.

Moreover, you may set the x and y axes’ labels with xlab() and ylab(), respectively. A title can be added with ggtitle(). We will also increase the size of the dots and add transparency to make overlapped data visible.

Edward Tufte, an expert in the field of data visualization, recommends maximizing the proportion of ink used to display non-redundant data. The author claims that it makes your plot clearer and avoids distracting your reader.

The ggplot2 theme we are using, theme_bw(), is already in line with Tufte’s recommendations, but we could still eliminate the panel grids of the plot. To do that, use the theme() function and pass two arguments to it: panel.grid.minor = element_blank() and panel.grid.major = element_blank().
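Putting the customizations of this step together, one possible sketch is shown below. The data frame `toy` is hypothetical, and the colors, sizes, and labels are the ones used later in this tutorial. Treat it as an illustration rather than the exact code behind the figure.

```r
library(ggplot2)

# Hypothetical toy data standing in for the Berkeley series (not real measurements)
toy <- data.frame(
  DATE = 1893:1898,
  TAVG = c(13.1, 13.3, 13.2, 13.8, 14.1, 14.4)
)

p_custom <- ggplot(data = toy, aes(x = DATE, y = TAVG, color = TAVG)) +
  # Larger, semi-transparent dots so overlapping points stay visible
  geom_point(size = 4, alpha = 0.7) +
  # Blue for low temperatures, pink for high temperatures
  scale_color_gradient(name = "ºC", low = "#4F88EC", high = "#ec4f88") +
  ggtitle("Historical air temperature trend (toy data)") +
  xlab("Year") +
  ylab("Annual Mean Temperature [ºC]") +
  theme_bw() +
  # Remove the panel grids, following Tufte's recommendation
  theme(panel.grid.minor = element_blank(),
        panel.grid.major = element_blank())

p_custom
```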

Image created by the author

Step 5: Create a theme for your visualization

You will now learn how to create your own ggplot2 theme. As an example, we will create theme_tds().

First, we will load the Google font “Source Serif Pro”. It is the font used in Medium articles. You can easily load it with the showtext package. If you do not have it, please install it. Once you have the package installed, load it and use the font_add_google() function to load “Source Serif Pro”. We also tell R to render text using showtext with showtext_auto().

library(showtext)

font_add_google("Source Serif Pro")

showtext_auto()

Note that some authors recommend only using sans-serif fonts in plots. Check out this post for a discussion on this issue.

Now we will use theme() to customize the graph. The figure below shows some of the arguments you can use. For a complete list, please check this ggplot2 reference.

Image created by the author

You may create a new theme with a function that calls the ggplot2 theme() function with your customized specifications. Note that we start from the black-and-white theme, theme_bw(), and then eliminate grids and change the background, panel, and text colors. To facilitate future changes, the function takes two arguments: one for the color of the text and panel border, and one for the background color.

theme_tds <- function(text_panel_color, background_color) {
  theme_bw() +
    theme(
      text = element_text(size = 10,
                          family = "Source Serif Pro",
                          color = text_panel_color),
      # Eliminates grids
      panel.grid.minor = element_blank(),
      panel.grid.major = element_blank(),
      # Changes panel, plot and legend background
      panel.background = element_rect(fill = background_color),
      plot.background = element_rect(fill = background_color),
      legend.background = element_rect(fill = background_color),
      # Changes legend title color
      legend.title = element_text(color = text_panel_color),
      # Changes plot border color and width
      # (linewidth replaces the size argument, deprecated since ggplot2 3.4.0)
      panel.border = element_rect(linewidth = 1, color = text_panel_color),
      # Changes color of axis texts and titles
      axis.text.x = element_text(color = text_panel_color),
      axis.text.y = element_text(color = text_panel_color),
      axis.title.x = element_text(color = text_panel_color),
      axis.title.y = element_text(color = text_panel_color),
      # Changes axis ticks color
      axis.ticks.y = element_line(color = text_panel_color),
      axis.ticks.x = element_line(color = text_panel_color)
    )
}

Now you can simply add theme_tds() to your plot and specify your preferred colors. Here is an example:

ggplot(data = df, aes(x = DATE, y = TAVG, color = TAVG)) +
  geom_point(size = 4, alpha = 0.7) +
  scale_color_gradient(name = "ºC", low = "#4F88EC", high = "#ec4f88") +
  ggtitle("Historical air temperature trend in Berkeley, CA") +
  xlab("Year") +
  ylab("Annual Mean Temperature [ºC]") +
  theme_tds(text_panel_color = "white",
            background_color = "#252525")

Image created by the author

Another example, with a white background and the font color of Towards Data Science:

ggplot(data = df, aes(x = DATE, y = TAVG, color = TAVG)) +
  geom_point(size = 4, alpha = 0.7) +
  scale_color_gradient(name = "ºC", low = "#4F88EC", high = "#ec4f88") +
  ggtitle("Historical air temperature trend in Berkeley, CA") +
  xlab("Year") +
  ylab("Annual Mean Temperature [ºC]") +
  theme_tds(text_panel_color = "#365A77",
            background_color = "white")

Image created by the author

Finally, you could show the temperature trend with a LOESS (locally estimated scatterplot smoothing) smoother, as recommended by Claus O. Wilke in Chapter 14 of Fundamentals of Data Visualization. You can do that by adding a geom_smooth() layer to the plot. Setting se = FALSE hides the confidence band around the smoothed line.

ggplot(data = df, aes(x = DATE, y = TAVG, color = TAVG)) +
  geom_point(size = 4, alpha = 0.7) +
  geom_smooth(color = "#365A77", se = FALSE) +
  scale_color_gradient(name = "ºC", low = "#4F88EC", high = "#ec4f88") +
  ggtitle("Historical air temperature trend in Berkeley, CA") +
  xlab("Year") +
  ylab("Annual Mean Temperature [ºC]") +
  theme_tds(text_panel_color = "#365A77",
            background_color = "white")

Image created by the author
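If you would like the smoothed values themselves, for example to report the fitted temperature for a given year, a rough sketch with base R's loess() follows. The series below is simulated and hypothetical (an assumed upward drift plus random noise), not the Berkeley data. A span of 0.75 is the default of loess(), so the numbers you get on real data may differ slightly from the line ggplot2 draws if other settings are changed.

```r
# Simulated, hypothetical series: an assumed upward drift plus random noise
set.seed(42)
toy <- data.frame(DATE = 1893:2019)
toy$TAVG <- 13 + 0.02 * (toy$DATE - 1893) + rnorm(nrow(toy), sd = 0.3)

# Fit a LOESS curve of temperature on year
fit <- loess(TAVG ~ DATE, data = toy, span = 0.75)

# Smoothed temperature for every year in the series
toy$trend <- predict(fit)

head(toy)
```

A smaller span follows the data more closely, while a larger one gives a smoother trend.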
