Introduction to ggplot2 in R. Let’s learn how GGplot2, one of the… | by Ivo Bernardo | Sep, 2022

Let’s learn how ggplot2, one of the most famous libraries in R, works

Photo by Launde Morel @unsplash.com

[Disclaimer: This post contains some affiliate links to my Udemy Course]

ggplot2, an open-source data visualization package, is one of the most famous R libraries. First released in 2007, it’s recognized as one of the most flexible and handy visualization libraries available in the R programming language. It’s a big upgrade on base R’s plotting abilities, as it supports several extensions when building plots and is very flexible and tweakable.

If you’ve never worked with code-oriented visualization libraries before, ggplot2 may be a bit challenging to work with. First, it’s built in a modular way where you stack several functions, adding elements with the + operator. Second, it contains tons of tweaks and parameters one can change inside each module, something that can be a bit confusing at first.

Nevertheless, I believe that as soon as you learn the basics, you’ll be able to build a lot of cool plots, in no time! In this post, my goal is to give you a guide to build your R plots using ggplot2. We’ll understand the main components that make up a ggplot2 plot and learn how to tweak them with R code. Let’s start!

As I’ve said, ggplot2 is a “modular” library and most plots contain the following layers:

  • The plot base that maps the data and the axes.
  • The type of plot you want to do.
  • Specific add-ons you may want to include in your plot.

When I want to build something in ggplot2, I always try to map my “mental plot” to these 3 ingredients. Let’s start with the first one, the base, which states:

  • Which dataset we are going to plot;
  • How we will map the variables to the different axes.

To use a practical example, let’s load the famous iris dataset to build our first plot:

library(ggplot2)
iris_df <- iris

I’m storing iris inside iris_df just to have this object as a static data frame in the environment. To build our ggplot2 base, we can use the ggplot function with the data and mapping arguments:

ggplot(
  data = iris_df,
  mapping = aes(x = Petal.Length, y = Petal.Width)
)

What do these arguments do? Let’s break them down:

  • data states the dataset we will be using to fill our plot.
  • mapping relies on the aes function to map the x and y columns to their respective axes. In our case, Petal.Length is going to be the x and Petal.Width is going to be the y.
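As a side note, the same base can be written more compactly. The sketch below assumes you are happy to rely on argument order: data and mapping are the first two arguments of ggplot(), and x and y are the first two arguments of aes(). The object names p_named and p_short are only illustrative.

```r
library(ggplot2)

iris_df <- iris

# Fully named version, as used in this post
p_named <- ggplot(
  data = iris_df,
  mapping = aes(x = Petal.Length, y = Petal.Width)
)

# Positional shorthand: data first, then aes(x, y)
p_short <- ggplot(iris_df, aes(Petal.Length, Petal.Width))

# Both objects carry a mapping for the same two aesthetics
names(p_named$mapping)
names(p_short$mapping)
```

The named form is more verbose but easier to read when you are starting out, which is why the rest of the post sticks to it.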

When we execute the code above, the following window pops up:

iris dataset example plot — Image by Author

With the ggplot function, we lay the base of our plot and map the x and y axes. An important point about ggplot2 is that it’s only suited for 2-dimensional plots. If you need to do 3D plots, check plotly.

In the mapping argument, we pass the variables that will take the place of the x and y axes. Notice that R already fits both axes to the range of values available in the data. For instance, if we check max(iris_df$Petal.Length), we get 6.9, and the x axis ends just past that value, since ggplot2 adds a small amount of padding around the data range by default.
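To check the axis bounds yourself, a small sketch like the one below may help. It assumes you want to compare the data range with the default axis, and it uses the expand argument of scale_x_continuous() together with expansion() purely as an illustration of how the default padding could be removed. The names x_range, p_default and p_tight are hypothetical.

```r
library(ggplot2)

iris_df <- iris

# Data range that drives the x axis
x_range <- range(iris_df$Petal.Length)
x_range

# Default behaviour: a little padding is added around the data range
p_default <- ggplot(iris_df, aes(x = Petal.Length, y = Petal.Width))

# Illustration only: drop the padding so the axis stops at the data limits
p_tight <- p_default +
  scale_x_continuous(expand = expansion(mult = 0))

# Type p_default or p_tight at the console to draw each version
```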

With the base laid out, we need to define the plot type! Are we going to do a scatter, a line or bar plot? Let’s pass that information to R using ggplot2!

So, ggplot2 is a library where we can build plots using layers. After this statement, two questions come up:

  • How can I add a new layer to the plot?
  • What types of layers can I add?

One of the most important layers we can add will state the type of plot we want to do. Adding a new layer is extremely easy — just by using + we will be able to add a new layer to the existing base!

For instance, let’s imagine that we would like our plot to be a scatter plot. To do this, we just need to add a geom_point() layer to the plot:

ggplot(
  data = iris_df,
  mapping = aes(x = Petal.Length, y = Petal.Width)
) + geom_point()
Scatter Plot in ggplot2 — Image by Author

By adding geom_point() to our current plot, we let R know that we are interested in building a scatter plot. With the code above we have two layers on our plot: the base, which consists of the ggplot function, and the type of plot, which consists of geom_point().

But, imagine that we would like to do a line plot instead. Do we add another layer to the plot?

No! We just replace the module we’ve added and add a geom_line() layer instead:

Line Plot in ggplot2 — Image by Author

Of course, the results aren’t very meaningful with this dataset — nevertheless, see how easy it was to change from a scatter to a line plot.
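For reference, the swap described above comes down to changing a single call. A minimal sketch, reusing the same iris_df data frame and mapping as before (the name line_plot is only illustrative):

```r
library(ggplot2)

iris_df <- iris

# Same base as the scatter plot; only the geom changes
line_plot <- ggplot(
  data = iris_df,
  mapping = aes(x = Petal.Length, y = Petal.Width)
) + geom_line()

# Type line_plot at the console to draw it
```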

Adding to this flexibility, we can even pass several arguments inside the new layer. This will tweak some graphical aspects of our plot — for instance, the color of our points:

Scatter Plot with Green Color — Image by Author
ggplot(
  data = iris_df,
  mapping = aes(x = Petal.Length, y = Petal.Width)
) + geom_point(color = 'darkgreen')

By passing color inside geom_point(), we let R know that our points should have a specific color. There are multiple arguments we can pass inside the plot-type layer. For geom_point(), some of the most common are color, size, shape and alpha (transparency).
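As a rough illustration of combining several of these arguments, the sketch below sets a fixed color, size, shape and transparency for every point. The specific values are arbitrary choices for the example, not recommendations, and styled_plot is a hypothetical name.

```r
library(ggplot2)

iris_df <- iris

styled_plot <- ggplot(
  data = iris_df,
  mapping = aes(x = Petal.Length, y = Petal.Width)
) + geom_point(
  color = "darkgreen",  # point color
  size = 3,             # larger than the default
  shape = 17,           # filled triangle
  alpha = 0.6           # partial transparency helps with overlapping points
)

# Type styled_plot at the console to draw it
```

Because these values are set outside aes(), they apply to all points equally rather than varying with the data.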

Our plot is already pretty cool and ready to be interpreted. But… there may be a lot of tweaks that we could do. How do you think we can do those small tweaks? With new layers, of course!

For instance, let’s imagine that I am not happy with the number of labels I have on the x-axis. To tweak them, I can use a new layer: scale_x_continuous!

(
  ggplot(
    data = iris_df,
    mapping = aes(x = Petal.Length, y = Petal.Width)
  )
  + geom_point(color = 'darkgreen')
  + scale_x_continuous(n.breaks = 10)
)
Scatter Plot with extra Labels on the X-Axis — Image by Author

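Using this layer will split my x-axis into more breaks than the original. In this case, I’m asking for 10 breaks through the n.breaks argument inside the new layer. Note that n.breaks is a guide rather than a hard rule: the algorithm may pick a slightly different number to keep the labels tidy. Am I restricted to adding only a single add-on layer?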

Of course not! Let’s also add more breaks to the y-axis with scale_y_continuous:

(
  ggplot(
    data = iris_df,
    mapping = aes(x = Petal.Length, y = Petal.Width)
  )
  + geom_point(color = 'darkgreen')
  + scale_x_continuous(n.breaks = 10)
  + scale_y_continuous(n.breaks = 10)
)
Scatter Plot with extra Labels on X-Axis and Y-Axis — Image by Author
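If you need exact control over where the tick labels fall, one option is the breaks argument of the same scale functions, which takes the positions directly. The sketch below is only an illustration: the break positions in x_breaks and y_breaks are values I picked to cover the iris petal measurements, and the object names are hypothetical.

```r
library(ggplot2)

iris_df <- iris

# Explicit tick positions (chosen by hand for this dataset)
x_breaks <- seq(1, 7, by = 0.5)
y_breaks <- seq(0, 2.5, by = 0.25)

exact_breaks_plot <- ggplot(
  data = iris_df,
  mapping = aes(x = Petal.Length, y = Petal.Width)
) +
  geom_point(color = "darkgreen") +
  scale_x_continuous(breaks = x_breaks) +
  scale_y_continuous(breaks = y_breaks)

# Type exact_breaks_plot at the console to draw it
```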

Notice that the logic is always the same — to add a new layer, we can provide new functions and concatenate them with the existing plot by using + .

ggplot2 is also one of the best-documented libraries in R. There are many layers available in the library, and you can find documentation about them quickly by visiting the library’s website.

Before we finish, there’s another thing that I would like to tell you. We don’t need to build a huge block of code with 20 layers in a single pass! If it matches your use case and you want to tweak your plot as you go, you can save your plot state in a variable and add layers to it. Let’s see that next!

One of the cool things about ggplot2 is that we can store our plot for later use. As an example, I can save my base in a variable and use the variable name as an alias to add layers later:

example_plot <- ggplot(
  data = iris_df,
  mapping = aes(x = Petal.Length, y = Petal.Width)
)

Right now, my base is saved in example_plot. Calling this variable will yield the base of the plot, as expected:

ggplot2 base using example_plot variable name — Image by Author

The cool part is that we can treat the example_plot alias as the first layer (base). If I want to do a scatter plot, I can just write:

example_plot + geom_point()
Scatter plot triggered by example_plot + geom_point() — Image by Author

What if I want to change to a line plot? Let’s see:

example_plot + geom_line()
Line plot triggered by example_plot + geom_line() — Image by Author

Cool, right? Our code is much cleaner, and the ability to store ggplot2 plots in variables is a great feature. It saves you from producing plots with 20+ lines of code in a single block, which can be cumbersome to interpret and debug.
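The same idea extends to several steps. In the sketch below, each intermediate result is stored under its own (hypothetical) name, and adding to a stored plot returns a new object rather than modifying the original.

```r
library(ggplot2)

iris_df <- iris

example_plot <- ggplot(
  data = iris_df,
  mapping = aes(x = Petal.Length, y = Petal.Width)
)

# Each step returns a new plot object; the earlier ones are left untouched
scatter_plot <- example_plot + geom_point(color = "darkgreen")
detailed_plot <- scatter_plot +
  scale_x_continuous(n.breaks = 10) +
  scale_y_continuous(n.breaks = 10)

length(example_plot$layers)   # the base still has no geom layers
length(detailed_plot$layers)  # one geom layer: geom_point()
```

This makes it easy to go back to an earlier stage, for example to try geom_line() on example_plot without touching scatter_plot.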

ggplot2 is considered one of the top libraries for Data Science in R. Learning it will give you a great tool to build plots in R. Although base R may be sufficient for several tasks, it normally produces plots that are too simple and basic, and that don’t look “professional”.

In this post, my goal was to explain the basics of ggplot2. As next steps, check other layers such as geom_bar or geom_histogram, or experiment with other available parameters. These should give you a good overview of how to work with the library for specific use cases and which aspects of the plot’s appearance you can change.
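As a starting point for those next steps, here is a minimal sketch of both geoms on the same iris data. Note that each of them needs only an x mapping, since the y axis is a count computed by ggplot2. The choice of 20 bins and the object names are assumptions made for the example.

```r
library(ggplot2)

iris_df <- iris

# Histogram: distribution of a numeric column (bin count chosen arbitrarily)
hist_plot <- ggplot(
  data = iris_df,
  mapping = aes(x = Petal.Length)
) + geom_histogram(bins = 20)

# Bar plot: number of rows per category
bar_plot <- ggplot(
  data = iris_df,
  mapping = aes(x = Species)
) + geom_bar()

# Type hist_plot or bar_plot at the console to draw each one
```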

I’ve set up an introduction to R and a Bootcamp on learning Data Science on Udemy. Both courses are tailored for beginners and I would love to have you around!

R Programming Course for Absolute Beginners — Image by Author


