10 Examples to Master ggplot2: Line plots | by Soner Yıldırım | Oct, 2022


Plotting package for R

Photo by Johannes Andersson on Unsplash

How you deliver information is just as important as the information itself. Data visualization is an essential tool for delivering information, storytelling, and analysis in data science.

The two biggest players in the data science ecosystem are Python and R. Both have numerous packages that speed up and simplify common tasks.

In this article, we will go over 10 examples to learn how to create and customize line plots with ggplot2, which is a data visualization package in tidyverse, a collection of R packages for data science.

We will use 3 different datasets in the examples. You can download them from the datasets repository on my GitHub page.

The first one is a CSV file that contains the Apple stock prices in 2022. Let’s first create a data table by using the fread function of the data.table package.

library(data.table)
library(ggplot2)
apple <- fread("datasets/apple_stock_prices_2022.csv")
head(apple)
# output
Date High Low Open Close Volume Adj Close
1: 2022-01-03 182.88 177.71 177.83 182.01 104487900 181.2599
2: 2022-01-04 182.94 179.12 182.63 179.70 99310400 178.9595
3: 2022-01-05 180.17 174.64 179.61 174.92 94537600 174.1992
4: 2022-01-06 175.30 171.64 172.70 172.00 96904000 171.2912
5: 2022-01-07 174.14 171.03 172.89 172.17 86709100 171.4605
6: 2022-01-10 172.50 168.17 169.08 172.19 106765600 171.4804

Example 1

We will create a simple line plot that shows the date on the x-axis and the closing price on the y-axis.

ggplot(apple, aes(x = Date, y = Close)) + 
geom_line()

The ggplot function specifies the data and the mappings to x and y. The aes function defines the aesthetic mappings, which describe how variables in the data are mapped to visual properties of geoms (e.g. geom_line).

The geom_line function draws the line plot. Here is the output of the above code snippet:

(image by author)

Example 2

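We can customize the appearance in many ways. Let’s change the line size and color, both of which can be set in the geom_line function. (In ggplot2 3.4.0 and later, the linewidth argument replaces size for lines; size still works but triggers a deprecation warning.)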

ggplot(apple, aes(x = Date, y = Close)) + 
geom_line(size = 1.2, color = "blue")
(image by author)

We can also make it a dashed line using the linetype parameter (linetype = "dashed").
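As a minimal sketch of the dashed variant, the snippet below rebuilds the first six rows of the Apple data shown earlier by hand so that it runs without the CSV file. The names apple_head and p_dashed are made up for this illustration.

```r
library(ggplot2)

# First six rows of the Apple data printed earlier, typed in by hand
# (only the Date and Close columns are needed here).
apple_head <- data.frame(
  Date = as.Date(c("2022-01-03", "2022-01-04", "2022-01-05",
                   "2022-01-06", "2022-01-07", "2022-01-10")),
  Close = c(182.01, 179.70, 174.92, 172.00, 172.17, 172.19)
)

# linetype sets the dash pattern; "dashed" is one of the built-in names.
p_dashed <- ggplot(apple_head, aes(x = Date, y = Close)) +
  geom_line(linetype = "dashed", color = "blue")

p_dashed
```

Other built-in line types include "dotted", "dotdash", "longdash", and "twodash".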

Example 3

The range of the y-axis is determined automatically from the values in the dataset. However, it can be changed using the ylim function.

The defaults are usually fine but we sometimes need to adjust them to keep a standard between multiple plots or have an axis starting from zero. Let’s set the range to 100–200.

ggplot(apple, aes(x = Date, y = Close)) + 
geom_line(color = "darkgreen") +
ylim(100, 200)
(image by author)
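One detail worth knowing: ylim sets the limits of the scale, so observations that fall outside the range are dropped before the line is drawn. If the goal is only to zoom in, coord_cartesian changes the visible window without removing data. The sketch below uses a hand-typed copy of the six Apple rows shown earlier; apple_head, p_limits, and p_zoom are names made up for this illustration.

```r
library(ggplot2)

# Hand-typed copy of the first six Apple rows printed earlier.
apple_head <- data.frame(
  Date = as.Date(c("2022-01-03", "2022-01-04", "2022-01-05",
                   "2022-01-06", "2022-01-07", "2022-01-10")),
  Close = c(182.01, 179.70, 174.92, 172.00, 172.17, 172.19)
)

# Scale limits: values outside 100-200 would be treated as missing.
p_limits <- ggplot(apple_head, aes(x = Date, y = Close)) +
  geom_line(color = "darkgreen") +
  ylim(100, 200)

# Coordinate limits: same visible range, but no data is discarded.
p_zoom <- ggplot(apple_head, aes(x = Date, y = Close)) +
  geom_line(color = "darkgreen") +
  coord_cartesian(ylim = c(100, 200))

p_zoom
```

For these rows every closing price lies inside 100–200, so both plots look the same. The difference only shows up when the chosen range cuts through the data.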

Example 4

We can add points to indicate the location of data points. This is helpful when we do not have a lot of data points (i.e. the density of observations is low).

In this example, we will use the measurements dataset.

measurements <- fread("datasets/measurements.csv")
measurements
# output
day value
1: 1 80
2: 2 93
3: 3 94
4: 4 76
5: 5 63
6: 6 64
7: 8 85
8: 9 64
9: 10 95

Let’s create a line plot that shows the days on the x-axis and the values on the y-axis. We will also add points using the geom_point function.

ggplot(measurements, aes(x = day, y = value)) + 
geom_line() +
geom_point()
(image by author)

The points are placed where we have an observation in the dataset. For instance, the dataset does not have day 7, so no point is drawn there and the line runs straight from day 6 to day 8.

Example 5

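In the previous example, it is hard to tell which day each observation belongs to. To label every day on the x-axis, we can convert the day column to a factor. Because a discrete x-axis puts each level in its own group, we also set the group aesthetic to 1 inside aes so that geom_line connects all the points with a single line.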

measurements[, day := factor(day)]
ggplot(measurements, aes(x = day, y = value, group = 1)) + 
geom_line() +
geom_point()
(image by author)
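Converting day to a factor spaces the levels evenly, so the gap at day 7 no longer shows. An alternative, sketched below, keeps day numeric and asks the x scale for a tick at every day with scale_x_continuous. The data frame is typed in from the output shown earlier; measurements_num and p_breaks are names made up for this illustration.

```r
library(ggplot2)

# The measurements data printed earlier, with day kept as a number.
measurements_num <- data.frame(
  day = c(1, 2, 3, 4, 5, 6, 8, 9, 10),
  value = c(80, 93, 94, 76, 63, 64, 85, 64, 95)
)

# breaks = 1:10 puts a labelled tick on every day, including day 7,
# which has no observation and therefore no point.
p_breaks <- ggplot(measurements_num, aes(x = day, y = value)) +
  geom_line() +
  geom_point() +
  scale_x_continuous(breaks = 1:10)

p_breaks
```

With this version no group aesthetic is needed, and the missing day remains visible as a wider gap between two points.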

Example 6

We can have multiple lines on a line plot. We will use another dataset for this example, which contains the stock prices of Apple and Google in September, 2022.

stock <- fread("datasets/apple_google_stock_prices_092022.csv")
head(stock)
# output
Date High Low Open Close Volume Adj Close Stock
1: 2022-09-01 158.42 154.67 156.64 157.96 74229900 157.96 AAPL
2: 2022-09-02 160.36 154.97 159.75 155.81 76905200 155.81 AAPL
3: 2022-09-06 157.09 153.69 156.47 154.53 73714800 154.53 AAPL
4: 2022-09-07 156.67 153.61 154.82 155.96 87449600 155.96 AAPL
5: 2022-09-08 156.36 152.68 154.64 154.46 84923800 154.46 AAPL
6: 2022-09-09 157.82 154.75 155.47 157.37 68028800 157.37 AAPL

The Stock column indicates the name of the stock.

For each day, we have two different values, one for Apple and one for Google. Thus, if we plot the date and closing price as we did earlier, we end up having a plot as shown below:

(image by author)

We need to show the Apple and Google stock values with different lines. There are a few different ways of doing this. For instance, we can map the colour aesthetic to the column that differentiates Apple and Google.

ggplot(stock, aes(x = Date, y = Close, colour = Stock)) + 
geom_line()
(image by author)
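Another of those ways is the group aesthetic, which splits the data into one line per stock without changing the color. The sketch below is only an illustration: the AAPL closing prices are the first three rows of the output above, while the second series uses a placeholder ticker "XYZ" with made-up values, since the Google rows are not shown in this article. The names toy_stock and p_group are also made up.

```r
library(ggplot2)

toy_stock <- data.frame(
  Date = rep(as.Date(c("2022-09-01", "2022-09-02", "2022-09-06")), times = 2),
  Close = c(157.96, 155.81, 154.53,  # AAPL rows from the output above
            110.00, 108.50, 107.00), # made-up values for a placeholder ticker
  Stock = rep(c("AAPL", "XYZ"), each = 3)
)

# group = Stock draws a separate line for each stock, both in the default color.
p_group <- ggplot(toy_stock, aes(x = Date, y = Close, group = Stock)) +
  geom_line()

p_group
```

Since both lines look alike and no legend is drawn, group on its own is most useful when combined with another aesthetic or with facets.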

Example 7

Let’s recreate the previous plot but using different line styles for Apple and Google. We just need to use the linetype parameter instead of the colour.

ggplot(stock, aes(x = Date, y = Close, linetype = Stock)) + 
geom_line(size = 1.2)
(image by author)

Example 8

In examples 4 and 5, we added points to mark observations in the dataset. The size and shape of these points can also be customized.

Let’s add points to the plot in example 6 and also change the value range for the y-axis.

ggplot(stock, aes(x = Date, y = Close, color = Stock)) + 
geom_line() +
geom_point(size = 3, shape = 22, fill = "white") +
ylim(90, 200)
(image by author)

Example 9

We may want to change the default axis labels or add a title to a plot. Let’s make our plot more informative and appealing by doing so.

The labs function can be used for adding a title and subtitle. The axis labels can be changed using the xlab and ylab functions.

ggplot(stock, aes(x = Date, y = Close, color = Stock)) + 
geom_line(size = 1) +
labs(title = "Apple vs Google Stock Prices",
subtitle = "September, 2022") +
xlab("") +
ylab("Closing Price")
(image by author)
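The labs function also accepts x and y arguments, so the title, subtitle, and axis labels can all be set in one call. The sketch below uses a hand-typed copy of the six Apple rows shown at the start of the article; apple_head, p_labs, and the title text are made up for this illustration.

```r
library(ggplot2)

# Hand-typed copy of the first six Apple rows printed earlier.
apple_head <- data.frame(
  Date = as.Date(c("2022-01-03", "2022-01-04", "2022-01-05",
                   "2022-01-06", "2022-01-07", "2022-01-10")),
  Close = c(182.01, 179.70, 174.92, 172.00, 172.17, 172.19)
)

# x = NULL removes the x-axis title; y replaces the default "Close".
p_labs <- ggplot(apple_head, aes(x = Date, y = Close)) +
  geom_line() +
  labs(title = "Apple Closing Price",
       subtitle = "First trading days of 2022",
       x = NULL,
       y = "Closing Price")

p_labs
```

Using xlab and ylab, as above, or x and y inside labs gives the same labels; it is a matter of preference.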

Example 10

We can add a theme to the plots, which allows for making a lot of customizations including:

  • Changing the font size and style of title and subtitle
  • Changing the font size and style of axis labels
  • Changing the font size, style, and orientation of tick labels

Let’s use these to customize the plot in the previous example.

ggplot(stock, aes(x = Date, y = Close, color = Stock)) + 
geom_line(size = 1) +
labs(title = "Apple vs Google Stock Prices",
subtitle = "September, 2022") +
xlab("") +
ylab("Closing Price") +
theme(
plot.title = element_text(size = 18, face = "bold.italic"),
plot.subtitle = element_text(size = 16, face = "bold.italic"),
axis.title.y = element_text(size = 14, face = "bold"),
axis.text.x = element_text(size = 12),
axis.text.y = element_text(size = 12)
)
(image by author)
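The example above changes font sizes and faces but not the orientation of the tick labels. A minimal sketch of rotating the x-axis labels follows, using the angle and hjust arguments of element_text. The data is a hand-typed copy of the six Apple rows shown at the start of the article, and apple_head and p_angle are names made up for this illustration.

```r
library(ggplot2)

# Hand-typed copy of the first six Apple rows printed earlier.
apple_head <- data.frame(
  Date = as.Date(c("2022-01-03", "2022-01-04", "2022-01-05",
                   "2022-01-06", "2022-01-07", "2022-01-10")),
  Close = c(182.01, 179.70, 174.92, 172.00, 172.17, 172.19)
)

# angle rotates the tick labels; hjust = 1 right-aligns them so the end of
# each label sits under its tick mark.
p_angle <- ggplot(apple_head, aes(x = Date, y = Close)) +
  geom_line() +
  theme(axis.text.x = element_text(size = 12, angle = 45, hjust = 1))

p_angle
```

Rotated labels are mostly useful when the x-axis carries many dates or long category names that would otherwise overlap.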

ggplot2 is a highly efficient library that offers a great amount of flexibility. I think it is similar to Matplotlib in that we can customize pretty much anything on a plot.

The examples in this article cover most of what you need to create and customize line plots. There will be some edge cases where you need to do some further customizations but you can worry about them when it comes to that point.

Thank you for reading. Please let me know if you have any feedback.


