How to Reverse Code an Interval Scale in R | by Rory Spanton | Nov, 2022

By Jessie Hobb On Nov 1, 2022

An easy way to clean questionnaire and measurement data

If you use R for data analysis, chances are you’ll end up working with interval data. Often, it’s useful to be able to reverse data on an interval scale. Here’s how to do it, with step-by-step examples. If you want all the code featured in this article in one script, check out the GitHub gist at the end.

Imagine you’re a psychologist who has collected data about people’s experiences of anxiety. You gave people surveys asking them how anxious they feel right now, how often they experience worrying thoughts on a daily basis, and so on. You want to analyse those data to give each of your participants a score for their anxiety. But the questionnaire you used had a few reverse-worded questions, where the usual response scale applies backwards. You need to reverse the coding for these answers to get the right scores when you add up responses for each participant.

Fortunately, there’s a simple way of doing this kind of analysis in R. To demonstrate, let’s create some data for this example.

Let’s say each item on our survey uses a 7-point Likert scale, like this:

“1 = Strongly Disagree; 2 = Disagree; 3 = Somewhat Disagree; 4 = Neither Agree nor Disagree; 5 = Somewhat Agree; 6 = Agree; 7 = Strongly Agree”

We can create a dataset by sampling values randomly from this response scale, as done below.
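The original code for this step isn't reproduced here, so the following is a minimal sketch of one way to do it. The column names come from the description of the dataset; the seed and the number of participants are arbitrary assumptions.

```r
# Make the random draws reproducible (the seed value is arbitrary)
set.seed(123)

# Hypothetical sample size, chosen only for this sketch
n_participants <- 10

# Sample one response per participant for each statement
# from the 7-point scale (1 to 7), with replacement
data <- data.frame(
  worried_thoughts = sample(1:7, n_participants, replace = TRUE),
  anxiety_effects  = sample(1:7, n_participants, replace = TRUE)
)

# Show the first few rows
head(data)
```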

You can see what the dataset looks like below. The first column, worried_thoughts, contains participant responses to the statement “I often have intrusive worrying thoughts”. The second column, anxiety_effects, contains responses to the statement “Anxiety rarely impacts my everyday life”.

Because of the wording of the second statement, a high numeric score would indicate low feelings of anxiety. This means that the responses to this statement must be reverse-coded to be compared with those from other questions that are normally worded, where a high numeric score means high anxiety.

Luckily, there’s an easy formula for this.

To reverse values on an interval scale, you take the minimum value of the scale your variable is on, subtract the value you’re reversing, and then add the maximum value from your scale.

This formula works with a response scale that starts from any number, positive or negative. It also works with scales that have any size interval, so long as that interval is consistent.
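To check these claims, here is a small sketch that wraps the formula in a helper and tries it on three different scales. The function name reverse_value is made up for this illustration and isn't part of any package.

```r
# Reverse a value x on a given scale: minimum - x + maximum
# (reverse_value is a hypothetical helper for this sketch)
reverse_value <- function(x, scale) {
  min(scale) - x + max(scale)
}

# 7-point scale starting at 1: the ends swap and the midpoint stays put
reverse_value(c(1, 4, 7), scale = 1:7)
# [1] 7 4 1

# Scale that starts from a negative number
reverse_value(c(-3, 0, 2), scale = -3:3)
# [1]  3  0 -2

# Scale with a consistent interval of 5 between points
reverse_value(c(0, 5, 20), scale = seq(0, 20, by = 5))
# [1] 20 15  0
```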

Here’s an example of a couple of ways you can apply this formula to reverse numeric interval values in a dataset.
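The original snippet isn't reproduced here, so this is a sketch of what the two approaches might look like. It assumes the reverse-worded column is anxiety_effects, which takes the place of response in the formula, and it uses a small hand-written dataset so the results are easy to check.

```r
library(dplyr)

# Small hypothetical dataset standing in for the one described earlier
data <- data.frame(
  worried_thoughts = c(6L, 2L, 7L, 4L),
  anxiety_effects  = c(1L, 5L, 2L, 7L)
)

# Define the numeric scale used in the question
myscale <- 1:7

# Base R: apply the formula and store the result in a new column
data$response_reversed <- min(myscale) - data$anxiety_effects + max(myscale)

# Tidyverse: the same operation with mutate
# (this overwrites response_reversed with identical values)
data <- data %>%
  mutate(response_reversed = min(myscale) - anxiety_effects + max(myscale))

data$response_reversed
# [1] 7 3 6 1
```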

In this code, we first define the numeric scale used in the question. This is just a vector of the numbers 1 to 7, stored in the variable myscale.

We can then use the values in myscale in our reverse coding formula. In the tidyverse example, this formula is expressed as min(myscale) - response + max(myscale), where response is a value from the column you want to recode.

It’s really easy, and you can apply it with either base R or the tidyverse. In both examples, the results of the reverse coding operation are stored in the new column response_reversed.

What about situations when you need to flip the scale of several questions in your dataset? You can do this with the same formula as before with a little added tidyverse magic.

For this example, let’s create some more example data, this time with multiple questions.
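As a rough sketch, data like this could be generated as follows. The question and response column names match the ones used later in the article; the participant column, the question labels, the seed and the sample sizes are all assumptions made for illustration.

```r
# Make the random draws reproducible (the seed value is arbitrary)
set.seed(123)

# Hypothetical numbers of participants and questions
n_participants <- 5
questions <- paste0("question_", 1:4)

# One row per participant per question, with a random response from 1 to 7
data_long <- data.frame(
  participant = rep(1:n_participants, each = length(questions)),
  question    = rep(questions, times = n_participants),
  response    = sample(1:7, n_participants * length(questions), replace = TRUE)
)

head(data_long)
```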

This data is in long format, meaning the values in the first column repeat across multiple rows, with one row for each response. You can read more about the difference between long and wide data here if you’re interested. If your questionnaire data isn’t in long format, you can convert it with the pivot_longer function from the tidyr package.
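If you're starting from wide data, the conversion might look something like this. The wide dataset and its column names are invented for this sketch.

```r
library(tidyr)

# Hypothetical wide data: one row per participant, one column per question
data_wide <- data.frame(
  participant = 1:3,
  question_1  = c(5L, 2L, 7L),
  question_2  = c(3L, 6L, 1L),
  question_3  = c(4L, 4L, 2L)
)

# Gather every column except participant into question/response pairs
data_long <- pivot_longer(
  data_wide,
  cols      = -participant,
  names_to  = "question",
  values_to = "response"
)

data_long
```

Each participant now takes up three rows, one per question.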

To reverse code some questions in this dataset while keeping others unchanged, we first need to select the questions we’d like to recode. We can do this by creating a vector containing the names of the questions we’re recoding. In this case, we want to reverse the scoring of questions 2 and 3.
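A sketch of that vector is below, assuming the questions are labelled question_1, question_2 and so on. Both the labels and the name reverse_questions are hypothetical.

```r
# Names of the reverse-worded questions (labels are hypothetical)
reverse_questions <- c("question_2", "question_3")

# %in% returns TRUE for each question name that appears in the vector
c("question_1", "question_2", "question_3", "question_4") %in% reverse_questions
# [1] FALSE  TRUE  TRUE FALSE
```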

Now, we can use this vector to selectively apply our recoding operation. We can do this by using case_when within mutate, one of my favourite tidyverse functions.
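Here is a sketch of how that could look, using a small hand-written dataset so the results are easy to verify. The participant column, the question labels and the name reverse_questions are assumptions; the rest follows the description in this article.

```r
library(dplyr)

# Response scale and the questions to reverse (labels are hypothetical)
myscale <- 1:7
reverse_questions <- c("question_2", "question_3")

# Small hypothetical long-format dataset: 2 participants, 4 questions each
data_long <- data.frame(
  participant = rep(1:2, each = 4),
  question    = rep(paste0("question_", 1:4), times = 2),
  response    = c(7L, 1L, 2L, 5L, 3L, 6L, 4L, 2L)
)

# Reverse the response only on rows whose question is in reverse_questions
data_long <- data_long %>%
  mutate(
    response = case_when(
      question %in% reverse_questions ~ min(myscale) - response + max(myscale),
      TRUE ~ as.integer(response)
    )
  )

data_long$response
# [1] 7 7 6 5 3 2 4 2
```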

This might be a lot to process at first, so let’s break it down a bit.

This code uses mutate, a function which manipulates the values of existing columns or creates new columns. I used it in the last example too, but in this case, I’ve set it to assign the results of our operation to the response column that’s already in the data. This means it’ll change some of the existing values in that column.

The case_when function then reverses the values in the response column, but only for the questions we want to recode. It works by checking each row of the data against a condition. Here, it checks whether the value of the question column is in the vector of question names we defined earlier.

If so, it applies the scale reversal formula to the value in the response column. You can see this after the tilde (~) symbol in the case_when line. If a given row doesn’t contain a question that we selected to recode, case_when returns the original value of the response column unchanged. It does this with the TRUE ~ as.integer(response) command.

The result of this code is that the values in response are reversed for the questions we selected, and unchanged for those we didn’t.

Scoring the questionnaire is now straightforward. Again, we use tidyverse functions, this time to sum up the responses to get scores for each participant.
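One way to do this is sketched below with group_by and summarise. The dataset is a small hypothetical example in which the reverse-worded questions have already been recoded, and the participant and score column names are assumptions.

```r
library(dplyr)

# Hypothetical long-format data, with reverse-worded items already recoded
data_long <- data.frame(
  participant = rep(1:2, each = 4),
  question    = rep(paste0("question_", 1:4), times = 2),
  response    = c(7L, 7L, 6L, 5L, 3L, 2L, 4L, 2L)
)

# Add up the responses within each participant to get a total score
scores <- data_long %>%
  group_by(participant) %>%
  summarise(score = sum(response))

scores$score
# [1] 25 11
```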