Customer Satisfaction Measurement with N-gram and Sentiment Analysis | by Petr Korab | Apr, 2023


Photo by Freepik on Freepik

Happy customers drive company growth. This five-word sentence explains why we do our best to maximize customer satisfaction. Product reviews are one of the major data sources collected by large companies like Amazon and Apple, mid-sized exporters such as Lentiamo, and local businesses running their Facebook pages. Reviews are typically collected repeatedly over time, and factors like quality shifts, marketing communications, and the friendliness of customer care shape the sentiment customers express.

Note: Image by author, based on a review of Karim (2011), Baker and Wurgler (2006), Merrin et al. (2013), and Eachempati et al. (2022)

The role of Business Intelligence (BI) should be to analyze product reviews, identify potential problems, and develop hypotheses for solving them. In the next stage, these recommendations are scrutinized by other departments, depending on the company structure. This article takes a closer look at the analytics of measuring customer satisfaction with product review data.

The end-to-end process includes:

  • How to make an exploratory analysis of time-series product reviews
  • How to quickly evaluate sentiment in product reviews over time
  • How to display the most frequent satisfaction factors over time

We’ll work in Python, which most BI and data analysts routinely use.

Getting unscraped product review data with a flexible license is generally difficult. Synthetic data from the Fake Reviews Dataset, distributed under the Creative Commons Attribution 4.0 International license, is an excellent option in this case.

The data looks like this:

Image 1. Fake Reviews Dataset

The text column contains the product reviews, and period marks the review date. The subset we’ll work with contains 3,848 reviews of clothing, shoes, and jewelry products.
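Before the analysis, the reviews need to sit in a pandas DataFrame with parsed dates. A minimal sketch of the expected frame shape (column names from the article; the rows here are purely illustrative, not real dataset entries):

```python
import pandas as pd

# Illustrative rows mirroring the two columns used throughout this article
data = pd.DataFrame({
    'text': ['Love this! Fits perfectly.', 'Too narrow for a wide foot.'],
    'period': ['2013-05-01', '2013-06-15'],
})

# Parse review dates so the later time-series aggregation works
data['period'] = pd.to_datetime(data['period'])
print(data.dtypes)
```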

EDA on this type of dataset should establish the completeness of the data for each period of interest and the presence of noise that carries no valuable information: digits and special characters. We want to avoid both highly imbalanced datasets with very few reviews in some periods and noisy datasets with too many digits and special characters (/, @, &, ;, ?, etc.).

3.1. Checking for data completeness

First, let’s check the data for completeness for each period. We make annual comparisons and therefore summarise the reviews for each year:

import pandas as pd
import matplotlib.pyplot as plt

# Calculate review frequencies by year
data['year'] = pd.DatetimeIndex(data['period']).year
rows_count = data['year'].value_counts().sort_index().reset_index()
rows_count.columns = ['year', 'reviews']

# Generate a line plot
rows_count.plot.line(x='year', y='reviews')
plt.show()

A simple line plot shows the yearly review frequencies. No extra graph formatting is needed here:

Image 2. Yearly product review frequencies. Image by author.

The dataset is not heavily imbalanced. We have roughly 200 or more reviews per year, which makes a solid basis for sentiment and n-gram analyses.

3.2. Calculating letter-to-other characters ratio

Next, let’s check that the data does not consist mainly of numbers and special characters, which could bias the later text-mining steps.

import re

# Convert product reviews to a single string
text = data['text'].to_string(index=False)

# Remove newline characters
text = re.sub(r'\n', '', text)

# Count numbers, letters, and spaces
numbers = sum(c.isdigit() for c in text)
letters = sum(c.isalpha() for c in text)
spaces = sum(c.isspace() for c in text)
others = len(text) - numbers - letters - spaces

We then calculate cleanness and dirtiness (its complement), metrics that show the ratio of letters and spaces to the remaining characters (numbers and special characters):

# Calculate metrics
dirtiness = ((numbers + others) / len(text)) * 100
cleanness = ((letters + spaces) / len(text)) * 100

print(dirtiness)
print(cleanness)

The print output is:

  • dirtiness: 8.77 %
  • cleanness: 91.22 %

Around 9 % of the characters in the data are digits and special characters. This amount of noise should not bias the sentiment analysis in the later stage.

Text mining attempts to find out: (1) how the customers’ sentiment developed over time and (2) which factors contributed to changes in customer satisfaction. We’ll use Arabica, a Python library for time-series text mining, to explore both.

4.1. Sentiment analysis

Arabica offers the coffee_break module for sentiment and breakpoint analysis. Its documentation provides more details about the models and the methodology of sentiment evaluation.
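As a rough illustration of how lexicon-based sentiment scoring works in general (the lexicon and formula below are my own toy versions for intuition, not Arabica’s actual model; see its documentation for that):

```python
# Toy lexicon for illustration only
POSITIVE = {'great', 'love', 'perfect', 'perfectly', 'recommend'}
NEGATIVE = {'bad', 'poor', 'tight', 'narrow', 'broke'}

def toy_sentiment(review: str) -> float:
    """Return a score in [-1, 1] from counts of toy lexicon hits."""
    words = review.lower().split()
    pos = sum(w.strip('.,!?') in POSITIVE for w in words)
    neg = sum(w.strip('.,!?') in NEGATIVE for w in words)
    total = pos + neg
    return 0.0 if total == 0 else (pos - neg) / total

print(toy_sentiment('Love it, fits perfectly!'))  # clearly positive
```

Scores per review can then be averaged by year to obtain a sentiment time series like the one coffee_break plots below.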

This code cleans the data of punctuation and numbers, calculates sentiment for each review, aggregates the sentiment by year, and identifies the two major breakpoints in the sentiment time series:

from arabica import coffee_break

coffee_break(text = data['text'],
             time = data['period'],
             date_format = 'us',  # Read dates in US format
             preprocess = True,   # Clean data - digits and punctuation
             n_breaks = 2,        # Identify two structural breaks
             time_freq = 'Y')     # Yearly aggregation

The line plot with sentiment and breakpoints:

Image 3. Sentiment analysis with breakpoints. Image by author.

We can see fluctuations in sentiment over time. The changes are not dramatic, as sentiment stays within [0.6, 0.7]. The higher the sentiment value, the better customers perceive the products (and vice versa).

In 1991 and 2000, there were positive and negative breakpoints in sentiment, respectively. In the next stage, let’s check what caused these shifts.
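How coffee_break locates breaks is covered in Arabica’s documentation. As a crude stand-in for intuition, one could simply flag the years with the largest year-over-year sentiment shifts (an illustrative simplification of my own, not Arabica’s method):

```python
def largest_shifts(years, sentiment, n_breaks=2):
    """Return the years with the n largest absolute year-over-year changes."""
    changes = [
        (abs(sentiment[i] - sentiment[i - 1]), years[i])
        for i in range(1, len(years))
    ]
    changes.sort(reverse=True)
    return sorted(year for _, year in changes[:n_breaks])

# Illustrative yearly sentiment values, not the article's actual data
years = [1998, 1999, 2000, 2001]
scores = [0.68, 0.67, 0.61, 0.63]
print(largest_shifts(years, scores))  # the 2000 drop stands out
```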

4.2. Factors driving customer satisfaction

We’ll use a heatmap of n-gram frequencies from the cappuccino module to visually derive the factors that influenced customers’ sentiment. Read the documentation for more technical details.

This code plots a heatmap of the ten most frequent bigrams (i.e., two consecutive words) in each year. The data pre-processing includes cleaning from English stopwords, numbers, and punctuation:

from arabica import cappuccino

cappuccino(text = data['text'],
           time = data['period'],
           date_format = 'us',      # Uses US-style date format to parse dates
           plot = 'heatmap',
           ngram = 2,               # N-gram size, 1 = unigram, 2 = bigram
           time_freq = 'Y',         # Aggregation period, 'M' = monthly, 'Y' = yearly
           max_words = 10,          # Display the 10 most frequent n-grams per period
           stopwords = ['english'], # Remove English stopwords
           skip = None,             # Remove additional strings
           numbers = True,          # Remove numbers
           lower_case = True,       # Lowercase text before cleaning and frequency analysis
           punct = True)            # Remove punctuation

The bigram heatmap:

Image 4. Bigram heatmap. Image by author.

Bigram frequencies indicate that:

  • In the 2000 drop, “would recommend” and “would definitely” disappear from the top 10, the frequencies of “fits perfectly”, “fits perfect”, “fit great”, and “fit well” fall or vanish, and “wide foot” and “size fit” appear with high frequency. This suggests we might be selling shoes that do not fit our customers’ feet. Note that stopwords such as “don’t” were removed.
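cappuccino computes these frequencies internally. For intuition, a plain-Python sketch of bigram counting with simplified cleaning (lowercasing, letters-only tokens, stopword removal; not Arabica’s exact pipeline) might look like:

```python
from collections import Counter
import re

def top_bigrams(reviews, stopwords, k=10):
    """Count the k most frequent bigrams after lowercasing,
    keeping letter-only tokens, and removing stopwords."""
    counts = Counter()
    for review in reviews:
        tokens = [t for t in re.findall(r'[a-z]+', review.lower())
                  if t not in stopwords]
        counts.update(zip(tokens, tokens[1:]))
    return counts.most_common(k)

reviews = ['Fits perfectly, would recommend!', 'Too tight for a wide foot.']
print(top_bigrams(reviews, stopwords={'a', 'for', 'too'}))
```

Aggregating such counts per year, instead of over all reviews at once, yields the rows of the heatmap above.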

This type of text-mining analytics should help us make qualified, data-informed decisions and improve the quality of our products and services. Based on the analytical results, we formulate a set of hypotheses about what might be wrong and where there is room for improvement.

From the text-mining analysis in this article, we can see potential problems with product quality. The products fit our customers’ current needs less well than before. Receiving these reviews in 2001, analysts should formulate a set of problem hypotheses such as:

  • The new shoes don’t fit well.
  • We don’t sell wide-foot shoes, but customers demand them.
  • There’s a quality problem in the production of new products.

Many other potential problem hypotheses will arise, since you know your products better than anyone else. Other departments (Marketing, Customer Care, Logistics, etc.) should find out where the problem lies, react with specific improvements, and fix it. The response, of course, depends on the company’s size, structure, and other firm-specific factors.

The Jupyter notebook with the code and data is on my GitHub.

PS: You can subscribe to my email list to get notified every time I write a new article. And if you are not a Medium member yet, you can join here.

Baker, M., Wurgler, J., 2006. Investor sentiment and the cross-section of stock returns. Journal of Finance 61 (4).

Eachempati, P., Srivastava, P. R., Kumar, A., Muñoz de Prat, J., Delen, D., 2022. Can customer sentiment impact firm value? An integrated text mining approach. Technological Forecasting and Social Change 174 (1).

Karim, B., 2011. Corporate name change and shareholder wealth effect: empirical evidence in the French stock market. Journal of Asset Management 12 (3).

Merrin, R. P., Hoffmann, A. O., Pennings, J. M., 2013. Customer satisfaction as a buffer against sentimental stock-price corrections. Marketing Letters 24 (1).