A Comprehensive Guide to Moderating Sensitive Audio Content | by Avi Chawla | Dec, 2022
Content Moderation Made Simple
· Motivation
· Content Moderation in Audio Files
· Results
· Conclusion
User-generated content is screened and monitored by an online platform based on rules and policies relevant to that platform.
To put it another way, when a user submits content to a website, it typically goes through a screening process (also called the moderation process) to ensure that it adheres to the website’s rules, and is not inappropriate, harassing, illegal, etc.
When texts are exchanged online or on social media, sensitive content can be detected using content moderation models that are typically AI-driven.
In addition to transcribing information from audio or video sources, some of the finest Speech-to-Text APIs include content moderation.
Topics relating to drugs, alcohol, violence, sensitive social issues, hate speech, and more are frequently among the sensitive content that content moderation APIs aim to tackle.
Therefore, in this post, I will demonstrate how you can use the AssemblyAI API to detect sensitive content in an audio file.
Let’s begin 🚀!
With the help of AssemblyAI, you can detect and score mentions of drug abuse, hate speech, and other sensitive topics (along with their severity) in a given audio/video file.
The image below depicts the transcription workflow of AssemblyAI.
Below is the step-by-step tutorial on content moderation of audio files using AssemblyAI.
The transcription API will perform speech-to-text conversion and detect the sensitive content (if any) in the given file. These include mentions of accidents, disasters, hate speech, gambling, etc.
Step 1: Get the Token
Firstly, we need to get AssemblyAI’s API Token to access the services.
Now that we are ready with the API token, let’s define the headers.
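Assuming the `requests` library and a token string of your own (the value below is a placeholder), the headers can be set up roughly as follows:

```python
import requests

# Placeholder -- replace with your own AssemblyAI API token
API_TOKEN = "<your-api-token>"

# Every request to the AssemblyAI API must carry the token
# in the `authorization` header.
headers = {
    "authorization": API_TOKEN,
    "content-type": "application/json",
}
```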
Step 2: Upload the File
Next, we will upload the input audio file to the hosting service of AssemblyAI, which will return a URL that will be used for further requests.
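A minimal sketch of the upload step, assuming the standard `v2/upload` endpoint and a placeholder token; the file is streamed in chunks so large audio files do not have to fit in memory at once:

```python
import requests

API_TOKEN = "<your-api-token>"  # placeholder
UPLOAD_ENDPOINT = "https://api.assemblyai.com/v2/upload"

def upload_file(path: str) -> str:
    """Upload a local audio file and return the hosted URL."""

    def read_chunks(filename, chunk_size=5_242_880):
        # Yield the file in ~5 MB chunks for a streaming upload.
        with open(filename, "rb") as f:
            while True:
                data = f.read(chunk_size)
                if not data:
                    break
                yield data

    response = requests.post(
        UPLOAD_ENDPOINT,
        headers={"authorization": API_TOKEN},
        data=read_chunks(path),
    )
    response.raise_for_status()
    return response.json()["upload_url"]
```

The returned `upload_url` is what we pass to the transcription endpoint in the next step.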
Step 3: Transcription
Once we receive the URL from AssemblyAI, we can proceed with the transcription that will also detect the sensitive content.
Here, we will specify the `content_safety` parameter as `True`. This will invoke the content moderation models.
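Submitting the job can be sketched as below, assuming the standard `v2/transcript` endpoint and a placeholder token; note the `content_safety` flag in the request body:

```python
import requests

API_TOKEN = "<your-api-token>"  # placeholder
TRANSCRIPT_ENDPOINT = "https://api.assemblyai.com/v2/transcript"

def request_transcription(audio_url: str) -> str:
    """Submit a transcription job with content moderation enabled
    and return the job id."""
    payload = {
        "audio_url": audio_url,
        "content_safety": True,  # invoke the content moderation models
    }
    response = requests.post(
        TRANSCRIPT_ENDPOINT,
        json=payload,
        headers={"authorization": API_TOKEN},
    )
    response.raise_for_status()
    return response.json()["id"]
```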
Step 4: Fetch Results
As the last step, we make a GET request using the `id` returned in the POST request. We will make repeated GET requests until the status of the response is marked as `completed` or `error`.
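The polling loop described above can be sketched as follows (placeholder token; the polling interval is an arbitrary choice):

```python
import time

import requests

API_TOKEN = "<your-api-token>"  # placeholder
TRANSCRIPT_ENDPOINT = "https://api.assemblyai.com/v2/transcript"

def poll_transcription(transcript_id: str, interval: int = 5) -> dict:
    """Poll the transcript endpoint until the job completes or errors."""
    polling_url = f"{TRANSCRIPT_ENDPOINT}/{transcript_id}"
    while True:
        response = requests.get(
            polling_url, headers={"authorization": API_TOKEN}
        )
        result = response.json()
        if result["status"] in ("completed", "error"):
            return result
        time.sleep(interval)  # wait before the next poll
```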
Step 5: Store the Output
The response from the transcription services is then stored in a text file.
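Saving the result can be as simple as the sketch below (the filename is an arbitrary choice), writing either the transcription text or the error message:

```python
def save_transcription(result: dict, path: str = "transcription.txt") -> None:
    """Write the transcription text (or the error message) to a file."""
    with open(path, "w") as f:
        if result.get("status") == "error":
            f.write(result.get("error", "unknown error"))
        else:
            f.write(result.get("text", ""))
```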
Now, let’s interpret the content moderation output.
The results for content moderation are available under the `content_safety_labels` key of the JSON response received from AssemblyAI.
The outer `text` field contains the audio file’s text transcription.
Moreover, as shown in the output above, the results of the content safety detection are added under the `content_safety_labels` key. The keys within `content_safety_labels` are described below:
- `results`: a list of the segments of the audio transcription that the model classified as sensitive content.
- `results.text`: the text transcription of the segment that triggered the content moderation model.
- `results.labels`: all the labels detected for that segment; the confidence and severity metrics are also included with each JSON object in this list.
- `summary`: the confidence score for each predicted label across the entire audio file.
- `severity_score_summary`: the overall impact of each predicted label on the whole audio file.
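To make this structure concrete, here is a sketch of how the flagged labels can be pulled out of the response. The sample response below is illustrative only (invented values, not actual API output), but it follows the keys described above:

```python
# Illustrative sample (not actual API output) matching the keys above.
response = {
    "text": "full transcription ...",
    "content_safety_labels": {
        "results": [
            {
                "text": "sentence that triggered the model",
                "labels": [
                    {"label": "hate_speech", "confidence": 0.97, "severity": 0.82}
                ],
            }
        ],
        "summary": {"hate_speech": 0.97},
        "severity_score_summary": {
            "hate_speech": {"low": 0.1, "medium": 0.3, "high": 0.6}
        },
    },
}

# Collect every flagged label with its confidence and severity.
flagged = [
    (label["label"], label["confidence"], label["severity"])
    for segment in response["content_safety_labels"]["results"]
    for label in segment["labels"]
]
print(flagged)  # → [('hate_speech', 0.97, 0.82)]
```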
Each predicted label includes both a `confidence` and a `severity` value, and the two measure different things. The `severity` value depicts how severe the flagged content is on a scale of 0 to 1, while the `confidence` score reveals how confident the model was when predicting the output label.