I Used My Voice to Interact With OpenAI GPT-3 | by Avi Chawla | Sep, 2022
Building a web app that talks to GPT-3
Large Language Models (LLMs for short), such as PaLM by Google, GPT-3 by OpenAI, Megatron-Turing NLG by Microsoft-NVIDIA, etc., have achieved remarkable performance in generating comprehensive natural language texts.
At the time of writing, among these three models, only OpenAI’s GPT-3 is available to the public, accessible through OpenAI’s API. As a result, since its release, GPT-3 has been leveraged in over 300 applications and products. (source: here).
GPT-3 takes a text input as a prompt and performs the task of text completion by predicting one token at a time. What makes GPT-3 special is the scale at which it was built, possessing nearly 175B parameters.
While the core idea behind GPT-3 is to respond to a “textual” prompt, building voice-enabled applications on top of it has also been of great interest to the community lately.
Therefore, in this blog, we shall create a Streamlit application to interact with OpenAI GPT-3 by providing it with speech-based inputs.
The outline of the article is as follows:
· App Workflow
· Prerequisites
· Building The Streamlit App
· Executing the Application
· Conclusion
Let’s begin!
As discussed above, the GPT-3 model expects a text prompt as an input. However, if we begin with speech, we first need to convert speech to text and then feed the transcribed text as input to the GPT-3 model.
To generate audio transcription, I will use AssemblyAI’s speech-to-text transcription API.
The high-level workflow of the application is demonstrated in the image below:
First, the user will provide voice input, which will be recorded. Next, we will send the audio file to AssemblyAI for transcription. Once the transcribed text is ready and retrieved from AssemblyAI’s servers, we will provide it as input to the OpenAI GPT-3 model using the OpenAI API.
The requirements for creating a voice-based app that can interact with GPT-3 are specified below:
#1 Install Streamlit
First, as we are creating this application using Streamlit, we should install the streamlit library using the following command:
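For instance, with pip:

```shell
pip install streamlit
```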
#2 Install OpenAI
Next, to send requests to the GPT-3 model, we should also install OpenAI’s Python library:
#3 Import Dependencies
Next, we import the Python libraries we will use in this project.
#4 Get the AssemblyAI API Token
To leverage the transcription services of AssemblyAI, you should get an API access token from the AssemblyAI website. Let’s name it assembly_auth_key for our Streamlit app.
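For example (the value below is a placeholder, not a real token):

```python
# Placeholder -- substitute the API token from your AssemblyAI dashboard
assembly_auth_key = "<your-assemblyai-api-key>"
```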
#5 Get the OpenAI API Token
Lastly, to access the GPT-3 model and generate text outputs, you should get an API access token from the OpenAI website. In OpenAI, this is declared as the api_key attribute as follows:
Once we have fulfilled all the prerequisites for our application, we can proceed with building the app.
For this, we shall define five different functions. These are:
· record_audio(file_name): As the name suggests, this will allow the user to provide verbal input to the application. The function will record the audio and store it locally in an audio file named file_name. I have referred to this code for integrating this method into the app.
· upload_to_assemblyai(file_name): This function will take the audio file, upload it to AssemblyAI’s server, and return the URL of the uploaded file as upload_url.
· transcribe(upload_url): Once the upload_url is available, we shall create a POST request to transcribe the audio file. This will return the transcription_id, which will be used to fetch the transcription results from AssemblyAI.
· get_transcription_result(transcription_id): To retrieve the transcribed text, we shall execute a GET request with the transcription_id obtained from the transcribe() method. The function will return the transcribed text, which we will store in a prompt variable.
· call_gpt3(prompt): Lastly, this function will pass the prompt received from the user to the GPT-3 model and retrieve its output.
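Method 1: Recording the Audio
Below is a minimal sketch of the record_audio() method. It assumes the sounddevice and scipy packages for microphone capture; the code referenced above may differ in its details:

```python
def record_audio(file_name, duration=5, sample_rate=44100):
    """Record `duration` seconds of microphone audio and save it to file_name."""
    # Imported lazily so the rest of the app does not require audio libraries
    import sounddevice as sd
    from scipy.io import wavfile

    recording = sd.rec(int(duration * sample_rate),
                       samplerate=sample_rate, channels=1)
    sd.wait()  # block until the recording finishes
    wavfile.write(file_name, sample_rate, recording)
```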
Method 2: Uploading Audio File to AssemblyAI
Once the audio file is ready and saved locally, we shall upload this file to AssemblyAI and obtain its URL.
However, before uploading the file, we should declare the headers and the transcription endpoints of AssemblyAI.
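Those declarations might look like this (the auth key is a placeholder):

```python
# AssemblyAI REST endpoints
upload_endpoint = "https://api.assemblyai.com/v2/upload"
transcription_endpoint = "https://api.assemblyai.com/v2/transcript"

# Every request to AssemblyAI carries the API key in the authorization header
headers = {"authorization": "<your-assemblyai-api-key>"}
```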
In the code block above:
- The upload_endpoint specifies AssemblyAI’s upload service.
- After uploading the file, we will use the transcription_endpoint to transcribe the audio file.
The upload_to_assemblyai() method is implemented below:
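A sketch of the method, with the endpoint and headers repeated here so the snippet is self-contained (the auth key is a placeholder):

```python
import requests

upload_endpoint = "https://api.assemblyai.com/v2/upload"
headers = {"authorization": "<your-assemblyai-api-key>"}


def upload_to_assemblyai(file_name):
    """Upload the local audio file and return its URL on AssemblyAI's server."""
    with open(file_name, "rb") as f:
        response = requests.post(upload_endpoint, headers=headers, data=f)
    return response.json()["upload_url"]
```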
We make a POST request to AssemblyAI with the upload_endpoint, the headers, and the path to the audio file (file_name). We collect and return the upload_url from the JSON response received.
Method 3: Transcribing the Audio File
Next, we shall define the transcribe() method.
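A self-contained sketch of the method (the auth key is a placeholder):

```python
import requests

transcription_endpoint = "https://api.assemblyai.com/v2/transcript"
headers = {"authorization": "<your-assemblyai-api-key>"}


def transcribe(upload_url):
    """Submit the uploaded audio for transcription and return the job id."""
    response = requests.post(transcription_endpoint,
                             headers=headers,
                             json={"audio_url": upload_url})
    return response.json()["id"]
```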
In contrast to the POST request made in the upload_to_assemblyai() method, here we invoke the transcription_endpoint instead, as the objective is to transcribe the file.
The method returns the transcription_id for our POST request, which we can use to fetch the transcription results.
Method 4: Fetching the Transcription Results
The fourth step in this list is to fetch the transcription results from AssemblyAI using a GET request.
To fetch the results corresponding to our specific request, we should provide the unique identifier (transcription_id) received from AssemblyAI in our GET request. The get_transcription_result() method is implemented below:
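A self-contained sketch of the polling loop (the auth key is a placeholder, and the three-second wait is an illustrative choice):

```python
import time

import requests

transcription_endpoint = "https://api.assemblyai.com/v2/transcript"
headers = {"authorization": "<your-assemblyai-api-key>"}


def get_transcription_result(transcription_id):
    """Poll AssemblyAI until the job finishes, then return the transcribed text."""
    polling_endpoint = f"{transcription_endpoint}/{transcription_id}"
    while True:
        result = requests.get(polling_endpoint, headers=headers).json()
        if result["status"] == "completed":
            return result["text"]
        if result["status"] == "error":
            raise RuntimeError(result["error"])
        time.sleep(3)  # wait a few seconds before checking again
```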
The transcription run-time will vary depending on the input audio’s duration. Therefore, we should make repeated GET requests to check the status of our request and fetch the results once the status changes to completed or indicates an error. Here, we return the transcription text (prompt).
Method 5: Sending the Prompt to OpenAI GPT-3
The final method will send the prompt as an input to the GPT-3 model using the OpenAI API.
You can find the list of available GPT-3 engines here.
Integrating the Functions in Main Method
As the final step in our Streamlit application, we integrate the functions defined above in the main() method.
Now that we have built the entire application, it’s time to run it.
Open a new terminal session and navigate to the working directory. Here, execute the following command:
streamlit run file-name.py
Replace file-name.py with the name of your app file.
Demo Walkthrough
Next, let’s do a quick walkthrough of our Streamlit voice-enabled GPT-3 application.
As we saw above, the app asks to speak the prompt. In the walkthrough below, I have presented the following prompt to GPT-3: “Think about the existence of Life outside Earth.”
The application records the audio and saves it to a file locally. Next, it sends the file to AssemblyAI for transcription. Finally, the transcribed text is sent to GPT-3, whose response is displayed on the application.
The response returned by GPT-3 to our prompt is: “This is a difficult question to answer, as there is no concrete evidence that life exists outside of Earth. However, there are many possible theories about where life could exist in the universe….”