
Build Your Own AI Voice Assistant to Control Your PC
by Bharath K | June 2022



A simple guide on how you can construct your own AI assistant to control various actions on your PC

Photo by Alex Knight on Unsplash

Recently, using virtual assistants to control our surroundings has become common practice. We rely on Google Assistant, Siri, Alexa, Cortana, and many other similar assistants to complete tasks for us with a simple voice command. You could ask them to play music, open a particular file, or perform any number of similar tasks, and they would do so with ease.

While these devices are cool, it is also intriguing to develop your own voice-automated AI assistant that you can use to control your desktop with nothing but your voice. Such an assistant can chat with you, open videos, play music, and much more.

In this article, we will develop an introductory project for an AI assistant that you can use to control your PC, or any similar device, with your voice. We will start with an introduction to the basic dependencies required for this project, then put everything together in a single Python file through which the AI voice assistant follows your commands.

Before diving into this article, if you are interested in other cool projects where we construct things from scratch, I recommend checking out one of my previous works. The link below shows how to develop your own weather application indicator with Python in less than ten lines of code.

Part-1: The Desktop Control

Photo by BENCE BOROS on Unsplash

In this section of the article, we will learn how to control our PC by managing some basic operations on the physical screen. With the help of PyAutoGUI, we can perform most of the functionality required for this project. This automation library allows users to programmatically control the mouse and keyboard.

You can install PyAutoGUI, which handles all the cursor, mouse, and keyboard-related tasks, with a simple pip command, as shown below.

pip install PyAutoGUI

The installation should finish in the respective environment within a couple of minutes without much hassle. Once it is done, let us get started with some of the basic commands from this library that we will need for our voice-assisted AI project.

First, let us import the PyAutoGUI library as shown in the code snippet below. The next critical step is to know the resolution of your working screen. We can print the screen's width and height with the size() function available in the freshly installed library.

import pyautogui

# Printing the default screen width and height
screenWidth, screenHeight = pyautogui.size()
print(screenWidth, screenHeight)

Output: 1920 1080

You can see that the resolution of my screen is 1920 x 1080, which is a common resolution for many monitors. If your monitor has a higher or lower resolution, you can still follow along with this guide easily: the same commands work on any resolution. Just make sure to adjust the coordinate parameters accordingly if your display resolution doesn't match mine. One simple way to do this, sketched below, is to scale the recorded coordinates by the ratio of the two resolutions.
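Here is a minimal sketch of that idea, assuming the coordinates in this article were recorded on a 1920 x 1080 layout (the REF_WIDTH, REF_HEIGHT constants and the scaled() helper are illustrative names, not part of PyAutoGUI):

import pyautogui

# Resolution the coordinates in this article were recorded on (assumption)
REF_WIDTH, REF_HEIGHT = 1920, 1080

def scaled(x, y):
    # Convert a recorded coordinate into the equivalent point on this screen
    width, height = pyautogui.size()
    return int(x * width / REF_WIDTH), int(y * height / REF_HEIGHT)

# Example: move to the point corresponding to (37, 35) on a 1920x1080 screen
pyautogui.moveTo(*scaled(37, 35), duration=1)

This works best for UI elements that scale with the screen, such as the taskbar; icons that sit at a fixed pixel offset may still need manual adjustment.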

The other essential command that we will cover in this section discovers the current position of your mouse pointer. The position() function of the library returns the coordinates where your mouse pointer is currently placed. We can use these positions to locate folders and other essential directories on your desktop screen. Below is the code snippet to perform this action.

# Showing the current cursor position
currentMouseX, currentMouseY = pyautogui.position() # Get the XY position of the mouse.
print(currentMouseX, currentMouseY)
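Re-running a script every time you want to read a coordinate gets tedious. PyAutoGUI also ships a small helper that keeps printing the live cursor position until you press Ctrl-C, which is handy for recording the positions of the icons you want to click:

# Print the live mouse position until Ctrl-C is pressed
pyautogui.displayMousePosition()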

Another interesting feature of the library is that you can locate a given image on your current working screen and retrieve its coordinates, as shown in the code snippet below.

# Locate an image on the screen; returns None if it is not found
location = pyautogui.locateCenterOnScreen("image.png")
if location is not None:
    x, y = location

The final essential command that we will look at in this section allows us to open a desired directory. By placing my cursor on the folder in the top-left corner, I was able to figure out the coordinates of my admin folder. We can move the cursor to that location with the moveTo() function, passing the folder's position and a movement duration. We can then use the click() command, specifying the left or right mouse button and the number of clicks to perform.

# Open the admin directory
pyautogui.moveTo(37, 35, duration=1)  # glide to (37, 35) over one second
pyautogui.click(button='left', clicks=2)  # double-click with the left button

With the above code snippet, you should be able to open the admin folder: the cursor automatically moves to the admin directory and double-clicks it. If you don't have a similar icon in the top-left of your screen, or if your screen resolution differs, feel free to experiment with the positions and coordinates accordingly.

Part-2: The Voice Command Control

Photo by Thomas Le on Unsplash

In this section of the article, we will cover the basic requirements for speech recognition, the second core component of this project. We will need a microphone to issue voice commands and interpret the information accordingly. I recommend the SpeechRecognition library, along with a text-to-speech converter of your choice. Also ensure that you have PyAudio installed in your working environment, since microphone access depends on it.
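If these packages are not yet present in your environment, they can be installed with pip as well. The package names below are the ones published on PyPI; note that the SpeechRecognition package is imported as speech_recognition, and PyAudio can require extra build tools on some platforms:

pip install SpeechRecognition
pip install pyttsx3
pip install PyAudio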

If you are not too familiar with text-to-speech, I highly recommend checking out one of my previous articles, where I cover Google text-to-speech with Python, with beginner code to get you started. The link is provided below.

First, we can import the necessary libraries as shown in the code block below. The speech recognition library will enable us to detect the necessary voice commands. Additionally, we import a text-to-speech library so the assistant can respond with spoken audio. Finally, we create a Recognizer object to handle the voice recognition.

import speech_recognition as sr
import pyttsx3
r = sr.Recognizer()
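Although the rest of this project does not strictly require it, here is a minimal sketch of how pyttsx3 could give the assistant a spoken confirmation (the phrase itself is just an example):

# Speak a short confirmation through the default audio device
engine = pyttsx3.init()
engine.say("Voice assistant is ready")
engine.runAndWait()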

In the next step, we will read the user's microphone input as the source and interpret the speech accordingly. Once the audio is recognized, the transcribed text is displayed in the terminal output. If the speech cannot be recognized, we handle the necessary exceptions so that the user can verify their settings. Below is the code snippet for simple speech recognition.

with sr.Microphone() as source:
    r.adjust_for_ambient_noise(source)
    print("Say Something")
    audio = r.listen(source)

try:
    text = r.recognize_google(audio)
    print("you said: ", text)

except sr.UnknownValueError:
    print("Google Speech Recognition could not understand audio")

except sr.RequestError as e:
    print("Could not request results from Google Speech Recognition service; {0}".format(e))

In the final step, we will construct the complete build of the AI voice assistant, combining the two components discussed above into a single program that performs the required actions.

Photo by Possessed Photography on Unsplash

Now that we have a basic understanding of the two core components of this project, device control and speech recognition, we can combine both elements to develop our assistant. Let us start with the necessary library imports, as shown below.

import pyautogui
import speech_recognition as sr
r = sr.Recognizer()

In the next snippet, we will define the commands() function, where we interpret the recognized actions. In the code block below, I have defined only a couple of functionalities, namely opening my admin directory and opening the Start menu. The function takes the text input provided by the user. We can add several other commands to further improve this project, as shown in the sketch after the code block.

def commands(text):
    if text == "open admin":
        # Open the admin directory
        pyautogui.moveTo(37, 35, duration=1)
        pyautogui.click(button='left', clicks=2)
    elif text == "open start menu":
        # Open the Start menu
        pyautogui.moveTo(18, 1057, duration=1)
        pyautogui.click(button='left', clicks=1)
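Commands do not have to rely on screen coordinates. PyAutoGUI can also automate the keyboard, so a command can press a key combination instead. As a sketch, the hypothetical "open run" branch below could be added to the function above; it uses the Windows Run dialog shortcut and is Windows-specific:

    elif text == "open run":
        # Press Win+R to open the Run dialog (Windows only)
        pyautogui.hotkey('win', 'r')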

In the next code block, we will define the functionality for receiving audio input from the user and recognizing the speech. Once the audio is transcribed, make sure to convert it to lower case before passing the text into our commands() function. Once the code below is in place, you are free to test and run the project.

with sr.Microphone() as source:
    r.adjust_for_ambient_noise(source)
    print("Say Something")
    audio = r.listen(source)

try:
    text = r.recognize_google(audio)
    print("you said: ", text)
    commands(text.lower())

except sr.UnknownValueError:
    print("Google Speech Recognition could not understand audio")

except sr.RequestError as e:
    print("Could not request results from Google Speech Recognition service; {0}".format(e))

The preferred way to run the project is to minimize all windows and execute the Python code from the terminal. You can then give the command "open admin" and watch the cursor move from its default location to the specified position and open the folder. All the necessary files for this project are provided in my GitHub repository; check it out at the following link.

This is just an introductory project for getting started with an AI voice assistant of your own, built from scratch. There are tons of advancements and improvements that can be made, which I recommend you try out; one straightforward improvement is sketched below. I will also work on a part-2 extension of this article, where we make several significant additions for better features and performance.
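For example, the script above listens for a single command and exits. A natural next step is to wrap the recognition step in a loop so the assistant keeps listening until you say an exit keyword (the word "stop" here is an arbitrary choice, and commands() is the function defined earlier):

import speech_recognition as sr

r = sr.Recognizer()

while True:
    with sr.Microphone() as source:
        r.adjust_for_ambient_noise(source)
        print("Say Something")
        audio = r.listen(source)
    try:
        text = r.recognize_google(audio).lower()
        print("you said: ", text)
        if text == "stop":
            break  # hypothetical exit keyword
        commands(text)
    except sr.UnknownValueError:
        print("Google Speech Recognition could not understand audio")
    except sr.RequestError as e:
        print("Could not request results from Google Speech Recognition service; {0}".format(e))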

Photo by Vika Strawberrika on Unsplash

“Anything that could give rise to smarter-than-human intelligence — in the form of Artificial Intelligence, brain-computer interfaces, or neuroscience-based human intelligence enhancement — wins hands down beyond contest as doing the most to change the world. Nothing else is even in the same league.”
— Eliezer Yudkowsky

Recognizing speech and voice comes naturally to humans. We are able to perceive and reciprocate most human emotions by listening to different types of voices and figures of speech. However, machines do not yet have the complete capability to understand the emotions behind speech.

Although we haven't been able to develop machines that fully understand human sentiment, we have successfully developed many devices that can detect and understand speech. When programmed correctly, an AI can recognize speech, interpret the dialogue, and perform the respective task.

In this article, we developed a voice automation project that can control numerous actions on your desktop. We covered the basics of the PyAutoGUI library for handling cursor, mouse, and keyboard-related tasks, then explored the speech recognition library for voice detection and processing. Finally, we combined the two to construct an AI voice assistant that controls your PC.

If you want to get notified about my articles as soon as they go up, check out the following link to subscribe for email recommendations. If you wish to support me and other authors, subscribe via the link below.

If you have any queries related to the various points stated in this article, then feel free to let me know in the comments below. I will try to get back to you with a response as soon as possible.

A quick update to all the viewers who like to read my content: I am sorry for the recent delays in blog posts, as I have been slightly busy with work. I will try to post at least three to five articles each month, starting next month. Thank you all for your continuous support.

Check out some of my other articles related to the topic covered in this piece that you might also enjoy reading!

Thank you all for sticking around till the end. I hope you enjoyed reading the article. Wish you all a wonderful day!


