Techno Blender
Digitally Yours.

Understand async/await with asyncio for Asynchronous Programming in Python | by Lynn Kwong | Dec, 2022



Image by Patrick Hendry on Unsplash

Many Python developers, even some veteran Pythonistas, have only ever worked with synchronous code. However, if you are a data scientist, you may have used the multiprocessing library to run calculations in parallel, and if you are a web developer, you may have used threading to achieve concurrency. Both multiprocessing and threading are advanced concepts in Python, each with its own fields of application.

Besides multiprocessing and threading, there is a relatively new member of Python’s concurrency family: asyncio, a library for writing concurrent code with the async/await syntax. Like threading, asyncio is suited to IO-bound tasks, which are very common in practice. In this post, we will introduce the basic concepts of asyncio and demonstrate how to use it to write asynchronous code.

CPU-bound and IO-bound tasks

Before we get started with asyncio, we should clarify two concepts, because they determine which library is best suited to your particular problem.

A CPU-bound task spends most of its time doing heavy calculations on the CPU. If you are a data scientist training machine learning models by crunching a huge amount of data, that is a CPU-bound task. In this case, you should use multiprocessing to run your jobs in parallel and make full use of your CPU cores.

On the other hand, an IO-bound task spends most of its time waiting for IO responses, which can come from web pages, databases, or disks. In web development, where a request needs to fetch data from APIs or databases, the work is IO-bound, and concurrency can be achieved with either threading or asyncio to minimize the time spent waiting on external resources.
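As a rough illustration of the threading option, here is a minimal sketch that simulates three IO waits with time.sleep() and runs them in a thread pool from the standard library’s concurrent.futures module. The function name fake_io and the delays are illustrative assumptions; a real program would call an HTTP client or a database driver instead.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_io(seconds):
    # Hypothetical stand-in for a blocking IO call (HTTP request, DB query, ...)
    time.sleep(seconds)
    return seconds

start = time.perf_counter()
# Each call runs in its own worker thread, so the three waits overlap
with ThreadPoolExecutor(max_workers=3) as pool:
    results = list(pool.map(fake_io, [0.1, 0.2, 0.3]))  # map keeps input order
elapsed = time.perf_counter() - start

print(results)  # [0.1, 0.2, 0.3]
print(f"Took about {elapsed:.1f} seconds")  # roughly the longest wait, not the sum
```

The total time is close to the longest single wait rather than the sum of all three, because the threads spend their time waiting rather than computing.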

threading vs asyncio

OK, so we know both threading and asyncio are suitable for IO-bound tasks, but what are the differences?

Firstly, and this may seem surprising at first, threading uses multiple threads whereas asyncio uses only one. Threading is easier to understand: the threads take turns running code, which achieves concurrency. But how is it possible to achieve concurrency with a single thread?

Threading achieves concurrency with pre-emptive multitasking, which means we cannot control when each piece of code runs in which thread. The operating system decides which thread runs and can switch between threads at any point. This is why threaded code often produces nondeterministic results. This post can be helpful if you want to learn more about threading.

asyncio, on the other hand, achieves concurrency with cooperative multitasking. We decide at which points the code awaits, and only at those points can control switch to other parts of the code. The tasks must cooperate by announcing, with the await keyword, when they are willing to give up control, and all of this happens in a single thread. This may seem elusive now, but it will become much clearer when we look at the code later.
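Here is a minimal sketch of cooperative multitasking using only the standard library. The worker coroutine and the log list are illustrative names, and asyncio.gather(), covered later in this post, is used to run both workers concurrently. Each worker explicitly gives up control with await asyncio.sleep(0); because switches happen only at those await points, the interleaving is the same on every run.

```python
import asyncio

log = []

async def worker(name, steps):
    for i in range(steps):
        log.append(f"{name}{i}")
        # Explicitly hand control back to the event loop; this is the
        # only point where the other worker can run
        await asyncio.sleep(0)

async def main():
    await asyncio.gather(worker("A", 3), worker("B", 3))

asyncio.run(main())
print(log)  # ['A0', 'B0', 'A1', 'B1', 'A2', 'B2']
```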

What is a coroutine?

Coroutine is a fancy, somewhat exotic name in asyncio, and the concept is not easy to explain. Many tutorials skip the explanation entirely and simply show some code. Let’s try to understand what it is first.

The Python glossary defines a coroutine as:

Coroutines are a more generalized form of subroutines. Subroutines are entered at one point and exited at another point. Coroutines can be entered, exited, and resumed at many different points.

This may still seem abstract at first, but it will make more sense as you work more with asyncio.

In this definition, we can think of subroutines as functions, despite some differences between the two. Normally, a function is entered once when it is called and exited once when it returns. However, Python has a special kind of function, the generator function, which can be entered and exited many times.
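To make "entered and exited many times" concrete, here is a small sketch with a hypothetical countdown generator. Every yield exits the generator and hands a value to the caller, and every next() call re-enters it exactly where it left off.

```python
def countdown(n):
    print("Entered the generator")
    while n > 0:
        yield n   # exit here, handing n to the caller...
        n -= 1    # ...and resume here on the next call to next()
    print("Exhausted")

gen = countdown(3)
print(next(gen))  # prints "Entered the generator", then 3
print(next(gen))  # 2
print(next(gen))  # 1
```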

Coroutines behave much like generators. In fact, in older versions of Python, coroutines were defined with generators; these are called generator-based coroutines. Coroutines have since become a native feature of Python and are defined with the async def syntax. Even though generator-based coroutines are now deprecated (and their asyncio.coroutine decorator was removed in Python 3.11), their history helps us understand what a coroutine is and how control is switched, or yielded, between different parts of the code. If you want to learn more about the history and specification of coroutines in Python, PEP 492 is a good reference, although it may not be easy reading for beginners.
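The pause-and-resume mechanism that generator-based coroutines were built on can be sketched with a plain generator that receives values through send(). This is not an asyncio coroutine, and running_total is a hypothetical name; the sketch only shows how a generator can suspend at a yield and later resume with new data passed in from outside.

```python
def running_total():
    total = 0
    while True:
        # Suspend here and hand `total` to the caller; resume when
        # the caller sends in the next value
        value = yield total
        total += value

acc = running_total()
print(next(acc))     # 0  (runs up to the first yield)
print(acc.send(5))   # 5
print(acc.send(10))  # 15
```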

OK, enough abstract concepts for now. Don’t worry if some of them are still unclear; they will become clearer as you read and write more asynchronous code with asyncio.

Define a coroutine function

Now that the basic concepts have been introduced, we can start to write our first coroutine function:

```python
async def coro_func():
    print("Hello, asyncio!")
```

coro_func() is a coroutine function and when it is called it will return a coroutine object:

```python
coro_obj = coro_func()

print(type(coro_obj))
# <class 'coroutine'>
```

Note that the term coroutine can refer to either a coroutine function or a coroutine object, depending on the context.
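The standard library’s inspect module can tell the two meanings apart. The sketch below also shows a detail worth knowing: a coroutine object that is created but never awaited makes Python emit a "coroutine ... was never awaited" RuntimeWarning when it is garbage-collected, and closing it explicitly avoids that.

```python
import inspect

async def coro_func():
    print("Hello, asyncio!")

coro_obj = coro_func()  # nothing is printed: the body has not run yet

print(inspect.iscoroutinefunction(coro_func))  # True  (the function)
print(inspect.iscoroutine(coro_obj))           # True  (the object it returned)

# Discard the coroutine without running it and without the
# "never awaited" RuntimeWarning
coro_obj.close()
```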

As you may have noticed, calling the coroutine function does not run its body, so nothing is printed. If you have worked with generators, this won’t surprise you, because coroutine functions behave similarly to generator functions:

```python
def gen_func():
    yield "Hello, generator!"

generator = gen_func()
print(type(generator))
# <class 'generator'>
```

To run the code in a generator, you need to iterate over it, for example with the built-in next() function:

```python
next(generator)
# 'Hello, generator!'
```

Similarly, to run the code defined in a coroutine function, you need to await it. However, a coroutine is not driven the way a generator is iterated: the await keyword can only be used inside another coroutine function defined with async def:

```python
async def coro_func():
    print("Hello, asyncio!")

async def main():
    print("In the entrypoint coroutine.")
    await coro_func()
```

Now the question is how we can run the main() coroutine function itself, since there is no outer coroutine function in which to await it.

For the top-level entry point coroutine function, which is conventionally named main(), we use asyncio.run():

```python
import asyncio

async def coro_func():
    print("Hello, asyncio!")

async def main():
    print("In the entrypoint coroutine.")
    await coro_func()

asyncio.run(main())
# In the entrypoint coroutine.
# Hello, asyncio!
```

Note that we need to import the built-in asyncio library here.

Under the hood, asyncio.run() creates an event loop, runs the coroutine passed to it until it completes, and then closes the loop. With modern Python (3.7 and later), you rarely need to manage the event loop yourself.
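For the curious, here is a simplified sketch of roughly what asyncio.run() does for us, written with the lower-level event loop API. It is an approximation for illustration only: the real implementation also cancels leftover tasks and shuts down asynchronous generators and the default executor before closing the loop.

```python
import asyncio

async def main():
    print("In the entrypoint coroutine.")
    return 42

# Approximately equivalent to: result = asyncio.run(main())
loop = asyncio.new_event_loop()
try:
    # Drive the coroutine on the loop until it finishes and return its value
    result = loop.run_until_complete(main())
finally:
    loop.close()

print(result)  # 42
```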

Return a value in a coroutine function

We can return a value from a coroutine function. The await expression evaluates to that value, which can be assigned to a variable:

```python
import asyncio

async def coro_func():
    return "Hello, asyncio!"

async def main():
    print("In the entrypoint coroutine.")
    result = await coro_func()
    print(result)

asyncio.run(main())
# In the entrypoint coroutine.
# Hello, asyncio!
```

Run multiple coroutines concurrently

A single coroutine on its own is neither much fun nor very useful. Coroutines really shine when several of them run concurrently.

Let’s first look at an example where coroutines are awaited incorrectly:

```python
import asyncio
from datetime import datetime

async def async_sleep(num):
    print(f"Sleeping {num} seconds.")
    await asyncio.sleep(num)

async def main():
    start = datetime.now()

    for i in range(1, 4):
        await async_sleep(i)

    duration = datetime.now() - start
    print(f"Took {duration.total_seconds():.2f} seconds.")

asyncio.run(main())
# Sleeping 1 seconds.
# Sleeping 2 seconds.
# Sleeping 3 seconds.
# Took 6.00 seconds.
```

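Firstly, note that we use asyncio.sleep() rather than time.sleep() to simulate IO waiting time inside a coroutine function. asyncio.sleep() suspends only the current coroutine and hands control back to the event loop, whereas time.sleep() would block the whole thread.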

Secondly, the three coroutine objects are awaited one after another. Control only passes to the next line of code (here, the next loop iteration) once the awaited coroutine has completed, so the coroutines run sequentially. As a result, the code takes 6 seconds (1 + 2 + 3), the same as running it synchronously.

We should run multiple coroutines concurrently with the asyncio.gather() function.

asyncio.gather() runs multiple awaitables concurrently. An awaitable is, as the name indicates, something that can be awaited with the await keyword. It can be a coroutine, a Task, a Future, or any object that implements the __await__() special method.
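To show that "anything that implements __await__()" really is awaitable, here is a sketch of a hypothetical Delay class. Its __await__() is a generator method that delegates to the iterator of asyncio.sleep(), so the event loop can drive the wait, and its return value becomes the result of the await expression.

```python
import asyncio

class Delay:
    """Hypothetical awaitable: waits `seconds`, then produces `value`."""

    def __init__(self, seconds, value):
        self.seconds = seconds
        self.value = value

    def __await__(self):
        # __await__ must return an iterator; delegating to a real
        # coroutine's iterator lets the event loop handle the waiting
        yield from asyncio.sleep(self.seconds).__await__()
        return self.value  # becomes the value of `await Delay(...)`

async def main():
    result = await Delay(0.1, "done")
    print(result)  # done
    return result

asyncio.run(main())
```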

Let’s see how asyncio.gather() is used:

```python
import asyncio
from datetime import datetime

async def async_sleep(num):
    print(f"Sleeping {num} seconds.")
    await asyncio.sleep(num)

async def main():
    start = datetime.now()

    coro_objs = []
    for i in range(1, 4):
        coro_objs.append(async_sleep(i))

    await asyncio.gather(*coro_objs)

    duration = datetime.now() - start
    print(f"Took {duration.total_seconds():.2f} seconds.")

asyncio.run(main())
# Sleeping 1 seconds.
# Sleeping 2 seconds.
# Sleeping 3 seconds.
# Took 3.00 seconds.
```

Note that we need to unpack the list with *, because asyncio.gather() takes the awaitables as separate positional arguments rather than as a single list.

This time the coroutine objects ran concurrently, and the code took only about 3 seconds, the duration of the longest sleep.
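The examples above deliberately use asyncio.sleep() rather than time.sleep(). The following sketch suggests why, using hypothetical helper names (blocking_sleep, non_blocking_sleep, timed) and shortened delays: time.sleep() blocks the thread that runs the event loop, so even asyncio.gather() cannot overlap the waits.

```python
import asyncio
import time

async def blocking_sleep(num):
    # time.sleep() blocks the whole thread, so the event loop cannot
    # switch to another coroutine while this one is waiting
    time.sleep(num)

async def non_blocking_sleep(num):
    # asyncio.sleep() suspends only this coroutine and yields to the loop
    await asyncio.sleep(num)

async def timed(make_coro):
    start = time.perf_counter()
    await asyncio.gather(*(make_coro(n) for n in (0.1, 0.2, 0.3)))
    return time.perf_counter() - start

blocking = asyncio.run(timed(blocking_sleep))
non_blocking = asyncio.run(timed(non_blocking_sleep))
# Expect roughly the sum (0.6 s) versus roughly the maximum (0.3 s)
print(f"time.sleep: {blocking:.1f} s, asyncio.sleep: {non_blocking:.1f} s")
```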

If you check the return type of asyncio.gather(), you will see that it is a Future object (strictly speaking, an instance of a Future subclass). A Future represents the eventual result of work that is being done elsewhere and may or may not have completed yet. When a Future is awaited, one of three things can happen:

  • If the future has been resolved successfully, meaning the underlying work has completed, the await returns immediately with the result value, if there is one.
  • If the future has been resolved unsuccessfully because an exception was raised, the exception is propagated to the caller.
  • If the future has not been resolved yet, the code waits until it is resolved.
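The three cases can be reproduced by hand with futures created through the running loop’s create_future() method. This is a sketch for illustration; in everyday asyncio code you rarely create futures yourself, since asyncio.gather() and tasks create them for you.

```python
import asyncio

async def main():
    loop = asyncio.get_running_loop()

    # 1. Already resolved successfully: await returns the result immediately
    ok = loop.create_future()
    ok.set_result("finished")
    first = await ok

    # 2. Resolved with an exception: await re-raises it in the caller
    failed = loop.create_future()
    failed.set_exception(ValueError("boom"))
    try:
        await failed
    except ValueError as exc:
        second = f"caught: {exc}"

    # 3. Not resolved yet: await suspends until a result is set later
    pending = loop.create_future()
    loop.call_later(0.1, pending.set_result, "resolved later")
    third = await pending

    # asyncio.gather() returns an instance of a Future subclass
    gathered = asyncio.gather(asyncio.sleep(0))
    is_future = isinstance(gathered, asyncio.Future)
    await gathered

    return first, second, third, is_future

print(asyncio.run(main()))
# ('finished', 'caught: boom', 'resolved later', True)
```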

A more practical example with async with and aiohttp

So far we have only written toy code to demonstrate the basics of asyncio. Now let’s write something more practical.

We will write code that fetches several web pages concurrently, a classic IO-bound task as explained at the beginning of this post.

Note that we cannot use the familiar requests library here. requests is synchronous: each call blocks the thread, and therefore the event loop, until the response arrives, so it cannot be awaited. This is actually a major limitation of asyncio in practice, as many classic Python libraries still do not support it. However, this should improve over time as more asynchronous libraries become available.

To work around this, we use the aiohttp library, which is designed for making asynchronous HTTP requests (and more).

aiohttp is a third-party library, so we need to install it first:
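If you are stuck with a synchronous library, one workaround (not what this post uses) is asyncio.to_thread(), available since Python 3.9, which runs a blocking function in a worker thread and returns an awaitable. In this sketch, blocking_fetch is a hypothetical stand-in for a call such as requests.get(url), simulated with time.sleep().

```python
import asyncio
import time

def blocking_fetch(name):
    # Hypothetical stand-in for a synchronous call such as requests.get(url)
    time.sleep(0.2)
    return f"{name}: done"

async def main():
    start = time.perf_counter()
    # Each blocking call runs in a worker thread, so the event loop stays
    # free and the three waits overlap
    results = await asyncio.gather(
        *(asyncio.to_thread(blocking_fetch, n) for n in ("a", "b", "c"))
    )
    return results, time.perf_counter() - start

results, elapsed = asyncio.run(main())
print(results)  # ['a: done', 'b: done', 'c: done']
```

This falls back to threads for the blocking parts, so a natively asynchronous library such as aiohttp is usually the better choice when one exists.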

```shell
pip install aiohttp
```

It’s highly recommended to install new libraries in a virtual environment so that they don’t affect system packages or cause compatibility issues.

The following code uses aiohttp to perform the HTTP requests and makes heavy use of the async with syntax:

```python
import asyncio
import aiohttp

async def scrape_page(session, url):
    print(f"Scraping {url}")
    # The response is an async context manager: the connection is
    # released back to the session when the block exits
    async with session.get(url) as resp:
        return len(await resp.text())

async def main():
    urls = [
        "https://www.superdataminer.com/posts/66cff907ce8e",
        "https://www.superdataminer.com/posts/f21878c9897",
        "https://www.superdataminer.com/posts/b24dec228c43"
    ]

    coro_objs = []

    # The session must stay open until all requests have finished,
    # so the coroutines are gathered inside the async with block
    async with aiohttp.ClientSession() as session:
        for url in urls:
            coro_objs.append(
                scrape_page(session, url)
            )

        results = await asyncio.gather(*coro_objs)

    for url, length in zip(urls, results):
        print(f"{url} -> {length}")

asyncio.run(main())
# Scraping https://www.superdataminer.com/posts/66cff907ce8e
# Scraping https://www.superdataminer.com/posts/f21878c9897
# Scraping https://www.superdataminer.com/posts/b24dec228c43
# https://www.superdataminer.com/posts/66cff907ce8e -> 12873
# https://www.superdataminer.com/posts/f21878c9897 -> 12809
# https://www.superdataminer.com/posts/b24dec228c43 -> 12920
```

The async with statement makes it possible to perform asynchronous calls when entering or exiting a context. Under the hood, this is achieved with the async def __aenter__() and async def __aexit__() special methods, which is a fairly advanced topic. If you are interested, you should first get familiar with regular context managers in Python. After that, this post can be a good reference if you want to dive deeper. Normally, however, you don’t need to go that deep unless you want to create your own asynchronous context managers.
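As a glimpse of what those special methods look like, here is a minimal sketch of a hypothetical Connection class used as an asynchronous context manager. The asyncio.sleep(0) calls merely stand in for real asynchronous work such as opening or closing a network connection.

```python
import asyncio

class Connection:
    """Hypothetical async resource used only to illustrate the protocol."""

    def __init__(self):
        self.events = []

    async def __aenter__(self):
        await asyncio.sleep(0)      # stand-in for e.g. opening a connection
        self.events.append("opened")
        return self                 # bound to the name after `as`

    async def __aexit__(self, exc_type, exc, tb):
        await asyncio.sleep(0)      # stand-in for e.g. closing it cleanly
        self.events.append("closed")
        return False                # do not suppress exceptions

async def main():
    conn = Connection()
    async with conn as c:
        c.events.append("used")
    return conn.events

print(asyncio.run(main()))  # ['opened', 'used', 'closed']
```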

Apart from the async with syntax, using aiohttp is actually very similar to using the requests library.

In this post, we have introduced the basic concepts of asynchronous programming and covered the basic usage of asyncio, including async/await, asyncio.run(), and asyncio.gather(), with easy-to-follow examples. With this knowledge, you should be able to read and write basic asynchronous code with asyncio and work more comfortably with asynchronous API frameworks such as FastAPI.


Image by Patrick Hendry in Unsplash

Most Python developers may have only worked with synchronous code in Python, even some veteran Pythonistas. However, if you are a data scientist, you may have used the multiprocessing library to run some calculations in parallel. And if you are a web developer, you may have the chance to achieve concurrency with threading. Both multiprocessing and threading are advanced concepts in Python and have their own specific fields of application.

Besides multiprocessing and threading, there is a relatively newer member in the concurrency family of Python — asyncio, which is a library to write concurrent code using the async/await syntax. Similar to threading, asyncio is suitable for IO-bound tasks which are very common in practice. In this post, we will introduce the basic concepts of asyncio and demonstrate how to use this new library to write asynchronous code.

CPU-bound and IO-bound tasks

Before we get started with the asyncio library, there are two concepts we should get clear about because they determine which library should be used to solve your particular problem.

A CPU-bound task spends most of its time doing heavy calculations with the CPUs. If you are a data scientist and need to train some machine learning models by crunching a huge amount of data, then it’s a CPU-bound task. If this case, you should use multiprocessing to run your jobs in parallel and make full use of your CPUs.

On the other hand, an IO-bound task spends most of its time waiting for IO responses, which can be responses from webpages, databases, or disks. For web development where a request needs to fetch data from APIs or databases, it’s an IO-bound task and concurrency can be achieved with either threading or asyncio to minimize the waiting time from external resources.

threading vs asyncio

OK, so we know both threading and asyncio are suitable for IO-bound tasks, but what are the differences?

Firstly, and it may seem unbelievable to you at first sight, threading uses multiple threads whereas asyncio uses only one. For threading, it is easier to understand because the threads take turns to run the code and thus achieve concurrency. But how is it possible to achieve concurrency with a single thread?

Well, threading achieves concurrency with pre-emptive multitasking which means we cannot determine when to run which code in which thread. It’s the operating system that determines which code should be run in which thread. The control can be switched at any point between threads by the operating system. This is why we often see random results with threading. This post can be helpful if you want to learn more about threading.

On the other hand, asyncio achieves concurrency with cooperative multitasking. We can decide which part of the code can be awaited and thus the control be switched to run other parts of the code. The tasks need to cooperate and announce when the control will be switched out. And all this is done in a single thread with the await command. It may seem elusive now but will be much clearer when we see the code later.

What is coroutine?

This is a fancy and also exotic name in asyncio. It is not easy to explain what it is. Many tutorials don’t explain this concept at all and just show you what it is with some code. However, let’s try to understand what it is first.

The definition of coroutine in Python is:

Coroutines are a more generalized form of subroutines. Subroutines are entered at one point and exited at another point. Coroutines can be entered, exited, and resumed at many different points.

Well, this may still seem pretty exotic when you first see it. However, it will make more and more sense when you have worked more and more with asyncio.

In this definition, we can understand subroutines as functions despite the differences between the two. Normally, a function is entered and exited only once when it is called. However, there is a special function in Python called generator which can be entered and exited many times.

Coroutines behave quite like generators. Actually, in older versions of Python coroutines are defined by generators. These coroutines are called generator-based coroutines. However, coroutines have become a native feature in Python now and can be defined with the new async def syntax. Even though generator-based coroutines are deprecated now, their history and existence can help us understand what a coroutine is and how the control is switched or yielded between different parts of the code. In case you want to learn more about the history and specification of coroutine in Python, PEP 492 is a good reference. However, it may not be easy to read and understand for beginners.

OK, enough abstract concepts for now. If you get lost somehow and cannot understand all the concepts, it’s OK. They will become clearer over time when you write and read more and more asynchronous code with the asyncio library.

Define a coroutine function

Now that the basic concepts have been introduced, we can start to write our first coroutine function:

async def coro_func():
print("Hello, asyncio!")

coro_func() is a coroutine function and when it is called it will return a coroutine object:

coro_obj = coro_func()

type(coro_obj)
# coroutine

Note that the term coroutine can refer to either a coroutine function or a coroutine object, depending on the context.

As you may have noticed, when the coroutine function is called, the print function is not called. If you have worked with generators, you won’t be surprised because it behaves similarly to generator functions:

def gen_func():
yield "Hello, generator!"

generator = gen_func()
type(generator)
# generator

In order to run the code in a generator, you need to iterate it. For example, you can use the next function to iterate it:

next(generator)
# 'Hello, generator!'

Similarly, to run the code defined in a coroutine function, you need to await it. However, you cannot await it in the same way as you iterate a generator. A coroutine can only be awaited inside another coroutine defined by the async def syntax:

async def coro_func():
print("Hello, asyncio!")

async def main():
print("In the entrypoint coroutine.")
await coro_func()

Now the question is how can we run the main() coroutine function. Well, obviously we cannot put it in another coroutine function and await it.

For the top-level entry point coroutine function, which is normally named as main(), we need to use asyncio.run() to run it:

import asyncio

async def coro_func():
print("Hello, asyncio!")

async def main():
print("In the entrypoint coroutine.")
await coro_func()

asyncio.run(main())
# In the entrypoint coroutine.
# Hello, asyncio!

Note that we need to import the built-in asyncio library here.

Under the hood, it’s handled by something called event loop. However, with modern Python, you don’t need to worry about these details anymore.

Return a value in a coroutine function

We can return a value in a coroutine function. The value is returned with the await command and can be assigned to a variable:

import asyncio

async def coro_func():
return "Hello, asyncio!"

async def main():
print("In the entrypoint coroutine.")
result = await coro_func()
print(result)

asyncio.run(main())
# In the entrypoint coroutine.
# Hello, asyncio!

Run multiple coroutines concurrently

It’s not much fun and not useful to have a single coroutine in your code. Coroutines really shine when there are multiple of them which should be run concurrently.

Let’s first look at an example where coroutines are awaited incorrectly:

import asyncio
from datetime import datetime

async def async_sleep(num):
print(f"Sleeping {num} seconds.")
await asyncio.sleep(num)

async def main():
start = datetime.now()

for i in range(1, 4):
await async_sleep(i)

duration = datetime.now() - start
print(f"Took {duration.total_seconds():.2f} seconds.")

asyncio.run(main())
# Sleeping 1 seconds.
# Sleeping 2 seconds.
# Sleeping 3 seconds.
# Took 6.00 seconds.

Firstly, note that we need to use the asyncio.sleep() function in a coroutine function to simulate the IO blocking time.

Secondly, the three coroutine objects created are awaited one by one. Since the control is only handled to the next line of code (the next loop here) when the coroutine object awaited has been completed, these three coroutines are actually awaited one by one. As a result, it took 6 seconds to run the code which is the same as running the code synchronously.

We should run multiple coroutines concurrently with the async.gather() function.

async.gather() is used to run multiple awaitables concurrently. An awaitable is, as the name indicates, something that can be awaited with the await command. It can be a coroutine, a task, a future, or anything that implements the __await__() magic method.

Let’s see the usage with async.gather():

import asyncio
from datetime import datetime

async def async_sleep(num):
print(f"Sleeping {num} seconds.")
await asyncio.sleep(num)

async def main():
start = datetime.now()

coro_objs = []
for i in range(1, 4):
coro_objs.append(async_sleep(i))

await asyncio.gather(*coro_objs)

duration = datetime.now() - start
print(f"Took {duration.total_seconds():.2f} seconds.")

asyncio.run(main())
# Sleeping 1 seconds.
# Sleeping 2 seconds.
# Sleeping 3 seconds.
# Took 3.00 seconds.

Note that we need to unpack the list of awaitables for the async.gather() function.

This time the coroutine objects were run concurrently and the code only took 3 seconds.

If you check the return type of asyncio.gather(), you will see that it is a Future object. A Future object is a special data structure representing that some work is done somewhere else and may or may not have been completed. When a Future object is awaited, three things can happen:

  • When the future has been resolved successfully meaning the underlying work has been completed successfully, it will return immediately with the returned value, if available.
  • When the future has been resolved unsuccessfully and an exception is raised, the exception will be propagated to the caller.
  • When the future has not been resolved yet, the code will wait until it’s resolved.

A more practical example with async with and aiohttp

Above we have just written some dummy code to demonstrate the basics of asyncio. Now let’s write some more practical code to further demonstrate the use of asyncio.

We will write some code to fetch responses from the requests to some web pages concurrently, which is a classical IO-bound task as explained at the beginning of this post.

Note that we cannot use our familiar requests library to get responses from web pages. This is because the requests library does not support the asynico library. This is actually a major limitation of the asynico library as many classical Python libraries still do not support the asyncio library. However, over time this will get better, and more asynchronous libraries will be available.

To solve the problem of the requests library, we need to use the aiohttp library which is designed for making asynchronous HTTP requests (and more).

We need to install aiohttp first as it’s still an external library:

pip install aiohttp

It’s highly recommended to install new libraries in a virtual environment so they won’t impact system libraries and you won’t have compatibility issues.

This is the code for using the aiohttp library to perform HTTP requests, which also uses the async with syntax heavily:

import asyncio
import aiohttp

async def scrape_page(session, url):
print(f"Scraping {url}")
async with session.get(url) as resp:
return len(await resp.text())

async def main():
urls = [
"https://www.superdataminer.com/posts/66cff907ce8e",
"https://www.superdataminer.com/posts/f21878c9897",
"https://www.superdataminer.com/posts/b24dec228c43"
]

coro_objs = []

async with aiohttp.ClientSession() as session:
for url in urls:
coro_objs.append(
scrape_page(session, url)
)

results = await asyncio.gather(*coro_objs)

for url, length in zip(urls, results):
print(f"{url} -> {length}")

asyncio.run(main())
# Scraping https://www.superdataminer.com/posts/66cff907ce8e
# Scraping https://www.superdataminer.com/posts/f21878c9897
# Scraping https://www.superdataminer.com/posts/b24dec228c43
# https://www.superdataminer.com/posts/66cff907ce8e -> 12873
# https://www.superdataminer.com/posts/f21878c9897 -> 12809
# https://www.superdataminer.com/posts/b24dec228c43 -> 12920

The async with statement makes it possible to perform asynchronous calls when entering or exiting a context. Under the hood, it’s achieved by the async def __aenter__() and async def __aexit__() magical methods, which is a pretty advanced topic. If you are interested, you should get some knowledge of regular context manager in Python first. And after that, this post can be a good reference if you want to dive deeper. However, normally you don’t need to dive that deep unless you want to create your own asynchronous context managers.

Except the async with syntax, the usage of the aiohttp library is actually very similar to that of the requests library.

In this post, we have introduced the basic concepts of asynchronous programming. The basic usage of the asyncio library with the async/await and asyncio.run() and asyncio.gather() statements are introduced with easy-to-follow examples. With this knowledge, you shall be able to read and write basic asynchronous code with the asyncio library and can work more comfortably with asynchronous API frameworks like FastAPI.
