
Comprehension Pipelines in Python | Marcin Kozak



PYTHON PROGRAMMING

Comprehension pipelines are a Python-specific idea for building pipelines

Comprehension pipelines take you straight to the goal. Photo by Anika Huizinga on Unsplash

Generator pipelines offer a Pythonic way to create software pipelines, that is, chains of operations in which each operation but the first takes the output of the previous operation as its input.

They enable you to apply transforming programming, as described by Thomas and Hunt in their great book The Pragmatic Programmer.

A typical generator pipeline in Python uses a generator at each step of the pipeline. In other words, each step of the pipeline is constructed as a generator. Thomas and Hunt discuss pipelines achieved via the pipe operator, available in many programming languages. While Python does not have a built-in pipe operator, one can easily be created, because Python allows operators to be overloaded in class definitions. We can see this done in the Pipe Python package, which uses the | operator.
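
To illustrate the idea, here is a minimal sketch of how such a pipe operator can be built by overloading an operator method, here __ror__, so that the left operand can be a plain iterable. This is an illustration of the general technique, not the actual code or API of the Pipe package:

class Step:
    """Wrap a function so that `iterable | Step(func)` applies func lazily."""

    def __init__(self, func):
        self.func = func

    def __ror__(self, iterable):
        # Called for `iterable | step`, since lists do not implement
        # the | operator with Step; yields transformed items lazily.
        return (self.func(item) for item in iterable)

twice = Step(lambda x: 2 * x)
print(list([1, 2, 3] | twice))  # [2, 4, 6]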

In the above-mentioned article, I showed that using generators to create each step of the pipeline can introduce visual clutter that decreases code readability. In addition, when each step of the pipeline is fast, this approach performs poorly, because the overhead of the chained generators dominates the actual computations. Therefore, I proposed an alternative, efficient way of building generator pipelines that is more readable than the classical generator pipeline. The method combines function composition with a generator. In the future, I will show you how to create pipelines using a pipe operator.
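
To give a flavor of that approach, here is a minimal sketch of my own (not the exact code from that article) that composes the steps into one function and wraps the composition in a single generator expression:

from functools import reduce
from typing import Callable, Iterable, Iterator

def compose(*functions: Callable) -> Callable:
    # Compose one-argument functions left to right:
    # compose(f, g)(x) == g(f(x)).
    return lambda x: reduce(lambda result, func: func(result), functions, x)

def composed_pipeline(items: Iterable, *steps: Callable) -> Iterator:
    # A single generator wraps the whole composed chain of steps.
    run = compose(*steps)
    return (run(item) for item in items)

print(list(composed_pipeline([1, 4, 9], lambda x: x + 1, str)))
# ['2', '5', '10']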

Nonetheless, generator expressions are a special case of comprehensions, as I wrote in a previous article.

So, why should we limit ourselves to generator pipelines? Why not listcomp pipelines or dictcomp pipelines or setcomp pipelines?

This question has been bothering me for some time, but I treated it like a tiresome fly buzzing over my head and refusing to let me rest, buzzing and buzzing and buzzing… Finally, I gave up rejecting the idea and decided at least to bring it up for consideration. It was about two months ago, during a pleasant walk with my two dogs in the woods. A beautiful winter, snow, frost, strong wind — and me and the dogs in the woods, walking one of my favorite paths and thinking (well, me, not the dogs) about building pipelines using other comprehensions than generator expressions.

This article is the result of this walk. I propose a generalization of generator pipelines into what I call comprehension pipelines, of which generator pipelines are just a specific case.

For consistency and clarity, I will use the same example of a generator pipeline I used in the previous article. Do note that I revised type annotations a little bit. The generator pipeline looked as follows¹:

import math

from typing import Generator, Iterable

# Type aliases
Number = int | float
PowerType = int | float
PipelineItems = Iterable[Number]

def square(x: Number) -> Number:
    return x**2

def power(x: Number, n: PowerType) -> Number:
    return x**n

def add(x: Number, y: Number) -> Number:
    return x + y

def calculate(x: Number) -> Number:
    x = power(x, 0.5)
    x = square(x)
    x = add(x, 12)
    x = power(x, 2)
    x = add(x, math.pi**0.5)
    x = round(x, 2)
    x = add(x, 75)
    return x

def get_generator_pipeline(
    items: PipelineItems,
) -> Generator[Number, None, None]:
    """Create generator pipeline applying calculate() to each item."""
    return (calculate(x_i) for x_i in items)

The get_generator_pipeline() function — previously named get_pipeline() — returns a generator pipeline; that is, a generator that lazily (on demand) calculates the pipeline for the subsequent elements of the items iterable. We can evaluate the generator any way we want, e.g., using list():

>>> items = [1.12, 2.05, 1.122, -0.220002, 7.0036]
>>> pipeline = get_generator_pipeline(items)
>>> list(pipeline)

If we want to get a list, however, it makes no sense to use a generator expression — the corresponding list comprehension would do better! So, why should we use a generator pipeline and not a list-comprehension pipeline from the very beginning?

Generator pipelines have one big advantage, and it’s the same as the main advantage of generators in general: lazy evaluation. When the items iterable is huge, or when the pipeline’s steps produce large objects, a generator pipeline will help us avoid problems with running out of memory.

But what if the items iterable is short and the output iterable does not consume much memory? Why should we worry about memory when we’re certain there’s nothing to worry about? What’s more, we know that a generator expression can be slower than the corresponding list comprehension — with one exception: when the number (or size) of the items is too big to keep and process them all in memory.
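
As a rough illustration (the exact numbers will differ between machines and Python versions), you can check this with the built-in timeit module, which typically shows the generator version to be somewhat slower when everything is consumed anyway:

import timeit

listcomp = timeit.timeit("[x * 2 for x in range(1_000)]", number=10_000)
genexp = timeit.timeit("list(x * 2 for x in range(1_000))", number=10_000)
print(f"listcomp: {listcomp:.3f} s, genexp: {genexp:.3f} s")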

Consider the following example, quite different from the above one:

  • We have paths, a list of paths to files.
  • In each step of the pipeline, a text file is read from a path, and the text is processed.
  • As a result, an iterable of the processed texts is returned.

In this case, the output iterable will be far larger than the input iterable. Thus, a generator pipeline will work best here when the number of paths is huge — because it would be memory-inefficient, if possible at all, to return a long list of long texts. We need the pipeline to return a generator; hence, a generator pipeline.
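
A minimal sketch of such a pipeline could look as follows; process_text() is a hypothetical placeholder for whatever processing the texts need:

from pathlib import Path
from typing import Iterable, Iterator

def process_text(text: str) -> str:
    # Hypothetical processing step; replace with the real logic.
    return text.lower()

def get_text_pipeline(paths: Iterable[Path]) -> Iterator[str]:
    """Lazily read and process one file at a time."""
    return (process_text(path.read_text(encoding="utf-8")) for path in paths)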

Now, imagine another pipeline. We have the same iterable of paths, but our processing of the texts is different. Before, we returned long texts. Now, the only information we need from each text is whether or not the word “Python” occurs in a text; hence, for each text we will need a Boolean value and nothing more. So, for the list of paths, we will get a list of the same length with Boolean values. What’s the advantage of the generator pipeline here? None.

What’s more, returning just a Boolean value would make little sense, if any: it’d be difficult to link a particular value with the corresponding text. Hence, it’d be best to return a dictionary with paths as keys and their Boolean values as values. This would make a pipeline based on a dictionary. Alternatively, we could return a list of paths with the word “Python” in the text; such output, however, would omit the other paths, and so we would lose part of the information — and sometimes we may need it.
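
Anticipating the dictcomp pipelines discussed below, a sketch of such a dictionary-based pipeline might look like this (the helper names are mine, for illustration):

from pathlib import Path
from typing import Dict, Iterable

def mentions_python(path: Path) -> bool:
    # Hypothetical helper: True if "Python" occurs in the file's text.
    return "Python" in path.read_text(encoding="utf-8")

def get_mentions_pipeline(paths: Iterable[Path]) -> Dict[Path, bool]:
    """Map each path to whether its text mentions "Python"."""
    return {path: mentions_python(path) for path in paths}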

We’ve finally come to the main topic of this article: comprehension pipelines, and how to build them. The examples above showed how we can decide whether or not we need a generator pipeline or a different type of pipeline. In addition to generator pipelines, we can build

  • list-comprehension pipelines, aka listcomp pipelines
  • set-comprehension pipelines, aka setcomp pipelines
  • dictionary-comprehension pipelines, aka dictcomp pipelines

Below, I will rewrite the generator pipeline built around the calculate() function as each of the three other types of comprehension pipelines. For the sake of completeness, I will repeat the code of the generator pipeline. I will also change the imports from typing, as this time we need more types than before, when we created only a generator pipeline.

from typing import Dict, Generator, Iterable, List, Set

def get_generator_pipeline(
    items: PipelineItems,
) -> Generator[Number, None, None]:
    """Create generator pipeline applying calculate() to each item."""
    return (calculate(x_i) for x_i in items)

def get_listcomp_pipeline(items: PipelineItems) -> List[Number]:
    """Create listcomp pipeline applying calculate() to each item."""
    return [calculate(x_i) for x_i in items]

def get_setcomp_pipeline(items: PipelineItems) -> Set[Number]:
    """Create setcomp pipeline applying calculate() to each item."""
    return {calculate(x_i) for x_i in items}

def get_dictcomp_pipeline(items: PipelineItems) -> Dict[Number, Number]:
    """Create dictcomp pipeline using calculate() for items.

    Items are dict keys with calculate(item) being
    the corresponding value.
    """
    return {x_i: calculate(x_i) for x_i in items}

def get_dictcomp_pipeline_str(items: PipelineItems) -> Dict[str, Number]:
    """Create dictcomp pipeline using calculate() for items.

    str(item) are dict keys with calculate(item) being
    the corresponding value.
    """
    return {str(x_i): calculate(x_i) for x_i in items}

I used a particular version of a dictionary pipeline, but note that we can also build other versions, depending on what we want to use as the dictionary’s keys. We will see an example in a moment.

Below, we will summarize these basic types of pipelines, including the generator pipeline. Do remember that they differ in their output, while all of them take input of the same type — any iterable will do.

Generator pipeline

  • Takes any iterable (items) as input.
  • Returns a generator as a pipeline.
  • Can use a generator expression of any form and complexity (e.g., it can include several levels of if filters and for loops; see the sketch after this list).
  • Can be evaluated on demand.
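
For instance, here is one possible sketch of a generator pipeline with an if filter, reusing calculate() and the type aliases defined above; the filter condition is an arbitrary example:

def get_filtered_generator_pipeline(
    items: PipelineItems,
) -> Generator[Number, None, None]:
    """Apply calculate() only to non-negative items."""
    return (calculate(x_i) for x_i in items if x_i >= 0)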

List-comprehension pipeline, aka listcomp pipeline

  • Takes any iterable (items) as input.
  • Runs a list comprehension as a pipeline and so returns a list.
  • Can use a list comprehension of any complexity.
  • Is evaluated greedily.

Set-comprehension pipeline, aka setcomp pipeline

  • Takes any iterable (items) as input.
  • Runs a set comprehension as a pipeline and so returns a set.
  • Can use a set comprehension of any complexity.
  • As the final output is a set, it will contain unique results. So, if two or more items from the iterable produce the same output, the repeated instances will be skipped.
  • Is evaluated greedily.

Dictionary-comprehension pipeline, aka dictcomp pipeline

  • Takes any iterable (items) as input.
  • Runs a dictionary comprehension as a pipeline and so returns a dictionary.
  • Can use a dictionary comprehension of any complexity.
  • For a particular item, a key-value pair is returned; the key does not have to be item — it can be anything that results from item or any processing of it.
  • Since the pipeline returns a dictionary, you should use unique keys; otherwise, the results for the same keys will be overwritten and only the last key-value pair will be kept (see the example after this list).
  • Is evaluated greedily.
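
You can see this overwriting behavior in isolation with a plain dictionary comprehension, in which the keys 0 and 1 each occur several times and only the last value survives:

>>> {x % 2: x for x in range(5)}
{0: 4, 1: 3}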

Here, let’s see how the above pipelines behave. We will do it for the following iterable:

>>> items = (1, 1.0, 10, 50.03, 100)

I show the examples as doctests. You can read more about this fantastic built-in Python module for documentation testing in a separate Towards Data Science article.

In the appendix at the end of the article, you will find the whole script for this exercise.

Generator pipeline

>>> gen_pipeline = get_generator_pipeline(items)
>>> gen_pipeline # doctest: +ELLIPSIS
<generator object get_generator_pipeline.<locals>.<genexpr> at 0x7...>

As expected, the generator pipeline returns a generator. So, for the moment, we cannot see the output; to see it, we need to evaluate the generator. How to do that depends on the pipeline and the problem being solved. Here, we will use a good ol’ for loop:

>>> for i in gen_pipeline:
...     print(i)
245.77
245.77
560.77
3924.49
12620.77

Do remember that gen_pipeline is a regular generator, even if created using a generator pipeline. As a generator, it is exhausted after being evaluated (which we did in the for loop above). It’s still there, but you cannot use it to see the output anymore:

>>> gen_pipeline # doctest: +ELLIPSIS
<generator object get_generator_pipeline.<locals>.<genexpr> at 0x7...>
>>> next(gen_pipeline)
Traceback (most recent call last):
...
StopIteration

Listcomp pipeline

>>> list_pipeline = get_listcomp_pipeline(items)
>>> list_pipeline
[245.77, 245.77, 560.77, 3924.49, 12620.77]

A listcomp pipeline is evaluated greedily, that is, at the moment of its creation. We can see the results right away, and, unlike with the generator pipeline above, you can view them as many times as you want.

Like before, we can see that the first two values are exactly the same. This should be expected, as the first two elements of items are the same… or aren’t they? The first one is an integer, 1, while the second one is a float, 1.0. Theoretically, these are not the same objects, as they have different types. Python, however, treats them as equal:

>>> 1 == 1.0
True

So, how will setcomp and dictcomp pipelines behave? We’ll see that below.

Setcomp pipeline

>>> set_pipeline = get_setcomp_pipeline(items)
>>> set_pipeline
{560.77, 3924.49, 245.77, 12620.77}

Ha! Note that while items contains five elements, the output above contains only four. This is not unexpected — as we saw above, Python treats 1 and 1.0 as equal, so the results of calculate(x) for these two values are the same. And since they are the same, the resulting set contains only one instance of that output value, that is, 245.77.

Remember this when using sets and setcomp pipelines, and use this type of pipeline when you want to achieve such behavior — in other words, when you want to keep only unique results.

Dictcomp pipelines

>>> dict_pipeline = get_dictcomp_pipeline(items)
>>> dict_pipeline
{1: 245.77, 10: 560.77, 50.03: 3924.49, 100: 12620.77}

Like with sets, we got four elements in the resulting dictionary. As you see, when you try to use both 1 and 1.0 as keys in a dictionary, they are merged into one key: the first-inserted key (1 in our case) is kept, while its value comes from the last assignment. If this is what you need to get, you’re done here.
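
You can observe this merging with a plain dictionary literal:

>>> {1: "int key", 1.0: "float key"}
{1: 'float key'}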

What if you need both of them? You can create string keys, for instance. Does Python indeed treat str(1) and str(1.0) as different? Let’s see:

>>> str(1) != str(1.0)
True

Yes, it does! We need to redefine the pipeline function, then:

def get_dictcomp_pipeline_str(items: PipelineItems) -> Dict[str, Number]:
    """Create dictcomp pipeline using calculate() for items.

    str(item) are dict keys with calculate(item) being
    the corresponding value.
    """
    return {str(x_i): calculate(x_i) for x_i in items}

Let’s see this new dictcomp pipeline in action:

>>> dict_str_pipeline = get_dictcomp_pipeline_str(items)
>>> dict_str_pipeline
{'1': 245.77, '1.0': 245.77, '10': 560.77, '50.03': 3924.49, '100': 12620.77}

The resulting dictionary has five elements, as we wanted.

In this article, I proposed the generalization of generator pipelines to comprehension pipelines. While generator expressions are commonly used to create pipelines, the resulting generator pipelines are a specific case of comprehension pipelines. When you create a pipeline, consider which type of pipeline best represents your needs — and use it. No need to stick to generator pipelines only because the term “generator pipelines” is common in the Python community. You’re free to use whatever suits your goal.

In this article, we used simple examples. I did this on purpose: this simplicity helped us focus on the main topic of this article — comprehension pipelines. In the future, I plan to show you more advanced examples representing real-life scenarios.

Note that the functions creating the final comprehension — in our examples, get_generator_pipeline(), get_listcomp_pipeline(), get_setcomp_pipeline(), get_dictcomp_pipeline() and get_dictcomp_pipeline_str() — create only the final step of the pipeline. The actual pipeline, however, is hidden in the function that these functions call; in our case, this is the calculate() function. Let’s return to this function for a moment:

def calculate(x: Number) -> Number:
    x = power(x, 0.5)
    x = square(x)
    x = add(x, 12)
    x = power(x, 2)
    x = add(x, math.pi**0.5)
    x = round(x, 2)
    x = add(x, 75)
    return x

Do you see what I mean? Our pipeline consists of the functions power(), square(), add(), power() again, add() again, round(), and add() once more. This is where all the steps of the pipeline are applied, and the functions that create the output simply call this function in a way that suits your needs.

Remember that if the basic functions used in calculate() — power(), square(), add(), and round() — do not suit your needs, you can create a new function, as we did above when defining the get_dictcomp_pipeline_str() function. This example showed that we’re not limited to the base versions of comprehension pipelines: you can do whatever you want, as long as it is correct.

¹ If you work with a Python version below 3.10, the code will not work. This is because of the union operator | for type hints (PEP 604), which can be used instead of typing.Union and which was added in Python 3.10. So, if you have an older Python version, replace these two lines:

Number = int | float
PowerType = int | float

with these three lines:

from typing import Union

Number = Union[int, float]
PowerType = Union[int, float]

This will work.

Below, you will find the full code of the script used in this article. As mentioned in the above footnote, in older versions of Python, you may need to replace int | float with Union[int, float], of course after importing Union from typing.

import math

from typing import Dict, Generator, Iterable, List, Set

# Type aliases
Number = int | float
PowerType = int | float
PipelineItems = Iterable[Number]

def square(x: Number) -> Number:
    return x**2

def power(x: Number, n: PowerType) -> Number:
    return x**n

def add(x: Number, y: Number) -> Number:
    return x + y

def calculate(x: Number) -> Number:
    x = power(x, 0.5)
    x = square(x)
    x = add(x, 12)
    x = power(x, 2)
    x = add(x, math.pi**0.5)
    x = round(x, 2)
    x = add(x, 75)
    return x

def get_generator_pipeline(
    items: PipelineItems,
) -> Generator[Number, None, None]:
    """Create generator pipeline applying calculate() to each item."""
    return (calculate(x_i) for x_i in items)

def get_listcomp_pipeline(items: PipelineItems) -> List[Number]:
    """Create listcomp pipeline applying calculate() to each item."""
    return [calculate(x_i) for x_i in items]

def get_setcomp_pipeline(items: PipelineItems) -> Set[Number]:
    """Create setcomp pipeline applying calculate() to each item."""
    return {calculate(x_i) for x_i in items}

def get_dictcomp_pipeline(items: PipelineItems) -> Dict[Number, Number]:
    """Create dictcomp pipeline using calculate() for items.

    Items are dict keys with calculate(item) being the corresponding value.
    """
    return {x_i: calculate(x_i) for x_i in items}

def get_dictcomp_pipeline_str(items: PipelineItems) -> Dict[str, Number]:
    """Create dictcomp pipeline using calculate() for items.

    str(item) are dict keys with calculate(item) being
    the corresponding value.
    """
    return {str(x_i): calculate(x_i) for x_i in items}

if __name__ == "__main__":
    items = (1, 1.0, 10, 50.03, 100)
    gen_pipeline = get_generator_pipeline(items)
    list_pipeline = get_listcomp_pipeline(items)
    set_pipeline = get_setcomp_pipeline(items)
    dict_pipeline = get_dictcomp_pipeline(items)
    dict_str_pipeline = get_dictcomp_pipeline_str(items)

    # Generator pipeline
    # Note that we need to evaluate it to see the output,
    # hence the for loop.
    print(gen_pipeline)
    for i in gen_pipeline:
        print(i)

    # Listcomp pipeline
    print(list_pipeline)

    # Setcomp pipeline
    print(set_pipeline)

    # Dictcomp pipeline
    print(dict_pipeline)

    # Dictcomp pipeline with strings as keys
    print(dict_str_pipeline)

