Python Inheritance: Should You Inherit From dict or UserDict?
In his fantastic book Fluent Python, 2nd ed., Luciano Ramalho explains why you should not create custom classes inheriting from dict. The reason behind this rule, strange at first glance, is simple but critical: dict is a highly optimized type implemented in C, and its C-coded methods will not call the methods you override in your subclass of dict.
That would be a nasty surprise, wouldn’t it? Let’s see this in an example. Imagine you want to create a dictionary-like class in which the provided values will be converted to their string representation. Let’s try to do this by subclassing the dict built-in type:
class StringDict(dict):
    def __setitem__(self, key, value):
        super().__setitem__(key, str(value))
This seems like perfectly valid Python code. Let’s see how this works:
>>> class StringDict(dict):
...     def __setitem__(self, key, value):
...         super().__setitem__(key, str(value))
...
>>> mydict = StringDict(first=1, second=2, third=3)
>>> mydict
{'first': 1, 'second': 2, 'third': 3}
Well, it does not work at all — or rather, this __setitem__ method is not used here at all. We wanted the values converted to strings, but they are not. Nonetheless, we don’t see any error; the class itself works somehow — in fact, it works just like a regular dictionary would. (Or rather, it provides the same results, only slower; we will discuss this later on.)
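To be precise, it is the C-level code that bypasses the override. A quick check in CPython (a minimal sketch using the StringDict class defined above) shows that the C-implemented constructor and update() ignore our Python __setitem__, while direct subscript assignment does call it:
>>> mydict = StringDict(first=1)   # the C-level __init__ bypasses our __setitem__
>>> mydict['second'] = 2           # subscript assignment does call it
>>> mydict.update(third=3)         # the C-level update() bypasses it again
>>> mydict
{'first': 1, 'second': '2', 'third': 3}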
To get what you want, you should subclass UserDict:
>>> from collections import UserDict
>>> class StringUserDict(UserDict):
...     def __setitem__(self, key, value):
...         super().__setitem__(key, str(value))
...
>>> mydict = StringUserDict(first=1, second=2, third=3)
>>> mydict
{'first': '1', 'second': '2', 'third': '3'}
As you see, the only thing we changed in the definition is subclassing UserDict instead of dict.
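Why does this version behave as expected? UserDict is written in Python: it wraps a regular dictionary, stored in its .data attribute, and its methods (including the constructor) route all assignments through __setitem__. Continuing the session above:
>>> mydict.data               # the real dict wrapped by the UserDict instance
{'first': '1', 'second': '2', 'third': '3'}
>>> isinstance(mydict.data, dict)
True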
So now you know. It’s enough to use UserDict. Great.
Is it?
Hold on. Let’s think. There are a couple of things we should consider before deciding that using UserDict instead of dict is so great.
First, we know that Python’s built-in types are fast because they are implemented in C, and this C implementation itself is highly optimized.
Second, we know that we shouldn’t subclass dict, because its methods implemented in C will not call the methods we override in Python.
And third, an easy thing to check: collections.UserDict is implemented in Python; its definition lives in the standard library’s collections package (Lib/collections/__init__.py).
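If you want to check this yourself, one way is to ask the inspect module where the class is defined (the exact path will, of course, differ between installations):
>>> import collections, inspect
>>> inspect.getsourcefile(collections.UserDict)
'/usr/lib/python3.11/collections/__init__.py'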
In this context, a natural question arises: will my custom class subclassing UserDict be performant?
My immediate guess is that no, it won’t. The optimization of dict comes from its C implementation, and UserDict is implemented in Python. Why should it be equally fast at all? We will check this in the following section.
For the benchmarks, we will use the standard-library timeit module; you can read about it in its documentation.
To make the benchmarks a little simpler and more structured, let’s define a simple function that runs time benchmarks for two or more code snippets:
import rounder
import timeit
import pprint


def compare(
        __snippet1,
        __snippet2,
        *args,
        number=10_000_000,
        repeat=7,
        setup="from collections import UserDict"):
    snippets = [__snippet1, __snippet2, *args]
    results = {}
    for i, snippet in enumerate(snippets):
        name = snippet if len(snippet) < 30 else f"snippet {i + 1}"
        # Note the trailing comma: each result is stored as a one-element
        # tuple, which is why the printed values look like (1.278e-07,).
        results[name] = min(timeit.repeat(
            snippet, number=number, repeat=repeat, setup=setup
        )) / number,
    results = rounder.signif_object(results, digits=4)
    pprint.pprint(results)
A couple of things:
- The function uses the rounder package to round all numbers in the results dictionary to four significant digits; you can read more about it in its documentation.
- __snippet1 and __snippet2 are meant to be positional-only arguments; the double-underscore prefix is a convention that signals this to readers and type checkers, while a / marker in the signature would enforce it at runtime (see the sketch after this list).
- Thanks to *args after the two snippet arguments, you can provide more snippets, also as positional arguments; you can use as many of them as you want.
- All the remaining arguments are keyword-only. Here, this is achieved by putting them after *args.¹
- The function reports each result as the mean time per call from the quickest of the seven runs. Hence, all the results are directly comparable, even if the timeit.repeat() function was used with different number values.
- The function returns an implicit None and prints a short report of the benchmarks, using the pprint() function from the standard-library pprint module. More often than not, avoid replacing return statements with prints², unless your function/method is one whose job is to print.
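For reference, this is what a runtime-enforced version of the signature could look like, using the / marker for positional-only parameters (a sketch of the syntax only, not the article’s final code):
def compare(snippet1, snippet2, /, *args,
            number=10_000_000, repeat=7,
            setup="from collections import UserDict"):
    ...  # same body as above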
Okay, we’ll see the function in action in a second. Let’s start off by comparing how quickly dict() and UserDict() create instances. Note that we can instantiate a regular dictionary using two methods, dict() and the (significantly faster) {} literal, so we’ll include them both:
>>> compare("UserDict()", "dict()", "{}")
{'UserDict()': (1.278e-07,), 'dict()': (3.826e-08,), '{}': (1.518e-08,)}
For all benchmarks in this article, I used Python 3.11 on a Windows 10 machine, in WSL 1, with 32 GB of RAM and four physical (eight logical) cores. The benchmark shows that to create a new empty instance, UserDict is noticeably slower than dict: about three times slower than dict() and about eight times slower than the {} literal.
As mentioned above, the values we see in the results dictionary represent the time of creating one instance of either UserDict or a regular dict (created using two methods). Clearly, to create an instance of UserDict, you need more time, about 1.3e-07 seconds, while {} needs about 1.5e-08. Not a big difference? Not when you need to create a single instance, but imagine creating millions of dictionaries. So, to create a regular dictionary, you need more or less 3–8 times less time than to create a UserDict, depending on the instantiation method.
Let’s see how this works with bigger dictionaries. We’ll create a simple numerical dictionary via a dictionary comprehension. Since UserDict does not give us dictcomp syntax (another drawback), the only thing we can do is first create a regular dictionary using a dictcomp and then convert it to a UserDict instance:
>>> compare(
...     "UserDict({i: i**2 for i in range(1000)})",
...     "{i: i**2 for i in range(1000)}",
...     number=100_000)
{'snippet 1': (0.0001316,), 'snippet 2': (5.027e-05,)}
A regular dictionary is about 2.5 times faster; interestingly, for empty dictionaries the relative difference was even bigger. We have to remember that the results of such benchmarks can vary from run to run. But we also have to remember that when we use many repetitions (here, a hundred thousand — we could use more), the variation should be relatively small.
Will the size matter when we compare lookup times? Key lookup is basically unrelated to dictionary size, so supposedly we should get similar results irrespective of the dict’s size.
First, a small dictionary:
>>> setup = """from collections import UserDict
... d = {'x': 1, 'y': 2, 'z': 3}
... ud = UserDict(d)
... """
>>> compare("ud['x']", "d['x']", setup=setup)
{"ud['x']": (4.754e-08,), "d['x']": (1.381e-08,)}
Okay, so about 3.5 times slower. Now, for a bigger one, of 10_000 key-value pairs:
>>> setup = """from collections import UserDict
... d = {str(i): i for i in range(10_000)}
... ud = UserDict(d)
... """
>>> compare("ud['9999']", "d['9999']", setup=setup, number=1_000_000)
{"ud['9999']": (7.785e-08,), "d['9999']": (2.787e-08,)}
and for one of 10 million elements (the setup, not shown here, builds the dictionaries analogously with 10_000_000 keys):
>>> compare("ud['9999']", "d['9999']", setup=setup, number=100_000)
{"ud['9999']": (6.662e-08,), "d['9999']": (2.499e-08,)}
So, the size indeed does not matter, and every time, dict was more or less 3–3.5 times faster. Let’s see, this time just for a medium-sized dictionary, how it works with a nonexistent key:
>>> compare(
... "ud.get('a', None)",
... "d.get('a', None)",
... setup=setup,
... number=1_000_000)
{"d.get('a', None)": (4.318e-08,), "ud.get('a', None)": (4.525e-07,)}
This time it’s a bigger difference, dict being over 10 times faster.
What about checking if a key is in the dictionary?
>>> compare("'a' in ud", "'a' in d", setup=setup, number=1_000_000)
{"'a' in d": (1.465e-08,), "'a' in ud": (4.562e-08,)}
So, again 3–3.5 times faster.
Now, let’s benchmark a frequent operation, that is, iterating over a dictionary; again, let’s check various sizes of the dictionary:
>>> setup = """from collections import UserDict
... d = {str(i): i for i in range(10)}
... ud = UserDict(d)
... """
>>> compare(
... "for i, v in ud.items(): pass",
... "for i, v in d.items(): pass",
... setup=setup,
... number=1_000_000
... )
{'for i, v in d.items(): pass': (1.726e-07,),
 'for i, v in ud.items(): pass': (1.235e-06,)}

>>> setup = """from collections import UserDict
... d = {str(i): i for i in range(10_000)}
... ud = UserDict(d)
... """
>>> compare(
... "for i, v in ud.items(): pass",
... "for i, v in d.items(): pass",
... setup=setup,
... number=10_000
... )
{'for i, v in d.items(): pass': (0.0001255,),
'for i, v in ud.items(): pass': (0.00112,)}
>>> setup = """from collections import UserDict
... d = {str(i): i for i in range(100_000)}
... ud = UserDict(d)
... """
>>> compare(
... "for i, v in ud.items(): pass",
... "for i, v in d.items(): pass",
... setup=setup,
... number=10_000
... )
{'for i, v in d.items(): pass': (0.001772,),
'for i, v in ud.items(): pass': (0.01718,)}
Okay, so for small dictionaries, dict is about 7 times faster to loop over its key-value pairs, available via the .items() method. For medium-sized dictionaries (of 10 thousand elements in our experiment), it is about 9 times faster. For bigger dictionaries (of 100_000 elements), the result was similar, so the relative difference — once looping has started — does not seem to depend much on the dictionary’s size.
Since these were rather quick benchmarks, we can conclude only roughly that a regular dictionary should be around 5–10 times faster to loop over its items than UserDict.
To conclude the benchmarks
Maybe let’s stop here. We could conduct more benchmarks, but that’s not the point. I did not want to conduct a full comparison of dict and UserDict in terms of execution time; if you’re interested, you can play with the code and build a set of solid benchmarks. Instead, I wanted to shed some light on the issue and check whether, as I expected based on how UserDict and dict are implemented, the former is significantly slower than the latter.
And it is — unless you consider 5–10 times slower an insignificant difference. So, whenever you can, use regular dictionaries rather than classes subclassing UserDict, unless you must change the behavior of dict.
Ah… Why can’t we just subclass dict?! Why?
Or… Can we?
Perhaps you’ve noticed that the rule of not subclassing dict concerns the dict methods implemented in C, which will not call the methods you override in Python. But what if you just want to add some functionality to dict, leaving the methods already implemented in C untouched?
That’s a very good question. And the answer is short and simple: yes, you can do it! You can inherit from dict; simply don’t override dict’s methods, that’s all.
The question is, will such a dict-based class be as performant as dict? Or at least as performant as collections.UserDict? To answer this question, we will have to run some more benchmarks.
Let’s imagine we keep some data in a dictionary, and we want to add a method, .summarize(), that calculates some summary statistics for the data. It could be something like this (just an example):
from collections.abc import Sequence
from typing import Callable


def try_calculate(func: Callable, *args, **kwargs):
    """Try calculations; when data are incorrect, return nan."""
    try:
        return func(*args, **kwargs)
    except TypeError:
        return float("nan")


class RichDict(dict):
    measures = {
        "sum": sum,
        "n": len,
        "mean": lambda x: sum(x) / len(x),
    }

    def summarize(self):
        statistics = {}
        for k, v in self.items():
            if isinstance(v, str):
                statistics[k] = {"n": len(v)}
            elif isinstance(v, Sequence):
                statistics[k] = {
                    name: try_calculate(func, v)
                    for name, func
                    in self.measures.items()
                }
        return statistics
RichDict is a dict with one more method: .summarize(). The method does the following:
- It iterates over the key-value pairs of the data (reached via the .items() method).
- When a value is a string, its statistics include only the length, returned as a dictionary with one key, n.
- When a value is a Sequence, the main summary statistics are calculated. The measures are kept as callables in a class attribute, RichDict.measures, which is a dictionary.
- The method protects the calculations: if a measure cannot be calculated, the TypeError is caught and float("nan") (for not-a-number) is returned as the result. That way, Python will not throw an error when, for instance, attempting to sum a list that mixes numbers with strings (see the quick demo after this list).
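For instance, assuming the try_calculate() function defined above:
>>> try_calculate(sum, [1, "1", 2])   # TypeError is caught, nan is returned
nan
>>> try_calculate(sum, [1, 2, 3])
6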
If you want to add a measure, you can easily do it:
RichDict.measures["min"] = min
RichDict.measures["max"] = max
If the function is more complicated, you can use a lambda function:
RichDict.measures["max-to-min"] = lambda x: max(x) / min(x)
or, preferably, first define a function and then assign it:

def max_to_min(x: Sequence[float]) -> float:
    return max(x) / min(x)

RichDict.measures["max-to-min"] = max_to_min
Note that since .measures is a class attribute, all instances of RichDict (those to be created, but also those already existing) will have the extended measures, with the min, max, and max-to-min statistics.
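A quick way to see this in action (a tiny sketch of my own, assuming the RichDict class defined above):
>>> d = RichDict(x=[1, 2, 10])
>>> RichDict.measures["range"] = lambda x: max(x) - min(x)
>>> d.summarize()["x"]["range"]   # the existing instance sees the new measure
9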
Just one example of RichDict in action:
>>> d = RichDict(x=[1,4,5,7],
... y=[1,"1",2],
... z="Shout Bamalama!",
... f=10)
>>>
>>> stats = d.summarize()
>>> stats # doctest: +NORMALIZE_WHITESPACE
{'x': {'sum': 17, 'n': 4, 'mean': 4.25, 'min': 1, 'max': 7, 'max-to-min': 7},
'y': {'sum': nan, 'n': 3, 'mean': nan, 'min': nan, 'max': nan, 'max-to-min': nan},
'z': {'n': 15}}
Above, the RichDict class has one class attribute with the measures to be used for sequence data; for strings, the .summarize() method calculates only one measure. As an exercise, update the class so that it has two class attributes, measures_seq and measures_str, designed like measures above. For strings, the .summarize() method should then calculate the measures the way it does for sequences, that is, using measures_str.
You will find the solution in Appendix 1.
In the code, I used the standard-library doctest module for documentation testing. If you’re interested in learning more about this interesting module, its documentation is a good place to start.
Okay, now that we know that RichDict works, we know we can subclass dict. What we want to learn now is whether RichDict’s added functionality, defined in Python (not in C, like the base code of dict), adds overhead to the regular behavior of dict. To this end, we will benchmark such behavior as creating a new RichDict versus creating a new dict, key lookup, and the like.
Let’s conduct benchmarks similar to those above, conducted for UserDict. You can find the code for them in this GitHub gist, along with the setup value used below.
>>> compare("UserDict()", "RichDict()", "dict()", setup=setup)
{'UserDict()': (2.236e-07,), 'RichDict()': (1.073e-07,), 'dict()': (5.892e-08,)}
As shown above, when it comes to creating an empty instance, RichDict is significantly faster than UserDict (about 2 times faster) and slower than dict (about 2 times slower).
>>> compare(
... "UserDict({i: i**2 for i in range(1000)})",
... "RichDict({i: i**2 for i in range(1000)})",
... "{i: i**2 for i in range(1000)}",
... number=100_000,
... setup=setup)
{'snippet 1': (0.0001765,), # UserDict
'snippet 2': (6.845e-05,), # RichDict
'snippet 3': (5.388e-05,)} # dict
This time, RichDict is about 2.5 times faster than UserDict but a little slower (around 1.3 times) than dict.
Below, you will find examples of more benchmarks, separated by blank lines for your convenience:
>>> setup += """d = {'x': 1, 'y': 2, 'z': 3}
... ud = UserDict(d)
... rd = RichDict(d)
... """
>>> compare("ud['x']", "rd['x']", "d['x']", setup=setup)
{"ud['x']": (5.111e-08,), rd['x']": (3.024e-08,), "d['x']": (1.475e-08,)}>>> compare(
... "'a' in ud",
... "'a' in rd",
... "'a' in d",
... setup=setup,
... number=1_000_000)
{"'a' in d": (1.366e-08,), # dict
"'a' in rd": (2.228e-08,), # RichDict
"'a' in ud": (4.436e-08,)} # UserDict
>>> compare(
... "ud.get('a', None)",
... "rd.get('a', None)",
... "d.get('a', None)",
... setup=setup,
... number=1_000_000)
{"d.get('a', None)": (1.935e-08,), # dict
"rd.get('a', None)": (3.016e-08,), # RichDict
"ud.get('a', None)": (5.125e-07,)} # UserDict
>>> compare(
... "for i, v in ud.items(): pass",
... "for i, v in rd.items(): pass",
... "for i, v in d.items(): pass",
... setup=setup,
... number=1_000
... )
{'for i, v in d.items(): pass': (0.001783,),
'for i, v in rd.items(): pass': (0.001743,),
'for i, v in ud.items(): pass': (0.01627,)}
The numbers speak for themselves, so I will leave you with them for the moment.
To conclude the benchmarks
RichDict is usually slower than dict (though sometimes only slightly) but faster than UserDict.
Thus, if you just want to add some functionality to dict, without overriding its built-in methods, you definitely can subclass dict. I’d say this should be your preferred method, instead of subclassing collections.UserDict, as the latter is significantly slower. Do remember that we’re talking about the situation in which you do not need to change the regular behavior of a dictionary, only to add some new behavior.
Remember also that you will have to pay a tax for using a built-in type this way: your class (in our example, RichDict) will be slower than dict. Still, it will be faster than UserDict, which was created for this very purpose: to let you inherit from… well, not from dict, but rather to let you create a new type (class) that behaves like dict. Using UserDict, unfortunately, is rather costly, as its performance is significantly worse than that of dict.
Let’s summarize our discussion about subclassing dict and UserDict. We’ve learned we have three options to choose from:
- Inherit from UserDict when you want to override the built-in behavior of dict. This will be the slowest option.
- Inherit from dict when you don’t want to override the built-in behavior of dict, but only to add new functionalities (methods). This will be faster than option 1.
- Use the built-in dict type, without creating a custom class. If you need custom functionality, you can implement it in functions that take a dict instance as an argument. This is the fastest option (see below).
We have not discussed the third option yet, as it does not involve subclassing. There is not much to discuss, though, as it is the simplest approach, a procedural one rather than object-oriented. On the one hand, using a standalone summarize() function in this approach would be only a little (if at all) faster than using the RichDict.summarize() method in option 2. This gist contains the corresponding benchmark code; on my machine, it showed a small but stable (from run to run) improvement in performance. On the other hand, all the other behavior of a regular dict, as we know, is significantly faster than that of RichDict. Hence, generally, option 3 offers the fastest way to handle a dictionary with additional functionality.
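For concreteness, here is a minimal sketch of this third option, my own illustration based on the RichDict code above (not the code from the author’s gist): the summary logic lives in a plain function that accepts a regular dict.

from collections.abc import Sequence
from typing import Callable

MEASURES = {
    "sum": sum,
    "n": len,
    "mean": lambda x: sum(x) / len(x),
}


def try_calculate(func: Callable, *args, **kwargs):
    """Try calculations; when data are incorrect, return nan."""
    try:
        return func(*args, **kwargs)
    except TypeError:
        return float("nan")


def summarize(data: dict) -> dict:
    """Summarize a plain dict the way RichDict.summarize() does."""
    statistics = {}
    for k, v in data.items():
        if isinstance(v, str):
            statistics[k] = {"n": len(v)}
        elif isinstance(v, Sequence):
            statistics[k] = {
                name: try_calculate(func, v) for name, func in MEASURES.items()
            }
    return statistics

# For example: summarize({"x": [1, 4, 5, 7], "z": "Shout Bamalama!"})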
Thus, if performance matters, the most sensible seems to be the third option — that is, using a regular dictionary and implementing the additional required behavior in external functions. Depending on the situation, it can also be an option with the clearest code, as it does not require a custom data structure but uses a dictionary, one of the most common data structures in Python, combined with a function. More often than not, this means clearer code.
The second option means somewhat worse performance, as adding methods to a dict subclass introduces overhead to its regular behavior. As we have seen, option 3 gets rid of this overhead by moving the method outside the dictionary.
The first option is definitely the worst one in terms of performance. I think the only situations in which it makes sense are those in which all three of the following conditions are met:
- performance does not matter;
- you need to override one or more built-in dict methods;
- the code will be clearer and easier to use thanks to a class that joins all the required functionalities.
¹ I am planning to write a dedicated article on positional-only and keyword-only arguments. Once it’s published, I will link it here.
² By the way, in an interactive session, returning the results from the function would have the same effect (provided, of course, that the result is not assigned). I ran the benchmarks in a script, not an interactive session, however.
Appendix 1: Solution to the exercise
You can do it in various ways. The solution below avoids repetition, and it also makes it easy to add yet another type (besides Sequence and str) to the calculations.
from collections.abc import Sequence
from typing import Callable


class RichDict(dict):
    measures_seq = {
        "sum": sum,
        "n": len,
        "mean": lambda x: sum(x) / len(x),
    }
    measures_str = {
        "n": len,
    }

    def summarize(self):
        statistics = {}
        for k, v in self.items():
            if isinstance(v, str):
                measures = self.measures_str
            elif isinstance(v, Sequence):
                measures = self.measures_seq
            else:
                continue
            statistics[k] = {
                name: try_calculate(func, v)
                for name, func
                in measures.items()
            }
        return statistics


NaN = float("nan")


def try_calculate(func: Callable, *args, **kwargs):
    """Try calculations and when the data are incorrect, return nan."""
    try:
        return func(*args, **kwargs)
    except TypeError:
        return NaN