What the Hash. Understand mapping types and how they… | by Pere Miquel Brull | Aug, 2022


Diving into the Unshashable Type error

A common issue all Python developers face at some point (or in reality, many times) is a TypeError with the magic words unhashable type. We can all agree that it does not sound good and can get close to scary without the proper context.

Python is a language heavily built on top of dictionaries. For example, that is how namespaces or classes store their functions and attributes. Understanding how dictionaries work beyond Python’s API can help us make sense of this data structure and avoid common errors.

This post will dive into mapping types, what they do internally, and how this knowledge helps us understand more about the unhashable exception.

Data access can be done in two ways:

  • Full Scan, where the process needs to go through all the elements of the data structure. In Python, an example would be a List. While we could access an item by index, there is no way to identify a specific element within the container directly, plus the indexes could change over time.
  • Key Lookup, where the process knows how to identify and retrieve specific items directly. A mapping is a data structure that supports this data access strategy. A dict would be one of Python’s implementations of a mapping type.

The main difference between both approaches is that for mappings, the data stored in the container needs to be provided with a key, which will be used to identify an item, and a value, which is the element containing the data we want to retrieve.

For the Key Lookup to be consistent, there are a couple of properties that need to be satisfied (if we do not modify the container):

  1. A value should only be accessible by one key. Unique values avoid collisions and help ensure that no data gets overridden by a distinct element.
  2. A key should always return the same value for the whole lifetime of a key-value pair.

Mapping hashable keys to the values stored in the container ensures that the properties above hold. When we add a new pair to a dictionary, the key itself won’t be used to store the value but rather the result of applying a hash function to it.

Hashing a key to store the value. Image by the author.

Following the diagram, hash functions help us to uniquely obtain the required values based on a key for the whole lifetime of the key. Let’s imagine for a moment what would happen if hash functions did not hold these properties:

Hash not holding properties mess. Image by the author.
  1. We start with two keys k1and k2 that produce the same hash H1
  2. Using k1 we could either retrieve v1 or v2, ending up with data inconsistency.
  3. Inserting a new pair (k3, v3) where k3 is hashed as H0, means that we lose the contents of v0 as we are overwriting it with v3.

Without proper guarantees, we get unexpected results and can even lose data.

However, hash functions are not the only ones involved in the Key Lookups. The actual Key objects are part of the equation as well. Therefore, to ensure the two main lookup properties are satisfied, we need to introduce an important topic: mutability.

We say that an object is immutable if its state cannot be altered after it is created and mutable otherwise.

Even if hash functions are well-defined, mutable objects that hold multiple states during their lifetime will yield different results if we apply the hash function to them. Then, we end up in the mapping mess showcased above.

For the sake of discussion, let’s suppose that length is a proper hash function. If we wanted to use a list as a key, we’d be applying len(list) to access the container. Updating the state of the list by adding or removing elements (mutating the list) would mean that we could not correctly store and get data from a mapping.

The example above showcases why Python only allows immutable objects to become mapping keys. Strings or integers are immutable, so developers can safely create dicts with them:

But trying this with types such as list or set, raises a TypeError: unhashable type. Why? Because those two are mutable. We can add and remove elements as we wish, so the hash function results will vary.

The Unshashable Type error tells us that we are trying to create a dictionary with a value that is not fit to become a proper Key, as we cannot ensure that the hash function will always return the same output for the same object.

While str and int are the typical dictionary keys; we might need other data types at some point. Luckily, Python brings some collections to the table that are immutable:

  • list vs. tuple: While Lists are a generic ordered collection, Tuples usually bring meaning and structure to their elements. In addition, tuples are immutable, and as they cannot be updated, each position holds a specific piece of information (recommended source).
  • set vs. frozenset: Both structures share most of the C implementation but frozensets are immutable.
  • frozenmap: This has not yet been implemented, but there’s a PEP-603 open for consideration to add a persistent data structure for mappings.

Sometimes, it is both valuable and exciting to jump to the root of a topic to understand why Python works the way it does.

In this post, we have reviewed:

  • How mapping data types work.
  • How hash functions and immutability ensure consistency and safety when storing and accessing data.
  • Different mutable and immutable Python types.


Diving into the Unshashable Type error

A common issue all Python developers face at some point (or in reality, many times) is a TypeError with the magic words unhashable type. We can all agree that it does not sound good and can get close to scary without the proper context.

Python is a language heavily built on top of dictionaries. For example, that is how namespaces or classes store their functions and attributes. Understanding how dictionaries work beyond Python’s API can help us make sense of this data structure and avoid common errors.

This post will dive into mapping types, what they do internally, and how this knowledge helps us understand more about the unhashable exception.

Data access can be done in two ways:

  • Full Scan, where the process needs to go through all the elements of the data structure. In Python, an example would be a List. While we could access an item by index, there is no way to identify a specific element within the container directly, plus the indexes could change over time.
  • Key Lookup, where the process knows how to identify and retrieve specific items directly. A mapping is a data structure that supports this data access strategy. A dict would be one of Python’s implementations of a mapping type.

The main difference between both approaches is that for mappings, the data stored in the container needs to be provided with a key, which will be used to identify an item, and a value, which is the element containing the data we want to retrieve.

For the Key Lookup to be consistent, there are a couple of properties that need to be satisfied (if we do not modify the container):

  1. A value should only be accessible by one key. Unique values avoid collisions and help ensure that no data gets overridden by a distinct element.
  2. A key should always return the same value for the whole lifetime of a key-value pair.

Mapping hashable keys to the values stored in the container ensures that the properties above hold. When we add a new pair to a dictionary, the key itself won’t be used to store the value but rather the result of applying a hash function to it.

Hashing a key to store the value. Image by the author.

Following the diagram, hash functions help us to uniquely obtain the required values based on a key for the whole lifetime of the key. Let’s imagine for a moment what would happen if hash functions did not hold these properties:

Hash not holding properties mess. Image by the author.
  1. We start with two keys k1and k2 that produce the same hash H1
  2. Using k1 we could either retrieve v1 or v2, ending up with data inconsistency.
  3. Inserting a new pair (k3, v3) where k3 is hashed as H0, means that we lose the contents of v0 as we are overwriting it with v3.

Without proper guarantees, we get unexpected results and can even lose data.

However, hash functions are not the only ones involved in the Key Lookups. The actual Key objects are part of the equation as well. Therefore, to ensure the two main lookup properties are satisfied, we need to introduce an important topic: mutability.

We say that an object is immutable if its state cannot be altered after it is created and mutable otherwise.

Even if hash functions are well-defined, mutable objects that hold multiple states during their lifetime will yield different results if we apply the hash function to them. Then, we end up in the mapping mess showcased above.

For the sake of discussion, let’s suppose that length is a proper hash function. If we wanted to use a list as a key, we’d be applying len(list) to access the container. Updating the state of the list by adding or removing elements (mutating the list) would mean that we could not correctly store and get data from a mapping.

The example above showcases why Python only allows immutable objects to become mapping keys. Strings or integers are immutable, so developers can safely create dicts with them:

But trying this with types such as list or set, raises a TypeError: unhashable type. Why? Because those two are mutable. We can add and remove elements as we wish, so the hash function results will vary.

The Unshashable Type error tells us that we are trying to create a dictionary with a value that is not fit to become a proper Key, as we cannot ensure that the hash function will always return the same output for the same object.

While str and int are the typical dictionary keys; we might need other data types at some point. Luckily, Python brings some collections to the table that are immutable:

  • list vs. tuple: While Lists are a generic ordered collection, Tuples usually bring meaning and structure to their elements. In addition, tuples are immutable, and as they cannot be updated, each position holds a specific piece of information (recommended source).
  • set vs. frozenset: Both structures share most of the C implementation but frozensets are immutable.
  • frozenmap: This has not yet been implemented, but there’s a PEP-603 open for consideration to add a persistent data structure for mappings.

Sometimes, it is both valuable and exciting to jump to the root of a topic to understand why Python works the way it does.

In this post, we have reviewed:

  • How mapping data types work.
  • How hash functions and immutability ensure consistency and safety when storing and accessing data.
  • Different mutable and immutable Python types.

FOLLOW US ON GOOGLE NEWS

Read original article here

Denial of responsibility! Techno Blender is an automatic aggregator of the all world’s media. In each content, the hyperlink to the primary source is specified. All trademarks belong to their rightful owners, all materials to their authors. If you are the owner of the content and do not want us to publish your materials, please contact us by email – admin@technoblender.com. The content will be deleted within 24 hours.
AugBrullhashlatest newsmappingMiquelPereTechnoblenderTechnologytypesUnderstand
Comments (0)
Add Comment