Techno Blender
Digitally Yours.

Understanding Python Sets. An underutilized class in Python… | by Miguel Saldana | Aug, 2022

An underutilized class in Python, because lists don’t solve all problems

Most people learn about Python sets early on, but then forget that in some situations a set is more useful than, say, a list. Lists get most of the attention and are sometimes used where another container would fit better. In this article we’ll look at what sets are, how set theory can be used to analyze datasets, and how this applies to data analysis.

What is a Set?

In Python, a set is a container type that holds unique, hashable elements and stores them in no particular order. The two properties that most clearly separate a set from a list are uniqueness and hashability. No set can contain two elements with the same value, and every element must be hashable, which in practice means immutable values such as numbers, strings, and tuples rather than lists or dictionaries. The set itself is mutable: elements can be added and removed, but an element cannot be changed in place (frozenset is the immutable variant). Another key aspect is that, unlike lists or arrays, sets are unordered: no element is tied to an index or position, so a set cannot be indexed or sliced and the order of items is never a feature of the set. A set can also be built from any iterable, meaning that you can pass a list, a tuple, or an array of items when creating it. Lastly, unlike some data types whose elements must all be of the same type, a set can contain items of varying types, such as strings and numbers.
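
The properties above can be seen in a few lines. This is a minimal sketch; the variable names (raw, unique, mixed) are made up for illustration.

```python
# Build a set from an iterable; duplicate values collapse to one
raw = ['a', 'b', 'a', 'c', 'b']
unique = set(raw)
print(len(unique))                 # 3
print(unique == {'c', 'b', 'a'})   # True: order is not part of a set

# Elements of different types can live in the same set
mixed = {1, 'one', 2.5, (1, 2)}
print('one' in mixed)              # True

# Elements must be hashable, so a list cannot be an element
try:
    bad = {[1, 2], 3}
except TypeError as err:
    print('TypeError:', err)

# Sets have no positions, so indexing is not supported
try:
    unique[0]
except TypeError as err:
    print('TypeError:', err)
```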

Set Manipulations (or can you???)

Because a set has no order and its elements cannot be edited in place, the manipulations available on a set are more limited than those on a list: there is no indexing, slicing, or in-place sorting, only adding and removing whole elements. The most useful things that can be done with sets come from set theory and its logical operations: unions, intersections, and differences between sets.
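
Strictly speaking, the built-in set can grow and shrink; what it does not allow is editing an element in place. The short sketch below shows what can and cannot be changed. The set contents and variable names are illustrative only.

```python
animals = {'dogs', 'lions'}

# Whole elements can be added and removed
animals.add('tigers')
animals.discard('dogs')        # discard() does nothing if the item is absent
animals.discard('unicorns')
print(animals == {'lions', 'tigers'})   # True

# There is no way to edit an element in place; remove it and add a new one
animals.remove('lions')        # remove() raises KeyError if the item is absent
animals.add('cheetahs')

# frozenset is the fully immutable variant: it has no add() or remove()
frozen = frozenset(animals)
print(hasattr(frozen, 'add'))  # False

# Being immutable makes a frozenset hashable, so it can be a set element
groups = {frozen, frozenset({'sheep', 'cows'})}
print(len(groups))             # 2
```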

Unions

# Animals that eat meat
meateaters = {'dogs', 'humans', 'lions', 'tigers', 'monkeys'}
# Animals that eat plants
planteaters = {'humans', 'sheep', 'cows', 'monkeys', 'birds'}
# Note that we can use either the | operator or the x1.union(x2) method to apply a union
eaters = meateaters | planteaters
# eaters == {'dogs', 'humans', 'lions', 'tigers', 'monkeys', 'sheep', 'cows', 'birds'}
# OR
eaters2 = meateaters.union(planteaters)
# eaters2 == {'dogs', 'humans', 'lions', 'tigers', 'monkeys', 'sheep', 'cows', 'birds'}
# Also note that although humans and monkeys appear in both sets, the union stores only a single instance of each

One thing to note in the union example above is that there are two ways to execute a union on sets. The key distinction between them is that the | operator requires both operands to be sets (or frozensets), while the x.union method accepts any iterable argument and treats its items as set members before applying the union. This gives flexibility when the data is held in a list or another data type and you still want to apply union logic between objects.
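
The difference between the operator and the method can be checked directly. In this sketch, plant_list is a hypothetical list standing in for data that has not been converted to a set.

```python
meateaters = {'dogs', 'humans', 'lions'}
plant_list = ['humans', 'sheep', 'cows']   # a list, not a set

# The method form accepts any iterable
combined = meateaters.union(plant_list)
print(combined == {'dogs', 'humans', 'lions', 'sheep', 'cows'})  # True

# The operator form raises TypeError when one side is not a set
try:
    meateaters | plant_list
except TypeError as err:
    print('TypeError:', err)

# Converting explicitly makes the operator form work
combined2 = meateaters | set(plant_list)
print(combined == combined2)   # True
```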

Differences

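Similar to a union, a difference compares two sets and returns a new set of unique items, but here the order of the operands matters: x.difference(y) keeps the items of x that do not appear in y. Taking the meat- and plant-eaters from above, meateaters.difference(planteaters) drops humans and monkeys, since they exist in both sets, and leaves only the animals that eat meat but not plants. To collect the items that belong to exactly one of the two sets, which here means every animal that is not an omnivore, use the symmetric difference instead.

# Animals that eat meat
meateaters = {'dogs', 'humans', 'lions', 'tigers', 'monkeys'}
# Animals that eat plants
planteaters = {'humans', 'sheep', 'cows', 'monkeys', 'birds'}
# Animals that eat meat but not plants
meatonly = meateaters.difference(planteaters)
# meatonly == {'dogs', 'lions', 'tigers'}
# The minus (-) operator can be used in the same way
meatonly2 = meateaters - planteaters
# meatonly2 == {'dogs', 'lions', 'tigers'}
# Animals that are not omnivores (in exactly one of the two sets)
nonomnivores = meateaters.symmetric_difference(planteaters)
# nonomnivores == {'dogs', 'lions', 'tigers', 'sheep', 'cows', 'birds'}
# The ^ operator is the equivalent shorthand
nonomnivores2 = meateaters ^ planteaters
# More than two sets can be combined with the same logic
x, y, z = {1, 2, 3, 4}, {2}, {3}
x.difference(y, z)  # {1, 4}
# Or
x - y - z  # {1, 4}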

Intersections

The last common set operation is the intersection. It compares two sets and returns the items that exist in both. The code below applies the intersection operation to our example of plant-eaters and meat-eaters.

# Animals that eat meat
meateaters = {'dogs', 'humans', 'lions', 'tigers', 'monkeys'}
# Animals that eat plants
planteaters = {'humans', 'sheep', 'cows', 'monkeys', 'birds'}
# Animals that eat both
omnivores = meateaters.intersection(planteaters)
# omnivores == {'humans', 'monkeys'}
# An alternative is the & operator
omnivores2 = meateaters & planteaters
# omnivores2 == {'humans', 'monkeys'}

Why is this important?

The ultimate question about anything to do with programming is: so what? Understanding sets lets you analyze the properties of items and how they relate to other items. A good example is a collection of objects whose properties are held as sets, or as lists that can be converted to sets. Applying the three operations above gives an easy way to evaluate what makes the properties of different items similar or different.

As an example, consider a PageRank-style analysis of pages of data that are connected by links. If each page carries a set of properties, union, difference, and intersection operations can give deeper insight into the dataset. You can take the top N pages and intersect their property sets to find what they have in common, or take the union to see every property that appears among them. The same logic applies when comparing the top N with the bottom N: the differences and intersections between the two groups show which properties separate them. This can help in the data processing step and point to features that may carry significant importance in a data science model.
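
The top-N versus bottom-N comparison described above might look like the following sketch. The page names and property tags are invented for illustration and do not come from a real dataset.

```python
# Hypothetical property tags for the highest- and lowest-ranked pages
top_pages = {
    'page_a': {'video', 'long_form', 'external_links'},
    'page_b': {'video', 'external_links', 'comments'},
    'page_c': {'video', 'external_links'},
}
bottom_pages = {
    'page_x': {'comments', 'ads'},
    'page_y': {'ads', 'long_form'},
}

# Properties shared by every top page
common_top = set.intersection(*top_pages.values())
print(sorted(common_top))    # ['external_links', 'video']

# Every property seen in each group
all_top = set.union(*top_pages.values())
all_bottom = set.union(*bottom_pages.values())

# Properties that only top pages have, and those found in both groups
only_top = all_top - all_bottom
print(sorted(only_top))      # ['external_links', 'video']
shared = all_top & all_bottom
print(sorted(shared))        # ['comments', 'long_form']
```

Properties that show up in only_top are candidates for features worth testing in a model; the sets themselves say nothing about causation.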

Another example is similarity modeling in data science. The most commonly used and written-about measure is cosine similarity, but set theory offers another powerful one: Jaccard similarity. It is commonly used in text mining and recommendation services, largely because it can be written directly in set notation as:

J(A, B) = |A ∩ B| / |A ∪ B|

# Introduce two vectors (which could represent a variety of parameters)
a = [0, 1, 2, 5, 6, 8, 9]
b = [0, 2, 3, 4, 5, 7, 9]

# Define Jaccard similarity function
def jaccard(list1, list2):
    # Convert to sets so that duplicate values are counted only once
    set1, set2 = set(list1), set(list2)
    intersection = len(set1 & set2)
    union = len(set1 | set2)
    return intersection / union

# Find Jaccard similarity between the two sets
jaccard(a, b)

# Result
0.4
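
Since text mining is mentioned as a common use, here is a hedged sketch of the same measure applied to the word sets of two sentences. The function name jaccard_sets, the sample sentences, and the choice to return 0.0 for two empty sets are assumptions made for this example; real text pipelines usually tokenize more carefully than a whitespace split.

```python
def jaccard_sets(set1, set2):
    """Return |A ∩ B| / |A ∪ B| for two sets."""
    union = set1 | set2
    if not union:
        # Assumption: treat two empty sets as having similarity 0.0
        return 0.0
    return len(set1 & set2) / len(union)

doc1 = 'the cat sat on the mat'
doc2 = 'the dog sat on the log'

# Splitting on whitespace and converting to a set drops repeated words
words1 = set(doc1.split())
words2 = set(doc2.split())

print(sorted(words1 & words2))   # ['on', 'sat', 'the']
print(jaccard_sets(words1, words2))
```

Here the two sentences share three distinct words out of seven overall, so the score is 3/7.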

Playing around with set theory ideas and the set data type will help you approach problems from different angles and give you a wider appreciation for a branch of math that is very powerful for data analysis.


