The Ins and Outs of Clustering Algorithms | by TDS Editors | Dec, 2022

By Jessie Hobb on Dec 8, 2022

Solving a data science problem often starts with asking the same simple questions over and over again, with the occasional variation: Is there a relationship here? Do these data points belong together? What about those other ones over there? How do the former relate to the latter?

Things can (and do) become complicated very quickly—especially when we try to detect subtle patterns and relationships while dealing with large datasets. This is where clustering algorithms come in handy with their power to divide a messy pile of data into distinct, meaningful groups, which we can then leverage in our analyses.

To help you on your clustering learning journey, we’ve selected our best recent articles on the topic—they cover a lot of ground, from basic concepts to more specialized use cases. Enjoy!

The fundamentals of k-means clustering. Whether you’re brand new to machine learning or a veteran in need of a solid refresher, Jacob Bumgarner’s introduction to the most widely used centroid-based clustering method is a great place to start.
An accessible guide to density-based clustering. Once you’ve mastered k-means clustering and are ready to branch out a bit, Shreya Rao is here to help with a clearly explained guide to DBSCAN (Density-Based Spatial Clustering of Applications with Noise), an algorithm that “requires minimum domain knowledge, can discover clusters of arbitrary shape, and is efficient for large databases.”
How to put an algorithm to good use. Feeling inspired to apply your clustering knowledge to a concrete problem? Lihi Gur Arie, PhD’s new tutorial is based on k-means clustering; it patiently walks readers through the steps of identifying and quantifying objects in an image based on their color.
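
If you'd like a feel for the k-means idea before diving into the articles above, here is a minimal sketch in plain Python. It is not taken from any of the linked tutorials: the `kmeans` function, the evenly spaced starting centroids, and the toy points are all assumptions made for illustration (real libraries typically use smarter initialization such as k-means++).

```python
def kmeans(points, k, iterations=10):
    """Toy k-means (Lloyd's algorithm) on 2-D tuples; returns (centroids, labels)."""
    # Assumption for reproducibility: start from k evenly spaced data points.
    centroids = [points[i * len(points) // k] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(
                range(k),
                key=lambda c: (p[0] - centroids[c][0]) ** 2
                + (p[1] - centroids[c][1]) ** 2,
            )
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = (
                    sum(p[0] for p in members) / len(members),
                    sum(p[1] for p in members) / len(members),
                )
    return centroids, labels


# Hypothetical toy data: two small, well-separated groups.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.9, 8.1)]
centroids, labels = kmeans(points, k=2)
print(labels)
```

The two steps inside the loop (assign, then update) are the whole algorithm; everything else is bookkeeping.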

Clustering methods meet renewable energy. In the real world, it’s not always self-evident which clustering approach works best for a given use case. Abiodun Olaoye looks at a number of algorithms—k-means, agglomerative clustering (AGC), Gaussian mixture models (GMM), and affinity propagation (AP)—to determine which one is the most effective at discovering wind-turbine neighbors.
How to choose the right density-based algorithm. Deciding which model is the best one to use with your dataset can sometimes hinge on small, nuanced differences. Thomas A Dorfer presents one such example by comparing the performance of DBSCAN to that of HDBSCAN, its more recent sibling, and shows us how to look at the pros and cons of different clustering options.
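
To make the density-based idea a little more concrete, here is a rough, simplified sketch of DBSCAN in plain Python. It is an illustration only, not code from the articles above: the `dbscan` function, its `eps` and `min_samples` parameter names, and the toy data are assumptions, and it uses a brute-force neighbor search that would be far too slow for large datasets.

```python
from math import dist


def dbscan(points, eps, min_samples):
    """Label each point with a cluster id (0, 1, ...) or -1 for noise."""
    NOISE, UNSEEN = -1, None
    labels = [UNSEEN] * len(points)

    def neighbors(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNSEEN:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_samples:
            labels[i] = NOISE  # may be relabelled later as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(seeds)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not UNSEEN:
                continue
            labels[j] = cluster
            nbrs = neighbors(j)
            if len(nbrs) >= min_samples:
                queue.extend(nbrs)  # j is a core point, so keep expanding
    return labels


# Hypothetical toy data: an elongated chain, a compact blob, and one outlier.
points = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0),
          (10, 10), (10, 11), (11, 10), (50, 50)]
labels = dbscan(points, eps=1.5, min_samples=2)
print(labels)
```

Notice that the chain of points ends up in one cluster even though its two ends are far apart, and the isolated point is flagged as noise rather than forced into a group. That is the behavior that sets density-based methods apart from centroid-based ones.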
