statannotations: Add Statistical Significance Annotations on Seaborn Plots | by Khuyen Tran | Aug, 2022


Gain Insights from a Comparison in Three Lines of Code

Imagine you are trying to determine whether there is a significant difference in the median total payment between the cities where a taxi picks up passengers. You decide to create a box plot to observe the total fare per pickup city.

Image by Author

This plot gives you a rough idea of how the total fare differs between cities, but it doesn’t tell you whether those differences are statistically significant.

Wouldn’t it be nice if you could add statistical annotations to a box plot like the one below? That is when statannotations comes in handy.

Image by Author

statannotations is a Python package that optionally computes statistical tests and adds statistical annotations to plots generated with seaborn.

To install statannotations, type:

pip install statannotations

To learn how to use statannotations, let’s start by loading the New York taxis dataset from seaborn.

Let’s look at the median total fare for each city:

Image by Author
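
The loading and aggregation code is not reproduced here, so the following is only a minimal sketch of what it might look like. It assumes the seaborn `taxis` dataset stores the pickup city in a `pickup_borough` column and the total fare in a `total` column; the helper name `median_total_by_city` is made up for illustration.

```python
import pandas as pd


def median_total_by_city(taxis: pd.DataFrame) -> pd.Series:
    """Return the median total fare per pickup city, highest first.

    Assumes `pickup_borough` and `total` columns (an assumption about the
    seaborn taxis dataset, not something shown in the article).
    """
    return (
        taxis.groupby("pickup_borough")["total"]
        .median()
        .sort_values(ascending=False)
    )


# With seaborn installed and network access, the data could be loaded with:
#   import seaborn as sns
#   taxis = sns.load_dataset("taxis")
#   print(median_total_by_city(taxis))
```

Sorting in descending order makes it easy to read off the ranking of cities by median fare.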

We can see that the median total fare for taxis that pick up customers from Queens is the highest, followed by Bronx, Brooklyn, and Manhattan.

To get a better idea of the distribution of the total fare per city, we can create the box plot for the total fare per city:

Image by Author

To add statistical annotations to the plot, we will use statannotations.

Start with getting the total fares for all rides per city:

Next, get all possible combinations of the two cities for the comparisons:

[('Manhattan', 'Brooklyn'),
('Manhattan', 'Bronx'),
('Manhattan', 'Queens'),
('Brooklyn', 'Bronx'),
('Brooklyn', 'Queens'),
('Bronx', 'Queens')]
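
One way to generate the pairs above is with `itertools.combinations` from the standard library. This is a sketch, not necessarily the code used for the article; the variable names `cities` and `pairs` are illustrative, and the city order is chosen to match the list shown.

```python
from itertools import combinations

# Order chosen to match the list of pairs shown above.
cities = ["Manhattan", "Brooklyn", "Bronx", "Queens"]

# Every unordered pair of distinct cities: 4 choose 2 = 6 pairs.
pairs = list(combinations(cities, 2))
print(pairs)
```

With four cities this yields six comparisons, one for each pair.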

Now we are ready to add statistical annotations to the plot! Specifically, we will use the Mann-Whitney U test to compare two independent groups.

The null hypothesis is that the total fares in the two cities follow the same distribution. The alternative hypothesis is that they do not.

Manhattan vs. Brooklyn: Mann-Whitney-Wilcoxon test two-sided, P_val:7.225e-01 U_stat=9.979e+05
Brooklyn vs. Bronx: Mann-Whitney-Wilcoxon test two-sided, P_val:1.992e-02 U_stat=1.608e+04
Bronx vs. Queens: Mann-Whitney-Wilcoxon test two-sided, P_val:1.676e-02 U_stat=2.768e+04
Manhattan vs. Bronx: Mann-Whitney-Wilcoxon test two-sided, P_val:5.785e-04 U_stat=2.082e+05
Brooklyn vs. Queens: Mann-Whitney-Wilcoxon test two-sided, P_val:3.666e-12 U_stat=9.335e+04
Manhattan vs. Queens: Mann-Whitney-Wilcoxon test two-sided, P_val:2.929e-30 U_stat=1.258e+06
Image by Author
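
The annotation code itself is not reproduced here. A sketch of how it could look with the `Annotator` class from statannotations is shown below. The function name `annotated_boxplot` and the column names `pickup_borough` and `total` are assumptions for illustration, so treat this as a starting point rather than the article's exact code.

```python
def annotated_boxplot(taxis, pairs, order, text_format="star"):
    """Draw a box plot of total fare per pickup city and annotate each pair.

    The column names `pickup_borough` and `total` are assumptions about the
    seaborn taxis dataset. Imports live inside the function so this sketch
    can be imported even where the plotting libraries are not installed.
    """
    import seaborn as sns
    from statannotations.Annotator import Annotator

    # Base plot: one box per pickup city, in a fixed order.
    ax = sns.boxplot(data=taxis, x="pickup_borough", y="total", order=order)

    # The annotator needs the same data, columns, and order as the plot.
    annotator = Annotator(
        ax, pairs, data=taxis, x="pickup_borough", y="total", order=order
    )
    annotator.configure(test="Mann-Whitney", text_format=text_format, loc="inside")

    # Runs the test for every pair and draws the brackets on the axes.
    annotator.apply_and_annotate()
    return ax
```

Calling the function with `text_format="simple"` instead of the default `"star"` would switch the labels from stars to p-values.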

The meaning of the number of stars in the plot:

ns: p <= 1.00e+00
*: 1.00e-02 < p <= 5.00e-02
**: 1.00e-03 < p <= 1.00e-02
***: 1.00e-04 < p <= 1.00e-03
****: p <= 1.00e-04
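
To make the legend concrete, here is a small sketch that maps a p-value to its star label using the thresholds above. It is a reimplementation for illustration, not the library's internal code, and the function name `p_value_to_stars` is hypothetical.

```python
def p_value_to_stars(p: float) -> str:
    """Map a p-value to the star notation used in the legend above."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p-value must be between 0 and 1")
    # Thresholds are checked from most to least significant.
    thresholds = [(1e-4, "****"), (1e-3, "***"), (1e-2, "**"), (5e-2, "*")]
    for upper, stars in thresholds:
        if p <= upper:
            return stars
    # Anything above 0.05 is labeled as not significant.
    return "ns"


# p-values copied from the test output above.
results = {
    "Manhattan vs. Brooklyn": 7.225e-01,
    "Brooklyn vs. Bronx": 1.992e-02,
    "Manhattan vs. Bronx": 5.785e-04,
    "Brooklyn vs. Queens": 3.666e-12,
}
for label, p in results.items():
    print(f"{label}: {p_value_to_stars(p)}")
```

Under these thresholds, only the Manhattan vs. Brooklyn comparison falls into the `ns` band.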

ns stands for not statistically significant, which here means a p-value above 0.05. In general, the smaller the p-value, the stronger the evidence against the null hypothesis and in favor of the alternative.

In the plot above, we can see that there is a significant difference in the total fare between every pair of cities except Manhattan and Brooklyn.

If you don’t like the star notation and want to add p-values to your plot instead, specify text_format="simple":

Image by Author

And you will see the p-values for the comparison between a particular pair of cities!

Congratulations! You have just learned how to add statistical annotations to your seaborn plot. I hope this article gives you the skills to investigate the relationship between two groups of data on a deeper level.

Feel free to play with and fork the source code of this article here:

