How Data Leakage affects model performance claims | by Georgia Deaconu | Jan, 2023
This year has seen several important scientific advancements enabled by machine-learning-driven research. Along with the enthusiasm has come some concern about the reproducibility issues encountered in ML-based science. Several methodological problems have been identified, of which data leakage seems to be the most widespread. Generally, data leakage can skew results and lead to overly optimistic conclusions. There are several different ways in which data leakage can occur. The objective of this post is to present…