Why Some AI Studies Overstate How Good They Are

Original Title

Machine learning algorithm validation with a limited sample size


Summary

When scientists use small amounts of data to train AI, they can accidentally trick themselves into thinking it works better than it does. This happens when the AI gets to 'peek' at the test data during training, especially when picking which data features to use.


Surprising Findings

K-fold cross-validation remains optimistically biased even at n=1,000 samples

Most researchers assume that a larger sample fixes overfitting, but this result shows the validation procedure itself is the problem: when feature selection is allowed to peek at the test data, even large datasets yield inflated performance estimates.
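The mechanism is easy to reproduce. The sketch below is a hypothetical illustration using scikit-learn, not the authors' exact simulation: a classifier is evaluated on pure noise, once with features selected on the full dataset before cross-validation (leaky) and once with selection refit inside each training fold (correct).

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Pure noise: 50 samples, 1,000 features, random binary labels.
# Any accuracy clearly above 0.5 is an artifact of the validation procedure.
X = rng.normal(size=(50, 1000))
y = rng.integers(0, 2, size=50)

# Leaky protocol: features are chosen using ALL samples, including those that
# later act as test folds, so test information leaks into training.
X_leaky = SelectKBest(f_classif, k=10).fit_transform(X, y)
leaky_acc = cross_val_score(SVC(), X_leaky, y, cv=5).mean()

# Correct protocol: feature selection is refit inside each training fold.
pipe = make_pipeline(SelectKBest(f_classif, k=10), SVC())
clean_acc = cross_val_score(pipe, X, y, cv=5).mean()

print(f"leaky K-fold accuracy:   {leaky_acc:.2f}")  # typically well above chance
print(f"correct K-fold accuracy: {clean_acc:.2f}")  # typically near 0.50
```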

Practical Takeaways

Always use nested cross-validation or a strict train/test split when working with small or high-dimensional datasets; a minimal sketch follows below.

high confidence
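As a concrete reference point, here is a minimal nested cross-validation sketch in scikit-learn. The dataset size, estimator, and parameter grid are illustrative assumptions, not values from the study: the inner loop tunes the pipeline on training folds only, and the outer loop scores it on data that never influenced feature selection or hyperparameter choice.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Small, high-dimensional dataset (hypothetical sizes for illustration).
X, y = make_classification(n_samples=80, n_features=500, n_informative=10,
                           random_state=0)

# Inner loop: tunes the number of features and the SVM penalty,
# using only the training portion of each outer fold.
inner_cv = KFold(n_splits=5, shuffle=True, random_state=1)
pipe = make_pipeline(SelectKBest(f_classif), SVC())
param_grid = {"selectkbest__k": [10, 50], "svc__C": [0.1, 1, 10]}
model = GridSearchCV(pipe, param_grid, cv=inner_cv)

# Outer loop: each held-out fold is untouched by selection and tuning,
# so the averaged score is an approximately unbiased performance estimate.
outer_cv = KFold(n_splits=5, shuffle=True, random_state=2)
nested_scores = cross_val_score(model, X, y, cv=outer_cv)
print(f"nested CV accuracy: {nested_scores.mean():.2f} ± {nested_scores.std():.2f}")
```

With very small samples, a strict train/test split (held out before any feature selection) serves the same purpose, at the cost of a noisier estimate.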

Publication

Journal: PLoS ONE
Year: 2019
Authors: A. Vabalas, E. Gowen, E. Poliakoff, A. Casson
Open Access, 1350 citations