Why Some AI Studies Overstate How Good They Are
Machine learning algorithm validation with a limited sample size
Not medical advice. For informational purposes only. Always consult a healthcare professional.
When scientists train AI on small amounts of data, they can accidentally trick themselves into thinking it works better than it does. This happens when the AI gets to "peek" at the test data during training, especially when choosing which data features to use.
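The "peeking" effect above can be reproduced on pure noise. The sketch below (illustrative, not the paper's code; all names are my own) selects the features most correlated with random labels and shows that doing the selection on all the data, test folds included, makes cross-validation report far better than chance, while selecting within each training fold does not.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, k = 50, 1000, 10                   # few samples, many features
X = rng.standard_normal((n, p))          # pure noise features
y = rng.integers(0, 2, n)                # random labels: true accuracy ~50%

def select_top_k(X, y, k):
    """Pick the k features most correlated with y."""
    corr = np.abs(np.corrcoef(X, y, rowvar=False)[:-1, -1])
    return np.argsort(corr)[-k:]

def nearest_centroid_acc(X_tr, y_tr, X_te, y_te):
    """Classify each test sample by the nearer class centroid."""
    c0, c1 = X_tr[y_tr == 0].mean(0), X_tr[y_tr == 1].mean(0)
    pred = (np.linalg.norm(X_te - c1, axis=1) <
            np.linalg.norm(X_te - c0, axis=1)).astype(int)
    return (pred == y_te).mean()

def cv_accuracy(X, y, folds=5, leaky=False):
    idx = np.arange(len(y))
    accs = []
    if leaky:                            # BIASED: features chosen on ALL data
        feats = select_top_k(X, y, k)
    for f in range(folds):
        test = idx % folds == f
        train = ~test
        if not leaky:                    # PROPER: selection inside each fold
            feats = select_top_k(X[train], y[train], k)
        accs.append(nearest_centroid_acc(X[train][:, feats], y[train],
                                         X[test][:, feats], y[test]))
    return float(np.mean(accs))

biased = cv_accuracy(X, y, leaky=True)   # inflated, despite noise data
proper = cv_accuracy(X, y, leaky=False)  # near chance, as it should be
```

On random labels, any accuracy reliably above 50% is an artifact of the validation procedure, which is exactly what the leaky variant produces.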
Surprising Findings
K-fold CV remains biased even at n=1,000 samples
Most researchers assume larger samples fix overfitting, but this study shows the validation method itself is the problem—even big datasets aren't safe with flawed testing.
Practical Takeaways
Always use nested cross-validation or a strict train/test split when working with small or high-dimensional datasets.
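The takeaway above can be sketched with scikit-learn (assumed available; the dataset and hyperparameter grid are illustrative, not from the paper). The key point is that feature selection and tuning live inside the pipeline, so the outer folds' test data never influence them.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

# Small, high-dimensional synthetic dataset (illustrative values)
X, y = make_classification(n_samples=100, n_features=200,
                           n_informative=5, random_state=0)

pipe = Pipeline([("select", SelectKBest(f_classif, k=10)),
                 ("clf", SVC())])

# Inner loop tunes C; outer loop gives an unbiased performance estimate.
inner = GridSearchCV(pipe, {"clf__C": [0.1, 1, 10]}, cv=3)
scores = cross_val_score(inner, X, y, cv=5)
print(round(float(scores.mean()), 3))
```

Because `GridSearchCV` refits on each outer training fold, no outer test fold ever leaks into either the feature selection or the hyperparameter choice.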
Publication
Journal: PLoS ONE
Year: 2019
Authors: A. Vabalas, E. Gowen, E. Poliakoff, A. Casson
Claims (6)
When a study doesn't include enough people, the results might just be due to chance and not reflect what's really going on for most people.
If you're testing a machine learning model using a common method called K-fold cross-validation, you might think it's working better than it really is—especially if you're tuning the model using all your data first. This can trick you into believing your model is accurate when it won't work as well on new data.
Nested cross-validation is like having two layers of checkups when testing a model—it keeps the test data totally separate so the model doesn’t cheat, giving a fairer score no matter how much data you have.
If you pick the most important features using all your data — including the test set — your model might look better than it really is, because it's secretly cheating by seeing data it shouldn't see yet.
Splitting data into training and testing sets gives a fair measure of how well a machine learning model works—just as reliable as more complex methods—because it keeps the test data completely separate so the model doesn't cheat by seeing it early.