If you select the most important features using all of your data, including the test set, your model's measured performance may look better than it really is, because the selection step has already seen the data that is later used to evaluate the model.

Evidence from Studies

Supporting (1)

Machine learning algorithm validation with a limited sample size

Computational/Algorithm Study

2019

The study shows that selecting features using all the data (including test data) makes a model's estimated performance look better than it really is, and that this inflates results more than tuning the model's settings (hyperparameters) does.
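The effect described above can be illustrated with a small sketch. It is not the study's code: the data (pure Gaussian noise with labels unrelated to the features), the sample and feature counts, the mean-difference feature ranking, and the nearest-centroid classifier are all assumptions chosen to keep the example short and dependency-free. Because the features carry no signal, honest accuracy should sit near chance, and anything well above that comes from the selection step having seen the test rows.

```python
import random
import statistics


def make_noise_data(n_samples=40, n_features=500, seed=0):
    """Pure-noise features with balanced labels that are unrelated to X."""
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in range(n_features)]
         for _ in range(n_samples)]
    y = [i % 2 for i in range(n_samples)]
    return X, y


def class_means(X, y, rows, j):
    """Mean of feature j for class 0 and class 1, using only the given rows."""
    m0 = statistics.fmean(X[i][j] for i in rows if y[i] == 0)
    m1 = statistics.fmean(X[i][j] for i in rows if y[i] == 1)
    return m0, m1


def select_features(X, y, rows, k):
    """Keep the k features with the largest class-mean gap on the given rows."""
    def score(j):
        m0, m1 = class_means(X, y, rows, j)
        return abs(m0 - m1)
    return sorted(range(len(X[0])), key=score, reverse=True)[:k]


def predict(X, y, train_rows, features, i):
    """Nearest-centroid prediction for sample i on the chosen features."""
    d0 = d1 = 0.0
    for j in features:
        m0, m1 = class_means(X, y, train_rows, j)
        d0 += (X[i][j] - m0) ** 2
        d1 += (X[i][j] - m1) ** 2
    return 0 if d0 <= d1 else 1


def cv_accuracy(X, y, k_features=10, n_folds=5, leaky=False):
    """K-fold accuracy. leaky=True selects features once on ALL rows."""
    all_rows = list(range(len(X)))
    leaked = select_features(X, y, all_rows, k_features) if leaky else None
    correct = 0
    for fold in range(n_folds):
        test_rows = [i for i in all_rows if i % n_folds == fold]
        train_rows = [i for i in all_rows if i % n_folds != fold]
        # Honest variant: selection only ever sees the training rows.
        features = leaked if leaky else select_features(
            X, y, train_rows, k_features)
        correct += sum(predict(X, y, train_rows, features, i) == y[i]
                       for i in test_rows)
    return correct / len(all_rows)


X, y = make_noise_data()
leaky_acc = cv_accuracy(X, y, leaky=True)
honest_acc = cv_accuracy(X, y, leaky=False)
print(f"selection on all data:      {leaky_acc:.2f}")
print(f"selection inside each fold: {honest_acc:.2f}")
```

The only difference between the two runs is which rows `select_features` is allowed to see. In a real project the same idea is usually expressed by putting feature selection inside the cross-validation loop (for example, as a step of a pipeline that is refit on each training fold) rather than running it once up front.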

Contradicting (0)

No contradicting evidence found

Gold Standard Evidence Needed

According to GRADE and EBM methodology, here is what ideal scientific evidence would look like to definitively prove or disprove this specific claim, ordered from strongest to weakest evidence.

Source Study

Machine learning algorithm validation with a limited sample size

DOI: 10.1371/journal.pone.0224365