In January, a Google branch focused on health-related research released an artificial intelligence (AI) model trained on more than 90,000 mammogram X-rays. Google Health said the model could, in some instances, perform better than human radiologists in robustness and speed.

It claimed the algorithm was better than previous work at flagging would-be false negatives, i.e., images that appear normal but contain breast cancer. But when researchers from other institutions attempted to replicate the findings, they found that the paper lacked a sufficient description of the code and models used.

That prompted a team of coauthors affiliated with McGill University, the City University of New York (CUNY), Harvard University, and Stanford University to publish a rebuttal in Nature, arguing the lack of transparency in Google’s research “undermines its scientific value.”

The piece, titled "Transparency and reproducibility in artificial intelligence," challenges scientific journals to hold AI researchers to higher standards and asks colleagues to be more forthcoming with their "code, models, and computational environments" in publications.


“Scientific progress depends on the ability of researchers to scrutinize the results of a study and reproduce the main finding to learn from,” Dr. Benjamin Haibe-Kains, a senior scientist at Princess Margaret Cancer Centre and first author of the article, says in a statement.

“But in computational research, it’s not yet a widespread criterion for the details of an AI study to be fully accessible. This is detrimental to our progress.”

According to the authors of the rebuttal, the vagueness of the Google paper prevented researchers from learning how the algorithm works, making it impossible for them to apply the findings at their own institutions.

“On paper and in theory, the McKinney et al. study is beautiful,” says Dr. Haibe-Kains, referring to the Google Health paper. “But if we can’t learn from it then it has little to no scientific value.”
According to VentureBeat, science has a reproducibility problem in general, but it is even more pronounced in AI.

A 2016 poll of 1,500 scientists found that 70 per cent of the respondents had attempted to reproduce another group’s experiment and failed. An August 2020 preprint study that analyzed more than 3,000 AI papers found that “learning models tended to be inconsistent, irregularly tracked, and not particularly informative,” according to VentureBeat.

Dr. Haibe-Kains says the Google paper is indicative of a larger trend in computational research.

“Researchers are more incentivized to publish their findings rather than spend time and resources ensuring their study can be replicated,” Dr. Haibe-Kains says in a statement.


“Journals are vulnerable to the ‘hype’ of AI and may lower the standards for accepting papers that don’t include all the materials required to make the study reproducible, often in contradiction to their own guidelines.”

This can be harmful in several ways, the authors of the rebuttal argue.

Because researchers cannot efficiently replicate improperly described models, the lack of transparency can slow the transition of potentially life-saving AI algorithms into clinical settings. Not having a complete understanding of a model can also lead to unwarranted clinical trials.

The authors say the problem can be rectified by upholding the three pillars of open science: sharing data, sharing computer code, and sharing predictive models.

“We have high hopes for the utility of AI for our cancer patients,” says Dr. Haibe-Kains. “Sharing and building upon our discoveries, that’s real scientific impact.”
