Tags: data-analysis, data, machine-learning, precision-medicine · 8 min read · June 25, 2025

Why AI Promises in Cancer Care Keep Failing: A Wake-Up Call for Bioinformatics

Image generated using Freepik AI image generator

Paper selection rationale:

My choice of this research paper is deeply personal. Having lost loved ones to cancer, I have witnessed firsthand the devastating impact of treatment uncertainties and the heartbreaking reality of therapies that don’t work for everyone. This experience ignited my passion for precision medicine and the promise that AI could help predict which treatments will work for which patients, potentially sparing families the anguish of ineffective treatments and lost time.

I selected this systematic review by Corti et al. (2022) because it aligns with my academic goals in precision medicine and bioinformatics. As someone studying the intersection of AI/ML, omics data, and clinical applications, I value how this paper addresses the critical gap between AI’s theoretical promise and its practical implementation in cancer care.

Relevance to my career goals:
This research directly relates to my interests in:
— AI/ML applications in healthcare decision-making
— Statistical methodologies for biomedical research
— Clinical data analysis and electronic health records

Main problem addressed:
The research addresses the fundamental question: Why aren’t AI algorithms successfully translating from research papers to clinical practice in breast cancer treatment?

Significance in bioinformatics:
This problem is crucial because bioinformatics increasingly relies on AI to process complex, high-dimensional data from genomics, transcriptomics, proteomics, and clinical sources. If our AI methodologies are fundamentally flawed, we’re building an unreliable foundation for precision medicine.

Authors’ objectives:
— Systematically evaluate AI methodological quality in breast cancer treatment prediction
— Assess adherence to reporting standards (TRIPOD guidelines)
— Identify risk of bias using the PROBAST framework
— Evaluate data and code availability for reproducibility

Background and Context

Essential background information:
About 2.3 million women are diagnosed with breast cancer worldwide each year, making personalised treatment prediction crucial for improving outcomes. Traditional approaches rely on population-based clinical trial data, which predict individual patient responses poorly because of tumour heterogeneity and patient-specific factors.

Research context in existing knowledge:
This study builds upon several foundational concepts from my coursework:
From omics-data learning: Modern cancer research generates massive datasets from:
— Transcriptomics: Gene expression profiling reveals tumour subtypes and pathway activities.
— Proteomics: Protein signatures indicate functional states and drug targets.
— Metabolomics: Small-molecule profiles reflect treatment responses.
— Microbiome analysis: Gut bacteria influence drug metabolism and immune responses.

From AI/ML concepts: Machine learning algorithms promise to integrate these diverse data types to predict treatment outcomes, but successful implementation requires rigorous validation methodologies.

From Research Ethics: This paper addresses critical ethical concerns about algorithmic bias, particularly the underrepresentation of diverse ethnic populations in AI training datasets.

Previous studies and theories:

The research challenges the current AI optimism by systematically evaluating whether existing studies meet basic methodological standards. Unlike previous reviews focusing on AI performance metrics, this study examines the fundamental quality of research methodology: a crucial gap that could explain why promising AI results fail to translate clinically.

The authors apply established prediction model guidelines (TRIPOD, PROBAST) to AI research, bridging traditional biostatistics with modern machine learning evaluation.

Methodology Examination

Methods and Techniques used:
The authors conducted a systematic review following PRISMA guidelines, searching multiple databases (MEDLINE, Embase, SCOPUS, Google Scholar, PubMed Central) in July 2021. They identified 1,124 studies, with 64 meeting inclusion criteria for AI-based breast cancer treatment outcome prediction.

Statistical and AI concepts applied:

From my statistical learning: TRIPOD (Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis) is the established reporting guideline for prediction model studies. Its 22-item checklist covers:
— Proper study design documentation
— Appropriate statistical methodology
— Adequate validation approaches
— Transparent reporting standards

PROBAST framework application: This tool evaluates four key domains:

Participants: Selection bias assessment (similar to sampling theory concepts)
Predictors: Feature selection validity
Outcomes: Endpoint definition quality
Analysis: Statistical methodology rigour
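To make the four-domain idea concrete, here is a minimal sketch of how per-domain judgements can be rolled up into one overall rating. It follows the roll-up rule as I understand it from the published PROBAST guidance (any high-risk domain makes the study high risk; all domains low makes it low; otherwise unclear) and ignores further nuances in that guidance. The function name, the rating strings, and the example study are my own assumptions, not taken from Corti et al.

```python
from typing import Dict

# The four PROBAST domains described above.
DOMAINS = ("participants", "predictors", "outcome", "analysis")
VALID_RATINGS = {"low", "high", "unclear"}


def overall_risk_of_bias(ratings: Dict[str, str]) -> str:
    """Combine per-domain ratings into one overall risk-of-bias rating.

    Simplified rule (an assumption of this sketch):
    - any domain rated 'high'  -> overall 'high'
    - all domains rated 'low'  -> overall 'low'
    - anything else            -> overall 'unclear'
    """
    missing = [d for d in DOMAINS if d not in ratings]
    if missing:
        raise ValueError(f"missing domain ratings: {missing}")
    values = [ratings[d] for d in DOMAINS]
    invalid = [v for v in values if v not in VALID_RATINGS]
    if invalid:
        raise ValueError(f"invalid ratings: {invalid}")
    if "high" in values:
        return "high"
    if all(v == "low" for v in values):
        return "low"
    return "unclear"


# Hypothetical study: well-defined cohort and predictors, weak analysis.
example_study = {
    "participants": "low",
    "predictors": "low",
    "outcome": "unclear",
    "analysis": "high",
}
print(overall_risk_of_bias(example_study))
```

One weak domain is enough to flag the whole study, which helps explain how a large share of otherwise reasonable studies can end up rated high risk.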

Why these methods were chosen:

T-test concept application: Although the authors ran no formal hypothesis test, their systematic approach mirrors hypothesis-testing logic. Each study is assumed to meet established quality standards (the null hypothesis) unless the TRIPOD and PROBAST assessments show it to be deficient (the alternative).

Data collection rationale: Searching several databases reduces selection bias, while systematically contacting authors about data and code availability gives new insight into reproducibility, a critical concern in computational research.

Tools and software importance:

The systematic review methodology itself represents a “meta-analytical tool” for evaluating research quality, similar to how we use statistical software for data analysis. The TRIPOD and PROBAST frameworks serve as standardised “quality control algorithms” for research evaluation.

Results Evaluation

Key findings:

From a statistical perspective: The results reveal concerning patterns that violate basic principles of robust research:
— High bias risk: 72% of studies (46/64) showed high risk of bias, equivalent to failing basic statistical validity tests
— Poor reporting standards: Like calculating means without reporting standard deviations, most studies omitted crucial methodological details
— Inadequate validation: Similar to using training data for testing, many studies lacked external validation
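The validation point in the list above is easy to see with a toy example. The sketch below fits a one-variable threshold rule on a development cohort and then scores it on an external cohort whose patients behave differently. All data are synthetic and invented for illustration; the biomarker, the cohorts, and the function names are hypothetical and are not from the paper.

```python
from typing import List, Tuple

# Each patient: (biomarker value, responded to treatment: 1 or 0).
Cohort = List[Tuple[float, int]]


def accuracy(cohort: Cohort, threshold: float) -> float:
    """Share of patients whose response is predicted correctly by 'value >= threshold'."""
    correct = sum(1 for x, y in cohort if int(x >= threshold) == y)
    return correct / len(cohort)


def fit_threshold(cohort: Cohort) -> float:
    """Pick the midpoint between neighbouring values that maximises accuracy on this cohort."""
    xs = sorted({x for x, _ in cohort})
    candidates = [(a + b) / 2 for a, b in zip(xs, xs[1:])]
    return max(candidates, key=lambda t: accuracy(cohort, t))


# Synthetic development cohort: the biomarker separates responders perfectly.
development = [(0.2, 0), (0.3, 0), (0.4, 0), (0.6, 1), (0.7, 1), (0.8, 1)]
# Synthetic external cohort: same biomarker, different population, messier relationship.
external = [(0.3, 0), (0.55, 0), (0.6, 0), (0.65, 1), (0.9, 1), (0.45, 1)]

threshold = fit_threshold(development)
apparent = accuracy(development, threshold)  # performance on the data used for fitting
validated = accuracy(external, threshold)    # performance on patients the model never saw

print(f"threshold={threshold:.2f} apparent={apparent:.2f} external={validated:.2f}")
```

The rule looks perfect on the data it was fitted to and performs no better than a coin flip on the second cohort. A study that reports only the first number gives no hint of the second, which is why missing external validation matters so much.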

Specific metrics:
— Data availability: 77% of studies lacked accessible datasets (equivalent to not sharing raw data for statistical analysis)
— Code availability: 88% didn’t provide analysis code (like not sharing statistical analysis scripts)
— Ethnicity reporting: 81% failed to report participant ethnicity (major sampling bias concern)
— Model calibration: 99% didn’t report calibration metrics (equivalent to not checking if predicted probabilities match observed outcomes)
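Since calibration is the least familiar metric in this list, here is a minimal sketch of what "checking if predicted probabilities match observed outcomes" can look like. It groups predictions into equal-width probability bins, compares the mean predicted probability with the observed response rate in each bin, and summarises the gaps as an expected calibration error. The data are synthetic and the function names are my own; this is one common way to check calibration, not the procedure used by Corti et al.

```python
from typing import List, Tuple


def calibration_table(
    probs: List[float], outcomes: List[int], n_bins: int = 5
) -> List[Tuple[float, float, int]]:
    """Return (mean predicted probability, observed event rate, count) per non-empty bin."""
    bins: List[List[Tuple[float, int]]] = [[] for _ in range(n_bins)]
    for p, y in zip(probs, outcomes):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls into the last bin
        bins[idx].append((p, y))
    table = []
    for members in bins:
        if members:
            mean_pred = sum(p for p, _ in members) / len(members)
            observed = sum(y for _, y in members) / len(members)
            table.append((mean_pred, observed, len(members)))
    return table


def expected_calibration_error(
    probs: List[float], outcomes: List[int], n_bins: int = 5
) -> float:
    """Bin-size-weighted average gap between predicted and observed rates."""
    n = len(probs)
    table = calibration_table(probs, outcomes, n_bins)
    return sum(count / n * abs(pred - obs) for pred, obs, count in table)


# Synthetic example: an overconfident model.
# It predicts 90% response for five patients (three respond)
# and 10% response for five others (one responds).
probs = [0.9] * 5 + [0.1] * 5
outcomes = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]

for pred, obs, count in calibration_table(probs, outcomes):
    print(f"predicted={pred:.2f} observed={obs:.2f} n={count}")
ece = expected_calibration_error(probs, outcomes)
print(f"expected calibration error: {ece:.2f}")
```

A model like this can still rank patients well, so a good discrimination score would hide the problem. A clinician who reads "90% chance of response" needs that number to mean roughly nine in ten, and only a calibration check tells us whether it does.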

Supporting initial hypotheses:

The results strongly supported the authors’ hypothesis that methodological weaknesses limit AI clinical translation. The high bias risk and poor reporting standards explain why promising AI results often fail in real-world implementation.

Unexpected outcomes:

The severity of the reproducibility crisis was striking: author response rates to requests for data sharing (28%) and code sharing (18%) were lower than expected, indicating cultural barriers beyond technical limitations.

Contribution to bioinformatics:

These findings highlight that bioinformatics success depends not just on sophisticated algorithms, but on fundamental research methodology. Without proper validation, bias assessment, and transparency, even advanced AI techniques may produce unreliable results.

Implications Discussion

Authors’ conclusions:
The authors concluded that fundamental changes in AI research practices are required, including:
— Mandatory external validation across diverse populations
— Improved code and data sharing for reproducibility
— Prospective validation in real-world clinical settings
— Better adherence to established reporting standards

Practical implications:

For clinical practice: Current AI systems may perform poorly when deployed in real healthcare settings due to overfitting and a lack of proper validation. This could lead to incorrect treatment recommendations and patient harm.

For research methodology: The study demonstrates that sophisticated AI techniques cannot compensate for poor research design; fundamental statistical principles remain crucial even in the machine learning era.

From my cloud computing learning: The data sharing challenges highlight the need for secure, federated learning platforms that enable collaboration without compromising patient privacy.
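As a rough illustration of the federated idea, the sketch below shows the core aggregation step of federated averaging: each hospital trains on its own patients and shares only its model weights and patient count, and a coordinator combines them into a sample-size-weighted average. This is my own simplified example, not something proposed in the paper. The type alias, function name, and numbers are hypothetical, and a real platform would add safeguards such as secure aggregation that are not shown here.

```python
from typing import List, Sequence, Tuple

# What each site shares: (number of local patients, locally trained model weights).
# Raw patient records never leave the site.
SiteUpdate = Tuple[int, Sequence[float]]


def federated_average(updates: List[SiteUpdate]) -> List[float]:
    """Combine site-level weights into one global model, weighting by sample size."""
    if not updates:
        raise ValueError("no site updates received")
    total_patients = sum(n for n, _ in updates)
    if total_patients <= 0:
        raise ValueError("total sample size must be positive")
    dim = len(updates[0][1])
    if any(len(weights) != dim for _, weights in updates):
        raise ValueError("all sites must share weights of the same length")
    return [
        sum(n * weights[i] for n, weights in updates) / total_patients
        for i in range(dim)
    ]


# Hypothetical round with two hospitals of different sizes.
site_updates = [
    (100, [1.0, 2.0]),  # smaller hospital
    (300, [3.0, 4.0]),  # larger hospital, so it gets three times the weight
]
global_weights = federated_average(site_updates)
print(global_weights)
```

Weighting by sample size keeps a small site from pulling the shared model as hard as a large one, while the records themselves stay behind each hospital's firewall.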

Future research influence:

This work will likely influence:
— Funding requirements: Granting agencies may mandate data/code sharing and external validation
— Journal policies: Publishers may require TRIPOD/PROBAST compliance for AI studies
— Regulatory frameworks: FDA and other agencies may adopt these quality standards for AI medical device approval

Authors’ identified limitations:

— TRIPOD/PROBAST frameworks may not capture AI-specific issues like hyperparameter tuning
— Publication bias likely favours positive results
— Rapid AI evolution means some methodological advances may be missed

Theoretical implications:

The research suggests that AI in healthcare requires hybrid approaches combining machine learning sophistication with traditional biostatistical rigour; neither alone is sufficient for reliable clinical translation.

Personal and Societal Reflection

Most interesting findings:

What struck me most was the disconnect between published results and actual data availability, which reveals systemic issues in scientific practice. As someone learning both AI/ML and research ethics, I see this as a reminder that technical advancement must be coupled with scientific integrity.

Course connections:

This research integrates multiple course concepts:
— Statistics: Importance of proper validation and bias assessment
— AI/ML: Limitations of sophisticated algorithms without proper methodology
— Research ethics: Implications of algorithmic bias and data transparency
— Omics data: Need for integrated, multi-modal approaches
— Cloud computing: Infrastructure requirements for collaborative, reproducible research

Real-world applications:

The findings could influence healthcare policy, research funding priorities, and clinical decision support system development. More importantly, they provide a roadmap for developing trustworthy AI systems that could genuinely improve patient outcomes.

Personal questions raised:

How can we balance innovation speed with methodological rigour? What role should bioinformaticians play in ensuring AI reliability? How can we make research more collaborative and transparent?

LLM tool usage:

I utilised Claude AI to assist in structuring this analysis and identifying key concepts. The LLM helped me:
— Organise complex information systematically
— Connect course concepts to research findings
— Identify relevant statistical and AI/ML principles
— Develop critical analysis frameworks

Learning from LLM usage: AI tools excel at synthesis and organisation but require human expertise for critical evaluation and domain-specific insights. The combination of AI assistance with domain knowledge produces a more comprehensive analysis than either alone.

Societal impact considerations:

Ethical implications: The severe underrepresentation of diverse populations in AI training data perpetuates healthcare disparities, a critical social justice issue requiring immediate attention.

Challenges addressed: The authors identified systematic barriers to scientific progress, including inadequate data sharing infrastructure, competitive research environments, and insufficient quality standards.

Future directions: The research suggests developing standardised AI validation frameworks, mandatory diversity requirements for training datasets, and collaborative data sharing platforms.

Broader impact: Reliable AI could democratise access to high-quality cancer care globally, but current methodological weaknesses risk exacerbating healthcare inequalities.

Reference:
Corti, C., Cobanaj, M., Marian, F., Dee, E. C., Lloyd, M. R., Marcu, S., Dombrovschi, A., Biondetti, G. P., Batalini, F., Celi, L. A., & Curigliano, G. (2022). Artificial intelligence for prediction of treatment outcomes in breast cancer: Systematic review of design, reporting standards, and bias. Cancer Treatment Reviews, 108, 102410. https://doi.org/10.1016/j.ctrv.2022.102410

Stanford Data Ocean provides Stanford certificate training in precision medicine at no cost to anyone whose annual income is under $70,000 USD. Apply for a scholarship here: https://docs.google.com/forms/d/e/1FAIpQLSfi6ucNOQZwRLDjX_ZMScpkX-ct_p2i8ylP24JYoMlgR8Kz_Q/viewform

Originally published on Medium
