16 Comments
Jacob's avatar

Ha, you did the thing the original essay warns against: forgetting that fields besides your own have a surprising amount of detail! For “antibiotics for bacterial pneumonia,” consider that viral and bacterial pneumonia present similarly, and it was a huge deal to understand the difference. Then notice that Gram-positive, Gram-negative, and mycobacterial infections require different antibiotics, and even different bacterial species or subspecies have different resistance spectra, plus invisible-to-all-staining differences in beta-lactamase expression change everything again. For chronic lung infections, you can do susceptibility testing with cultured clinical isolates, but this predicts response surprisingly badly, maybe because of bacterial genetic variation within a lung, maybe because biofilm phenotypes alter metabolism and antibiotic permeability, but you can maybe resensitize to antibiotics via treatment with EDTA or succinic acid… I could go on for days (phage, vaccines, anti-biofilm antibodies, …) and that’s only lung infections specifically! Then for, like, catheter infections you can start thinking about how shark-skin-inspired coatings can prevent biofilm formation on catheters, and a zillion other things.

Fragile's avatar

I’ve been an ML engineer and now, as someone with incurable blood cancer, have started looking into paths that apply my expertise to the problem. This was a great post to read, and I’ll be scouring the rest of what you’ve put out. Thank you!

Rohan Saxena's avatar

Wishing you the best of luck

Matej Šarlija's avatar

Her -> great weekend fun, HER-2 -> not so much.

Anyways, once again you wrote a wonderful essay that deals with a lot of what has been going through my head during the week.

I'm 100% certain that the current LLM / GPT related SOTA methods will do so much more for biology than they ever did for text.

Another thought is that we seem to know as much about cancer as we know about consciousness or intelligence.

Matej Šarlija's avatar

Also, is this (overall article) why you've moved from viruses to cancer?

Gordon Shriver's avatar

> I think one of the craziest things we’ve found is that tumor cells can pump out exosomes—tiny lipid vesicles—carrying microRNAs that reprogram distant tissues

Are these analogous to the lipid nanoparticles used for mRNA vaccines?

Karson's avatar

Mostly, though lipid nanoparticles are simpler and don’t have the same membrane composition as exosomes. Imagine lipid nanoparticles as the most basic exosome, whereas all natural exosomes have important regulatory proteins on the surface and environments inside that dictate what they do and where they go after being secreted from a cell.

Alex Brown's avatar

*poring over :)

Kevin Horgan's avatar

Abhishaike - Nice essay. Everything is algorithmic.

Hadrien Mary's avatar

Fascinating essay. This really resonates with me and what we are trying to tackle at BeLiver.

I really like the idea of learning the joint distribution of these binary biomarkers. Though I guess removing the threshold could unlock more signal, even if it means more noise.

You've cited OncotypeDX, and I believe this is the pragmatic approach that has the potential to improve the standard of care, adding real clinical value for patients today.

Even simplistic ML models fed with the right kind of data can have strong predictive power and still unlock biological understanding and interpretability. I have doubts about images being the right kind of data for those cases.

And moving forward, I really hope foundational models will be able to add this missing inductive prior knowledge that can have an even stronger impact in areas such as precision oncology.

Can't resist a bit of self-promotion here. If you're interested, we've been working on this exact problem at BeLiver: https://beliver.fr/science and our foundational scientific paper (https://www.biorxiv.org/content/10.1101/2025.01.03.631224v1).

Shankar Sivarajan's avatar

This administration's FDA might be exceptional in its approach, and I would be more cautious in extrapolating from its decisions. The more conventional one likely to succeed it will almost certainly revert to its usual function of consigning untold millions to the invisible graveyard. For "safety," of course.

Ziyuan Zhao's avatar

Great article — it really pushed me to articulate some thoughts I’ve been wrestling with lately.

Technically, I agree that multimodal foundation models are where the field is heading. But I remain skeptical about how much these deep neural networks can really advance our basic understanding of cancer biology or immunology. Even if the FDA approves a black-box test, that decision often rests more on real-world validation than on scientific transparency. Saying that “clinically useful variables are hiding in cancer datasets” is true, but also a bit empty — what matters is whether the model helps us understand why particular combinations of histopathologic and clinical variables predict one outcome over another.

A deeper problem, I think, is that most of these models inherit off-the-shelf architectures — CNNs, then ViTs — along with objectives and augmentations from computer vision. These designs rarely encode biological priors, so the learned embedding space ends up being powerful yet hard to interpret mechanistically. You can cluster the representations, but the relationships are still empirical and correlative. That’s why I worry that the vision of “fusing all data modalities into a single representation” may be technically elegant but scientifically complacent — it risks optimizing for prediction while losing sight of explanation.

I work in a research lab in Boston that generates large H&E and multiplexed IF datasets, and my colleagues and I are trying to address some of these challenges — how to make the learned representations biologically grounded rather than purely statistical. I’d be very interested to hear your thoughts on how to reconcile predictive accuracy with mechanistic insight.

Jean Paulson's avatar

I found this deeply interesting as a recently diagnosed HCC (primary hepatocellular carcinoma) patient. I have resisted doing much research, as the emotional maelstroms, daily life, explosion of urgent paperwork, and aftermath of a tumor rupture (which derailed the planned resection of my liver with its jellyfish tumor attached and shunted me to chemo: Tecentriq + Vegzelma) have pretty much taken all my available headspace. Could you point me to some (layperson-accessible) sources that may be useful for my particular cancer? Many thanks for that, and for the article.

Abhishaike Mahajan's avatar

Perfectly understandable! Unfortunately, I am not an oncologist, and whatever I do know about cancer lies outside of that particular subtype, so I'm not sure how much help I could be in linking to useful places. I have found r/cancer posts interesting to go through, and I do see that there are some threads about that particular subtype: https://www.reddit.com/r/cancer/search/?q=primary+hepatocellular+carcinoma&cId=65a1a81f-e107-493f-a8fc-97bb877e8146&iId=b7eef302-a3af-4e9d-a28f-cfaf80ce06b2

Hope everything turns out well :) Hopefully some people smarter about cancer management can chime in here.

Jean Paulson's avatar

Many thanks for your response, and the link.

Satish Gaurav's avatar

No mention of "The Cancer Code" by Jason Fung?
