15 Comments
Surag Nair:

On the point of data quality over quantity: if the end goal is to make patient-level predictions (e.g., response to therapy), won't we eventually need large-scale data (even 10–100k+ patients)? High-dimensional, multi-modal data per patient is crucial, but with few patients, the analysis risks becoming more descriptive than predictive. That's still great for hypothesis generation but maybe not for ML. One analogy is models that predict sex from retinal images, where the signal is real and non-obvious but only becomes robust and generalizable with scale.

Abhishaike Mahajan:

I think it's an open question how much data is necessary! In the short term, I'm much more bullish on hypothesis generation, which is also why it's good that Noetik's collected dataset is (currently) one of a kind. I agree data throughput will need to improve regardless, but the bottleneck is much more on the machine side, and people besides us are working hard on that (spatial transcriptomics companies).

Matt Schwartz:

I think there's an opportunity to combine quantity and quality. In endoscopy, we're finding that we can use massive quantities of unlabeled data to train a self-supervised encoder. That encoder then lets us train downstream application decoders on relatively small datasets that are well-curated and labeled. The example we've shown so far: take the 300-patient placebo arm of a Ph3 ulcerative colitis trial and classify responders vs. non-responders from their baseline colonoscopy videos alone!
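
A minimal sketch of this two-stage setup, assuming a masked-reconstruction pretraining objective (the actual self-supervised objective isn't specified above); all architectures, shapes, and names are illustrative stand-ins, not the production pipeline:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Encoder(nn.Module):
    """Toy image encoder standing in for a real SSL backbone."""
    def __init__(self, dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, dim),
        )

    def forward(self, x):
        return self.net(x)

encoder = Encoder()
decoder = nn.Linear(128, 3 * 64 * 64)  # reconstructs a downsampled frame
opt = torch.optim.Adam([*encoder.parameters(), *decoder.parameters()], lr=1e-3)

# Stage 1: self-supervised pretraining on abundant unlabeled video frames.
for _ in range(100):  # stands in for iterating a real unlabeled DataLoader
    frames = torch.randn(16, 3, 64, 64)                # unlabeled batch
    masked = frames * (torch.rand_like(frames) > 0.5)  # random pixel mask
    recon = decoder(encoder(masked)).view_as(frames)
    loss = F.mse_loss(recon, frames)                   # reconstruct the original
    opt.zero_grad(); loss.backward(); opt.step()

# Stage 2: freeze the encoder, fit a small head on the curated labeled cohort.
for p in encoder.parameters():
    p.requires_grad = False
head = nn.Linear(128, 2)  # responder vs. non-responder
head_opt = torch.optim.Adam(head.parameters(), lr=1e-3)
for _ in range(50):
    frames = torch.randn(8, 3, 64, 64)   # small labeled batch
    labels = torch.randint(0, 2, (8,))
    loss = F.cross_entropy(head(encoder(frames)), labels)
    head_opt.zero_grad(); loss.backward(); head_opt.step()
```

The design point is that all the label-hungry capacity lives in the pretrained encoder; the downstream head has few parameters, so a few hundred labeled patients can plausibly suffice.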

Eric Kernfeld:

Hi Dr. Owl. I spent a big chunk of my Ph.D. evaluating counterfactual predictions about genetic perturbation outcomes. I spent some time looking at the OCTO-VC demos and found them very worrisome. There is a growing graveyard of similar models that seem to do worse than the mean of their training data. Here are 8 independent evaluations that differ in many details but are all broadly consistent with poor performance of virtual cell predictions (a minimal sketch of that mean-baseline check follows the list):

- Ahlmann-Eltze et al.: https://www.biorxiv.org/content/10.1101/2024.09.16.613342v5
- Csendes et al.: https://pmc.ncbi.nlm.nih.gov/articles/PMC12016270/
- PertEval-scFM: https://icml.cc/virtual/2025/poster/43799
- scEval: https://www.biorxiv.org/content/10.1101/2023.09.08.555192v7
- C. Li et al.: https://www.biorxiv.org/content/10.1101/2024.12.20.629581v1.full
- L. Li et al.: https://www.biorxiv.org/content/10.1101/2024.12.23.630036v1
- Wong et al.: https://www.biorxiv.org/content/10.1101/2025.01.06.631555v3
- My Ph.D. work: https://www.biorxiv.org/content/10.1101/2023.07.28.551039v2
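
For concreteness, here is a hedged sketch of the baseline check these evaluations share: does the model beat the unconditional mean of its training data on held-out perturbations? All names are illustrative stand-ins, not any paper's actual code:

```python
import numpy as np

def evaluate_vs_mean_baseline(train_profiles, test_profiles, model_predict):
    """train_profiles / test_profiles: dicts of perturbation -> expression vector.
    model_predict: callable mapping a perturbation to a predicted vector."""
    # The baseline is a single constant vector: the mean training profile.
    baseline = np.mean(list(train_profiles.values()), axis=0)
    model_mse, baseline_mse = [], []
    for pert, truth in test_profiles.items():
        pred = model_predict(pert)
        model_mse.append(np.mean((pred - truth) ** 2))
        baseline_mse.append(np.mean((baseline - truth) ** 2))
    # The worrisome pattern in the papers above: model MSE >= baseline MSE.
    return float(np.mean(model_mse)), float(np.mean(baseline_mse))
```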

I would be interested to hear your thoughts on this. Are you worried about it? If OCTO-VC doesn't predict counterfactuals well, how will that affect Noetik's strategy?

Abhishaike Mahajan:

What was worrying about the OCTO demo specifically?

Eric Kernfeld:

OCTO-VC showed very few examples of successes, and the examples seemed to be selected by starting with known causal effects and then checking model predictions. It would be more reassuring to also include negative examples (perturbations with no effect) and show that OCTO-VC does not predict an effect, or to start from the top model predictions instead. It would also be reassuring to see an acknowledgement of prior negative findings up front, along with a strategy for how to work through them, as in the TxPert demo. Without that, it's hard to tell whether the OCTO-VC team considers these findings relevant.
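
A minimal sketch of the two checks suggested here, with illustrative stand-in names (`model_predict`, `control_profile` are hypothetical): score known no-effect perturbations and confirm the model predicts roughly no shift, and rank candidates by predicted effect size to audit the model's top calls rather than cherry-picked known hits:

```python
import numpy as np

def predicted_shift(model_predict, control_profile, pert):
    # L2 distance between the predicted post-perturbation profile and control.
    return float(np.linalg.norm(model_predict(pert) - control_profile))

def negative_control_fpr(model_predict, control_profile, negatives, threshold):
    """Fraction of known no-effect perturbations the model still calls a hit."""
    return float(np.mean([
        predicted_shift(model_predict, control_profile, p) > threshold
        for p in negatives
    ]))

def top_predictions(model_predict, control_profile, candidates, k=10):
    """Audit the model's strongest predicted effects, not just known positives."""
    scored = [(predicted_shift(model_predict, control_profile, p), p)
              for p in candidates]
    return sorted(scored, reverse=True)[:k]
```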

Abhishaike Mahajan:

That's fair! We'll hopefully discuss some of the negative cases in future tech reports.

Also, thank you for the comment :) We'll hopefully live up to your (very understandable) standards with follow-on releases.

Eric Kernfeld:

Cool!

I want to apologize for the harshness of my comments and congratulate you on your new endeavor. I tend to get bogged down in the details, but there are a lot of different ways to get insight from -omics. Wishing you folks success.

Abhishaike Mahajan:

No worries at all! I greatly appreciate the high standards and think it's much more useful information to have than pure positivity.

Alexander Honkala:

Say hi to Ron for us!

zdk:

All the good ones are leaving NY for SF 😞

Abhishaike Mahajan:

Currently plan to stay in NY! At least for the moment.

Calvin McCarter:

There are two similar-yet-different strategies to Noetik's, and I'm curious for your thoughts on each. One direction (e.g., Tempus) is to first focus on scaling one's patient population, and then on getting additional modalities of data for individuals of particular interest. To be fair, at least in Tempus's case, the "get additional modalities of data" step is driven by patients and doctors, not Tempus itself, but it turns out those all select for the same thing (difficult cases with poor existing mainline treatment options). In contrast, it seems Noetik is going to have data from fewer individuals yet more modalities, at least to begin with. Is your optimism about Noetik primarily driven by optimism about this strategic bet, or by optimism that Noetik has the ML chops and vision to build and utilize foundation models?

The other direction is to focus on perturbation data rather than observational data. The advantage there is that a perturbation screen hit also directly tells you how to modify the disease state. (The disadvantage, of course, is that perturbation data is less realistically contextualized.) Do you think Noetik's models will also be able to answer the question of how to perturb disease states, or do you think other parallel work in AIxbio (e.g., via protein structure models) will commodify solving this problem?

Abhishaike Mahajan:

Hmm, it's hard to pattern-match the company's plan onto either of those two.

Here are some general thoughts that may help answer the questions:

1. In vivo, observational human data is the most valuable, and collecting this (with many modalities) to train a model is our highest priority.

2. We can poke at 'perturbational' data by helping other pharmas design their clinical trials. We're doing this right now with one of our partners (the Agenus partnership), and it is where I feel most bullish (blog post someday!).

3. You can poke at (less realistic) perturbational data at higher scales with mouse models, which we are currently doing as well.

4. ML is rarely a good moat, but (I think) the data fed into the ML can be! I think we picked a set of data that will be very hard for other people to 'accidentally' acquire via, e.g., the Tempus/Flatiron strategy, purely because it's so new/expensive to gather (mostly the spatial transcriptomics). Someday this won't be the case, but all advantages eventually fade.

5. I think we have early signs that our largely observational data is good enough to get at *some* disease-state perturbation, though with the caveat that it is of course not perfect. A blog post about this is actually coming soon.

But tbh, strategy matters a lot more than any ML or even the data fed into it. I think the BD team at Noetik is really quite good, though I'm at less liberty to talk about how they're pondering things.

Maya J:

What's your take on datasets and models like State x Tahoe-100M, and their off-the-shelf value (as a function of their scale and training) compared to smaller, tailored datasets like Noetik's, for hypothesis generation?
