Owl Posting

The makings of a good bioweapon

Abhishaike Mahajan — Thu, 18 Jun 2026 15:33:50 GMT

Note: I’ve been traveling through Europe for the past week or so, and have not had time to finish my larger ongoing essays. So, this is a piece I wrote back in December 2025 about bioweapons programs. Also, a few friends and I are hosting an NYC meetup on July 16th, you should come by!

An ogre of a creature, something that had been born just weeks back, chewed on a padded rectangle. This rectangle was wirelessly connected to the tablet I was holding, and chirped that the bite force of whatever was gnawing at it hovered at roughly 4,700 PSI. That was the last thing I needed before I could finally hit the switch. The creature’s head vanished, replaced momentarily by a red aerosol and the sound of wet pennies hitting glass. It’s generally good practice to pack these things’ skulls with a plastic explosive when they first slide out of their birthing tank, because six-inch glass really isn’t tough enough to keep one of these newer breeds from getting through to me, and replacing it with glass seven inches thick, or even a foot, just felt like kicking the can down the road.

I extracted the few biopsies I needed from the corpse, its body still gushing various gases from its various organs, and called in a cleaning crew. This one was Generation 47. I went through the checklist, compiling a list of metrics to place into a slide deck later. The cleaning crew arrived in their hazmat suits, spraying dissolving enzymes before the next iteration arrived. I handed one of them the vials containing my biopsies. “Could you hand this to the evolution team?” He laughed, a hearty, full guffaw, and told me to find some other idiot to be a messenger boy. I breathed in deeply and delivered it myself.

After arriving back at my observation chamber, I received a call from the external womb team, who told me that Generation 48 was on its way. I thanked them for the notification and hung up. I hated the external womb team. They had been given a budget of nearly $50B to keep the production line of this project moving as quickly as possible, and it felt more and more like the developmental biologists who ran the whole thing had long since abandoned the hope of doing anything useful, in pursuit of increasingly bizarre aesthetic modifications.

And, speak of the devil, Generation 48 was a perfect example. Their team wheeled in their atrocity on a sterile chrome gurney, plopping its drugged, swollen body into the walled-off room in front of me. They had really outdone themselves this time, creating something that looked like it had been designed by a group of giggling twelve-year-old boys. It had iridescent scales that shifted from oil-slick purple to something resembling a sunset over a chemical spill as it rocked nervously back and forth, and, as it yawned, it revealed rows of needle teeth rimmed with gold leaf. And it was somehow even bigger than the last one, because bigger is always better.

My manager was a man named Alexander Smirnov, who walked in as I was mentally weighing whether it’d be easy to get away with ending the creature’s life immediately, eventually concluding that it’d raise too many questions.

Smirnov exclaimed, “Wow! Look at this thing! Isn’t it gorgeous?”

Smirnov resembled a knuckle, a thick one, a swollen creature with suspiciously thin limbs, as if he had swallowed a prize hog and was hiding it in his belly, refusing to digest it and nourish himself. Between his ears lay a single puff of air, roaming around, excitedly colliding with the walls of his skull like a housefly trying to escape a windowpane. He had risen to his position through a kind of stochastic motion, bouncing from role to role until he’d accumulated enough momentum to become unmovable.

“Yes,” I said, “It’s certainly something.”

“Though I was looking at the statistics, this one’s bigger, sure, but it seems like the absolute size of its genome is smaller, no?”

I wanted to cradle his thick skull between my hands and push, push until they went straight through, just so I could feel and interact with the exact cluster of consciousness that produced such an inane comment.

“Yes,” I decided to say.

He nodded sagely. “Well, it’s a trade-off, isn’t it? Can’t have everything.”

“I did want to ask,” I murmured, “if you’d reconsidered my proposal yet to start a pathogen team? That I could lead?”

Smirnov looked crestfallen.

“Well,” he said, his voice dropping a register, losing the high-pitched, jovial charm it had just seconds ago. “It seems unlikely. I realize you have your own set of arguments for why we should be working on engineering viruses and bacteria, but it really is a tough argument to make upstairs. You have to understand, these people really like spectacles, things that go, pop! You know? Something they can really put alongside some visuals and copy. Our enemies must fear us, or something of the sort. Very difficult to do that sort of narrative with this proposal of yours!”

I fiddled with some parameters on my tablet, dropping a few screaming prisoners armed with assault rifles into the creature’s den, though thankfully the glass muffled the whole debacle.

“I actually think the story there is quite clear, you know? It’s cheaper for one. Way, way cheaper. Think about the budget of the team we have running around here just to make these creatures in the first place. I wouldn’t need any of that! I’d literally just need budget to hire a dozen research assistants, a few bioreactors from a clearance sale, and the equipment needed to aerosolize the payload. It’d also be way more effective. We could depopulate a city in less than a week, all for less than a percentage of a percentage of what we’re spending now for this thing,” I said, gesturing to Generation 48.

The creature had already eaten the prisoners and, as if on cue, vomited. A spectacular, pressurized geyser of half-digested protein slurry, warped metal, and corrosive bile splattered against the glass with a meaty sound. An automated system sprayed water over the mess and the creature grunted in what seemed to be frustration.

Smirnov winced.

“Gross. Anyway, I hear you, but I feel like you still don’t quite have the rendering that I’d need. Have you chatted with Tara, our head of storytelling? She might be able to help you flesh it out.”

I had chatted with Tara. She maintained a vast, ever-growing vision-board, and she had instructed me to consider an area of the board that contained a picture of a sunset, a child laughing, and a handshake. She had asked me to ponder which of these three things my research proposal best aligned with and, after a contentious argument, I settled on the sunset. Tara waggled her eyebrows, her face stretching into a smile, and confidently announced, “Finally! This is the problem. Sunsets are what scientists want, but upper-management wants a handshake. Do you think we can get close to a handshake?” I told her that I would work on moving it in that direction.

“I did discuss this with Tara, and she encouraged me to add a handshake flair to the proposal I wrote out.”

Smirnov nodded. “She’s very good at what she does.”

“Also,” Smirnov continued, leaning against the console, “the brass loves this. There’s a visceral quality, you know? People see one of these things and they understand the threat. You can’t put a virus on a poster.”

“Yes you can.”

“It’s not the same. It looks like a fuzzy ball. No teeth. No—” he gestured vaguely at the creature, who was lazily chewing its tail, “—presence.”

I cradled my head between my palms. “Okay. Okay. I guess the thing I’m still confused about is, why does any of that stuff matter? We’re making these things to kill people we don’t like. I do understand that, visually speaking, a big reptile is scarier. But is it actually better? I don’t think so. And in the limit case, a sufficiently lethal and contagious pathogen is very scary. Like, the Black Plague was extremely terrifying to all Europeans in the 1300s.”

“Well,” Smirnov mused, “that was a different time. People were more superstitious back then. They didn’t have the context to understand what was happening to them. These days, you tell someone there’s a new virus going around, and half of them think it’s a lie made up by their government. Maybe some of them even think it’s a good thing. But it’s hard to say that about something the size of a 10-wheeler trying to eat you!”

“It lands when they’re dead.”

“Ah!” Smirnov yelped, as if he’d finally grasped the real axiomatic difference between us. “That’s just the thing. We don’t want people to die too quickly! With a big creature, the enemy sees it coming. They have time to be afraid. They have time to tell other people to be afraid. It’s a force multiplier!”

I stared at him. He stared back, eyebrows expectantly raised, as if he had finally broken through to me.

I cleared my throat. “My point is that fear of invisible death is really a lot more significant than you’re describing it as. It swallows you up a lot more. You have to be scared of every little thing, your neighbor, your wife, the air, all of it. And you can’t even do anything about the fear, you just need to wait it out and see if your skin starts sloughing off, or your eyes start bleeding, or whatever, all the while knowing that you will have doomed those closest to you. In terms of morale loss, I’d even go so far as to say that it is even worse than dropping down a few dozen of these creatures.”

Smirnov frowned.

“That is an extraordinarily unpleasant way to think about things,” he said.

“I would contend that our job is to create unpleasant things to do unpleasant things to people.”

Smirnov made a noise that was somewhere between a sneeze and a cough. “That is one way of looking at it, but there is something to be said about having some restraint.”

“As opposed to this,” I said, pointing at the creature, “which is a measured and clinical exercise in restraint.”

“This is contained!” Smirnov exclaimed. “This is controllable. You can point it in a direction and say, ‘go there, eat those people, stop when you hit the river.’ A virus doesn’t care about rivers, and I’m certain it doesn’t even know rivers exist.”

“Neither does this thing. It can swim. Generation 15 could hold its breath for six hours.”

Smirnov waved his hand. “That was a fluke. We’ve since removed the aquatic adaptations.”

“We’ve removed them three times. They keep coming back.”

Smirnov moved to speak but could not complete the first word, instead gaping nervously as he looked around, waving his hands in exasperation, as if gesturing to some invisible audience. In the meantime, I watched Generation 48 settle into a corner of its enclosure, curling up like a dog. I found myself feeling sorry for it. Then it opened one eye, a dinner-plate-sized orb of molten amber, and I remembered that I should not be looking directly at it.

“Look,” Smirnov said, “I’m on your side here. I really am. But you have to also understand the optics.”

“The optics?”

“Yes. The optics of funding a bioweapons division headed by someone who, and I’m just going to be direct with you here, comes across a little cold.”

“What?”

“You just described, in vivid detail, the experience of watching someone’s loved ones die while their skin sloughs off.”

“That was a hypothetical.”

Smirnov pinched the bridge of his nose.

“Have you considered,” he said slowly, “leading with the cost savings?”

“I led with the cost savings during the last board meeting,” I said. “You told me it made me seem ‘too focused on efficiency.’”

“Did I say that?”

“You did, and also said, and I quote, ‘You know who else was focused on efficiency? Train conductors in 1940s Germany.’”

Smirnov had the decency to look slightly embarrassed. “That may have been uncharitable.”

“It was, and, in fact, historically inaccurate. They were famously inefficient. That’s not the point anyway.”

“What is the point?”

Smirnov’s eyes twinkled with joy. He loved this. He loved this absurd back-and-forth, considered it a kind of sport. In the serpentine depths of his psychology, I was fairly certain that he had convinced himself that our arguments were a form of mentorship, and that I shared his sentiment.

I pointed at Generation 48, who was currently being battered around by a set of thick mechanical arms, meant to test its endurance to blunt trauma. “My point is: what is wrong with this one? Why can’t we just use it already?”

“Well, that one is just a prototype.”

“We’ve made forty-eight prototypes. At what point do we make something that isn’t a prototype?”

“When it’s ready.”

“When is it ready?”

“When it meets specifications.”

“Every time we get close, someone adds a new requirement. Last month it was venom glands. Before that it was echolocation. Then someone from the president’s office asked if we could make it breathe fire, and instead of saying no, you commissioned a forty-page feasibility study.”

“And it was fascinating! Did you know there’s a beetle that—”

“I don’t care about the beetle.”

Smirnov looked hurt. “The beetle was very relevant.”

“Let me ask you something,” I said, my voice almost quivering with rage. “Hypothetically.”

“Sure, shoot.”

“If I could guarantee—guarantee—that a pathogen program would produce a deployable weapon within eighteen months, would that change anything?”

Smirnov sucked air through his teeth.

“Define ‘guarantee.’”

“Guarantee. Certainty. One hundred percent confidence.”

“Nothing’s ever one hundred percent.”

“These things are currently at zero percent. We’re currently operating at zero percent. Listen to me, listen to me very carefully. I’m offering you a hundred versus zero.”

Smirnov pressed both palms against his temples.

“That’s not how I’d frame our progress.”

“How would you frame it?”

He considered this for a moment.

“I’d say we’re at one hundred percent of our current trajectory.”

“That doesn’t mean anything,” I said.

“It means we’re on track.”

I laughed. “On track for what?”

“On track for the future,” he announced. “The future is always on track, because it hasn’t happened yet.”

A tense pause hung in the air between us. It was at this precise moment that Generation 48 took an interest in my conversation with Smirnov, and began to stare directly at us. We tried our best to look away, but its gaze was eventually impossible to ignore once it unfurled huge, elephantine ears from its nape. Its meter-long tongue slurped the recently washed glass. What I felt from it was not anger, or even hunger, but something stranger: a spirit of inquiry.

“嘻嘻!” the leviathan cooed, “汝二人何所語耶?” Hee hee! What are you two talking about?

Smirnov blinked. I blinked. The creature blinked, though it took nearly two full seconds for its eyelids to complete their journey across its massive amber orbs.

“噫? 汝等聞吾乎?”

Smirnov was first to break the silence. “Is that Japanese? I am sensing something Asian here.”

“It’s Classical Chinese. It said ‘can you hear me?’”

“I didn’t know you knew that language. Why is it speaking that?”

I pulled up the specification document on my tablet, trying to find any mention of linguistic capabilities across the twelve hundred pages of requirements, amendments, and sub-amendments that governed Generation 48’s design. None popped up.

“汝等胡為爭辯不休耶?” the creature asked, pressing its snout closer to the glass. Why do you keep arguing?

I felt strange. A fuzziness had erupted in my pelvis, and it was crawling up and through me, from the tips of my toes to the top of my head.

“汝貌可畏哉!” the creature bubbled, its voice a low, resonant thrum that seemed to bypass my ears entirely and settle somewhere in my molars. You look scary. I look scary? To it? What?

“This is remarkable,” Smirnov breathed, staring at Generation 48, his body in the shadow of the creature. “Do you think we could get it to learn English? The president would lose his mind. Can you imagine? A press conference with this thing? We could put it on a big, long leash, have it answer questions—”

“Smirnov.”

“—maybe get it a little hat, something military, with a—”

“Smirnov.”

He turned to me, his eyes bright. “What?”

Smirnov did look scary. His face seemed stretched out, distorted, the corners of his lips seemingly stapled to his earlobes, too few teeth, or perhaps too many, and his eyes looked like those of a goat. His skin was pooled up, whorled, divots popping in and out through the pallid canvas of flesh.

“吾聞汝議,” Generation 48 said softly, “善哉，言之有理,” the glass gently cracking as its massive body curled against it. I heard your proposal. It made a lot of sense.

The fuzziness had graduated from a sensation to an architecture, building something intricate and terrible behind my eyes. My fingers felt distant, like they belonged to someone standing very far away. Smirnov looked like a stain, a globbed stain smeared across the observation room, and he began his gibbering again about something or other. My tongue was a foreign object, thick and furred, and when I tried to speak, I heard only Classical Chinese tumble out, perfectly coherent. Thick characters of the language filled every sensory experience, all sounds being replaced with onomatopoeias, everything perfectly translated before it reached my ears. I lazily swiveled my head, which felt like it weighed a thousand pounds, and saw the immense creature staring down at me. It had crawled through the glass, caving the whole structure inward, without my having noticed anything at all.

Reality, for a brief moment, began to snap and click, its joints moving into a new configuration. The creature began to speak again, but its voice came from the wrong direction even though I could see its mouth moving in front of me. “कच्चित् कुशलम्?” it asked, its eyes glowing with concern. Are you okay? It had switched to Sanskrit.

I wanted to say no. What came out was a slurred: “मन्ये वायौ कश्चिद् दोषोऽस्ति”, the words filling up my vision. I think there’s something wrong with the air.

It giggled, a deep sound hammering into my ears. I felt tired. So I lay down as Generation 48 gently rose upward, through the roof, and into the endless night sky, older and older languages enveloping it the whole way. As it reached its zenith, almost perfectly overlapping with the moon, it began to sing a melody that I instantly recognized as a chorus from a play performed eons ago, long before civilization as we understood it had ever existed, back before crops, back before languages even had names. The ballad was beautiful, and I wept, and it repeated it, and I wept again. The melody carried on the wind, drifting across borders, through ventilation systems, into lungs and bloodstreams. It spoke through weather alerts, wedding vows, emergency broadcasts, voicemail greetings, school assemblies, cockpit announcements, missile silos, confession booths—all of it, until every throat on earth had been taught the words, and every mouth opened to sing them, and did not close again.

How to build a cancer vaccine, and whether they will work this time

Abhishaike Mahajan — Mon, 08 Jun 2026 14:06:09 GMT

Grateful to Benjamin Vincent and Alex Rubinsteyn for our many conversations on this topic, and comments on drafts of this essay!

Introduction

When most people hear the phrase “cancer vaccine,” they think of normal vaccines. Perhaps they even think of something that ostensibly is a cancer vaccine: the HPV vaccine. These vaccines, and those akin to them, are not the subject of this essay, as they are preventive vaccines against an infectious cause of cancer. When you inject one of those, you are vaccinating against a virus. The virus causes cancer. Prevent the virus, prevent the cancer. This is standard vaccinology applied to an oncogenic pathogen, and the approved ones work decently well, but they are not, in any meaningful sense, what oncologists mean when they talk about cancer vaccines.

The cancer vaccines oncologists mean are given to you when you already have cancer.

These have been worked on for forty years, and have largely failed.

It’s a grim field. I’ve talked with a fair number of biology folks at this point, and ‘cancer vaccines don’t work, right?’ is a common sentiment amongst them, even amongst those who have never touched the area. Of course, the researchers who actually work in this specific domain will add some nuance as to why things aren’t so cut and dried, but the point is clear: this stuff is challenging. It’s not like people aren’t trying, either. There have been many, many attempts to make cancer vaccines work, and each result has left an increasingly bitter taste in their mouths.

But there is something in the air these days. If you really try, you can feel it too. There is optimism afoot in cancer vaccines. Really, there may be optimism afoot in cancer at large. Sid’s stories and Rosie’s story have lit something of a fire underneath many people’s feet, and all sorts of eyes are being directed here. Is it time? Have we arrived? Are genuine cancer vaccines on the horizon?

Maybe. But let’s not get ahead of ourselves; first, let’s make sure we understand the science here.

The immunological theory behind cancer vaccines

It’s a bit simple, isn’t it? Cancer cells futz around with their genome, which makes them produce non-standard proteins. And as you may know, the immune system has machinery for noticing weird proteins inside cells. This is true for the weird proteins produced by virus-infected cells, and it is true for cells on the verge of going cancerous. And when our patrolling T-cells detect these weird proteins, they will politely ask the cell to kill itself. This is happening inside you right now, removing many would-be-cancers before they ever have a chance to flourish.

But sometimes the T-cells fail to notice, the would-be-cancer becomes a real cancer, and it becomes an annoyance to us.

In principle, the fix is simple: give the immune system a hint. Take the cancer-flavored protein, package it up alongside a chemical that signals “this is a real threat,” and inject it. Dendritic cells pick up the protein, get riled up by the signal, scurry off through the lymphatic system, and present pieces of the protein to T-cells, who get very upset and go off hunting for the source. And the source is the cancer.

This is all correct, but we are skipping a very difficult challenge here. Getting the immune system to take your hint seriously is easily done with the chemical, also known as an ‘adjuvant’; it is a bit more puzzling to figure out what the right hint is.

Let’s take a guess. How about proteins that cancer over-expresses, or expresses in tissues where it shouldn’t? This is not a bad idea, and there are some good candidates here. HER2, which is amplified in some breast cancers. MAGE-A3, which is normally expressed only in the testis but turns on in melanoma and various other tumors. These are often called tumor-associated antigens, or TAAs, and identifying them, in the eighties and nineties, was a small cottage industry. And identify them we did; there are the aforementioned ones, alongside MUC1, NY-ESO-1, PRAME, MART-1, and a small zoo of similar candidates. And because TAAs are shared across many patients with the same cancer type, you can build a single off-the-shelf vaccine and ship it to everyone.

Given that we still have cancer today, the TAA-vaccine era did not exactly wildly succeed. We will talk later about why, but for the moment it is enough to note that the broad strategy of “find a protein the cancer makes a lot of and vaccinate against it” did not generally produce durable clinical benefit, and the field eventually started looking elsewhere.

Where else is good to look? Well, we should ponder what T-cells actually see. When T-cells knock on the door of abnormal cells to judge their internals, they do not perceive aberrant proteins floating around in the cytoplasm. What they see are short peptide fragments displayed on the cell surface, loaded onto a class of molecules called MHC (major histocompatibility complex), which act as a quick summary of what is going on inside the cell. Every cell in your body is constantly chopping up a sample of its proteins and displaying the resulting fragments on its MHC. If a peptide is not on MHC, a T-cell cannot see it. If it is, and the peptide spans a mutated stretch of its source protein, the presented peptide will differ from the normal one, too.

In other words: perhaps you don’t actually want any old cancer-flavored protein to be part of the vaccine. You want peptides, ones that are unique to cancer, that are displayed on MHC. To be clear: of these three desires, two were already well-understood back in the TAA days. All TAA cancer vaccines of olde also used peptides that are presented on the MHC. But TAAs were only associated with cancer. They were not unique to cancer. Because of this, there exist extremely few T-cells in your body that will respond to a vaccine containing them, making any eventual immune response extremely weak. Why don’t you have such T-cells? Because of a process called central tolerance, which is your body’s attempt to prune away all ‘self-reactive’ T-cells to prevent them from attacking your own body.

But there is a different category of cancer-flavored peptide, one that your immune system has never seen before. Remember: cancer cells futz around with their genome. They accumulate point mutations, and some of those mutations land in protein-coding regions, and some of those produce slightly altered peptides that get chopped up and loaded onto MHC alongside everything else. And perhaps a tumor cell’s TAAs have mutated so heavily, so thoroughly, that they hardly resemble the natural ones.

Happily for us, this is often true. These altered, MHC-displayed peptides that result from genome-futzing are often referred to as neoantigens.

Neoantigens are the natural way to build a cancer vaccine. They are displayed on MHC. And the T-cell repertoire that can recognize them is, in principle, fully intact, because they did not exist when the immune system was learning what to ignore. Of course, the logistics now get worse. Most useful neoantigens are unique to your tumor and your tumor alone, which means we’ll need to pump out a brand-new vaccine for each cancer patient who walks through the door.

Still, maybe we’re willing to put up with this if it is a bona-fide cure for cancer. As of today, there are two ways to discover these hyper-unique neoantigens to put in a cancer vaccine.

The first is to directly pull whatever is currently sitting on the MHC of a fresh tumor cell. This is a technique called ‘immunopeptidomics’, where you pull MHC complexes off the cell surface, strip out the peptides bound to them, and identify those peptides with mass spectrometry. This is the ground truth. It is also rarely done. To do it, you need a sizable, cryopreserved tumor sample to run through the mass-spec machine, and even then you tend to recover only a sliver of the actual immunopeptidome due to sample noise. It is not something you will ever use on a routine clinical timeline, even for the ultra-wealthy slice of cancer care: the amount of tumor tissue required is often prohibitive, and cryopreservation is a tumor storage method you just rarely see.

The second, far more common path is to sequence the tumor and predict what would be presented on the MHC. In other words, take the sequence you’ve pulled off the tumor, compare it to the patient’s normal sequence, and identify the mutations. For each mutation that lands in a protein-coding region, you can construct the mutated protein sequence: take the reference protein, swap in the mutated residue at the right position, and you have a hypothetical mutant protein the cancer is producing. Then you slide a window across that sequence around the mutated residue and generate every possible short peptide of the lengths that the MHC tends to display, typically 8 to 11 amino acids.

So if the mutation is at position 200 of the protein, you generate every 9-mer that contains position 200: positions 192–200, 193–201, 194–202, and so on through 200–208. Same for 8-mers, 10-mers, and 11-mers. A k-mer window can start at k different positions and still cover the mutation, so a single point mutation away from the protein’s ends gives 8 + 9 + 10 + 11 = 38 candidate peptides. For a tumor with a few hundred mutations, you end up with thousands of candidate peptides.
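The window enumeration described above can be sketched in a few lines of Python. This is a minimal sketch, assuming a single missense mutation, 1-indexed positions to match the prose, and a protein supplied as a plain string of one-letter amino acid codes; the function names `apply_point_mutation` and `mutant_windows` are hypothetical, not taken from any real neoantigen pipeline. Windows are clipped at the protein’s ends, which is why mutations near a terminus yield fewer than 38 peptides.

```python
def apply_point_mutation(protein: str, position: int, alt: str) -> str:
    """Return the protein with the residue at a 1-indexed position replaced by `alt`."""
    if not 1 <= position <= len(protein):
        raise ValueError(f"position {position} is outside a protein of length {len(protein)}")
    i = position - 1
    return protein[:i] + alt + protein[i + 1:]


def mutant_windows(protein: str, position: int, alt: str, lengths=(8, 9, 10, 11)):
    """Enumerate every peptide of the given lengths that contains the mutated residue.

    Returns (start, end, peptide) tuples in 1-indexed, inclusive coordinates.
    """
    mutant = apply_point_mutation(protein, position, alt)
    windows = []
    for k in lengths:
        # A k-mer starting at `start` spans start..start+k-1, so it covers the mutation
        # when position-k+1 <= start <= position; clip so the window stays inside the protein.
        first = max(1, position - k + 1)
        last = min(position, len(mutant) - k + 1)
        for start in range(first, last + 1):
            end = start + k - 1
            windows.append((start, end, mutant[start - 1:end]))
    return windows


if __name__ == "__main__":
    # Toy 400-residue protein of alanines with a hypothetical A200W substitution.
    toy_protein = "A" * 400
    windows = mutant_windows(toy_protein, 200, "W")
    nine_mers = [(s, e) for s, e, pep in windows if e - s + 1 == 9]
    print(len(windows))                  # 38 = 8 + 9 + 10 + 11
    print(nine_mers[0], nine_mers[-1])   # (192, 200) (200, 208)
```

Scaling up, 300 interior missense mutations at 38 windows each would give 300 × 38 = 11,400 candidates, which is where the "thousands" comes from. Real pipelines also have to handle indels and frameshifts, which this sketch ignores.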

This should give us a list of mutant peptides that, in principle, the cell could display. What’s next? Well, there are about four steps in between a protein being expressed and peptide fragments of it ending up on the MHC. But all of these are a bit hard to directly study. One way around this pickle is to rely on an easier-to-study proxy: is a candidate peptide physically able to bind to the MHC? Now, just because a peptide can bind to MHC doesn’t mean it will be presented on the MHC, but it is a useful filter to have. Necessary, but not sufficient!
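As a concrete version of that binding filter, here is a minimal Python sketch. It assumes some upstream predictor has already produced an IC50 estimate, in nanomolar, for each candidate peptide; the `filter_binders` name and the prediction values are made up for illustration, and the 500 nM cutoff is a commonly used convention for calling a peptide a "binder", not a biological law. Passing the filter means "plausibly binds," nothing more.

```python
from typing import Dict, List, Tuple


def filter_binders(predicted_ic50_nm: Dict[str, float],
                   threshold_nm: float = 500.0) -> List[Tuple[str, float]]:
    """Keep peptides predicted to bind MHC more tightly than `threshold_nm`.

    Lower IC50 means tighter binding, so results are sorted strongest first.
    Binding is necessary but not sufficient for presentation.
    """
    kept = [(pep, ic50) for pep, ic50 in predicted_ic50_nm.items() if ic50 < threshold_nm]
    return sorted(kept, key=lambda item: item[1])


if __name__ == "__main__":
    # Hypothetical predictions for three toy peptides (numbers are invented).
    predictions = {"GLAKTEGHV": 35.0, "ADAKTEGHD": 12000.0, "KLAKTEGHL": 420.0}
    print(filter_binders(predictions))  # [('GLAKTEGHV', 35.0), ('KLAKTEGHL', 420.0)]
```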

But it is worth asking a question: why bother with the candidate list at all? Can’t we just be maximalist about it and stuff thousands of candidates into the vaccine? It only takes one (or maybe a few) to hit. It’s not like there are any downsides to being aggressive here.

Sadly, there is a downside to being aggressive.

Namely, a concept called “immunodominance”, which is the observation that when you present the immune system with a mixture of antigens, the resulting T-cell response tends to concentrate on one or a small handful of “winners,” with the remaining antigens getting ignored or generating responses so weak they might as well not be there. Why any given peptide wins the immunodominance tournament is a complicated function of neoantigen abundance, precursor T-cell frequency, the kinetics of antigen processing in the dendritic cell, and a pile of other factors that we mostly cannot (as of today) predict from neoantigen sequence alone. What you can predict is that something will win, and there is no guarantee that the winner is one of the peptides actually presented on the tumor cells you are trying to kill.

Let’s go back to filtering the peptide candidate list. We must deal with one more thing. Not only do neoantigens differ between people, but the underlying display port, the MHC, also varies. In humans, MHC molecules are encoded by the HLA genes, and there exist thousands of different HLA alleles across the human population, each with specific chemical preferences for which peptides will sit stably inside it. There’s HLA-A*02:01, the most common HLA-A allele in people of European descent, which has a strong preference for peptides with leucine or methionine at position 2 and leucine or valine at the C-terminus. HLA-B*07:02 prefers proline at position 2. HLA-A*24:02 prefers tyrosine or phenylalanine at position 2 and phenylalanine, leucine, or isoleucine at the C-terminus.
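The anchor preferences just listed can be encoded as a crude rule-based check. This is a deliberately simplified sketch: the `ANCHOR_MOTIFS` table and `matches_anchor_motif` function are hypothetical, the table knows only the three alleles and primary anchors described above (no C-terminal preference is given for HLA-B*07:02, so that slot is left open), it treats preferences as hard yes/no rules when real motifs are graded, and the peptides are toy sequences.

```python
# Primary anchor preferences as described in the text. None means no preference is encoded.
ANCHOR_MOTIFS = {
    "HLA-A*02:01": {"p2": set("LM"), "c_term": set("LV")},
    "HLA-B*07:02": {"p2": set("P"), "c_term": None},
    "HLA-A*24:02": {"p2": set("YF"), "c_term": set("FLI")},
}


def matches_anchor_motif(peptide: str, allele: str) -> bool:
    """Return True if the peptide has the allele's preferred residues at P2 and the C-terminus."""
    if not 8 <= len(peptide) <= 11:
        return False  # outside the typical 8-11 residue range
    motif = ANCHOR_MOTIFS[allele]  # raises KeyError for alleles this toy table doesn't know
    p2_ok = peptide[1] in motif["p2"]
    c_term_ok = motif["c_term"] is None or peptide[-1] in motif["c_term"]
    return p2_ok and c_term_ok


if __name__ == "__main__":
    for allele in ANCHOR_MOTIFS:
        print(allele, matches_anchor_motif("GLAKTEGHV", allele))
```

The same toy peptide passes for HLA-A*02:01 but fails for the other two alleles, which is the practical upshot: a candidate that looks promising for one patient can be useless for another.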

Complicated!

This may feel like a very machine-learning-shaped problem, and it has, in fact, been treated as one for the better part of twenty-five years. The earliest attempts were exactly what you’d guess from the rules above: take the observed MHC preferences (leucine here, valine at the C-terminus there) and freeze them into a position-specific scoring matrix, a lookup table that grades each candidate peptide on how faithfully it honors a given allele’s known tastes. SYFPEITHI and BIMAS, in the nineties, were essentially this, and were surprisingly decent. Then came pan-allele models like NetMHCpan and MHCflurry, which learn from the amino acid sequence of the MHC molecule itself and can therefore hazard a guess about which peptides would sit within MHC types they have seen only a handful of times, or never at all. At first, these models were trained only on in-vitro binding affinity data between peptides and MHC molecules, but these days they are increasingly trained on the (admittedly meager) immunopeptidomics datasets out there.
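To make the position-specific scoring matrix idea concrete, here is a minimal log-odds PSSM in Python. It is a sketch in the spirit of those early tools, not a reimplementation of SYFPEITHI or BIMAS, whose actual coefficients differ: it counts residues at each position across a set of known binders of one length, adds a pseudocount so unseen residues are penalized rather than forbidden, compares against a uniform background over the 20 amino acids (an assumption; real tools may use other backgrounds), and scores a candidate as the sum of its per-position log-odds. The function names and training peptides are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(binders: List[str], pseudocount: float = 1.0) -> List[Dict[str, float]]:
    """Build a log2-odds position-specific scoring matrix from equal-length binders."""
    length = len(binders[0])
    if any(len(p) != length for p in binders):
        raise ValueError("all training peptides must have the same length")
    background = 1.0 / len(AMINO_ACIDS)  # uniform background frequency
    total = len(binders) + pseudocount * len(AMINO_ACIDS)
    pssm = []
    for i in range(length):
        counts = Counter(p[i] for p in binders)
        pssm.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return pssm


def score_peptide(pssm: List[Dict[str, float]], peptide: str) -> float:
    """Sum the per-position log-odds; higher means more binder-like."""
    if len(peptide) != len(pssm):
        raise ValueError("peptide length must match the matrix")
    return sum(column[aa] for column, aa in zip(pssm, peptide))


if __name__ == "__main__":
    # Toy "known binders" with L/M at P2 and V/L at the C-terminus.
    toy_binders = ["GLAKTEGHV", "KLFDRSAQV", "SMNPWEQTL", "ALGGYRNDV"]
    pssm = build_pssm(toy_binders)
    print(score_peptide(pssm, "ALAKTEGHV") > score_peptide(pssm, "ADAKTEGHD"))  # True
```

The two candidates differ only at the anchors (P2 and the C-terminus), so the comparison isolates exactly what the matrix has learned about this toy allele.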

Unfortunately, all existing models have a fundamental problem, and the problem will not go away even if the models are pushed to their theoretical limit: they can only approximate the population-wide expectation of presented peptides given the peptide and MHC allele as input. This is not the same as what is actually being presented on a given tumor cell, which comes down to whether the tumor is transcribing the source gene at all, whether its antigen-processing machinery is even intact, or maybe something else entirely. None of this is legible from a peptide sequence and an allele name! In principle, you could feed a model this extra, contextual information, but such a model does not yet exist.
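Some of that context can be bolted on outside the model. The sketch below is a hypothetical post-filter in Python that drops candidates whose source gene the tumor barely transcribes, using RNA-seq expression in TPM. The `Candidate` type, the gene names, and the 1 TPM cutoff are all illustrative assumptions, and nothing here checks whether the tumor’s antigen-processing machinery is intact, which is exactly the kind of context a sequence-only model cannot see either.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Candidate:
    peptide: str
    gene: str                  # source gene of the mutant peptide (hypothetical identifiers below)
    predicted_ic50_nm: float   # output of some upstream binding predictor


def filter_by_expression(candidates: List[Candidate],
                         tumor_tpm: Dict[str, float],
                         min_tpm: float = 1.0) -> List[Candidate]:
    """Keep candidates whose source gene is expressed at >= min_tpm in the tumor.

    Genes absent from `tumor_tpm` are treated as unexpressed (TPM 0).
    """
    return [c for c in candidates if tumor_tpm.get(c.gene, 0.0) >= min_tpm]


if __name__ == "__main__":
    candidates = [
        Candidate("GLAKTEGHV", "GENE_A", 35.0),
        Candidate("KLAKTEGHL", "GENE_B", 420.0),
        Candidate("SMAKTEGHL", "GENE_C", 80.0),
    ]
    tumor_tpm = {"GENE_A": 54.2, "GENE_B": 0.3}  # invented values; GENE_C was not measured
    for c in filter_by_expression(candidates, tumor_tpm):
        print(c.peptide, c.gene)
```

Treating unmeasured genes as unexpressed is a conservative choice; a pipeline that trusts its binding predictions more might prefer to keep them and flag them instead.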

Moreover, we’re ignoring a very big dragon here: most of our understanding of MHC-peptide complexes is derived from the canonical human proteome. But there are a lot of differences between the canonical set and the actual set! The latter contains ribosome-only proteins, post-translational modifications of peptides, spliced-together proteins, and likely many, many more. None of these are derivable from knowledge of a tumor’s sequence alone, and so even our starting candidate list is often a sliver of what is truly found on the surface. For what it is worth, this is likely true even for immunopeptidomic workflows. Interpreting their results requires comparison against some reference set, and the typical reference set is, again, the canonical human proteome.

But let’s say we solve all these issues. Now we’ll run into a problem that no workflow, no matter how sophisticated, can fully solve while being isolated from real, living human cells: peptide presentation is not the same thing as peptide immunogenicity. What is immunogenicity? It is a blanket term that covers three characteristics: capacity for a T-cell to recognize a peptide (binding), capacity for a peptide to force that T-cell to proliferate and kill (function), and whether the net impact of this leads to any clinical benefit.

You can only test the last category via in-vivo dosing. But can you test recognition and effects on T-cell function through simpler means? Technically no: all of this should come down to the individual (their TCR repertoire, their tolerance history) and not the peptide alone. But we shouldn’t be too hasty. Surely there is some vague sense of immunogenicity that could be divined entirely from a peptide sequence, with no information about a specific individual’s immune cell population, no?

People have certainly tried. In 2020, an international consortium called TESLA, the Tumor Neoantigen Selection Alliance, ran an experiment on this exact question. They handed the same tumor sequencing data (exomes and RNA-seq) to twenty-five teams and let each predict which neoantigens would be immunogenic using whatever pipeline they favored. They then synthesized the predicted peptides and tested them against real patient T-cells to assess both binding and function.

The best approaches could indeed enrich for immunogenic peptides from sequence alone! Not perfectly, but better than random. To do this, they used MHC presentation, which we have already discussed to death. More interestingly, they also used a pair of crude proxies for immunogenicity that require no knowledge of the patient's immune system at all. One is foreignness, which is to say, how closely the peptide resembles known, common pathogen epitopes. Very neat! This is an implicit bet that you carry pathogen-reactive T-cells from some prior infection years back, and that an immunogenic peptide will take advantage of them. The other is agretopicity, which is the ratio of how well the mutant peptide binds the MHC versus its wild-type parent, according to a machine-learned model. This is based on the theory that a mutation which sharply improves binding presents the immune system with something strange, and our immune system does not like strange things. Both are computable from a peptide sequence, an MHC sequence, and a binding predictor. Both have continued to be used in more modern immunogenicity prediction systems.
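Once a binding predictor has scored both peptides, agretopicity is simple arithmetic. The sketch below assumes the predictor returns IC50 values in nanomolar, where lower means tighter binding. It uses the mutant-to-wild-type ratio, which is one common convention. The predicted values are hypothetical stand-ins for whatever tool you would actually use.

```python
def agretopicity(mut_ic50_nm: float, wt_ic50_nm: float) -> float:
    """Mutant-to-wild-type ratio of predicted IC50 (nM).

    Lower IC50 means tighter binding, so a ratio well below 1 means the
    mutation made the peptide bind the MHC much better than its parent.
    """
    if mut_ic50_nm <= 0 or wt_ic50_nm <= 0:
        raise ValueError("IC50 values must be positive")
    return mut_ic50_nm / wt_ic50_nm


# Hypothetical predictor outputs (nM) for two mutant/wild-type pairs.
predictions = {
    "neoantigen_1": (25.0, 5000.0),  # mutation turns a non-binder into a strong binder
    "neoantigen_2": (40.0, 50.0),    # mutant and parent bind about equally well
}
for name, (mut, wt) in predictions.items():
    print(name, agretopicity(mut, wt))
```

Under this convention, the first pair is the ‘strange’ case the theory favors. The second looks, to the immune system, much like the self-peptide it was already tolerized against.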

These are useful, but they are, once again, statements on population-wide expectations, and not on your individual tumor.

Things may be on the precipice of changing, though. The frontier models of the last year, such as TCRBagger, have begun taking the patient's own measured TCR repertoire as a direct input, conditioning immunogenicity predictions on what an actual patient has. And it seems to improve performance! Why hasn’t everyone been doing this all along? Well, the capability to measure immune repertoires at all is relatively recent, less than a decade old. Doing it perfectly is also somewhat intractable, for reasons we’re not going to get into here. Even so, it does not make for a perfect neoantigen selection system.

Where do things go from here? The preclinical paths forward seem quite predictable. Creating better neoantigen candidate lists by mining non-exome regions, setting up larger immunopeptidomics datasets to train better peptide-MHC-binding models, and improving our ability to do large-scale TCR sequencing all seem important for the future of cancer vaccines.

But before moving on, I should admit something. A lot of complexity about this system has been stripped away from my explanations, since trying to be very precise about immunology is always a bit of a losing game for both the reader and the writer. For those who are interested, I’ve added some further details in the footnotes.1

Now, how have cancer vaccines built on top of all of this theory fared?

The past and present of TAA/CTA cancer vaccines

In the late ’90s, GSK identified MAGE-A3, now one of the canonical TAAs, as an interesting target for a cancer vaccine, and there was a clever reason why. MAGE-A3 is up-regulated in both melanomas and lung cancers, but in healthy tissue it is typically only found in the testis. This is what is known as a cancer-testis antigen, or CTA. These are a very, very special subtype of TAA. Since the testis is an immune-privileged site, a human’s T-cell repertoire can be assumed not to have been pruned against MAGE-A3 the way it has been against the rest of the human proteome.

This was quite exciting for GSK, and they ended up running two enormous Phase 3 trials on it. One trial for resected stage III melanoma enrolled over 1,300 patients. And another trial in early-stage non-small cell lung cancer enrolled 2,272 patients—still one of the largest cancer vaccine trials ever conducted.

Both trials read out negative: no patient subgroup seemed to benefit, and the whole thing was shelved.

We could mention the other TAA cancer vaccines, but those feel less instructive than MAGE-A3, because MAGE-A3 ought to have worked. Every other TAA vaccine suffers from the fact that their targets are self-antigens, so the T-cell repertoire has been thinned against them. So why did this, and seemingly every other CTA-associated cancer vaccine, not work?

To some degree, the answers are basic. MAGE-A3 expression can be patchy, tumors can evolve to lose it, and antigen-presenting machinery can simply fail in late-stage cancers. But the much bigger problem was the delivery method. A sobering fact of drug development is that some very clever ideas are simply ahead of their time. They arrive before the rest of the ‘tech tree’ is developed enough for them to be deployed well. MAGE-A3 was such a case. It was delivered as a recombinant protein paired with an adjuvant called AS15. Both had an excellent track record in infectious disease vaccines and were at the cutting edge of vaccinology in their time.

This never could have worked, and to see why, you have to understand a structural asymmetry between infectious disease vaccinology and cancer vaccinology.

Oversimplifying things a lot: the immune system has two arms. The first arm makes antibodies to bind specifically to things that don’t belong (a virus, a toxin, a foreign protein), either neutralizing them directly or flagging them for destruction by other cells. The second arm sends out cytotoxic killer cells that go around inspecting other cells in your body and inducing them to commit suicide if they look unhealthy—the phenomenon we mentioned at the very start of the last section. Antibodies handle threats that exist in the spaces between cells. Killer cells handle threats that have gotten inside cells, where antibodies cannot reach.

And when you inject a recombinant-protein-based vaccine into a patient, the primary immune response created is the antibody response. This is perfectly fine for many infectious diseases, but for diseases where the pathogen lives inside cells—tuberculosis, malaria, HIV—the killer arm is required, and protein vaccines have struggled for decades with exactly these. Cancer too is in this second bucket. Sadly, neither bucket was deeply understood during the early 2000s, and so a protein-based MAGE-A3 vaccine was tried and—predictably to us in the present—failed.

What a shame. But we have evolved beyond our primitive ways. These days, instead of forcing the ‘correct’ immune reaction via a vaccine, one could simply infuse genetically engineered immune cells that poke at MAGE-A3 the way we’d want. This treatment modality is often called TCR-T, or T-cell receptor engineered T-cell therapy. It is expensive, doesn’t scale, and is not really a cancer vaccine, but at least it is a perfect representation of what an ideal immune response looks like.

This was tried. Twice in fact!

How did it go? It was extremely toxic. In one myeloma/melanoma trial in 2013, two patients died of cardiogenic shock within days of infusion. In another, also in 2013, the treatment produced fatal CNS toxicity in two other patients. Why? Cross-reactivity. It turns out that if you build something to interact with MAGE-A3, you’ll also build something that accidentally interacts with an awful lot else. And it empirically turned out that these engineered immune cells were happy to also react with entirely natural MHC-peptide complexes—one from titin, a structural protein in cardiac muscle, and one from MAGE-A12, a brain-expressed protein that shares substantial sequence homology with MAGE-A3.
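One crude, first-pass way to look for off-targets like these is to scan other proteins for peptides that differ from the target by only a few residues. The sketch below does exactly that, using a Hamming-distance cutoff over a tiny, made-up ‘proteome’. The sequences and the two-mismatch threshold are hypothetical. Sequence similarity is only a weak proxy here. Whether a TCR actually cross-reacts depends on which residues it contacts and on the structure of the peptide-MHC complex. A screen like this can therefore miss real off-targets and also flag harmless ones.

```python
def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length peptides differ."""
    return sum(x != y for x, y in zip(a, b))


def near_matches(target: str, proteome: dict[str, str], max_mismatches: int = 2):
    """Return (protein, offset, k-mer, mismatches) for every window close to the target."""
    k = len(target)
    hits = []
    for name, seq in proteome.items():
        for i in range(len(seq) - k + 1):  # slide a window of the target's length
            kmer = seq[i:i + k]
            d = hamming(target, kmer)
            if d <= max_mismatches:
                hits.append((name, i, kmer, d))
    return hits


# Made-up target epitope and proteins, for illustration only.
target = "KLDEVGHAY"
proteome = {
    "protein_X": "MSTKLDEVGHAYRRT",  # contains the target itself
    "protein_Y": "PPGKMDEVGHAFQQ",   # contains a two-mismatch look-alike
    "protein_Z": "GGGGGGGGGGGG",
}
for hit in near_matches(target, proteome):
    print(hit)
```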

Hmm. Well, you may ask, getting back to the subject of this essay, how about mRNA vaccines that use MAGE-A3 antigens? It’s funny you mention that. For immunologic reasons we won’t get into, this should have actually worked in getting the right immune response, and it should have also led to little cross-reactivity since we can depend on the adaptive immune system to be more careful than we are with cell therapy infusion.

And indeed, your suspicions are correct. Using an mRNA-encoded mixture of several CTA antigens2—including MAGE-A3—BioNTech ran a Phase 1 trial in 2014 that produced great immune profiles in roughly three-quarters of evaluable patients, and, in 2024, a Phase 2 in checkpoint-refractory melanoma read out positive. The failure of the protein-based platform and the successful first doses of its successor were separated by roughly twelve months!

The program ended up being cut, but it seems to be more because BioNTech has a slew of other, seemingly more promising mRNA, CTA/TAA-based vaccines.

Even more importantly, BioNTech is increasingly realizing that we live in the future. Next-generation sequencing has dropped the cost of tumor-normal exome sequencing into the range of a routine clinical assay. That makes n-of-1 neoantigen vaccines, ones that needn’t worry about off-target effects, genuinely viable. Just as importantly, cancer care as a whole has massively improved in ways that compound with cancer vaccines: namely, checkpoint inhibitors, which came onto the scene in 2011. A cancer vaccine helps generate an immune response, while a checkpoint inhibitor prevents those T-cells from being switched off. So by the late 2010s, the stage was set for a very interesting future.

The upcoming era of neoantigen cancer vaccines

In late 2019, BioNTech, Genentech, and Memorial Sloan Kettering did something very brave. They started dosing patients in a Phase 1 trial of BNT122, an mRNA vaccine encoding up to twenty patient-specific neoantigens. It was delivered via lipid nanoparticle to sixteen patients with resected pancreatic ductal adenocarcinoma (PDAC). Why treat resected patients, the so-called ‘adjuvant’ setting?3 The hope was that a sufficiently powerful cancer vaccine would obliterate the cancerous pancreatic cells left behind after surgery, helping the ~80% of PDAC patients who experience recurrence.

Before I explain the trial results, there is some useful context to share. First, the neoantigens were identified using the same gene-level process I explained in the ‘theory’ section, settling on twenty neoantigen candidates to include in the vaccine. As far as we can tell, no immunopeptidomics was used, so these candidates were genuinely a risky bet. Second, the whole process took between nine and twelve weeks from surgery to dosing, meaning the cancer may very well have diverged from the neoantigens used. Third and finally, PDAC is just a nasty disease that has chewed through many, many otherwise promising drugs.

Altogether, BNT122 was put in about the most difficult situation in which to shine. But if it did shine here, there is a good chance it might shine anywhere.

And in 2023, there were signs of shining. In this three-year follow-up, eight of the sixteen patients had mounted a measurable T-cell response to their personalized vaccine, and the other eight had not. Among the eight responders, none had recurred, and all were still alive. Among the eight non-responders, seven had recurred, and the median survival time was 13.4 months. This was, in 2023, the cleanest single piece of evidence the field had ever produced that personalized neoantigen vaccines could do something real. It came in a disease that had defeated essentially every other immunotherapy thrown at it.
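Why did the 2023 split read as so striking with only sixteen patients? One way to see it is Fisher's exact test on the 2×2 table it implies: 0 of 8 responders recurred, versus 7 of 8 non-responders. The test is written from scratch here with Python's `math.comb`. This is a back-of-the-envelope check, not an analysis reported by the trial. It measures association only. It says nothing about whether the vaccine caused the difference, which is the crux of the confounding problem this essay returns to later.

```python
from math import comb


def hypergeom_pmf(x: int, n_total: int, n_row: int, n_col: int) -> float:
    """P(top-left cell == x) given fixed row and column totals of a 2x2 table."""
    return comb(n_row, x) * comb(n_total - n_row, n_col - x) / comb(n_total, n_col)


def fisher_exact(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Fisher's exact test on [[a, b], [c, d]]; returns (lower-tail p, two-sided p)."""
    n_row, n_col, n_total = a + b, a + c, a + b + c + d
    lo, hi = max(0, n_col - (n_total - n_row)), min(n_row, n_col)
    probs = {x: hypergeom_pmf(x, n_total, n_row, n_col) for x in range(lo, hi + 1)}
    p_obs = probs[a]
    lower_tail = sum(p for x, p in probs.items() if x <= a)
    # Two-sided: sum every table at most as probable as the observed one.
    two_sided = sum(p for p in probs.values() if p <= p_obs * (1 + 1e-9))
    return lower_tail, two_sided


# Rows: responders, non-responders. Columns: recurred, did not recur.
lower, two = fisher_exact(0, 8, 7, 1)
print(f"lower-tail p = {lower:.5f}, two-sided p = {two:.5f}")
```

Both values come out well under 0.01: roughly 0.0007 one-sided and 0.0014 two-sided. So the split is very unlikely to be a coincidence of who happened to land in which group. Whether the vaccine is what drove it is a separate question.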

At AACR 2026, a few weeks ago as of this writing, the team presented the six-year follow-up. Of the eight responders, seven were still alive and recurrence-free. Of the eight non-responders, two were still alive.

This should bring some tears to our eyes. Pancreatic cancer is one of the few outright death sentences in oncology, and surgery does not typically save you from it taking what it wants from you. The cancer has an 80% chance of recurring within five years, demanding its pound of flesh. But for the lucky patients whose immune system listened to BNT122, nearly all of them managed to stave off the disease.

The natural question is whether any of this generalizes. Does the broader neoantigen vaccine paradigm work in the other places we’d want it to work?

Weirdly—judging by the rest of BioNTech’s clinical portfolio—the answer is an emphatic ‘no’.

Three other trials were run using the same cancer vaccine design process. In early 2025, it failed in first-line metastatic melanoma. In mid-2025, it stalled in adjuvant muscle-invasive bladder cancer, after a “safety event [was] observed in the safety run-in population”. Finally, in November 2025, BioNTech disclosed in its third-quarter report that the trial in adjuvant colorectal cancer had crossed the boundary for futility at its first interim analysis, though this trial continues with the customary “the data are not yet mature enough to draw reliable conclusions about efficacy”.

So: in a single calendar year, the same type of vaccine produced what may be the most extraordinary efficacy signal in the modern history of cancer vaccines, while simultaneously failing first-line melanoma, getting paused in bladder, and tripping a futility boundary in colorectal.

What’s going on here? Wasn’t pancreatic cancer supposed to be the hardest condition? Why is it failing on the other, easier cancers?

Let’s think. If you look carefully at misbehaving pancreatic tumor cells, you’ll discover something interesting. They typically have extremely low tumor mutational burden (TMB), the number of mutations in a cancer cell's DNA, compared to most other cancer subtypes. This is usually a bad thing: it means fewer neoantigens for the immune system to pick up on, and thus usually a worse response to immunotherapy. But…this may be partially offset by the fact that if the haystack is small enough, it is that much easier to find the needles. So perhaps immunodominance is a much bigger issue in higher-TMB cancers, where choosing the wrong neoantigens ruins the game. In low-TMB cancers, by contrast, you are simply statistically more likely to pick the right ones.
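The haystack intuition can be made concrete with a little combinatorics. Suppose, purely hypothetically, that a tumor hides a fixed handful of truly immunogenic neoantigens among its candidates (five, below). Suppose also that a vaccine with twenty slots (BNT122 encodes up to twenty) picks candidates no better than at random. The chance of including at least one real hit then follows the hypergeometric distribution, and it collapses as the candidate pool grows. Real pipelines do better than random, so treat this as an illustration of the direction of the effect, not its size.

```python
from math import comb


def p_at_least_one_hit(n_candidates: int, n_immunogenic: int, n_slots: int = 20) -> float:
    """Chance that a uniformly random pick of n_slots candidates includes >= 1 immunogenic one."""
    if n_slots >= n_candidates:  # small haystack: every candidate fits in the vaccine
        return 1.0 if n_immunogenic > 0 else 0.0
    p_miss_all = comb(n_candidates - n_immunogenic, n_slots) / comb(n_candidates, n_slots)
    return 1.0 - p_miss_all


# Hypothetical candidate-pool sizes standing in for low- versus high-TMB tumors.
for pool in (20, 40, 500, 2000):
    print(pool, round(p_at_least_one_hit(pool, n_immunogenic=5), 3))
```

With these made-up numbers, the chance of at least one hit is certain at 20 candidates and about 0.98 at 40. It falls below 0.2 at 500 and to about 0.05 at 2,000.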

In other words, PDAC may be uniquely suited to cancer vaccines.

It’s an interesting story, but is it true? Maybe not. Melanoma should be the obvious failure mode here, as it is known to have especially high TMBs. And yet, while BioNTech’s approach failed here, the other big success story in the neoantigen cancer vaccines field is Moderna's cancer vaccine, which succeeded in melanoma. Why didn’t BioNTech’s approach work? The difference may come down to setting; whereas Moderna tested their vaccine for cancer recurrence post-resection, BioNTech tested it in patients with metastatic melanoma, which is a fundamentally different therapeutic problem, and one likely far less suited to cancer vaccines.

So…maybe TMB doesn’t matter, but instead the setting in which the cancer vaccine is applied? Well, wait a minute. If cancer vaccines ought to work in adjuvant settings regardless of TMB, then BioNTech's failures in adjuvant CRC and adjuvant bladder are deeply confusing. The drug should have worked there!

It’s all quite complicated, and the questions we’re grappling with here are the same ones the cancer vaccine field as a whole is confused by. Nearly every trial result you’ll see here is heavily confounded, and teasing out what any given result means is incredibly difficult. Consider the adjuvant used, whether combination therapy was used, whether pre-treatment protocols like lymphodepletion were applied, and what it even means for a patient to have an ‘immune response’ to the vaccine. All of this, and more, is rarely comparable from trial to trial, and naive interpretations of the arbitrary decisions made here can lead to entirely incorrect takeaways.

For instance, let us return to BNT122, the miracle PDAC BioNTech cancer vaccine. There are very, very strong reasons to, a priori, believe that this vaccine could never work. Why? Remember, its neoantigen identification process likely relies on sequencing, not immunopeptidomics. Earlier, I stated that this was a risky bet due to the very real possibility that none of these neoantigens are present on the MHC, or, even if they are, that they are not even immunogenic. Yet, their gamble seemed to pay off.

But did it actually?

Yes, patients who had a ‘measurable T-cell response’ lived far longer than patients who had no such response. But what does a ‘T-cell response’ even mean? It means that we could detect, in your blood, T-cells that recognize peptides we put in the vaccine, roughly 6 months after vaccination started. This is a very logical definition. But you may notice a bit of a sleight of hand here; this definition also demands the existence of an intact T-cell repertoire, which almost certainly independently predicts patient survival quite well! Alternatively, perhaps the well-established PDAC phenomenon of a natural immune response occurred, and the cancer vaccine’s neoantigens happened to closely overlap with the natural neoantigen response. Who knows?

BioNTech is not trying to deceive anyone here. It is very normal for Phase 1 trials to have no controls, and to be unconcerned with assessing efficacy or teasing out strict causality. A randomized Phase 2 trial is planned, and we will learn more then. My point is that lots of press has been written about this trial. A fairly high fraction of it heavily implies that neoantigen cancer vaccines are genuinely on the precipice of working. Perhaps they are! But perhaps not, and there are at least some reasons to believe the dissenting opinion.

Before we move on, you may instinctively ask: why hasn’t anyone tried to simply do…immunopeptidomics to identify the correct neoantigens? Isn’t that the obvious path here? Yes, it’s annoying, yes, it requires doing mass-spec on a very hard-to-get type of tumor tissue (cryopreserved), but these companies have tens to hundreds of millions to throw away on clinical trials. Why wouldn’t they set themselves up for success?

It’s just really, really hard. We didn’t discuss it at length earlier, but to see anything at useful depth via immunopeptidomics, you need on the order of a hundred million tumor cells, or north of a hundred milligrams of wet tumor. And even if you can summon up this amount of tumor, the mass spec itself typically has incredibly low sensitivity. In one representative 2022 study, researchers ran deep immunopeptidomics across seventeen colorectal patients, recovered nearly forty-five thousand unique presented peptides, and identified exactly two mutated neoantigens. And one of them was a common driver mutation you could have guessed without switching the mass-spec instrument on!

But there is a way around this. As I alluded to earlier, you cannot run immunopeptidomics on every patient, but you can run it once, on a large pool of tumors, treat the peptides it recovers as ground-truth labels, and train a model to predict—from sequence alone—what the spectrometer would have seen. Do that well enough and you have laundered an unscalable wet-lab assay into a cheap computational one: the mass spec happens once, in the training set, and every patient afterward has a way to filter their cheap, sequencing-based candidates more easily.
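The ‘run the mass spec once, predict forever’ idea can be sketched at toy scale. Below, a plain logistic regression over one-hot-encoded 9-mers (9 positions × 20 amino acids) learns a presentation score from labeled peptides. The labels are stand-ins, not immunopeptidomics data. Five well-known HLA-A*02:01 viral epitopes play the role of ‘seen by the spectrometer’, and five made-up peptides play ‘not seen’. Real presentation models are far larger and trained on many thousands of eluted peptides. This is not any company's actual model, and the function names are hypothetical.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PEPTIDE_LEN = 9


def one_hot(peptide: str) -> list[float]:
    """Encode a 9-mer as 9 x 20 binary features, one per (position, residue) pair."""
    vec = [0.0] * (PEPTIDE_LEN * len(AMINO_ACIDS))
    for pos, aa in enumerate(peptide):
        vec[pos * len(AMINO_ACIDS) + AMINO_ACIDS.index(aa)] = 1.0
    return vec


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train(peptides, labels, epochs=300, lr=0.5):
    """Full-batch gradient descent on the logistic loss; returns (weights, bias)."""
    xs = [one_hot(p) for p in peptides]
    n, n_feat = len(xs), len(xs[0])
    w, b = [0.0] * n_feat, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_feat, 0.0
        for x, y in zip(xs, labels):
            err = sigmoid(b + sum(wi * xi for wi, xi in zip(w, x))) - y
            for j, xj in enumerate(x):
                grad_w[j] += err * xj
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def presentation_score(peptide, w, b):
    return sigmoid(b + sum(wi * xi for wi, xi in zip(w, one_hot(peptide))))


# Stand-in labels: 1 = "eluted from the MHC", 0 = "not observed".
presented = ["SLYNTVATL", "GILGFVFTL", "NLVPMVATV", "GLCTLVAML", "YLQPRTFLL"]
absent = ["DPEKRSHDE", "EKDRPSNEK", "KDESRHPDR", "RESKDWQEH", "HRDEKSWKD"]
w, b = train(presented + absent, [1] * len(presented) + [0] * len(absent))

# Score unseen candidates, as one would a new patient's sequencing-derived list.
for candidate in ["LLFGYPVYV", "DEKRSHPDK"]:
    print(candidate, round(presentation_score(candidate, w, b), 3))
```

The expensive step, labeling, happens once. Scoring a new patient's candidates afterward is just a cheap forward pass.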

One company took this seriously: a biotech from the mid-2010s called Gritstone Bio. Their model, called EDGE, was trained on tumor peptides pulled directly off the MHC by mass spec, rather than on the MHC-peptide binding-affinity tables everyone else was using. They reported that it predicted presentation far better than the standard tools. Gritstone then built GRANITE, its personalized neoantigen vaccine, on top of that model.

Unfortunately, GRANITE’s colorectal data came in underwhelming, and the company filed for bankruptcy in 2024. Why did the approach fail? It’s hard to say for certain. Yes, it may very well be that the whole approach doesn’t work, but there are nuances to keep in mind. Perhaps Gritstone chose a particularly bad indication, or EDGE needed more training data, or something else entirely was at play. Whatever the cause, their investors were not convinced enough to give them any more money.

The stranger approaches to cancer vaccines

Technically speaking, TAAs and neoantigens cover the full landscape of possible ways to design cancer vaccines. What remains are edge cases that lie in between: cell-based cancer vaccines, and shared neoantigen cancer vaccines.

Cell-based cancer vaccines are not super relevant from where we stand today, but they are an interesting story.

Consider GVAX. GVAX is a procedure in which you take whole cancer cells—sometimes the patient’s own tumor cells, harvested at biopsy and expanded in culture; sometimes allogeneic, drawn from immortalized prostate cancer cell lines—engineer those cells to secrete something called ‘GM-CSF’, irradiate them so they can no longer divide, and inject them back into the patient. Once there, the GM-CSF forces dendritic cells to pay attention to them, those dendritic cells scoop up whatever cancer-flavored antigens happen to be conveniently lying around in the irradiated debris, and the immune system starts hunting for cancers that match those antigens. And importantly, no human involved need know what those antigens are! The cancer and the immune system have their own private dance with each other, fumbling together TAAs and neoantigens all in one go.

This is so fun. It is like a bizarro, steampunk version of attenuated-virus vaccines. The company behind it, Cell Genesys, raised several hundred million dollars to develop this concept across prostate, pancreatic, and a half-dozen other indications. The platform was tried in more than a dozen trials over the better part of twenty years. It did not work, and Cell Genesys folded in 2009. Why? Probably immunodominance. Asking the immune system to ‘figure it out’ works with viruses, where the number of proteins is small and uniformly foreign. A cancer cell’s proteome is incredibly large, and the vast majority of its proteins are self-antigens.

It would be unfair, though, to leave the cell-based cancer vaccine era on a note of unbroken failure, because one of its close cousins did the impossible: it got approved. Sipuleucel-T, sold as Provenge, remains the only therapeutic cancer vaccine the FDA has ever waved through, and it is assembled from roughly the same parts as GVAX. You leukapherese the patient to pull out their antigen-presenting cells (APCs). You then staple a prostate TAA (prostatic acid phosphatase) to the same GM-CSF ‘pay attention’ signal, and infuse the now-activated cells back into the patient, three times across a month. So instead of relying on the immune system to figure things out at all, you’re giving it the exact substrate you care about: the TAA presented on the APCs. A Phase 3 in 2010 for metastatic castration-resistant prostate cancer found that the vaccine extended median survival by about four months, which isn’t too bad.

It was also accompanied by the bizarre finding that it did not change the size of the tumor at all or change PSA levels, leading to this fun 2010 article titled ‘Costly New Prostate Cancer Drug Works In Mysterious Ways’. As far as I can tell, what Provenge was actually doing under the hood to prolong survival has not yet been excavated. Sure, yes, it certainly increases T-cell infiltration, but why didn’t it reduce the size of the tumor? Unclear!

But it got approved, which is all that really matters. So why isn’t Provenge a triumphant chapter in this essay? Because the therapy cost $93,000 per course, was time-consuming to manufacture, and got lapped within two years by oral pills—abiraterone, enzalutamide—that delivered comparable survival benefit from a bottle for a fraction of the price. Dendreon’s market cap topped $7.5 billion the year of approval in 2010 and the company filed for bankruptcy in 2014. Drugs are a hard business!

Moving on: let’s consider shared neoantigen vaccines, which are relevant from where we stand.

KRAS G12D is the single most common KRAS mutation in pancreatic cancer—present in roughly 40% of patients—and shows up in a sizable fraction of colorectal and lung cancers; in patients with the relevant HLA alleles, the same mutation can yield the same presented peptide. It is a true neoantigen in the immunological sense: this mutated peptide does not exist in healthy tissue, central tolerance has not pruned the responding T-cell repertoire, the response can be clean and sharp. But because the mutation recurs identically across thousands of patients, and presents the same peptide on the same MHC alleles every time, you can build one vaccine and ship it to everyone who has the right mutation and the right MHC allele, much like TAA/CTA vaccines.

As of today, the KRAS side of shared neoantigen cancer vaccines is ongoing. Elicio’s ELI-002 is the most clinically advanced example of it, and the early auguries are cautiously good: the trial keeps postponing its readout because fewer patients are relapsing than they had expected. But the company remains blinded as to whether that is the vaccine or simply good fortune; the pivotal analysis has slid from late 2025 to “mid-2026”.

The most interesting question here is: can’t you scale this up? The roster of recurrent driver mutations is finite, and so is the roster of common HLA alleles. Multiply them together and filter for the pairings that actually work, and you’re left with a manageable library of pre-made vaccines that could cover a substantial portion of cancer patients today.
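The ‘multiply and filter’ idea is essentially a Cartesian product followed by a lookup, plus a coverage estimate. The sketch below uses placeholder mutation and allele names with made-up frequencies, plus a made-up set of working pairings. It also assumes each tumor carries at most one of the listed drivers, and that carriage of different alleles is independent. Both are simplifications; alleles at the same HLA locus, for instance, are not independent.

```python
from itertools import product
from math import prod

# Placeholder inputs, not real epidemiology.
driver_freq = {"DRIVER_A": 0.40, "DRIVER_B": 0.30, "DRIVER_C": 0.05}  # share of patients
allele_carrier_freq = {"HLA_X": 0.45, "HLA_Y": 0.15, "HLA_Z": 0.10}  # share carrying allele
# Hypothetical pairings assumed to be both presented and immunogenic.
works = {("DRIVER_A", "HLA_Y"), ("DRIVER_A", "HLA_Z"), ("DRIVER_B", "HLA_X")}

# Multiply the rosters together, then keep only the pairings that actually work.
library = [pair for pair in product(driver_freq, allele_carrier_freq) if pair in works]


def coverage(library, driver_freq, allele_carrier_freq):
    """Fraction of patients with a listed driver and at least one allele that presents it."""
    total = 0.0
    for driver, f_driver in driver_freq.items():
        alleles = [a for d, a in library if d == driver]
        p_no_allele = prod(1.0 - allele_carrier_freq[a] for a in alleles)  # 1.0 if none
        total += f_driver * (1.0 - p_no_allele)
    return total


print(len(library), "pre-made vaccines cover",
      round(coverage(library, driver_freq, allele_carrier_freq), 3))
```

A driver with no working pairing, like `DRIVER_C` here, contributes nothing no matter how common it is. The next few paragraphs turn on exactly that.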

Unfortunately, there are very few driver mutations as cooperative as KRAS.

An earlier hero of our story, Gritstone Bio (the same company that explored immunopeptidomics), is an exemplar of this phenomenon. Alongside poking at n-of-1 neoantigen cancer vaccines, they had a separate program focused on shared neoantigens. Their version was a twenty-antigen cassette of shared neoantigens drawn from KRAS, TP53, BRAF, and others.

KRAS is somewhat of a freak: a single recurrent point mutation, in a gene the tumor expresses at high levels, that happens to throw off a novel MHC-binding peptide the immune system was never tolerized against, and which is also immunogenic. Most of the other famous driver mutations are not like this. Even if they technically present on the MHC, most of them make poor neoantigens. The underlying protein is rarely expressed at high levels, or the peptide is not immunodominant, or it is similar enough to self that no mounted immune response will be sufficient.

And Gritstone discovered exactly this in a Phase 1 trial named ‘SLATE’. In it, they tested the shared, twenty-neoantigen approach and found that one of the sparsely-expressed neoantigens—TP53—was immunodominant, drowning out the more trustworthy KRAS response. They reformulated this to be KRAS-only, re-running it as SLATE-KRAS, and—as mentioned earlier—went bankrupt before a mature Phase 2 readout.

Will genuinely off-the-shelf cancer vaccines someday be made available? Time will tell!

Conclusion, and what lies ahead

Drug development often displays a frenetic nature. Something promising is identified and then ground into dust by a series of poorly designed follow-on trials before anyone can figure out exactly what’s going on. Nowhere is this truer than in cancer vaccines. To be fair, this is no one’s fault. A lot of this stuff was genuinely underdetermined in difficult-to-predict ways; who could have possibly known that the exact type of vaccination, protein versus mRNA-based, would lead to entirely different immune responses?

But it does seem like things are, against all odds, slowly being figured out. There are reasons to doubt BNT122’s pancreatic cancer results. Moderna’s results for their cancer vaccine in resected melanoma (KEYNOTE-942), however, dropped just a few weeks back, and these seem likely to be real. It is a Phase 2b, so there is randomization and the sample size is decently large. Here is the survival curve:

The confirmatory Phase 3, INTerpath-001, has finished enrolling roughly 1,089 patients in the same cancer setting. We should hold off on cheering too loudly, because a successful-looking Phase 2 in no way implies a successful Phase 3! Remember that the TIGIT craze I wrote about a few weeks ago was launched on the basis of a ‘promising-looking’ Phase 2, and no Phase 3 afterwards succeeded.

Still, there is a structural reason to think the present of cancer vaccines differs from the previous decades of abject failure. Recall that MAGE-A3 was not a stupid idea; it was an early one, a clever bet placed before the rest of the tech tree had grown in. Three things have since clicked into place that were unavailable to the people running those enormous, doomed protein-vaccine trials in the 2000s. First, next-generation sequencing collapsed the cost of a tumor-normal exome far enough that building a bespoke vaccine per patient is feasible. Second, mRNA delivery turned out to reliably elicit the correct arm of immunity, which protein-based vaccines never could. Third, and perhaps most importantly, checkpoint inhibitors arrived to keep the T-cells a vaccine primes from being switched off.

All three are the soil a cancer vaccine needs to grow in, and they only finished arriving in the last decade or so.

But even if it does end up working here, and Moderna finally lands themselves another blockbuster of a drug, much remains to be figured out. Remember, cancer vaccines are not a drug, not really. They are a manufacturing process, and a fairly high fraction of this process is still being worked on.

For instance: it’d be a shame if all cancer vaccines were useful for was getting rid of residual, neighboring cancer cells from surgically removed tumors—the ‘adjuvant’ setting. Yes, early-cancer detection tools are improving, so perhaps we are slowly entering a future where this does describe most patients. But from where we stand today, hundreds of thousands die each year from metastatic cancers, their organs peppered with rot, something no surgery in the world could fully remove. Immunotherapy was one of humanity’s first tools against this horror. High-dose IL-2, though brutal enough to put patients in the ICU, was producing durable complete remissions in a small slice of metastatic patients as far back as the early nineties, and the checkpoint-inhibitor revolution that followed turned metastatic melanoma—a reliable death sentence within living memory—into a disease that a real fraction of patients now outlive by a decade or more.

Immunotherapy proved this was achievable, and that is precisely the standard that cancer vaccines, for all their adjuvant-setting triumphs, have not yet come close to meeting.

Why not? Perhaps the immune priming is not yet good enough, so we must get better at selecting neoantigens. Perhaps the turnaround time for a cancer vaccine is still too long, so we must find ways to speed it up. Perhaps the immune system or tumor microenvironment of advanced cancer patients is too broken down to even listen to the vaccine, so we must reach into the realm of cell therapies, which have their own host of problems to deal with. Indeed, much work remains before cancer vaccines reach their full potential, and it is unlikely that a genuine, honest-to-god cure for cancer is just around the corner. This stuff is hard, and it will continue to be hard.

But despite all the tweaks still to be figured out, the optimism in the air deserves attention. For the first time, the underlying machinery is plausibly mature enough for the original, forty-year-old idea to, against all odds, finally work.

I have been saying ‘MHC’ all along, but there are actually two very different types of MHC. The one I've been describing, class I, sits on essentially every nucleated cell, displays those short 8-to-11-mers, and is read by ‘CD8 T-cells’, the ones knocking on doors and politely requesting suicide. But there is a second, class II, which lives mostly on ‘antigen-presenting cells’, carries a much longer peptide (roughly 13 to 25 amino acids), and is read by ‘CD4 T-cells’, whose job is less to kill than it is to coordinate and egg on everyone else's killing.

I am not being too reductive by focusing on MHC-I, as tumor cells generally display only class I. But! When people measure the T-cell responses these vaccines actually raise, a large fraction come back CD4 rather than CD8, which is a bit of a surprise to a field that had spent twenty-five years tuning its predictors for class I. So class II is unambiguously involved. Whether it is load-bearing, merely a helpful nudge to the CD8 response, or simply along for the ride, no one can presently say. I am going to keep ignoring it regardless, because the distinction doesn't change what a cancer vaccine is fundamentally trying to do. If you desire an interesting takeaway from this, I’ll offer one up: MHC-II antigens are far worse characterized than MHC-I ones, mostly due to technical difficulties. Interesting white-space opportunity for data collection? Or a rational decision by immunologists triaging their resources? We’ll see!

Curiously, the vaccines didn't include CTA antigens alone! BioNTech also included melanocyte-specific antigens, which would lead to an immune response that could also attack normal melanocytes, causing vitiligo-like depigmentation. But in cases of fatal metastatic melanoma, this non-fatal toxicity was viewed as a worthwhile trade. Still, you may ask: shouldn’t the cohort of T-cells capable of responding to melanocyte-specific antigens have been pruned out before they were allowed to roam your body? You’re right! They should have been! But this pruning is imperfect, and some otherwise healthy people have a fraction of these self-reactive T-cells circulating around.

No, you aren’t misreading. The word ‘adjuvant’ is indeed used in two very separate ways. One refers to the immunostimulatory chemical given alongside an antigen/neoantigen, the other refers to treatment given after primary treatment (like surgery) to eliminate residual disease. Why is the same word used for both? 'Adjuvant' descends from the Latin adiuvāre, 'to help.' The chemical helps the antigen; the therapy helps the surgery.

The ballad of TIGIT

Abhishaike Mahajan — Tue, 26 May 2026 15:43:56 GMT

There exist drug classes that seem, in retrospect, cursed. As these chemicals worm their way through the clinical trial system, they consume billions of dollars along the way, and squelch through thousands of sick patients. When it finally dawns on everyone how useless the whole endeavour was, the drug’s life is at last cut short, nothing useful left in its destructive wake. The prototype here is amyloid-beta drugs: Alzheimer’s treatments that are widely perceived as immense disappointments, with the negative sentiment even leaking to the broader public. To be fair to these chemicals, the story here is a bit more complicated than the tabloids let on. Lots of amyloid research was not fake, and the drugs may genuinely be useful for early-stage Alzheimer’s. But they remain, regardless, disappointments.

Beyond amyloid-beta, which has been steadily disappointing for a while now, there is one other such category of drug whose particular dance has only recently wrapped up. It may very well someday get its chance in the spotlight, but that will take time, because every chemical in this class shares a searing, burning radioactivity. You should not touch them. You should not suggest touching them. In fact, no serious person should touch them for years to come, because to do so will be to receive the scorn of other serious people.

What I am talking about are, of course, TIGIT drugs.

TIGIT emerged in the wake of boundless enthusiasm from over a century of grueling cancer immunotherapy research. Much of this work went nowhere, but a small fragment of it helped produce the most valuable molecule in existence: Keytruda (pembrolizumab). This drug was so astonishingly, grossly successful that it would be barely an exaggeration to credit Keytruda with creating a Big Pharma. Since its approval in 2014, it has saved millions of years’ worth of patient lives, and will likely continue to save millions more.

So, if you worked in pharma R&D in the mid-2010s, and you were on the hunt for the next big thing, “something like Keytruda” was the most attractive thing on the board. And TIGIT drugs were supposed to be that.

An explanation of what TIGIT actually is would require you to hold roughly seven concepts in your mind at the same time, the names of which (in characteristic immunology fashion) are not helpful in the slightest. What is important to understand is that TIGIT is a particular protein, theorized to be another immune-system brake. The aforementioned century of immunology research had already proven that these brakes mattered: Keytruda worked by blocking a different brake, allowing immune cells to attack tumors again. TIGIT seemed to offer the same promise, since tumors appeared to exploit it to quiet nearby immune cells. But this story was set to be even more intriguing. Unlike Keytruda’s target, TIGIT sat at an especially busy intersection of immune regulation. Blocking it might not merely release one brake, but rather two brakes and one accelerator, tilting the local immune environment in such an absurdly anticancer direction that it was unthinkable it wouldn’t be clinically effective.

So, the theory went: block TIGIT, or create an ‘anti-TIGIT’ drug, and you’ve got something even better than Keytruda on your hands.

Dollar signs appeared in the eyes of nearly every pharmaceutical executive upon learning this. Roche was first here: its group established the above scientific observations and published them in a 2014 paper titled “The immunoreceptor TIGIT regulates antitumor and antiviral CD8+ T cell effector function.” The molecule that emerged from this work was tiragolumab, the first anti-TIGIT drug to exist.

Its initial clinical debut was at ASCO 2020, a major oncology conference. There, Roche discussed the results of a 135-patient phase 2 trial in metastatic non-small-cell lung cancer, randomizing patients one-to-one to the standard-of-care, plus either tiragolumab or placebo. The combination produced a response rate of 31% versus 16% in the placebo. Tiragolumab seemed to work. Yes, it wasn’t a cure for cancer, but neither was Keytruda, which still managed to rake in nearly ten billion dollars a year. The FDA granted tiragolumab a breakthrough designation in January 2021 on the basis of that study, and, within a year, Roche began to spin up phase 3 trials.

Blood was in the water for TIGIT, and though Roche was first to bite, others followed. Merck had vibostolimab. BMS had BMS-986207. BeiGene had ociperlimab, for which Novartis paid $300 million in late 2021 for an option on co-development rights. Arcus had domvanalimab, for which Gilead in late 2020 paid $175 million up front plus a $200 million equity position in the company. iTeos, a little Belgian immuno-oncology outfit, had something called EOS-448, which GSK licensed in mid-2021 for $625 million upfront. Everyone wanted a bite and was willing to pay for it.

Typically, with drug classes that have as much buzz as TIGIT did, companies like to run multiple trials in parallel, each one focused on a different cancer or patient subpopulation. This is to avoid a situation where your drug works spectacularly, it gets approved for the cancer subtype you tested it on, and then you have to watch as your competitors flood the remaining subtypes with their copycat chemicals. And the theoretical evidence for TIGIT was so strong, so overwhelming, that, combined with the promising phase 2 results, it pushed Roche to go all in. At one point, they were running twelve concurrent Phase 2 and Phase 3 trials, each focused on a slightly different patient population, altogether covering ~5,000 human lives. This effort, branded ‘SKYSCRAPER’, represented one of the largest parallel-indication programs in modern immuno-oncology, its total costs likely running into the multiple billions.

In May 2022, the first crack showed. Roche reported that its first major Phase 3 trial, in first-line small-cell lung cancer (SKYSCRAPER-02), had missed on progression-free survival, or PFS. But this was not the end of the world. Small-cell lung cancer is a rather miserable disease. Relatively little works here anyway. This subtype has swallowed a long procession of drugs that excelled in other settings, so this was not viewed so much as a failure as it was an admirable, Hail Mary attempt that was almost assuredly not going to work out anyway.

But a few weeks later came a bigger problem: Roche’s flagship lung-cancer trial (SKYSCRAPER-01), tested on an ostensibly curable type of lung cancer, also missed on PFS. To be clear: they did not miss it by a lot. Roche would spend the next two years insisting that the values were in the right direction, just not at statistical significance.

Either way, the company argued, PFS is not what matters most. They were not wrong. PFS means something quite specific: from the start of the trial, how long did it take for a patient’s cancer to either grow on imaging or kill them? It is a useful data point, but it is ultimately a fuzzy surrogate for the metric that people actually care about: overall survival, or OS. How long did this patient live? Unfortunately, OS takes years to read out and is confounded by whatever subsequent lines of therapy the patient picks up after the trial, so PFS is often relied on as a proxy.
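To make the PFS/OS distinction concrete, here is a minimal Python sketch of how each endpoint's clock is read off a single patient's record. The `Patient` fields, the function names, and the numbers in the example are all hypothetical; real trials measure from randomization and then compare arms with survival statistics (Kaplan-Meier curves, hazard ratios), which this sketch does not attempt.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Patient:
    """One hypothetical trial participant; all times are months since randomization."""
    progression: Optional[float]  # first imaging evidence of growth, or None if never seen
    death: Optional[float]        # time of death, or None if alive at last contact
    last_followup: float          # last time the patient was known to be alive


def progression_free_survival(p: Patient) -> Tuple[float, bool]:
    """PFS: time until progression OR death, whichever comes first.

    Returns (time, event_observed). If neither event has happened,
    the patient is censored at their last follow-up.
    """
    events = [t for t in (p.progression, p.death) if t is not None]
    if events:
        return min(events), True
    return p.last_followup, False


def overall_survival(p: Patient) -> Tuple[float, bool]:
    """OS: time until death from any cause; censored at last follow-up if alive."""
    if p.death is not None:
        return p.death, True
    return p.last_followup, False


cohort = [
    Patient(progression=4.0, death=19.0, last_followup=19.0),
    Patient(progression=None, death=7.5, last_followup=7.5),
    Patient(progression=11.0, death=None, last_followup=30.0),
]
for p in cohort:
    print(progression_free_survival(p), overall_survival(p))
```

The first hypothetical patient shows why the two endpoints can diverge: their cancer progressed at month 4, which stops the PFS clock, but they lived to month 19, and whatever therapies they received in between count toward OS.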

And Roche believed that OS would ultimately exonerate tiragolumab.

And in August 2023, Roche ‘accidentally’ leaked data suggesting that overall survival had indeed improved on the drug. The stock ticked up on what in retrospect was the last uncomplicated moment of optimism in the TIGIT race.

On November 26, 2024, Roche reported the final OS analysis. The flagship trial had missed. The survival trend had narrowed to the point of insignificance, and the trial that was supposed to anchor the entire program—the indication on which Breakthrough Therapy Designation had been granted, the signal on which ten other trials had been launched—could no longer be the anchor.

But the flagship’s collapse wasn’t even the nadir. The nadir, really, was Roche’s worse-than-nothing readout in July 2024 (SKYSCRAPER-06), in the interim between the flagship’s PFS miss and its OS miss. It did not merely fail to show superiority to the standard of care, but was actively worse. Patients on tiragolumab died faster than the control group.

Everything began to unwind from here on out for Roche. A planned follow-up was canceled before it had really begun, and another was deprioritized. Over the subsequent year, the GI indications collapsed one after another, a locally advanced esophageal-cancer trial failed, a head-and-neck study was abandoned, and the last major Roche hope, in first-line liver cancer, missed PFS with no trend toward OS. The only success, awkwardly, was a trial in esophageal squamous-cell carcinoma (SKYSCRAPER-08), which produced statistically significant survival results. But by the time the full paper appeared in early 2026, Roche had already removed tiragolumab from its pipeline.

The TIGIT game, for Roche, had ended.

But what of the other players? Could it be that tiragolumab was the problem, and not the TIGIT hypothesis? Perhaps a different molecule, one still targeting TIGIT, would have worked.

After Roche, Merck was the second-biggest believer in TIGIT. Remember when I said Keytruda had almost single-handedly created a Big Pharma? Merck is that pharma. And with their patent on Keytruda set to expire in 2028, they were the ones most interested in, and best positioned for, owning its successor. In their exuberance, they decided to match Roche: twelve parallel trials of their own, each one running an anti-TIGIT drug called vibostolimab.

The same pattern repeated. In May 2023, Merck’s melanoma trial was halted because vibostolimab was causing such a high rate of immune-related adverse events that patients were discontinuing therapy faster than any efficacy signal could accumulate. In August 2024, a small-cell lung-cancer trial was halted for OS futility, with the combination arm running worse than the control on both efficacy and safety. In December 2024, two more lung-cancer studies were abandoned halfway through the trial. And by 2025, Merck announced the discontinuation of the entire vibostolimab program.

But there was one last hope. What if the biological story here wasn’t complete? What if the original Roche paper, a decade back at this point, had gotten something wrong?

Every anti-TIGIT drug was structured like an antibody, a protein shaped like a ‘Y’. Only the top two segments, known as the Fab region, actually interact with TIGIT, while the bottom region, known as the ‘Fc’ region, interacts with an entirely separate set of receptors on an entirely separate set of immune cells. Typically, the two work in tandem. The Fab region binds to TIGIT-expressing cells, and the Fc region grabs onto nearby immune cells, forcing them to kill whatever the Fab region has attached to, a phenomenon called antibody-dependent cellular cytotoxicity (ADCC). Importantly, TIGIT is expressed on immune-suppressing cells in and around the tumor, so ADCC was a reasonable thing to aim for.

But this could backfire. TIGIT was also expressed on the cancer-fighting T-cells that the drug is meant to support. So yes, these drugs may kill your enemies, but they will also kill your army, and the empirical net effect of this is little impact on how long a cancer patient will live. But it doesn’t need to be this way. While naturally occurring antibodies perform ADCC to varying degrees, there’s a lot more room for creativity with antibodies created in a vat: you can simply break the Fc region by mutating it. The result is ‘Fc-silent’ antibodies, which should still bind to TIGIT-expressing cells, but not kill them. Whether this would work at all was, luckily, testable. While Roche and Merck had spent billions on their Fc-active molecules, Arcus and Gilead had, in parallel, been developing domvanalimab: an Fc-silent anti-TIGIT antibody.

For most of 2024 and 2025, domvanalimab was carrying the collective hope of the entire TIGIT field on its shoulders, as essentially the last well-powered phase 3 program still running with a mechanistically distinct molecule. Starting in early 2024, the drug entered the crucial test of the Fc-silent hypothesis: a phase 3 trial in upper-GI cancers (STAR-221).

It did not work. In December 2025, the trial was halted.

And what of everyone else? The smaller bets around the edges were erased with even less ceremony. Novartis had paid $300 million in December 2021 for an option on BeiGene's ociperlimab, and in July 2023, Novartis looked at the emerging TIGIT phase 2 data across the field and simply handed the rights back, forfeiting the option fee in what turned out to be one of the better decisions any business-development team made that year. BeiGene continued on alone, and in April 2025 its phase 3 trial was terminated for futility on an interim OS analysis, ending the program. GSK's bet was the most expensive and, in some sense, the most depressing. In June 2021 they had paid iTeos $625 million upfront plus up to $1.45 billion in milestones for belrestotug (the EOS-448 mentioned above). Again, zero benefit. GSK and iTeos mutually terminated the program, the collaboration, and any further enrollment in the study, all in the same press release. Two weeks later, iTeos announced that it was winding down, and was later bought by an outfit known for acquiring down-on-their-luck biotechs in hopes of selling off their parts.

Despite it all, TIGIT has not yet technically died. As one article puts it: ‘AstraZeneca becomes TIGIT’s last man standing’. Their drug is called rilvegostomig and is currently in eleven Phase 3 trials. Unfortunately, the core thesis of the drug is contingent on Fc-silence meaning anything, so it is difficult to imagine history panning out differently here.

In 2026, a BMJ Oncology analysis would give a clinical name to what had happened: “herding.” The authors estimated that nearly 49,000 patients had been enrolled in anti-TIGIT trials by pharmaceutical companies, at a cost of more than $3 billion, all because their fellow pharmaceutical companies were doing the same thing. The Fc-silent hypothesis had been tested and had failed. The Fc-active hypothesis had been tested and had failed. Combinations with every conceivable drug, across every conceivable demographic, across every conceivable cancer diagnosis—all of it, tested, and all of it, in the aggregate, failed.

Today, amongst many oncology investors and researchers, TIGIT has become close to a dirty word. Never, ever suggest touching TIGIT. It will not work.

After all this, one cannot help but ask: what had gone so wrong?

Unfortunately, the field does not yet have a clear answer, and it is unlikely there is just one. As is often the case in biology, the trouble with a target that sits at the busy intersection of many valuable things is that the very thing that makes it attractive also makes it almost impossible to reason about cleanly. Perhaps TIGIT alone does not move the immune system in one direction, but instead tugs on a dense, locally contingent web of signals whose meaning changes from tumor to tumor, patient to patient. Perhaps modulating TIGIT is genuinely important, but would require the modulation of a half-dozen other targets for it to have the benefit that everyone expected of it. Perhaps TIGIT was actually transformative, but only for a very specific cohort of patients that the clinical trial apparatus is simply not built to discover at scale. Perhaps something else entirely.

What does feel likely is that TIGIT was not nonsense. The billions wasted were not an outgrowth of the publish-or-perish industrial complex, or anything of the like. Genuinely intelligent theory was here, backed by years of genuinely intelligent wet-lab effort, and its eventual failure was, as far as I can tell, predicted by absolutely nobody. In fact, TIGIT was the golden child of what translational biology ought to look like. It had human genetics-adjacent plausibility, clean immunology, druggable extracellular geometry, a commercial precedent, and early clinical signal. It simply did not work.

Lots of people boil down the problem of drug discovery to toxicology, or target selection, or trial scalability. All these matter, yes. But sometimes the people behind a drug can do everything right, and it will still fail. Keytruda taught the pharmaceutical industry that the immune system had brakes, and it earned a place in the annals of cancer biology for that. TIGIT taught the more humiliating, expensive lesson: not every brake is attached to wheels.

How financial architectures shaped (and will continue to shape) Chinese drug development

Abhishaike Mahajan — Mon, 11 May 2026 15:23:49 GMT

Note: this essay is connected to a prior one titled “Curious cases of financial engineering in biotech”. I conclude that piece with the following paragraph:

To end this off: I have deliberately left out China, which may be the most aggressive current example of financial architecture shaping a drug pipeline. That deserves its own essay, and will get one soon.

This is that essay. And to those who already vaguely understand this area: yes, ‘NewCos’ are part of the ‘current state’ section. The future will get more complicated!

The current state of Chinese drug development

If you had to take a guess, why has China been out-licensing drugs so frequently?

I naively assumed it’s because China got very good at moving through early-stage clinical development fast, and because the domestic market is simply not as lucrative as the US’s. Neither of these is wrong, but they are incomplete, as they do not explain the timing. The speed advantage and the weak domestic market have both been true for the better part of a decade, but you really only started to hear about the out-licensing in the past few years; 2022 if your job depended on it and 2024 if not. Something else has to be doing the causal work.

And much of that “something else” has to do with finance. My claim is not that finance created Chinese biotech productivity, merely that it shaped how that productivity interacted with the rest of the world. I’d like to start by discussing the contribution of one thing in particular: Chapter 18A of the Hong Kong Stock Exchange (HKEX). Many, many things can be traced back to this particular rule.

But before we wonder what Chapter 18A is, we should first ask: where did Chapter 18A come from?

The proximate cause of it was none other than Alibaba.

In 2013, Jack Ma’s company was preparing to go public, and Hong Kong was the obvious venue; Alibaba was a Chinese company and HKEX was one of the largest Asian exchanges. The problem was that Alibaba wanted to list under a specific partnership system, in which a self-selected group of twenty-eight insiders would have the perpetual right to nominate a majority of the board. Hong Kong’s Listing Rules had forbidden this sort of arrangement for three decades under a principle known as “one share, one vote.” The HKEX, after some months of public handwringing, declined to bend. So Alibaba took its IPO to New York, where arrangements like this had been legal since forever, and in September 2014 listed on the NYSE at a valuation of $231 billion. It was, at the time, the largest IPO in history.

Charles Li, the head of the HKEX, was understandably unhappy about a Chinese success story listing somewhere that was not China, and wrote this:

We respect the company’s decision and wish it well.
We are proud of our tradition of respect for the rule of law and adherence to principles.
However, we also need to find ways to make our market more responsive and competitive, particularly with respect to new economy or technology companies.
We have to consider possible changes where they might be necessary, with everything according to our due process. The Listing Committee’s work on shareholding structures didn’t start because of Alibaba and will not end now because of Alibaba.
We need to ensure our markets continue to be relevant in the new era of economic development.

Over the following four years, HKEX rewrote its rules to ensure an Alibaba situation never happened again. The product was a three-part reform package that took effect on April 30, 2018: Chapter 8A, Chapter 19C, and Chapter 18A. This last one, Chapter 18A, is what we’ll be concerned with, because it is the only one that was specific to biotech companies. It allowed, for the first time, pre-revenue biotech companies to go public in Hong Kong without satisfying any of the standard profit, revenue, or cash-flow tests that other listing applicants had to meet.

Now, this doesn’t mean there were no requirements, but they were softened to match the financial flavor that early-stage biotechs often have. The requirements were as follows: at least one Phase I clinical trial completed with no regulatory objection to proceeding into Phase II; an expected market capitalization at listing of at least HK$1.5 billion (roughly US$192 million); two fiscal years of operating history under substantially the same management; and enough working capital to cover 125% of projected costs for twelve months after listing.
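Read as a checklist, the four requirements above can be sketched as a small Python function. This is a simplified paraphrase for illustration, not the actual text of the Listing Rules (which contain many more conditions and much more nuance); the `Applicant` fields, the function name, and the 7.8 HKD-per-USD conversion rate (roughly the Hong Kong dollar's peg) are assumptions of this sketch.

```python
from dataclasses import dataclass
from typing import List

# Thresholds as summarized in the essay (simplified; not the legal text).
MIN_MARKET_CAP_HKD = 1.5e9        # expected market capitalization at listing
MIN_YEARS_SAME_MANAGEMENT = 2     # fiscal years of operating history
WORKING_CAPITAL_MULTIPLE = 1.25   # must cover 125% of 12-month projected costs
HKD_PER_USD = 7.8                 # assumption: approximate peg, used only for conversion


@dataclass
class Applicant:
    """A hypothetical pre-revenue biotech considering a Chapter 18A listing."""
    phase1_done_no_objection: bool  # Phase I complete, no regulatory objection to Phase II
    expected_market_cap_hkd: float
    years_same_management: float
    working_capital_hkd: float
    projected_costs_next_12m_hkd: float


def unmet_18a_requirements(a: Applicant) -> List[str]:
    """Return the (simplified) requirements the applicant fails; empty means eligible."""
    failures = []
    if not a.phase1_done_no_objection:
        failures.append("phase 1")
    if a.expected_market_cap_hkd < MIN_MARKET_CAP_HKD:
        failures.append("market cap")
    if a.years_same_management < MIN_YEARS_SAME_MANAGEMENT:
        failures.append("operating history")
    if a.working_capital_hkd < WORKING_CAPITAL_MULTIPLE * a.projected_costs_next_12m_hkd:
        failures.append("working capital")
    return failures


print(round(MIN_MARKET_CAP_HKD / HKD_PER_USD / 1e6))  # 192 (US$ millions)
print(unmet_18a_requirements(Applicant(True, 2.0e9, 3, 500e6, 420e6)))  # ['working capital']
```

In the hypothetical example, the company clears every bar except working capital: it holds HK$500 million against HK$420 million of projected costs, short of the HK$525 million that the 125% buffer demands.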

Charles Li, in the run-up to the rules taking effect, stated that he hoped Hong Kong would overtake NASDAQ in Chinese biotech listings within five years.

The initial results were spectacular.

Ascletis listed in August 2018. BeiGene, which had already listed on NASDAQ, did a secondary in Hong Kong. Innovent Biologics listed in October 2018 and quadrupled in the following year. Junshi, CanSino, Shanghai Henlius, Akeso, and dozens of others followed. By the end of 2021, the peak year, the cumulative capital raised under Chapter 18A had crossed HK$100 billion, and the number of listed companies was approaching fifty. Hong Kong had, just as Li had hoped, become the second-largest biotech listing venue in the world.

And then the biotech winter of 2021-2022 happened. By the end of 2022, of the 56 biotech companies listed under Chapter 18A, only 13 were trading at or above their IPO price. By the end of 2023, only 9 were.

The trajectory of what happened next should be quite clear. Here you have a cohort of roughly sixty pre-revenue biotech companies in one corner of the world, each holding a pipeline of clinical assets ranging from plausibly valuable to genuinely world-class, each prevented from raising equity by the collapse of its own share price, and each locked out of every other public-market financing channel due to geopolitical risk and uncertainty.

On the other side of the Pacific, US pharma companies were staring down the patent cliff, which represented somewhere between $180 billion and $250 billion of revenue at risk from drugs coming off patent by 2030, and were desperately scouring the world for assets with which to fill the gap.

These two sides were made for each other. The 18A cohort had clinical pipelines and no capital. Big Pharma had capital and not enough pipelines.

Thus, the out-licensing boom you have heard so much about. In December 2022, Akeso licensed ex-China rights to ivonescimab, its PD-1/VEGF bispecific, to a small Miami-based company called Summit Therapeutics for $500 million upfront and up to $5 billion in total deal value. This was, at the time, the largest single-asset deal ever struck by a Chinese biotech. Less than two years later, in September 2024, ivonescimab beat Keytruda in a Phase 3 non-small cell lung cancer trial. The result shocked the world, and every Big Pharma BD team reorganized itself around the working assumption that the next blockbuster might come from somewhere in Chongqing or Shanghai or Beijing or Guangzhou. In 2024 alone, Chinese firms out-licensed 94 projects to overseas companies, up from essentially zero a decade earlier. In 2025, the figure was 157 deals worth $135.7 billion. In the first half of 2025, roughly 32% of global innovative-drug out-licensing value originated in China, up from single digits a few years prior.

Could this have happened without 18A?

If the 18A cohort didn't exist, the Chinese biotech industry would be a collection of private companies. Most of them would be venture-funded, with valuations set by the more conservative Chinese VC culture rather than by public markets that were initially frothy and only slowly cooled. As such, the urgency to monetize pipelines would be considerably lower, and there might be little reason to aggressively sell intellectual property across the Pacific to Western buyers. And most important of all: without the clearly legible financial signals that a public listing (which would not have existed without 18A!) offered to Western buyers, most of those buyers might be too uncertain to ever commit hundreds of millions of dollars upfront to a China-based company they had never heard of, almost certainly slowing down the boom.

On the other hand, a lot of what drove the Chinese biotech ascendancy has nothing to do with 18A and would have happened regardless. China’s primary regulatory authority for drugs, the NMPA, ran through a sequence of reforms starting around 2015 that compressed drug approval timelines from years to months and cleared a backlog of roughly 20,000 applications in two years. The Chinese CRO ecosystem, WuXi and so on, professionalized to the point that running a Phase I in China was genuinely cheaper and faster than running one in Cambridge. The talent got better too! A generation of Western-trained scientists returned to run R&D at Chinese biotechs under the Thousand Talents Plan. And on the demand side, the Western patent cliff was going to happen anyway.

So, the cleanest version of the argument is something narrower than "18A caused the out-licensing boom," which is probably too strong, and broader than "18A was a minor contributor," which is too weak. 18A did not create Chinese R&D productivity, but it did shape how that productivity interacts with Western markets. Which is pretty interesting!

And the dominoes that were set up by 18A are still falling: Chapter 18A gave legibility to Chinese biotechs, which led to out-licensing, which surely should lead to something else. And what is that something else?

‘NewCos’. These days, Chinese biotechs are getting so good at their job that they are becoming more interested in the ‘nearly infinite upside potential’ economics that make drug discovery so appealing. Beyond that, HKEX biotechs trade at a substantial discount to their NASDAQ-comparable peers, more or less permanently, due to intense price negotiation by the Chinese government. These two factors, combined with the fact that China gets upset if one of its companies sets up shop abroad, have pushed financiers into increasingly creative territory.

And a solution soon manifested. Perhaps instead of a Chinese biotech accepting cash or royalties in exchange for their precious molecules, they should instead work with American funds to set up a US-based company around those molecules, taking a big chunk of equity for themselves, with the American funds taking the rest. You could argue that this runs against the spirit of China’s discomfort with its companies setting up shop abroad. I agree! But China seems fine with it.

Kailera Therapeutics is the cleanest recent example of this. On the Chinese side, Jiangsu-based biotech Hengrui contributed its GLP-1 portfolio. Bain Capital, Atlas Venture, and RTW put in $400 million, former US pharma executive Ron Renaud took the CEO seat, and the whole structure was operational within months. The US-based investors get a promising company in their portfolio, one fluffed up by the starry-eyed and optimistic US markets. And Hengrui takes equity in this US-based vehicle, which, compared to a cash payment or bounded royalty, is uncapped on the upside. Both sides win.

As always, it’s worth being a bit concerned by new and exciting developments in finance. What should we be worried about here? The obvious one is that every dollar of Western venture capital that gets deployed into a Kailera is a dollar that doesn’t get deployed into a US-originated asset, with the obvious caveat that venture capital is not necessarily a fixed pool where every dollar is a one-for-one displacement. Either way, it’s a rational thing to do: play the same M&A game that pharma usually does, but with the side that is actually winning. Is the long-run consequence that the US stops being good at the sort of early-stage discovery it was historically best at?

Whatever the answer is, we’ll certainly be made aware of it in the upcoming decade.

The future of Chinese drug development

Well, maybe not a decade.

Curious cases of financial engineering in biotech

Abhishaike Mahajan — Mon, 27 Apr 2026 12:29:18 GMT

Note: finance topics are slightly sensitive, so, while nothing in this article contains proprietary information, I will not include the names of people I talked with for this piece. I appreciate everyone who reached out to help me put this together!

Edit: The follow-up to this article has been published: How financial architectures shaped (and will continue to shape) Chinese drug development.

Introduction

For $250 million and ten years of your life, you may purchase a lottery ticket. The ticket has a 5% chance of paying out. When it does pay out, it pays roughly $5 billion. A quick calculation will show you that the expected value of the ticket is $250 million. This is essentially what drug development is. Or rather, it’s what drug development was, twenty years ago. The upfront payments have been climbing, the hit rates falling, and expected values have, at best, held flat. Should you buy a ticket?

Perhaps not. In fact, any reasonable player should have long since stopped playing this stupid game. Unfortunately, we still need drugs. People have cancer, and heart failure, and Alzheimer’s, and a thousand genetic diseases that nobody has ever heard of, and the only industry on Earth currently set up to do anything about any of this is the same industry running the lottery-ticket business described above. The game is dumb and we need it played anyway.

So the real question is not whether to play, but how to make playing less awful for those involved. And the answer, increasingly, is ‘financial engineering’: a set of structural tricks that let people hold more tickets than they otherwise could, or buy a fraction of the winning tickets after they’ve been drawn, or some other strange, clever thing that all financiers find obvious and everyone else has never heard of. All of this is done to trade and barter over the risk inherent to the whole enterprise, slicing it into pieces small enough that someone, somewhere, is willing to hold each one in exchange for something.

I’ll walk through a handful of these, the people who invented them, and case studies involving the tactic. And at the end, we’ll ask the question of whether all these tricks are, in aggregate, altering what the pharmaceutical industry decides to value.

The first such trick, and the one that perhaps kicked off the start of the whole effort, was dreamed up by a man named Andrew Lo.

Finance tries to make failure survivable: the Andrew Lo thesis

Andrew Lo is a finance professor at MIT's Sloan School of Management. Among all TED talks that have ever been produced, there are few worth watching. Andrew’s talk, which has the wonderful title ‘Can Financial Engineering Cure Cancer?’, is one of them.

I recommend you listen to the full thing, because it really is quite good. If you’re strapped for time, the core thesis is as follows:

Individual drug programs fail about 95% of the time. But this doesn’t mean the expected value of a single program is necessarily bad. With numbers close to the ones from the start, say a 5% shot at a $5 billion blockbuster against a $200 million development cost, a program is technically positive EV on paper. But this implies that you need to be able to survive the 95% of outcomes in which you lose everything, and most investors, reasonably, will not.
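To make that arithmetic concrete, here is a minimal Python sketch using only the illustrative numbers from the paragraph above (no real program data). It shows both halves of the argument at once: the expected value is positive, yet the single most likely outcome is still losing everything.

```python
# Single drug program, using the illustrative numbers from the paragraph above.
cost = 200e6        # development cost: $200 million
p_success = 0.05    # 5% chance of approval
payout = 5e9        # ~$5 billion blockbuster if it works

expected_payout = p_success * payout   # probability-weighted payout
net_ev = expected_payout - cost        # positive on paper...
p_total_loss = 1 - p_success           # ...but the whole $200M is lost 95% of the time

print(f"Expected payout:      ${expected_payout / 1e6:,.0f}M")
print(f"Net expected value:   ${net_ev / 1e6:,.0f}M")
print(f"Chance of total loss: {p_total_loss:.0%}")
```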

Lo's insight, published in a 2012 Nature Biotechnology paper, was simple. Just bundle 50 or so drug programs into a single entity, one with a war chest of $5 to $15 billion, and roll the dice. The individual drug programs are still terrible standalone bets, but if they're sufficiently uncorrelated, at least one is almost guaranteed to hit, and it will hit big enough to pay off all the programs that failed. Which means you can keep playing, forever. Of course, the ‘uncorrelated’ bit is the ‘spherical cow’ part of all this. It’s impossible to do perfectly, but it can be done well enough for risk to fall dramatically.
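A quick sketch of why bundling helps, assuming the spherical cow holds: every program is fully independent and shares the 5% success rate from the intro. The helper `prob_at_least` is illustrative, not anything taken from Lo's paper; it is just the binomial tail probability.

```python
from math import comb

def prob_at_least(k: int, n: int, p: float) -> float:
    """P(at least k successes among n independent programs, each succeeding with probability p)."""
    # Binomial tail: 1 minus the probability of 0..k-1 successes.
    return 1.0 - sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k))

n_programs = 50   # "50 or so" programs in one entity
p_hit = 0.05      # per-program success rate

single = prob_at_least(1, 1, p_hit)               # one standalone program
portfolio = prob_at_least(1, n_programs, p_hit)   # at least one hit in the bundle
expected_hits = n_programs * p_hit

print(f"P(standalone program hits):  {single:.1%}")
print(f"P(at least one hit in {n_programs}): {portfolio:.1%}")
print(f"Expected number of hits:     {expected_hits:.1f}")
```

Correlation is where this breaks down: if programs share a failure mode (the same target class, the same flawed trial design, the same funding crunch), the real chance of zero hits is larger than this independent-binomial number suggests.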

There’s an extra layer of complexity here: if you can get the portfolio risk low enough, you can issue debt against the portfolio and sell it as bonds, which unlocks a much larger pool of non-venture capital from investors who want more stable returns. This is arguably the most interesting thing that Andrew believed in, but this particular bit never really went anywhere. We’ll discuss the obvious ‘why not?’ question at the end of this section.
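A toy version of the bond idea, in the spirit of Lo's proposal rather than a reproduction of his model: portfolio proceeds pay senior bondholders first, and whatever is left flows to junior holders. All parameters are hypothetical, programs are again assumed independent, and costs, timing, and discounting are ignored entirely.

```python
from math import comb

def hit_distribution(n: int, p: float) -> list[float]:
    """Binomial probabilities of 0..n hits among n independent programs."""
    return [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]

def senior_repaid_probability(n: int, p: float, payoff_per_hit: float, senior_face: float) -> float:
    """Probability that portfolio proceeds cover the senior bonds in full.

    Simple waterfall: all proceeds go to senior bondholders first, up to their face
    value; anything left over goes to the junior/equity holders.
    """
    probs = hit_distribution(n, p)
    return sum(prob for hits, prob in enumerate(probs) if hits * payoff_per_hit >= senior_face)

# Hypothetical parameters, loosely matching the numbers used earlier in this piece.
n, p, payoff = 50, 0.05, 5e9
for face in (5e9, 10e9, 15e9):
    prob = senior_repaid_probability(n, p, payoff, face)
    print(f"Senior tranche of ${face / 1e9:.0f}B repaid in full with probability {prob:.1%}")
```

The smaller the senior tranche relative to likely proceeds, the safer the bonds look, which is the property that would let a pension fund hold them.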

The direct descendant of this whole thesis—at least the ‘drug portfolio’ part—is BridgeBio Pharma, founded in 2015 by Neil Kumar, who was Andrew’s student at MIT. It is structured almost identically to Andrew’s original thesis: a central holding company that creates subsidiary companies, each focused on a single rare disease. Each subsidiary has its own equity structure, its own management team, and 1-2 drug programs. If a subsidiary's drug fails, it dies, but BridgeBio survives. If it succeeds, the parent holds enough equity to capture massive upside. The company IPO’d in 2019, is now worth billions, and has a pretty good stock trend for a biotech.

There are spiritual cousins as well, such as Roivant Sciences, founded in 2014 by Vivek Ramaswamy. It has a nearly identical corporate structure to BridgeBio—what’s come to be known as a ‘hub-and-spoke’ model—but whereas BridgeBio does de novo drug development in rare diseases, Roivant in-licenses drugs that big pharma has abandoned for non-scientific reasons: portfolio reprioritization, executive turnover, M&A reshuffling, quarterly earnings pressure. There are lots of these molecules floating around, and if you hire good enough people, you have the ability to spot them before anyone else. Roivant went public in 2021 at a $7.3 billion valuation, and its subsidiaries have completed twelve consecutive positive Phase 3 studies. And it has an even better stock history!

This solves the fundamental problem of biotech, no? Really, in retrospect, it’s astonishing that we let anybody create a non-hub-and-spoke biotech. You have a set of bets, each one of which is individually stupid, and then you put them in a bag, and the bag becomes smart because each bet's insane variance mostly cancels out against the others. It is the obvious thing to do.

Unfortunately, upon trying this out, we will run into two big problems. The first one is that running many drug programs at the same time is really hard. And the second one is that people know running many drug programs at the same time is really hard, and they will price any attempt to do so accordingly.

An exemplar of the first lesson is Centessa Pharmaceuticals. Centessa was founded in late 2020 by Medicxi, a life-sciences venture firm, as another implementation of this thesis: ten private biotech companies, each with its own single asset, combined under one holding entity, taken public in May 2021 at $20 a share. Roivant and BridgeBio are often held up as paragons of the Andrew Lo thesis (including by me!), but neither was a real hub-and-spoke purist. Centessa was. Whereas Roivant in-licensed abandoned pharma assets and BridgeBio concentrated almost entirely on rare genetic disease, Centessa bravely stuck to the Lo script: a portfolio of genuinely uncorrelated clinical risk. Its spokes covered hemophilia, oncology, pulmonary hypertension, narcolepsy, fibrotic disease, and autoimmune disease; if there was any correlation risk, it was that drug development was occurring at all.

The model did not work. Within eighteen months Centessa was shutting down spokes. By 2023, they had abandoned the hub-and-spoke model entirely and pivoted to a single-asset company focused on orexin agonists for sleep disorders. That pivot, to be clear, worked spectacularly. Lilly bought them for $6.3 billion in early 2026, making Centessa one of the more successful biotech exits of the decade. But they got there by becoming a single-asset company. What had gone so wrong with the original thesis? The surface answer is a mix of capital and luck. Several spokes failed on their own merits, and the 2022-ish biotech market crash closed the door on funding whatever was left. Centessa shareholders ended up all right in the end, but hub-and-spoke models are empirically not silver bullets for the hard problem of drug development.

The second problem here is that people simply may not believe in your so-called ‘uncorrelated risk portfolio’. This will obviously happen when you raise money to pursue the venture, and it will, surprisingly, happen again once you go public.

As an example: did you notice that big drop in BridgeBio’s stock in late 2021? This is when their lead candidate acoramidis, a treatment for a rare heart condition called transthyretin amyloid cardiomyopathy, failed to beat placebo on its primary endpoint in a Phase 3 trial. The stock dropped 72% in a single day. This was not the tidy portfolio-theory response. The rational response would have been “well, BridgeBio has four other clinical-stage programs and $800 million in cash, so the diversified portfolio thesis should protect us.” The market said “holy shit, the lead asset is dead, the portfolio theory behind this company is nonsense, sell it,” and priced that sentiment accordingly.

The funny part of all this is that BridgeBio kept running the trial. The 12-month primary endpoint had failed, but the study was designed to run to 30 months, with a harder, longer-term endpoint built around death and cardiovascular hospitalization. In July 2023, the longer-term data read out, and acoramidis worked: that endpoint was met. The stock surged 76% in a day, BridgeBio eventually won FDA approval, and the drug, now on the market, is called Attruby. Stressful!

Well, that’s that. But we should return to Andrew Lo for a second. The part of Lo’s idea that did not arrive, at least not in its original form, was the bond-market part. Why has no one implemented what was arguably the most clever part of his pitch: issuing debt against your drug portfolio, allowing you to access vast sums of institutional, low-risk capital?

Well, to some degree, someone has, but only for approved drugs. BioPharma Credit is one such institution, and makes secured loans to commercial-stage biotechs, typically collateralized by the revenue stream of one or more approved products.

But nothing like this exists for clinical-stage stuff. Why not? Happily, Lo himself offered an answer, almost a decade after his first paper. For one, biotech is simply not used to that type of financing, so it doesn’t use it. Two, the extreme scale of financing that this unlocks has not yet been needed, so nobody has tried to raise it. But the third, and most important, point is a lack of institutional support. There is no biomedical Moody's: no quantitative, authoritative voice that can tell a pension fund how risky a portfolio of drug assets is. And even if there were, there is no biomedical Fannie Mae: no government-backed entity that acquires biopharma loans and securitizes them into something an institutional allocator would actually buy. Our field is in the same state mortgages were in during the 1930s, when they were considered too risky for banks to buy until infrastructure of this kind, most notably the federally created Fannie Mae, made them safe.

But, Lo posits, the need for capital eventually changes behavior, biology is poised to grow far larger than it is today, and models for drug portfolio risk adjustment are only getting better. Four years after that paper, I am unsure whether much has changed, but we’ll see what the future holds.

Finance makes future success tradable: royalties and synthetic royalties

Drug royalties are pretty simple. You discover an interesting target or chemical, but don’t want to bother with developing it further. So you pawn it off to a big pharmaceutical company with a lot of resources, alongside a contractual agreement that you’ll receive 3% of net sales if a drug based on your work is eventually approved and commercialized. And like any contractual agreement, it can be bought and sold.

Royalty Pharma, founded in 1996 by Pablo Legorreta, is the largest company in this market and possibly the purest expression of financialized drug development that exists. It has no labs, no therapeutics arm, and no ambition to discover drugs itself. It buys royalties from universities, academic medical centers, small biotechs, and individual inventors, and holds them. The portfolio includes claims on 7 of the 30 most-prescribed drugs in the United States. It reported $2.38 billion in revenue in 2025 from what is, spiritually, a filing cabinet.

Is this rent-seeking? If you look at the details, it actually feels pretty fair to all parties involved. A university that has a royalty over some particular drug developed by a professor has no ability (or desire!) to forecast its chance of success, its revenue if approved, or how to hedge the risk that a competitor enters the market. It also very likely prefers less money today to more money spread over the ten years of a drug’s exclusivity period. Royalty Pharma and its competitors have the opposite preferences and all the abilities the university lacks. The university gets liquidity and certainty; Royalty Pharma gets a claim on an approved drug's revenue stream at a discount to its expected value. Both sides win.
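One way to see why both sides can win is to discount the same royalty stream at two different rates. This is a minimal sketch with hypothetical numbers, none of which come from the text: an approved drug with a flat $1 billion a year in net sales, a 3% royalty, and a ten-year window, valued at 15% by a university that strongly prefers cash now and at 8% by a royalty buyer with cheaper capital.

```python
def royalty_present_value(annual_sales: list[float], royalty_rate: float, discount_rate: float) -> float:
    """Present value of royalty payments, discounting each year's payment back to today.

    annual_sales: expected net sales for years 1..N, in dollars
    royalty_rate: fraction of net sales owed to the royalty holder (e.g. 0.03)
    discount_rate: annual rate used to discount future cash (e.g. 0.10)
    """
    return sum(
        sales * royalty_rate / (1 + discount_rate) ** year
        for year, sales in enumerate(annual_sales, start=1)
    )

sales = [1e9] * 10                                        # hypothetical: $1B/year for ten years
undiscounted = sum(s * 0.03 for s in sales)               # $300M paid out over the decade
university_value = royalty_present_value(sales, 0.03, 0.15)  # steep discount: wants cash now
buyer_value = royalty_present_value(sales, 0.03, 0.08)       # cheaper capital, patient money

print(f"Undiscounted royalties:     ${undiscounted / 1e6:,.0f}M")
print(f"Worth to the university:    ${university_value / 1e6:,.0f}M")
print(f"Worth to the royalty buyer: ${buyer_value / 1e6:,.0f}M")
```

Any price between the two present values leaves the university better off than waiting, and leaves the buyer holding an asset worth more to it than it paid.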

But the more interesting recent development is the rise of synthetic royalties.

A traditional royalty is a pre-existing legal right. It exists because someone did the original research and negotiated a licensing agreement. A synthetic royalty is different. It’s a manufactured financial claim on future drug revenues that didn’t previously exist. Consider an example: a biotech company has a drug in clinical development, one that it owns entirely. It needs money. It doesn’t want to issue equity (dilutive) or take on debt (requires collateral). So it invents a drug royalty from scratch, an entirely new obligation, and sells that. Now the biotech no longer owns the drug’s economics entirely: some other party owns 3% of its future sales if it ever succeeds, and the biotech gets non-dilutive capital today.

What’s the difference between these increasingly complex synthetic royalty agreements and typical, bespoke pharma deals? They feel similar. And yes, they are functionally equivalent in terms of cash flow or deal structure. The difference lies in each party’s intent. In typical pharma deals, the buyer cares about something strategic, say, operational control over a drug’s development journey. Buyers of royalties, synthetic or otherwise, do not care about that. They care entirely about the probability-weighted present value of the future payments, and you can imagine how useful this decoupling of capital from often burdensome partnership demands can be.

The royalty market is, in some sense, a secondary market for the financial value typically embedded in pharma licensing agreements. And it's still early.

One report found that there were 102 major royalty transactions from 2020 to 2024, noting that synthetic royalties are growing at 33% annually. The buyer pool includes not only royalty-centric funds like Royalty Pharma, but increasingly pension funds and private equity as well. The same institutions Andrew Lo wanted to be in on biotech are getting in on the game, just in a different way.

This whole class of synthetic royalties is growing more complex over time, with some contracts even building in milestones, such that the seller (the biotech) receives additional capital upon Phase 2 trial success or outright drug approval. The whole concept is also growing larger in dollar terms. In June 2025, Royalty Pharma and Revolution Medicines announced a $2 billion funding agreement, $1.25 billion of which was structured as a synthetic royalty, to fund the development of daraxonrasib: the largest transaction ever in this particular asset class.

But at the same time, within synthetic royalties, you can see the beginnings of a financial instrument strange enough that one cannot easily predict its second- or third-order effects. Pharma partnership agreements can be burdensome in the demands they make, but they are at least ‘time-bounded’ in ways that are easy to plan for. Synthetic royalties follow a company around for as long as the drug is under patent, actively extracting value the whole way, with their only contribution being an initial surge of capital. This is nobody’s fault, of course, least of all the royalty holders’. ‘We are selling to willing buyers at the current fair market price’ and all. But the cumulative effect, as more drugs carry more synthetic royalty obligations, is a pharmaceutical economy where an increasingly large fraction of every dollar of drug revenue is pre-committed to financial intermediaries before the drug reaches a single patient.

But it’s not as cut and dried as ‘synthetic royalties are bad’ because of this. Consider the Revolution Medicines case from earlier. Their drug daraxonrasib has a strong chance of being a blockbuster, and so scaling global commercialization will be enormously expensive. An equity raise would have diluted ownership right before value-inflecting Phase 3 readouts (which were excellent!), traditional debt at that scale would be impractical, and a pharma partnership would surrender commercial rights to what could be a decade-long franchise of label expansions. The synthetic royalty allowed Revolution to sidestep all three, and its terms made the trade far more palatable: the royalty is tiered, decreasing as sales grow and dropping to zero above $8 billion in annual net sales. If daraxonrasib becomes a true blockbuster, the royalty burden effectively caps out and becomes negligible as a percentage of revenue.
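Tiered royalties work like tax brackets: each rate applies only to the slice of annual sales that falls inside its tier. In the sketch below, only the $8 billion zero-royalty threshold comes from the text; the tier boundaries and rates beneath it are invented placeholders, not the actual Revolution Medicines terms.

```python
# Hypothetical tier schedule: (upper bound of annual net sales for the tier, rate on that slice).
# These rates are placeholders, NOT the real deal terms; only the $8B cutoff is from the text.
TIERS = [
    (2e9, 0.05),
    (4e9, 0.025),
    (8e9, 0.005),
    (float("inf"), 0.0),   # no royalty on sales above $8B
]

def tiered_royalty(annual_sales: float, tiers=TIERS) -> float:
    """Royalty owed on one year's net sales, applying each rate only to sales inside its tier."""
    owed, lower = 0.0, 0.0
    for upper, rate in tiers:
        if annual_sales <= lower:
            break                                   # no sales reach this tier
        slice_of_sales = min(annual_sales, upper) - lower
        owed += slice_of_sales * rate
        lower = upper
    return owed

for sales in (1e9, 4e9, 8e9, 15e9):
    owed = tiered_royalty(sales)
    print(f"Sales ${sales / 1e9:>4.0f}B -> royalty ${owed / 1e6:6.0f}M ({owed / sales:.2%} effective rate)")
```

The effective rate shrinks as sales grow, and the dollar amount stops growing entirely past $8 billion, which is what ‘caps out’ means in practice.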

But why would Royalty Pharma agree to this at all? Isn’t this clearly not in their favor? Not at all: they likely just did the numbers, and anything above a certain threshold in yearly sales is both unlikely and unneeded for their portfolio math, so they are happy to give the tail scenario away for free.

All of this is only possible because there is an entity willing to buy a manufactured claim on future revenue that didn't exist until someone decided to create it. The royalty market shows the basic pattern: once a future drug cash flow becomes legible, someone will turn it into a security.

Finance rewrites the incentives: PRVs and CVRs

What we’ve discussed so far assumes some degree of intentionality. Andrew Lo purposefully came up with his thesis, Royalty Pharma deliberately built a business around drug royalties, and so on. But there are two financial instruments that were designed with clear intent, yet have slowly begun to display an unpredictable life of their own once deployed. I’d like to discuss them because I think they do a great job of demonstrating not only how tradable financial instruments can have a material impact on how drug development works, but also how difficult those impacts can be to predict in advance.

The two are PRVs, or Priority Review Vouchers, and CVRs, or Contingent Value Rights.

We’ll start with PRVs.

In 2006, three professors at Duke published a paper titled “Developing Drugs for Developing Countries”. In it, they discuss a well-trodden problem: infectious and parasitic diseases create enormous health burdens in the developing world, but because the people suffering from them are poor, there's essentially no commercial incentive to develop treatments. Of course, ideally there would be some way to incentivize for-profit companies to do it. But financial incentives require money, and money requires Congress, and Congress requires political will that rarely materializes for diseases affecting people who can't vote in U.S. elections.

The fix, the authors argued, is to use a logistical incentive instead. If you are willing to develop a drug for a neglected disease, the government ought to help you out somewhere else in your drug portfolio.

How? By offering you a PRV. But what is the PRV actually for? Once a pharmaceutical company has wrapped up its clinical trial work and submits an application to the FDA for official approval, it must wait 10 months for FDA review. But if it submits this one-time-use voucher alongside the application, the FDA should be required to complete its review within 6 months. And just in case you don’t actually have an internal portfolio of drugs to allocate this PRV to, the voucher should also be sellable. Four months of earlier market entry for a ‘top-decile’ drug can be worth an awful lot: around $300 million, according to the authors.
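A rough sketch of where a number like $300 million can come from, under loudly stated assumptions that are not from the paper: flat sales of a hypothetical $1 billion a year, a hypothetical 10% annual discount rate, and an exclusivity end date that is fixed regardless of when approval lands, so a faster review simply adds four months of exclusive sales. Month 0 is the date of submission. Real voucher values also depend on launch ramps and competitors, which this ignores.

```python
def pv_of_monthly_sales(monthly_sales: float, start_month: int, end_month: int,
                        annual_discount_rate: float) -> float:
    """Present value, at submission (month 0), of flat monthly sales from start_month up to end_month."""
    monthly_rate = (1 + annual_discount_rate) ** (1 / 12) - 1
    return sum(monthly_sales / (1 + monthly_rate) ** m for m in range(start_month, end_month))

annual_sales = 1e9          # hypothetical top-decile drug: flat $1B/year
discount_rate = 0.10        # hypothetical annual discount rate
exclusivity_end = 12 * 10   # hypothetical: exclusivity ends 10 years after submission, fixed

standard = pv_of_monthly_sales(annual_sales / 12, 10, exclusivity_end, discount_rate)  # 10-month review
priority = pv_of_monthly_sales(annual_sales / 12, 6, exclusivity_end, discount_rate)   # 6-month review
voucher_value = priority - standard   # equals the PV of months 6 through 9 of extra exclusive sales

print(f"Value of the four-month head start: ${voucher_value / 1e6:,.0f}M")
```

With these made-up inputs the head start lands a bit above $300 million; halve the sales and the value halves with it.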

You can imagine a very neat feedback loop from all this. For instance, a small nonprofit or academic group develops a river blindness treatment, receives a voucher, and can then sell the voucher to Pfizer and use the proceeds to fund more neglected disease work.

In a rather astonishing act of ‘listening to healthcare economists’ that I don’t believe has recurred since, Congress enacted the program in 2007, just a year after the paper’s publication. It was expanded in 2012 to include rare pediatric diseases, and again in 2016 to include medical countermeasures against biological/chemical/radiological threats.

There are two things I find very interesting about PRVs.

The first is that, as the title of this section implies, PRVs have developed secondary-market price dynamics that their creators never intended. The price of a PRV at any given moment is a function of how many are floating around, how many blockbuster drugs are approaching FDA submission, the competitive landscape, and whether Congress has recently done something to expand or contract the program. AbbVie paid $350 million for a single voucher in 2015, the all-time high, driven by the voucher being the only one on the market and by a competitor preparing to release a drug similar to AbbVie’s. Novartis picked one up in 2023 for $21 million, the all-time low.

How did Novartis get one for so cheap? Funnily, that particular story also illustrates the increasingly complex financialization of biotech quite well. When Novartis licensed a particular drug to a particular biotech back in 2019, it baked a “pre-agreed, contractually defined percentage of the PRV value” into the licensing agreement four years before the voucher existed and, in fact, before the drug itself had even been approved. And when the biotech got the drug approved and received the voucher in 2023, Novartis simply exercised its option to purchase it at a ridiculously low price.

Imagine being the person behind that deal!

The second thing, even more interesting, is that the whole program has increasingly come to bear no fixed relationship to the social good it was meant to incentivize. Why? Because even at the voucher's peak secondary-market value of $350 million (though it usually oscillated around the $100M mark), it was not enough to shift a large pharma company's portfolio allocation in any meaningful way. In the few cases where it did, it shifted it toward the absolute bare minimum: approval of the drug, not utility of the drug. The voucher pays for the regulatory event, not the public-health outcome. In a great paper titled ‘The priority review voucher: a misconceived quid pro quo’, the authors say this:

…the PRV, except few examples, has largely failed to deliver medical benefits for patients suffering from neglected diseases because it rewards obtaining FDA marketing authorisation without regard for the products actually being available, affordable and equitably accessible for people.

Now, I would be lying if I told you that PRVs have not helped anyone. They have! But there have been enough cases of bad behavior here that it is worth wondering whether something better is possible. This is, in fact, being worked on, but it takes us off topic, so I’ve put some details about it in the footnotes1.

The second financial instrument I want to discuss is the CVR, or Contingent Value Right.

CVRs are simple. When an acquirer and a target company cannot agree on what a drug-in-development is worth—which is most of the time—they structure a simple conditional payment. If the acquired drug(s) hits a specified milestone, the acquirer pays the target's former shareholders an additional sum. Most CVRs are structured like normal pharma partnerships, as in, a closed, non-transferable contract between two partners. A small minority of them are structured as tradable securities, listed on the NYSE or Nasdaq with their own ticker symbols, but these aren’t particularly special beyond their raw size.
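A minimal sketch of why a CVR can close a deal that all-cash pricing cannot: the two sides value the same contingent payment differently because they disagree on the odds. Every number here is hypothetical, and `deal_value` is an illustrative helper rather than any standard formula from deal practice.

```python
def deal_value(upfront: float, cvr_payment: float, p_milestone: float) -> float:
    """Expected value of an upfront-plus-CVR offer, given a belief about the milestone's odds."""
    return upfront + p_milestone * cvr_payment

# Hypothetical disagreement about a drug-in-development.
buyer_max_cash = 1.5e9    # most the acquirer will pay in cash alone
seller_min_cash = 1.6e9   # least the target's shareholders will accept in cash alone
buyer_belief = 0.30       # acquirer's odds that the drug is approved
seller_belief = 0.70      # target shareholders' odds

upfront, cvr_payment = 1.0e9, 1.0e9
cost_to_buyer = deal_value(upfront, cvr_payment, buyer_belief)      # what the buyer thinks it pays
value_to_seller = deal_value(upfront, cvr_payment, seller_belief)   # what the seller thinks it gets

print(f"All-cash deal impossible: {buyer_max_cash < seller_min_cash}")
print(f"Buyer's expected cost:   ${cost_to_buyer / 1e9:.2f}B (limit ${buyer_max_cash / 1e9:.2f}B)")
print(f"Seller's expected value: ${value_to_seller / 1e9:.2f}B (floor ${seller_min_cash / 1e9:.2f}B)")
```

No single cash price satisfies both sides, but the upfront-plus-CVR package fits inside both sides' limits, each by its own estimate.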

What is most interesting about CVRs is the perverse incentives they sometimes create.

When Sanofi acquired Genzyme for $20 billion in 2011, Sanofi issued CVRs tied to the regulatory approval and commercial success of Lemtrada (alemtuzumab), an MS drug that Genzyme had been developing. Up to $3.8 billion was on the table if the drug hit its milestones. But Sanofi was also simultaneously developing its own MS drug, Aubagio. Aubagio had no CVR obligations attached to it.

Sanofi was now contractually obligated to compete vigorously against itself, on behalf of strangers, for free. Predictably, it did not.

Obviously, Sanofi was sued for this. The former shareholders alleged that Sanofi deliberately slow-walked Lemtrada's FDA submission and under-invested in its commercialization to minimize CVR payouts. But deliberate sabotage is hard to distinguish from ordinary sluggishness. Sanofi settled in 2019 for $315 million—well short of the $708 million in missed payouts the shareholders claimed—without admitting wrongdoing.

The pattern repeated more recently, and at an even larger scale, in 2019, with BMS's $74 billion acquisition of Celgene, in which $6.4 billion in CVR payouts hinged on three drugs winning FDA approval by fixed deadlines. Two were approved on time. The third missed by thirty-six days. As a result, the entire CVR expired worthless. As you might expect, former shareholders again sued.
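The Celgene-style structure is an all-or-nothing condition with hard deadlines, which is easy to write down and unforgiving in practice. The sketch below uses hypothetical milestone names, dates, and a hypothetical $10 payout per CVR; the dates are chosen only so that one milestone misses by 36 days, mirroring the case above.

```python
from datetime import date

def cvr_payout(milestones: dict, payout_per_cvr: float):
    """All-or-nothing CVR: pays only if every milestone is achieved on or before its deadline.

    milestones maps a name to (deadline, achieved_date), with achieved_date None if never achieved.
    Returns (payout, missed), where missed lists (name, days_late) for each failed milestone.
    """
    missed = []
    for name, (deadline, achieved) in milestones.items():
        if achieved is None:
            missed.append((name, None))                          # never achieved at all
        elif achieved > deadline:
            missed.append((name, (achieved - deadline).days))    # achieved, but late
    return (0.0 if missed else payout_per_cvr), missed

# Hypothetical milestones and dates, not the real Celgene terms.
milestones = {
    "drug_a_approval": (date(2030, 12, 31), date(2030, 6, 15)),
    "drug_b_approval": (date(2031, 3, 31), date(2031, 3, 20)),
    "drug_c_approval": (date(2030, 12, 31), date(2031, 2, 5)),
}
payout, missed = cvr_payout(milestones, payout_per_cvr=10.00)
print(f"Payout per CVR: ${payout:.2f}; missed: {missed}")
```

Two out of three on time pays exactly the same as zero out of three, which is why a few weeks of regulatory slippage can be worth billions to the acquirer.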

If we were to generalize this, the structural problem is simple: a CVR can make the buyer responsible for creating a payout that the buyer would rather not pay. But if that’s the case, why are CVRs, which are rapidly growing in popularity, used at all? For one, the above case studies are very much not the norm; most go perfectly fine. And two, the value of CVRs as a coordination mechanism, even when they go wrong, empirically outweighs the later headaches they cause.

Finance reaches failure itself: zombie biotechs

Royalty Pharma is not the only player in the royalty space. There are a few others, one of them named XOMA Royalty. XOMA is especially interesting because it was once a traditional biotech company that developed and licensed drugs. In 2017, it pivoted to become a royalty aggregator, and starting in 2024, it began to poke at the business of buying up, and liquidating, ‘zombie biotechs’.

Zombie biotechs are publicly traded companies whose market value is below the cash on their balance sheet. In effect, investors are saying that the company’s IP, patents, clinical data, team, all of it, is not only worthless but is actively destroying value by burning through cash that would be better off sitting underneath a bed. Roughly 300 companies fit this description in mid-2024, most of them casualties of the 2020-2021 IPO bubble, when a lot of biotechs went public that had no business doing so.
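In screening terms, a zombie is a company with negative enterprise value: market capitalization minus cash comes out below zero. A minimal sketch with made-up companies; also adding back debt is a common refinement and an assumption of this sketch, not part of the definition above.

```python
from dataclasses import dataclass

@dataclass
class Company:
    name: str
    market_cap: float   # share price x shares outstanding, in dollars
    cash: float         # cash and short-term investments on the balance sheet
    debt: float = 0.0   # total debt (assumption: included to get net cash)

    @property
    def enterprise_value(self) -> float:
        # What the market implies the business itself (IP, data, team) is worth.
        return self.market_cap - self.cash + self.debt

def is_zombie(c: Company) -> bool:
    """Trades below net cash, i.e. the market assigns the operating business a negative value."""
    return c.enterprise_value < 0

# Hypothetical companies for illustration only.
companies = [
    Company("HypotheticalBio A", market_cap=80e6, cash=150e6),
    Company("HypotheticalBio B", market_cap=400e6, cash=120e6, debt=20e6),
]
for c in companies:
    print(f"{c.name}: EV ${c.enterprise_value / 1e6:,.0f}M, zombie={is_zombie(c)}")
```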

These companies can’t raise equity (who would buy?), can’t take on debt (against what collateral?), and can’t be bought/merged with anyone (who would want them?). In an ideal world, the founders would simply put the whole business out of its misery, but they are still drawing paychecks from the dwindling cash reserves, and closing down a public company is a surprisingly legally fraught thing to do. So they just wander around as zombies.

XOMA’s insight was that this particular purgatory may itself be an asset class. They step in, acquire the company at or below cash value, and return cash to the shareholders who have been trapped in a slowly deflating stock for years. Then, they take a close look at everything the company created—patents, clinical data packages, licensing rights, partially completed regulatory filings—and sell it, keeping the profits for themselves. Or simply hold it, just in case it’ll be useful elsewhere in their portfolio.
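The economics of this approach can be sketched as a simple spread, with hypothetical numbers throughout: the acquirer pays roughly the company's cash, spends some money winding it down, and keeps whatever the leftover assets fetch. `vulture_spread` is an illustrative helper, not XOMA's actual deal math, and it ignores taxes, timing, and any contingent payments promised to shareholders.

```python
def vulture_spread(cash_acquired: float, purchase_price: float,
                   wind_down_costs: float, asset_sale_proceeds: float) -> float:
    """Acquirer's profit from buying a company near its cash, winding it down, and selling the leftovers."""
    return cash_acquired - purchase_price - wind_down_costs + asset_sale_proceeds

# Hypothetical zombie: $130M of cash, bought for $120M, $15M to shut down.
cash, price, wind_down = 130e6, 120e6, 15e6

for proceeds in (0.0, 30e6, 100e6):   # nothing sells / a modest sale / a lucky sale
    profit = vulture_spread(cash, price, wind_down, proceeds)
    print(f"Asset sales ${proceeds / 1e6:>5.0f}M -> acquirer profit ${profit / 1e6:>5.0f}M")
```

The downside is roughly capped at the wind-down costs minus the discount to cash, while the asset sales are where the upside lives.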

The concept itself is not new. This is vulture investing, translated into biotech. But whereas a typical vulture investor’s goal is to flip an entire company onto someone else, the biotech vulture capitalist’s hope is to sell off pieces of the company. And the pieces can be surprisingly valuable. A drug candidate that failed a Phase 2 trial in one indication can be worth millions to, say, some of the hub-and-spoke companies we discussed earlier. Maybe Roivant believes that the endpoint was misspecified, or the indication was wrong, or that the drug is indeed useless, but that the PK/PD, safety signals, biomarker responses, regulatory responses, and dose-response curves uncovered during the trial are useful, and they’d be willing to pay vast sums for that data. What XOMA does here is make this information legible to potential buyers.

To help illustrate this, let’s consider a case study: Kinnate Biopharma.

Kinnate was an oncology company developing kinase inhibitors for cancer patients with specific genetic mutations. As the story goes for many companies of that era, they went public and by early 2024 were trading below their cash balance. There was no outright clinical trial failure, they simply ran out of money to develop their drugs further. In February 2024, XOMA announced it would acquire Kinnate for roughly $2.50 per share in cash, or $126 million. Then, over the next year, XOMA sold all five of Kinnate’s pipeline assets to other companies. In April 2025, they announced the completion of these sales, with terms entitling XOMA to up to $270 million in upfront and milestone payments, plus, keeping to their name, ongoing royalties ranging from low single digits to mid-teens on commercial sales. Kinnate’s shareholders received most of the upfront payment, and XOMA got to double its money in flipping the assets.

What would the counterfactual be if XOMA had not stepped in? Kinnate would’ve continued to bleed cash until it ran out. At that point, the IP would have been worth even less—the utility of biological information depreciates fast!—and the shareholders would have gotten back even less, perhaps nothing at all.

There is another player in this space worth discussing: Kevin Tang, through Tang Capital and its acquisition vehicle Concentra Biosciences. By mid-2025, Concentra had become one of the busiest buyers in biotech, making repeated bids for distressed public companies, with the explicit intention of closing them down, selling whatever assets could still be sold, returning some cash to shareholders, and keeping whatever spread remained.

Isn’t this quite similar to XOMA? Yes: both XOMA and Concentra are buyers of distressed (sometimes very clearly distressed) biotechs. But the difference is when they arrive. XOMA typically shows up at the doorstep of companies that are clearly on death's door. But Concentra often arrives earlier, while the public company is technically alive and its board is still weighing bad alternatives: reverse merger, dilutive financing, slow wind-down, strategic review, or sale. And Concentra aggressively attempts to force the board’s hand into a sale to them.

To be fair, ‘force’ is a bit too strong a word here. A better term would be ‘an offer they can’t (easily) refuse’. Concentra’s pattern is to accumulate a large minority stake, make an unsolicited bid, and dare the board to explain why shareholders should keep funding the burn instead of taking cash now.

Why can’t they refuse it?

Consider Jounce Therapeutics. In February 2023, Jounce announced a reverse merger with Redx Pharma, alongside a 57% workforce reduction. This was not exactly a happy ending, but it was at least a biotech ending: Redx’s pipeline would become the core of the combined company, Jounce shareholders would own a minority stake, and some version of the organization would continue to exist. Then Concentra appeared with an offer that promised shareholders more cash, but one that would completely strip-mine Jounce and sell it off for parts.

Tang is not doing anything illegal here, nor are boards literally compelled to accept every higher bid that comes along. But once a company has put itself in sale mode, the board starts to look less like a steward of a scientific project and more like an auctioneer for whatever value remains. This creates a bleak asymmetry. A reverse merger can be better for the people inside the company, better for the local biotech ecosystem, and perhaps even better for the vague moral category of “letting the science continue.” But furthering any of that is not the board’s job. Its job is to ensure the shareholders are best served, and for them, Concentra’s highly liquid offer is difficult to argue against. In Jounce’s case, the Concentra transaction also came with an 84% workforce reduction. The board went with the Concentra offer.

Curiously, there are ways for companies to fight back against Concentra, and fight back they have. Their weapon is colloquially referred to as a ‘poison pill’, and goes as follows: if Tang keeps buying shares and crosses a threshold, usually around 10%, then every other shareholder receives the right to buy more stock at a discount, instantly diluting Tang. This does not resurrect the company, and it does not make Tang go away. It simply prevents Tang from buying enough stock in the open market to make liquidation feel inevitable before the board itself has decided it is inevitable.
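A toy calculation shows why the pill is such an effective deterrent. This is a minimal Python sketch, not a model of any actual Concentra situation: it assumes a simple ‘flip-in’ design in which the holder who crossed the threshold cannot exercise, every other holder exercises in full, and the share count and exercise ratio are invented for illustration. Real rights plans differ in their triggers, exercise prices, and ratios.

```python
def stake_after_pill(shares_outstanding: float, acquirer_fraction: float,
                     new_shares_per_share: float) -> float:
    """Acquirer's ownership fraction after a hypothetical flip-in rights plan triggers.

    Illustrative assumptions: the triggering holder's rights are void, and every
    other holder exercises in full, buying `new_shares_per_share` discounted new
    shares for each share already held.
    """
    acquirer_shares = shares_outstanding * acquirer_fraction
    other_shares = shares_outstanding - acquirer_shares
    newly_issued = other_shares * new_shares_per_share  # the acquirer receives none
    return acquirer_shares / (shares_outstanding + newly_issued)


# Hypothetical: 50M shares outstanding, an investor crosses a 10% trigger at 11%,
# and each other holder may buy one discounted share per share held.
before = 0.11
after = stake_after_pill(50_000_000, before, 1.0)
print(f"stake before trigger: {before:.1%}, after exercise: {after:.1%}")
```

Under these made-up numbers the investor’s stake roughly halves, from 11% to about 5.8%, without the investor selling a single share.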

This is all quite interesting. But it is likely a transient phenomenon. The zombie biotech liquidation market is a finite resource; the 300 companies trading below cash are overwhelmingly a product of the 2020-2021 vintage, a specific historical moment when the bar for going public was unusually low. That cohort is being worked through. Some will be acquired by the players discussed here. Others will manage to raise capital and survive. Most will simply wind down on their own, returning whatever cash remains to shareholders without the intermediation of a vulture buyer. Unless there’s another IPO bubble of comparable scale soon, the supply of zombie biotechs will shrink over the next few years, and the opportunity that is currently being exploited will narrow.

So why do I mention this at all?

The zombie biotech business is worth dwelling on because it marks a kind of endpoint. Every other instrument in this essay financializes drugs that might still become therapies. These are different: XOMA financializes drugs that won’t, and Concentra financializes drugs that likely won’t. If the frontier of financial creativity has reached the dead and dying, it tells you something about how thoroughly every other surface has already been colonized.

Conclusion: what does finance teach biotech to value, and should we worry?

Andrew Lo’s original insight was not that finance could make drug development easy. Nothing can make drug development easy. His insight was that finance might make failure survivable. I think this is directionally correct. Financialization is just the process of making implicit economic relationships explicit and tradable, and more liquid markets for biotech risk are almost certainly better than fewer. And what has happened since Lo’s 2012 paper is that financial engineering has been applied not just to the drug portfolio problem, but to every conceivable surface of the drug development process: partnerships, mergers, royalties, and even the death of a company.

Is this a bad thing? Probably not. Objecting to the decoupling of finance from therapeutic value is a bit sentimental, since, in theory, financial incentives should track therapeutic value. But how confident are we about that? Are we boiling a frog here? And if we are, what exactly is the frog?

Like I said at the start, it is important to understand that financial engineering is happening for a reason: this whole industry is excruciatingly difficult to build something in. It’s only getting worse too. Starved of capital, clever people will figure out ways to offer it in increasingly exotic forms to increasingly desperate scientists or companies, and little can prevent these two from finding each other at a bar. The alternative to a financialized biotech industry is not some prelapsarian era of pure scientific inquiry. It is the same industry, with the same problems, but less money and fewer ways to deploy it.

But let’s say we are being idealistic here. What, then, should worry us about financialization squishing its way deeper into drug development? I’m happy to raise my hand first: I'm a little worried about whatever private equity did to hospitals happening, in slower and less visible ways, to molecules themselves. Yes, I realize the nature of drug discovery imposes a constraint that most financialized industries don't have: the thing has to actually work. The FDA is a binary filter that no amount of financial engineering can route around, and as long as that's true, the typical finance-driven enshittification story shouldn’t apply here.

But "working" and "mattering" are not the same thing. For instance, you’ll notice that both Roivant’s and BridgeBio’s drug pipelines share a similarity: a focus on rare diseases. Finance people love rare diseases. Small trials, clear genetic etiology, often no existing standard of care, accelerated approval pathways, and excellent unit economics. This is fantastic for the several hundred, perhaps several thousand, patients helped by this work, and I don’t intend to minimize it. But would GLP-1s come out of this process? Would PrEP?

This doesn’t have to be a big deal. All of these could coexist. Big pharma and startups continue to make high-variance bets, the financialized folks stay low-variance, they work together when needed, the world is at peace. But capital is finite, and drug development keeps getting more expensive and less predictable. My worry is not that BridgeBio and Royalty Pharma are doing something bad. They aren’t, and are in fact doing something very good. The worry is that they are doing something so legible, so well-suited to the preferences of the capital markets, that the money increasingly, naturally flows to them and nowhere else.

Is this a real worry?

On one hand: obviously not. The sort of financialized rare disease work presented here may look quite good, but it still makes up an extremely small portion of biotech funding—around 2%. And it is not like Roivant or BridgeBio are poking at some genuinely undiscovered alpha. They are about a decade old, and despite their success, still don’t have many peers. Maybe this market is self-limiting. Maybe there are only so many BridgeBio-shaped opportunities in the world, and the rest of the biotech-earmarked dollars must go towards the higher-variance stuff.

On the other hand, the counterargument is the patent cliff. Between 2025 and 2030, patents for nearly 200 drugs are set to expire, including roughly 70 blockbusters. More than $300 billion in revenue is at risk, or about one-sixth of the industry’s annual revenue. Patent cliffs are normal, but this one is unusually large, weighing in at three times the size of the cliffs of the 2010s in lost revenue. Five of the top 10 pharmaceutical firms face a potential hit exceeding 50% of their current revenue.

What changes after an event like that? Perhaps Big Pharma will increasingly look towards easier, lower-risk/lower-reward diseases. Maybe they’ll be increasingly sympathetic to royalty and synthetic royalty funding agreements, further cutting into the economics of a drug. Maybe this leaks over into the public markets, and the diffuse preferences of a thousand allocators would rather fund the pharmaceutical companies that go down that path than those that continue with the status quo.

The frog is not any single drug or company. It is the industry’s willingness to fund biology that is illegible, expensive, and likely to fail, which is to say, the kind that occasionally changes the world. Again: financial engineering did not create this problem—that fault can be attributed entirely to R&D productivity decline. In fact, the financiers may even be an especially brave vanguard in giving biotech the veneer of being a viable asset class. But they still may wind up making the response to the underlying problem worse by offering a way to achieve returns in ways that slowly diminish our institutional capacity to create the next generation of revolutionary medicines.

To end this off: I have deliberately left out China, which may be the most aggressive current example of financial architecture shaping a drug pipeline. That deserves its own essay, and will get one soon.

Edit: The follow-up to this article has been published: How financial architectures shaped (and will continue to shape) Chinese drug development.

In June 2025, the CNPV (Commissioner’s National Priority Voucher) was announced by FDA Commissioner Makary, and represents a brave new direction of the concept: a non-transferable voucher that can be used for a 1-2 month review period and is awarded based on alignment with “critical U.S. national health priorities.” What does this mean? Nobody knows!

What we do know is that 18 vouchers have been awarded so far, 4 products have been approved through the program, and the whole thing has basically zero external visibility. If you go online, there is a lot of distaste about the whole thing, including two lawmakers who expressed that the program could “enable corruption by creating a new, lucrative gift for drugmakers and allies politically favored by President Trump.” I get it. But I think there is actually some utility in drug approval processes that are bespoke enough to let the federal government both accommodate practical constraints—manufacturing limitations, supply chain fragility—and extract concessions like price adjustments in return for regulatory speed. Obviously not ideal that such a program exists in the context of the volatile current administration, but I’m not especially opposed to a ‘we’ll fast-track good stuff through an opaque review process’ setup.

The printing press for biological data (Sterling Hooten)

Abhishaike Mahajan — Mon, 20 Apr 2026 14:11:56 GMT

Watch on YouTube, Apple Podcasts, or Spotify.

Introduction

After having written long-form essays over a weirdly diverse number of areas of the life-sciences, I am increasingly confident in my status as someone who knows a little about a lot of things. But every now and then, you meet someone who casually reveals an entire subfield that, up until your conversation with them, you’d never even thought about. This happened to me when I met Sterling a few months back. We met in the elevator as we were both leaving an event, and by the time we’d reached the bottom floor, the conversation had become so interesting that we stood in the lobby for an hour as I pestered him with more and more questions.

Sterling runs a company called Iku Bio. Iku ostensibly does something quite simple: it helps biologics manufacturers figure out what to feed their cells. This is called media optimization, and it is done in an astonishingly old-fashioned way. An engineer runs a handful of experiments in a benchtop bioreactor the size of a Fiji water bottle, waits days for analytical results, and repeats, maybe three or four times before timelines force them to stop searching.

Sterling’s solution was to use printed circuit boards (PCBs)—the same green wafers inside your phone and your microwave—as the substrate for microfluidic bioreactors. Because PCBs are made via lithography, you get complexity for free. Because they’re already mass-manufactured at planetary scale, you inherit sixty years of cost optimization. And because they’re literally designed to carry electrical signals, you can embed sensors directly into the thing rather than cramming them in after the fact.

The result is a device that costs $8 per experimental lane versus $20,000 for the nearest comparable microfluidic system. And there are many, many ways for it to improve from here on out.
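To put the quoted per-lane prices in perspective, here is a back-of-the-envelope Python sketch. It treats each figure as a flat price per lane and ignores reagents, labor, analytics, and reuse, none of which are quantified here; the $100,000 budget is an arbitrary example.

```python
IKU_USD_PER_LANE = 8               # figure quoted above
COMPARABLE_USD_PER_LANE = 20_000   # nearest comparable microfluidic system, as quoted above


def lanes_affordable(budget_usd: int, usd_per_lane: int) -> int:
    """Whole experimental lanes a fixed budget buys, counting only the per-lane price."""
    return budget_usd // usd_per_lane


budget = 100_000  # hypothetical hardware budget in USD
print("Iku-style lanes:", lanes_affordable(budget, IKU_USD_PER_LANE))
print("comparable-system lanes:", lanes_affordable(budget, COMPARABLE_USD_PER_LANE))
print("price ratio:", COMPARABLE_USD_PER_LANE / IKU_USD_PER_LANE)
```

Under those assumptions, the same $100,000 buys 12,500 lanes instead of 5, a 2,500-fold difference in per-lane price.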

This conversation covers the full stack: what cell culture media actually is and why it’s so much more than sugar water, why biologics manufacturing has more in common with semiconductor fabs than chemistry labs, how Sterling arrived at PCBs, and at the end of the talk, why he thinks a fair bit of lab automation is “philosophically a crime.”

Timestamps

[00:00:48] Introduction

[00:01:26] What is Iku Bio?

[00:05:00] Media optimization as the biggest lever

[00:06:23] What actually is media?

[00:13:07] Fetal bovine serum and the move to synthetic media

[00:15:10] Walk me through a media optimization workflow

[00:18:49] Why biologics manufacturing is closer to semiconductors than chemistry

[00:21:50] Matching the phase three batch and generics

[00:24:12] The 200-dimensional search space

[00:37:02] Printed circuit boards as a medium for microfluidics, and the utility of lithography

[00:40:48] Anatomy of the Iku device

[00:57:09] What sensors are on the device today?

[01:01:36] How do you use the Iku device to perform media optimization?

[01:14:44] Does media optimization survive scale-up?

[01:24:32] $8/lane vs. $20,000/lane: the economic utility of Iku’s device

[01:32:05] Why PCB microfluidics didn’t exist 10 years ago

[01:39:24] Who is the customer?

[01:43:14] What is the ultimate goal of Iku?

[01:49:07] What does the validation evidence need to look like?

[01:52:14] What would you do with $100M equity-free?

[01:57:31] Lab automation is in a strange place right now

Transcript

[00:00:48] Introduction

Abhi: Today my guest is Sterling Hooten. Sterling is the founder of Iku Bio, where he is building a microfluidic bioreactor built on a printed circuit board that cultures, senses, and streams biological data in real time, claiming 10,000x higher experimental throughput at a 100x lower cost. It is one of the most niche areas of wet lab automation that I think I’ve ever discussed on this podcast, and I don’t think I would’ve ever learned about it had I not stumbled across Sterling at an event a few months back where we had a conversation that was so fascinating that I immediately wished we had filmed it. Sterling, welcome to the podcast.

Sterling: Thank you for having me. Very big fan. Really enjoy your articles.

[00:01:26] What is Iku Bio?

Abhi: Thank you. So I’ve given a brief introduction of what you’re working on at Iku, but I’m sure I oversimplified some things. I’d like to hear your own pitch for what you’re doing there and why is it so valuable.

Sterling: So the largest problems of the 21st century — things in medicine, for climate, for material optimization — all of these are predicated on our ability to manipulate and control living matter. So advancing our understanding of biology is just so fundamental to these problems in the future, and yet the tools that we use right now to interact with biology are primitive. They’re primitive in an absolute sense, and they’re primitive in a relative sense to what we could be doing. At its core, biology is time varying, it’s parallel, and it’s sensitive. And yet the tools that we use right now — that interface destroys at least one of those properties. And in principle, advances in AI also would be an excellent connection with biology. But that interface is fundamentally broken. So lab automation right now is stuck at the Petri dish and the microtiter plate level. It’s equivalent to handwriting manuscripts in the 15th century, sometimes. And so what we’re building is a printing press for biological data. And the way that we’re doing that is we’re rethinking that interface between compute and biology, and we’re replacing traditional microfluidics with a printed circuit board that allows you to embed the fluidics — cells can live inside of it. And that allows you to communicate and control cells in a way that has not been possible before at high throughput. And the largest application that we see for that is in biologics manufacturing. Right now, biologics — it’s a half a trillion dollar industry and it’s supply limited. So every year, Samsung Biologics has to build a new $400 million facility. The reason they’re doing that is because you can only get so much out of a traditional fab plant. They’re closer to silicon fabs actually. And the largest lever that they have is in yield — so how much can you get out of these things, are they producing, and also what are the costs. 
The core of that comes down to literally how many of these dynamic cell culture experiments can you run. And that’s a process called media optimization. And it ends up that that one problem ends up being connected to this half a trillion dollar industry.

[00:05:00] Media optimization as the biggest lever

Abhi: So to paraphrase, if I wanted to increase biologics manufacturing by an order of magnitude — at least my capacity to produce like antibodies and the like — the lever that is most easily pushed on and most likely to give you the most bang for your buck is media optimization.

Sterling: It is the most bang for your buck. You are unlikely to get 10x on that. What you’re looking at is how much can I produce per unit time, and then how consistent is that. And if you can produce more per unit time, you get higher throughput for the entire facility. And then if you have more stability in the product — for biologics and for things that go in our bodies — that’s a desirable outcome.
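Sterling’s two levers, output per unit time and consistency, can be made concrete with a rough facility calculation. The Python sketch below is an illustration under stated assumptions, not anything Iku or a manufacturer has described: titer (grams of product per litre of culture), working volume, batch length, batch success rate, and operating days are all hypothetical placeholders.

```python
def annual_output_kg(titer_g_per_l: float, working_volume_l: float, batch_days: int,
                     success_rate: float, operating_days: int = 330) -> float:
    """Rough annual product mass from one bioreactor.

    All inputs are hypothetical placeholders: titer (g of product per litre),
    working volume (L), days per batch, fraction of batches that pass release,
    and operating days per year.
    """
    batches_per_year = operating_days // batch_days  # only whole batches count
    grams = titer_g_per_l * working_volume_l * batches_per_year * success_rate
    return grams / 1000.0  # grams -> kilograms


baseline = annual_output_kg(3.0, 10_000, 14, 0.95)
higher_titer = annual_output_kg(4.0, 10_000, 14, 0.95)     # media raises productivity
more_consistent = annual_output_kg(3.0, 10_000, 14, 0.99)  # fewer lost batches
print(f"{baseline:.1f} kg, {higher_titer:.1f} kg, {more_consistent:.1f} kg")
```

With these placeholder numbers, raising titer by a third adds far more output than cutting lost batches from 5% to 1%. The consistency lever still matters for reasons the sketch ignores, such as the cost of each failed batch.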

Abhi: And so my conception of these bioreactors that are producing antibodies is you have a bunch of CHO cells maybe sitting in a very large tank. They’re sitting in a fluid of media and they’re constantly just excreting out these antibodies that are later purified. Iku comes in at the step of deciding what media to actually put into this tank. Is that fair to say?

Sterling: Correct. Yeah.

Abhi: What is — well, like I’ve never worked in a wet lab before.

[00:06:23] What actually is media?

Abhi: My conception of media is that it is sugar water that cells are generally fine with drinking up. I’ve learned that this is incorrect and I’d like to hear your take for what actually is media.

Sterling: I would say that that is a very limited view of what media is — not incorrect in that, if we were talking about media for growing yeast, sugar in water is pretty close to sufficient. But the more powerful way of thinking about media is that it is a very high dimensional control surface for what you can get cells to do, right? Cellular communication comes through things in the media, right? The media actually is the communication channel in a sense between cells. It’s also what carries nutrients into the cells. In mammalian cell culture, it’s closer to serum in blood. So it has either many different types of proteins in it. It’ll have different metabolites. It’ll have salts. In defined media it’ll have buffers to keep the pH. It basically has a lot of components — and there are hundreds of them really, down to things like magnesium. And each of these are really communicating and interacting with the cells. And they also work across different time periods. So you’ll have growth media, which is when you’re building up the cells, and then there’s media when you really just want them producing these particular things. And right now, if you buy or produce media internally, it tends to be connected to a particular clone or particular cell line. And so you will optimize the media for that particular cell line, or you’ll optimize media for — if you’re growing neurons. And so every — it’s complicated enough and important enough to the results that you get that exploring it is very valuable.

Abhi: Like I know that there are a few companies that have popped up claiming to technically redesign cell lines to make them better at biologics manufacturing. Does that also demand a change in media?

Sterling: It can demand — the key thing is that the biologics that we are producing now are becoming more complicated, and that is making media optimization more difficult. So you do tend to pair the cell line with a media line, both for repeatability and ease of use, also just for commercial reasons — that’s a better business. But you can — what really happens is you tend to take a standard growth media or something off the shelf, and then you will customize it for this particular thing that you’re trying to make. Because ultimately, productivity is really the interaction of these three or four things: it’s the cell line, it’s the media, it’s the process conditions or the tank that you put it in, and then the actual compound of interest and things that you’re trying to do.

Abhi: You mentioned earlier about like media is both a way — like nutrients for the cell — but is also the substrate upon which they actually communicate with each other. That second part was surprising to me. I did not naturally conceptualize cells in a tank actually talking to each other while they’re churning out antibodies. What are they communicating exactly? Does that question make sense?

Sterling: I think it’s maybe easier to think about it in the sense of our bodies, right? Cells will send out or communicate through different hormones, right? Those will get released. There are small signaling molecules that get broadcast — those are carried through the media. Well, in the body we call it blood serum, right? But in the sense, it’s media.

Abhi: You mentioned also that you have different stages of media that you want to introduce to the cells depending on the cell’s actual life cycle. Is that also true for serum in the human body? Does the body constantly adjust its own serum to whatever the cells need?

Sterling: Yeah. I mean, that is the way that cells differentiate, in a way. You’ve got some gradient that will happen, and then that gradient — that’s basically saying you’ve got different media, and that gradient can tell cells how to orient or can tell cells how to develop. And from stem cells, triggering when — what they’re going to end up being — that’s also basically — it becomes media as you add things into the cell environment there.

Abhi: So why — what’s stopping me from just replicating human serum for mammalian cells? Is that not the best substrate to use?

Sterling: Well, the first question is, where are you gonna get it?

Abhi: Well — I guess this is a more basic question. Do we understand human serum well enough to perfectly replicate it?

Sterling: Replicate it? I don’t know. What I will say — and that gets closer to what you were talking about originally — is that’s what we’ve been doing historically. But instead of using humans, which — not that — very limited supply, or limited willing supply —

[00:13:07] Fetal bovine serum and the move to synthetic media

Sterling: we’ve been using fetal bovine serum, so from calves. There are problems with that. It is highly variable. And for all of biologics manufacturing, the goal is reduce variability. And if one of your largest inputs is variable, that’s a problem. It’s also a challenge because things like — you can’t sterilize it in the traditional way. You can filter it, but you can’t heat it up without destroying — and things like prions, which could be quite bad, you would need to prevent those coming in. So the industry has really moved much towards formulated medias. So you’re building it up from the constituent parts, and that also allows you to — it reduces variation and gives you a lot more control over how you are particularly tuning that media.

Abhi: When you say like at some point fetal bovine serum was being used —

Sterling: Still. It is still in use. It’s mainly in use in research. I think — I’m — maybe there are some biologics manufacturers who are using fetal bovine serum. I don’t know. But I think the industry has pretty much moved to —

Abhi: At this point, would you consider that the synthetic serums that are attempting to recapitulate the biochemical properties of fetal bovine serum — the synthetic stuff is better? Or is it just like it’s easier to get, so you’re okay with not perfectly recapturing fetal bovine serum?

Sterling: I think it’s better.

Abhi: Okay.

Sterling: I think it’s better, and I think it’s better in that you again get to tune it.

Abhi: And so attempting to be more concrete about —

[00:15:10] Walk me through a media optimization workflow

Abhi: what is a media optimization engineer exactly doing? Let’s say I have a plate of CHO cells. I want to produce Keytruda, so pembro. I have a bunch of cells. I have all of them willing to produce the drug. They’ve been genetically edited to do that. What’s the next step?

Sterling: So the process in general is guess and check. So you will take a cell line that you’ve edited or produced for this. Most of the time it’s just — and then you’ll take it out from the freezer. You’re gonna grow it up a little bit. And then you will probably take four or five of those because you don’t kind of know yet, right — which particular strain will do best.

Abhi: So you’re trying with multiple strains.

Sterling: You’re gonna try with multiple strains. And then you will run experiments that allow you to — first you’re gonna run in microtiter plates normally, right. And you’re going to just see where are we, which of these cell lines seems like it fits best with these. After you’ve narrowed it down, you’re going to move to something that has more control. And the reason that you’re gonna move to something that has more control is that what happens in a microtiter plate is extremely disconnected from what happens in any kind of production environment. And the core reason for that has to do with flow. So in a microtiter plate, you get a lot of capillary issues, right? It changes the — you’ve got the surface tension kind of comes up, that changes the gas exchange rates. You get evaporation. And you don’t get any of the different gradients or different little bits of shear forces — all these things that actually affect how cells grow in large reactors. So what you do is you put it into what’s called a benchtop bioreactor. And so this is a little bit bigger than a Fiji bottle in terms of what it’ll contain, and it’s got an impeller in there and it’ll spin it around. So now you’re going to grow those cells in that media for 10 days or something, right? And during that time, you’re going to also change or control the pH level that’s in there. You’re going to control the temperature. You’ll set different impeller rates, seeing what’s optimal. And you’re going to run that for — one person can maybe run 12 of those experiments, 15 of those experiments. It’s pretty laborious right now to actually set those up. It’s gonna run, and during that time, you’re gonna pull off some samples. You’ll take those to the analytics section, depending on how booked up that is — that could be three days to a week sometimes to get all of your answers there. And then you’ll do that.
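The workflow Sterling describes can be tallied with the ranges he gives: 12 to 15 benchtop runs per person, about 10 days of culture, three days to a week for analytics, and three or four rounds in total (the round count comes from his next answer). The Python sketch below is a simplification that assumes rounds run strictly back to back with no time lost to setup or redesign.

```python
from dataclasses import dataclass


@dataclass
class MediaCampaign:
    """Rough model of a manual benchtop-bioreactor media optimization campaign."""
    reactors_per_round: int  # experiments one person can run in parallel
    culture_days: int        # length of each culture run
    analytics_days: int      # wait for sample results before redesigning
    rounds: int              # iterations before timelines force a stop

    def total_experiments(self) -> int:
        return self.reactors_per_round * self.rounds

    def calendar_days(self) -> int:
        # Assumes rounds are strictly sequential and redesign time is negligible.
        return self.rounds * (self.culture_days + self.analytics_days)


optimistic = MediaCampaign(reactors_per_round=15, culture_days=10, analytics_days=3, rounds=4)
pessimistic = MediaCampaign(reactors_per_round=12, culture_days=10, analytics_days=7, rounds=3)
for name, c in [("optimistic", optimistic), ("pessimistic", pessimistic)]:
    print(name, c.total_experiments(), "experiments in", c.calendar_days(), "days")
```

Either way, a little over seven weeks of wall-clock time covers only a few dozen media conditions (60 in the optimistic case, 36 in the pessimistic one).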

[00:18:49] Why biologics manufacturing is closer to semiconductors than chemistry

Abhi: I’m sorry, what questions are you asking at that point? What are the samples meant to answer?

Sterling: So ultimately, your sample is meant to answer how much total biologic did we produce in here, at what quality, right? And then the other question there is how overall — how consistent is it? Will it be — that’s actually a large sort of hidden cost, as I said. The best way to think about biologics manufacturing is to think about it as high precision manufacturing, closer to semiconductor manufacturing. That’s really the reason why Samsung Biologics is in the position that they are — because they took what they learned in terms of process control and brought that over. The reason that Fujifilm is a large manufacturer is because they took chemical process engineering and brought it over. Now, these were not biological companies, right? They are industrial manufacturing companies. And when you think about reducing process variability, one way of looking at that is how precise is the part that comes out. But then what makes up that, right, is like how much variation can we absorb without it affecting the end product? And so if you can come up with media and process conditions that are more forgiving, you’re relaxing it a bit, right? You can still end up with something that’s very precise at the end, but oh, we didn’t actually need as much — we were more forgiving over here. And that can be important because if you lose a batch of biologics, it’s very expensive. And that can happen. And it does happen. And so the way to reduce that is through media optimization. And so to finish on this — you’ve run that set of experiments, you’ve got your readout there. And those readouts, although those are the most important, you’re also going to characterize kind of everything in there that you can, because you want to see how those are affecting that actual result. Then you will repeat this. And depending on how much time you have, maybe you will get three or four runs at that, and then that’s it. And that comes down for biologics manufacturing to the regulatory reasons.
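Sterling’s point about ‘forgiving’ media is essentially about tolerance windows. As a hedged illustration, the Python sketch below assumes a single process variable that is normally distributed around its target and treats a batch as lost whenever that variable leaves the tolerance window. Real processes have many interacting variables and rarely fail this cleanly.

```python
import math


def out_of_spec_probability(tolerance: float, sigma: float) -> float:
    """P(|X - target| > tolerance) when X ~ Normal(target, sigma).

    X stands in for one process variable (e.g. pH drift); a more forgiving
    media is modeled, by assumption, as a wider tolerance window.
    """
    return math.erfc(tolerance / (sigma * math.sqrt(2)))


sigma = 1.0  # process spread, arbitrary units
for tol in (2.0, 3.0):
    p = out_of_spec_probability(tol, sigma)
    print(f"tolerance = {tol:.0f} sigma -> {p:.2%} of batches out of spec")
```

Widening the window from two to three standard deviations cuts the out-of-spec rate from about 4.6% to about 0.3%, which matters when a single lost batch is very expensive.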

[00:21:50] Matching the phase three batch and generics

Abhi: So how much of — would you say the optimal cell lines and the optimal media — it’s like there is a threshold of quality you want to meet and after that you’re done, versus you are trying to make this as perfect as possible? Is it kind of dependent on what drug you’re trying to produce?

Sterling: I think the goal is match what was in the phase three trials. So in the process of taking a drug to market, during your phase three trials, the batch that you produced there — that is what all of the FDA’s evaluation was based on. So they want to keep that the same. So anything that deviates from that is undesirable.

Abhi: Is this true even when the drug goes off patent and the generics manufacturers — are they trying to make it even — they’re trying to improve the process even more, or even for them, they’re trying to replicate exactly what went on with the original company?

Sterling: That is a great question. I should look into that because — no, truly, because they do have to go through — so they have a couple options. The first thing is that they will basically just license the cell line and the media from the existing pharma company, right? Pay them for that. And then that way the pharma company can still get some revenue from that. The alternative is they need to come up with their own cell line and — I think the regulations are such that there’s a way of — I think it’s like if you can prove that it’s similar enough, then it just counts as a process change.

[00:24:12] The 200-dimensional search space

Abhi: And getting back to the question of actual media optimization — the media optimization person goes to the analytical chemist. The chemist tells you all you need to know about the samples that you’ve been given. You repeat this five to six times. What are the levers of change that you have over the media?

Sterling: So media is best thought of as this control surface for affecting what the cells are doing. What are the levers in there? You can change the components, and then you can change the concentration of those components, and then you can change timing of those things. And if you start with 200 or more — let’s start with 200 components that you could put in there, and then the different concentrations that they come in, and then the timing — that already is quite a large space to explore. Then you have that interacting with the cell and the different cell lines — larger space. And then with that fixed compound that you’re looking for. So the standard things that people are going to change or tune, right, is when is a carbon source coming in, and when — as you start producing different proteins, the needs of the cell change. So if you shift into a different mode for the cell — you can signal it to shift into a different mode, starts producing these other — all of a sudden its needs change.

Abhi: Mm-hmm.

Sterling: And being able to anticipate, buffer, and meet those needs — that then has a lot to do with the output.

Abhi: How much of the optimization, even the direction or specifics of it, can be known theoretically and applied, versus always being determined empirically? The more specific question is: does a media optimization engineer come to every new problem as a tabula rasa, where whatever experience they had in the past does not apply to this new cell line with this new drug?

Sterling: On how tractable a problem this is and what the current state of the art is: the state of the art is that best practices live in the minds of the practitioners. A lot of that comes down to familiarity with the cell line and familiarity with the media they already have. Most manufacturers work in a particular domain or specialty, right? So they’re constraining the search space, and that does make it easier to operate in. However, you will not one-shot it. The second thing is that it’s actually fairly easy to get caught in a local maximum, and if experiments are expensive or otherwise precious, you’re not going to push very far out. The lever they currently use is mainly strain engineering: they try to select the strains with the highest performance. But once the cells you’re using are set, optimization comes down entirely to the media. In a modeling sense, it does seem tractable, and it does seem like there’s transfer learning. How broad that transfer is really comes down to how many experiments we’ve been able to feed into these models so far, and the answer is not very many. The largest facility I know of for running dynamic cell culture experiments can run about 300.
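The local-maximum trap can be seen in a toy one-dimensional response curve. The curve below is invented for illustration (a modest peak near 0.25 and a taller one near 0.8, in arbitrary units), and greedy hill climbing stands in for any search that only ever moves to a better neighboring setting. Where it ends up depends entirely on where it starts.

```python
import math


def response(x: float) -> float:
    """Hypothetical 1-D titer response: a modest peak near x=0.25, a taller one near x=0.8."""
    return math.exp(-((x - 0.25) / 0.1) ** 2) + 2.0 * math.exp(-((x - 0.8) / 0.1) ** 2)


GRID = [i / 100 for i in range(101)]  # candidate settings of one component, 0.00 to 1.00


def hill_climb(start: int) -> int:
    """Greedy local search: step to the better neighbor until neither neighbor improves."""
    i = start
    while True:
        neighbors = [j for j in (i - 1, i + 1) if 0 <= j < len(GRID)]
        best = max(neighbors, key=lambda j: response(GRID[j]))
        if response(GRID[best]) <= response(GRID[i]):
            return i  # no uphill move left: a (possibly local) maximum
        i = best


print(GRID[hill_climb(10)], GRID[hill_climb(60)])  # 0.25 0.8
```

Starting at 0.10 the search stops at the smaller peak; starting at 0.60 it finds the taller one. When each step costs a bioreactor run, there is little budget for restarting from elsewhere.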

Abhi: In parallel at any given time?

Sterling: Yeah. 300. And that’s like, the entire company is just doing that. So that’s the state of the art. And a lot of that comes back to the fact that it’s so manual.

Abhi: One last question before we move on to how Iku is fixing this. I can understand easily modifying the concentrations in the media, and I understand modifying the timing of when you give which media to the cell line. The constituent components feel a lot more complicated, because that’s 200 components. In practice, are there maybe 10 of them you modify at any given time, while the other 190 are pretty standard and every cell line needs them?

Sterling: Yeah. So how much is like — what’s the core? Is there some —

Abhi: Dimensionality reduction?

Sterling: Yeah, is there an 80/20 thing going on? Oh yeah, absolutely. As I said, your sugar or carbon source, your energy source, the pH you’re running at: there are probably 10 that dominate. But that’s exactly why it’s so challenging. There are 10 that dominate, but because the system we’re controlling is quite nonlinear, under certain conditions it can amplify what looks like a small change. My favorite example of this, which was in industrial manufacturing, is that just changing the amount of magnesium at a particular point doubled the output. And there was no a priori way of knowing that it would be magnesium. You can say, okay, sure, that’s a lever, and we should tune it every time. But the problem is that the same potential exists for all of the other 190 things, right? So sure, there are these core things that tend to dominate...
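A minimal sketch of how a minor component can matter, assuming a switch-like (logistic) response to magnesium. The threshold, steepness, and units are all hypothetical and are not meant to describe the industrial example above. The same 0.1 change does almost nothing far from the threshold and nearly doubles output when it straddles it.

```python
import math


def relative_output(mg: float, threshold: float = 0.5, steepness: float = 100.0) -> float:
    """Toy switch-like response: output roughly doubles once magnesium crosses a threshold.
    All parameters are hypothetical, in arbitrary units."""
    return 1.0 + 1.0 / (1.0 + math.exp(-steepness * (mg - threshold)))


far = relative_output(0.3) / relative_output(0.2)     # a 0.1 step far from the threshold
near = relative_output(0.55) / relative_output(0.45)  # the same 0.1 step, straddling it
print(f"{far:.3f} {near:.3f}")  # 1.000 1.980
```

This is also why a one-at-a-time screen run at the wrong operating point can conclude that magnesium does not matter.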

Abhi: But those 10 things could vary based on what the problem actually is.

Sterling: Yeah. Well, the core things, like the salts that are in there and when energy comes into the system, are definitely floor level. You have to figure those out, because they control where the floor is. If you get them wrong, it doesn’t really matter what you do in the other areas; you’re not going to have high performance. But getting them right doesn’t mean you’ll have high performance at all. They’re just table stakes. You need to get those done.
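One way to picture floor-level parameters is a law-of-the-minimum model. That framing is ours, not something stated in the interview: the worst table-stakes parameter caps performance, and gains from tuning everything else only multiply whatever the floor allows. Parameter names and scores below are hypothetical.

```python
def titer_with_floor(floor_scores: dict[str, float], tuning_gain: float) -> float:
    """Hypothetical 'table stakes' model.

    floor_scores: 0..1 scores for floor-level parameters (salts, energy feed, pH);
        the worst one caps the outcome, as in a law of the minimum.
    tuning_gain: fractional improvement from tuning everything else (1.0 = doubled).
    """
    return min(floor_scores.values()) * (1.0 + tuning_gain)


good_floor = {"salts": 1.0, "energy_feed": 1.0, "pH": 1.0}
bad_floor = {"salts": 0.2, "energy_feed": 1.0, "pH": 1.0}

print(titer_with_floor(good_floor, 0.0))  # 1.0: floor right, no tuning yet
print(titer_with_floor(good_floor, 1.0))  # 2.0: floor right, tuning pays off
print(titer_with_floor(bad_floor, 1.0))   # 0.4: the same tuning cannot rescue a bad floor
```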

Abhi: That makes sense. And so we mentioned this engineer who’s trying to produce Keytruda.

Sterling: Sure.

Abhi: Presumably they’re starting out in a bioreactor the size of a Fiji water bottle.

Sterling: Yep.

Abhi: Doing these rounds of iteration, trying to get to something good. What is Iku’s proposal for a better way to do it?

Sterling: Our proposal is to rethink what you’re trying to do when you run that experiment. That Fiji bottle device gets used for two purposes. One is growing cells to feed a seed train: you need that quantity of cells. The second is that you need information, and to get it you need to be able to control the cells’ environment over time. For the first purpose, growing a lot of cells, it’s a perfect fit. If you’re trying to extract the most information and control the cells, it’s a very limited way of doing it. Before starting on any of this, I’d seen some of these benchtop reactors, and I asked the people behind them: if the thesis is that it gets better as you go smaller, why did you stop at the Fiji bottle? The answer was, well, if we go any smaller, our sensors won’t fit. That’s because they’re using off-the-shelf sensors. If you ever see a photo of these things, it’s a hodgepodge of different parts crammed in there. Sensor design is its own field, and you need to design not just one type of sensor but many different types. There’s also not much benefit to going from a Fiji bottle to half a Fiji bottle, because of the manual labor and everything else. So our solution is to ask what the best platform for building sensors actually is, and then whether you can put cells inside it. My last company was a robotics company. Any of the humanoids you see now (I’m highly skeptical of the economics on those), the core technology that lets them move and interact with the environment is what we built. And that is a sensor problem: sensing in a high-noise environment.
That is abstractly quite close to what we’re doing in biology, right? The idea is that if you have a good place for building and placing sensors of different types, you’ve reduced the problem: now you just have to figure out how to grow cells inside it and keep them alive. And if you pick a mass-manufacturable technique for doing that, it also solves some of the scaling problems. The challenge with controllable systems right now is that they still literally require somebody to come over, unhook everything, and set it up. You can use disposables to cut that down a bit. But when you go larger, it takes more media, it’s more expensive to run, and it’s less repeatable. None of that makes sense, except that the alternative is a difficult engineering problem.

Abhi: In a practical sense, I can buy that this form factor was chosen purely because the sensors aren’t small enough to fit in anything smaller. What is the form factor you guys have?

[00:37:02] Printed circuit boards as a medium for microfluidics, and the utility of lithography

Sterling: The core differentiator is that we reuse printed circuit boards, which are ubiquitous: they’re in your phone and in your microwave. We put microfluidic channels inside them, which lets cells live inside; they can pass through or stay there. It turns out that making microfluidics that integrate those kinds of sensors has previously been extremely awkward, so either you don’t do it, or if you do, it’s still hand-finished. Our big differentiator is that everything comes straight from the fabricator, ready to go. This has happened before. Take silicon photonics, where you take existing silicon fabs and ask: can we use these in a new way, not just for integrated circuits, but to do things with light? Or the light detector in your iPhone, which was a new way of using that same process. The core process is called lithography. You take a mask, kind of like a snowflake cut-out, and project light through it, which causes some areas of a light-sensitive layer to react and others not. Lithography is a really powerful manufacturing technique because you get complexity for free. Normally, with traditional subtractive manufacturing, the more complex your part gets (more nooks and crannies), the more time it takes to make, the more tool changes you need, and so on. With lithography, you pay that cost once, when you make your snowflake, and it doesn’t matter how complicated the snowflake is for the parts you make from it. So it pushes you to ask: what’s the most complicated, most valuable thing we can make here? Because it literally costs the same whether it’s one line or a complicated maze. That’s what semiconductors do. Then they applied it to photonics, LIDAR for example. Printed circuit boards are made the same way; it’s lithography. If you can leverage that in more complicated ways, you both enable capabilities that weren’t possible before and ride a cost curve that’s really beneficial. Every time we as a society have found a new use for lithography, large industries have been built on it.
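The “complexity for free” argument can be written as two cost functions. Every number here is a hypothetical placeholder: machining charges per feature on every part, while lithography pays for complexity once, in the mask, so its per-part cost does not depend on feature count. Very dense masks can cost more in practice; this sketch ignores that, as the argument above does.

```python
def machining_cost(n_parts: int, n_features: int,
                   setup: float = 500.0, per_feature: float = 2.0) -> float:
    """Subtractive manufacturing: every part pays for every feature
    (more toolpaths, more tool changes)."""
    return setup + n_parts * n_features * per_feature


def lithography_cost(n_parts: int, n_features: int,
                     mask: float = 2000.0, per_part: float = 4.0) -> float:
    """Lithography: complexity is paid for once, in the mask. n_features is accepted
    only to mirror machining_cost; it does not change the result."""
    return mask + n_parts * per_part


for features in (10, 1000):
    print(features, machining_cost(1000, features), lithography_cost(1000, features))
# 10 20500.0 6000.0
# 1000 2000500.0 6000.0
```

Going from 10 to 1,000 features multiplies the machined cost roughly a hundredfold and leaves the lithographic cost unchanged, which is the incentive to make the “snowflake” as rich as possible.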

Abhi: And sorry, so where’s the lithography component coming in when you’re talking about building a new bioreactor?

Sterling: So the way that we make our chips — which you have, right?

Abhi: Yeah. Let’s — do we? Oh man. Here it comes out pretty small.

Sterling: Yeah.

[00:40:48] Anatomy of the Iku device

Abhi: I can see a bunch of circuit traces running across it. Walk me through the anatomy of this device.

Sterling: Sure. The first thing is that it looks like one piece, but it’s actually six layers, and each layer carries either electrical signals or fluidics, routing fluids through the device. This particular chip has a channel that’s a millimeter wide and about a hundred microns tall, roughly the width of a human hair, which is actually a great size for cells. You can flow media and cells into it. And it has all of the components that a benchtop bioreactor or a more controllable system would have. You make these through lithography. Every line and feature on here has a snowflake-style pattern made for it. The fabricator puts down what’s called a resist and then etches, which keeps the lines where you want them and removes everything else. Then you make the next layer, and the next, and then you press all of them together. So the way to think about it is as a 2.5D space: you’ve got two-dimensional planes, but several of them stacked. Topologically, that lets you do things like route a channel in a spiral into the middle and then get back out by going up a layer and over. It also lets you put electrodes or other sensors in different places relative to the fluid and the cells. That’s kind of abstract, so here’s a concrete example. If you want to read out the electrical signals of heart cells, cardiomyocytes, you want to read across those cells. Normally that means putting electrodes above and below them, or side to side. That sounds like a very simple primitive, and yet with other techniques it’s difficult to do. By switching to this new substrate, a whole class of problems that are traditionally quite difficult become substantially easier.
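For a sense of scale, the channel dimensions quoted above give the liquid volume directly, since 1 mm³ is 1 µL. The 20 mm channel length and the 1 L comparison vessel are assumptions, not figures from the interview.

```python
# Width and height follow the figures quoted above; the length is an assumption.
WIDTH_MM = 1.0     # "a millimeter wide"
HEIGHT_MM = 0.1    # about 100 microns
LENGTH_MM = 20.0   # hypothetical: not stated in the interview

channel_ul = WIDTH_MM * HEIGHT_MM * LENGTH_MM  # 1 mm^3 == 1 microliter
bottle_ul = 1.0e6                              # hypothetical 1 L bottle-sized vessel
print(f"{channel_ul:.1f} µL, {bottle_ul / channel_ul:,.0f}x smaller")  # 2.0 µL, 500,000x smaller
```

Media cost scales with volume, so under these assumptions a chip-scale run uses a vanishingly small fraction of the media a bottle-scale run needs.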

Abhi: Sorry, I don’t have a great sense of where the cells go. On this green thing, are those holes where you put the cells?

Sterling: They are, they are. And I actually have a drawing I should send you; you can put it up on screen.

Abhi: Yeah.

Sterling: Because that’s also part of the problem: from the outside, it looks exactly like any other printed circuit board. And in biotech, a printed circuit board looks like alien technology. But yes, it does have small holes, which are how you get fluids into the device. From there you can run the fluid past sensors, or, which is often easier, run the fluid past the cells and then read things out from the fluid.

Abhi: And so there’s not a specific chamber here where the cells sit. They’re literally in a line formation as you run fluids through them.

Sterling: This particular chip is about a year old. Newer designs have more of a chamber: you seed it and the cells grow over it. The powerful thing about making microfluidics this way is that you can make a large number of variations, which is hard in traditional microfluidics because every variation needs a new mold. A new mold is $25,000 to $40,000, and you need a mold maker to machine it, so the economics only work if you make a lot of each design. With printed circuit boards, it’s much easier to make variations and just do it. So we’re building a core catalog of designs for particular applications, but with every new print run, it’s relatively easy to change the design to whatever the conditions call for.
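The mold economics can be framed as a break-even run size: a mold only pays off once enough units are made to recover its cost. The $25,000 and $40,000 mold prices come from the interview, and roughly $4 per PCB chip comes from later in the conversation. The $0.50 molded unit cost is a hypothetical placeholder, and per-design PCB setup fees are ignored.

```python
import math


def breakeven_units(mold_cost: float, molded_unit_cost: float, pcb_unit_cost: float) -> int:
    """Smallest run size at which a molded design costs no more than the same design made as PCBs."""
    if pcb_unit_cost <= molded_unit_cost:
        raise ValueError("molding never pays off if PCB units are no more expensive")
    # Molded total = mold + n * molded; PCB total = n * pcb. Solve n * (pcb - molded) >= mold.
    return math.ceil(mold_cost / (pcb_unit_cost - molded_unit_cost))


for mold in (25_000, 40_000):
    print(mold, breakeven_units(mold, molded_unit_cost=0.50, pcb_unit_cost=4.00))
# 25000 7143
# 40000 11429
```

Under these assumptions, each molded variant only makes sense past several thousand units, which is why iterating through many small design variants favors the PCB route.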

Abhi: Sorry, is it fair to say that typically microfluidics are not built using lithography, but you are building them with lithography?

Sterling: Microfluidics historically started with lithography. They were built using similar techniques used for semiconductors. And in most research labs, when people build microfluidics, that’s still the way it’s done.

Abhi: Okay.

Sterling: What you do is make a silicon mold and then cast a polymer over it. The polymer is called PDMS, and its desirable properties are that it’s optically transparent, so you can see into it, and it’s gas permeable. That allows gas exchange, so you can put it in an incubator and use it there. The downside is that you can also get evaporation. The other problems are that you end up with a fragile device and that it’s fairly labor intensive to make. But people like it because you can do it in your own lab. The difference in our approach comes down to using lithography for the sensors and the fluidic channels together. And critically, silicon fabs have to be really careful about contaminants. If you need, for example, a gold-plated electrode, you cannot do that in a silicon fab, because gold is a contaminant there; it’s not allowed at all. Very bad. With printed circuit boards as the medium, you can integrate many more sensor modalities than are possible with silicon. The second thing is that the reason to use silicon is that you want extremely fine features. Once you need something at the nanometer scale, it’s kind of the only option. But our thesis is that cells themselves are more on the five-micron scale, which is a few orders of magnitude larger.

Abhi: Yeah.

Sterling: And that’s actually the domain where printed circuit boards are a better place.

Abhi: So if historically people have used lithography for microfluidics, but only for the channels and not for the electronics, what innovation allowed you to include electronics in the design of the microfluidic device?

Sterling: Let me clarify that. Microfluidics is a really broad term. DNA sequencing at Illumina, for example, uses silicon for a microfluidic system that also does the sensing. Silicon is a really useful place for that, but it has limitations in terms of where in space you can place things. Take the example I gave earlier of reading across cardiomyocytes: you can’t do that with silicon. There’s no way to build a buried channel of the size the cells need to pass through, with electrodes above it; you just can’t make it that way. So the core innovation is, first of all, conceptually thinking of printed circuit boards as a medium for making microfluidics. I’d been working with circuit boards for 10 years or so, and it never occurred to me to put fluidics into them. I’ve been talking to people about this for three years and have never met anybody who said, oh yeah, I’ve seen that before.

Abhi: So as of today, there’s no one combining circuit boards with microfluidics?

Sterling: Not for — there is for diagnostics.

Abhi: Oh, okay.

Sterling: Yeah. Professor Moschou at the University of Bath is really the pioneer of putting fluids into the circuit board straight from the fabricator. The reason I keep coming back to the fabricator is that you can do a lot of things by hand (academics are prone to this) that don’t scale if you need to make hundreds of thousands or a million of something. If you’re doing that, you need to pick something mass-manufacturable. In terms of cost and complexity, the cheapest ways to mass-manufacture microfluidics at volume are paper or molded parts. If you make microfluidics in a PCB in a lab, you can do all kinds of weird things. Getting it compatible with the standard fabrication process is a different ask. Part of that is that fabricators are, for the most part, not terribly keen on changing their processes. The other part is that when you do it by hand, you’re introducing variability from the beginning. When a fabricator makes it, you inherit the hundreds of billions of dollars that have been spent cumulatively on printed circuit board development. It’s been around for 60 years, and entire industries are built on it already being very good. So let’s just reuse the thing that’s already quite good and low variability.

Abhi: Could you give me some intuition for how the device is actually put together? So my mental conception of lithography is you’re able to create these very fine channels in the silicon via shining light through a mask. What’s the next step after that? Maybe you do this on multiple layers to have this multi-layered system of channeled —

Sterling: Yeah. Traditional silicon fabrication really is mask, etch, mask, etch, mask, etch. With printed circuit boards, each layer can be made of different materials, so there’s an enormous amount of flexibility; it’s a much richer palette to build from. The foundation is what’s called FR-4, a woven fiberglass and epoxy laminate. It comes coated with a layer of copper on the top and a layer on the bottom. The simplest, cheapest circuit board you can buy is just that, with the copper etched into a pattern. Then they put down a protective layer called solder mask, which is what normally makes boards green, so that you don’t just scratch off the copper. Then you silkscreen it if you want labeling and so on. At its core, that’s the process. When you add microfluidics, there are techniques for making the fluidic channels on one layer. Then, as needed, you stack on another layer, and that layer can have fluidics, or in between them you can route your heaters, because you need heaters in there. Or if you want electrodes or whatever your sensor is, you pattern that on its layer, build the stack up, press the layers together, and close it.

Abhi: So in V2 of this device, you have a chamber where the cells live, and microfluidics connecting that chamber to a bunch of channels, internal or external, that feed in whatever axis of variation you want to control during media optimization. You also have embedded, or maybe external, sensors connected to the circuit board that give you a readout of what’s going on in the chamber as the media is applied. What is the output of the system? I imagine one is temperature, and maybe another is internal humidity. What other axes can you get straight off the sensors and the device?

[00:57:09] What sensors are on the device today?

Sterling: If you’re going to do any kind of cell culture, there’s a set of table stakes you need to cover: temperature, pH, and dissolved oxygen, and since we’re flowing things through, you also need to measure flow rate. Together, that’s the core set our system currently reads. The next layer is electrochemical sensors. Being able to read impedance is actually very useful. If you read the impedance of the media itself, you can detect changes in the media over time. And if you place the electrodes relative to the cells, you can also correlate cell growth with impedance, which depends on how charges interact with cell membranes at different frequencies. You can also measure conductivity, which is partly used to correct the impedance reading, because impedance can be interfered with in a lot of ways and you need a reference point. And then there are other electrochemical techniques, like cyclic voltammetry. But the readouts right now are impedance, flow, dissolved oxygen, pH, and temperature.
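Why impedance produces a spectrum rather than a single number can be shown with a common textbook equivalent circuit: a series resistance R_s plus a resistor R_p in parallel with a capacitor C_p. Treating this as a stand-in for media plus cells is an assumption, and the component values are hypothetical. Sweeping 100 log-spaced frequencies from 10 Hz to 1 MHz yields 100 complex values, which can be flattened into magnitude and phase features.

```python
import cmath
import math


def impedance(freq_hz: float, r_s: float, r_p: float, c_p: float) -> complex:
    """Z = R_s + R_p / (1 + j*omega*R_p*C_p): a series resistor plus an R-C parallel pair."""
    omega = 2 * math.pi * freq_hz
    return r_s + r_p / (1 + 1j * omega * r_p * c_p)


# Hypothetical values: 100 ohm series resistance, 1 kohm in parallel with 100 nF.
freqs = [10 ** (1 + 5 * k / 99) for k in range(100)]  # 100 log-spaced points, 10 Hz to 1 MHz
spectrum = [impedance(f, 100.0, 1000.0, 100e-9) for f in freqs]

# One reading becomes 200 real-valued features: |Z| and phase (degrees) at each frequency.
features = [abs(z) for z in spectrum] + [math.degrees(cmath.phase(z)) for z in spectrum]
print(len(features), round(abs(spectrum[0])), round(abs(spectrum[-1])))  # 200 1100 100
```

At low frequency the capacitor blocks current and |Z| approaches R_s + R_p; at high frequency it short-circuits R_p and |Z| approaches R_s. Anything that shifts R_p or C_p reshapes the whole curve, and repeating the sweep over time turns each run into a feature matrix.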

Abhi: I imagine miniaturized versions of all these sensors were already available. Is that true?

Sterling: Not really. Nothing our system can do at the moment is something you couldn’t have done by hand or with a very custom setup. The challenge is how you do more than two or three of those at a time, and how you build them economically. For example, the chip I showed you costs around $3 or $4 in any reasonable quantity, and you can get that below a dollar. If you’re buying sensors off the shelf, the economics start killing you very quickly. The second thing is that it’s a challenge to integrate those things. A big idea in robotics, or in engineering any real system, is that interfaces and connectors are what kill you: they’re very common points of failure. So the best solution is no connectors, and when you build all the sensors on the same platform, you essentially get to have no connectors. That’s the trade-off: harder engineering up front, but lower variability and better economics from then on.
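The “connectors will kill you” heuristic follows from series reliability: if the system needs every connector to work and failures are independent, the success probabilities multiply. The 99% per-connector figure is hypothetical.

```python
def series_reliability(per_connector: float, n_connectors: int) -> float:
    """Probability that every connector works, assuming independent failures (a series system)."""
    return per_connector ** n_connectors


for n in (0, 5, 20):
    print(n, f"{series_reliability(0.99, n):.3f}")
# 0 1.000
# 5 0.951
# 20 0.818
```

Twenty individually reliable connectors already mean nearly one assembly in five has a fault somewhere; integrating sensors on one board removes those terms from the product.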

Abhi: So you get dissolved oxygen, pH, and a few other parameters. But I imagine there are still some you’re missing, like whether the protein you’re expecting is actually being produced?

Sterling: Yeah.

[01:01:36] How do you use the Iku device to perform media optimization?

Abhi: So it sounds like you’re able to optimize up to a point, and after that you need the analytical chemist to come back in and do their thing.

Sterling: Our goal is to make the analytical chemist a confirmation step rather than a bottleneck. The reason comes down to lessons from control theory. First, the system you’re trying to control, in this case cells, changes at a certain rate, and if you want to dampen or amplify that change, you need to read it fast enough to come in and intervene. Any time you take a sample and measure it offline, that loop is normally too long. Sometimes the loop is two or five minutes, and maybe you can work with that. If you have to take the sample to your analytical chemist, it’s probably hours or days, and that information is no use for actually controlling the culture. So what you want are real-time sensors that are truly integrated into the device. The sensors we’re using now are really just the table stakes that let us start building in the other sensors. Without those core sensors, you can’t even keep the cells alive, so there’s no point. Live readouts of monoclonal antibodies are what we’re building towards in the device: having optical sensors built in, and leveraging the biological or chemical biology techniques we already have for getting signals out of cells. All of those are compatible with our system. That’s where I think the real value gets unlocked, because there’s a large philosophical difference between just reducing the cost of something and changing which questions become askable. The questions that become askable and the experiments you can run are what I think make this substrate so powerful. You build the core capability: can you grow cells at high throughput in this dynamic way? Okay.
Once you have that, every new sensor system you add gives you another lens into it. This comes back to why lithography is so powerful. Normally you have to make a trade-off: every sensor I add costs me money, so I only add the sensors I need. But if it doesn’t cost any more, or the cost is basically trivial, then let’s just instrument it, and keep instrumenting it. Classically you would say, well, I don’t really care about those features; they don’t matter. But what we’re moving towards is having fewer priors and less human interpretation on the streams of data coming in. For example, impedance sensing doesn’t give you a simple number. It gives you a complex number. Okay, you can still deal with that, but it’s a complex number across hundreds of frequencies, so you get back a large readout, and it changes over time. If you and I try to decode that, it can be difficult, right? You can argue about this, but machine learning is getting pretty good, arguably quite good, at handling that kind of data. The way I separate these is into narrow-band sensors and broadband sensors. A narrow-band sensor is, for example, a temperature readout. You resolve it to a resistance value or a temperature in Celsius, and you want it to respond only to temperature and nothing else. That’s very easy to interpret. Same with lactate: you want something that responds only to the lactate in the media and nothing else. Those are narrow-band sensors; they’re designed to reject everything else. Then there are what I’ll call wider-band sensors. If you put a microscope on something, that’s fairly wideband, right?
There’s a lot going on in there, and there isn’t just one answer about what’s happening. You can decide which parts are more or less relevant to the questions you’re asking. Optical sensing, impedance, some of these other electrochemical techniques, the magnetic fields in there: when you have machine learning on the other end to interpret them, it would surprise me if they weren’t useful.
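Sterling’s control-theory point about slow measurement loops can be illustrated with a toy discrete-time loop: a proportional controller acting on a reading that is several steps stale. The plant, gain, and step counts are hypothetical, and nothing here models cell biology. It only shows that a controller which settles with fresh readings can oscillate and diverge when its readings lag.

```python
def simulate(delay_steps: int, gain: float = 0.5, steps: int = 200) -> list[float]:
    """Deviation from setpoint for a toy plant x[t+1] = x[t] + u[t] under proportional
    control u[t] = -gain * x[t - delay_steps], i.e. acting on a stale reading."""
    # Start 1.0 unit off setpoint, and assume it has been sitting there for the whole delay.
    history = [1.0] * (delay_steps + 1)
    for _ in range(steps):
        reading = history[-1 - delay_steps]  # measurement that is delay_steps old
        history.append(history[-1] - gain * reading)
    return history


fresh = simulate(delay_steps=0)  # reading available immediately
stale = simulate(delay_steps=5)  # reading arrives five steps late
print(abs(fresh[-1]) < 1e-6)                  # True: settles at the setpoint
print(max(abs(x) for x in stale[-20:]) > 10)  # True: same gain now oscillates and grows
```

For this particular toy loop, the gain that stays stable shrinks as the delay grows, so a controller fed by an hours-long offline assay can only act very timidly, if at all.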

Abhi: This is maybe a naive question, but at the end of the day, all the signal you’re able to extract from this device is gonna be some electrical property of the tiny little bioreactor you have in there. Is that correct?

Sterling: No. The big picture is that we’re integrating all of these different modalities, including the optical modality. My dream is to multiplex Raman sensing across the device, so you have that way of looking at the culture alongside the lactate, glucose, and monoclonal antibody readouts, or whatever the relevant domains are. In an instrument sense, that’s extremely powerful. So that’s the goal.

Abhi: Okay, interesting. I imagine some of these variables, as you mentioned, are immediately interpretable: there’s a good value you should be reaching, and I imagine dissolved oxygen is one of those. For the more complicated ones, where you don’t know whether a value is good or bad, like glucose or some mineral, where does the ground truth come in? Is that where the analytical chemist comes in and gives you a single data point on what’s good, and the purpose of the system is to correlate everything you put in and all the output variables you get out against that ground truth? Or is it something else?

Sterling: I think a useful lens for this comes from a book called How to Measure Anything. Highly recommend it; this book changed my life. The idea is the expected value of perfect information: reducing uncertainty is worth something, but every measurement also has a cost, so there’s an economic trade-off in deciding what to measure. Knowing the temperature of this room, for instance, isn’t worth much to us; it doesn’t matter whether we’re off by five degrees or 0.1 degrees. For semiconductor manufacturing, it matters quite a lot, and you need a really tight value. So if you take that lens, certainly there’s an overall need for precise readouts of how much antibody you get out and its quality, right? But in earlier parts of the process, do you need that level of precision?
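The expected value of perfect information can be computed for a small discrete decision. It is the expected payoff if you could learn the true scenario before choosing, minus the expected payoff of the best choice made now. The scenario, probabilities, and payoffs below are hypothetical.

```python
def evpi(probs: list[float], payoffs: list[list[float]]) -> float:
    """Expected value of perfect information for a discrete decision.

    probs[s]: probability of scenario s.
    payoffs[a][s]: payoff of action a if scenario s is true.
    """
    # Commit now: pick the single action with the best expected payoff.
    best_without = max(sum(p * v for p, v in zip(probs, row)) for row in payoffs)
    # Perfect information: learn the scenario first, then pick its best action.
    best_with = sum(p * max(row[s] for row in payoffs) for s, p in enumerate(probs))
    return best_with - best_without


# Hypothetical choice between media A and B, given a 30% chance the line is magnesium-sensitive.
# Relative titers: A = 2.0 if sensitive, 1.0 if not; B = 1.2 either way.
print(f"{evpi([0.3, 0.7], [[2.0, 1.0], [1.2, 1.2]]):.2f}")  # 0.14
```

Under these assumptions, an experiment that settles the magnesium question is worth at most 0.14 titer units; a measurement costing more than that, in the same units, is not worth running.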

Abhi: Well, I guess at the end of the day, I imagine the whole purpose of the process is to get to antibody production. But I guess, is part of what you’re saying that there are earlier intermediate benchmarks you want to hit before you get to the antibody?

Sterling: What I’m saying is that your ultimate readouts are yield, titer, quality, and the stability of those things. Those are what you care about, pretty much in that order. Even on yield, though, there’s variation inherent in cells. Every batch you run, even though they’re trying to reduce variability, you still get some variation. So if you take a sample and learn the titer or yield to two decimal places, okay, great. But your process variability is 1% or 2% anyway, so knowing it to three decimal places doesn’t really help you. The second part is: if every measurement has a cost in some sense, can you change your measurement system to get the information you need more economically? Part of doing that is loosening constraints when possible. Ultimately, you still run it on your benchtop and pilot systems, and you still characterize it there, because you do need ground truth from those. But to find the right media or conditions, do you need two decimal places of accuracy? Do you need all of those readouts? No.
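The point about decimal places can be made concrete by assuming that process and measurement noise are independent, so their variances add. The 2% process variability matches the rough figure quoted; the assay precisions are hypothetical.

```python
import math


def observed_sd(process_sd: float, measurement_sd: float) -> float:
    """Batch-to-batch spread you observe when independent process and measurement
    noise add in quadrature (standard deviations in % of titer)."""
    return math.sqrt(process_sd ** 2 + measurement_sd ** 2)


# Hypothetical: 2% process variability, compared with a 0.5% and a 0.05% assay.
print(f"{observed_sd(2.0, 0.5):.3f}")   # 2.062
print(f"{observed_sd(2.0, 0.05):.3f}")  # 2.001
```

A tenfold more precise assay shrinks the observed spread by about 3%, because the cells themselves dominate the variance.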

Abhi: Is a good way of thinking about this — you start with the Iku device at the very beginning, and then once you’re happy with what you see, then you move on to the benchtop device? Allowing you to narrow your search space down to a very small number of parameters.

Sterling: Right. It would basically be like — you’re still going to end up — the process looks pretty much the same. The difference is the quality of the answer you come to and the speed at which you come to it. And then the second part is, how many of those benchtop experiments did you need to run? Because there’s a difference between running them in an exploratory sense versus running them in a validation sense. In a validation sense, you’re just trying to make sure that things are repeatable. So you need to run, let’s say, three to five copies of it or something. But if you’re already quite confident that you’re at the optimal point, it doesn’t make sense to do the exploratory experimentation there anymore.

[01:14:44] Does media optimization survive scale-up?

Abhi: Moving on to — okay, you’ve done the Iku optimization and now it’s time to move on to the bigger things. How worried are you that moving the cells to a physically larger space forces the media optimization to move into a completely different direction?

Sterling: It’s definitely possible, and every time that you change physical shape and geometry, you do get some variation there. The confidence comes from understanding that — first of all, empirically, every microfluidic system that has flow integrated into it ends up correlating quite well with the larger system. The reason that people have hesitation about it is because they think about microfluidics that doesn’t have flow, and the recirculation effects. And that’s actually the key thing, right? It’s a question of, do you have flow in this thing or not? And how does that flow and those shear forces and the oxygen transfer rates and the gradients that you create — how are those representative of what’s going on here? So that’s one part of it. But let’s say you don’t buy any of that. The easier way to see it is that the problem actually decomposes into two broad parts. There are parameters that change with scale. So these are things like your hydrostatic pressure — definitely changes with scale, right? You’re not getting away from that. Certain mixing times — these change. You can get poorly mixed pockets in very large reactors, right? These change. But then there is a set of parameters that empirically don’t seem to be scale-variant. And for the most part, media optimization seems to be scale-invariant.

Abhi: Do you imagine in the ideal setting that this is a closed-loop system that just continuously tries different media optimization parameters, feeds it all into a model, it plans the next round of media optimization, and that just goes in a loop?

Sterling: Yeah. So how does the — aside from running the experiments, how do you actually interpret and decide with it? So clearly the entire zeitgeist right now is about replacing the control layer with AI and models. And whether you can do that on experimental design from reading a bunch of papers and then this is the thing I’m going to build — I’m less convinced that that’s necessarily the best way. But for these types of experiments, it certainly seems the way. It’s actually key for making the whole product, because otherwise you’re handed so much information back that the problem then shifts to processing it. So one of the lessons that I’ve taken from talking to people who have tried things in media optimization, tried doing cloud labs or doing these things — there’s a lot of hesitation around sharing cell lines. Understandable. And it also comes down to information about what the results from those cell lines are. So for example, a company that was running experiments externally was not allowed to look at the results of some of these analyses. It was in their contract that they’re not allowed to actually look at the results. So it’s really hard to improve or build your own model if you cannot look at the results. What we’re building is a federated model that allows the customers on-site to run the device. They can pull the model, get a new experiment design, that runs in there, and then the model weights are updated, right? This is the same way that the Tesla self-driving was trained, right? Federated learning resolves that IP-sharing complaint or constraint. And the reason that’s so powerful is that now you have a model that is learning from diverse experiments across different cell lines, at different places, but still on the same hardware. That’s really key, because otherwise there’s too much experimental variability in the data you’re getting back. And so you’re not gonna generalize well on that.
And the sort of hedged bet here is that if it’s not tractable through machine learning and models, we are still building the highest throughput, most economic, and fastest way to get to that answer through still running experiments. And if it is tractable, we’re going to have the best model for running those experiments. And I think the answer is actually going to be a blend of both. I do not believe that experimentation is going away. But I do think that we will be able to get to much better answers much faster, because that’s really the ideal, right? The ideal is, once you have that model, now you can feed it in even earlier in the process, right? When you’re doing your strain engineering. So coupling those together becomes possible once you have a model.
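The federated setup Sterling describes, where each site trains on its own results and only model weights leave the building, resembles federated averaging (FedAvg), a standard federated learning algorithm. The Python sketch below is a toy under loud assumptions: the model is a two-parameter linear fit standing in for a real media-response model, the function names are hypothetical, and nothing here reflects Iku's actual model or training procedure. Each round, every site runs a few gradient-descent steps locally, and the server averages the returned weights, weighted by how many experiments each site contributed.

```python
def local_update(weights, data, lr=0.1, epochs=20):
    """A few steps of gradient descent on one site's private data.

    Toy model (hypothetical): y = w0 + w1 * x, trained on mean squared error.
    """
    w0, w1 = weights
    n = len(data)
    for _ in range(epochs):
        # Gradients of the mean squared error with respect to w0 and w1.
        g0 = 2.0 / n * sum(w0 + w1 * x - y for x, y in data)
        g1 = 2.0 / n * sum((w0 + w1 * x - y) * x for x, y in data)
        w0 -= lr * g0
        w1 -= lr * g1
    return (w0, w1)


def federated_average(global_weights, sites, rounds=200):
    """FedAvg: each site trains locally; only weights are sent back and averaged."""
    for _ in range(rounds):
        updates = [(local_update(global_weights, data), len(data)) for data in sites]
        total = sum(n for _, n in updates)
        # Average each parameter, weighted by how many experiments each site ran.
        global_weights = tuple(
            sum(w[i] * n for w, n in updates) / total
            for i in range(len(global_weights))
        )
    return global_weights


# Two hypothetical sites with private, noise-free toy data drawn from y = 1 + 2x.
site_a = [(k / 10, 1 + 2 * k / 10) for k in range(0, 5)]
site_b = [(k / 10, 1 + 2 * k / 10) for k in range(5, 11)]
w = federated_average((0.0, 0.0), [site_a, site_b])
print(w)  # approaches (1.0, 2.0)
```

Neither site's raw data is ever pooled, yet the shared weights converge toward the line both sites' data came from. Site A alone only covers x from 0 to 0.4, which is one way to picture the benefit of learning across cell lines and locations on shared hardware.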

Abhi: What parameters does the model actually take as input? I imagine it takes all the inputs you’ve given into the system, all the outputs you get out of the system, and maybe what the system is actually meant to produce, and the strain itself. Is that everything or are there others?

Sterling: That’s — I think that’s a complete view.

Abhi: Okay. If the belief is that you’ll probably still need human experimentation to help the system along, and maybe the ML won’t fix everything zero-shot — can I conceptualize this as like there are 10x media optimization engineers, and they’ll be able to iterate much faster on this model system as a result of that? Or do you imagine media — bioprocess engineering is a pretty standardized field where these are the first 10,000 things you try, and maybe in the old world you get to try like 5% of that, and in the new world you try those 10,000 things? But ultimately it’s the same set of parameters that the media optimization engineer is tuning.

Sterling: So are we tuning a different, a larger set of things rather than just the engineer?

Abhi: Yeah. Like, all the knobs that the engineer usually gets to tune — do they also get to tune in the system? Or is it a subset, or maybe even larger?

Sterling: It’s a superset.

Abhi: Superset. Okay.

Sterling: You’re getting to tune far more. And it’s a superset in a few different senses. The first is that just bringing the economics down, making it automatic, allows you to — even if you had the capability previously to change a variable, you wouldn’t have essentially the budget or the time budget or the capital budget to actually exploit it. That’s one sense. The second is that it allows you to make finer interventions, with more feedback built in. So the reason for having the real-time sensors, why that’s important — what you actually want to do is be able to anticipate what the cell wants before it needs it. Because there’s always a delay between when something gets introduced into the environment and when it gets taken up by the cell, right? So ideally I actually want to see those signals happening before the cell needs it. Now, in order to do that, you need real-time sensors that are picking up on that and starting to match that. So that’s a domain that’s just not possible in other systems.
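Anticipating what the cell needs before it needs it amounts to feed-forward control: use the real-time sensor trend to predict where a nutrient will be once the uptake delay has passed, and act now. The Python sketch below is a hypothetical illustration using a least-squares linear trend on glucose readings; the nutrient, units, lag, setpoint, and function name are all assumptions, and a real controller would use a proper process model.

```python
def should_feed_now(times, glucose, lag, setpoint):
    """Decide whether to start a feed now, anticipating the uptake lag.

    times, glucose: recent real-time sensor readings (hours, g/L)
    lag: delay between dosing and the cells seeing the nutrient (hours)
    setpoint: concentration we never want to drop below (g/L)
    """
    n = len(times)
    t_mean = sum(times) / n
    g_mean = sum(glucose) / n
    # Least-squares slope = current consumption trend (negative while cells consume).
    slope = sum((t - t_mean) * (g - g_mean) for t, g in zip(times, glucose)) / sum(
        (t - t_mean) ** 2 for t in times
    )
    # Extrapolate the trend to when a feed started now would actually arrive.
    predicted = glucose[-1] + slope * lag
    return predicted < setpoint, predicted


times = [0.0, 1.0, 2.0, 3.0]     # hours
glucose = [4.0, 3.6, 3.2, 2.8]   # g/L, hypothetical readings
feed, predicted = should_feed_now(times, glucose, lag=2.0, setpoint=2.5)
print(feed, round(predicted, 2))  # True 2.0
```

With the 2-hour lag, the controller feeds at 2.8 g/L even though the current reading is still above the 2.5 g/L setpoint, because by the time the feed arrives the trend predicts about 2.0 g/L. A purely reactive rule that waited for the reading to cross 2.5 would already be late.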

[01:24:32] $8/lane vs. $20,000/lane: the economic utility of Iku’s device

Abhi: I’m curious about — I assume there are microfluidic bioreactor systems that at least exist in the literature. How much improvement do people generally see by going to these systems versus the Fiji-shaped benchtop?

Sterling: Right now? I would say close to zero. And the reason is economic. So the one metric or lens for looking at it is just what is the all-in cost to getting that dynamic cell culture data — that one experiment, that data. And there are two components to that. The first is, what’s your CapEx, right? How much did it cost to actually get this device in here and use this thing? And it’s really this CapEx per experimental lane. And then the second is, what is the OpEx on that? Every time that we run the experiment, how much does that cost? And so to give an example — the benchtop reactors, depending on whether you’re going with the gold standard or some of the derivative ones now, let’s say the CapEx is between $5,000 and $15,000 or $20,000 for each experimental lane. And then your OpEx is — you’ve got not just the media, you need to also take the time to grow the cells up to be able to seed it. You’ve got the human coming in and running it, and then you’ve got the actual disposable, or you’ve got cleaning the thing and sterilizing it. So it ends up being around $1,500 to $2,000 for every experiment that you run. The closest microfluidic system in capability — it’s only four lanes and it’s $80,000. And so that gives you a per-lane cost of still $20,000. And then the disposable costs are I think still around $500 to $700 each. So there’s not much economic reason to it. The reason that product is on the market is that it cuts down on media utilization, but I don’t think it’s a very successful product. What we’re building is — in philosophy, there’s a difference between changes in degree and changes in kind, right? So it’s like, okay, you take a little step, you take a little step, and it’s just, okay, it’s different, but it’s not qualitatively that different. And then when you 10x or you 100x something, right — all of a sudden new things get unlocked.
And so we’re looking at a CapEx of $8 a lane, and we’re looking at an OpEx per experiment of like $20 or less, right? And so those two things together really transform what’s — and then if, as I said, you start integrating more sensor systems into it, those two parts are kind of fixed, right? The CapEx and mostly OpEx on that. But the amount of data and the amount of value that you can get out of it — that’s where I think there’s much more room to grow.
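The all-in cost metric Sterling describes is amortized CapEx per lane plus OpEx per run. The Python sketch below plugs in the figures quoted above; the number of experiments each lane runs over its lifetime is an assumption (50 here), and the existing-microfluidic OpEx uses the upper end of the quoted $500 to $700 range.

```python
def cost_per_experiment(capex_per_lane, opex_per_run, runs_per_lane):
    """All-in cost of one dynamic cell-culture experiment: amortized CapEx plus OpEx."""
    return capex_per_lane / runs_per_lane + opex_per_run


# Figures quoted in the interview; RUNS_PER_LANE is an assumption for illustration.
RUNS_PER_LANE = 50
systems = {
    "benchtop (low end)": (5_000, 1_500),
    "benchtop (high end)": (20_000, 2_000),
    "existing microfluidic ($80k / 4 lanes)": (80_000 / 4, 700),
    "Iku (quoted targets)": (8, 20),
}
for name, (capex, opex) in systems.items():
    cost = cost_per_experiment(capex, opex, RUNS_PER_LANE)
    print(f"{name:40s} ${cost:>10,.2f}")
```

Under these assumptions the amortized hardware cost of a benchtop lane adds a few hundred dollars per run on top of roughly $1,500 to $2,000 of OpEx, while at $8 per lane the CapEx term nearly vanishes and OpEx dominates. Changing RUNS_PER_LANE moves the CapEx terms, but Iku's quoted figures stay lowest at any positive value.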

Abhi: Instinctively — if I understand correctly, both existing microfluidic systems and your system have lithography as the underlying manufacturing component. And yours has circuits integrated, so you can get these sensors. But if the underlying creation process is the same, why are existing microfluidic devices so much more expensive than yours?

Sterling: So that device I was just referencing is not made with lithography. It’s a molded device. But the key thing actually is that they don’t have active — there’s a big divide in microfluidics between passive and active microfluidics. So passive is like paper microfluidics or something, right? Your pregnancy test — that’s paper microfluidics. It just does one thing, doesn’t have feedback in it, doesn’t really have control and regulation. And then really separate is, can you come in here, can you sense things and change things as they’re going on? And most of the systems right now do not multiplex the control aspect across a large number of things, and the sensing part of it, and some of the actuation part of it. If you have to use molded plastic, there’s kind of no way to integrate sensors easily into molded plastic. It doesn’t come out of the factory with all these things in it. You still have to go and add all these things together, so then you’re adding in labor costs there, right? And all that. So even if some of the end result is, in certain capabilities, similar, the upstream manufacturing of it — because you can’t integrate everything together — really constrains your economics on it.

Abhi: And so even if a lithography-produced microfluidic device on the market may cost something similar to the Iku device on its own, all the sensors that have to be added on increase the cost.

Sterling: Right. Let me back up here and say that lithography as a technique does have this property where the cost doesn’t scale with how complicated you make it. The big difference is, in silicon, the base cost for making it is substantially higher than the base cost for making things on printed circuit boards. So in general — this is true of almost all forms of manufacturing, to my knowledge — as you increase precision requirements, you increase cost. And it tends to scale logarithmically, right? So there are two ways that people use silicon and lithography. Either you make it as a mold, so you’re really just using the lithography as a mold, and then you peel the cast part off of it. Or people will actually use the silicon and make the channels in there. But the problem with silicon is it’s really expensive. In general, we do not make disposables out of things that are made in silicon lithography. Because to make something this size in silicon, it would probably be $400 or something.

[01:32:05] Why PCB microfluidics didn’t exist 10 years ago

Abhi: Why — if it seems like the big innovation here is combining lithography — or doing lithography on the circuit board as opposed to doing it either in silicon or via a mold — both of which seem more expensive than the printed circuit board — was it simply a matter of realizing that you could do this on circuit boards and dramatically reduce your costs? What — why did this not exist 10 years ago?

Sterling: Right. So I think the first is that different worlds don’t talk very much, and in this case, the tool-builder world and the tool-user world are very distinct. And the second is that — to answer the question of how did I come to it — I was in my apartment in São Paulo, and I’d been really digging into biofilms. I was like, okay, so much of this is about the concentration of these things, and they’re creating these little microenvironments and all of this. And then I was really — at the time there was this concern about, are we going to have enough bioproduction capacity? And what I’d seen work before is in traditional chemical synthesis — they switched to continuous flow microreactors. So Corning Glass, which makes the glass in your iPhone, also makes chemical reactors. And the benefit of this is that you can flow things together. They react quite quickly. You can pull the heat off and things, and it’s really consistent. The reason you can’t use that in biology at the moment is that in traditional chemical synthesis, you really are pretty much just controlling flow rate. And the reactions happen really fast normally, right? You just mix them together and it’s done. But in biology, right, you need sensors in order to see what’s going on. The environment has to be much more tightly controlled, right? There are more aspects to it. And cells themselves are again perturbing the environment around them. So that was the lens I was looking at — how do you bring this thing that clearly worked in chemical engineering to biology? And also thinking about these biofilms. And so I studied mathematics. I literally wrote this down as a set of axioms. I was like, what do you need? You need to be able to hold fluids apart. You need to be able to combine them together, right? You need to integrate sensors of different modalities so that you can adapt it.
It needs to be small, both for mass transfer reasons — because as you get smaller, there’s more surface area relative to volume. And the limitation on any of these reactions is literally just how fast you can get things from the gas phase into the liquid phase. And that’s purely a function of surface area. Even in large reactors when they’re using bubbles, the bubbles are just creating surface area. And it’s about diffusion across that. So if you go small, you get that. You go small, you also get laminar flow, which is really, really nice because it takes problems that are normally chaotic and it linearizes them. So there’s a great experiment everybody should watch on YouTube of — you put a couple drops of dye into this gel, and the gel has a really high viscosity, and then they stir it up this way, right?

Abhi: And they go backwards.

Sterling: Yeah. And they go backwards, right? And that idea — well, why can you do that? You can do that because in a sense it’s linear, right? Whereas in a chaotic system, you’ll get to some point and now you can’t tell which path you were on before, right? So those are the properties. And then you need to be able to run a lot of them, both for — originally it was for throughput, but that throughput idea also translates to data parallelization. And then if you need a lot of them, you also need it to be manufacturable, right? Mass-manufacturable, and the cost needs to come down. Okay, those are the axioms. I was like, these are the things I need. And then I literally went through every manufacturing technique that I could find. I mean, truly everything, down to like, what are they doing with 3D-printed glass at the moment. And you can just knock these out for a variety of reasons. The molded polymers don’t work because you can’t integrate the sensors in them quickly. 3D printing doesn’t work at all — it doesn’t matter what the modality is, because the infrastructure isn’t already there, right? So if you need to make a bunch of disposables — which, great business, always make disposables — if you need to make a bunch of disposables, then you should pick something that doesn’t need a lot of capital in order to scale, right? So you need an existing manufacturing industry for it. And all these came back, and then ultimately I was like, let me just reframe it. I was like, let’s just pick one of these and optimize for that. What’s the best way to build sensors? I was like, well, printed circuit boards are really good. And I was like, okay, can I then build the rest of this in here? Let me just take a common technique — can I just select some subset of this problem, optimize for that, and then force the other ones to fit into it? And I was like, yeah, okay. Sensors are good there. It’s good on manufacturing. Okay. And then after that, went to the literature.
It was like, okay, here’s the one person who’s actually done this. Go fly to England, go work with her, and then —

Abhi: The University of Bath person.

Sterling: Yeah.
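The two physical reasons Sterling gives for going small can be checked with back-of-envelope numbers. Laminar flow is governed by the Reynolds number, Re = ρvL/μ; well below about 2,000, pipe flow stays laminar, and at Re far below 1 (Stokes flow) the governing equations become linear, which is why the dye in the viscous-fluid demonstration can be "unstirred". Surface area per unit volume for a long cylindrical channel scales as 2/r. The Python sketch below uses water properties and hypothetical channel and vessel sizes and speeds; the numbers are illustrative, not Iku device specifications.

```python
WATER_DENSITY = 1000.0    # kg/m^3
WATER_VISCOSITY = 1.0e-3  # Pa*s, roughly water at 20 C


def reynolds(velocity_m_s, length_m, rho=WATER_DENSITY, mu=WATER_VISCOSITY):
    """Re = rho * v * L / mu; pipe flow is typically laminar below about 2,000."""
    return rho * velocity_m_s * length_m / mu


def surface_to_volume_cylinder(radius_m):
    """Side-wall area over volume for a long cylinder: (2*pi*r*h) / (pi*r^2*h) = 2 / r."""
    return 2.0 / radius_m


# Hypothetical scales: a 100 um channel at 1 mm/s vs a 1 m vessel with ~1 m/s flow.
print(reynolds(1e-3, 100e-6))  # ~0.1, deep in the laminar (Stokes) regime
print(reynolds(1.0, 1.0))      # ~1e6, turbulent
# A 50 um radius channel vs a 0.5 m radius vessel: ~10,000x more area per volume.
print(surface_to_volume_cylinder(50e-6) / surface_to_volume_cylinder(0.5))
```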

Abhi: Okay. Interesting. One person in the world has stumbled across this idea. Well, I guess if every technique seems to have its mild drawbacks and there wasn’t a single optimal one that you stumbled across, what is the drawback of going for printed circuit board?

Sterling: Okay, well, I will tell you — from a — there’s the problem that you might think, and then there’s the problem you’ll discover. The problem you would think is that it’s a kind of weird thing. You have to get people to adapt to it, or — also, you do have to design each of those sensor domains. Just because you pick a good palette to work with, you still have to do a bunch of work. You don’t — all this — those all end up actually being not that big of a deal. The harder problem is this, which is that nobody understands it.

Abhi: That’s true.

Sterling: Truly, nobody understands it.

[01:39:24] Who is the customer?

Abhi: I guess, who are you selling these to? I can imagine one customer is academic labs. Maybe — and I imagine the much bigger customers are people either preparing drugs for clinical trials or generics manufacturers. How — one, how willing are they to buy this stuff? And two, is there a customer base I’m missing?

Sterling: Yeah, so I’d say our first customer is actually the US Army.

Abhi: Oh.

Sterling: And that’s for doing something quite different from media optimization, but still within the realm of — you need to explore a larger space and current ways of doing that are insufficient. The broader answer here of who’s the customer — the customers who feel the most pain for this are the large CDMOs. I’ve spoken to people who have worked for those places. What is the thing that they talk about every year? It’s yield. That’s it. If we’re talking about degrees of freedom for them as a company, they don’t have that many, right? They don’t come up with their own products. They aren’t allowed to innovate on it once the process is set. They have extraordinary downside risk if they make a mistake. And they are in a competitive marketplace where the pharma companies are capturing the value, right? They ultimately own distribution. And so those features make them very desirable buyers for it. But media optimization also happens within pharma companies, since sometimes pharma companies manufacture their own things. And beyond that, the process of running dynamic cell experiments, that dynamic cell culture, is pervasive. That’s where I think the largest opportunity really is — many of the problems in biology ultimately just reduce to, how many dynamic cell culture experiments can you run? And so this is true for new antibiotic discovery. It’s true for doing things in organ-on-a-chip. It’s true in cancer research. If you actually just take the lens of, what are people trying to get out of this experiment? Well, they need to be able to come in, they need to be able to perturb things over time in this, and they need to be able to read out during it. Maybe that’s too big of a lens, right? Maybe there are particular areas where our system is not going to be compatible. But there’s enough of a core there.
And the justification for this empirically is that you already see it — every time that bioreactors have gotten smaller and more automated, they diffuse more into the ecosystem. It gets adopted more and people continue to want more automation, more experiments, and cheaper on it.

[01:43:14] What is the ultimate goal of Iku?

Abhi: Do you view Iku as not just a media optimization company? The hope is that whatever the final device ends up looking like, it’s useful for almost anything that’s an in vitro system where you’re trying to screen many things across it.

Sterling: Yeah. Our goal is to produce 99% of the world’s dynamic biological data. And the reason that that’s achievable is that the world does not produce that much of it right now. And by increasing the throughput, by increasing the relevant modalities that we’re putting in and those conditions, I think that is a totally achievable thing. That’s where I started in the beginning talking about this interface between computation and biology and there being that mismatch. That interface, that layer — that’s what we want to build and that’s what we want to own.

Abhi: I’m curious — among the customers right now — maybe the military project is its own direction — for selling this to either CDMOs, pharmas, generics manufacturers — my impression is that all of these groups, like you said, don’t like variability and so they’re very hesitant to buy new technology that promises the sky and the moon. What’s the hardest part about selling to these people and how do you reassure them that things are gonna be fine?

Sterling: I certainly underestimated the importance of that aspect here. I’ve sold a lot of things in my life so far in very different domains, and I will say that not only in biotech, not only in pharma, but for biopharma manufacturing, the level of conservatism and scrutiny is extraordinarily high. So the wedge or way of getting into that distribution — there are a few examples. The first one is the kind of traditional way, which would be, who do the CDMOs look towards? The CDMOs are not going to adopt it until they’ve seen a pharma company use it. Pharma companies are not going to talk to you until you have a paper published from probably a premier lab of some sort, right? The premier lab is not going to touch anything until at least you have a white paper and some connection. In order to do that, you need to build the device. So how do you resolve this problem of getting to that end customer? The first is that there are ways of augmenting existing instruments. So the advantage of it being a standalone sensing system is that you can come in as just an add-on to something — you’re still gonna have the same economics, but now we can offer you some more data out of that same thing. And you can — that’s a lower threshold for them and it’s not involved in the actual — they can just throw that part of the data away if they don’t like it, right? If it’s not useful. So that lowers some of it.

Abhi: I guess it’s cheap enough such that it’s not a major investment to try.

Sterling: Right, right. The second is — and a big — I was just rereading Geoffrey Moore’s Crossing the Chasm, which — have you read this?

Abhi: I have not.

Sterling: Okay. I highly recommend it. It’s been on my bookshelf for eight years, 10 years. The other day I was just like, I should reread this, and — my God. And the big idea is that what counts as a market — what counts as a market is not only that people are buying something repeatedly, but critically that it’s a group of people who talk to each other and look at each other, right? And so you’ve got the academic labs who look at each other and talk to each other. And then the pharma companies look at each other and talk to each other. And then the CDMOs look at each other. But the key thing is actually the big CDMOs — they don’t talk that much. They don’t associate that much with the little CDMOs. But those are ones actually that we can sell to and get some evidence coming in there. So there’s building ancillary systems that can tack on to existing things for getting in there. And then there’s the other way, which is just — be so good they can’t ignore you, in a sense.

Abhi: I was gonna ask — I imagine the gold standard here is you show one of these CDMOs, here’s the cost and titer of expert-produced media versus the cost and titer of expert plus Iku media.

Sterling: Right. Well, actually the gold standard is not that we say it — the gold standard is that Eli Lilly says it.

Abhi: Sure. Yeah.

Sterling: Right. Because that’s their customer. And those pharma companies own those CDMOs.

[01:49:07] What does the validation evidence need to look like?

Abhi: Yeah. I guess, has any pharma done this and produced — or even you internally have done this side-by-side comparison and you have this very clean result to share to them? Or is it more like you’re still in the phase of seeing the magnitude of improvement the system gives?

Sterling: It’s more like — it will be extremely surprising if you do not get the — first of all, if you don’t get the economics. And then also, all evidence points to the conclusion that being able to run more, and more varied, experiments gets you to a better answer. So you can kind of work back from that.

Abhi: If you follow the trend lines, it almost necessarily has to be the case that this is better than what’s currently being used.

Sterling: Yes.

Abhi: Okay. Yeah. Has there been — in the early initial deployments of this — and like, will there be a white paper coming out in the next year of, here’s what we found by using the Iku system?

Sterling: Sure. I would say it will necessarily be more dull than that. I would separate it between two — there’s the hype marketing stuff to do, and then there’s what a CSO actually looks at, right? And from my interaction with scientists as a breed — first of all, they are a breed, and secondly, they are allergic to any hype and any kind of promotional stuff here. So what they want to see — and I don’t need to be creative here — what they need to see is your experiments running on your device with some readout, and then you take the gold standard and you replicate that, and it needs to be at the same facility, right? It needs to be that. You need to show those two. And basically the graph needs to be obvious enough that it’s like, okay, I can see how these correlate and they scale. They don’t actually need to be perfect. None of them are perfect. This is true for doing scale-up from the benchtop to the pilot and all these things, right? It’s really just a series of graphs. Like, okay, this thing maps onto this, maps onto this thing. And then the next step is, actually you need that replicated at another facility. So for pharma to adopt something, even one lab isn’t enough; you need it to come out of three different labs who all get the same result, because ultimately their thing is about repeatability.

Abhi: Reducing variability. Okay. Yeah.

Sterling: Reducing variability, but then also repeatability. Yeah.

[01:52:14] What would you do with $100M equity-free?

Abhi: If someone were to hand you a hundred million dollars equity-free to push forward the mission of Iku as much as possible — one, I would be curious where you would spend the money, and two, what are the axes of improvement that still lie ahead for the future of the device?

Sterling: Yeah. I think the first thing we would do is really build this high-throughput perfusion system. I would integrate Raman sensing, and I think that’s the — I think that’s a killer app. I think if you do that, it unlocks so much. But also, if you go back through the literature, people have been talking about the value and use of having a high-throughput perfusion device for a quarter century, and that was before we had the machine learning or AI to also interpret that data. That was before the problems we’re encountering started getting harder to manage. So I think that’s very clearly there. Along the way there, you build organ-on-a-chip, high throughput. That’s also a constraint right now. One of the larger manufacturers — they’re moving towards it a little bit, but they still have some trade-offs as they try to move to it. Where I think it gets really interesting, and a direction I hadn’t gone down until recently, is droplet microfluidics. So the idea of — in some sense, what we’re doing with perfusion is, okay, let’s take a benchtop bioreactor and all that control, and let’s shrink that down. The droplet microfluidics is more like, let’s just take a test tube and shrink it really small, right? That’s what 10x Genomics does; that’s a form of droplet microfluidics. It tends to be more of an integration of microfluidics with some chemistry, some chemical technique to help with signaling or help with the formation of particular types of droplets that allow membranes that you can diffuse through and things. But I think what’s really underexplored are two things. From the customer side or from the data side, it’s higher resolution, more temporal datasets from these, right? Getting back to this idea that cells are time-varying and sensitive and highly parallel in a bunch of different ways.
The ability to shrink that experimental system down that much, explore the space, but not lose the temporal element the way that it is right now for the most part — I think that’s really, really powerful. And there are a couple of techniques people have been trying to get down to it. There’s a technique for getting it down to like seven minutes now. But it’s still — there’s still trade-offs. When I look at it, I’m like, oh, they still haven’t resolved these trade-offs. So that’s one aspect I think could be enormously valuable. And then the second thing is, the droplet microfluidics right now — they’re really focused on the formation of the droplets and these things coming through. They are not really chaining things together. And in the literature there are all of these almost like transistor parts, right? Little parts that people have built. And you can see there’s this dream of building truly lab-on-a-chip, right? And the problem is that right now, as you try to build a lab on a chip, you try to do these things — there just aren’t enough of the subsystems or steps that you can link together on there. So it’s like, you do a set of these and it’s, okay, we gotta come out of the chip, right? And then you kind of lose all of it. And so I think it’s really only in the past five years, and then with our technology for being able to actively manipulate things in there and do the feedback — I think rather than conceiving of lab automation as automating manual tasks, which has a hard upper bound on how much efficiency and capability that you will get out of it — let’s just start doing what we did in other industries, which would be, no, no, no. Okay, we have to start over and we have to build some of these things in here, but we’ve already built a lot of them. Now why don’t we actually start building that lab-on-a-chip?

[01:57:31] Lab automation is in a strange place right now

Abhi: I remember, for my lab automation article, one person remarked to me that it’s a shame that liquid handlers have become so popular, because biology happens at much smaller scales than that. So you’re making a system very large when it doesn’t need to be that large.

Sterling: It’s... okay. I don’t know if you’ve ever seen these robot arms that make a cup of coffee; they’ve got them in the San Francisco airport. Terrible idea. You go over, and the machine picks up the cup, puts it over here, runs the grinder, and brings it to you. What that’s doing is automating a manual task. It’s taking the way humans have always done something and saying, I’m just going to throw an arm or an anthropomorphic thing on top of it and duplicate it. And the result of that is honestly not that great, right? There’s a reason those things will continue to not take off in any sense other than novelty. Compare that to your Nespresso, which still has an interface (you still need to get your cup) but is far better, right? With the Nespresso, they said, oh, actually, let’s integrate the actual keeper and the automatic dispenser and all of these things, right? And they made it much more compact. Or your coffee vending machines, which also work for this, right? Neither one of those is trying to just take the human steps and say, literally wherever the human is, we’ll just put this thing in there. And that’s what I’m seeing happening right now in lab automation in general. And I do think it’s lazy, but I don’t just think it’s lazy. I think it’s also close to philosophically a crime. I think it’s a crime...

Abhi: Because you think for automation to truly be useful, there needs to be a new way of interacting with the underlying systems.

Sterling: Yeah. It’s like, they’re just not really thinking through the problem.

Abhi: Well, I guess one argument is that it’s easier for these things to get adoption if you are allowing them to work in the exact same environments that humans are able to work in.

Sterling: Yeah. And I think that makes sense for things like machine tending for 3D printers or for CNC machines, right? But what’s the difference? Well, the CNC part is a hundred millimeters, right? So it necessarily has to be closer to human scale. But look at what’s happened in the industrial space, the most useful places for robotics. A heuristic you can use: if it says “robotics” in it, it’s not really that useful, whereas if it says what it actually does, then it’s successful. So a dishwasher is a very useful robot, right? A self-driving car is a very useful robot. And in the industrial space, it’s mainly around logistics and moving things, right? So the really successful ways of leveraging automation, first, respect the real goal, and they respect the limits of the thing you’re trying to manipulate. If the things you’re trying to manipulate are groceries, one way could be: let’s take a humanoid that goes to the grocery store and picks things up off the shelf. That’s what people do, and that’s what these humanoid companies want to do. The alternative: if you look at the logistics companies that do this best, it looks nothing like that at all, right? It’s some huge grid with these things running around like crazy, and all they’re doing is picking things up and setting them down. There is no way that a humanoid system can compete with that, right? There’s no way. And you can let the economics decide that over time. This idea that it’s actually pushing the lab forward, I just don’t really buy. I also do not see Eli Lilly or Johnson & Johnson putting a robotic arm near their lab bench. I think what Kao’s doing with their lab automation system, those carts, right, is at least sort of a reasonable compromise. We don’t need to go and re-engineer each of these things that already exist if we can literally just make the interface easier.
But then I think the real goal should be, as much as possible, if the economics fit, just think through the problem correctly. Just put it on chip as much as possible.

Abhi: That makes sense. I think those were the last questions I had. Thank you so much for coming on.

Sterling: Thank you for having me. Yeah.

On creating 'new knobs of control' in biology

Abhishaike Mahajan — Fri, 10 Apr 2026 12:41:38 GMT

Note: I’ll be releasing a ~2-hour-long podcast in a few weeks, interviewing an early-stage founder working at the extremely niche intersection of (biomanufacturing x printed circuit boards). Please reach out to me at abhishaike@gmail.com or on X if you’d be interested in sponsoring it.

Introduction

Lipitor is a statin. Until it went off-patent in 2011, it was the best-selling drug of all time, and it continues to be amongst the most prescribed. How does it work? After we swallow a pill of the stuff, it worms its way into our liver cells and crawls into the active site of a particular enzyme, HMG-CoA reductase, blocking it. That turns down the rate of cholesterol synthesis in the liver, which leads to reduced cholesterol, which leads to saved lives.

But it is worth remembering that nobody is a willing participant here. Neither HMG-CoA reductase nor the liver is aware of this cholesterol-reduction game that we humans are playing, and both would almost certainly take great offense if alerted to it. The statin works not because our biology has agreed to cooperate, but because it was intentionally made to impersonate something else: the thing that HMG-CoA reductase is actually looking for. The impersonator, however, is biochemically incapable of participating in what the reductase wants to do with it. As a result, the therapeutic benefit is achieved: lowered cholesterol.

Our body never, ever intended for you, you that is, to take any part whatsoever in its maintenance. Our physiologies were built for evolution to handle, and it is only through the tools of evolution that we are allowed to intervene in the process at all. It is entirely by accident that the HMG-CoA reductase active site is available for us to touch, and without it, our body would happily let our arteries choke on their fatty deposits.

This clearly isn’t ideal.

Biology is uniquely limited amongst all scientific fields in that the ‘bottom’ of the subject rushes up to meet you very, very fast: the fundamental barriers are our bodies’ presuppositions about what things ought to look like, rather than what is physically possible. Material scientists, electrical engineers, and mathematicians are not forced to suffer this indignity! Their bottom is set by the physical laws of the universe. Ours is set by the pre-existing biological communication networks that evolution could scrounge up given the deadlines it was under, and though it clearly tried to be clever during the process, the results are nowhere near as infinitely flexible as I at least would want them to be.

The whole situation feels very claustrophobic. Paternalistic even! Is there any way out? How can we regain more control over our poorly built physiology? Or, in other words, how do we install more ‘knobs of control’?

I had this question too! It turns out there are a lot of newly emerging therapeutic modalities that fit the bill, and I decided to turn my research on them into an essay.

Examples of new knobs of control

Synthetic cell receptors

Here’s one simple way to install a new knob: stick a new receptor onto your cell membranes, something that responds to a chemical that only you, and not your body, has access to.

This was the thought process behind the incredibly-named ‘DREADD’ (Designer Receptor Exclusively Activated by Designer Drugs) line of research. In the early 2000s, Bryan Roth’s lab at UNC Chapel Hill started mutating muscarinic acetylcholine receptors, a family of G-protein coupled receptors (GPCRs). The goal was to see if they could make versions that lost all sensitivity to endogenous ligands but gained sensitivity to synthetic ones. They succeeded. Through several rounds of directed evolution, they created receptors that no longer responded to acetylcholine (or any other natural neurotransmitter) but responded potently to clozapine-N-oxide, or CNO. CNO doesn’t naturally exist in your body.

Your cells don’t make it, and it doesn’t bind meaningfully to any endogenous receptor. It’s a synthetic orphan chemical that, until DREADDs came along, had no biological partner.

In practice, the system works like this: you use a viral vector to deliver the DREADD gene to whatever cells you want to control. Once the DREADD is expressed on the cell surface, it just sits there, inert. Then you administer CNO, which binds exclusively to the DREADDs. When it binds, the receptor activates its coupled G-protein just like any normal GPCR would, triggering whatever downstream pathway that G-protein normally controls. And, at least as originally reported, there were no off-target effects. Later work complicated this: CNO can be converted back into clozapine in the body, and clozapine does act on endogenous receptors.
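To make ‘orthogonal’ a bit more concrete, here is a minimal sketch of receptor activation using a standard Hill equation. The EC50 values are hypothetical placeholders, not measured DREADD parameters. The only point is the shape of the logic: an idealized designer receptor with negligible affinity for the endogenous ligand stays near zero activation even with plenty of acetylcholine around, and switches on when CNO shows up.

```python
def hill_activation(ligand_uM: float, ec50_uM: float, hill_n: float = 1.0) -> float:
    """Fractional receptor activation (0 to 1) from a simple Hill equation."""
    if ligand_uM <= 0:
        return 0.0
    x = (ligand_uM / ec50_uM) ** hill_n
    return x / (1.0 + x)


# Hypothetical EC50 values in micromolar for an idealized designer receptor.
# A huge EC50 stands in for "effectively does not bind".
DESIGNER_RECEPTOR_EC50_UM = {
    "acetylcholine": 1e6,  # endogenous neurotransmitter: effectively ignored
    "CNO": 0.01,           # synthetic ligand: potent agonist
}


def designer_receptor_response(ligands_uM: dict[str, float]) -> float:
    """Activation from the most effective ligand present.

    Taking the max is a deliberate simplification; it ignores competition
    between ligands for the binding site.
    """
    responses = [
        hill_activation(conc, DESIGNER_RECEPTOR_EC50_UM[name])
        for name, conc in ligands_uM.items()
        if name in DESIGNER_RECEPTOR_EC50_UM
    ]
    return max(responses, default=0.0)


baseline = designer_receptor_response({"acetylcholine": 100.0})
switched_on = designer_receptor_response({"acetylcholine": 100.0, "CNO": 1.0})
print(f"acetylcholine only:  {baseline:.4f}")
print(f"acetylcholine + CNO: {switched_on:.4f}")
```

The knob here is the dictionary entry for the synthetic ligand. Nothing the body produces appears in it with a usable affinity, so only the experimenter can move the receptor off zero.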

This is unambiguously a new knob. The receptor didn’t exist in your body before. Its binding partner (CNO) doesn’t exist endogenously. As a result, you, and you alone, get to decide when and where it is turned on.

But this said, one not-new-knob aspect of DREADDs is that they ultimately rely on GPCR-coded logic. Once CNO binds, the receptor couples to endogenous G-proteins that plug into endogenous signaling cascades. There certainly is a novel input, but the output is still entirely native biology.

Thankfully, we need not have too much anxiety here. This was addressed in 2016, when Wendell Lim’s lab at UCSF published a paper in Cell demonstrating a synthetic cell receptor, known as SynNotch, that made both the input and the output programmable. Here, every component was modular and swappable. The extracellular domain, which determined what the receptor detected, could be any desired binding protein: single-chain antibody fragments, nanobodies, designed binding proteins. When that sensor domain bound its target, the receptor was cleaved, releasing its intracellular domain into the cell. And that intracellular domain could be just as varied as the extracellular one.

Very cool! But where is this all actually useful?

One unintuitive place is in CAR-T therapy. The original CAR-T cells were programmed with a single receptor that recognized a single antigen on tumor cells, and when they found it, they killed. This worked, but it also had a few problems. It costs six figures a dose. It sometimes melts the insides of patients through immune overreaction. And it often stops working because the engineered T cells rapidly become exhausted from overactivation. While throwing in something like SynNotch potentially makes the first problem worse, it may actually alleviate the latter two issues.

To understand how, let’s first consider what a SynNotch CAR-T would look like. Here, we have crafted an if→then logic system where we can control the if and we can control the then.

The "if" is the priming antigen, or whatever the SynNotch extracellular domain is tuned to respond to. This could be a tumor-specific neoantigen, a tissue-specific marker, or really anything you can build a binder for. Once the receptor binds it, it can, as we’ve discussed, release whatever intracellular domain we gave it, and in our case that will be a transcription factor.

The "then" is whatever gene you put downstream of the transcription factor. In the simplest case, that's a CAR, but it doesn't have to be. You could induce cytokine secretion directly, or expression of a checkpoint inhibitor, or a suicide gene, or all of the above in some combination.
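The if→then logic can be sketched as a tiny simulation. Everything here is hypothetical: the antigen names (`TUMOR_PRIME`, `SHARED_MARKER`), the output labels, and the simplifying assumption that induced genes are ‘on’ only while the priming antigen is nearby. Real SynNotch-induced expression lingers for some time after priming. The sketch compares a conventional, always-armed CAR against a SynNotch-gated one. The target antigen is assumed to be shared between tumor and a healthy tissue.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SynNotchCircuit:
    """Idealized SynNotch 'if -> then' circuit. All names are hypothetical."""
    priming_antigen: str             # the "if": what the SynNotch binder detects
    induced_outputs: frozenset[str]  # the "then": genes driven by the released TF

    def active_outputs(self, local_antigens: set[str]) -> frozenset[str]:
        # Idealization: outputs are on only while the priming antigen is present
        # in the same microenvironment.
        if self.priming_antigen in local_antigens:
            return self.induced_outputs
        return frozenset()


def conventional_car_kills(target_antigens: set[str], car_antigen: str) -> bool:
    # A standard CAR is always expressed, so any cell bearing its antigen is a target.
    return car_antigen in target_antigens


def synnotch_car_kills(circuit: SynNotchCircuit, local_antigens: set[str],
                       target_antigens: set[str], car_antigen: str) -> bool:
    # The CAR exists only if the circuit has induced it in this microenvironment.
    car_present = "CAR" in circuit.active_outputs(local_antigens)
    return car_present and car_antigen in target_antigens


circuit = SynNotchCircuit("TUMOR_PRIME", frozenset({"CAR", "CYTOKINE"}))
tissues = {
    "tumor": {"TUMOR_PRIME", "SHARED_MARKER"},
    "healthy_tissue": {"SHARED_MARKER"},
}
for name, antigens in tissues.items():
    print(name,
          "| conventional CAR kills:", conventional_car_kills(antigens, "SHARED_MARKER"),
          "| SynNotch CAR kills:", synnotch_car_kills(circuit, antigens, antigens, "SHARED_MARKER"))
```

Swapping the contents of `induced_outputs` is the code-level analogue of choosing a different "then". The "if" stays the same while the output program changes.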

Why is this useful?

For the ‘sometimes melts the insides of patients’ issue, the problem with basic CAR-T cells is that they are always armed. Every cell expressing your target antigen, anywhere in the body, is a potential trigger for activation. When millions of CAR-T cells encounter their target simultaneously, they release a flood of inflammatory cytokines, and if enough of them do this at once, you get systemic inflammation that can progress to organ failure. SynNotch constrains this geographically: if the priming antigen is tumor-localized, then the T cells only arm themselves inside the tumor microenvironment. To be clear, this is not a hypothetical on my end; this actually exists, circa 2022!

For the ‘exhaustion’ problem, the issue with modern CAR-Ts is that they have a habit of signaling even when there’s nothing to kill, a phenomenon called tonic signaling. The receptor is always sitting in the membrane, so many CAR constructs exhibit some baseline activation even without antigen binding. Over days and weeks, this chronic low-level stimulation pushes T cells toward exhaustion, and by the time they encounter the actual tumor, they may be largely inactive. SynNotch sidesteps this because there is no CAR until the priming event. Until the T cells find the tumor-specific antigen that causes them to express the CAR (or whatever else), they stay ‘fresh’. And, again, this is not a clever second-order belief about what may happen to SynNotch’d CAR-T, but an established finding with rather striking results (albeit in mice).
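A deliberately crude way to see the exhaustion argument is to treat each day of tonic signaling as a small, independent chance of pushing a fresh T cell into exhaustion. The per-day probability and the two-week timeline below are made-up numbers chosen for illustration, not measurements. The model only captures the qualitative claim: no CAR before priming means no tonic signaling to wear the cells down.

```python
def fresh_fraction_at_tumor(days_before_tumor: int,
                            p_exhaust_per_day: float,
                            car_expressed_before_tumor: bool) -> float:
    """Fraction of T cells still 'fresh' when they reach the tumor (toy model).

    Assumes each day of antigen-independent (tonic) CAR signaling independently
    pushes a fresh cell into exhaustion with probability p_exhaust_per_day.
    Both the probability and the timeline are hypothetical.
    """
    if not 0.0 <= p_exhaust_per_day <= 1.0:
        raise ValueError("p_exhaust_per_day must be a probability")
    if not car_expressed_before_tumor:
        return 1.0  # no CAR yet, so no tonic signaling
    return (1.0 - p_exhaust_per_day) ** days_before_tumor


days, p = 14, 0.05
print("constitutive CAR:", round(fresh_fraction_at_tumor(days, p, True), 3))
print("SynNotch-gated:  ", fresh_fraction_at_tumor(days, p, False))
```

Real exhaustion is graded and partially reversible, so this geometric decay is only a cartoon. It does show why the CAR’s timing matters as much as its specificity.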

Fairly, these both presume a bog-standard CAR as the output, but still, that’s something that wouldn’t have been possible without the modularity of the system! But if this is so promising, where are our mutant, hyper-engineered SynNotch’d CAR-Ts?

Happily, it has not ended up in some valley of death. There are two ongoing phase 1 clinical trials right now, both run by UCSF, testing these types of constructs in glioblastoma patients. Looking forward to seeing the results!

Exotic physical sensors

We’ve discussed optogenetics on this blog before, in a guest post by Pelagia Martin. But optogenetics really belongs to a much broader class of synthetic biology methods that seek to give cells entirely new sensing modalities, ways of perceiving the world that evolution never bothered to install.

To start off, let’s re-explain optogenetics, which is perhaps the first ever instantiation of this concept. The basic idea is to take light-sensitive proteins from algae or bacteria (channelrhodopsins, halorhodopsins, and their many cousins), stick them into neurons, and now you can control neural activity with light. Shine blue light, neurons fire. Shine yellow light, neurons go silent. Your neurons did not previously respond to light. Now they do!

What other physical sensing modalities could we force into cells?

Seems like the answer is basically ‘anything’. Mechanosensitive receptors can be installed (sonogenetics), temperature-sensitive receptors can be installed (thermogenetics), and even magnetically sensitive receptors can be installed (magnetogenetics), all of which work via the same fundamental principle as optogenetics.

You can do all sorts of interesting things with these.

With sonogenetics, you can engineer mice whose neurons express mechanosensitive channels, so that their brain activity can be modulated via noninvasive ultrasound to affect disease-relevant brain circuitry. You may recognize this as one of the theses of Merge Labs. As a point of nuance, this isn’t exactly a fully new knob of control, since neurons are already slightly mechanosensitive. That’s why noninvasive ultrasound already works for modulating unmodified human neurons! But it is somewhat new, in the sense that installing engineered mechanosensitive receptors has a few upsides:

- only your transfected neurons respond;
- the engineered channels are more sensitive than endogenous ones;
- you can much more reliably choose between excitation and inhibition (typical ultrasound can do both, but it’s difficult to pick which one you get).

With thermogenetics, you can do something not dissimilar to the SynNotch CAR-T geographic activation. Rather than activating in the presence of a local antigen, though, the cells activate only under mild elevations in temperature. From a 2020 paper:

To enable CAR T cells to respond to heat, we construct synthetic thermal gene switches that trigger expression of transgenes in response to mild elevations in local temperature (40–42 °C) but not to orthogonal cellular stresses such as hypoxia. We show that short pulses of heat (15–30 min) lead to more than 60-fold increases in gene expression without affecting key T cell functions including proliferation, migration, and cytotoxicity…In mouse models of adoptive transfer, photothermal targeting of intratumoral CAR T cells to control the production of an IL-15 superagonist significantly enhances anti-tumor activity and overall survival.

But while this is interesting, it feels a bit unsatisfying to repeat the same CAR-T trick again. Cell therapies suck for a lot of reasons, and it would be putting a lot of eggs in the same basket if all our ‘new knob’ ideas revolved around simply treating cancer better.

This leads us to magnetogenetics, which is perhaps the most interesting of the bunch, and in fact what drove me to write this article in the first place.

If you’re as online as I am, you may remember that in June 2024, Andrew York, a scientist at Calico Labs, posted a Twitter thread that is burned into my mind. It was about his team’s discovery of MagLOV, a fluorescent protein that is also magnetically sensitive.

[Embedded tweet from Andrew York (@AndrewGYork), 2 June 2024, linking to addgene.org/219957/]

To understand the significance of this, we need some extra context. Prior to this result, magnetogenetics papers claimed things like "we fused ferritin to an ion channel and now magnetic fields open it." This is problematic, because it is literally physically impossible to do anything useful via this route — a fact that was explained at length in a great 2016 paper titled ‘Physical limits to magnetogenetics’:

These [above] calculations show that none of the biophysical schemes proposed in these [magnetogenetics] articles is even remotely plausible, and a few additional proposals were eliminated along the way. The forces or torques or temperatures they produce are too small by many orders of magnitude for the desired effects on molecular orientation or on membrane channels. If the phenomena occurred as described, they must rely on some entirely different mechanism.
Barring dramatic new discoveries about the structure of biological matter, the proposed routes to magnetogenetics, based on either pulling or heating a ferritin/channel complex with magnetic fields, have no chance of success.

Because of this, the field of magnetogenetics is a bit of a mess, with at least one high-profile failure to replicate in 2020.

So, what changed? How is MagLOV somehow responsive to magnetic fields?

Well, for one, MagLOV does not respond mechanically to magnetic fields in the way those earlier magnetogenetics papers claimed. Rather, it alters its own fluorescence in response to a magnetic field. How does it do this? It’s beyond me: the underlying mechanism has something to do with ‘radical pair mechanisms’ and ‘quantum spin dynamics’, and I am not going to pretend I have any real intuition for either. But it does seem reasonable to boil the whole process down to two observations.

One, introducing a magnetic field alters an ongoing photochemical reaction, which changes the protein’s (MagLOV) fluorescence.

And two, in light-sensitive proteins, that same kind of photochemistry is what drives changes in the protein’s conformation.

This means that you can alter the distribution of conformational states in a solution of [optosensitive protein fused with MagLOV] by altering nearby magnetic fields. Why is this useful? Because now we have a way to use this whole tech stack that optogenetics has built up over the last fifteen years, which has largely been squandered due to the fact that getting light inside the body is really, really hard.
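Here is a rough sketch of what ‘altering the distribution of conformational states’ can mean, using a two-state photoswitch under constant illumination. Light drives the protein from a resting to an active conformation at some rate, and it relaxes back thermally at another. At steady state, the active fraction is k_on / (k_on + k_off). The rates and the size of the magnetic effect below are hypothetical. This is not MagLOV’s actual photocycle, just the generic way a field-induced change in photochemical yield would shift the balance.

```python
def active_fraction(k_light_per_s: float, k_dark_per_s: float,
                    field_effect: float = 0.0) -> float:
    """Steady-state fraction of a two-state photoswitch in its light-induced state.

    k_light_per_s: rate of light-driven conversion into the active conformation.
    k_dark_per_s:  rate of thermal relaxation back to the resting conformation.
    field_effect:  hypothetical fractional change in the light-driven rate caused
                   by an applied magnetic field (e.g. -0.2 for a 20% drop).
    """
    if field_effect <= -1.0:
        raise ValueError("field_effect must be greater than -1")
    k_on = k_light_per_s * (1.0 + field_effect)
    return k_on / (k_on + k_dark_per_s)


no_field = active_fraction(0.1, 0.1)
with_field = active_fraction(0.1, 0.1, field_effect=-0.2)
print(f"active fraction, magnet off: {no_field:.3f}")
print(f"active fraction, magnet on:  {with_field:.3f}")
```

In this framing, the magnet acts as a dial on one rate constant, and the population of active conformations follows it.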

As of 2026, there is now a company formed around this idea: Nonfiction Labs. It was co-founded by Richard Fuisz, who gave a really great Core Memory podcast interview about his work there, and it has developed the first ever magnetically sensitive antibody, or ‘MagBody’.

You could imagine that a pretty simple application of this work is to do the same cancer tricks as before; only activating a drug at specific regions, or even suppressing a drug in especially sensitive areas. From another Core Memory article covering Nonfiction:

HER2 is a useful example. Herceptin, Kadcyla, and Enhertu are all FDA-approved drugs targeting this antigen, commonly found in breast cancers. All produce distinct toxicity because HER2 is also expressed in healthy tissue, particularly the heart. A magnetically controllable HER2 therapy could, in principle, be active at the tumor and silent in tissues prone to damage.

But while cancer is neat and all, we should think bigger. Where else would something like this, ‘this’ meaning magnetically sensitive proteins, be useful? It is common for new therapeutic modalities to tout their universal applicability. But there is a genuine reason to believe that magnetically sensitive proteins may be able to make that claim without accusations of hyperbole. They are useful in basically any situation where you want external, non-invasive, temporal control over a protein’s function inside a living body, and that’s a lot of situations.

As an example of how creative one can get here: consider chronic pain. You may be aware that tools for managing this today are pretty bad. On one end, you have opioids, which work great. But they are also systemic, addictive, and tolerance-building, and they are responsible for a crisis that has killed more Americans than every war since Vietnam combined. On the other end, you have local anesthetics. These are geographically precise, but they wear off in hours, require repeated injections, and are not useful for many types of pain. And in between, you have things like gabapentin, which sort of works, sometimes, for some people. It also makes them foggy and fatigued, because, like opioids, it is systemically active and nonspecific.

The core issue, which by now should sound familiar, is that we have no knob to tune how pain-reduction medications work. Once the analgesic is in you, it does its thing everywhere, at all times, at whatever dose your last pill provided. If there is any knob at all, it is a single, coarse, delayed-feedback dial for adjusting drug dosage.

If the world that Nonfiction Labs hopes to usher in comes to pass, the future may look very different. In this setting, you’ll receive a single systemic administration of a magnetically controllable protein that inhibits pain signaling. Maybe it’s a MagLOV-fused nanobody against a sodium channel like Nav1.7, which is famously specific to pain sensation. In the absence of a magnetic field, the nanobody is inert, or at least has dramatically reduced binding affinity. Then the patient puts on a wearable device that generates a local magnetic field over the affected area. The field activates the nanobody, which binds Nav1.7, which drops pain signaling. And if the patient needs the pain to return for, say, a physical therapy appointment, turning the magnet off suffices.

The patient now has, for the first time in the history of pain medicine, a knob for their own analgesia, one that is spatially and temporally specific in a way that was previously impossible.

Of course, many questions must be answered. How long will the MagBody persist in the body? How spatially precise a magnetic field can a portable device produce? Will the nanobodies be affected by other external magnetic fields, like those of an MRI? But the trajectory here feels fascinating to me, and I look forward to learning more about what Nonfiction ends up doing.

Bioorthogonal chemistry

What actually happens when we ingest a drug? Say, aspirin. As you may expect, it begins chattering with the other biomolecules in our body. This has an upside, in that interacting with native biology is typically the primary way that drugs exert their therapeutic effect, but the downside is that native biology talks back. Aspirin inhibits the cyclooxygenase (COX) enzymes, which is what you want (less prostaglandin synthesis, less inflammation, less pain). But one of those enzymes, COX-1, also operates in your platelets, where it helps with clotting, which means that people with bleeding disorders cannot take it.

Now, fairly, how much demand for aspirin exists amongst hemophiliacs? Probably not much. But the broader point holds: every drug that works by engaging endogenous biology runs into the inconvenient fact that its target is often also expressed in tissues we’d rather leave alone.

Bioorthogonal chemistry offers a way out of this headache. The idea, which won the 2022 Nobel Prize in Chemistry, is defined as ‘any chemical reaction that can occur inside of living systems without interfering with native biochemical processes’.

Well, wait a minute. Attempting to deviate from our plumbing entirely seems like a slightly contrived problem, no? If the ultimate purpose of any of these systems is to eventually interact with our native biology, why would we care about anything that doesn’t? Let’s suspend disbelief for the moment, I’ll explain the actual utility of solving the problem later. For now, let’s assume it’d be useful.

How would you do this so-called bioorthogonal chemistry?

Well, the 2022 Nobel Prize was not awarded for bioorthogonal chemistry alone; the citation also covered an adjacent idea called ‘click chemistry’. Click chemistry is a much broader concept, having to do with designing chemical reactions that are modular, high-yielding, and reliable under mild conditions. And as it turns out, the most therapeutically relevant click chemistry reactions also happen to be bioorthogonal. The two concepts are deeply intertwined, which is why three researchers shared the prize for them jointly.

For our purposes, there is one particularly important reaction class here: the ‘inverse electron-demand Diels-Alder reaction between a tetrazine and a trans-cyclooctene (TCO)’.

‘What is that?’, you may ask. I’m not quite sure myself, but all you really need to understand are three things:

- neither tetrazine nor TCO reacts with anything in the human body;
- they react incredibly fast with one another;
- the byproducts of their reaction are entirely innocuous (nitrogen and carbon dioxide).

Ah, and we forgot the most important thing: both tetrazine and TCO are very amenable to having other molecules bolted onto them. Given some clever chemical engineering, those molecules can be made to fall off after the (tetrazine x TCO) reaction occurs.
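To give a feel for why ‘incredibly fast’ matters, here is a back-of-envelope calculation. For a second-order reaction between a tetrazine and a TCO, the rate is k[Tz][TCO]. When one partner is in large excess, its concentration stays roughly constant. The other partner then decays with a pseudo-first-order rate constant k′ = k[excess partner], giving a half-life of ln 2 / k′. The rate constants and the 10 µM concentration below are illustrative assumptions. Published tetrazine/TCO rate constants vary by orders of magnitude depending on the specific molecules.

```python
import math


def half_life_seconds(k_per_M_per_s: float, excess_partner_M: float) -> float:
    """Pseudo-first-order half-life of the limiting reagent.

    Assumes the partner is in large excess, so its concentration is ~constant
    and the limiting reagent decays as exp(-k * [partner] * t).
    """
    k_prime = k_per_M_per_s * excess_partner_M  # pseudo-first-order rate, 1/s
    return math.log(2) / k_prime


# Illustrative numbers only: a 10 micromolar excess of the partner reagent.
partner_M = 10e-6
for label, k in [("slower reaction, k = 1 /M/s", 1.0),
                 ("faster reaction, k = 1e4 /M/s", 1e4)]:
    t_half = half_life_seconds(k, partner_M)
    print(f"{label}: half-life ~ {t_half:,.0f} s ({t_half / 3600:.2f} h)")
```

At the micromolar concentrations a drug reaches inside a body, a slow reaction can take most of a day to get halfway. A fast one is done in seconds, before the reagents have time to wander off.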

Do you see the therapeutic relevance here?

The utility of bioorthogonal chemistry is to create better prodrugs.

Prodrugs are defined as anything pharmacologically inert that, upon being introduced to the body, undergoes some form of chemical transformation (cleavage, addition of groups, and so on) that converts it into an active drug. They aren’t new either; the first prodrug appeared a century back, and, circa 2018, 12% of all approved small molecule therapeutics were prodrugs. What actually triggers a given prodrug’s conversion into something biologically active varies, but the triggers can be grouped into a few categories: metabolism (e.g. phosphorylation), pH (e.g. acidic environments), or interaction with endogenous biomolecules (e.g. trypsin).

Importantly, whatever actually causes the conversion is typically the answer for why you’d use a prodrug at all. If activation depends on liver metabolism, it usually means the prodrug form survives the digestive tract better than the active drug would, buying you oral bioavailability. If activation depends on an enzyme enriched in a specific tissue, it means you get some geographic targeting for free. If activation depends on acidic pH, it means you’re exploiting the metabolic quirks of a particular tissue microenvironment to concentrate drug activity there. The trigger is the therapeutic logic.

The problem is that none of these triggers are yours. They are all endogenous, which means they are all leaky, which means you are playing a statistical game of relative improvement on a particular axis, not absolute.

The proposal for bioorthogonal chemistry prodrugs is that they offer absolute control; a prodrug will become active exactly where and when you want it to.

How?

Consider doxorubicin, one of the most relied-upon chemotherapeutics ever developed, part of a large number of cancer treatment regimens, and also one of the most unpleasant. It is one of the few drugs, cancer or otherwise, with a foreboding nickname: ‘Red Devil’. Perhaps accordingly, doxorubicin is extremely cardiotoxic. Alongside causing nausea, hair loss, and immunosuppression, it carries a serious risk of irreversible heart damage. The game of chemotherapy has always been one of hoping it kills the cancer before it kills you. It is empirically the case that the ‘Red Devil’ causes clinical heart failure in somewhere between 5% and 26% of patients, with the risk rising with cumulative dose. If you are unlucky enough to be in that group, you face roughly 50% mortality within a year. In fact, there are case reports of patients dying of cardiac arrhythmias during the infusion itself.

Now imagine a version of doxorubicin that has been linked with TCO.

In this form, the active site of doxorubicin is capped, so the drug cannot interact with anything, including your heart. Meanwhile, at the tumor site, you have arranged for a tetrazine to be waiting. How? The simplest version, and the one furthest along clinically, is almost comically literal: you inject a tetrazine-modified biopolymer directly into the tumor. The (TCO x doxorubicin) conjugate is then administered systemically, circulates through the body, and eventually stumbles onto the tetrazine deposit you made. The two react, and active doxorubicin is released locally, only at the site of the tumor.

Happily, we needn’t merely imagine this. In June 2025, the company Shasqi published the results of its first-in-human Phase 1 trial in Clinical Cancer Research. The trial marked the first time bioorthogonal chemistry had ever been used therapeutically inside a human being. Patients with advanced solid tumors received up to fifteen times the conventional doxorubicin dose in prodrug form, and no dose-limiting toxicities were reported.

Given the theory we’ve established here, you may imagine that this could be the cure for cancer. Chemotherapy works extremely well, and the only reason it can’t work even better is because it also works very well at killing healthy cells. So, if bioorthogonal chemotherapy offers us a way to nearly perfectly prevent off-target effects, can’t we just fill someone up to the brim with it and call it a day?

Unfortunately no, at least according to the trial results. Fifteen-fold the dose does not mean fifteen-fold the active drug at the tumor. At a certain point, the biopolymer tetrazine’s ability to capture circulating prodrug saturates, and (TCO x doxorubicin) that doesn’t react will simply drift through the body inertly. And while the acute safety data looked clean, doxorubicin cardiotoxicity is notorious for appearing years after treatment (in one case, 17 years later!), so the long-term picture remains unknown. And most importantly, no objective tumor responses were observed, only stable disease rates that are not out of line with typical levels. Now, fairly, this trial was on a heavily pretreated, refractory population, so while stable disease is not meaningless, it is a long way from a cure.
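A crude way to see why fifteen-fold the dose need not mean fifteen-fold the effect: if the depot can only capture and activate so much prodrug, delivery saturates. The sketch below uses a hyperbolic saturation curve (the same shape as Michaelis-Menten kinetics). Its capacity and half-saturation parameters are made up and are not fitted to the Shasqi trial data.

```python
def tumor_active_drug(dose_multiple: float, depot_capacity: float,
                      half_saturation_dose: float) -> float:
    """Toy saturating capture of prodrug at a tetrazine depot.

    Capture rises roughly linearly at low dose and plateaus at depot_capacity
    at high dose. All quantities are hypothetical, in arbitrary units.
    """
    return depot_capacity * dose_multiple / (half_saturation_dose + dose_multiple)


capacity, k_half = 10.0, 3.0
at_1x = tumor_active_drug(1.0, capacity, k_half)
at_15x = tumor_active_drug(15.0, capacity, k_half)
print(f"15x the dose gives {at_15x / at_1x:.1f}x the active drug at the tumor")
```

Everything above the plateau circulates as inert prodrug. That is why attaching the capture chemistry to something that finds the tumor on its own, and so raising the effective capacity, looks like the natural next step.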

But there is a proof of concept here. And you could imagine an easy improvement: instead of relying on a human-legible injection site for a tetrazine polymer to sit at, you could attach the tetrazine to something that finds the tumor on its own, which may vastly increase the ‘surface area’ for the TCO x chemotherapy conjugate to react with.

And this is exactly what the other company in the bioorthogonal chemistry space, Tagworks Pharmaceuticals, does. The Tagworks thesis is simple: conjugate TCO to an antibody meant to bind to tumor markers, inject it systemically, allow it to park itself within a tumor. Then, some time later, administer a tetrazine trigger intravenously, which reacts with the TCO on the tumor-bound antibody and releases the payload right there in the microenvironment. Their lead program, TGW101, targets TAG-72—a marker on solid tumor cells—and entered a Phase 1 dose-escalation trial in 2025.

But we’re getting trapped in the cancer bubble again. Where else can bioorthogonal chemistry theoretically be used?

As with MagBodies: anywhere you want geographic or temporal precision in drug activity, which is essentially everywhere. I’ll leave an exact list as an exercise for the reader, but here is one hint: immunosuppression in autoimmune conditions. Say, joint pain? Could a tetrazine scaffold be injected into a synovial cavity, and a TCO bearing the immunosuppressant be administered systemically? Of course, who knows whether this would have any advantages over standard of care, but it’s a fun idea!

The future

We began this essay with a complaint: that biology is uniquely claustrophobic amongst the sciences, that the floor rushes up to meet you, that the barriers are not physical law but evolutionary happenstance. All of this is true. But there is also an upside here. The very thing that makes biology so frustrating to work with is also what makes it so astonishingly extensible. The sections above are, I think, the very earliest results of what happens when clever people notice this.

What feels particularly interesting about the modalities discussed here is that none of them feel like they could have emerged from within a single field. To pull off something like bioorthogonal-chemistry-for-tumors, you would’ve needed a physical chemist’s knowledge of click chemistry, a medicinal chemist to design the prodrug linkage, and an oncologist to understand where such an innovation is best deployed. The low-hanging fruit of simple binders has been largely picked, and the drug discovery field at large has been increasingly eyeing stranger modalities, such as PROTACs and allosteric modulators, both of which require rethinking what a “drug” even looks like. And the modalities in this essay are stranger still.

The discovery of more like these may depend less on searching known chemical space faster and more on the kind of lateral, cross-domain synthesis that has historically been bottlenecked by the simple fact that very few people are simultaneously deep in so many fields at once. I don’t want to plug LLMs into an article where there’s really no need to do it, but it’s tough to not think of the potential here. This interdisciplinary bottleneck is loosening fast! It is exciting to think about what else is on the horizon.

Owl Posting turns two

Abhishaike Mahajan — Sat, 21 Mar 2026 22:52:44 GMT

Owl Posting turned two this month. It’s difficult to ascribe any emotion to this whole endeavor other than love. You would think it’d get old at this point, but it hasn’t. Every single article still feels like I am twelve years old, staring up at a great big endless blue sky, and thinking that it’s going to swallow me up any second. Like a child, I believe in those moments that nobody has ever felt what I feel. Each time I hit ‘publish’ feels like love too, but the horrible kind, like I am watching someone die, their organs falling out of their chest as their eyes twinkle, telling me that they really enjoyed spending this time with me. I turn away, weeping, as their light fades. And then another creature walks through the door, with such an interesting energy to them, and I fall in love all over again, and the whole cycle repeats.

This all seems quite melodramatic. Perhaps some things are that serious, but certainly not running a blog. But unfortunately, you don’t get much say in what you end up loving. You can spend your whole life avoiding the subject, darting your eyes towards it as the days, months, years go by, never breaking down, never shamefully admitting what you actually desire most. Some people spend their whole lives in this pattern. Can you imagine? Of course you can! Because it is not just ‘some’ people, it is all of us. Nobody is free from the tragedy of self-denial. In exchange for this cruelty, life does sometimes throw us a bone: a chance to embrace the things that you love in a manner that is entirely inconsequential, utterly cost-free in every respect other than having the self-awareness necessary to reach out and grab it.

I have experienced this once, and it is writing.

This year, I wrote 24 essays, totaling ~108,000 words. I also filmed 8 podcasts, which I’ll discuss near the end.

Some of the written work, I admit, was not very good. For example, ‘Drugs currently in clinical trials will likely not be impacted by AI’ suffers from the sin of being boring in the worst possible way—obvious in the non-surprising parts, and likely wrong in the surprising parts. I suspect this is due to the fact that I wrote that article while unemployed, and thus exuberantly happy. Happiness is not strictly bad for writing, but it does not help things. What is strictly necessary for writing is pressure, an angry cloud hovering above your head wiggling its damp finger in your ear. Having a job with its own demands is a good way to create pressure, and barring that, it is up to you and you alone to supply that.

Luckily, I internalized this after a few weeks of unemployment, and shortly thereafter produced what remains, to this day, the most popular article I have ever written: ‘Endometriosis is an incredibly interesting disease’.

The endometriosis piece is interesting, because I did not at all expect it to go anywhere. In fact, I distinctly remember conferring with Claude (Opus 3, if I remember correctly) the night before its publication as to whether the title of the essay would be considered offensive. Yes, the model screamed, it is offensive, nobody wants their disease to be viewed as ‘interesting’. It begged me to change it, warning that I’d be hanged from the rafters otherwise. In fact, the model mused, I may even kill you myself. In a decision that surprised even me, I stuck to my guns. And the essay ended up being a breakout success in a very classic sense, which is not something I ever expected to happen for the type of writing I do. It also indirectly led to the person who inspired the article in the first place being recruited to work on endometriosis research at a great startup, which is probably the most positive side effect of anything I have ever done via writing.

Moving on: I only did two ‘Startup’ posts this year, half as many as in the previous year. The first was over EvE Bio (a non-profit) and the other was over Leash Bio, both of which are wonderful. I’d really like to do more of these, but this type of writing is hard. You are basically acting like a comms employee, but with zero internal visibility into anything at all, which means you have to make a bunch of predictions that are almost certainly wildly incorrect. So why did I do these at all? EvE is just one of those ‘it’s boring, but so useful for humanity’ missions that felt irresponsible not to cover. And Leash has interesting science, yes, but the culture itself felt even more interesting, and how often do you find bio-ML companies with a culture worth talking about?

What is especially fun about these two is that they seem quite boring to a general audience—unlike the endometriosis one, which I consider genuinely interesting to laymen—and yet both were decently popular across social media. This is strange, and permanently updated my understanding of what could be considered ‘popular science’. I don’t think I talked down to anyone at all—the EvE essay has phrases like ‘Tango β-arrestin recruitment assays for 7TMs/GPCRs’—and yet non-biologists seemingly enjoyed it.

What else? Well, I joined a great bio-ML startup, Noetik, the reasons for which I detail here, and for the first time, I was constantly around a breed of individual I had never interacted with before: cancer biologists. I consider cancer experts minor deities in the cosmic pantheon, all of whom are capable of providing an essentially infinite amount of information about one of the most complex diseases that afflict humanity. As a result, I am a big fan of them. And as I gobbled up information from these folks, I slowly became comfortable with the idea of putting together an essay over cancer.

Cancer is difficult to write about. There is a universe of popular essays and books that already exist on the subject, and I imagined a potential reader would roll their eyes if they caught me relying on some cliche. After many weeks of deep thought, I came across what I felt was a relatively unique angle: there will never be another Keytruda. At least, not in the sense of ‘a cancer drug that works extremely well across many cancer subtypes’; we have almost certainly discovered them all. What there will be are many cancer drugs that work for very specific patient populations. Thus, the job of the oncology field should be to discover these very specific patient populations, and which drug works best specifically for them. This led to ‘Cancer has a surprising amount of detail’.

It’s a very clean essay, and I am lucky to have it be part of the Works in Progress Issue 23 book (you should pick one up!). I don’t have too many afterthoughts on it, other than that it is one of the only pieces of mine that required long stretches of deep thought to figure out the tempo. Writing is not always like research, but when it is, it is really like research.

Most of the other things I published this year were brought about via being nerd-sniped by a third party. The big upside of running a technical blog is that random folks will come up to you and patiently explain the insane intricacies of their field, and at that point, they’ve already kind of handed you the essay. It feels a bit lazy to not staple together the interesting things they told you—along with interviews with others to flesh the piece out—and put it out there. Four essays this year could be attributed to a particular person, whose passion for the subject was so infectious that it grabbed me too.

The first was, ‘RNA structure prediction is hard. How much does that matter?’ which was prompted by Connor Stephens, who told me about the difficulty of RNA modeling at an event I co-ran while visiting SF. The second was ‘Questions to ask when evaluating neurotech approaches’, which was prompted by Milan Cvitkovic, whom I met multiple times throughout 2024 and 2025, growing increasingly shocked at how much all-encompassing knowledge he had over the neurotech field. The third was ‘Heuristics for lab robotics, and where its future may go’, which was prompted by Michelle Lee, after visiting the company she founded and seeing, for the first time in my life, robotic arms performing wet-lab tasks. And finally, the fourth was ‘Reasons to be pessimistic (and optimistic) on the future of biosecurity’, which could be partially attributed to multiple people, but whose core orchestrator was Jacob Trefethen.

If I were forced to pick favorites amongst this year’s essays, I would point at these four, all of which required an order of magnitude more ‘worldview expansion’ than any of the pieces I published during the first year of writing.

They were painful to stitch together. The RNA one required an immense amount of editing to deal with the fact that I was just completely incorrect in my first draft, the neurotech one languished in my drafts for months because I couldn’t figure out how to divide the sections, and both the lab robotics and biosecurity ones required so many interviews with domain experts (~12 and ~16 respectively!) before I felt confident enough in my perspective to put something out there. Maybe in an ideal world, I would only write pieces like these, since they are both very fun to create and probably the most counterfactually valuable thing I could produce, as most everyone else who could create something similar has better things to do. Unfortunately, making these requires an insane amount of time, is especially stressful, and is mostly impossible to consistently do without writing full-time. But fun to do in sprints!

However, favorites exist in many dimensions. While the aforementioned four were my favorites on the ‘jeez, I can’t believe I managed to do that’ axis, there is one more that I’m a big fan of on the ‘saying something I’ve wanted to say for years’ axis. And the essay that scored best there was ‘Ask not why would you work in biology, but rather: why wouldn’t you?’.

This is, I think, the only thing I’ve ever published that is blatant emotional manipulation. I get why so many writers make that their whole shtick. It really is genuinely fun to cradle your reader’s head between your palms, and force them to stare at your worst fears and anxieties, trying to ignite those same fears and anxieties within their hearts. You feel powerful. You feel in control. I also get why so many of the same writers who constantly do this have deeply antisocial tendencies. There is something a little horrible, even soulless, about creating stuff like this. Some people may be able to do it forever, but I could not.

This all said: I believe every word I wrote in it. It echoes core beliefs I’ve had since I was twenty-three, and at no point in the piece do I veer off into territory that I don’t ponder at least once a week. I think about painful medical procedures often. I think about dementia often. I think about the fragility of my flesh often. When I was twenty-three, I was very upset at all this, and hated those around me who did not see what I saw: the writing on the wall for what awaits them, me, everyone. Our brains will turn to soup, our eyes will whiten, our fingers will tremble until they can barely hold a spoon. It felt so obvious to me that all wars, all petty conflicts, all of this useless bickering should be ceased until we figure out the solution here! The entirety of not only the US’s treasury, but every nation’s treasury, should be funneled into this effort. What else could possibly matter?

At twenty-eight, I still believe all this, but I have less anger about it all after funneling those anxieties into my own writing and work. Clean your own room first, you know? Still, it was a relief to get all this out of my skull and onto the page, and it is the only article of mine that I re-read every now and then.

This year has also involved some experimentation outside of purely technical writing. ‘A vibe check on the San Francisco biotech scene’ discusses the seeming disappearance of optimism amongst life-sciences founders and employees in the Bay Area; a location I have grown increasingly enamoured with. ‘Human art in a post-AI world should be strange’ is more for myself than for others, and was brought about by seeing how good the modern LLMs are at writing. Will this blog disappear soon after Opus 4.7? Maybe! It does increasingly feel as if the last remaining bit of alpha I have as a writer is being able to organize a piece rather than actually being able to write it. Luckily for me, all the LLMs seem to be quite bad at that. Unluckily for me, the LLMs seem to keep getting better at the things they are bad at.

On a similar note, this year had more fiction. ‘A compilation of eleven stories’ is exactly what it sounds like. There’s also an unemployment-era piece, ‘A body most amenable to experimentation’ and, no surprises, it was not very good and I kind of regret releasing it. The core theme is interesting (what if all in-vivo experimentation were done on a single creature?), but I think I really bungled the execution. Happily, I learned a few lessons and went on to have two fiction pieces in 2026 that I really liked: ‘The origin of rot’ and ‘The truth behind the 2026 J.P. Morgan Healthcare Conference’, both of which are long-form investigative journalism articles into something entirely fake. In fact, the J.P. Morgan one is the second-most-read thing I’ve published. I’d like to do more of these in the future—I have a lot of fiction in the drafts—but these pieces require a fair bit more courage than the technical pieces to send out, and so I end up spending a lot more time editing them than my usual articles.

That covers the writing. How about podcasts?

As mentioned, I filmed 8 this year, which is four times more than the prior year! And with view-counts that are somewhat respectable given that the TAM for this sort of content is almost certainly quite small.

Overall, the episodes were wonderful. I achieved my life dream of having Sergey Ovchinnikov and Ellen Zhong on, and their episodes were, unsurprisingly, the most-watched of this year. I kind of wanted to stop after that, but I kept stumbling across people with such interesting research directions that I kept going. I really enjoyed all of them, but given that Hunter’s episode is the least-watched despite being incredibly cool (did you know that organ transplant companies have a suite of private jets to ferry organs around?), I’m going to plug his specifically and recommend you watch it.

This all said: I’m still unsure whether to keep doing these. On one hand, they are fun to make and are genuinely informative for the few people who watch them. In the case of interviewing founders, the podcasts also seem to be reasonably useful for making both potential employees and investors aware that they exist. On the other hand, they are a huge pain to edit, and my attempts to outsource have not really resulted in a smaller workload. It also costs a fair bit to rent a studio for these, and consistent sponsors have not yet emerged. As it stands, the podcast is in this weird middle ground where it clearly has some audience worth advertising to, but most potential sponsors (e.g. CROs) either aren’t used to sponsoring podcasts or, if they are, would rather spend the money on, say, Luke Timmerman’s much larger audience base. To be clear: this is completely fair, I would do the same if I were in their position.

I’ve managed to stay in the black thanks to individual philanthropic gestures, which I’m deeply grateful for, but I’d ideally prefer not eating into people’s wallets without giving them something in return. I have ~2 more planned, but I may take a hiatus after that. Not the biggest loss in the world; my writing is niche, and podcast viewers are a sub-niche of that niche.

And that’s that for year two.

To end this off: while putting this together, I re-read my first anniversary post. It’s surprising how much things stay the same, but it is also surprising how much more I enjoy my writing now compared to articles from the first year. Some people really, really hate my style and I realize it isn’t for everyone, but I personally like it. I think I’m inching closer to whatever my own internal monologue is. And perhaps—though I’m unsure whether I actually believe this—having a polarizing writing style is all one can hope for, as the only alternative is a writing style that nobody reads at all.

What’s next? I have a few topics in the pipeline, but I’m taking a week off; the last few articles took up a lot of brainpower. But there’s a lot going on in the world right now that’s worth discussing: the bottlenecks to better cancer vaccines, protein models that can generate binders to intrinsically-disordered-proteins, in-vivo CAR-T getting closer to market, the differing strategies of liquid biopsy companies, strange therapeutic modalities, and a lot more. One thing that has changed in the last year is that I no longer feel worried about running out of topics. There’s just so, so much to cover.

Reasons to be pessimistic (and optimistic) on the future of biosecurity

Abhishaike Mahajan — Mon, 16 Mar 2026 15:25:15 GMT

Note: this essay required conversations with a lot of people. I’d like to thank Patrick Boyle (ex-CSO of Ginkgo Bioworks), Harmon Bhasin (founder of a stealth biosecurity startup), Bryan Lehrer (ex-Blueprint Biosecurity), Theia Vogel (ex-SecureDNA), Jacob Swett (founder of Blueprint Biosecurity), Matt Watson (ex-MITRE), Janika Schmitt (Program Officer at Sentinel Bio), Harshu Musunuri (PhD student at UCSF), Liyam Chitayat (PhD student at MIT), Jake Adler (founder of Pilgrim Labs), Dianzhuo (John) Wang (PhD student at Harvard), Jassi Pannu (Assistant Professor at Johns Hopkins), Charlie Petty (many biosecurity-related positions), Nish Bhat (VC at Carbon Silicon Ventures), Sarah Carter (Senior Advisor at Federation of American Scientists), and James Black (Scholar at Johns Hopkins) for speaking with me. All opinions in this essay are my own.

Second note: This essay is very long. While it can be read from top-to-bottom—and is written assuming you will—you will lose little by simply choosing specific sections you find interesting and reading only those.

Introduction

It is easy to scare yourself about biosecurity, and it is getting easier every day. Everyone has their moment when the fear first crept into their throat. Mine was when I read the article titled ‘AIs can provide expert-level virology assistance’, which found that LLMs—even ones as ancient as Gemini 1.5 Pro—are more than capable of happily providing the knowledge needed to debug BSL-4-sounding questions about wet-lab experiments.

As with any paranoia worth having, there are good objections to it.

Most recently, the non-profit Active Site published the largest randomized controlled trial of its kind, with 153 novices working for 8 weeks in a BSL-2 lab in Cambridge, studying how much access to frontier LLMs (Opus 4, o3, Gemini 2.5, all with safety classifiers off) gave participants ‘uplift’ on performing a set of viral genetics workflows (including virus production), compared to access to only the internet. Their conclusion: ‘We observed no significant difference in the primary endpoint of workflow completion (5.2% LLM vs. 6.6% Internet; P = 0.759), nor in the success rate of individual tasks’, with the caveat that the LLM group had numerically higher success rates on 4 out of 5 tasks, just not high enough to reach significance. YouTube, not the LLMs, was rated most helpful by both groups.

So, while frontier models are theoretically capable of providing virology assistance, it doesn’t immediately seem like they can bootstrap someone into wet-lab competence; the hands are still the hard part. There are counterpoints to this as well, like, ‘LLMs probably help non-novices a lot!’, and ‘the study is underpowered!’ and so on. I agree with some of these. The truth is almost certainly somewhere in the middle: LLMs can help a novice with wet-lab work, but they don’t help an infinite amount.

Yet I still believe it is hard to actually turn all this into something evil. And no, I do not think that gesturing towards ‘automated labs’ is a good counterargument. Doing things in the world of atoms is difficult. Especially here. Why? Didn’t I just write a month back about how cloud labs are the final end-state of lab automation plays, so can’t they be hacked into doing something ulterior? Man, maybe. But consider that these cloud labs are, at the moment, barely functional enough to do the things their paying customers want them to do, let alone serve as unwitting accomplices in a bioterror plot. Yes, they will improve, but their improvement is on a very jagged frontier. Liquid handler automation is going splendidly. Automating liter-scale creation, purification, and aerosolization of BSL-4 substances is not going so splendidly. Also, even if automation suddenly accelerates, it is almost certainly not economically viable for these labs to care about servicing the likely small consumer market of ‘large-scale non-therapeutic virus creation’.

I’ll discuss this more deeply in the upcoming sections, but it feels that doing something as ambitious as bioweapon creation will likely be extremely annoying to do for the foreseeable future, and I am consistently on the side that only a well-funded actor would be capable of such a thing. And why wouldn’t those actors opt for much simpler acts of violence that would roughly accomplish the same thing?

This all said: I sympathize with the bioterrorism-phobia that is sweeping my simcluster. If you stare for long enough at the AI trendlines, and also observe the increasingly WW3-y vibes that the world is emanating, it is difficult to not feel at least some worry. Maybe a genuine bioterrorism incident is not too far away. And maybe it will be far, far worse than anyone can imagine.

Or maybe not. Biosecurity is one of those topics that can either feel extraordinarily bleak in its prognosis, or like things are obviously going to be fine. As with many things in the world, I think both sides have a bit of a point, and I think holding them in tension is the only honest way to consider how the future may go. In this essay, I’ll share some of my own thoughts on the field at large, and the specific themes that arose in my discussions with people.

Some thoughts

The business case for biosecurity requires another pandemic for it to work

As with all problems that, if not solved, may lead to the depopulation of the planet, we can depend on venture capitalists to search for a market opportunity. A few companies have emerged in the last few months as the vanguard of this effort: Valthos ($30M Series A for being the Palantir of biosecurity), Red Queen Bio ($15M seed for designing therapeutics against bioterrorism threats), and Aclid ($4M seed for DNA synthesis screening infrastructure). There are others too, but we’ll stick with these ones for now for illustration purposes.

I have zero doubt that these companies, or something akin to them, are worth having around. What I cannot quite figure out is the business model. The usual answer to the ‘who pays for this?’ question in these sorts of public-goods situations is government agencies: BARDA, DoD, DHS, CDC, and so on. This is not such a bad idea.

Let’s take a look at the United States’ 2026 budget proposal for the biodefense-adjacent areas to get a sense of these agencies’ funding.

In the proposal, BARDA is being cut by $361 million, a roughly 36% decrease from its prior state. Project BioShield, the program that actually buys finished countermeasures, is on track to lose $100 million. The CDC budget is halved, a loss of around $5.4 billion. DTRA is down $150 million. One article more deeply analyzing the many, many other biodefense cuts being made had this to say about it:

With the Trump Administration’s priority of reducing federal spending, the funds requested for biodefense have been significantly reduced. Very few biodefense programs saw increases in their funding or even a continuation at their previous funding levels.

How about the Department of Defense (War)? Mixed picture: while the overall budget of the department was increased, the bio-adjacent programs within it saw a drop.

One notable example is the Defense Threat Reduction Agency (DTRA), a key government agency that prevents and mitigates deliberate biological threats to the US globally, for which the PBR requests $708 million. This is $150 million less than the $858 million requested in FY25. Similarly, the $1.61 billion FY26 request for the DoD Chemical and Biological Defense Program is $46 million less than requested for FY25.

In other words: the agencies that would theoretically buy tools from, say, Valthos, are the same agencies that the current administration is intending to either gut or barely increase the budget of.
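As a quick sanity check on the scale of these proposed cuts, the sketch below recomputes the relative changes from the figures quoted above (millions of USD). It is plain arithmetic on the essay's own numbers; the prior BARDA budget is not stated anywhere, so it is only backed out from the "$361 million, roughly 36%" phrasing and should be read as an approximation.

```python
def pct_change(old: float, new: float) -> float:
    """Percentage change from `old` to `new`; negative values are cuts."""
    return (new - old) / old * 100


def implied_base(cut: float, cut_fraction: float) -> float:
    """Back out a prior budget from an absolute cut and its stated fraction."""
    return cut / cut_fraction


# Figures in millions of USD, as quoted in the essay.
dtra = pct_change(858, 708)           # DTRA: FY25 request -> FY26 request
cbdp = pct_change(1610 + 46, 1610)    # DoD Chemical and Biological Defense Program
barda_base = implied_base(361, 0.36)  # BARDA: "$361 million, a roughly 36% decrease"

print(f"DTRA: {dtra:.1f}%")
print(f"CBDP: {cbdp:.1f}%")
print(f"Implied prior BARDA budget: ~${barda_base:,.0f}M")
```

This works out to a cut of about 17.5% for DTRA, about 2.8% for the Chemical and Biological Defense Program, and an implied prior BARDA budget of roughly $1.0 billion.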

There is good news: this budget did not come to pass.

Congress rejected nearly every one of the proposals: the CDC’s budget was not reduced, while BARDA, Project BioShield, and NIH’s budget actually slightly increased. There is one unfortunate budget stain—Kennedy pulling $500 million from a BARDA program developing mRNA vaccines against various respiratory viruses—but things overall turned out fine, though I cannot find specific numbers on how things fared on the DoD end. But it is a little worrying that the administration is not particularly sympathetic to biosecurity concerns. Why? Because if your primary customer is prone to wild swings of financial unpredictability, and it is only thanks to the grace of Congress that those sentiments are not actively reflected in their budget, it almost certainly hurts the capacity for these companies to plan for the future.

Earlier I mentioned that Valthos intends to be the Palantir for biosecurity. This is not a presumption on my end; they have basically said this. The CEO (Kathleen McMahon) is a Palantir alum, and has stated that Valthos plans to apply “many of the same principles she learned at Palantir, about working with officials as well as commercial customers”. But an easy counterargument is that Palantir’s government business was built during the post-9/11 spending surge, when homeland security funding went from $16 billion to $69 billion. Biodefense is holding steady, for now, but not seeing the same dramatic jumps.

You could imagine that a pretty simple steelman for these objections is not dissimilar to the usual AI-wrapper-SaaS advice people give: build not for where the models are today, but where they are going. And if you trust the trend-lines, it is not inconceivable that there is a catalyzing event in our near future—a genuine, bona-fide bioterror incident—which will unlock massive government spending the way 9/11 created the entire homeland security industry overnight. In this setting, the companies that already have working products and government relationships when that moment arrives will be the Palantir of biosecurity. The ones that don’t will be too late.

The game, then, is to survive until this catalyzing event occurs. If it does, Valthos may be able to gobble up all the government contracts it wants, Red Queen Bio may find the DoD suddenly desperate to fund therapeutics platforms that have a biosecurity veneer to them, and Aclid may discover that its few dozen synthesis company customers suddenly face even stricter compliance requirements. If it doesn’t, it is tough to imagine these companies doing anything other than going bankrupt or staying growth-capped. Because of this, you shouldn’t be surprised at all that these companies acquired the funding that they did! The game of venture capital is to play ‘big if true’ bets, continuously, forever, and few areas are as well-shaped to that as biosecurity.

Well, maybe. You could argue that SARS-CoV-2 couldn’t serve as the catalyzing incident for the government, since it is still unclear whether it was a lab leak or not, but what about the 2001 anthrax attacks? Why didn’t that spur a massive increase in federal biodefense funding? In fact, it did. Total US biodefense funding jumped from roughly $700 million in 2001 to about $4 billion in 2002, peaking at nearly $8 billion in 2005. What was the money used on? A fairly large chunk was put into anthrax-specific countermeasures. As a case study, consider Emergent BioSolutions, the sole producer of ‘BioThrax’, the only FDA-approved anthrax vaccine. They received one $1.6B contract for a second-generation anthrax vaccine, one $1.25B five-year contract for delivering 44.75 million doses of an older vaccine candidate, and then a $911 million CDC contract for another 29.4 million doses. A 2021 Congressional investigation found that, for the past decade, nearly half of the Strategic National Stockpile’s budget had gone to purchasing this anthrax vaccine, a product whose price had been raised 800% since 1998. And it is still ongoing, with a $21.5 million delivery order for the Department of War issued as recently as January 2026.
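For a sense of scale, here is the same kind of back-of-the-envelope arithmetic applied to the anthrax-era figures above, with the post-9/11 homeland security numbers alongside for comparison. The per-dose figures rest on a naive assumption: they divide each contract's headline value by its dose count and ignore anything else the contracts may have covered, so they are rough implied prices, not reported ones.

```python
def growth_multiple(before: float, after: float) -> float:
    """How many times larger `after` is than `before`."""
    return after / before


def naive_price_per_dose(contract_usd: float, doses: float) -> float:
    """Headline contract value divided by doses delivered.

    Assumption: the whole contract pays for doses; storage, services,
    options, etc. are ignored, so this is only a rough implied price.
    """
    return contract_usd / doses


# Funding levels in billions of USD, as quoted in the essay.
biodefense_2002 = growth_multiple(0.7, 4.0)   # 2001 -> 2002
biodefense_peak = growth_multiple(0.7, 8.0)   # 2001 -> 2005 peak
homeland = growth_multiple(16, 69)            # post-9/11 homeland security

# Emergent BioSolutions dose contracts, as quoted in the essay.
five_year_contract = naive_price_per_dose(1.25e9, 44.75e6)
cdc_contract = naive_price_per_dose(911e6, 29.4e6)

print(f"Biodefense, 2001 -> 2002: {biodefense_2002:.1f}x")
print(f"Biodefense, 2001 -> 2005 peak: {biodefense_peak:.1f}x")
print(f"Homeland security, post-9/11: {homeland:.1f}x")
print(f"Five-year contract: ~${five_year_contract:.2f} per dose")
print(f"CDC contract: ~${cdc_contract:.2f} per dose")
```

Under those assumptions, biodefense funding grew roughly 5.7x in one year and about 11.4x by its 2005 peak (versus roughly 4.3x for homeland security), and the two dose contracts work out to roughly $28 and $31 per dose.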

This is, on some level, completely understandable. You prepare for the thing that just happened to you. But it should make us a little nervous about the “catalyzing event” thesis for biosecurity companies, because the empirical reality is that it may not unlock general biodefense spending so much as it locks in countermeasures that are overly anchored on the specific threat and threat vector of that particular incident.

So, perhaps it is worth exploring outside of this customer base. While governments are the biggest buyer, they surely are not the only one. After all, didn’t Kathleen’s comment mention commercial buyers too? There is another group on the table: DNA synthesis companies. A fairly high fraction of the current biosecurity framework rests on a pretty simple idea: that biological information must pass through synthesis companies to become biological reality. To actually make a [thing], you need physical DNA, and to get physical DNA, you order it from a commercial provider. Why not create a layer to screen the DNA being ordered, ensuring that whatever it is, it isn’t dangerous? This is, as previously mentioned, Aclid’s business model, alongside TwentyTwo, SecureDNA (a non-profit), and likely others.

How is that going?

The preventative architecture assumes a chokepoint that’s disappearing

There seem to be three problems with DNA-screening-as-biosecurity.

The first is that screening only works if you’re ordering sequences long enough to screen. According to HHS guidelines, the current screening threshold is 50 nucleotides, but oligonucleotides (short DNA fragments often used in legitimate research) can be ordered, assembled, and stitched together into longer sequences. This is not theoretical. In 2018, Canadian researchers reported synthesizing a functional horsepox virus from mail-ordered DNA fragments for about $100,000. Granted, this is annoying to do, but a sufficiently dedicated adversary may be happy to do annoying things.

The second is that screening assumes you’re looking for known threats, which is to say, sequences similar to characterized pathogens. But if AI biological design tools enable the creation of de novo pathogens, things that don’t match anything in any database because they’ve never existed before, then screening becomes useless. And you needn’t even hop your way to truly de novo stuff: you could just redesign existing bad pathogens in ways that make them invisible to screening tools. For example, Microsoft has a “paraphrasing” paper that did exactly this, redesigning known toxic proteins in ways that evade sequence-based screening while preserving function. To counter this, you’d need to predict function from sequence alone, which is one of the hardest open problems in the field, especially because ‘function’ in biology is one of those super fuzzy, contextual words that can mean a bunch of different things. It is certainly possible to do (see the podcast I did with Yunha Hwang, an MIT professor creating tools to automatically annotate the function of metagenomes), but it’s not easy.

The third problem is the biggest: benchtop DNA synthesizers are getting longer-range. In other words, you could neatly sidestep all these screening checks by buying your own DNA-creation machine and running synthesis in your bedroom. Right now, the best commercially available benchtop synthesizers top out at about 120 base pairs per well, which, given that real viruses are on the order of dozens of kilobases, means we’re safe for now. But there is no fundamental reason they cannot get better. In fact, according to a fantastic Institute for Progress (IFP) report, it’s just around the corner: enzymatic (as opposed to chemical) DNA synthesis is likely less than a decade away from comfortably pushing benchtop capabilities into the kilobase realm. That said, a few people I talked to mentioned that ‘long-range DNA synthesis has been a few years away for a decade-plus now’, so maybe we can discount this a little, but it’s worth paying attention to. Especially because, as mentioned earlier, a DNA synthesizer needn’t be capable of full viral genome synthesis to be dangerous, since you can simply splice its outputs together.

This is all quite a pickle.

Yes, you could lock down the benchtop synthesizers, such that any attempt to use them would involve making an external call to some pathogen database to screen your request. But if the ML design tools get good enough, you can just do continuous zero-shot designs of something that doesn’t match anything in the database, and iterate from there. And even if the models don’t get good at that sort of in-vivo prediction behavior—which, despite what you may hear, is a genuine possibility for at least some time—you could simply split your order across multiple machines, synthesizing fragments that are each too short to trigger any screening individually, but that assemble into something very much on a select agent list once stitched together.

This is known as a split-order attack, and the IFP report is refreshingly blunt about the prognosis:

Moreover, an offline device is vulnerable to the whole class of split-order attacks, whereby the adversary can combine the outputs of two or more devices that are small enough to evade screening in isolation, but together would be recognized. Without some centralized connectivity, such an attack is impossible to defend against.

Are we doomed?

Maybe. The optimistic angle is that the government can be awfully good at shutting things down when it wants to, and the track record in other domains is quite encouraging. When the Combat Methamphetamine Epidemic Act passed in 2005, putting pseudoephedrine behind the counter and requiring ID and purchase logs, domestic meth lab incidents dropped by over 65% within two years. Nuclear materials are an even stronger case: the NRC administers over 20,000 active licenses for radioactive materials in the US alone, coordinated across 40 states and backed by the international IAEA safeguards regime. This has almost certainly contributed to the fact that there has not been a single case of nuclear terrorism. When the government decides something absolutely cannot be allowed to proliferate, and builds the institutional machinery to back that up, it can, against all odds, work.

But the fundamental problem here is that preventing bioterrorism requires a level of governmental diligence closer to nuclear-level than meth-level, and right now it is far behind both. To be fair, there are clear structural differences between biology and nuclear material or meth, the biggest being that biology is much more dual-use. Benchtop synthesizers have far, far more legitimate uses than malevolent ones, and the case for restricting them is a lot harder to make than, say, the case for restricting access to pseudoephedrine.

Well, what should be done? The IFP proposal, to its credit, has some pretty clear demands: a mandatory Biosecurity Readiness Certification before any benchtop synthesizer can be legally sold, standardized customer screening for both devices and reagents, and a reagent track-and-trace system modeled on the Drug Supply Chain Security Act for pharmaceuticals. None of this is crazy, and rhymes with what has already been done for meth and nuclear material.

What is actually being done? Unfortunately for all of us, every federal document governing DNA synthesis security in the United States right now is (somewhat) voluntary, though there is a nuance here we’ll get to in a bit. The only binding rules are export controls, which have, circa 2026, already been violated. The IFP essay from earlier happily reports that Telesis disclosed in their SEC filings that their DNA assembly systems have accidentally ended up in embargoed countries through distributors.

Oops! Does uranium ever accidentally end up in embargoed countries?

Well, actually, yes. The IAEA has logged over 4,300 incidents of nuclear material outside regulatory control since 1993, 353 of which involved trafficking or malicious use; 13 involved highly enriched uranium, and 2 involved plutonium. But importantly, the last time someone got their hands on kilogram quantities of weapons-usable material was 1994. The system leaks at the margins, but it doesn’t leak at the catastrophic level.

The security model you’ll hear repeated continuously among biosecurity experts is the ‘Swiss cheese’ model, in which the purpose of the regulatory apparatus is to present enough overlapping layers of defense that no actors, other than the absolute most determined, are willing to go through the trouble. The defenses against nuclear and meth are Swiss-cheese-y, and the ideal solution for DNA screening will likely be similar: possible to defeat, but difficult, annoying, and legally scary to attempt.
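One way to make the Swiss cheese intuition concrete: give each layer some probability of letting a determined attempt slip through. If (and it is a big if) the layers fail independently, the chance of getting past all of them is the product of those per-layer probabilities. The layer names below echo the IFP proposals mentioned earlier, but the probabilities are invented purely for illustration, and `pass_through_probability` is a hypothetical helper.

```python
import math


def pass_through_probability(leak_probs: list[float]) -> float:
    """Chance an attempt slips past every layer, ASSUMING layers fail independently."""
    for p in leak_probs:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"leak probability out of range: {p}")
    return math.prod(leak_probs)  # empty list -> 1.0 (no layers, nothing stops you)


# Hypothetical layers with made-up per-layer leak probabilities.
layers = {
    "synthesis-order screening": 0.5,
    "customer verification": 0.3,
    "reagent track-and-trace": 0.2,
}

single = pass_through_probability([layers["synthesis-order screening"]])
stacked = pass_through_probability(list(layers.values()))
print(f"one layer:    {single:.0%} get through")
print(f"three layers: {stacked:.0%} get through")
```

In this toy, three individually leaky layers cut the pass-through rate from 50% to 3%. The independence assumption is the weak spot: a split-order attack, for example, defeats many screening checks at once, so real layers are likely correlated and the true figure would be higher.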

And at least one layer of cheese is present. I mentioned that screening is largely voluntary on the part of the synthesis companies, but there is an important caveat: federally funded entities are required to purchase synthetic nucleic acids only from providers or manufacturers that adhere to the US Framework for Nucleic Acid Synthesis Screening. In other words, if a DNA synthesis company wants to sell to the enormous market of federally funded researchers (most of the U.S. life sciences market), it effectively must implement screening.

Well…kinda. This requirement was introduced by a federal framework issued in 2024, but the current administration issued an executive order in 2025 to replace it with something [better] within 90 days. Those 90 days have come and gone, and nothing has yet been issued to reinstate the mandate. That said, the biggest DNA synthesis providers (Twist and the like) see the writing on the wall and have already implemented the screening they imagine will be required of them, but smaller providers likely have not. As of February 2026, there is a bill going through the Senate to address this regulatory gap.

But what about all the problems from before? Split-order attacks, AI-assisted genome redesign, and benchtop DNA synthesizers? Legally mandated screening alone is surely useless against those. We need more layers of cheese to defend against them!

Many smart people have thought about these challenges, and there are ways to solve them if you can get widespread buy-in from the synthesis providers. You could create centralized repositories of DNA orders aggregated from multiple providers, you could assemble private saturation mutagenesis viral datasets to catch most attempted redesigns from bad actors, and you could install hardware locks on benchtop synthesizers that prevent them from being used without a connection to the aforementioned centralized repository.
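To make the centralized-repository idea concrete, here is a deliberately toy sketch under heavy assumptions; it is not a description of how Aclid, SecureDNA, or any real screening system works. Each provider screens only orders of at least 50 nucleotides (the HHS threshold cited earlier) and checks for an exact match against a single placeholder “sequence of concern” made of random bases, not a real gene. A hypothetical central aggregator, by contrast, sees every fragment from every provider and tracks how much of that sequence each customer has ordered in total. All names (`SEQ_OF_CONCERN`, `provider_flags`, `aggregate_coverage`, `flag_customers`) and the 0.5 coverage cutoff are invented for illustration.

```python
import random
from collections import defaultdict

MIN_SCREEN_LEN = 50  # per-order screening threshold, in nucleotides

# Hypothetical placeholder for a sequence of concern: random bases, not a real gene.
_rng = random.Random(0)
SEQ_OF_CONCERN = "".join(_rng.choice("ACGT") for _ in range(300))


def provider_flags(order: str) -> bool:
    """Toy per-provider screen: short orders are never checked; longer ones
    are flagged if they appear verbatim inside the sequence of concern."""
    if len(order) < MIN_SCREEN_LEN:
        return False
    return order in SEQ_OF_CONCERN


def aggregate_coverage(orders):
    """Toy central repository. `orders` holds (customer, provider, fragment) tuples.
    Returns, per customer, the fraction of SEQ_OF_CONCERN covered by all of that
    customer's fragments across all providers, regardless of fragment length."""
    covered = defaultdict(set)
    for customer, _provider, fragment in orders:
        start = SEQ_OF_CONCERN.find(fragment)
        if start != -1:
            covered[customer].update(range(start, start + len(fragment)))
    return {c: len(pos) / len(SEQ_OF_CONCERN) for c, pos in covered.items()}


def flag_customers(orders, threshold=0.5):
    """Flag customers whose cumulative coverage meets the (arbitrary) threshold."""
    return {c for c, frac in aggregate_coverage(orders).items() if frac >= threshold}


# One customer spreads 30-nt pieces across three providers; another orders something unrelated.
fragments = [SEQ_OF_CONCERN[i:i + 30] for i in range(0, len(SEQ_OF_CONCERN), 30)]
orders = [("customer-A", f"provider-{i % 3}", frag) for i, frag in enumerate(fragments)]
orders.append(("customer-B", "provider-0", "ACGT" * 20))

print("any provider flag:", any(provider_flags(frag) for _, _, frag in orders))
print("centrally flagged:", flag_customers(orders))
```

In this toy, no single provider ever sees an order worth flagging, but the pooled view catches the customer immediately. Real systems would need fuzzy rather than exact matching (which is where the AI-redesign problem bites), privacy protections for order data, and the industry-wide buy-in discussed above; none of that is modeled here.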

None of this is science fiction! There are groups actively working on all of them, and some are even wrapped up in the February 2026 bill I just mentioned. But we’ll see how realistic they are to implement in practice.

Targeting humans with bioweapons is (probably) genuinely difficult

There is something under-appreciated worth discussing: making and spreading bioweapons is not easy. I mentioned this at the start, but there is a lot more color to add.

If you talk to biosecurity folks for long enough, they will eventually mention Aum Shinrikyo. Aum is a Japanese doomsday cult that, in the 1990s, had everything a would-be bioterrorist could ask for: hundreds of millions of dollars, a virologist with graduate training from Kyoto University (Seiichi Endo) running their bioweapons program, dedicated lab facilities, and years of total freedom from law enforcement scrutiny. They believed the end of the world was upon them, and that their mission was to hurry it along. On their journey to do exactly this, they attempted ten biological attacks.

Every single one failed. Their most ambitious effort was the 1993 anthrax attack on Kameido, Tokyo, where cult members sprayed a liquid suspension of spores of Bacillus anthracis (the bacterium that causes anthrax) from a cooling tower on the roof of their headquarters onto the streets below. Nothing happened. It turned out they’d acquired a vaccine strain of anthrax, one that is, to quote the CDC’s postmortem, “generally regarded as nonpathogenic for immunocompetent people.” Even if they’d had the right strain, the spore concentration in their slurry was about 10⁴ per milliliter, versus the 10⁹ to 10¹⁰ considered optimal for a liquid bioweapon. They had a botulinum toxin program too, in which they attempted multiple attacks using vans fitted with sprayers. Once again, zero effect. The toxin was likely either degraded during processing, too dilute to have any effect, or produced from a non-toxigenic strain because they couldn’t maintain proper anaerobic fermentation conditions; which of these it was remains unclear.

An account of the many difficulties the group faced in actually creating usable bioweapons is well-described in this 2011 report, which has some real comedic gems:

Mice on which the yellow liquid [Botulinum Neurotoxin] was tested showed no toxic effects, and one cult member reportedly slipped into a fermenting tank and nearly drowned, but subsequently showed no signs of illness.

The same report notes that even Aum’s manner of spreading their pathogens may have interfered with their efficacy:

In the even more unlikely event that Aum had produced and successfully stored volumes of a virulent strain, it is possible that poor dissemination capabilities might have damaged the material or failed to aerosolize it so that sufficient quantities could be inhaled.
For example, the cult employed a homemade nozzle that reportedly spouted rather than sprayed and dispersed material during the day, exposing it to UV radiation and thermal updrafts that would have reduced concentrations at ground level.

The group did finally end up partially succeeding, but only after switching to chemical weapons: sarin nerve gas, which ended up killing 13 people and injuring thousands on the Tokyo subway in 1995.

But, you may protest, the 1990s were a long time ago. We have nanopores now. We have AlphaFold 3 now. We have a (somewhat) mature field of synthetic biology.

All very true, but consider what actually went wrong for Aum. They used the wrong strains, their fermentation got contaminated, their concentrations were off by five to six orders of magnitude, their aerosolization likely didn’t work, and a guy fell into a fermenter and was fine. These were problems of bioprocess engineering, strain selection, maintaining sterile culture conditions, building dissemination devices that produce the right particle size, and overall wet-lab competence. Some of these are pure information problems, yes, and some of them are fixed by using easier-to-produce viruses (rather than bacteria), yes. But others are iterative, hands-on, tacit protocol development work. Of those, none would be aided by the current generation of structural biology models, and only some would be aided by LLMs given the Active Site results I mentioned at the start of this essay.

There are other case studies to consider too. Canonically, there are three other historical bioweapons programs of note: the Soviet Union’s in the 1970s, Iraq’s program under Saddam in the 1980s, and the US’s own Cold War-era program, which ended in 1969. All three had something Aum lacked: they were state programs, with thousands of employees, dedicated production facilities, and decades of institutional knowledge.

How did these groups fare?

Iraq’s program, despite Saddam’s enthusiasm, produced anthrax and botulinum toxin of such inconsistent quality that US intelligence assessments after the Gulf War concluded the weapons would have been largely ineffective in most deployment scenarios.

The US program (which weaponized anthrax, botulinum toxin, tularemia, brucellosis, and Q fever) had a slightly different takeaway, but one that’s still directionally aligned with what we’ve discussed. After nearly three decades of comically dangerous acts like releasing simulant organisms in the San Francisco Bay Area and the New York subway to study how pathogens would move through civilian infrastructure, the conclusion wasn’t exactly that bioweapons didn’t work; it was that they were strategically irrelevant. By then, the US already had a nuclear arsenal that could glass a continent in an afternoon, and the marginal value of a weapon that is unpredictable, uncontrollable, and might blow back on your own population had become effectively zero. Nixon shut the program down in 1969, and there were few complaints about the decision.

Next, the Soviet program, also known as ‘Biopreparat’. It was the largest biological weapons program in human history, employing over 60,000 people at its peak, and spent years trying to weaponize smallpox and plague. And it worked. Here are some insane lines from a Frontiers article about the program (bolding mine):

Some Biopreparat and military facilities continuously produced agents and filled the delivery systems kept on standby. For example, the Soviets annually made about two metric tons of antibiotic-resistant pneumonic plague and 20 tons of liquid smallpox grown in eggs. Refrigerated bunkers stored the bulk smallpox, which had a 6 to 12-month shelf life, and also contained filling lines for munitions and spray tanks.
….The Corpus One building of The State Scientific Center of Applied Microbiology at Obolensk contains 42 story-tall fermenters, separated into different biosafety containment zones, to make plague and other agents.
Building 221 at The Scientific Experimental and Production Base at Stepnogorsk housed 10 four-story-high, 20,000-liter fermenters and could make 300 metric tons of anthrax in 10 months. Other production lines at Kurgan, Penza, and Sverdlovsk could add hundreds more tons to the USSR’s prodigious capability to make biowarfare agents and fill munitions on short notice.

Fortunately for us, the Soviet economy collapsed before this stockpile could be used for anything world-ending.

I think there are a few takeaways here. One, from the US’s experience, is that bioweapons are fundamentally not worth it if the end goal is to wag a very large stick at your enemy. Two, from Aum’s and Iraq’s experience, is that bioweapons are genuinely hard to create and disperse, even with significant resources and time. And three, from the Soviets’ experience, is that if you throw enough of a country’s industrial base at the problem, the engineering and scientific barriers can be overcome, but the scale of effort required is immense.

These are, alongside Aum, four isolated cases from decades back. How much can we learn from such a thin slice of history? Should we really let it shape our mental models?

Unfortunately, it is the best we’ve got. We do know there are other ongoing bioweapons programs today. In an April 2024 compliance report, the U.S. Department of State says that North Korea and Russia are definitely running bioweapons programs, and that it is possible Iran and China are as well. Should this freak us out? Maybe. On one hand, we should take seriously the US view that bioweapons kind of suck, and that there are easier ways to kill many people. On the other hand, the strategic value of bioweapons is not just in killing many people, but also in plausible deniability. Either way, whether these programs would perform as intended in a real-world deployment scenario is a very different question, and one that neither the compliance report nor this essay is positioned to answer.

Agricultural bioterrorism is (probably) really easy

Unfortunately, most of what I said earlier referred to pathogens meant to target humans. The calculus changes dramatically when your targets are cows or a wheat field, a practice known as ‘agroterrorism’. This isn’t great news, especially because if you spend any time reading the biosecurity discourse, you will notice that relatively few people discuss this topic, and, among those who do, the word ‘overlooked’ pops up a worrying amount.

Over the next few paragraphs, I’ll try to give some intuition as to why agroterrorism is uniquely challenging to combat.

First, the actual design of the pathogen.

Unlike most of the other, nastier viruses and bacteria that cause humans to bleed from every orifice, many incredibly dangerous agricultural pathogens do not require BSL-3/4 equipment to safely create. As a result, the barrier to entry in agroterrorism is incredibly low. While the Soviet Union bioweapons program had to regularly deal with unfortunate cases of accidental Marburg, smallpox, and anthrax leaks—even while having BSL-3-ready labs!—a bad actor here can freely muck around with designing whatever they want with little threat. And if you’re feeling especially thrifty, you don’t even need a novel gain-of-function chimera. You need foot-and-mouth disease, which already exists in nature, is endemic in parts of Africa and Asia, and is one of the most contagious diseases known to veterinary medicine.

In fact, we know this because a former Soviet bioweapons scientist, Kenneth Alibek, told us. In a 2006 report, he discussed his work extensively, and one paper paraphrases it particularly well:

Alibek describes the Soviets as producing anti-livestock, anti-crop, and combined anti-livestock/anti-personnel pathogens. During the course of its existence, the Soviet’s anti-agricultural bioweapons program produced and weaponized the anti-crop pathogens Wheat Rust, Rye Blast, and Rice Blast; the anti-livestock pathogens African Swine Fever, Rinderpest, and foot-and-mouth disease…
…The Soviets used simple, rudimentary techniques to develop these effective antiagriculture pathogens. They developed anti-crop fungal pathogens through a simple ground cultivation technique, while anti-livestock pathogens were developed in live animals…
All of these techniques, as Alibek points out, could easily be utilized by unsophisticated terrorist organizations to develop bioweapons designed to cause mass casualties of agriculture.

Next, distribution.

If you want to cause a human pandemic, you need aerosolization, you need to calculate incubation times, you need sophisticated delivery mechanisms. Agricultural pathogens require none of this. As one paper puts it, deploying plant or animal pathogens could be as simple as “atomizing unprocessed pathogen near the target organisms or, in the case of animals, directly applying the pathogen to the nose and mouth of the organisms.” Why is it so easy? Is there something special about agricultural pathogens? No, but there is something special about how modern agriculture is done, in that it involves thousands of nearly genetically identical plants and animals in astonishingly dense conditions. The environment does the work. All this, with virtually zero risk to the adversary, given that this would not be done in crowded cities with cameras on every corner, but on sprawling, isolated farms that have essentially zero security infrastructure.

Finally, detection.

Human disease surveillance benefits from the fact that sick people tend to show up at hospitals and demand attention; cows and wheat do not. As a result, agricultural disease detection relies on a very error-prone set of steps: one, the farmer noticing something is wrong with their animals; two, the farmer reporting it to the government; and three, the authorities being dispatched.

We’re going to spend the next few paragraphs discussing these three steps, because each step is a point of failure, and they fail constantly.

First, the farmer notices something is wrong. This is hard. You have to realize the scale that modern agriculture operates at.

A single large-scale poultry operation can house 50,000 turkeys or hundreds of thousands of laying hens in one building. A feedlot might hold 100,000 head of cattle. The average dairy herd in states like California or Idaho now exceeds a thousand cows. And the trend is accelerating: U.S. livestock production has been consolidating into fewer, much larger operations for decades, with economies of scale constantly pushing toward ever-increasing density. Consider the outbreak of H5N1 among U.S. cattle that began in December 2023. How long was the lag between initial infection and actual detection? According to a Science paper from April 2025, the virus circulated entirely undetected for over 4 months. Clinical signs (reduced milk production, decreased feed intake, and changes in milk quality) were first noticed by veterinarians in late January 2024. Only on March 25, 2024 was the virus confirmed, after genetic testing of the cows’ milk. By that point, the virus had already reached 26 dairy cattle premises across eight states and six poultry premises in three states.

Let’s say the farmer eventually realizes that something is wrong. Now they need to report it to the correct authorities. But why would they? There is something extraordinarily perverse about the reporting incentives at play here: farmers are actively disincentivized from flagging unusual disease, because a confirmed outbreak of a notifiable disease may wipe out their entire livelihood. Remember: these pathogens are often so virulent, so adaptive, that authorities will demand the mass culling of the entire herd. So, if you’re a rancher staring at a few sick animals, the economically rational move is to wait and see if they get better, not to call a vet and risk having your entire herd destroyed. Once again, there is empirical evidence here: how Indonesian farmers handled avian flu in 2006. A paragraph from a book on zoonotic disease is instructive:

Those smallholder poultry keepers questioned the severity of the avian influenza threat to their birds….Some continued to consume and sell diseased dead birds. Small to medium-sized contract poultry farmers feared that government officials might cull their birds before definitive laboratory confirmation of the disease, and they were skeptical of compensation schemes or believed compensation was too low. These poultry farmers reported the deaths of chickens to contractors, who in turn sought the services of private veterinarians to determine the causes of bird death, making effective disease surveillance difficult. Smallholder poultry farmers and keepers feared reporting incidents directly to the government. This fear was not limited to a concern about losing their own birds, but also to the social risk of angering nearby neighbors, whose birds would be subject to culling within a 2–5 km radius of an outbreak location.

You may ask: in the case of animals, why can’t we just vaccinate them? You can! But export regulations prevent most farmers from doing so, because standard vaccines make it impossible to distinguish a vaccinated animal from an infected one. So-called DIVA vaccines (Differentiating Infected from Vaccinated Animals), which rely on marker proteins that let serological tests tell the two apart, do exist, but adoption has been glacial.

Finally, let’s say, against their better judgement, the farmer reports it. What happens then?

How the U.S. government actually responds to agricultural threats is theoretically fairly straightforward. Human pathogens fall under HHS, via the CDC. Agricultural pathogens fall under the USDA, via its Animal and Plant Health Inspection Service (APHIS). There is a select agent list for each, plus an overlap category for things that threaten both. The jurisdictional lines are reasonably clear. The problem with the agency technically in charge, the USDA, is that it is also the agency whose mission includes promoting the very industry it would need to disrupt in a crisis.

To understand this better, we can look at a fascinating Vanity Fair investigation that interviewed over 55 people across the USDA, CDC, HHS, and the White House, all of whom were involved in the response to the same H5N1 cattle outbreak we just discussed. From the moment the virus was confirmed in 2024, the White House and the USDA were barely aligned: the White House was planning a public-health-directed response, while the USDA was prioritizing the needs of the dairy industry.

Within weeks of the diagnosis, APHIS employees began calling state veterinarians from personal cell phones to confide that they had been instructed not to discuss, not to engage, and to discontinue even routine conversations with health officials in the field unless talking points were pre-approved. The USDA sat on genetic sequencing data for weeks, sharing samples an average of 24 days after collection—compared to 8 days for the CDC—and without basic metadata like the date or state of collection, rendering the data effectively useless for real-time monitoring. The same farmer incentive problem from before reared its ugly head too: dairy farmers simply opted not to test, and some forced veterinarians off their property. At least five veterinarians who had been outspoken in responding to the outbreak were fired from their jobs. By the time a Federal Order requiring pre-movement testing was issued, the virus had already spread across multiple states. And the testing regime was widely regarded as obviously insufficient: just 30 animals per herd, with farmers reportedly prescreening in private labs to cherry-pick healthy animals.

Because why not? Who was going to stop them?

This was a naturally occurring virus, both in its origin and in how it spread. Yet the federal response still took months to coalesce into something real.

And however badly you think APHIS bungled this, it is difficult to imagine its future responses looking much better. As of mid-2025, APHIS had lost roughly 1,377 staff, about 16% of its employees, under the administration’s workforce reduction push. The USDA also accidentally fired several employees working on the H5N1 response, and had to scramble to rescind those termination letters within days. Yes, it may be the case that the organization is bloated beyond a reasonable doubt, and the cuts were deserved. But the cuts have not been accompanied by any visible effort to fix the structural problem here: the USDA is simultaneously the regulator of, and the lobbyist for, the industry it oversees.

But there is an important question to ask. What is the ultimate impact of all this? What actually happens if a successful agroterrorism attack occurs? Because if it’s insignificant, just a rounding error, then none of this should be a concern.

It is not a rounding error. The 2001 foot-and-mouth disease (FMD) outbreak in the UK resulted in over 6 million animals culled, cost the public sector over £3 billion and the private sector over £5 billion, was severe enough to delay that year’s general election by a month, and led to the dissolution of the Ministry of Agriculture entirely. Simulation models for the United States are even uglier. A study modeling an FMD outbreak originating in a single California dairy farm found that median national agricultural losses ranged from $2.3 billion to $69.0 billion depending on detection delay, with every additional hour of delay at the 21-day mark costing roughly $565 million and requiring another 2,000 animals to be slaughtered. What about a deliberate, state-actor attack? Another simulation model estimated the economic impact of an FMD agroterrorism scenario (vast, widespread dispersal of the pathogen), putting possible losses between $37 billion and $228 billion across three scenarios, from a contained state-level outbreak to a large multi-state attack.

But there is at least some argument that, under certain mental models, it actually is a rounding error. US agricultural GDP is roughly $1.4 trillion, while overall GDP is $29 trillion. Even the worst-case FMD simulation represents about a 16% hit to agriculture, and less than a 1% hit to the broader US economy. This is not nothing, and it would be devastating for the country, but it is not civilization-ending.
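The arithmetic behind those percentages, using the essay’s own round figures, is short enough to spell out. This is a sketch: it simply divides the simulated losses by annual GDP, which may not match how the underlying models define “losses”, and the helper name `share` is just for illustration.

```python
# Round figures quoted above, in billions of US dollars.
AG_GDP = 1_400   # US agricultural GDP
US_GDP = 29_000  # overall US GDP


def share(loss: float, base: float) -> float:
    """Loss as a fraction of an economic base (a one-shot ratio, not a dynamic model)."""
    return loss / base


# Low and high ends of the FMD agroterrorism simulation range.
for label, loss in [("low end", 37), ("high end", 228)]:
    print(f"{label} (${loss}B): {share(loss, AG_GDP):.1%} of agriculture, "
          f"{share(loss, US_GDP):.2%} of GDP")
```

That puts the worst case at about a sixth of the agricultural sector (16.3%) but under 1% of the overall economy (0.79%), which is where the ‘rounding error’ framing comes from.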

Yet, while agroterrorism perhaps isn’t a standard x-risk scenario, when evaluated against the “is this a serious national security threat?” standard, the answer feels like an obvious yes. This raises a rather important question. If everything I’ve said is true, and I’m pretty sure it is, why hasn’t there been a significant agroterrorism event…ever? I have no idea, and it was also a point of confusion among most of those I talked to. The best argument I’ve heard is that, if the ultimate goal of bioterrorism is either to terrify a nation or to outright end the world, neither the aesthetics nor the net effect of agroterrorism is well suited to either.

However, one person I talked to did say there has, in fact, been one case of minor agroterrorism they are aware of: in late 2019, drones controlled by gangs dropped [items] infected with African swine fever into commercial pig farms in China. Why were the gangs trying to spread swine fever? So that the farmer would be forced to sell their potentially infected meat cheaply to the gangs, who would then sell it on as healthy stock. This feels like a rather roundabout way to make money, but it happened. Moreover, it may be the case that stuff like this occurs far more than anyone realizes, since the whole racket was only discovered because Chinese farmers resorted to radio jammers to prevent the drones from flying near the farms, which ran afoul of the regional aviation authority.

The monitoring architecture is useful for detection, but not defense

The United States has two main systems for detecting biological threats in the environment: one that watches the air, and one that watches the sewage.

Let’s start with the air. BioWatch is a federal program, created in 2003 in response to the 2001 anthrax attacks, to detect the release of pathogens into the air during a terrorist attack on a major American city. Here is how it works:

As currently deployed, BioWatch collectors draw air through filters that field technicians collect daily and transport to laboratories, where professional technicians analyze the material collected on the filter for evidence of biological threats [via PCR]. The entire collection and analysis process takes up to 36 hours to detect the presence of a potential pathogen of interest.
A positive result triggers what is known as a BioWatch Actionable Result (BAR), an indication that genetic material consistent with a target pathogen was present on a BioWatch filter. Upon declaration of a BAR, local, state, and federal officials then assess relevant information and determine the course of action to pursue.

Very cool, isn’t it? Here’s what one of the air filter boxes looks like:

The problem with the system, and this is a big one, is that it has literally never once been useful. Never. Not once. Every single time a BAR has been announced, the subsequent investigation has concluded that it was either a false positive or an environmental anomaly indistinguishable from something dangerous. A Department of Homeland Security page has this helpful note about it:

Out of these more than 7 million tests, BioWatch has reported 149 instances in which naturally-occurring biological pathogens were detected from environmental sources. Many of the pathogens the BioWatch system is designed to detect occur naturally in the environment, such as the bacteria that causes anthrax, which has been used as a weapon, but is also found in nature. For example, near the nation’s Southwest border there have been a number of instances where a bacterium that is endemic in the environment has been identified. Thankfully, none of the instances were actual attacks.

It also has these lines that I thought were quite funny:

The detection of commonly occurring environmental agents is not a “false positive.” Much like a home smoke detector goes off for both burnt toast and a major fire, the smoke detector is meant to notify you of a potential fire before it’s too late. BioWatch works very much the same way.

A smoke detector that has gone off 149 times over two decades, and never once for an actual fire, is almost certainly not a functioning smoke detector. And this particular smoke detector cost hundreds of millions of dollars to set up, and tens of millions a year to maintain! To be clear: there is no technological reason that these can’t be made better, and there are startups, such as Pilgrim Labs, that are working on improving similar air-detection systems. If you’re curious, I found this interview with Pilgrim’s founders worth watching.
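One rough way to see why a detector can behave like this is Bayes’ rule: when real attacks are vanishingly rare, even a low per-test alarm rate means nearly every alarm comes from the environment. The sketch below takes the alarm rate implied by the DHS figures quoted above (149 detections in roughly 7 million tests) and treats it as the false-alarm rate. The prior of one real attack per billion tests and the perfect sensitivity are made-up assumptions, chosen only to illustrate the base-rate effect.

```python
TESTS = 7_000_000
ALARMS = 149  # every one traced to natural environmental sources

observed_alarm_rate = ALARMS / TESTS


def positive_predictive_value(prior: float, sensitivity: float, false_alarm_rate: float) -> float:
    """P(real attack | alarm), by Bayes' rule."""
    true_alarms = prior * sensitivity
    false_alarms = (1 - prior) * false_alarm_rate
    return true_alarms / (true_alarms + false_alarms)


# Hypothetical: one real release per billion tests, and the assay never misses it.
ppv = positive_predictive_value(prior=1e-9, sensitivity=1.0, false_alarm_rate=observed_alarm_rate)
print(f"alarm rate per test: {observed_alarm_rate:.2e}, P(attack | alarm): {ppv:.1e}")
```

Under these assumptions, only about one alarm in twenty thousand would be a real attack, so each alarm carries very little information unless the assay can tell natural background apart from a deliberate release.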

On the sewage side, the whole endeavor is actually going fairly well. But before we go on: monitoring the air is obvious, but why monitor sewage? Because nearly every pathogen that infects a human being eventually ends up in the toilet. This makes sewage perhaps the most honest epidemiological data source available: people cannot choose not to participate.

And we’re doing very well in monitoring this sludge, or doing so-called ‘wastewater screening’. A lot of people in biosecurity complain that ‘the federal government learned nothing from COVID’, and they are mostly right, with one huge counterexample: the national wastewater surveillance infrastructure, which was largely built in response to the pandemic. The National Wastewater Surveillance System (NWSS), launched by the CDC in September 2020, established that you could detect community-level viral trends days before clinical cases appeared, using nothing more than the genetic material people flush down the toilet, without requiring any of them to consent to testing, show up at a clinic, or even know they’re sick.

But the problem with the NWSS, as it is currently deployed, is that it is a targeted system, relying on qPCR to identify specific, known threats. Across the 500-600 sites where NWSS monitoring stations are deployed, they measure just three things: COVID-19, Influenza A, and RSV.

80% of them also measure three more things: Measles, H5N1, and Monkeypox.

There’s an awful lot missing, isn’t there? What about all the other types of influenza? Norovirus? And the scarier ones, too: Nipah, Ebola, tularemia. All of them are entirely absent.

The answer is, in principle, to switch away from qPCR and do metagenomic sequencing: instead of looking for specific pathogens, you sequence everything in the sample and computationally figure out what’s there. I’ve written about metagenomics in the context of microbiomes, so you can look there for a deeper explanation of how it works.
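The core computational idea fits in a few lines. Below is a toy sketch of k-mer-based read classification, the same broad principle that tools like Kraken build on, though real tools use far longer k-mers, taxonomic trees, and enormous reference databases. Each read is assigned to whichever reference genome shares the most k-mers with it, and reads that match nothing land in an “unclassified” bin. That last bin is the part a targeted qPCR panel never sees. The reference names and sequences are made up.

```python
from collections import Counter

K = 5  # toy k-mer length; real classifiers use much longer k-mers


def kmers(seq: str, k: int = K) -> set[str]:
    """All overlapping substrings of length k."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_index(references: dict[str, str]) -> dict[str, set[str]]:
    """Map each k-mer to the names of the reference genomes containing it."""
    index: dict[str, set[str]] = {}
    for name, genome in references.items():
        for km in kmers(genome):
            index.setdefault(km, set()).add(name)
    return index


def classify(read: str, index: dict[str, set[str]]) -> str:
    """Assign a read to the reference sharing the most k-mers, or flag it as unknown."""
    votes: Counter = Counter()
    for km in kmers(read):
        for name in index.get(km, set()):
            votes[name] += 1
    return votes.most_common(1)[0][0] if votes else "unclassified"


# Hypothetical reference "genomes" and sequencing reads.
references = {"virus_A": "ACGTACGTTAGCCATG", "virus_B": "TTTTCCCCAAAAGGTT"}
index = build_index(references)
for read in ("CGTTAGCCA", "TCCCCAAAAG", "GCGCGCGCGC"):
    print(read, "->", classify(read, index))
```

In a real surveillance pipeline, the interesting work is deciding which of the unclassified reads deserve a closer look.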

Why isn’t anyone doing this?

In fact, there is someone doing this, and this leads us to what I’d consider one of the crown jewels of what the U.S. nonprofit-biosecurity-complex has managed to accomplish: SecureBio Detection, previously known as Nucleic Acid Observatory (NAO), which has been building a pilot metagenomics-based wastewater screening network in the US since 2021. Circa November 2025, they maintain 31 sampling sites across the US, in 19 cities, sequencing about 60 billion read pairs weekly. And they’ve already stumbled across a few interesting things, such as detecting measles in wastewater from Kauaʻi County, Hawaii and West Nile Virus in Missouri—the latter of which ended up having real, confirmed cases to go alongside it! There is an ongoing effort to have something similar at the federal level—the so-called ‘Biothreat Radar’—but it doesn’t seem to actually exist yet. But SecureBio Detection continues!

This is quite promising. This is a bona fide, national-scale attempt to detect both known and unknown biological threats, and it works! They are also doing some interesting ML work on automatically detecting, via a metagenomic language model, whether unknown sequences come from uncharacterized, innocuous microbes (i.e. nearly all microbes) or from human-targeting pathogens worth worrying about.

But, despite how good wastewater screening is, it is worth remembering that detection is not defense. This may seem like a semantic point (of course detection isn’t defense), but surely detection should allow you to defend faster or better?

But does it really?

If you’re detecting something known (a COVID variant, a resurgent influenza strain), then yes, detection may accelerate response, because you already know what to make against it. But if you’re detecting something novel, then what exactly happens next? Designing vaccines that elicit neutralizing antibodies is difficult in the best of circumstances, clinical trials take time, and, in the meantime, the underlying pathogen will continue to mutate, potentially diverging from whatever you’re designing against it. This is surprisingly under-discussed, but it is worth marinating in: yes, BioNTech’s and Moderna’s capacity to generate a COVID-19 vaccine so quickly was an extraordinary feat of logistics and science, but the choice of the spike protein as the vaccine’s immunogen was informed by two decades of prior coronavirus research. In the case of a brand new, chimeric virus that has no immediate cousin, a few weeks of advance notice is just a longer window in which to watch the curve steepen.
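To put rough numbers on “a longer window in which to watch the curve steepen”: in the early, unchecked phase of an outbreak, infections grow roughly as N(t) = N0 · 2^(t/Td), where Td is the doubling time. The starting count of 100 infections and the 4-day doubling time below are purely hypothetical. Real epidemics eventually slow as susceptible people run out, so this only describes the early phase.

```python
def infections_after(days: float, initial: float, doubling_time_days: float) -> float:
    """Unchecked exponential growth: N(t) = N0 * 2 ** (t / Td)."""
    return initial * 2 ** (days / doubling_time_days)


# Hypothetical outbreak: 100 infections at the moment of detection, doubling every 4 days.
for weeks in (0, 2, 4, 8):
    n = infections_after(7 * weeks, initial=100, doubling_time_days=4)
    print(f"{weeks} weeks after detection: ~{n:,.0f} infections")
```

Under these assumptions, eight weeks of warning that never turns into a countermeasure takes you from 100 infections to over 1.6 million.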

Finally, in both cases, either a natural or engineered pathogen, there exists one last problem: coordination. There is no pre-negotiated decision tree for what happens after something scary is detected, no threshold that, once crossed, triggers automatic funding for therapeutic stockpiling or accelerated clinical development. There probably should be one! But there isn’t today and, as far as I can tell, there aren’t plans for one to exist. Ultimately, the value of early warning is bounded by the speed of the response it enables, and that speed seems extremely limited today.

Machine learning may be very useful for rapid-response therapeutics

This section is me going off-script from the experts I talked to. The pipeline I will describe below does not exist in any meaningful capacity, but there are inklings of it found across the therapeutics-for-biosecurity plays out there, so it feels like the mental framework is informative regardless. As in, the logical steps mentioned here may massively diverge from what will realistically occur, but the types of models, timelines, and decision calculus used likely will not.

The Coalition for Epidemic Preparedness Innovations, or CEPI, has an initiative that identifies exactly what you’d want your government to be capable of in the case of a major pandemic: the 100 Days Mission. As in, from the day of realizing, ‘we probably should mount a response to this weird sequence we found’, therapeutic options should be ready to go within three months for population-scale deployment. It took 326 days to get the first COVID-19 vaccine authorized, and that was widely regarded as the fastest vaccine development in human history. How could 100 days be possible?

Luckily for us, they’ve defended the position at length in a paper. Long story short: this is not an unreasonable timeline if you’re in a coronavirus-y situation, where your adversary is something that millions of hours of research has already gone into characterizing. Why? Because the second you can identify the ideal immunogen—or, what you should be sticking in your vaccine to elicit the antibody repertoire that neutralizes the virus—you’re done with the major technical design challenge. Like I mentioned earlier, the spike protein was the obvious immunogen for SARS-CoV-2, informed by two decades of prior coronavirus research going back to SARS-1 and MERS. Thus, the fun little party story of BioNTech and Moderna having a vaccine candidate within days of receiving the SARS-CoV-2 sequence.

So, mRNA basically hands us our vaccine. Now we just need to deal with the two other bottlenecks: manufacturing scale-up and clinical trials. I think it’s interesting to discuss how things may be sped up here—and the arguments for how you’d speed them up are within the realm of possibility—but it does lead us off-topic, so I’ll place those in a very long footnote.1

But remember, what I’ve described so far is the rosy scenario, where we are dealing with something we already mostly understand. What about things that are wholly new? This includes not only de novo pathogens, but also mostly natural ones that, through evolution or other means, have escaped the immunity elicited by established immunogens. For these cases, the same CEPI paper admits that things are harder, and that a 200- or 300-day turnaround should be the goal.

But is that possible? Remember, now the vaccine design problem becomes quite difficult. Which viral protein subunit do you use as the immunogen? Which conformation elicits neutralizing versus non-neutralizing antibodies? Which epitopes are conserved enough that you’re not designing a vaccine that will be obsolete by the time it’s manufactured? These are not easy questions to answer! And if you get them wrong, you waste months manufacturing the wrong thing. The same CEPI paper from earlier optimistically states that immunogen/antigen design for these novel pathogens would take just a few months if we really worked hard at it.

But it feels like getting to this speed of development would almost certainly require immense technological leaps. One of my favorite podcast episodes was my interview with a founder of a vaccine development startup: Soham Sankaran of PopVax. In it, I ask a lot of questions about why immunogen design for vaccines is so hard, and I will paraphrase his answers in the footnotes2. To keep it short: it’s really, really hard.

Now, the question of the evening: can machine-learning help us with this?

Probably not. At least not in a significant way anytime soon. ML seems useful in the margins for, say, figuring out how to scaffold specific immunogens of interest such that they are ‘correctly’ presented to the immune system, but we are far off from a model being able to reliably respond to a query like ‘here is the structure of the virus I am scared of, please design an immunogen that I can encode into an mRNA vaccine that will elicit broadly neutralizing antibodies’.

At least, that’s the consensus from everyone I talked to. But if we’re willing to stretch our brains a little, I think one can imagine a scenario in which ML, as it exists today, may end up being extraordinarily useful for how we respond to pandemics. And it comes down to the fact that mRNA is such a stupidly, insanely versatile platform. You don’t need to encode an immunogen in the mRNA. Instead, you could simply encode the antibodies that you’d want the immunogen to elicit.

What, you may scream, surely you can’t do that. But you can! As far back as 2021, Moderna injected adult humans with an mRNA vaccine whose payload encoded a monoclonal antibody against the Chikungunya virus. And it worked quite well! Moderna has since shelved this particular asset, but for reasons that seem more portfolio-optimization-y than the drug not having enough efficacy. Luckily, there is ongoing work outside Moderna exploring mRNA-encoded nanobodies, which have the advantage of being far smaller than typical antibodies, and so are less stressful for our weak, mammalian cells to pump out. And upon looking it up, I discovered that I am not the first one to find this absurdly relevant to biosecurity efforts! One 2026 review paper echoes my sentiment, and expands on it: ‘mRNA-encoded antibody approaches have been explored in preclinical models of Zika virus, Ebola virus, and rabies, where a single intramuscular dose provided prophylactic and therapeutic benefits in animal models’.

Insane, right? Now, you may immediately spot problems with this. For instance: antibodies don’t last very long in our bloodstream; their half-life is on the order of 2-3 weeks. How useful could this possibly be in a pandemic, where circulating pathogenic material may linger for months? But fixing this is fully within the realm of possibility. Engineering the Fc region, the bottom section of the antibody’s ‘Y’ shape, can reliably and dramatically extend its half-life, and with it the protective window. In fact, we needn’t even theorize on this, because the same 2021 Moderna paper also included Fc mutations: two substitutions (M428L and N434S), leading to a 69-day half-life. And there is no reason to believe that this cannot be pushed even further, given that at least one antiviral antibody has been shown to have a half-life on the order of 5-6 months.
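The difference a longer half-life makes is easy to see with simple first-order elimination, C(t)/C0 = 0.5^(t/t½). The sketch compares a 21-day half-life (an assumed stand-in for an unmodified antibody, at the long end of the 2-3 week range above) with the 69-day half-life reported for the Fc-engineered antibody. It ignores dosing details, tissue distribution, and the fact that protective levels differ from pathogen to pathogen.

```python
import math


def fraction_remaining(days: float, half_life_days: float) -> float:
    """First-order elimination: C(t) / C0 = 0.5 ** (t / t_half)."""
    return 0.5 ** (days / half_life_days)


def days_until_fraction(fraction: float, half_life_days: float) -> float:
    """Days until concentration falls to `fraction` of its starting level."""
    return half_life_days * math.log2(1 / fraction)


for label, t_half in (("unmodified (assumed)", 21), ("Fc-engineered", 69)):
    left_at_90d = fraction_remaining(90, t_half)
    ten_pct_day = days_until_fraction(0.10, t_half)
    print(f"{label}: {left_at_90d:.1%} left after 90 days; falls to 10% by day {ten_pct_day:.0f}")
```

Under these assumptions, roughly 40% of the engineered antibody is still around after three months, versus about 5% of the unmodified one.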

The next question: where will we get useful antibodies from?

Modern ML methods for designing antibodies against arbitrary targets are not perfect, but they really are quite good. In 2025, the Baker lab published what is, to my knowledge, the most significant result in computational antibody design to date: a fine-tuned version of RFdiffusion that can generate de novo antibodies (VHHs, scFvs, and full antibodies) targeting user-specified epitopes. Most relevant for us, when the model was given C. difficile toxin B and a specific epitope on it that had never had an antibody designed against it, it generated moderate-affinity antibodies, with cryo-EM confirming their binding. Now, as I mentioned in the footnotes, binding to a virus is not the same thing as neutralizing it, and we usually only care about the latter. I agree that this is a bottleneck that ML cannot easily solve, but it also does not feel like a huge one, especially if these models work well. Binding is necessary but not sufficient for neutralization, so if you screen a large pool of binders, all generated essentially for free, you can surely speed up the search for a neutralizing antibody considerably.
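Here is a rough sketch of why cheap binders help, assuming each designed binder independently has some small chance of also being neutralizing. The 2% hit rate is a made-up placeholder, not a measured number. The point is that the number of binders you need to test scales like log(1 − confidence) / log(1 − p), which stays manageable when designs are nearly free.

```python
import math


def p_at_least_one_hit(hit_rate: float, n_screened: int) -> float:
    """Chance that at least one of n independently screened binders neutralizes."""
    return 1 - (1 - hit_rate) ** n_screened


def binders_needed(hit_rate: float, confidence: float = 0.95) -> int:
    """Smallest n such that P(at least one neutralizer) >= confidence."""
    return math.ceil(math.log(1 - confidence) / math.log(1 - hit_rate))


# Hypothetical: 2% of designed binders turn out to neutralize.
n = binders_needed(0.02)
print(n, f"{p_at_least_one_hit(0.02, n):.3f}")
```

Independence is optimistic, since designs against the same epitope tend to succeed or fail together, so a real campaign would want diversity across epitopes as well as sheer numbers.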

Of course, in the case of a pandemic going on long enough, you could bypass all this by simply fishing out neutralizing antibodies from infected patients, or at least use those as a parent for further ML-driven optimization.

Our final problem is that pathogens usually mutate, which means that even if we turn every human into a factory of identical antibodies against a particular sequence, those same antibodies may soon become useless due to immune escape. This is why the natural immune response—as offered by either an immunogen or antigens from the pathogen itself—can be so efficacious, as the polyclonal antibody repertoire elicited by natural infection or vaccination targets dozens of epitopes simultaneously, making it extraordinarily difficult for the virus to escape all of them at once. This too is not theory: every single monoclonal antibody therapy authorized against SARS-CoV-2 was eventually rendered obsolete by Omicron and its descendants.
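A toy calculation shows why many distinct epitopes matter. Suppose, purely hypothetically, that any given epitope has a 1% chance of being hit by an escape mutation over some period, and that escape at different epitopes happens independently. Antibodies that share an epitope tend to fall to the same mutations, so in this sketch only the number of distinct epitopes counts. The site labels are placeholders.

```python
def p_full_escape(epitopes: list[str], per_epitope_escape: float) -> float:
    """Probability a variant escapes every antibody in a cocktail.

    Assumes escape at different epitopes is independent, and that all antibodies
    binding the same epitope are escaped together.
    """
    return per_epitope_escape ** len(set(epitopes))


monoclonal = ["site_1"]
overlapping_cocktail = ["site_1", "site_1", "site_1"]
diverse_cocktail = ["site_1", "site_2", "site_3"]
for name, cocktail in (("monoclonal", monoclonal),
                       ("overlapping", overlapping_cocktail),
                       ("diverse", diverse_cocktail)):
    print(name, p_full_escape(cocktail, 0.01))
```

A polyclonal response spread over dozens of epitopes pushes this product toward zero, which is exactly what a single monoclonal lacks. Real escape is rarely fully independent across epitopes, so treat this as a best case.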

Are we doomed?

Let’s not give up, and instead take a closer look at the two issues we need to solve to overcome this obstacle. First, we need to choose not just any neutralizing antibodies for our vaccine, but ones that target sites where escape is costly to the virus: functionally constrained epitopes where mutations would compromise receptor binding or some other essential function. Second, we need to deploy cocktails of antibodies targeting non-overlapping epitopes, such that the probability of simultaneous escape across all of them becomes vanishingly small.

I propose to you that there are viable ML-based solutions to both of these.

To identify which epitopes are prone to immune escape, we can look to EVEscape, a protein model from Debora Marks’s lab at Harvard. The model combines evolutionary sequence information with structural and biophysical data to predict, for a given viral protein, which mutations are most likely to emerge and evade existing immunity. Flip the interpretation and you get the inverse: sites where EVEscape predicts low escape potential are precisely the sites where you want your antibodies to bind, because the virus cannot easily mutate away from them without crippling itself. This is not a solved problem, but models like these are surely directionally useful, and certainly better than guessing.
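The “flip the interpretation” step is simple once you have per-site scores. Below, the scores are invented numbers standing in for a model’s per-site escape predictions; this is not EVEscape’s actual code or output format. The function keeps only sites below a threshold and ranks them, lowest predicted escape first.

```python
def low_escape_sites(escape_scores: dict[int, float], n: int, max_score: float = 0.1) -> list[int]:
    """Return up to n residue positions with the lowest predicted escape potential.

    Sites scoring above `max_score` are discarded as too easy for the virus to mutate.
    """
    candidates = [site for site, score in escape_scores.items() if score <= max_score]
    return sorted(candidates, key=lambda site: escape_scores[site])[:n]


# Hypothetical per-residue escape scores (higher = more likely to mutate and escape).
scores = {101: 0.90, 145: 0.05, 210: 0.40, 233: 0.02, 305: 0.70, 412: 0.08}
print(low_escape_sites(scores, n=3))
```

The threshold matters as much as the ranking: if no site scores low enough, the honest answer is that there is no obviously safe epitope to target.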

For cocktail design, consider EscapeMap. EscapeMap integrates deep mutational scanning (DMS) data from SARS-CoV-2 across dozens of neutralizing monoclonal antibodies with a generative sequence model to identify something very useful: negatively correlated escape routes. Two antibodies have negatively correlated escape if the mutations that evade one tend to make the virus more sensitive to the other. Cocktails built from such pairs are inherently resistant to simultaneous escape, because the virus cannot run from both at once. As published, EscapeMap is SARS-CoV-2-specific; the underlying DMS data took years to generate, and you wouldn’t have it on day one of a new pandemic. But the framework should generalize to any pandemic, and a DMS-esque dataset will emerge if the pandemic goes on long enough, eventually allowing you to design broadly neutralizing antibody cocktails. If we’re being especially galaxy-brained, given a sufficiently good protein model, perhaps you don’t need any DMS data at all! After designing your de novo antibodies, you could run in-silico DMS to predict how every possible mutation on the target surface would affect binding to each candidate, cross-reference those predictions with EVEscape-style fitness predictions to filter for mutations the virus can actually tolerate, and look for the same negative correlations. I realize this isn’t fully possible today: these models still poorly capture the impact of single-amino-acid substitutions, and there is a whole host of other failure modes. But the models will only get better.
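The “negatively correlated escape” criterion itself is easy to illustrate. This is not EscapeMap’s method, which pairs DMS data with a generative model. It is a minimal sketch that takes a made-up escape matrix, where each antibody has a score per mutation (positive means the mutation helps the virus escape that antibody, negative means it makes the virus more sensitive), and picks the antibody pair whose escape profiles have the most negative Pearson correlation. The antibody names and numbers are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def most_complementary_pair(escape: dict[str, list[float]]) -> tuple[str, str]:
    """Antibody pair whose per-mutation escape profiles are most negatively correlated."""
    return min(combinations(escape, 2), key=lambda pair: pearson(escape[pair[0]], escape[pair[1]]))


# Hypothetical escape scores for five mutations (same mutation order for every antibody).
escape = {
    "mAb1": [0.9, 0.8, 0.1, 0.0, 0.1],
    "mAb2": [0.8, 0.9, 0.2, 0.1, 0.0],    # fails to the same mutations as mAb1
    "mAb3": [-0.3, -0.2, 0.7, 0.8, 0.6],  # mutations that beat mAb1 sensitize the virus to mAb3
}
print(most_complementary_pair(escape))
```

Pairing mAb1 with mAb2 would look like a two-antibody cocktail on paper while behaving like a single antibody against escape; the correlation check is what exposes that.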

When all of this is put together, this pipeline should allow us to do something extraordinary within weeks of a novel pathogen being sequenced:

Discover neutralizing antibodies against it, either via ML or patient serum.
Create a cocktail of antibodies with negatively correlated escape routes via in-silico screening or a DMS dataset.
Fc-engineer them for a long half-life.
Encode the whole thing into mRNA.
Manufacture it.

If we do this early enough, and distribute the vaccines fast enough, we could potentially kill the spread of even the most virulent pathogens. Of course, manufacturing is historically the next major bottleneck, but if our wastewater screening and ensuing rapid-responses are quick enough, we may need to manufacture orders of magnitude fewer doses.

I realize that there are many catches here, and that what I’ve presented is grossly optimistic. All of this is a multi-layered, mostly computational solution, and every one of these layers is error-prone. All antibody generation methods have plenty of failure modes, EVEscape is not consistently useful across viruses (though further lines of research claim to have improved on it), EscapeMap is hyper-focused on SARS-CoV-2 and it may very well be that the framework cannot easily transfer to new pathogens, and encoding antibodies into mRNA, however clever it may sound, is still in the early days of efficacy and adverse-effect studies.

But each one of these is improving, and I think the trend lines are promising. I am much more optimistic about the value of ML here than in perhaps any other layer of the biosecurity defense workflow, and time will tell how warranted that optimism is.

Pathogen-agnostic defenses are extraordinary. But who pays for them?

Finally, the last section. This one will be short.

Everything discussed so far shares a common architectural assumption: that you know, or can figure out, what you’re looking for. This is hard! And it is made all the more difficult by the fact that the coordinated effort needed to respond to these discoveries is not something we’re historically very good at. But there is one category of biosecurity defense that sidesteps this problem entirely, since it works against all pathogens. And once deployed, these defenses largely work by themselves for extended periods (months to years!), with no logistical effort needed from anybody.

What are they? Far-UVC and glycol vapors.

I’m going to be honest: the more I looked into this subject, the more I found that every conceivable thing that could be written about it has been, and where it hasn’t, covering it would require conversations with many more people and significantly lengthen this already long essay. So I’ll defer to others here. For far-UVC, I’d recommend visiting faruvc.org for an introduction and, if you’re sufficiently convinced, aerolamp.net to pick one up for yourself. Glycol vapors have far less accessible reading material, but there is one article published a year back by Blueprint Biosecurity (a nonprofit that also funds far-UVC work), and various related posts on the blog of Jeff Kaufman, who works in biosecurity.

To keep it short: if we could tile the interior of enough buildings with these solutions, we could, in theory, render the entire human indoor environment continuously hostile to airborne pathogens; far-UVC through physical damage to their genetic material, and glycol vapors through (probably) desiccation. This would affect all airborne pathogens. Named ones, unnamed ones, engineered ones, ones that have never existed before and will never exist again except in the brief window between their release and their death to one of these two. And it would do all this with no harm to you. Of course, these technologies still have room to improve, but their problems are mostly ones of logistics, optimization, and scalability.
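One common back-of-envelope way to think about air disinfection (brought in here as an assumption, not something from the sources above) is the well-mixed room: airborne pathogens are emitted at some rate and removed at a combined first-order rate, so the steady-state concentration is C = G / (V · Σλ). A far-UVC installation is often described by the equivalent air changes per hour (eACH) it adds on top of ventilation. The emission rate, room size, ventilation rate, and 20 eACH figure below are all hypothetical.

```python
def steady_state_concentration(emission_per_hour: float, volume_m3: float,
                               removal_rates_per_hour: list[float]) -> float:
    """Well-mixed room: dC/dt = G/V - (sum of removal rates) * C, so C* = G / (V * sum)."""
    return emission_per_hour / (volume_m3 * sum(removal_rates_per_hour))


# Hypothetical office: 100 m^3, 2 air changes/hour of ventilation, one infectious occupant.
EMISSION = 10.0      # infectious units emitted per hour (made up)
VOLUME = 100.0       # room volume in cubic meters
VENTILATION = 2.0    # air changes per hour
FAR_UVC = 20.0       # assumed equivalent air changes per hour from far-UVC

baseline = steady_state_concentration(EMISSION, VOLUME, [VENTILATION])
with_uvc = steady_state_concentration(EMISSION, VOLUME, [VENTILATION, FAR_UVC])
print(f"far-UVC cuts steady-state airborne concentration to {with_uvc / baseline:.0%} of baseline")
```

The same arithmetic would apply to any other removal mechanism, glycol vapors included, to the extent its effect can be expressed as a first-order removal rate.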

So why don’t we see these far-UVC lamps and glycol vapor fumers in every building in the world? Why aren’t we sterilizing our air the same way we sterilize our water?

You could quibble with the details here, about how far-UVC is still very expensive, the evidence base for glycol vapors is still being figured out, and the like. But it’s tough for me to consider the question of ‘why isn’t this being massively funded’ without concluding that the problem is that there is no for-profit entity that really benefits from it. The benefits of clean air are diffuse, accruing to everyone who breathes in a building, none of whom are the institution writing the check. Hospitals are the one exception, but they are a sliver of all interior environments that humans reside in, and obviously will not offer the scale necessary to put a dent into pandemics. This means that these technologies can only be deployed and studied by a very small group of hobbyists, early adopters, and academic labs.

Okay, but isn’t this the point of governments? This is a clear public good! This is territory that is hard to get perfect visibility into, but my instinct is that the evidence base for governmental-buy-in is simply difficult to produce.

A recent Works In Progress article over far-UVC had this to say:

Measuring infection control is challenging and seldom undertaken, particularly in public spaces. Epidemiological data is expensive and difficult to gather, and there is currently no way to measure the amount of viable, infectious pathogens in the air in real time. Office attendance can be tracked, but controlling for how users mix outside the office space is immensely difficult, and measuring the real-world effect of small-scale deployments in public areas is almost impossible. Studies aiming to cause deliberate disease transmission in controlled environments have failed to work in practice because they have been too small to generate enough infections.
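To see why small studies fail, here is the standard normal-approximation sample-size formula for comparing two proportions: n per arm = (z₁₋α/₂ + z₁₋β)² · [p₁(1 − p₁) + p₂(1 − p₂)] / (p₁ − p₂)². The 5% baseline infection rate and the 20% relative reduction are hypothetical. A real trial would also have to account for clustering by building, which pushes the numbers higher still.

```python
import math
from statistics import NormalDist


def n_per_arm(p_control: float, p_treated: float, alpha: float = 0.05, power: float = 0.8) -> int:
    """Participants per arm to detect p_control vs p_treated (two-sided, normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p_control * (1 - p_control) + p_treated * (1 - p_treated)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p_control - p_treated) ** 2)


# Hypothetical: 5% of occupants infected per season without far-UVC, 4% with it.
print(n_per_arm(0.05, 0.04))
```

Under these assumptions you need roughly 6,700 people per arm just to detect a 20% relative drop, which is a funding problem far more than a scientific one.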

The passage above is a bitter pill, but it hides a sweeter one: the implementation problems with pathogen-agnostic defenses are extremely ‘money-shaped’ in a way that few other biosecurity solutions are. All the field needs is proof, in the form of randomized controlled trials, aggregated individual-use experiments, and subsidies for institutions to try it out: more money to push it over the ‘this obviously works’ finish line. So, if there are any biosecurity-curious philanthropists reading this: I highly encourage you to explore far-UVC or glycol vapors.

Especially because, unlike almost every other type of biosecurity solution we’ve discussed so far, these solutions will yield public benefits even in the absence of bioterrorism. In fact, the same Works In Progress article on far-UVC never even mentions biosecurity, and is focused more on public health, ending with this line: ‘Tuberculosis and coronaviruses [may] join typhoid and cholera as tragedies of the past, and seasonal flu and common colds would become rare rather than routine if clean air were as universal and expected as clean water.’

It’s a great pitch, and I am very excited to see more deployment of these technologies in the coming years. It just feels like one of the more obvious areas to push forwards on in this field.

Conclusion

So, what should you be scared of?

I can’t speak for you, but I can say what I’m scared of. I am scared of a well-funded terrorist organization constructing its own lab, out of which it creates natural pathogens (potentially with a few AI-assisted mutations to let them escape existing defenses), using either split-order attacks or DNA synthesis companies that don’t screen orders. I am scared of these groups spreading it in well-populated cities or farmland. I am scared that it will kill several million people, cause billions in economic damage, or both, and that, though its spread will be noticed by wastewater screening, it will be months until the necessary resources are allocated to defend against it. And I am scared that all of this will happen within the next few years.

What am I not scared of? I am not scared of state-actors, because most states have too much to lose by violating the Biological Weapons Convention and, if they are willing to let loose anyway, I believe they would opt for either easier-to-use-and-control chemical or nuclear weapons instead. I am not scared of people creating extremely engineered pathogens that have capabilities far beyond existing ones—because the existing ones are already quite good and difficult enough to work with—especially because even if the AI tools get good enough to make it worth it, I believe the same AI tools will be just as useful in countermeasure design. And yes, I realize ‘attack requires one success while defense requires comprehensive coverage’, but I also believe the swiss-cheese security model will prevail. Finally, I am not scared of individual actors, because the economics of bioweapons production likely do not work in their favor. Yes, they can rent upstream services—virus production, purification—but the downstream weaponization work requires custom protocols that CROs have no economic incentive to develop. Moreover, given that the weaponization will almost definitely be a bespoke, hands-on R&D project and not one that is easily automated, it feels unlikely that nobody at the CRO will raise an eyebrow.
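The swiss-cheese intuition can be written down as a product: if each defensive layer independently misses an attack with some probability, the chance an attack slips through all of them is the product of those miss rates. The layer names echo ones discussed in this essay, but the miss probabilities are invented for illustration. The independence assumption is exactly what the “attack requires one success” objection targets: if the holes line up, the product badly understates the risk.

```python
from math import prod


def p_attack_succeeds(miss_probability: dict[str, float]) -> float:
    """Chance an attack passes every layer, assuming layers fail independently."""
    return prod(miss_probability.values())


# Hypothetical miss rates for each layer of defense.
layers = {
    "dna_synthesis_screening": 0.5,
    "cro_notices_weaponization_work": 0.4,
    "detection_and_response": 0.5,
}
without_screening = {k: v for k, v in layers.items() if k != "dna_synthesis_screening"}
print(f"all layers: {p_attack_succeeds(layers):.2f}")
print(f"without synthesis screening: {p_attack_succeeds(without_screening):.2f}")
```

Even leaky layers compound: three coin-flip-quality defenses still stop most attempts in this toy model, and removing any one of them doubles the risk.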

That’s my threat model at least. I realize it has holes. For example, it may be the case that state actors are worth worrying about, entirely because the appeal of bioweapons is that you deploy them with plausible deniability. Hard to do that with a nuke! You may also accuse me of not paying close enough attention to the trendlines, and that maybe I am correct about the 2026 threats, but not the 2030 ones, so perhaps a disgruntled salaryman will really be able to someday easily design mega-Ebola to depopulate the planet. Maybe!

Ultimately, you can get infinitely paranoid about biosecurity if you really want to, or you can assume Nothing Ever Happens, and I think where I have landed is a comfy middle ground. I am grateful that there exist people who work in biosecurity who are infinitely paranoid, and through writing this essay, I have become far more sympathetic to their viewpoint.

To end this off: in all my conversations, everyone generally agreed that an honest-to-god bioterrorist attack is unlikely. It is a low-probability event. But low-probability events with civilizational consequences are still worth preparing for. The heartening thing is that the bottleneck to preparation here is almost entirely institutional, economic, and coordinative, not scientific. The disheartening thing is that fixing these ultimately requires political will, and absent a catalyzing event to unlock it, that political will does not currently exist. Of course, one could argue that perhaps we will never need it: that the Pathogen people in this space are breathlessly building defenses against will never arrive, that it is all paranoia, tech-rotted minds coming up with entirely hallucinated demons. But that argument feels far less convincing to me now than before I started writing this essay, and, if I did my job right, I hope it will feel less convincing to you, too.

Where are we at with manufacturing-maxxing? There are certainly more mRNA production facilities around. Moderna brought three new plants online in 2025, in the UK, Australia, and Canada. BioNTech has deployed modular, containerized manufacturing units called BioNTainers to Rwanda, the first mRNA plant on the African continent. But mRNA production is really, really complicated, and all sorts of weird bottlenecks can arise in the process. If you’re curious to learn more (this is a surprisingly deep subject that could be its own essay), there are two really incredible articles on the whole logistical apparatus that goes into making one of these drugs: ‘Exploring the Supply Chain of the Pfizer/BioNTech and Moderna COVID-19 vaccines’ and ‘Analyzing Vaccine Manufacturing Supply Chain Disruptions for Pandemic Preparedness using Discrete-Event Simulation’. The short version is that the specialty raw materials and quality-control personnel needed to actually produce and release vaccines at pandemic scale are in short supply and, as far as I can tell, remain so. People are working to change this, though!

How about reducing the clinical trial bottleneck? The paper on the CEPI 100 Days Mission has a fun approach to it: just immediately chuck the vaccine into a phase 2b/3 trial. Of course, this comes with the caveat that it only works in a COVID-y situation: known pathogens, available safety data from similar therapeutics, and the like. The trials you run could also be challenge trials, as in, deliberately infecting vaccinated volunteers with a pathogen in a controlled setting, allowing you to immediately observe the efficacy of the vaccine (which is, surprisingly, historically a safe thing to do).

Can’t you just fragment a bacterium or virus into a soup of proteins and inject that alongside an adjuvant? This is not terribly dissimilar to how traditional vaccines function, which is to say: this may work, but you’d forgo the speed advantages of mRNA, and speed is ultimately what we need most here.

Okay, forget fragmentation. Can’t you identify conserved regions of a virus, and just use those fragments in your vaccine? Sure, and maybe it’ll work. But maybe it’ll also massively backfire, and you’ll end up giving your patient antibody-dependent enhancement, or ADE: antibodies that bind tightly to sections of the pathogen but don’t neutralize it in any meaningful way, crowding out the antibodies that would actually help. ADE actually happened with an early RSV vaccine: injecting native proteins from the virus made the disease worse. It took a structural biology breakthrough to get it to work: using the prefusion conformation of the RSV protein in the vaccine. Crazily, the same conformation trick, by the same guy (Jason McLellan), is what made the COVID-19 spike protein work as an immunogen.

But if we know which antibody we want, which we can grab from patients who naturally recover from the disease, can’t we just work backwards and find the immunogen that elicits it? Perhaps! But did you know that there are patients with HIV who somehow have gained antibodies against the disease? They are called ‘elite controllers’, making up 0.5% of all HIV patients, and despite knowing exactly what antibodies these patients have, it has been a struggle to convert this finding into a vaccine. The path from immunogen to mature antibody involves cascading rounds of somatic hypermutation, cross-reactive antibody-antibody interactions, and a network of immune signaling that cannot be reliably predicted from binding data alone. In fact, from Soham’s perspective, it isn’t terribly hard to find an antibody that can neutralize a pathogen. What is hard is understanding which immunogen can reliably elicit those antibodies, and that is almost entirely a trial-and-error process. Worst of all, it may be the case that some patients genuinely lack the immune repertoire necessary for those antibodies to ever be elicited.

Neurotechnology? For Cancer? (Ben Woodington & Elise Jenkins)

Abhishaike Mahajan — Mon, 02 Mar 2026 15:17:37 GMT

Spotify: https://open.spotify.com/episode/6BLZph2uGGUVphbNQ8NGPd?si=SVBSKJM8RdO4AhYzDa-ZfQ
Apple Podcast: https://apple.co/3OU5Zse
Transcript: https://www.owlposting.com/i/189602943/transcript

Introduction

This is an episode with Ben Woodington and Elise Jenkins, the cofounders of Coherence Neuro. The pitch for Coherence is as follows: a brain implant that treats cancer with electricity. When I first learned of the company in mid-2025, it was such an alien thesis that I instinctively wrote it off entirely. Surely this isn’t clinically plausible; maybe it will be one day, but certainly not today.

Then, while I was in San Francisco, I met up with Nicole, Coherence’s chief of staff. After that, I was far more convinced that there was something real here, especially after she told me that the electricity ←→ cancer thesis already has some merit: Optune, an FDA-approved medical device developed by Novocure. This has been on the market for over a decade, and uses externally delivered alternating electric fields to treat glioblastoma. And it works! If Optune is consistently used, glioblastoma patients can live up to twice as long compared to chemotherapy alone. How does it work? Simple: the alternating electrical fields prevent fast-dividing cells from replicating by interfering with the physical process of cell division (specifically, mitotic spindle formation).

After this, Nicole connected me with Ben and Elise, the cofounders of the company. It was an incredible conversation. During it, I learned that cancer cells behave eerily like neurons: hijacking neural pathways, attracting nerves into their microenvironment, and forming synaptic connections with surrounding tissue. Given this set of evidence, none of which felt particularly controversial, an easy logical leap is to ask: why can’t you throw neuromodulation at the tumor? Maybe not just for treatment, but for monitoring as well? Optune was a step in the right direction, yes, but surely it can be pushed even further.

So Coherence was born, the only (neurotechnology x oncology) company in existence. Ben and Elise met during their PhDs at Cambridge, and spun up the startup on the belief that a modality long assumed to be exclusively for neurological conditions like Parkinson’s, epilepsy, and chronic pain may have a profound role to play in cancer. And perhaps even in conditions beyond it.

And during my last trip to San Francisco for JPM 2026, I had the honor of sitting down with Ben and Elise to talk about it all.

This conversation covers how Coherence’s first neurotech device (SOMA) works, the molecular reasons why neuromodulation affects cancer at all, what the biomarker readouts look like, the obvious Michael Levin comparison, and a lot more. Coincidentally, Ben helped me out a fair bit with my neurotechnology piece a while back, and that article may be helpful reading material for this episode.

Enjoy!

Timestamps

(00:00:00) Introduction
(00:01:42) How is SOMA different from Novocure’s Optune?
(00:08:57) Why does neuromodulation affect cancer at all?
(00:13:28) How was cancer-nervous system crosstalk first discovered?
(00:15:42) Anti-epileptics and beta blockers as accidental cancer drugs
(00:17:38) What is molecularly happening when you block cancer-neuron crosstalk?
(00:19:50) What is SOMA actually reading out as a biomarker?
(00:20:44) What does it mean that cancer is “very electric”?
(00:22:02) Can you derive universal biomarkers across patients?
(00:23:09) How is the device placed?
(00:24:45) How does the blocking stimulation regime work?
(00:26:43) Is it fair to say this is closed loop?
(00:29:05) Why not just spam the tumor with constant stimulation?
(00:32:31) Why MRI safety is non-negotiable for oncology devices
(00:33:35) Walk us through the patient journey from diagnosis to implantation
(00:36:13) The Michael Levin question: can you reprogram cancer back to normal?
(00:42:29) Efficacy, hospice settings, and the utility of the neuromodulation literature
(00:45:52) Why start with glioblastoma instead of an easier cancer?
(00:48:57) Regulatory strategy and the reimbursement threat
(00:55:37) How well does mouse-to-human translation work for neuromodulation?
(00:58:09) Why didn’t this exist 10 years ago?
(01:01:48) The founding story
(01:06:38) Why build your own device instead of using off-the-shelf arrays?
(01:08:35) Speaking with glioblastoma patients
(01:12:04) What was it like to raise money for this?
(01:13:56) Beyond cancer: TBI, lung disease, and the pan-disease argument
(01:17:40) Hiring at Coherence + what is the hardest type of talent to find
(01:23:17) What would you do with $100M equity-free?
(01:27:15) Are you a neurotech company or a cancer company?

Transcript

[00:00:00] Introduction

Abhishaike Mahajan: Today I’m going to be talking to Ben Woodington and Elise Jenkins, who are the co-founders of Coherence Neuro, a startup that is building therapeutic neurotechnology that manages cancer from inside the body. I first want to talk about what specific device they are building, because I think it really sets the stage for how interesting the Coherence pitch is.

Ben and Elise, welcome to the podcast. Your first device is called SOMA. What exactly does it do?

Ben Woodington: That’s a brilliant opening question. Thank you so much for having us here. To rewind slightly, you’re right: we build technologies that interface with cancer and the surrounding biology of cancer, using electrical stimulation and recording. We’re really leveraging the intrinsic electrical properties of cancer, and also the way that they intersect and interact with our nervous system. We have a lot of programs ongoing, and I’m sure we’ll talk about some of them today. But as you correctly identified, our first product and program is SOMA, which we’re using in brain cancers. This is a tiny device, a BCI-like device that sits in the skull, and it can deliver an electrical stimulus to a tumor in the brain. We can also record the electrical activity from the tumor and around the tumor for readouts. And we’re working very hard to investigate what those electrical readouts can mean for the diagnosis and prognostication of the patient. And of course, the therapeutic potential of that device as well.

[00:01:42] How is SOMA different from Novocure’s Optune?

Abhishaike Mahajan: The interesting thing when I was first researching Coherence is that this is not the first device that uses some notion of electrical fields to interact with cancer. There’s another one called Optune developed by a company called Novocure. Is SOMA fundamentally different from the technology employed there?

Ben Woodington: Fundamentally different, yes. I think the natural development of — we have often said that Novocure is a 25-year-old technology. Optune is a 25-year-old technology. And if Optune is a Walkman, we’re going to be the iPod. It’s a lot more technically dense. We’re doing readout capabilities, recording capabilities. The thing is much, much smaller. We get much closer and in contact with the tumor and the body itself.

Abhishaike Mahajan: Maybe just to give some context as to what the Novocure device actually is: my understanding is that it is a non-invasive device for glioblastoma patients, which you stick to the side of your skull, that emits a low-frequency electrical field, preventing fast-dividing cells from dividing. What about SOMA is an evolution from that?

Ben Woodington: I’ll let Elise clarify the Novocure point because that was the background of her PhD.

Elise Jenkins: Yeah. So Novocure, as you said, is a wearable device. It’s actually four patches or arrays of electrodes that are positioned on the scalp, on a shaved head. And it delivers alternating current electric fields. They’re actually more intermediate frequency electric fields. And they are proposed to interfere with mitotic spindle formation, the way that cells divide. That’s what Novocure’s technology does.

Abhishaike Mahajan: And what about SOMA... what is it an improvement on?

Ben Woodington: Though the Novocure device has powerful overall survival statistics, there’s a very steep usage effect curve. When patients use the device around the median time, which is around 18 hours, the overall survival in those glioblastoma patients is about four months. But when they are super users of that device — when they use it 22 hours, 23 hours, or even more — that overall survival goes through the roof and those patients are getting maybe nine months, maybe more, in median overall survival. Which is almost doubling their life expectancy, which is huge. That’s probably the most impactful thing in glioblastoma in the last 20 or 30 years. Now it is a very natural progression to say, okay, why aren’t patients using that device 22 to 23 hours a day? It’s a large compliance issue.

Go inside and you’re in charge of how much stimulus the patient is getting and for how many hours of the day. You offset a lot of those systemic effects: the skin irritation, or just the fact that they have to wear something on their head all the time. Patients tend to not like wearing large things on their heads. That’s why there have been so many EEG device failures. Just being able to justify going inside and treating those patients 24 hours a day is a huge benefit already, before you even start talking about the data and recording elements that you can introduce, or the novel stimulation regimes that you can start using once you’re inside.

Abhishaike Mahajan: What other electrical things can you take advantage of when you’re actually physically inside the body?

Elise Jenkins: I think there are a few things. If you think about the way that an electric field is delivered through Novocure’s platform, they require a very large voltage to overcome the skull. There’s a big loss component there. And because of that, they have to carry this very large backpack with a big battery pack, like a car battery, in order to deliver the electric field threshold that they need to enable that interaction with the cell, whether that be mitotic spindle formation or one of the other mechanisms, which is around the process called dielectrophoresis that happens in the cell during metaphase. There are two interactions that happen when you expose cancer cells to those kinds of electric fields. And they do that externally using this field. The advantage of going into the brain is that you no longer have that barrier. You don’t need these extremely large electric fields. You don’t need these car batteries. You can have a very small wearable. The interface is non-obtrusive for patients. There’s a big argument around non-invasive versus non-obtrusive. And that’s one of the natural progressions in terms of using this type of technology continuously, which has been shown to have the biggest benefit in patients. They’re going in already, they’re doing surgery already on these patients. Let’s put a device in that can deliver that kind of field or other types of electrical stimulation. Let’s do it locally. Let’s also record what’s happening because we’re right there. We’re interfacing with those cells and tissues. And we can do it without being obtrusive to patients’ lives.

Abhishaike Mahajan: And the actual device itself is emitting the exact same type of field that the Novocure device is emitting, or something else?

Elise Jenkins: It can do either. We’ve been looking at — a lot of my PhD work was looking at Novocure’s types of stimulation, tumor treating fields. But the advantage of being closer is that you can start to look at different types of stimulation. Neuromodulation is something that we are particularly interested in. We’ve been exploring different ways of optimizing electrical stimulation in these types of cancers. Neuromodulation is a really interesting one. When we started development of the SOMA platform, we were really interested in the data that you could actually record. That was — we knew that there was a therapeutic intervention that you could use. Novocure had already shown that clinically. We were really interested in the data element. When we started looking at what happens longitudinally when you record electrical activity from these tumors, based on a lot of history of neural interactions that happen with these cancer cells, we started to discover that there were these very interesting biomarkers that are really relevant to what people target with neuromodulation. And that was what drove us to consider beyond just what Novocure is doing with electrical stimulation — field-based mitotic spindle, cancer cell focused — to what’s happening in the rest of the environment and how can we actually target or look at the rest of the environment, modulate that behavior, and how would that affect cancer cells. That’s what we’ve been looking at, optimized strategies for stimulation.

Abhishaike Mahajan: Just to have a good mental model for what the SOMA device actually is and where it is placed in relation to the cancer itself — should I imagine it as like you have a little head right here and a bunch of spikes coming out of it that poke directly into the cancer?

Ben Woodington: We wouldn’t use the word spikes. We would use leads or threads. But yes, the brains of the device is that part that you see that’s anchored in the skull. And as Elise said, that’s delivered at the time of surgery through a very small perforation in the skull. Then the front end of the device is modular. We have leads and threads that come off the front of that device and we can position them in and around the resection cavity after a tumor has been removed, or into a tumor maybe without resection. Then in the rest of the body, we can look at targeting specific nerves or tumors there as well. And as Elise alluded to, we’ve seen some pretty promising results looking at neuromodulation — slightly lower frequency regimes rather than super high frequency regimes — in those peripheral cancer indications as well.

[00:08:57] Why does neuromodulation affect cancer at all?

Abhishaike Mahajan: Whenever I’ve brought up Coherence to other people and mentioned neuromodulation in combination with cancer, there’s always this surprise that neuromodulation does anything to cancer. What is the intuition for why you would expect neuromodulation to do anything to cancer? I can buy that monitoring the nervous system nearby the cancer helps you have some notion of biomarker, but why does performing neuromodulation at all affect it?

Elise Jenkins: I think the surprise that others might have when they first hear this is the same surprise that the people who discovered these interactions had. Cancer cells behave and act a lot like neurons. And I think that was a big surprise for the entire field that started to make these discoveries. A lot of cancer cells (and not just in the brain; this happens in other organs as well) mimic a lot of the behavior that neurons have. They hijack neural pathways. They have an ability to attract neurons into their environment. They also have an ability to attract nerves into their environment if you’re outside of the brain. All of these properties make a really nice opportunity for you to then consider neuromodulation or other targeted strategies that are not just looking at the cancer cell itself. You could consider a similar analogy with the immune system. When people are looking at immunotherapy, they’re not targeting the cancer cell. They’re targeting a completely different subsystem in biology that they can leverage and tune in a way to target cancer cells. And when people have started to unpack this opportunity, that cancer cells are behaving so similarly to neurons, it massively opens up the therapeutic opportunities that you can exploit, things we have used for decades in other indications. I think that similar surprise was also a surprise to the people who discovered it.

Ben Woodington: And that’s recent. These are recent discoveries over the last five to ten years. Not mechanisms that have been uncovered and explained for 50 years, which is the exciting thing.

Abhishaike Mahajan: Is it fair to say that glioblastomas have the most nervous system interaction and maybe prostate cancer has the least, or is it not that clean?

Ben Woodington: I can say one thing and then maybe Elise can say the other. It’s difficult to draw a side-by-side comparison between glioblastoma brain cancers and peripheral cancers in the body simply because of the nature of those tumors. The tumors in the brain are in a sea of neurons. It’s a volume of conductive tissue, neural tissue, whereas tumors in the rest of the body are heavily innervated with nerves but are not existing within this sea of neurons. So it is tricky to draw exactly a side-by-side comparison. It does change how we design the devices and how we introduce them to the body. But yes, they do still have these neural features.

Elise Jenkins: I think if you’re studying gliomas or tumors of the brain, it’s a natural curiosity to imagine that cancer cells might have some type of similar features because they’re an extension of oligodendrocytes or astrocytes or whatever other brain-type glial cell. It’s not so unbelievable to think that cancer cells might behave similar to the environment they’re in in the brain. And I think that’s also why a lot of the research has been more well-established in the brain — in gliomas and diffuse intrinsic pontine glioma [DIPG], which is a pediatric glioma. A lot of the research is very well-established in those regions. But I also think that’s just a consequence of proximity — how close they are to that environment. When you start to look at these interactions happening in other organs, I don’t think it’s a matter of proximity. I think it’s just a consequence of the fact that people have done a lot of research already in glioma. And now that people are starting to observe these interactions happening in other organs, it’s exploding. If you were to do a PubMed search on cancer neuroscience and look at the trajectory of publications coming out in this space, it’s exponential. I think we’re only at the very start of that now. I think we will start to see that these interactions are possibly happening in every cancer in the body, not just ones in the brain.

[00:13:28] How was cancer-nervous system crosstalk first discovered?

Abhishaike Mahajan: You mentioned that a lot of this research is relatively new, past five to ten years. How was it first established that there is any crosstalk between cancer and the nervous system?

Ben Woodington: By one of our idols and collaborators.

Elise Jenkins: It’s possible that this initial discovery really is a byproduct of over 50 or a hundred years of bioelectricity research. If you look at some of the really early research where people are talking about voltage-gated channels and interactions, the idea that cancer is very electric is not a new phenomenon. That’s been something that people have been talking about and studying for a very long time. Then there’s the discovery that Michelle Monje’s group at Stanford made, where they were able to show that cancer cells had behaviors very similar to those of neurons. And then they actually started looking at: well, if you patch neurons and you start to depolarize these neurons, what happens to the cancer cell? They started to see that not only are they mediated by neural interactions explicitly through synaptic interactions, but they’re also mediated by paracrine signaling. When neurons release specific factors into the environment, or even cancer cells releasing those types of factors, there is this network effect. And that network effect is actually really bad. There’s this reciprocal engagement that they discovered, which has now caused a bunch of researchers to say, there’s so much going on here. What are those individual mechanisms? Which ones are happening through neurotransmitters? How many of them are actually synaptically integrating? It creates this opportunity for a whole new host of targets, whether that be new drugs to discover or new ways to therapeutically intervene, but also looking at repurposing. People have been looking at repurposing epilepsy drugs as neural inhibitors. People have been looking at retrospective studies in gliomas: if you were on an anti-epileptic and you also had glioma, how was your survival different? You do see these types of differences retrospectively. So now people are starting to do those kinds of studies prospectively, which is awesome.

[00:15:42] Anti-epileptics and beta blockers as accidental cancer drugs

Abhishaike Mahajan: Does that mean you can imagine at some point it will become standard of care for all people who have glioblastomas to be on an anti-epileptic, or it’s not that open and shut?

Elise Jenkins: They’re running these trials at the moment. I think if there is a significant benefit, I can’t see why they wouldn’t add that as a standard of care. We’re talking about patients who have such poor prognosis, such poor survival. The standard of care has not changed for 25 years. If there is anything that’s beating or contributing to current standard of care, I can’t see why — if the side effects are not completely debilitating, the quality of life is still very important — but if they can do that, then I don’t see why that wouldn’t be a natural progression as well.

Abhishaike Mahajan: That’s cool. I usually don’t hear about free lunches like that. When I think of anti-epileptics, they aren’t super side effect heavy.

Ben Woodington: Another famous example from another one of our collaborators, Erica Sloan at Monash, where they’re looking at beta blockers — another relatively innocuous drug, generic, widely available. Again, looking at retrospective studies on the outcomes of patients that happened to be on beta blockers and showed pretty profound impacts, reducing the chance of metastasis from breast cancers, I believe, in a way that you wouldn’t necessarily expect from something innocuous and that has not typically been used in cancer therapy. It’s exciting. It’s this uncovering of biology that we haven’t thought about before. There’s drug repurposing opportunities, but also forward looking — what does that mean for our understanding of biology and cancer and the system itself and how we can design new therapies as well.

[00:17:38] What is molecularly happening when you block cancer-neuron crosstalk?

Abhishaike Mahajan: Before we move on to using these signals as a biomarker of cancer itself — let’s say SOMA works, you’re able to prevent the cancer from interacting with the rest of the nervous system, and somehow that improves prognosis, which given the evidence you’ve given so far feels like not too large of a logical leap. What do you suspect is molecularly going on that is actually helping the patient? What about breaking the crosstalk is actually benefiting?

Elise Jenkins: I think there’s a whole host of things going on. There’s not a single molecular interaction that happens between neurons and cancer cells. There are interactions that happen with what they call pacemaker cells, which are specific cancer cells in the network that have synaptic integration with neurons. And then they have this gap junction-mediated network that happens between cancer cells. When we’re looking at neuromodulation as a regime for electrical stimulation, we look at something called a blocking regime, which essentially is a depolarization block. You either try to stop the neuron from being able to propagate action potentials, which in turn could mean things like blocking neurotransmitter release. Glutamate is a primary example — a primary excitatory neurotransmitter that has a really profound impact on cancer cells. There’s a lot of glutamate in the brain. Being able to reduce some of that — when people have used pharmacological blocking of AMPA receptors, they see around a 50% reduction in tumor volume in mice. So we imagine that there is a whole host of things that you’re interacting with when you block that.

Abhishaike Mahajan: Maybe a dumb question, but why does reducing the levels of the neurotransmitters lead to a reduction in tumor volume?

Elise Jenkins: They’re growth factors. Glutamate and neuroligin-3 is another one — another paracrine signal that is picked up by gliomas — they are growth factors.

So you’re blocking growth factors.

[00:19:50] What is SOMA actually reading out as a biomarker?

Abhishaike Mahajan: And with regards to the actual biomarker aspect of SOMA — you implant the device, maybe it’s post-resection or before a resection — what is the actual readout that you’re getting?

Ben Woodington: Fundamentally we’re recording electrophysiological signals as well. For all intents and purposes, you can think of this like a brain-computer interface. We’re recording electrically, we’re stimulating electrically. We’re reading out the same endogenous electrical activity of the brain as a device that’s trying to read out motor intent, for example. We’re looking at spike rates across the brain and more broadly, local field potentials — frequency shifts in the brain. Cancer cells do have this intrinsic electrical property as well. And you can measure that and we have measured that.

[00:20:44] What does it mean that cancer is “very electric”?

Abhishaike Mahajan: Before you move on — you mentioned that cancer is very electric. What does it mean that something is very electric? Why is cancer very electric?

Elise Jenkins: They have a very high expression, or they retain a very high expression, of voltage-gated ion channels. When we say they’re very electric, we mean they hold a particular membrane potential. They depolarize under certain events. Those depolarizing or hyperpolarizing events usually occur during specific stages of the cell cycle. That’s what we mean by that: they’re very electric.

Abhishaike Mahajan: Sorry, go ahead.

Ben Woodington: It’s important to clarify. We are looking at this electrical activity of the brain. We use that as a proxy for what is going on in the tumor or near the tumor. Because of these interactions between the tumor and its microenvironment, we can use that electrical proxy to inform ourselves of what’s going on in the tumor. We’re not measuring specific biomarkers or specific proteins in that microenvironment. We’re measuring the electrical activity of the microenvironment and what that means. Trying to correlate that to tumor volume, or drug responsivity, or seizure activity, or maybe the aggressiveness and growth rate of the tumor — trying to do all of that from electrical proxies.

[00:22:02] Can you derive universal biomarkers across patients?

Abhishaike Mahajan: That sounds very custom from patient to patient and tumor to tumor. Are there actually some universal properties that you can derive from the EEG-esque readouts or spiking that you’re getting?

Ben Woodington: We’re going to find out in humans. Yes, in rodents. But of course rodents are very homogeneous between mouse and mouse. We’re going to find out in humans. I will say, using electrical biomarkers as a proxy for something else is not new. People are doing this in other areas of medicine. People are now looking at this in Parkinson’s, in pain, in depression. And in those spaces, there’s still a lot of variability between patients. We expect to be able to demonstrate the same thing in humans between patients by training one model and applying it to multiple patients. That’s the end goal. Of course, proof will be when we run our long-term human studies, because these have never been done before. It’s one of the most exciting things — people have done this intraoperatively. You can take a recording from a patient, from a tumor while they’re under in surgery.

[00:23:09] How is the device placed?

Abhishaike Mahajan: We’ve been talking about when you actually implant the device — it could be pre-removal or post-removal. The pre-removal setting makes some sense to me. You attach the leads into or around the tumor itself. In the scenario where the tumor has been removed, are you just placing it into the cavity where the tumor was found, trying to see if there are any tumor cells left over? What’s the use case there?

Ben Woodington: In the margin. To paint a picture, glioblastoma is a horrendous disease. It is always terminal, unfortunately, with very poor survival rates. Somewhere between 70 and 80% of patients are fit enough to have a resection, and even after a resection, all of them will get a recurrence. Of the patients who get a recurrence, most of them, I think 90%, recur in the margin of where the resection was made. That’s where your residual cells are. That’s where your tumors start again. That’s where we’re targeting first. That’s not to say we can’t cover a broader area, and we have internal programs looking at broad-area coverage, maybe from the surface of the brain, maybe deeper in the brain. The goal is to get as much coverage of the brain as possible so you can also control disease at distant sites.

[00:24:45] How does the blocking stimulation regime work?

Abhishaike Mahajan: This is somewhat clear in the case where you’re purely recording and you get this spike readout. For the case of intervention, what does the perturbation you’re applying actually look like? You mentioned something about blocking. Could you talk a little bit more about that?

Elise Jenkins: There are a multitude of stimulation regimes, but one we’re particularly interested in is this blocking regime. When we started recording longitudinally, we asked: what are the electrical biomarkers that we see change in the brain? We saw a really interesting rise in hyperexcitability. If we look in the high gamma range, frequencies above 70 Hertz, you see a week-to-week increase in high gamma activity in tumor-bearing mice. That’s where we tend to see very distinct changes in brain activity as the disease progresses.
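Tracking "high gamma activity" week to week comes down to estimating spectral power in a frequency band. The lower edge of 70 Hz comes from the interview. The 150 Hz upper edge, the 1 kHz sampling rate, and the synthetic signals are assumptions for illustration. The sketch uses a plain FFT periodogram; real pipelines usually prefer a smoother estimator such as Welch's method (`scipy.signal.welch`).

```python
import numpy as np


def band_power(signal: np.ndarray, fs: float, f_lo: float, f_hi: float) -> float:
    """Mean periodogram power of `signal` in [f_lo, f_hi) Hz (relative units)."""
    signal = signal - signal.mean()                       # remove DC offset
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(signal.size, d=1.0 / fs)
    psd = (np.abs(spectrum) ** 2) / (fs * signal.size)    # one-sided doubling omitted
    mask = (freqs >= f_lo) & (freqs < f_hi)
    return float(psd[mask].mean())


fs = 1000.0                          # assumed sampling rate, Hz
t = np.arange(2000) / fs             # 2-second window
rng = np.random.default_rng(0)

# Synthetic recordings: week 4 carries an extra 90 Hz component
week_1 = np.sin(2 * np.pi * 10 * t) + 0.1 * rng.standard_normal(t.size)
week_4 = week_1 + 0.5 * np.sin(2 * np.pi * 90 * t)

hg_1 = band_power(week_1, fs, 70.0, 150.0)
hg_4 = band_power(week_4, fs, 70.0, 150.0)
print(f"high-gamma power ratio, week 4 / week 1: {hg_4 / hg_1:.1f}")
```

Because only the ratio between sessions matters for a longitudinal trend, the absolute scaling of the periodogram is left uncalibrated.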

What led us to thinking about blocking was this: if we know the general spike rate of these kinds of neurons in a particular region of the brain (it varies throughout), we can start deriving neuromodulation parameters computationally. We modeled the neuron and essentially asked, can we block this activity, can we stop it from happening? If you generate an action potential, how do you stop it from generating again? And what consequence might that have on cancer growth? Those are essentially the types of perturbations that we’re looking at. If you’re not familiar with blocking, you can think about it in relation to how you would evoke a neuron. If you stimulate a neuron, you evoke an action potential. When you block, you essentially stimulate at a faster rate, so that when the neuron tries to repolarize and fire again, it can’t. You leave it in this constantly depolarized state. It can’t reach threshold. It can’t activate. That’s the kind of perturbation that we’re interested in, in the brain and outside of the brain.
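The difference between evoking and blocking is, at the waveform level, largely a difference in pulse rate. The interview gives no parameters, so every number below is hypothetical: 130 Hz stands in for a conventional stimulation rate, 5 kHz for a kilohertz-range blocking rate, with 50 µs phases and 100 µA amplitude. The pulses are charge-balanced and biphasic, a common convention in neural stimulation. The sketch only builds the waveforms; it does not model whether tissue is actually blocked.

```python
import numpy as np


def biphasic_pulse_train(freq_hz: float, amp_ua: float, phase_us: float,
                         duration_s: float, fs: float = 1_000_000.0) -> np.ndarray:
    """Charge-balanced, cathodic-first biphasic pulse train sampled at `fs` Hz.

    Each pulse is a negative (cathodic) phase immediately followed by an equal
    and opposite (anodic) phase, so the net injected charge per pulse is zero.
    """
    n = int(round(duration_s * fs))
    wave = np.zeros(n)
    phase_n = int(round(phase_us * 1e-6 * fs))    # samples per phase
    period_n = int(round(fs / freq_hz))           # samples between pulse onsets
    if 2 * phase_n > period_n:
        raise ValueError("pulse is wider than the inter-pulse period")
    for start in range(0, n - 2 * phase_n + 1, period_n):
        wave[start:start + phase_n] = -amp_ua                    # cathodic phase
        wave[start + phase_n:start + 2 * phase_n] = amp_ua       # anodic phase
    return wave


dur = 0.1                                                   # 100 ms window
low = biphasic_pulse_train(130.0, 100.0, 50.0, dur)        # conventional-rate example
high = biphasic_pulse_train(5000.0, 100.0, 50.0, dur)      # kHz-range example

for name, w in [("130 Hz", low), ("5 kHz", high)]:
    duty = np.count_nonzero(w) / w.size
    print(f"{name}: duty cycle {duty:.3f}, net charge {w.sum():.1f} (uA x samples)")
```

At the higher rate the stimulus is on for a large fraction of each period, leaving the membrane little time to recover between pulses, which is the intuition behind "it can’t regenerate".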

[00:26:43] Is it fair to say this is closed loop?

Abhishaike Mahajan: Is it fair to say that this is closed loop, that it is actively learning about the electrical state of the cancer?

Ben Woodington: Currently, no. This is an open loop system. Rather than thinking of this as a closed loop therapy, we prefer to think of this as a theranostic platform. We have some electrical stimulation where we’re treating this disease and we have a diagnostic function where we’re reading out and we present the clinician and the patient with what is going on. You can imagine that would be overlaid on an MRI to say what’s going on with the tumor in real time, very fast. That’s great — it’s a new diagnostic weapon. Eventually, of course, you train these models and these systems to be better than clinicians so that you can make these systems closed loop. That’s the holy grail across many aspects of neuromedicine, where you no longer have to have a second diagnostic readout and say, how do we tune the stimulation, where do we position the next device, the next electrode. Instead, the system’s doing that for you, optimizing exactly where it’s treating, where it’s stimulating.

Elise Jenkins: It might be helpful to understand that when we started trialing these types of blocking stimuli in the brain — high frequency blocking is not new, people have been doing this for a while throughout the body — there is a way that you can understand whether or not you’re having an effect in the brain. What we typically do is record a segment of electrical activity, process that data, look at the frequency band. We apply stimulus, we then record again, and we see that there is a transient response when you deliver this type of neuromodulation, where you can see a decrease in the high gamma range that we’re interested in. And that’s also how we threshold. That’s also how we would work out — do we need to increase the stimulus, do we need to change the stimulus, do we need to change which pairs of electrodes we might be using to achieve a certain area of activation or blocked regions of the brain. We can use those kinds of methods of pre and post recordings to tell us whether or not we’re having the effect that we desire.
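The record, stimulate, record-again procedure amounts to a before/after comparison of high-gamma power. The sketch below is a hypothetical version of that check. The 70 Hz lower band edge follows the interview. The 150 Hz upper edge, the 20% minimum reduction, the "effect"/"adjust" labels, and the synthetic signals are assumptions. It does not reflect the team's actual thresholding rules.

```python
import numpy as np


def high_gamma_power(x: np.ndarray, fs: float,
                     f_lo: float = 70.0, f_hi: float = 150.0) -> float:
    """Mean periodogram power of x in [f_lo, f_hi) Hz (relative units)."""
    x = x - x.mean()
    psd = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)
    band = (freqs >= f_lo) & (freqs < f_hi)
    return float(psd[band].mean())


def assess_stimulation(pre: np.ndarray, post: np.ndarray, fs: float,
                       min_reduction: float = 0.2) -> str:
    """Compare high-gamma power before and after a stimulation epoch.

    Returns "effect" if post-stimulus power fell by at least `min_reduction`
    (a fraction), otherwise "adjust" to flag that the amplitude, the waveform,
    or the electrode pair may need to change.
    """
    reduction = 1.0 - high_gamma_power(post, fs) / high_gamma_power(pre, fs)
    return "effect" if reduction >= min_reduction else "adjust"


fs = 1000.0
t = np.arange(1000) / fs                  # 1-second recordings
rng = np.random.default_rng(1)
pre = np.sin(2 * np.pi * 10 * t) + 0.6 * np.sin(2 * np.pi * 90 * t) + 0.05 * rng.standard_normal(t.size)
post_good = np.sin(2 * np.pi * 10 * t) + 0.2 * np.sin(2 * np.pi * 90 * t) + 0.05 * rng.standard_normal(t.size)

print(assess_stimulation(pre, post_good, fs))    # high gamma suppressed
print(assess_stimulation(pre, pre.copy(), fs))   # no change after stimulation
```

Because the effect is described as transient, a real protocol would also need to fix how soon after stimulation the "post" window starts.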

[00:29:05] Why not just spam the tumor with constant stimulation?

Abhishaike Mahajan: Incredibly naive question on my end, but it sounds like you just want to be constantly stimulating all points of the tumor as much as possible to prevent it from ever being able to fire off an action potential.

Elise Jenkins: Pretty much.

Ben Woodington: In some scenarios, yes. But we are also running studies where we’re looking at dosing, because there are other mechanisms at play as well. Some of those mechanisms — you don’t need to be stimulating and spamming them constantly. You can perhaps dose once a day and elicit some local biological response as well.

Abhishaike Mahajan: I’m curious, if you’re actively going to be working with a clinician to tune the actual inner workings of SOMA, what are the knobs of control that the clinician is actually allowed to tune? It seems like just overloading the tumor with stimulus is what you’re doing in practice.

Ben Woodington: I would draw a parallel with radiotherapy. Whole brain radiotherapy is really fucking grim. If you’ve ever seen a patient go through whole brain radiotherapy, it is gnarly. It is awful. It affects their whole brain, as the name suggests. And those patients are never quite the same afterwards. Clinicians don’t want to do that. They do it as a last resort, and they try to use very focused technologies where they can hit the tumor very hard and spare the rest of the brain. That would be the approach that we would take as well. And it’s the approach that Optune takes: they focus the field, concentrating the stimulation on as small a point as they can, so they can hit that area as hard as possible. That’s what we would want to do as well, rather than targeting the whole brain.

Abhishaike Mahajan: Instinctively, why would there be any off-target effects if all of the threads are around the tumor?

Ben Woodington: Because current spreads.

Elise Jenkins: The network in the brain is crazy. You have long-range projection neurons. You might be affecting the body of a neuron in one area that has a projection going very far away. There is definitely a network effect. One of the things that we’ve been looking at is how you can computationally model the affected area of tissue, meaning the area that would be blocked. We’ve integrated neuron models into our computational models that essentially tell us the threshold we need to hit in order to block a neuron X distance away. You build these really nice balloon-type shapes around the electrodes that tell you how far you’re actually going to reach if you want to block neurons. And then, as an extension, there are probably some neurons that are connected somewhere else in the brain’s network. At the same time, glioblastoma patients who have resections are having big chunks of tissue taken out. You want to preserve function and be aware of which areas of eloquent cortex to avoid, so that you’re not inhibiting movement, speech, or other critical functions. That means designing the stimulation parameters, and deciding how to activate the electrodes and where to put them, so that you avoid those spots. You can do that by taking the MRI into account as well.
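The "balloon" Elise describes comes from coupling neuron models to field models, which the interview doesn't detail. A far cruder analytic stand-in shows why the reach grows with current. It treats the electrode as a monopolar point source in infinite, homogeneous, isotropic tissue, where the field magnitude falls off as I / (4πσr²). The tissue conductivity and the field threshold below are assumptions, and a field threshold is only a rough proxy for whether a neuron is blocked.

```python
import math


def field_at_distance(current_a: float, sigma_s_per_m: float, r_m: float) -> float:
    """|E| in V/m at distance r from a monopolar point source in an infinite,
    homogeneous, isotropic conductor: E = I / (4 * pi * sigma * r^2)."""
    return current_a / (4.0 * math.pi * sigma_s_per_m * r_m ** 2)


def reach_radius(current_a: float, sigma_s_per_m: float,
                 e_threshold_v_per_m: float) -> float:
    """Distance (m) at which |E| drops to the threshold; closer in, it exceeds it."""
    return math.sqrt(current_a / (4.0 * math.pi * sigma_s_per_m * e_threshold_v_per_m))


sigma = 0.2        # assumed bulk tissue conductivity, S/m
e_thresh = 50.0    # hypothetical field threshold, V/m
for i_ua in (50, 100, 200):
    r_mm = 1e3 * reach_radius(i_ua * 1e-6, sigma, e_thresh)
    print(f"{i_ua:>4} uA -> radius ~ {r_mm:.2f} mm")
```

Under this model the radius scales with the square root of the current, so quadrupling the current only doubles the reach. Real tissue is anisotropic and bounded by the resection cavity, which is why numerical models are used in practice.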

[00:32:31] Why MRI safety is non-negotiable for oncology devices

Abhishaike Mahajan: Actually, I’m curious — you do mention that SOMA is MRI-safe. Why is that important or particularly useful?

Ben Woodington: It is absolutely critical for oncology. MRI is not going anywhere. It’s the gold standard imaging technique, and it’s used in the brain more than anywhere else. In glioma cases, clinicians ideally want the patient to have an MRI every three months. If you introduce a device into the body that is not MRI-compatible, that’s a massive problem. That’s one thing. The second step goes beyond compatibility: making sure the device doesn’t cast any artifacts. MRI relies on magnetic fields, and if you have a magnet or large chunks of metal in your device, that will affect the MRI image. You start casting shadows, and your clinician’s not going to like that. We’ve spent a lot of time engineering this device and choosing technologies for it to overcome this issue.

[00:33:35] Walk us through the patient journey from diagnosis to implantation

Abhishaike Mahajan: Setting aside the inner workings of the device, the therapeutic interventions and the biomarker readouts, I am curious about the practical use of this device in a clinician’s workflow. What will it look like? You’re diagnosed with glioblastoma: do you immediately have this put in, or what?

Ben Woodington: We’re going to work our way up through patients. Our first patients will be the most sick, recurrent patients who are probably coming in for their second surgery at this point, and the device will be left behind. Then we’d be moving towards newly diagnosed patients. Let me walk through what that generally looks like for a patient. A patient would usually present with perhaps a seizure — fit, healthy, 42-year-old man or woman, has a seizure. They go to A&E, the doctor will say, I think you should have an MRI. You have an MRI. They spot the tumor. Pretty quickly you’re brought into surgery — within a few weeks, ideally — for a resection. Our best clinical access point would be right there. Leave that device behind at that first surgery. The patient will then have radiotherapy and chemotherapy, temozolomide usually. Eventually we want to work our way up and be at the top of that pile.

Abhishaike Mahajan: Let’s say you run the clinical trial with SOMA. The thing that I would be instinctively curious about is that the patients who are most willing to have this device put in are also the sickest, and potentially the device might not be able to do anything at all. Is that at all a concern?

Ben Woodington: No. We will have done a lot of work preclinically to validate this technology. We’re adopting some of the lessons from Novocure as well, which is now very clinically validated — it’s been on the market for 20 years. Of course, early feasibility patients are always signing up for a clinical trial. We won’t be signing those patients up saying we guarantee this is going to end your disease and you’re going to live forever. That’s not something we can do, just like any other drug or device trial. We’re going to be working within the bounds of ethics of clinical trials as well. But there’s a hell of a lot of work that leads up to that so that we’re confident we’re going to have a clinical and therapeutic effect.

[00:36:13] The Michael Levin question: can you reprogram cancer back to normal?

Abhishaike Mahajan: One thing I did want to ask — whenever someone outside of the bio field hears about bio, their first thought is Michael Levin. If I put my Michael Levin hat on and look at Coherence Neuro, my thought is: well, if you put SOMA into a glioblastoma, why can’t you just reprogram it back into a normal neuron, because all cancer is membrane depolarization gone awry? That probably isn’t true, but what is your view of the Levin-esque understanding of bioelectricity?

Ben Woodington: I’m going to hold back for a second.

Elise Jenkins: The way I interpret Levin’s view of what’s going on in cancer is essentially that cancer may be mostly influenced by external cues, and less so by genetic abnormality. I think that aligns a lot with the way that we’ve been building this technology and how we would use it: we’re trying to influence the environment, given that it’s a dominant factor in how these diseases are able to thrive. If you can believe that cancer cells are able to modify themselves to thrive in a particular environment, it shouldn’t be so far-fetched to believe that the same can happen in reverse. I think the challenge to some of this thinking is that cancers are normally diagnosed at a really late stage. At that point a significant number of driver mutations have occurred, so saying that you can nudge these cells back into a healthy state is a bit of an oversimplification. However, with something like a bidirectional interface, we’re no longer relying on what we see in a cell at a specific point in time, a snapshot at day one or day five that misses everything in between. I think there is a lot of information we can get out of longitudinal data. What happens to that cancer cell or that environment over the course of its evolution? We don’t have that yet, and that’s exactly what we’re trying to build. And I think it’s not infeasible that these types of devices will have single-cell resolution at some point. So while I don’t fully buy yet that you can just nudge these cells back into a healthy state, I don’t think it’s so far-fetched. We still don’t know what makes them change through time. But if we listen to them, read from them, and learn what’s happening, I don’t think it’s that crazy to start thinking about what kinds of nudges we’d need to put them back into a healthy state. This is not super crazy.

Abhishaike Mahajan: So the argument is that at the very start there is genuinely membrane potential gone wrong, and then driver mutations are acquired, and then it’s irreversible.

Elise Jenkins: I think that in a simplistic view, you could say that, but there is so much going on in a cell. It’s an oversimplification to say that one single voltage channel is driving this entire process. I think there are multiple things going on — multiple different membrane potential-mediated interactions that are happening that drive the change in DNA or whatever else it might be that says, now change this expression, express more of this protein, so that you can leverage the environment. As they’re growing, that growth happens exponentially. You’re having so many more of these mutations happening. And as the environment changes, they change again. They’re really clever at figuring this out. I think that because we catch it after so many things have happened, it’s very hard to work out how you’re going to go back and change 15 or 16 different steps with one single application.

Ben Woodington: In a highly heterogeneous tumor environment that now has 40 different cell types or however many.

Elise Jenkins: And I don’t think you can make the claim that if you stimulate that environment — let’s say you try to target just the cell — that there’s no consequence to the neighboring non-cancerous, healthy participating cells. You’re going to modify them too. How do you design a protocol or a system that essentially only targets those specific channels, for example? If you look back at around 2010, there was a really incredible review article that looked at what happens in cancer cells — what happens to the membrane potential, what happens when they depolarize just before they enter a certain stage of the cell cycle. It’s called cell cycle-mediated membrane fluctuations. There’s so many things involved in that process. I don’t know how you could really specifically target one specific version of the ion channel that can mediate that change. It’s complex.

Ben Woodington: I’m going to go one level higher. Ion channels are important. Membrane potential is important. And our best chance to mess with it and target it is by using electrical biological interfaces like high-density BCIs. That’s exciting. I think there’s a lot of potential there. I don’t think we know yet what the downstream effects can be, because a lot of this has been done in a dish. A lot of this has been done maybe in simplistic rodent models and not a lot of this has been done at the network level and single-cell level in a human brain. So I think it’s exciting, there’s a lot of potential. Jury’s out on whether you can fully reverse cancer back to a healthy state.

[00:42:29] Efficacy, hospice settings, and the utility of the neuromodulation literature

Abhishaike Mahajan: At least for SOMA-like devices that have been tried in mouse models — has there been anything more complicated than mice, or is it just mice?

Ben Woodington: Mice for cancer. All of our safety work is done in larger mammals. The cancer models in larger mammals are less useful, let’s say. Spontaneous models in some companion animals can also be used.

Abhishaike Mahajan: How well does this work in mice? Like neuromodulation for cancer.

Elise Jenkins: In pharmacological settings, what people have shown is around a 50% reduction in DIPG (diffuse intrinsic pontine glioma), a pediatric glioma. That’s essentially our target. We’ve been looking at how different types of stimulation parameters work in glioblastoma versions of those models, and we’re still working on that right now.

Abhishaike Mahajan: Is that for monotherapy or is that combined with something else?

Elise Jenkins: We do both. We’re looking at combination treatment with the standard of care, which is temozolomide, and we also do standalone treatment with a host of different neuromodulation parameters.

Abhishaike Mahajan: Actually, this is something we completely did not discuss. Is SOMA useful even if a patient is going to live only two more months — just for reducing symptoms? Is there a good argument that could be made there, or is it iffier?

Ben Woodington: There has been work showing that neuromodulation, and electrical stimulation specifically, can be used to reduce seizure activity. There’s a hell of a lot of work showing that you can reduce pain, too: you’re talking to some of the same nerve bundles that the sensory neurons travel down. I think it’s very likely that we will end up reducing seizure burden, pain burden, et cetera. But again, we can’t speak to our rodents. We’ll have to find out when we do our intraoperative and safety work in humans.

Abhishaike Mahajan: Are you able to automatically take advantage of all the neuromodulation literature that’s out there when using SOMA, or do you need to build up your own corpus of knowledge because it’s a brand-new device being used for cancer, at the site where the cancer just was?

Ben Woodington: Both, right? We massively leverage elements of Optune and Novocure and elements of the neuromodulation world. This isn’t coming out of thin air. There is a body of work that’s been around in the neuromodulation world for 60 or 70 years. We leverage a lot of that, looking at how electrical stimulation routines affect biology and neural firing. We are adopting some of that and applying it to new diseases.

Elise Jenkins: The only thing I would add is that, especially for product development of the human SOMA device, there are so many neuromodulation and electrical stimulation devices that we can learn from, down to how you characterize the electrodes. This is really important to make sure that they’re safe. All of that literature is directly relevant and useful, and we use it all the time.

[00:45:52] Why start with glioblastoma instead of an easier cancer?

Abhishaike Mahajan: This is more of a broader question, but if it turns out that cancer broadly interacts with the nervous system, why go after glioblastoma specifically? Pan cancer would be too ambitious of a goal to start with, but alternatively, why not go after a cancer that’s perhaps less fatal and a bit easier to work with?

Ben Woodington: I think pan cancer is the right amount of ambition. We do want to go pan cancer with this. I think what we’ve seen historically in cancer treatments — the ones that really move the needle are the pan cancer approaches that tackle some fundamental mechanism. Cut the thing out — one of the most effective approaches for cancer treatment. Burn it with radiation — one of the most effective treatments. Chemo. Immunotherapies. Things that affect lots of tumors. We’re going for the same thing. We want to build devices — one for the brain, one for the torso — and we want to go after as many solid tumors as we possibly can. Any of them that we see an effect in, we’ll be pushing forward. Why start with the brain then is the next question. Because it’s hard. There are a number of reasons for this — economic, technical, and cultural. Number one, Novocure has set the stage for the use of electrical devices in brain cancer. The FDA regulators and payers are comfortable with the use of electrical stimulation devices now in glioblastoma. That’s a big cultural moment for these kinds of devices. Clinicians are comfortable with the use of these devices and with physical modalities of treating the disease as well.

Elise Jenkins: 70 to 80% of patients are having surgery.

Ben Woodington: Number two is the surgical element. For our first devices, and this may not be the case forever, we’re implanting; we’re going inside. So we want to be looking at diseases where surgical intervention is not uncommon. That brings the barrier right down, because the risk floor is already established. Leaving something behind adds only marginal risk there, whereas many neurotechnology companies have to justify a new surgery for the patient, and the risk-reward starts to get complicated. We don’t have that issue. And the final large reason (there are many, but this is the final large one) is how many therapy options are out there, and what the macro picture looks like for new interventions coming to these diseases. Glioblastoma, unfortunately, is not a pretty picture when it comes to what’s on the horizon. The standard of care has not changed in 25 years. Optune is probably the most transformational thing that’s happened in those 25 years. Outside of that, there isn’t a lot of hope. There aren’t many clinicians singing the praises of other technologies coming online over the next 10 years. We want to be at the top of that pile. We want it to be resection, radiotherapy, chemo, and us. That’s not as easy an equation for, say, breast cancer.

[00:48:57] Regulatory strategy and the reimbursement threat

Abhishaike Mahajan: You mentioned that Optune has paved the way for medical devices to be used. I’m assuming that by virtue of this being invasive, there will be some new territory that you have to navigate. What is that new territory? What are the logistical and regulatory challenges ahead?

Ben Woodington: Regulatory? We’re not scared of regulatory. We will get this device approved. It’s very likely that we’ll get breakthrough designation for this device. I’m not concerned about that. Reimbursement is always your biggest threat and challenge in devices. We need to be designing the trials, the device, and how the patient interfaces with it in a way that also makes payers happy. Novocure has laid some of that groundwork, but there will be different costs involved: in the surgery, in how the surgeons interact with the device, and in how the external components are supplied to the patient. We need to design trials and a go-to-market strategy that account for that.

Abhishaike Mahajan: This is maybe related to the actual implantation process itself, but do you need a Coherence employee alongside the surgeon, helping guide how exactly the device is put in?

Ben Woodington: It’s a good question, but no. We’ve been designing these technologies alongside clinicians, neurosurgeons, and neurologists to make sure that we are compatible with what they already do. How they plan surgeries, how they implant devices — so that we’re not having to build a hundred-million-dollar robot.

Abhishaike Mahajan: The Neuralink way.

Ben Woodington: There’s obviously some incredible engineering that’s gone into that, but right now we want to get into patients as quickly as possible. They don’t have much time and we want to get there fast. The best way to do that is by giving a device to a clinician that requires minimal surgical training to start implanting in patients.

Abhishaike Mahajan: That makes sense. That leads into a question I’ve had for 30 minutes now. What are the axes along which SOMA could still be improved?

Ben Woodington: Size is critical. The smaller you can go, the wider your patient population and the safer these technologies are. Everyone’s trying to move to more and more minimally invasive approaches where eventually, as Elise said, you’re going through some very small, single-digit millimeter access point into the body.

Abhishaike Mahajan: If it’s at a certain size, are you limited to the most severe, largest tumors?

Ben Woodington: We’re already very small. Our device is about half the width of a Neuralink device, which makes it compatible with standard burr holes in the skull, the kind they’ll drill for a biopsy, for example.

Abhishaike Mahajan: Is there a good visual indication?

Ben Woodington: A thumbnail. About the size of a thumbnail.

Yeah. Now, there are other improvements, of course: power efficiency, electrode coverage, all these kinds of elements. Eventually you may want high-density electrode coverage to get more and more precision in your stimulation and recording. There are always improvements to be made.

Elise Jenkins: A big one for me is access. One limitation you see across a lot of neurotech platforms going out today is how much of the brain they can access. Neuralink has a really high-density multi-thread device, but the threads all go into a specific region of the brain. One thing we’ve been really focused on is how you get multiple access points. How do you create a device or a platform that can access the front of the brain, versus the side of the head, versus somewhere at the back of the head, so you can reach multiple areas while the surgery is still very minimally invasive? By shrinking everything down really small, you can imagine having not just one of them, but multiple. Now you’re accessing not only this specific region of the brain, but also this region and this region, or across hemispheres, to see if the tumor is migrating across. That’s something that is pretty hard but quite interesting on our end.

Abhishaike Mahajan: Actually, if a tumor is on one side, why would you care about what’s going on on the other side? The tumor is at the occipital cortex — why would you care about what’s going on in the frontal cortex?

Elise Jenkins: Because of these network effects in the brain. You have a crossover point in the corpus callosum where you have neurons — motor activity that might be happening on one side is actually projecting over.

Abhishaike Mahajan: Okay, they’re projecting on over.

Elise Jenkins: So you can imagine that if you have a very diffuse tumor that is making its way across the brain and actually going to project into the other hemisphere — if, long down the line, you had a device on the primary side of the resection with some electrodes or probes in that region, but you know that they are going to at some point migrate across, you can also put an electrode there and pick up the signals before they start moving across, and maybe start stimulating earlier on that side of the brain.

Abhishaike Mahajan: Wait, what do you mean by migrate across?

Elise Jenkins: It’s not uncommon for very diffuse tumors to move from one hemisphere across to the other hemisphere.

Abhishaike Mahajan: I did not know that. It’s terrifying.

Elise Jenkins: It’s really terrifying. And you can’t see this on MRI for diffuse tumors, because they don’t take up contrast. You can’t see them, which is a problem. But you can record them, and we know that we can record them. So if you were able to implant in multiple regions of the brain across hemispheres, you could start to actually record when that is happening. You can pick them up from long-range projection neurons as well. We could start recording that information and also start intervening at a much earlier time point.

Abhishaike Mahajan: Is it obvious how many SOMA-like devices you would want in a glioblastoma patient’s brain? Is there a max — like seven of them is enough to cover all the important spots?

Elise Jenkins: It would be entirely based on their MRI. In the pre-operative setting, you would take their MRI. The surgeon will know the extent of resection that they will likely be able to perform, and you’d pre-plan with software that essentially tells you: position the electrodes in this position to get this coverage. That would be how we would do that.
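The interview doesn't say how the planning software chooses electrode positions. One standard way to frame "position the electrodes to get this coverage" is maximum coverage, which a greedy algorithm approximates within a factor of 1 - 1/e of optimal. The sketch assumes hypothetical target points (say, sampled along a resection margin from the MRI), hypothetical candidate sites, and a fixed coverage radius. It ignores eloquent cortex, anatomy, and surgical access.

```python
import math
from typing import List, Set, Tuple

Point = Tuple[float, float, float]   # (x, y, z) in mm, hypothetical coordinates


def greedy_placement(targets: List[Point], candidates: List[Point],
                     radius_mm: float, max_sites: int) -> List[Point]:
    """Greedy max-coverage: repeatedly pick the candidate site covering the most
    still-uncovered target points within `radius_mm`."""
    uncovered: Set[int] = set(range(len(targets)))
    chosen: List[Point] = []
    for _ in range(max_sites):
        best, best_cov = None, set()
        for c in candidates:
            if c in chosen:
                continue
            cov = {i for i in uncovered if math.dist(c, targets[i]) <= radius_mm}
            if len(cov) > len(best_cov):
                best, best_cov = c, cov
        if best is None:          # no remaining site adds coverage
            break
        chosen.append(best)
        uncovered -= best_cov
    return chosen


# Two hypothetical clusters of margin points
targets = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0),
           (10.0, 0.0, 0.0), (11.0, 0.0, 0.0), (10.0, 1.0, 0.0)]
candidates = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0), (5.0, 0.0, 0.0), (20.0, 20.0, 20.0)]
chosen = greedy_placement(targets, candidates, radius_mm=2.0, max_sites=3)
print(chosen)
```

The greedy loop stops early once every target is covered, so `max_sites` acts as an upper bound on the number of devices rather than a fixed count.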

[00:55:37] How well does mouse-to-human translation work for neuromodulation?

Abhishaike Mahajan: Returning back to an earlier thread about all the mouse discussions we’ve been having — how big of a concern is translatability from a mouse platform to pig, to human? Is membrane depolarization a pretty well-conserved phenomenon across all life, or is it case by case?

Elise Jenkins: Particularly in neurons, it’s very well conserved from mice to pigs to humans. We started almost all the way in computation — in silico — then we went into in vivo models. In vivo models for cancer are mouse models.

Abhishaike Mahajan: What do in silico models look like for neuromodulation?

Elise Jenkins: You model the neuron using a Hodgkin-Huxley model. You can computationally, mathematically build that model. You can get that model to generate a specific spike rate. Those are quite well characterized depending on the region of the brain you’re in. Then you can start applying stimulus in silico — computationally — that helps you with selection of what kind of stimulation parameters you think might work best for the region of the brain that you’re in. Then you go into mouse models. These are the most relevant models we can use for oncology. We use orthotopic models, xenografted models. We take human cells, put them in the brain. It’s quite a hard model to do. Then we take a device that is already very difficult to make small for humans and we make it 10 times smaller and put it in a mouse brain. We do a number of tests over short durations to work out the optimal stimulation parameters and the effect of those. Then we go to large animals. We’ve done large animal studies. We’ve been able to show the same suppression effect in large animals in healthy brain. And the natural progression from that is to go into humans. We just got approval to do our first-in-human study to try the stimulation parameters in humans. So far the trajectory seems as good as we can possibly expect.
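As a reference point for the in silico step, here is a minimal sketch of the classic Hodgkin-Huxley squid-axon model (1952 parameters, modern -65 mV resting convention), integrated with forward Euler and driven by a constant current. This is the textbook model, not Coherence Neuro's. Their region-specific neuron parameters and stimulation waveforms aren't described, and a constant current is used here only to show how spike rate responds to input.

```python
import math


def vtrap(x: float, y: float) -> float:
    """x / (exp(x/y) - 1), with the removable singularity at x = 0 handled."""
    if abs(x / y) < 1e-6:
        return y * (1.0 - x / (2.0 * y))
    return x / (math.exp(x / y) - 1.0)


def rates(v: float):
    """Classic HH rate constants (1/ms) at membrane voltage v (mV)."""
    am = 0.1 * vtrap(-(v + 40.0), 10.0)
    bm = 4.0 * math.exp(-(v + 65.0) / 18.0)
    ah = 0.07 * math.exp(-(v + 65.0) / 20.0)
    bh = 1.0 / (1.0 + math.exp(-(v + 35.0) / 10.0))
    an = 0.01 * vtrap(-(v + 55.0), 10.0)
    bn = 0.125 * math.exp(-(v + 65.0) / 80.0)
    return am, bm, ah, bh, an, bn


def count_spikes(i_inj: float, t_ms: float = 200.0, dt: float = 0.01) -> int:
    """Forward-Euler HH simulation under constant current (uA/cm^2); returns spike count."""
    g_na, g_k, g_l = 120.0, 36.0, 0.3          # maximal conductances, mS/cm^2
    e_na, e_k, e_l = 50.0, -77.0, -54.387      # reversal potentials, mV
    c_m = 1.0                                   # membrane capacitance, uF/cm^2
    v = -65.0
    am, bm, ah, bh, an, bn = rates(v)
    m, h, n = am / (am + bm), ah / (ah + bh), an / (an + bn)   # gates at steady state
    spikes, above = 0, False
    for _ in range(int(t_ms / dt)):
        am, bm, ah, bh, an, bn = rates(v)
        i_ion = g_na * m**3 * h * (v - e_na) + g_k * n**4 * (v - e_k) + g_l * (v - e_l)
        v += dt * (i_inj - i_ion) / c_m
        m += dt * (am * (1.0 - m) - bm * m)
        h += dt * (ah * (1.0 - h) - bh * h)
        n += dt * (an * (1.0 - n) - bn * n)
        if v > 0.0 and not above:      # upward crossing of 0 mV counts as one spike
            spikes += 1
        above = v > 0.0
    return spikes


for i_inj in (0.0, 10.0, 20.0):
    print(f"I = {i_inj:>4} uA/cm^2 -> {count_spikes(i_inj)} spikes in 200 ms")
```

Fitting a model like this to a target spike rate and then adding an extracellular stimulus term is the general shape of the workflow Elise describes; the stimulus coupling itself is left out here.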

Abhishaike Mahajan: That’s exciting. Does that translate to a Phase 1 trial?

Ben Woodington: It’s our first-in-human safety work. It’s not a Phase 1, but we’ll be doing recording, mapping, and stimulation safety across the brains of patients.

Abhishaike Mahajan: And this is for glioblastoma patients?

Ben Woodington: It’s for glioblastoma. And that will lead into our next phase.

[00:58:09] Why didn’t this exist 10 years ago?

Abhishaike Mahajan: Exciting. Why does this not exist today? Why doesn’t every glioblastoma patient have this?

Ben Woodington: There has been a lot of innovation in neural implants over the last 20 to 25 years. Miniaturization of electronics, better powering methods, new electrode materials and lead materials. There has been a hell of a lot of innovation — things that didn’t exist in the early 2000s, frankly. On top of that, with Neuralink coming to the table in BCI, there’s obviously been a lot more focus on the use of these kinds of technologies across diseases. Many diseases. And a lot more cultural acceptance from clinical centers to adopt them. I think that’s one of the reasons.

Elise Jenkins: I also think that for us, the scientific underpinnings of these interactions are still very new. The discovery of bioelectricity is not new, as we said before, but the neural interactions that are happening and observed in cancers are really new. I think that in combination with the ability to miniaturize technology and get it implanted chronically and record and stimulate these environments for patients who have literally nothing else — those have been the limitations before now.

Ben Woodington: And to underline how new that is — we sometimes present to academic cancer groups or cancer neuroscience groups. We show our mouse setups that we’ve developed. And it’s a bit mind-blowing for them that you can do high-density neural recordings across the brain of a mouse over months — four, six months. These are technologies that haven’t quite existed in that way. They didn’t exist in that way 30 years ago. We are really at the early stages of that.

Abhishaike Mahajan: When you go to something like the AACR and present your results, it seems like such an interdisciplinary field. There probably can’t exist that many people in the world who really understand the intersection of cancer and neuromodulation, and whatever other fields you’re intersecting with. Do most people seem convinced today that there is something here, or are there still skeptics?

Ben Woodington: I think if there are not skeptics, you’re not working on the bleeding edge. You want people to not agree with everything you’re saying. Our interactions with clinicians and cancer biologists, I would say, usually go like this: “I’m not sure.” And then we show them data. We talk through it. We show them devices, we show them work. And then there’s a big buy-in — people are very excited. I think they see the same things that we see. By the way, I had the same interaction. I come from a neurotech background, a neuroengineering background. I was working in spinal cord and spinal cord injury for years. I had the same response when Elise showed me this about four years ago. It took me a while to digest the papers and read the research and then go, “Oh man, why is no one looking at this? There’s so much opportunity here. I would do this device and this device and this device.” And now we are having the same effect with clinicians who start saying, “Well, hang on — this is how I would design the device to do this.” There suddenly becomes quite a lot of buy-in. I think we’re just at that takeoff point right now. I think we’re going to see a lot more attention clinically, and probably some companies as well, take off. And we’re excited about that.

[01:01:48] The founding story

Abhishaike Mahajan: Similarly, when Elise showed you this — these results from three years ago?

Ben Woodington: Four years ago.

Abhishaike Mahajan: Do you think that was the only moment a company like Coherence could have been founded, or was it just right place, right time to discover this information and put all the pieces together and think there is an unmet need here that’s filled very cleanly by this device?

Elise Jenkins: I felt like I was very lucky because I was really interested in — actually, I was brought into the PhD to look at a drug delivery implant. I knew nothing about glioblastoma. I knew nothing about cancer in general. I’m an electrical engineer. I really wanted to understand the problem. When I started looking at this problem, it’s horrific. I started looking at the potential of a drug delivery platform — an implantable drug delivery device — what is it going to offer here? These cancers don’t respond to these drugs. Maybe you can repurpose drugs that can’t cross the blood-brain barrier — that’s one advantage — but they still just manage to evade these drugs and kill patients. Then I heard about Novocure’s work. I was like, absolute bullshit. No way this works. This doesn’t make sense. I built a platform to try and replicate the work. There’s a whole history to that. I was like, I’m going to figure out what’s going on here. I’m very curious. And I could not disprove it. I tried, and I kept seeing what they were showing in their data. These cells would halt when you would deliver this type of electrical stimulus. My PI, George Malliaras, at the time — we were talking about, well, what happens? There’s an infinite parameter sweep that you can do here that looks at uncovering how cancer cells behave when you put them under certain electrical stimulation parameters. And that was around when the work from Michelle Monje’s lab came out. I think it was 2019. One of my other advisors had pointed me to this work. I was quite into the membrane potential. I was like, maybe that’s what tumor treating fields are doing — they’re modulating calcium ions or calcium modulators in the cell.

Ben Woodington: Levin is right. Novocure just don’t know it.

Elise Jenkins: I was convinced that that was what was going on with Novocure. At that time —

Abhishaike Mahajan: The mitotic spindle theory was not proven out?

Elise Jenkins: It’s definitely been hypothesized.

Abhishaike Mahajan: Even today, it’s not known for sure?

Elise Jenkins: There’s been a lot of evidence that suggests that’s what’s going on. Yes. But from an engineering perspective, it was not making a lot of sense to me. You’ve got a very weak force acting on a very strong force happening inside this protected barrier in a cell. I was struggling to fully comprehend it, but it worked. And under certain directionality — actually, it was a piece of work that we worked on together — it does work. If you can control the direction of an electric field, you have a really profound effect on tumor treating fields. That was what we found. But Michelle’s work came out and that was my holy shit moment. I was working in a lab full of amazing people doing neurotechnology — making wearables, making implants, making spinal cord stimulators, everything you can think of that interacts with the body. Our lab was building it. And I was this weird person doing cancer in the group. This paper came out from Michelle’s group. I invited her to give a talk to our group — totally fangirling. I love her work. I was like, there is such an opportunity here. Initially it was actually more on the recording side. I was like, these neurons are interacting with these cells. You can read this. We’ve always struggled to get single-cell resolution, to reconstruct that, because it’s very noisy in the brain. You’re getting all of the neurons telling us most of the information that’s going on. The cancer cell signals maybe are a lot weaker or at a much lower frequency. If we can listen to the neurons, that’s amazing. That was when I went to Ben and said, let’s use your device, let’s put it in here, let’s listen to what’s going on. And Novocure works, so we’ll stimulate using that. And now it’s like, well, there’s way more opportunities that we can do now, because look at all of these interactions that are happening, and all of them are a function of neuromodulation or something that we can modulate with neuromodulation.

Abhishaike Mahajan: At the time, you were working on novel devices for measuring and stimulating?

Ben Woodington: For spinal cord injury — brain interfaces and spinal cord interfaces.

[01:06:38] Why build your own device instead of using off-the-shelf arrays?

Abhishaike Mahajan: Super cool story. This leads well into a question I had that we chatted about previously — why build your own device for this? Why isn’t there some standard like Utah arrays that you can hijack and use? You don’t have to build your own thing. It doesn’t seem like anyone does that — everyone hand-rolls their own thing for their own purposes. Why is that a practice in this field?

Ben Woodington: It’s a good question. There are white-label device manufacturers where you can take a device and stick it into a neuromodulation indication, stimulate some nerves, and do your thing. But there are indications where it really does make sense to create your own device. You need a certain density of electrodes. You need to be compatible with the clinical workflow — for our case, MRI. We can’t just take any of those off-the-shelf devices. You can’t just stick a Neuralink device in a cancer patient because they’re going to have to have an MRI, and the magnet in that device is going to affect the MRI.

Abhishaike Mahajan: And there’s no off-the-shelf device that’s also MRI transparent?

Elise Jenkins: Definitely not transparent. It’s really hard to build.

Ben Woodington: So it makes sense for us to design purpose-built devices for the treatment of these diseases rather than taking off-the-shelf devices. That’s not an easy lift. It takes a lot of engineering effort. And we have a very excited but exhausted engineering team who are doing this. But it’s necessary for us.

Abhishaike Mahajan: Returning back to the original story of Coherence — you showed Ben your work, you decided to form Coherence four years ago. What were the initial set of milestones you had set up to prove whether this is a real thing that could be scaled up into a company?

[01:08:35] Speaking with glioblastoma patients

Ben Woodington: For us it’s — do clinicians and patients want this? It’s very easy for engineers and scientists to start creating things that they like, that are passion projects, without actually speaking to the end users. We see this all the time. We went straight out and started speaking to clinicians, and the pull is huge. I don’t think we’ve spoken to a single clinician that has said they wouldn’t use that. Every single one is like, tell me when I can run a trial with this. I want to run a trial with this sort of technology. Then we went out and started speaking to patients. I’ve become friends with a number of glioblastoma patients. I just hang out with them and drink coffee with them and watch them interact with the technologies that they’re using. And I would say it’s pretty universal — this disease sucks and my options suck. I’m using this piece of technology and it’s horrible and I don’t like it. And if you tell me right now that there’s something better, I will go back and get another surgery. That’s a big barrier. People do not like going in for surgery. So getting that clinical and patient pull was huge. Now how do we transform that into tangible milestones? We build the technology that they need. We’ve been doing that now for almost three years, running safety animal studies. As Elise mentioned, we’re then doing a first-in-human safety study. The next piece is — what does our early feasibility look like? Get it in patients. First 10, then a hundred, then maybe 500. And show that there is a meaningful clinical, therapeutic, and diagnostic benefit to these patients.

Abhishaike Mahajan: I’m curious — these glioblastoma patients that you’re friends with — one device they interact with is probably Optune, and I’ve heard it kind of sucks because you have this constantly heated device near your head for upwards of 18 hours a day. What other technologies do they potentially use to help with their disease?

Ben Woodington: Not a lot. There are some things on the horizon that people have started experimenting with, that people have been on in trials — looking at ultrasound-type devices, blood-brain barrier disruption-type devices, convection-enhanced delivery devices. They’re not great.

Abhishaike Mahajan: No silver bullet.

Ben Woodington: It’s also just the quality of life and the patient impact. I don’t want to sit here and talk negatively about Novocure. I’m really happy that company exists. I’m happy that technology was created for those patients who are in desperate, dire need. The engineers, the scientists, the people that run Novocure — kudos to them for bringing a novel technology to those patients who desperately need it. And of course those patients are using it because it is extending their lives. But there’s so much more you can do to enhance the quality of life for those patients who don’t want to spend the rest of their lives traveling with companions to align stickers on their head and being affected by skin rashes and the pain associated with all of that. There is much more we can do for those patients.

[01:12:04] What was it like to raise money for this?

Abhishaike Mahajan: Back to the creation of Coherence — you talk to the providers, you see there’s demand. You talk to the patients, you see there’s demand. Now it’s time to raise money. Coherence feels like it’s in this weird place where the thesis is so strange that I can’t imagine many investors, off the top of my head, who did their PhD in this area and instinctively understand what you’re talking about. How difficult was it to raise money for a thesis like this?

Ben Woodington: They’ll come around. They’ll see what we all see and they’ll realize how large the pan-cancer opportunity is. How hard was it? Both hard and easy.

Abhishaike Mahajan: What was your seed?

Ben Woodington: We did a pre-seed at the end of 2022, early 2023, which was about $2.5 million. And then we’ve just very recently closed our seed round, which was another $10 million. The investors that we’ve brought in follow the same trend that scientists, patients, and doctors all have with us. They’re cautiously skeptical at the beginning — hang on a second, does this work? — and then go in a very big way, get very interested, obsessed, both on the therapeutic opportunities but also on these data creation opportunities. We’re living in a world now where there are a lot of AI bio companies out there, and they desperately need data — novel datasets that are showing progression of disease and novel insights from human biology. That’s exciting to a lot of our investors as well. A lot of people are excited by BCI, but they’re all looking for what’s going to be the killer application. When people get that impression of us, they go all in.

[01:13:56] Beyond cancer: TBI, lung disease, and the pan-disease argument

Abhishaike Mahajan: Speaking of TAM expansion, one market is pan-cancer. But I imagine there is a very reasonable logical leap you can make that membrane potential is probably important for a lot of diseases. Is that true? Could you make a reasonable argument that you don’t really need to do these five-year-long Alzheimer’s disease progression readouts — you can get a decent proxy from electrical readouts? Is that at all an argument people are trying to make?

Ben Woodington: It’s an argument that people are trying to make. It’s not something that we’ve done inside the company. But people are exploring electrical and other physical stimulation modalities in Alzheimer’s. People are looking at recording readouts for Alzheimer’s, Parkinson’s, other neurodegenerative diseases. And then of course there’s a whole host of neurological disorders that people are looking at — both electrical readouts and electrical stimulation — and other systemic diseases like diseases of the immune system and other things as well.

Elise Jenkins: The nervous system is involved in everything. I feel like it would not be a surprise to me that these types of interactions — you can pick up a whole host of things going wrong just by looking at nerves or neurons.

Abhishaike Mahajan: Outside of cancer, is there convincing evidence for any other indication? If someone gave you a few million dollars to throw at another indication on top of what you guys are already doing, what would be the next thing?

Ben Woodington: There’s a company that just launched that’s looking at targeting the nervous system for treatment of asthma and COPD. Before I did my PhD, I was actually working in lung diseases, drug delivery for lung diseases. I actually think that’s a pretty big opportunity. Chronic diseases — many patients are not managed particularly well. A lot of hospitalizations. There is evidence that you can stimulate certain nerves to relax the lungs, to bronchodilate. And closed-loop opportunities as well, predicting when someone is about to exacerbate. I think there’s a big opportunity there.

Elise Jenkins: Chronic stress or TBI. TBI is interesting.

Abhishaike Mahajan: Why TBI? Actually... I could fabricate an intuition for myself. I’d prefer you guys give one to me.

Elise Jenkins: I think traumatic brain injury is really interesting and has similar attributes to what you can leverage from glioblastoma. A lot of the time when someone has a traumatic brain injury, they’re already going in to put something into the brain — usually a shunt or something. There’s an obvious access point, which I always think is the biggest barrier to entry right now. Until this becomes more mainstream, that’s the biggest barrier. So that’s an obvious one — they’re going in and doing something already. And biomarkers.

And you can do similar types of strategies to suppress activity there. I’m also really interested in the data side of all of this — how diseases, degeneration, whatever else evolves. I think what would be really interesting with TBI is looking at how you can watch a brain go back to its normal, healthy state — what type of biomarkers give us that indication that something is going right, and how do you steer that. That’s really interesting in TBI. Chronic stress is interesting because it’s regulated by adrenergic signaling. You can just target the vagus nerve or something else. I think that’d be quite cool.

[01:17:40] Hiring at Coherence + what is the hardest type of talent to find

Abhishaike Mahajan: That makes sense. One thing I’ve been curious about — I think I interviewed Hunter Davis a few months ago, the Until Labs cryopreservation guy. His company shares some similarity with yours in the sense of being wildly interdisciplinary in a way that very few other companies in the world are. He had some interesting thoughts about how hiring works in companies like that. I’d like to get both of your philosophies on what makes for people you want to join Coherence.

Elise Jenkins: I think probably curiosity and taking lessons from other industries. Some of our engineers come from the robotics industry. Some come from the med device industry. Some are scientists from completely outside of cancer neuroscience. And then we also have cancer biologists who really know cancer, but also know immunology and also know neuroscience. We look for people who are experts in their domain but also have demonstrated interdisciplinary overlap with multiple things. Robotics is a really nice example of that — you have mechanics, electronics, spatial interactions, and those types of things you have to consider in your design. The scientists are some of the most fun to find, because a lot of them are coming from the neuro background — that’s the kind of talent we seem to attract. But when you introduce them to this concept of these interactions that happen in cancer, people’s minds massively expand. Watching that process — when you start going through that in the hiring, or when you bring them on board, and how quickly they go from never hearing about it ever before to being so bought in, building and designing these crazy experiments to try and uncover some new neural biomarker — that’s been really cool to watch. Especially when you have this crazy idea many years ago that no one’s ever heard of, and you’ve got all these people that are super pumped about that discovery and want to build something that interfaces with that discovery. That’s been really cool. Mostly I’m looking for interdisciplinary. Yes.

Ben Woodington: Code-switching across disciplines is super important. It’s the same as a lot of deeply technical companies — it’s about your ramp of being able to learn. How steep is that? Because we have electrical engineers that need to come in and learn biology really fast. We have computational neuroscientists that come in and need to learn how to run what would be adjacent to clinical studies really fast. BCI and neurotech is a field that covers so many touch points — electrical engineering, neurobiology, to the sort of stuff that you do. It’s hard to find people that are willing to spread themselves across that many fields.

Abhishaike Mahajan: What do you think is the rarest skillset to find and/or to teach?

Ben Woodington: We know this because it’s the person we’re always trying to hire. Very good electrical engineers and embedded systems engineers. They’re hard to find.

Abhishaike Mahajan: Is it that there aren’t many hardware people?

Ben Woodington: I think a lot of the electrical engineers that have come out of Stanford or wherever get attracted by the tech industry. They’re often good programmers. So they go to Google or Meta or wherever. We need them when they’re at least a few years into their career, with a few projects behind them, a few product cycles, if we’re lucky. And most of them have gone into tech. Bringing them back into hardware is tricky. I think we’re seeing a shift now. Hardware is kind of hot again. Maybe in a year or two, there’ll be a bit of lag and then we’ll see more hardware people that we can bring into the fold. But it’s always the positions that we’re fighting most for.

Abhishaike Mahajan: I think some of the most talented people who have joined the companies I’ve been a part of have been ex-engineers at places like Cerebras, Uber, or the big SaaS companies. What is the big company in your field that you wish you could just pull all the engineers from to come work for you? Is there one, like Neuralink?

Elise Jenkins: I think Neuralink could be a good one, given that they’ve just taken strides in being the first ones to take both a high-density BCI and a robot into trial in a really short period of time. There’s a lot of things that those people would have learned along the way that could definitely be leveraged at a company like ours. I think there’s a challenge when you’re trying to do something really new, but it’s also a regulated technology. There’s this balance of being able to bring in people who really know how to build medical devices that are not scared of things that are new. That’s a really hard balance to find. There’s no company, maybe apart from Neuralink, where that exists.

[01:23:17] What would you do with $100M equity-free?

Abhishaike Mahajan: The last question I have — if you were given a hundred million dollars, equity-free, to push this work forward as fast as possible, but you had to spend it within the next year, what would you spend it on?

Ben Woodington: Can I give one and a half answers?

Abhishaike Mahajan: You can have as many answers as you want.

Ben Woodington: This technology exists. The technology that we’re building fundamentally — there are no more science challenges. This is an engineering optimization piece now. Being able to get those technologies into as many human beings, as many cancers as possible — we could build such insane datasets. We could build such incredible real-time, real-world datasets that would blow a lot of people’s minds for what you can access from that data. You just can’t run that many trials all at once if you don’t have a hundred million equity-free cash. If you’re offering, I will take it. The other super exciting thing would be — fab floor, engineering integration floor, clinical scientists, clinic — all in one building. Everything in house.

Abhishaike Mahajan: Including a clinic?

Ben Woodington: A neuro-oncology clinic. That would be insane. I think you could do that for just about a hundred million if you did it maybe not in America. That would be incredible. Being able to highly iterate — build devices, build them in your own clean room, validate them, get them in patients really fast and start running studies, collecting data, and becoming that hub of those studies.

Abhishaike Mahajan: Why does it matter? Why do you care about having a clinical oncology suite inside the building?

Ben Woodington: So you have some control over the functions, the implants of the device, the same surgeons, quick readouts connected to your teams. When our preclinical and engineering teams are working in unison, it’s humming. You’re getting data out that the engineers and the computational scientists are analyzing overnight, feeding back into the next day’s experiments. That’s not really possible in clinical studies. There’s this barrier between you and the hospital, where you’re waiting for data, then you have to wait, then you have to submit new ethics to run a new study. Being able to turn that wheel super fast would be pretty exciting.

Abhishaike Mahajan: This is leading into a lot more questions, but I am just now realizing I never actually asked — is there an experimental loop that goes on? In rodents, at Coherence — where you design one version of the device, implant it, see how well it works?

Ben Woodington: Constant iteration. Both on our preclinical devices — where we’re recording data from these animals, running new stimulation regimes — and on the primary product development pathway as well. Both of those have tight iterative loops.

Abhishaike Mahajan: You exist amongst many other neurotech companies, and you’re probably the most alien amongst them. Do you pay attention to most of the neurotech research that’s going on outside of your immediate field, or is it not super applicable to what you are doing?

Ben Woodington: Firstly, I take it as a great compliment to be called the most alien neurotechnology company. That’s good. Secondly, both of us are having conversations almost every day about what’s going on in the field. It’s entirely relevant, both from a technology landscaping exercise and from a cultural landscaping exercise — which indications are getting more heat in the use of neurotechnology, where are people most excited, what are the innovations convincing more clinicians and patients to adopt these technologies. We need to be abreast of all of this, because there are some similarities with the technology stack and how it’s introduced to the patient as well.

[01:27:15] Are you a neurotech company or a cancer company?

Abhishaike Mahajan: Do you think you’re a neurotech company with ambitions to attack cancer, or a cancer company with ambitions to use neurotech?

Ben Woodington: I personally am a neurotechnologist that wants to develop technologies that can help a lot of people. And oncology seemed like the fastest and highest-impact route to get there. If I can speak on behalf of Elise — and maybe she’ll say I’m wrong — I think Elise comes more from an “oncology matters, and I’m going to use whatever tool I can to help these patients, and this makes sense” perspective. Is that an accurate read?

Elise Jenkins: I think so. I don’t know why the two have to be separate. There are so many debilitating conditions and diseases that need attention. There are two major diseases in the world that are causing death or suffering for a lot of people — cardiovascular disease and cancer. I feel like we can have a really big impact here by leveraging technology that is well-established in other indications that could have huge potential in cancer. I fit in either camp. I want to develop technology that will benefit people.

Ben Woodington: That’s fair. I’m more neurotechnology-pilled. Neurotechnology is crazy. Why are we not using it in all these indications? It’s amazing. It’s going to change everything — from the extreme cases that some of the neurotechnology and BCI companies are making to just day-to-day medicine. I just think that cancer is an extremely promising way to get there and to scale these technologies into a lot of people.

Abhishaike Mahajan: Do you suspect that the full landscape of possible perturbations is pretty limited and you’ve discovered most of them, or you may actually expand that over time?

Elise Jenkins: Initially it’s looking at well-established regimes. If you were to take what Setpoint or Galvani were doing in vagus nerve stimulation for rheumatoid arthritis — they’re targeting immune response there, with well-established parameter sets that are published in literature and have been done in humans. Those are the safer bets that you’d want to try in a novel indication. We are starting with those types of things, with some variations that depend on the nerve that you’re targeting, whether you want to increase immune function or immune activity or decrease stress. They’re very different types of stimuli that you’d apply, but they are well-established.

Ben Woodington: It’s actually a real problem in clinical programming generally — not in our field, but in other fields, for example, pain. The clinical programming profession hasn’t caught up with the engineers. You’ve got more and more complex devices. You’ve now got hundreds, in some cases, of electrodes with thousands of different potential waveform characteristics that you could apply to each electrode. Which gives you this multi-billion parameter operational space. And then you’ve got a clinical programming nurse sitting there saying, where do I even start on this? There’s this massive space now that I have to operate in to try and treat the pain of this person. It’s a job that probably will end up being done by some AI model down the road, using some sort of Bayesian optimization — not a nurse going, “does it feel better or worse by doing this?”
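Ben’s “multi-billion” figure is easy to reproduce with illustrative numbers. The sketch below assumes a hypothetical array of 100 electrodes, where each electrode can take one of 1,000 waveform settings. Neither number comes from a real device, and `config_counts` is a hypothetical helper. It counts configurations under three simplified scenarios: one active electrode, an unordered pair of active electrodes with independent settings, and every electrode set independently. The last count is reported as a digit count because the number itself is too large to print usefully.

```python
from math import comb


def config_counts(n_electrodes: int, n_waveforms: int) -> dict:
    """Count stimulation configurations under three simplified, hypothetical scenarios."""
    return {
        # Stimulate one electrode at a time with one waveform setting.
        "single_electrode": n_electrodes * n_waveforms,
        # Stimulate an unordered pair of electrodes, each with its own waveform setting.
        "electrode_pairs": comb(n_electrodes, 2) * n_waveforms**2,
        # Every electrode independently takes a setting: report the count's decimal digits.
        "all_electrodes_digits": len(str(n_waveforms**n_electrodes)),
    }


counts = config_counts(n_electrodes=100, n_waveforms=1_000)
for scenario, value in counts.items():
    print(f"{scenario}: {value:,}")
```

Under these assumptions the pair scenario alone comes to 4,950,000,000 configurations. At one configuration per minute, trying them all by hand would take roughly 9,400 years. That gap is why an adaptive search, such as the Bayesian optimization Ben mentions, fits this job better than exhaustive trial and error.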

Abhishaike Mahajan: Is that currently how it’s done?

Ben Woodington: It’s currently how it’s done. You would be quite surprised how much human-in-the-loop there is in electrophysiological medicine, where you’ve got people watching a screen saying, “I think they’re going to have a seizure soon.” And someone else going, “well, better stimulate their brain to stop that happening.” And there’s no model, no computer really in the loop giving early indication.

Abhishaike Mahajan: You mentioned a bit about what gives you anxiety, Ben. I’m curious what gives you anxiety, Elise, or if you’d like to add to your answer.

Elise Jenkins: I think for me it’s maybe a combination of anxiety and frustration. You want to move as quickly as humanly possible. The impact that we need has to happen in humans. You need to get to humans as quickly as possible, but you don’t have all the answers in that design process. That is frustrating and can be anxiety-inducing. You’re having to make some assumptions about what might happen in certain scenarios, or how to design this implant, and it has to be safe, of course. That iteration — you just want to get to humans as quickly as possible, but you have all of these things that you need to consider. That’s frustrating, drives me a little insane.

Ben Woodington: I totally agree with that. You work in the cancer field yourself, correct? We’re not in the ads business. We’re not interested in pumping out a few extra targeted ads to people. We’re in the game of actual human beings who are dying quickly. And we’re trying to get technologies that can help those patients as quick as possible. That is frustrating. That is anxiety-inducing, especially when you maintain a close connection with those patients and you see those patients dying. And then you’re screaming at people in the office to move quicker because you’re very connected to that.

Abhishaike Mahajan: I don’t think I have any other questions. This has been an amazing conversation. Thank you so much, Elise and Ben, for coming on.

Ben Woodington: Thank you so much. It’s been a pleasure. Had fun.

Heuristics for lab robotics, and where its future may go

Abhishaike Mahajan — Mon, 09 Feb 2026 12:42:22 GMT

Note: this article required conversations with a lot of people. A (hopefully) exhaustive, randomized list of everyone whose thoughts contributed to the article: Lachlan Munroe (Head of Automation at DTU Biosustain), Max Hodak (CEO of Science, former founder of Transcriptic), D.J. Kleinbaum (CEO of Emerald Cloud Labs), Keoni Gandall (former founder of Trilobio), Cristian Ponce (CEO of Tetsuwan Scientific), Brontë Kolar (CEO of Zeon Systems), Jason Kelly (CEO of Ginkgo Bioworks), Jun Axup Penman (COO of E11 Bio), Nish Bhat (current VC, ex-Color cofounder), Amulya Garimella (MIT PhD student), Shelby Newsad (VC at Compound), Michelle Lee (CEO of Medra), Charles Yang (Fellow at Renaissance Philanthropy), Chase Armer (Columbia PhD student), Ben Ray (current founder, ex-Retro Biosciences automation engineer), and Jake Feala (startup creation at Flagship Pioneering).

Introduction

I have never worked in a wet lab. The closest I’ve come to it was during my first semester of undergrad, when I spent 4 months in a neurostimulation group. Every morning at 9AM, I would wake up, walk to the lab, and jam a wire into a surgically implanted port on a rat’s brain, which was connected to a ring of metal wrapped around its vagus nerve, and deposit it into a Skinner box, where the creature was forced to discriminate between a dozen different sounds for several hours while the aforementioned nerve was zapped. This, allegedly, was not painful to the rat, but they did not seem pleased with their situation. My tenure at the lab officially ended when an unusually squirmy rat ripped the whole port system out of its skull while I was trying to plug it in.

Despite how horrible the experience was, I cannot in good conscience equate it to True wet lab work, since my experience taught me none of the lingo regularly employed on the r/labrats subreddit.

I mention my lack of background context only because it has had an unfortunate consequence for my ability to understand the broader field of lab automation: it is incredibly easy for me to get taken for a ride.

This is not true for many other areas of biology. I have, by now, built some of the mental scaffolding necessary for me to reject the more grandiose claims spouted by people in neurotechnology, toxicology prediction, small molecule benchmarks, and more. But lab robotics eludes me, because to understand lab robotics, you need to understand what actually happens in a lab—the literal physical movements and the way the instruments are handled and how materials are stored and everything else—and I do not actually understand what happens in a lab.

Without this embodied knowledge, I am essentially a rube at a county fair, dazzled by any carnival barker who promises me that their magic box can do everything and anything. People show me robots whirling around, and immediately my eyes fill up with light, my mouth agape. To my credit, I recognize that I am a rube. So, despite how impressive it all looks, I have shied away from offering my own opinion on it.

This essay is my attempt to fix this, and to explain the heuristics I have gained from talking to many people in this space. It isn’t comprehensive! But it does cover at least some of the dominant strains of thought I see circulating among the domain experts of the world.

Heuristics for lab robotics

There are box robots, and there are arm robots

This is going to be an obvious section, but there is some groundwork I’d like to lay for myself to refer back to throughout the rest of this essay. You can safely skip this part if you are already vaguely familiar with lab automation as a field.

In the world of automation, there exist boxes. Boxes have been around for a very, very long time and could be considered ‘mature technology’. Our ancient ancestors relied on them heavily, and they have become a staple of many, many labs.

For one example of a box, consider a ‘liquid handler’. The purpose of a liquid handler is to move liquid from one place to another. It is meant to take 2 microliters from this tube and put it in that well, and then to take 50 microliters from these 96 wells and distribute them across those 384 wells, and to do this fourteen thousand times perfectly, which is something humans eventually get bored of doing manually. It must be programmed for each of these tasks, which is a bit of a pain, but once the script is written, it can run forever and (mostly) flawlessly.

Here is an image of a liquid handler you may find in a few labs, a $40,000-$100,000 machine colloquially referred to as a ‘Hamilton’.

Why do this at all? Liquids are awfully important in biology. Consider a simple drug screening experiment: you have a library of 10,000 compounds, and you want to know which ones kill cancer cells. Each compound needs to be added to a well containing cells at multiple concentrations, let’s say eight concentrations per compound, to generate a dose-response curve. That’s 80,000 wells. Each well needs to receive a precise volume of compound solution, somewhere between 1 and 8 microliters, then incubate for 48 hours, then receive 10 microliters of a viability reagent (something to measure if a cell is alive or dead), then incubate for another 4 hours, then get read by a plate reader. If you pipette 11 microliters of reagent instead of 10 into well number 47,832, your dose-response curve for that compound is wrong, and you might advance a false positive or, even worse, miss a drug candidate.

Difficult! Hence why automation may be useful here.
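To make the scale of that screen concrete, here is a minimal Python sketch that lays the 10,000 × 8 design out onto 384-well plates. Only the compound and concentration counts come from the example above. The 10 µM top concentration, the 3-fold dilution step, and the row-by-row fill with no control wells are my own illustrative assumptions, and real screens reserve wells for controls.

```python
import math

ROWS = "ABCDEFGHIJKLMNOP"            # a 384-well plate has 16 rows (A-P)...
COLS = 24                            # ...and 24 columns
WELLS_PER_PLATE = len(ROWS) * COLS   # 384

def well_name(index: int) -> str:
    """Convert a 0-based well index into a name like 'A1', filling the plate row by row."""
    row, col = divmod(index, COLS)
    return f"{ROWS[row]}{col + 1}"

def dose_series(top_um: float, fold: float, points: int) -> list[float]:
    """Concentrations (uM) of a serial dilution, highest first (assumed design)."""
    return [top_um / fold**i for i in range(points)]

def layout(n_compounds: int, points: int):
    """Yield (compound, dose_index, plate_number, well_name) for every well in the screen."""
    for flat in range(n_compounds * points):
        compound, dose_index = divmod(flat, points)
        plate, well = divmod(flat, WELLS_PER_PLATE)
        yield compound, dose_index, plate + 1, well_name(well)

n_wells = 10_000 * 8
n_plates = math.ceil(n_wells / WELLS_PER_PLATE)
print(n_wells, "wells across", n_plates, "plates")
print([round(c, 3) for c in dose_series(10.0, 3.0, 8)])
```

Every one of those wells needs its own aspirate and dispense, which is the whole case for handing the job to a machine.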

Many other types of boxes exist. Autostainers for immunohistochemistry, which take tissue sections and run them through precisely timed washes and antibody incubations that would otherwise require a grad student to stand at a bench for six hours. Plate readers, often used within liquid handlers, measure absorbance or fluorescence or luminescence across hundreds of wells. And so on.

Boxes, which can contain boxes within themselves, represent a clean slice of a lab workflow, a cross-section of something that could be parameterized—that is, the explicit definition of the space of acceptable inputs, the steps, the tolerances, and the failure modes of a particular wet-lab task. Around this careful delineation, a box was constructed, and only this explicit parameterization may run within the box. And many companies create boxes! There are Hamiltons, created by a company called Hamilton, but there are boxes made by Beckman Coulter, Tecan, Agilent, Thermo Fisher, Opentrons, and likely many others; which is all to say, the box ecosystem is mature, consolidated, and deeply boring.

But for all the hours saved by boxes, there is a problem with them. And it is the unfortunate fact that they, ultimately, are closed off from the rest of the universe. A liquid handler does not know that an incubator exists, a plate reader has no concept of where the plates it reads come from. Each box is an island, a blind idiot, entirely unaware of its immediate surroundings.

This is all well and good, but much as Baumol’s cost disease says that the productivity of a string quartet is bottlenecked by the parts that cannot be sped up (you cannot play a Beethoven string quartet any faster than Beethoven intended, no matter how efficient your ticketing system becomes), the productivity of an ‘automated lab’ is bottlenecked by the parts that remain manual. A Hamilton can pipette at superhuman speed, but if a grad student still has to walk the plate from the Hamilton to the incubator, the lab’s throughput is limited by how fast the grad student can walk. An actual experiment is not a single box, but a sequence of boxes, and someone or something must move material between them.
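The bottleneck argument fits in a few lines. If plates flow through a sequence of stations one after another, steady-state throughput is capped by the slowest station. This sketch simplifies: it assumes each station works on a different plate at the same time and it ignores queueing. The station names and plates-per-hour figures are invented purely for illustration.

```python
def line_throughput(stage_rates: dict[str, float]) -> tuple[str, float]:
    """Return (bottleneck_stage, plates_per_hour) for stations run in sequence.

    Assumes the stations operate in parallel on different plates, so the
    slowest station sets the steady-state rate of the whole line.
    """
    bottleneck = min(stage_rates, key=stage_rates.get)
    return bottleneck, stage_rates[bottleneck]

# Hypothetical rates, in plates per hour.
lab = {
    "liquid handler": 30.0,
    "grad student walking plates": 6.0,
    "plate reader": 40.0,
}
print(line_throughput(lab))

# A liquid handler ten times faster leaves the line exactly as slow as before.
upgraded = {**lab, "liquid handler": 300.0}
print(line_throughput(upgraded))
```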

Now, you could add extra parts to the box, expanding it until it is the size of a small building, but entering Rube Goldberg territory has its own problem: you have created a new system whose failure modes are the combinatorial explosion of every individual box’s failure modes.
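Here is a rough way to see why bolting more boxes together gets fragile. Suppose a run only succeeds when every box succeeds, and (a big assumption) failures are independent. Then the chance of a clean run is the product of the per-box success rates, while the number of distinct combinations of boxes that could fail grows as 2^n - 1. The 99% per-box success rate below is hypothetical.

```python
import math

def clean_run_probability(box_success: list[float]) -> float:
    """P(every box succeeds), assuming independent failures."""
    return math.prod(box_success)

def failure_combinations(n_boxes: int) -> int:
    """Distinct non-empty sets of boxes that could fail in the same run."""
    return 2**n_boxes - 1

for n in (1, 5, 10, 20):
    p = clean_run_probability([0.99] * n)  # hypothetical: each box succeeds 99% of the time
    print(f"{n:>2} boxes: clean run {p:.1%}, failure combinations {failure_combinations(n):,}")
```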

A brilliant idea may occur to you: could we connect the boxes? This way, each box remains at least somewhat independent. How could the connection occur? Perhaps link them together with some kind of robotic intermediary—a mechanical grad student—that shuttles plates from one island to the next, opening doors and loading decks and doing all the mindless physical labor? And you know, if you really think about it, the whole grad student is not needed. Their torso and legs and head are all extraneous to the task at hand. Perhaps all we need are their arms, severed cleanly at the shoulder, mounted on a rail, and programmed to do the repetitive physical tasks that constitute the majority of logistical lab work.

And with this, we have independently invented the ‘arm’ line of lab robotics research. This has its own terminology: when you connect multiple boxes together with arm(s) and some scheduling software, the resulting system is often called a “workcell.”

As it turns out, while only one field benefits from stuff like liquid handlers existing (the life sciences), a great many other fields also have a need for arms. So, while the onus has been on our field to develop boxes, arms benefit from the combined R&D efforts of automotive manufacturing, warehouse logistics, semiconductor fabs, food processing, and any other industry where the task is “pick up thing, move thing, put down thing.” This is good news! It means the underlying hardware (the motors, the joints, the control systems) is being refined at a scale and pace that the life sciences alone could never justify.

Let’s consider one arm that is used fairly often in the lab automation space: the UR5, made by a company called Universal Robots. It has six degrees of freedom, a reach of about 850 millimeters, a payload capacity of five kilograms, and costs somewhere in the range of $25,000 to $35,000. Here is a picture of one:

Upon giving this arm the grippers necessary to hold a pipette, to pick up a plate, and to click buttons, as well as the ability to switch between them, your mind may go wild with imagination.

What could such a machine do?

Arms within boxes? Wheels to the platform that the robot is mounted upon, allowing it to work with multiple boxes at once? So much is possible! You could have it roll up to an incubator, open the door, retrieve a plate, wheel over to the liquid handler, load it, wait for the protocol to finish, unload it, wheel over to the plate reader, and so on, all night long, while you sleep and dream. This is the future, made manifest.

Well, maybe. If this were all true, why are there humans in a lab at all? Why haven’t we outsourced everything to these cute robotic arms and a bunch of boxes?

Most lab protocols can be automated, they just often aren’t worth automating

If you ask LLMs about lab robotics, you will find that they are pretty pessimistic about the whole business, mostly because of how annoying the underlying machines are to use. I believed them! Especially because it does match up with what I’ve seen. For example, there is a somewhat funny phenomenon that has repeated across the labs of the heavily funded biology startups I’ve visited: they have some immense liquid handler box lying around, I express amazement at how cool those things are, and my tour guide shrugs and says nobody really uses that thing.

But as was the case in an earlier essay I wrote about why pathologists are loath to use digital pathology software, the truth is a bit complicated.

First, I will explain, over the course of a very large paragraph, what it means to work with a liquid handler. You can skip it if you already understand it.

First, you must define your protocol. This involves specifying every single operation you want the machine to perform: aspirate 5 microliters from position A1, move to position B1, dispense, return for the next tip. If you are using Hamilton’s Venus software, you pipette from seq_source to seq_destination, and you must do something akin to this for every container in your system. Second, you must define your liquid classes. A liquid class is a set of parameters that tells the robot how to physically handle a particular liquid: the aspiration speed, the dispense speed, the delay after aspiration, the blow-out volume, the retract speed, and a dozen other settings that must be tuned to the specific rheological properties of whatever you’re pipetting. Water is easy, glycerol is apparently really hard, and you will discover where your specific liquid lies on this spectrum through a testing process that is almost entirely trial and error. Third, and finally, you must deal with the actual physical setup. The deck layout must be defined precisely. Every plate, reservoir, and tip rack must be assigned to a specific position, and those positions must match reality. The dimensions of the wells, the height of the rim, and the well volume must all be accurately specified in the software. If you’re using plates from a different supplier than the one in the default library, you may need to create custom labware definitions.

And at any point, the machine may still fail, because a pipette tip failed to be picked up, the liquid detection meter threw a false negative, something clogged, or whatever else.
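To make the protocol and liquid-class steps less abstract, here is a minimal, vendor-neutral Python sketch of what a single transfer expands into once a liquid class is attached. The LiquidClass fields mirror a few of the parameters named above. The class, the step strings, and the numbers for water and glycerol are hypothetical placeholders, not Hamilton Venus syntax or tuned values.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class LiquidClass:
    """A few of the handling parameters a liquid class bundles together (hypothetical subset)."""
    name: str
    aspirate_ul_per_s: float
    dispense_ul_per_s: float
    post_aspirate_delay_s: float
    blowout_ul: float

# Placeholder values, not tuned settings: the viscous liquid gets slower speeds and a pause.
WATER = LiquidClass("water", 100.0, 100.0, 0.0, 5.0)
GLYCEROL = LiquidClass("glycerol", 10.0, 5.0, 2.0, 10.0)

def expand_transfer(src: str, dst: str, volume_ul: float, lc: LiquidClass) -> list[str]:
    """Spell out one transfer as the low-level steps a liquid handler script must encode."""
    steps = ["pick_up_tip",
             f"aspirate {volume_ul} uL from {src} at {lc.aspirate_ul_per_s} uL/s"]
    if lc.post_aspirate_delay_s > 0:
        # let a viscous liquid finish entering the tip before moving
        steps.append(f"wait {lc.post_aspirate_delay_s} s")
    steps += [f"dispense {volume_ul} uL into {dst} at {lc.dispense_ul_per_s} uL/s",
              f"blow_out {lc.blowout_ul} uL",
              "drop_tip"]
    return steps

# The same 5 uL move needs a different script depending on what is being moved.
for lc in (WATER, GLYCEROL):
    print(lc.name, expand_transfer("reservoir:A1", "plate1:B1", 5.0, lc))
```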

To help you navigate this perilous journey, Hamilton, in their infinite grace, offers seminars to teach you how to use this machine, and it only costs between 3,500 and 5,000 dollars.

And here’s a Reddit post with some more details:

Now, yes, this is annoying, especially if you compare it with manual pipetting. There, a trained researcher picks up a pipette, aspirates the liquid, watches it enter the tip, adjusts instinctively if something seems off, dispenses into the destination well, and moves on. The whole operation takes perhaps fifteen seconds. Perhaps the researcher gets really bored with this and doesn’t move particularly fast, but if you assemble enough of them together and call it graduate school or an RA position, you can scale things up quite a bit without needing to touch a robot at all. Oftentimes, that may not only be the more efficacious option, but also the cheaper one.

But there is a very interesting nuance here: if the task is worth automating, it actually isn’t that big of a deal to automate.

From talking to automation engineers, there is a distinct feeling I get that if we had an infinite number of them (and scientists to let them know their requirements) worming through our labs, there is a very real possibility that nearly everything in an average wet-lab could be automated. After all, there are centrifuges, incubators, and so on that are all automation compatible! And the engineers I talked to don’t actually mind the finicky process of tuning their boxes and arms that much. Yes, dialing in a protocol can be tough, but it is often a ‘solvable over a few hours’ problem. In the edge case of dealing with genuinely strange protocols that bear little resemblance to what the automation engineer has seen before, it could take perhaps weeks, but that’s it.

So what’s the problem?

Most protocols simply aren’t run enough times to justify the upfront investment.

Let’s assume it takes an automation engineer forty hours to fully dial in a new protocol, which is a reasonable estimate for something moderately complex. At a loaded cost of, say, $100 per hour for the engineer’s time, that’s $4,000 just to get the thing working. Now, if you’re going to run this protocol the exact same way ten thousand times over the next month, that $4,000 amortizes to forty cents per run. Trivial! Also, it’d probably be nearly impossible to do via human labor alone anyway, so automate away. But if you’re going to run it fifty times? That’s $80 per run in setup costs alone, and then you might as well just have a human do it.
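That arithmetic is simple enough to write down directly. The first function reproduces the numbers above. The break-even function is my own addition, and it assumes, hypothetically, that a manual run costs $30 of labor and a robotic run costs $5 in consumables and supervision.

```python
import math
from typing import Optional

def setup_cost_per_run(engineer_hours: float, hourly_rate: float, runs: int) -> float:
    """One-off protocol development cost spread across every run."""
    return engineer_hours * hourly_rate / runs

def break_even_runs(engineer_hours: float, hourly_rate: float,
                    manual_cost_per_run: float, robot_cost_per_run: float) -> Optional[int]:
    """Fewest runs after which automating is no more expensive than doing it by hand."""
    saving_per_run = manual_cost_per_run - robot_cost_per_run
    if saving_per_run <= 0:
        return None  # on cost alone, automation never pays for itself
    return math.ceil(engineer_hours * hourly_rate / saving_per_run)

print(setup_cost_per_run(40, 100, 10_000))  # forty cents per run
print(setup_cost_per_run(40, 100, 50))      # eighty dollars per run
print(break_even_runs(40, 100, manual_cost_per_run=30, robot_cost_per_run=5))
```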

This is, obviously, an immense oversimplification. Even if a wet-lab task could be ‘automated’, most boxes/arms still need to be babysat a little bit. But still! The split between robot-easy problems and robot-hard problems—in the eyes of automation engineers—has a lot less to do with specific motions/actions/protocols, and a lot more to do with ‘I will run this many times’ versus ‘I will run this once’.

And most protocols in most labs fall into the latter category. Research is, by its nature, exploratory. You run an experiment, you look at the results, you realize you need to change something, you run a different experiment. Some labs do indeed run their work in a very ‘robot-shaped’ way, where the bulk of their work is literally just ‘screening X against Y’ and writing a paper about the results. They can happily automate everything, because even if some small thing about their work changes, it’s all similar enough to, say, whatever assumptions they had already baked into their liquid handler’s liquid classes.

But plenty of groups do not operate this way, maybe because they are doing such a vast variety of different experiments, or because their work is iterative and the protocol they’re running this week bears only passing resemblance to the protocol they’ll be running next week, or some other reason.

So, how do you improve this? How can we arrive at an automation-maxed world?

You can improve lab robotics by improving the translation layer, the hardware layer, or the intelligence layer

The answer is very obvious to those working in the space: we must reduce the activation energy needed to interface with robotic systems. But, while everybody seems to mostly agree with this, people differ in their theory of change of how such a future may come about. After talking to a dozen-plus people, there seem to be three ideological camps, each proposing their own solution.

But before moving on, I’d like to preemptively clarify something. To help explain each of the ideologies, I will name companies that feel like they fall underneath that ideology, and those categorizations are slightly tortured. In truth, they all slightly merge and mix and wobble into one another. While they seem philosophically aligned in the camp I put them in, you should remember that I am really trying to overlay a clean map on a very messy territory.

The first camp is the simplest fix: create better translation layers between what the human wants and what the machine is capable of doing. In other words, automatically convert a protocol written for an intelligent human with hands and eyes and common sense into something that a very nimble, but very dumb, robot can conceivably do. That way, the automation engineer needn’t spend forty hours figuring this out, but maybe an hour, or maybe even just a minute.

This is an opinion shared by three startups of note: Synthace, Briefly Bio, and Tetsuwan Scientific.

Synthace, founded in London in 2011, was perhaps the earliest to take this seriously. They built Antha, which is essentially a device-agnostic programming language: a protocol written in Antha runs on a Hamilton or a Tecan or a Gilson without modification, because the abstraction layer handles the translation. You drag and drop your workflow together, the system figures out the liquid classes and deck layouts, and you go home while the robot pipettes.
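The core idea of a device-agnostic layer can be sketched as one protocol description compiled by interchangeable backends. This is not Antha’s actual syntax or architecture. VendorA, VendorB, and their command strings are made up, standing in for the very different native formats that real machines expect.

```python
from abc import ABC, abstractmethod

class Backend(ABC):
    """One adapter per liquid handler model; the protocol never sees vendor details."""

    @abstractmethod
    def transfer(self, src: str, dst: str, volume_ul: float) -> str:
        ...

class VendorA(Backend):  # hypothetical command format
    def transfer(self, src: str, dst: str, volume_ul: float) -> str:
        return f"ASP {src} {volume_ul}; DSP {dst} {volume_ul}"

class VendorB(Backend):  # another hypothetical command format
    def transfer(self, src: str, dst: str, volume_ul: float) -> str:
        return f"move_liquid(from={src!r}, to={dst!r}, ul={volume_ul})"

def compile_protocol(steps: list[tuple[str, str, float]], backend: Backend) -> list[str]:
    """Translate vendor-neutral (source, destination, volume) steps for one backend."""
    return [backend.transfer(src, dst, vol) for src, dst, vol in steps]

protocol = [("A1", "B1", 5.0), ("A2", "B2", 10.0)]
for backend in (VendorA(), VendorB()):
    print(type(backend).__name__, compile_protocol(protocol, backend))
```

With this design, supporting a new machine means writing one more backend rather than rewriting every protocol.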

Briefly Bio, which launched in mid-2024 and has perhaps one of the best and least-known-about blogs I’ve seen from a startup, initially started not as a translation layer between the scientist and the robot, but between the scientist and the automation engineer. Their software uses LLMs to convert the natural-language protocols that scientists can write—with all their implicit assumptions and missing parameters and things-that-must-be-filled-in—into structured, consistent formats that an automation team can work with. But since then, the team has expanded their purview to allow these auto-generated protocols (and edits made upon them) to be directly run on arbitrary boxes and arms, alongside a validation check to ensure that the protocol is actually physically possible.
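As for what a check of physical possibility might involve, here is a minimal sketch under simple assumptions. Steps are (source, destination, volume) triples written as "position:well". The deck maps each position to a per-well capacity, and each step must fit in a single tip aspiration. The capacities and tip size are illustrative, and this is not a description of Briefly’s actual implementation.

```python
from collections import defaultdict

def validate(steps: list[tuple[str, str, float]],
             deck: dict[str, float], tip_max_ul: float) -> list[str]:
    """Return human-readable problems; an empty list means none were found.

    deck maps a deck position name to the capacity (uL) of each well at that position.
    """
    problems = []
    totals = defaultdict(float)  # uL dispensed so far into each destination well
    for i, (src, dst, vol) in enumerate(steps):
        for loc in (src, dst):
            position = loc.split(":")[0]
            if position not in deck:
                problems.append(f"step {i}: nothing is loaded at position '{position}'")
        if vol > tip_max_ul:
            problems.append(f"step {i}: {vol} uL will not fit in a {tip_max_ul} uL tip")
        totals[dst] += vol
        dst_position = dst.split(":")[0]
        if dst_position in deck and totals[dst] > deck[dst_position]:
            problems.append(f"step {i}: {dst} would hold {totals[dst]} uL, "
                            f"over its {deck[dst_position]} uL capacity")
    return problems

deck = {"trough": 20_000.0, "plate1": 200.0}  # illustrative per-well capacities
steps = [
    ("trough:A1", "plate1:A1", 150.0),
    ("trough:A1", "plate1:A1", 100.0),   # overfills plate1:A1
    ("trough:A1", "plate2:A1", 10.0),    # plate2 was never placed on the deck
    ("trough:A1", "plate1:B1", 1200.0),  # too big for one tip, and overfills the well
]
for problem in validate(steps, deck, tip_max_ul=1000.0):
    print(problem)
```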

Tetsuwan is the newest of the trio, announced at the end of 2024, and operates at a higher level of abstraction than Briefly and Synthace. Users do not write commands for transfers between plates; instead, they define experiments by describing high-level actions, such as combining reagents or applying transformations like centrifugation. Then they specify the samples, variables, conditions, and controls for that specific run. From this intent-level description, Tetsuwan fully compiles to robot-ready code, automatically making all the difficult downstream process engineering decisions, including mastermixing, volume scaling, dead volume, plate layouts, labware, scheduling, and liquid handling strategies. The result is fully editable by the scientist overseeing the process, allowing them to specify their preferences on cost, speed, and accuracy trade-offs.
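Two of those downstream decisions, mastermixing and dead volume, come down to arithmetic that is easy to get subtly wrong by hand. The sketch below scales a per-reaction recipe into a single master mix. The 10% overage, the 50 µL dead volume, and the recipe itself are assumptions chosen for illustration, not Tetsuwan’s actual defaults.

```python
def master_mix(reactions: int, per_reaction_ul: dict[str, float],
               overage_frac: float = 0.10, dead_volume_ul: float = 50.0) -> dict[str, float]:
    """Volume (uL) of each reagent to combine into one master mix.

    overage_frac: extra fraction to cover pipetting losses (assumed).
    dead_volume_ul: liquid left unusable at the bottom of the source tube (assumed).
    """
    per_reaction_total = sum(per_reaction_ul.values())
    total_needed = reactions * per_reaction_total * (1 + overage_frac) + dead_volume_ul
    scale = total_needed / per_reaction_total  # effective number of reactions to prepare
    return {reagent: round(vol * scale, 2) for reagent, vol in per_reaction_ul.items()}

recipe = {"buffer": 5.0, "enzyme": 0.5, "water": 14.5}  # hypothetical 20 uL reaction
mix = master_mix(96, recipe)
print(mix, "total:", round(sum(mix.values()), 2), "uL")
```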

And that’s the first camp.

The second camp also admits that the translation layer must be improved, but believes that physical infrastructure will be an important part of that. This is a strange category, because I don’t view this camp as building fundamentally novel boxes or arms, like, say, Unicorn Bio, but rather building out the physically tangible [stuff] that stitches existing boxes and arms together into something greater than the sum of their parts.

The ethos of this philosophy can be best viewed by what two particular companies have built: Automata and Ginkgo Bioworks.

Automata is slightly confusing, but here is my best attempt to explain what they do: they are a vertically-integrated-lab-automation-platform-consisting-of-modular-robotic-benches-and-a-scheduling-engine-and-a-data-backend-as-a-service business. They call the physical implementation of this service the ‘LINQ bench’, and it is designed to mirror the size and shape of a traditional lab bench, so that it can be dropped into existing lab spaces without major renovation. It robotically connects instruments using a robot arm and a transport layer, and the company is building a magnetic levitation system for high-speed, multi-directional transport of plates across the bench. The software onboard these systems handles workflow creation, scheduling, error handling, and data management. I found one of their case studies here quite enlightening in figuring out what exactly they do for their clients.

And of course, Ginkgo. Yes, Ginkgo is a mild memetic allergen to those familiar with its prior history, but I do encourage you to watch their 2026 JPM presentation on their recent push into automation. It’s quite good! The purpose of the presentation is to push Ginkgo’s lab automation solution, Reconfigurable Automation Carts (RACs), but it also serves as a decent view into the pain points of building better lab automation. What are RACs, anyway? Basically a big, modular, standardized cart that can have boxes (and other things) inserted into it, with an arm installed directly on it:

There is software that comes onboard to help you use the machines (Catalyst), but their primary focus seems to be hardware-centric.

This is Ginkgo’s primary automation play, though both the RACs and the scheduling software were originally created by Zymergen, which Ginkgo acquired. And, just the other day, they demonstrated this hardware-centricity by partnering with OpenAI to run an autonomous lab experiment: 36,000 conditions across six iterative cycles, optimizing cell-free protein synthesis costs.

Moreover, the RACs each include a transport track, so they can be daisy-chained together in case you need multiple instruments to run your particular experiment.

And that’s the second camp.

The third and final camp believes the future lies in augmenting the existing systems with a greater degree of intelligence. This differs from the translation camp in that the translation camp is primarily concerned with the input side of the problem—converting human intent into robot-legible instructions before execution begins—while the intelligence camp is concerned with what happens during execution.

This is the newest group, and there are two companies here that feel most relevant: Medra and Zeon Systems.

Medra is the oldest player here, founded in 2022, and has raised 63 million dollars in the years since. Their pitch is that you already have the arms, you already have the boxes, and both are quite good at what they do. Really, what you need most is intelligence. Yes, perhaps the translation layers that the first camp is building, but the Medra pitch is a bit more all-encompassing than that. Onboard robotic intelligence would make it easier not only to do translation, but also to recover from errors, adapt to novel situations, interface with arbitrary machines (even ones that are meant to be worked manually), autonomously optimize protocols, design experiments outright, and generally handle the thousand small variations that make lab work so resistant to typical, more brittle automation.

Zeon Systems is our final company. It is fundamentally quite similar to Medra, but with a quirk that I find very endearing: it uses intelligence not necessarily to make robots more capable, but to make them more forgiving. Opentrons, founded in 2014, set out to democratize automation by making the hardware cheap, but cheap hardware comes with cheap hardware problems: tips that don’t seat properly, positioning that drifts, calibration that goes wonky. The Zeon bet is that sufficiently good perception and intelligence can compensate for these shortcomings. If the robot can see that the tip didn’t seat properly and adjust accordingly, you no longer need the tip to seat properly every time. If the robot can detect that its positioning is off and correct in real time, you no longer need precision machining to sub-millimeter tolerances. Intelligence, in this framing, is not a way to make robots do more, but rather a way to get away with worse machinery. And worse machinery means cheaper machinery, which means more labs can afford to automate, which means more Zeon products (whether that takes the form of software or software + hardware) can be sold.
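Here is a toy version of the Zeon argument. Suppose a cheap actuator lands anywhere within ±20% of each commanded move. A single open-loop move can then miss badly, but re-measuring and correcting shrinks the error geometrically, because each correction leaves at most 20% of the previous error. The slop model and the assumption that the camera measures position perfectly are both hypothetical simplifications.

```python
import random

def move_with_feedback(target_mm: float, tolerance_mm: float, max_moves: int,
                       rng: random.Random, slop: float = 0.2) -> tuple[float, int]:
    """Close in on target_mm using an imprecise actuator plus (simulated) perception.

    Each move overshoots or undershoots the commanded distance by up to `slop` (a fraction).
    Perception is assumed perfect, so the remaining error is always known exactly.
    Returns (final_position_mm, moves_used); raises if tolerance is never reached.
    """
    position = 0.0
    for moves in range(max_moves + 1):
        error = target_mm - position  # what the camera reports is still left to go
        if abs(error) <= tolerance_mm:
            return position, moves
        if moves < max_moves:
            position += error * (1 + rng.uniform(-slop, slop))
    raise RuntimeError("target not reached; flag the run for a human")

rng = random.Random(0)
pos, moves = move_with_feedback(100.0, tolerance_mm=0.05, max_moves=10, rng=rng)
print(f"reached {pos:.3f} mm after {moves} moves")
```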

Okay, that’s that. Those are the three camps.

Now the obvious question: which one of them is correct?

The most nuanced take is: all of them. It feels at least somewhat obvious that all possible futures will eventually demand something of all of these camps, and the companies that thrive will be those that correctly identify which layer is the binding constraint for which customer at which moment in time.

But here is a more opinionated take on each one:

The translation layer camp, to my eyes, has the most honest relationship with this problem. They are not promising to make robots smarter or to sell you better hardware; they are instead promising to make the existing robots easier to talk to, such that the activation energy required to automate a protocol drops low enough that even infrequently run experiments become viable candidates. If we accept that protocol building is the fundamental bottleneck to scaling up automation, we should also trust the Tetsuwans, Synthaces, and Brieflys of the world to have the best solutions.

One easy failure case to imagine here is that frontier LLMs get infinitely better, negating any need for the custom harnesses these groups are building and slurping up any market demand they would otherwise have. To be clear, I don’t really believe this will happen, for the same reason I think Exa and Clay will stick around for a while: these tools are complicated, building complicated tools requires focus, and frontier model labs are not focused on these particular use cases. And importantly, many of the problems that constitute translation are best solved through deterministic means (deck and plate layouts, choosing liquid class parameters, pipetting strategies, volume calculations). Opus 8 or whatever may be great and an important part of the translation solution, but it probably should not be used as a calculator.
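As an example of the kind of calculation that belongs in deterministic code rather than in a language model’s head, here is the standard C1V1 = C2V2 dilution. It also flags volumes too small to pipette accurately. The 0.5 µL minimum is an assumed figure; the real limit depends on the instrument and tips.

```python
def stock_volume_ul(stock_conc: float, final_conc: float, final_volume_ul: float) -> float:
    """Solve C1*V1 = C2*V2 for V1. Both concentrations must be in the same units."""
    if final_conc > stock_conc:
        raise ValueError("a dilution cannot raise the concentration")
    return final_conc * final_volume_ul / stock_conc

def plan_dilution(stock_conc: float, final_conc: float, final_volume_ul: float,
                  min_pipette_ul: float = 0.5) -> tuple[float, float, bool]:
    """Return (stock_ul, diluent_ul, needs_intermediate_dilution)."""
    v1 = stock_volume_ul(stock_conc, final_conc, final_volume_ul)
    return v1, final_volume_ul - v1, v1 < min_pipette_ul

# 10 mM stock (written as 10,000 uM) diluted to 10 uM in a 50 uL well:
print(plan_dilution(10_000.0, 10.0, 50.0))  # needs 0.05 uL of stock, too little to pipette
```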

The hardware camp is curious, because, you know, it doesn’t actually make a lot of sense if the goal is ‘democratizing lab automation’. Automata’s LINQ benches and Ginkgo’s RACs are expensive—extremely expensive!—vertically-integrated systems. They make automation better for orgs that have already crossed the volume threshold where automation makes sense. But they don’t actually lower that threshold, nor add in new capabilities that the prior systems couldn’t do. If anything, they raise it! You need even more throughput to justify the capital expenditure! So, what, have they taken the wrong bet entirely? I think to a certain form of purist, yes.

But consider the customer base these companies are actually chasing. Automata and Ginkgo alike are pitching their solutions to large pharma and industrial biotech groups. In other words, the primary people they are selling to are not scrappy academic labs, but rather organizations running thousands of experiments per week, with automation teams already on staff, that crossed the volume threshold years ago. Their question has long since moved past ‘should we automate?’ and into the territory of ‘how can we partner with a trusted institutional consultant to scale to even larger degrees?’ To those folks, LINQ and RACs may make a lot of sense! But there is an interesting argument that, in the long term, these may end up performing the democratization of automation in a roundabout way. We’ll discuss that a bit more in the next section.

Finally, the intelligence camp. We can be honest with each other: it has a certain luster. It is appealing to believe that a heavy dose of intelligence is All You Need. In fact, visiting the Medra office earlier this year to observe their robots dancing around was the catalyst I needed to sit down and finish this article. Because how insane is it that a robotic arm can swish something out of a centrifuge, pop it into a plate, open the cap, and transfer it to another vial? Maybe it’s not insane at all; maybe that’s well within what robotics can easily do, but that’s what the article was meant to discover. But after having talked to as many people as I have, I have arrived at a more granular view than “intelligence good” or “intelligence premature.” There are really two versions of the intelligence thesis. The near-term version is about perception and error recovery: the robot sees that a tip didn’t seat properly and adjusts, detects that its positioning has drifted and corrects in real time, recognizes that an aspiration failed and retries before the whole run is ruined. This feels quite close! The far-term version is something grander, where you can trust the robot to handle every step of the process, where you show a robot a video of a grad student performing a protocol and it just does it, perhaps even optimizing it, maybe even designing its own experiments, with the intelligence onboard granting the robot all the necessary awareness and dexterity to complete anything and everything.

This future may well come! It is not an unreasonable bet. But, from my conversations, it does seem quite far away. Yes, it is easy to look at the results Physical Intelligence are producing and conclude that things are close to being solved, but lab work is very out-of-distribution to what most of these robotics foundation models are learning (and what they have learned is often still insufficient for their own, simpler folding-laundry-y tasks!). I want to be careful not to overstate this, since this greater intelligence may arrive faster than anyone suspects, so perhaps this take will be out of date within the year.

And wait! Wait! Before you take any of the above three paragraphs as a statement on companies rather than philosophies, you should recall what I said in the second paragraph of this section: none of these companies, many of whose founders I talked to for this article, are so dogmatic as to be entirely translation/hardware/intelligence-pilled. They may lean that direction in the revealed preferences of how their companies operate, but they are sympathetic to each camp, and nearly all of them have plans to eventually play in sandboxes other than the ones they currently occupy.

Speaking of, how are any of these companies making money?

All roads lead to Transcriptic

There is a phenomenon in evolutionary biology called carcinization, which refers to the fact that nature keeps accidentally inventing crabs. Hermit crabs, king crabs, porcelain crabs; many of these are not closely related to each other at all, and yet they all independently stumbled into the same body plan, because apparently being shaped like a crab is such an unreasonably good idea that evolution cannot help itself. It just keeps doing it. I propose to you that there is a nearly identical phenomenon occurring in lab robotics, where every startup, regardless of what its thesis is, will slowly, inexorably, converge onto the same form.

Becoming Transcriptic.

Transcriptic was founded in 2012 by Max Hodak (yes, the same Max who co-founded Neuralink, and then later Science Corp). The pitch of the company was simple: we’ll build out a facility stuffed with boxes and arms and software to integrate them all, and invite customers to interact with them through a web interface, specify experiments in a structured format, and somewhere in a facility, the lab will autonomously execute your will (alongside humans to pick up the slack). In other words, a ‘cloud lab’.
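To make ‘specify experiments in a structured format’ a little more concrete, here is a minimal sketch of what a machine-checkable experiment request might look like. Everything in it is hypothetical: the operation names, the `Step` and `ExperimentRequest` classes, and the JSON shape are my own illustration, not Transcriptic’s actual format. The point is only that a request becomes data the facility can validate before any robot touches a sample.

```python
import json
from dataclasses import dataclass, field, asdict

# Hypothetical operations a cloud lab might expose; not any real vendor's list.
SUPPORTED_OPS = {"pipette", "incubate", "spin", "seal", "read_absorbance"}


@dataclass
class Step:
    op: str                                      # must be one of SUPPORTED_OPS
    params: dict = field(default_factory=dict)   # operation-specific settings


@dataclass
class ExperimentRequest:
    name: str
    steps: list                                  # ordered list of Step objects

    def validate(self) -> None:
        # Reject anything the facility cannot execute before it reaches a robot.
        if not self.steps:
            raise ValueError("request has no steps")
        for i, step in enumerate(self.steps):
            if step.op not in SUPPORTED_OPS:
                raise ValueError(f"step {i}: unsupported operation {step.op!r}")

    def to_json(self) -> str:
        self.validate()
        return json.dumps(asdict(self), indent=2)  # asdict recurses into the Step list


# Hypothetical request: a simple growth curve read at 600 nm.
request = ExperimentRequest(
    name="growth-curve",
    steps=[
        Step("pipette", {"from": "reservoir/A1", "to": "plate/A1", "volume_ul": 150}),
        Step("incubate", {"where": "plate", "temp_c": 37, "minutes": 480}),
        Step("read_absorbance", {"where": "plate", "wavelength_nm": 600}),
    ],
)
print(request.to_json())
```

The validation step is the interesting design choice: the more of a protocol that can be checked as data up front, the less the facility has to discover mid-run.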

The upside is that the sales pitch basically encompasses the entirety of the wet-lab market: don’t set up your own lab, just rent out the instruments in ours! And with sufficiently good automation, and software to use that automation, the TAM of this is a superset of a CRO’s.

The obvious downside is that doing this well is really, really hard. Transcriptic later merged with the automated microscopy startup ‘3Scan’, and the combined company, rebranded as ‘Strateos’, folded in 2023. This tells us something about the difficulty of this model. That said, Emerald Cloud Labs (ECL), a startup that appeared two years after Transcriptic with similar product offerings, has held out, with a steady ~170 employees over the past two years. Yet, while ECL is ostensibly a cloud lab, it is not the platonic ideal of one in the way Transcriptic was, where anyone and everyone could simply log in and run whatever experiment they’d like; ECL’s interface is gated behind a contact page.

Despite the empirical difficulty of making it work, it feels like going down the Transcriptic path is the logical conclusion of nearly any sufficiently good lab automation play.

Why?

Here, I shall refer to ‘Synbio25’, a wonderful essay by Keoni Gandall that I highly recommend you read in full. In this essay, Keoni discusses many things, but what I find most interesting is his comment on the immense economic efficiencies gained by batching experiments:

Robots, in biotechnology, are shamefully underutilized. Go visit some biology labs — academic, industrial, or startup — and you are sure to see robots just sitting there, doing nothing, collecting dust….
The benefit of aggregating many experiments together in a centralized facility is that we can keep robots busy. Even if you just want to run 1 protocol, there may be 95 others who want to run that 1 protocol as well — together, you can fill 1 robot’s workspace optimally. A centralized system lets you do this among many protocols — otherwise, you’d need to ship samples between labs, which is just too much. While the final step, testing your particular hypothesis, might still require customized attention and dedicated robot time, the heavy lifting — strain prep, validation, etc — can be batched and automated.
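The arithmetic behind the ‘fill 1 robot’s workspace’ point is simple enough to sketch. Assuming a standard 96-well plate as the unit of robot work and one sample per well (both my simplifications, and the helper names are mine), compare 96 labs each running their own plate against one facility pooling the same samples:

```python
import math

WELLS_PER_PLATE = 96  # assumption: a standard 96-well plate is one unit of robot work


def plates_needed(samples: int) -> int:
    """Plates required to hold `samples`, one sample per well."""
    return math.ceil(samples / WELLS_PER_PLATE)


def utilization(sample_counts, pooled: bool) -> float:
    """Fraction of wells actually used across every plate that gets run."""
    total = sum(sample_counts)
    if pooled:
        plates = plates_needed(total)                           # one facility packs everyone together
    else:
        plates = sum(plates_needed(n) for n in sample_counts)   # each lab runs its own plates
    return total / (plates * WELLS_PER_PLATE)


# Hypothetical demand: 96 labs that each want to run the same protocol on 1 sample.
labs = [1] * 96
print(f"each lab runs its own plate: {utilization(labs, pooled=False):.1%}")
print(f"pooled in one facility:      {utilization(labs, pooled=True):.1%}")
```

Real scheduling also has to respect protocol compatibility, timing, and reagent constraints, which this ignores; it only shows why idle capacity shrinks when demand is aggregated.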

And one paragraph that I really think is worth marinating in (bolding by me):

The key, then, is to pull these robots towards projects and protocols that are closer and closer to the raw material side of biology, so that you can build everything else on top of those. For example, PCR enzyme, polymerase, is very widely used, but rather expensive if you buy proprietary enzymes. On the other hand, you can produce it for yourself very cheaply. If you utilize your robots to produce enzymes, you can then use this enzyme in all other experiments, dropping the costs of those experiments as well. The reason is quite simple: without a middleman, your costs approach chemical + energy + labor costs. A billion years of evolution made this, relative to other industries, very inexpensive. You just need to start from the bottom and move up.

There is a very neat logical train that arises from this!

If you accept that lab centralization (as in, cloud labs) means you can most efficiently use lab robotics—which feels like a pretty uncontroversial argument—it also means that the further you lean into this, the more able you are to vertically integrate upstream. If you’re running enough experiments such that your robots are constantly humming, you can justify producing your own reagents. If you’re producing your own reagents, your per-experiment costs drop. If your per-experiment costs drop, you can offer lower prices. If you offer lower prices, you attract more demand. If you attract more demand, your robots stay even busier. If your robots stay even busier, you can justify producing even more of your own inputs. And so on, ad infinitum, until you devour the entirety of the market, and the game of biology becomes extraordinarily cheap and easy for everyone to play in.
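The loop above can be written as a toy simulation. Every number and every functional form here is invented for illustration: I am assuming that the share of reagents worth making in-house scales linearly with robot utilization, that savings pass straight through to price, and that demand responds to price with a constant elasticity. It shows the shape of the argument, not a forecast.

```python
def flywheel(rounds=10, capacity=1000.0, base_demand=400.0, base_cost=100.0,
             max_saving=0.5, margin=0.2, elasticity=1.5):
    """Toy model of the centralization flywheel. Every parameter is invented."""
    reference_price = base_cost * (1 + margin)  # price before any in-house savings
    demand = base_demand
    history = []
    for _ in range(rounds):
        utilization = min(1.0, demand / capacity)  # how busy the robots are
        # Assumption: the share of reagents made in-house grows linearly with utilization.
        cost = base_cost * (1 - max_saving * utilization)
        price = cost * (1 + margin)                # savings passed on to customers
        history.append((utilization, cost, price))
        # Constant-elasticity demand: cheaper experiments attract more of them.
        demand = base_demand * (reference_price / price) ** elasticity
    return history


for i, (util, cost, price) in enumerate(flywheel()):
    print(f"round {i}: utilization {util:.0%}, cost ${cost:.2f}, price ${price:.2f}")
```

In this toy, the loop stops at full capacity; in the essay’s telling, that is the point where you build more capacity and the cycle continues.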

As an example, the Synbio25 essay offered this picture showing the plasmid production cost differences between unoptimized and optimized settings (read: producing enzymes + cells in-house and using maximum-sized sequencing flow cells). Over twice as cheap!

How dogmatic am I being here? Surely there are other business models that could work.

Perhaps for the next decade! But on a long-enough time horizon, it does feel like eventually everything becomes a cloud lab. Nothing else really seems to work, or, if other models do, their upside is ultimately capped and not ‘venture-scalable’, which is to say they may work, but you better not take any money to make it happen. Selling software means you’re easy to clone, being a CRO means you’re ultimately offering a subset of the services a cloud lab can, and automation consulting has limited upside. The best potential alternative on the table is to become a Thermo-Fisher-esque entity, selling boxes and arms and reagents to those who want to keep things in-house. But how many of those will there realistically be? How many holdouts could possibly remain as the cloud labs get more and more trustworthy, all while the business of biotech (probably) becomes more and more economically challenging, making it ever more difficult to justify your own lab expenses?

But how things shake out in the short term may be different. Because while Transcriptic doesn’t exist today, Emerald Cloud Labs does! And yet, they aren’t necessarily a juggernaut. As of today, we exist in a sort of purgatory, a midway state where neither the capability nor the trust to rely entirely on cloud labs yet exists. But it is coming. You can see it on the horizon. And so the interesting question to ask is: who stands to benefit the most from this wave in the coming years?

And here is where the hardware camp’s bet becomes a lot more convincing in retrospect. Yes, Automata and Ginkgo are selling expensive hardware systems to large pharma. But you can see the inklings of at least Ginkgo attempting lab centralization themselves by dogfooding their own machines to sell data to customers. Right now, it is functionally a CRO, with a menu of options. But what comes next? I don’t personally know how much easier RACs are to set up for high-mix work (read: highly heterogeneous lab experimentation), but the general sense I get from people is that they are easier. And if that’s true, then the Ginkgo play starts to look less like “we are selling expensive hardware to pharma” and more like “we are building the infrastructure that will eventually become the dominant cloud lab, and we’re getting pharma to pay for the R&D in the meantime.” Which is, if you squint, actually quite clever. Will they pull it off? I don’t know! Something similar could be said for Automata as well; an institution that has gathered up a decade’s worth of information on how automation is practically used may be well-poised to eventually operate its own cloud lab, having already learned—on someone else’s dime—exactly where the workflows break down and how to fix them.

How about the other groups? What can the intelligence and translation layer groups do during this interim period?

There are a lot of possibilities. The simplest one is to get acquired. If the endgame is cloud labs, and cloud labs need both intelligence and translation layers to function, then the most straightforward path for these startups is to build something valuable enough that a cloud-lab-in-waiting (like Ginkgo or Automata themselves) decides to buy them rather than build it themselves. Similarly, these startups could become the picks-and-shovels provider that every cloud lab depends on.

But you could imagine more ambitious futures here too. Remember: you can just buy the hardware. Ginkgo’s RACs, Hamilton’s liquid handlers—none of this is proprietary in a way that prevents a sufficiently well-capitalized would-be cloud lab from simply buying or even making it themselves. The hardware is a commodity, or at least it’s becoming one. What’s not a commodity is the intelligence to run it and the translation layer to make it accessible. So you could tell a story where the hardware companies win the short-term battle—racking up revenue, raising money, building out their systems—only to lose the long-term war to translation/intelligence groups who buy their hardware off the shelf and differentiate on software instead.

Of course, the steelman here is that the hardware companies could simply use their revenue advantage to build the software themselves.

We’ll see what happens in the end. Smarter people than me are in the arena, figuring it out, and I am very curious to see where they arrive.

This section is long, but we have one last important question to ask: why did the first generation of cloud labs not do so well? Was it merely a technological problem? Were they simply too early? This is unlikely, according to the automation engineers I talked to; there aren’t massive differences between the machinery back then and the machinery today. Could blame be placed on the translation layers these companies had? It doesn’t seem like it; using Transcriptic to create a protein, as documented in a 2016 blog post by Brian Naughton, doesn’t seem so terrible.

What else could be the issue?

There is one pitch offered by Shelby Newsad that I found interesting. The problem is not that these companies were too early, but rather that they were simply too general, and because they were too general, they could never actually make any single workflow frictionless enough to matter.

In the comments of Shelby’s post, the same Keoni we referenced earlier explained what it was actually like to use a cloud lab (Transcriptic): you had to buy your own polymerase from New England Biolabs, ship it to their facility, pay for tube conversion, and then implement whatever cloning and sequencing pipeline you wanted to run. By the time you’d coordinated all of this, you might as well have just done it yourself. The automation was there! The robots were ready! But because Transcriptic had attempted the ‘AWS for biotech’ strategy right out of the gate, they offloaded the logistical headaches to the user. Debugging was also a pain; as Brian Naughton puts it in his blog post, ‘debugging protocols remotely is difficult and can be expensive — especially differentiating between your bugs and Transcriptic’s bugs.’

Delighting the customer is important! Compare this to Plasmidsaurus. They (mostly) do one thing: plasmid DNA sequencing. You mail them a tube, they sequence it, you get results. That’s it, no coordination needed on your end, the entire logistics stack is their problem. And it has led to them utterly dominating that market, and slowly expanding into RNA-seq, metagenomics, and AAV sequencing. In fact, if we’re being especially galaxy-brained: there is a very real possibility that none of the companies we’ve discussed so far end up ushering in the cloud labs of the future, and instead, that prize shall be awarded to Plasmidsaurus and other, Plasmidsaurus-shaped CROs, expanding one vertical at a time.

Either way, this reframes the earlier question of which camp will win. Perhaps it’s not just about translation layers versus hardware versus intelligence. It’s about who can solve the logistics problem for a set of high-value workflows, and then use that beachhead to expand.

Conclusion

This field is incredibly fascinating, and the future of it intersects with a lot of interesting anxieties. China is devouring our preclinical lunch; will lab robotics help? The frontier lab models are getting exponentially better; will lab robotics take advantage of that progress to perform autonomous science? Both of these, and more, are worthy of several thousand more words devoted to them. However, this essay is already long, so I leave these subjects to another person to cover in depth.

But there is one final thing I want to discuss. It is the very real possibility that lab robotics, cloud labs, and everything related to them, will not actually fundamentally alter the broader problems that drug discovery faces.

You may guess where this is going. It is time to read a Jack Scannell paper.

In Jack’s 2022 Nature Reviews article, ‘Predictive validity in drug discovery: what it is, why it matters and how to improve it’1, he and his co-authors make a simple argument: the thing that matters most in drug R&D is not how many candidates you can test, but how well your tools for evaluating those candidates correlate with what actually works in humans. They call this ‘predictive validity’, and they operationalize it as the correlation coefficient between the output of whatever decision tool you’re using—a cell-based assay, an animal model, a gut feeling—and clinical utility in actual patients. The primary takeaway here is their demonstration that a 0.1 absolute change in this correlation—shifting from, say, 0.5 to 0.6—can have a bigger impact on the positive predictive value of one’s R&D pipeline than screening ten times, or even a hundred times, more candidates.
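A rough way to feel out this claim is a toy model in the spirit of the paper, though the specific setup here is my own assumption rather than a reproduction of theirs: treat the decision tool’s score X and true clinical utility Y as standard bivariate normal with correlation rho, advance only the top fraction of candidates by X, and ask what share of those advanced would actually work (defined as landing in the top fraction by Y). Screening ten times more candidates while advancing the same number corresponds to a ten-times-smaller advanced fraction.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, sd 1


def ppv(rho: float, frac_advanced: float, frac_good: float, steps: int = 4000) -> float:
    """P(truly good | passed the screen) in a bivariate-normal toy model.

    X = decision-tool score, Y = clinical utility, corr(X, Y) = rho.
    Candidates whose X is in the top `frac_advanced` pass the screen;
    candidates whose Y is in the top `frac_good` would work in patients.
    """
    if not 0.0 <= rho < 1.0:
        raise ValueError("rho must be in [0, 1)")
    if steps % 2:
        raise ValueError("Simpson's rule needs an even number of steps")
    a = STD_NORMAL.inv_cdf(1.0 - frac_advanced)  # screening cutoff on X
    b = STD_NORMAL.inv_cdf(1.0 - frac_good)      # clinical cutoff on Y
    cond_sd = math.sqrt(1.0 - rho * rho)         # sd of Y given X
    # Integrate pdf(x) * P(Y > b | X = x) over x > a with Simpson's rule;
    # stopping at a + 12 drops a negligible tail.
    h = 12.0 / steps
    total = 0.0
    for i in range(steps + 1):
        x = a + i * h
        p_good_given_x = 1.0 - STD_NORMAL.cdf((b - rho * x) / cond_sd)
        weight = 1 if i in (0, steps) else (4 if i % 2 else 2)
        total += weight * STD_NORMAL.pdf(x) * p_good_given_x
    joint = total * h / 3.0        # P(X > a and Y > b)
    return joint / frac_advanced   # divide by P(X > a)


# Baseline: advance the top 1% of screened candidates; 1% of candidates would truly work.
base = ppv(rho=0.3, frac_advanced=0.01, frac_good=0.01)
more_throughput = ppv(rho=0.3, frac_advanced=0.001, frac_good=0.01)  # 10x more screened, same number advanced
better_assay = ppv(rho=0.4, frac_advanced=0.01, frac_good=0.01)      # +0.1 predictive validity
print(f"baseline {base:.3f} | 10x throughput {more_throughput:.3f} | +0.1 rho {better_assay:.3f}")
```

Which lever wins depends heavily on the baseline correlation and the two cutoffs, so these inputs are worth varying rather than trusting any single configuration.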

They illustrate this with a fun historical example: in the 1930s, Gerhard Domagk screened a few hundred dyes against Streptococcus in live mice and discovered sulfonamide antibiotics. Seven decades later, GSK ran 67 high-throughput screening campaigns, each with up to 500,000 compounds, against isolated bacterial protein targets, and found precisely zero candidates worthy of clinical trials. How could this be? It is, of course, because the mice were a better decision tool than the screens, as they captured the in-vivo biology that actually mattered.

What is the usual use-case for lab robotics? It is meant to be a throughput multiplier. It lets you run more experiments, faster. And Scannell is stating that moving along the throughput axis—running 10x or 100x more experiments through the same assays—is surprisingly unimpressive compared to even modest improvements in the quality of those assays. And given that the failure rate of drugs in clinical trials reaches as high as 97% in oncology, the assays are empirically not particularly good.

But, to be clear, this is not an anti-automation take. It is a reframing of what automation should be for.

It feels like the value that the lab robotics of tomorrow will bring us will almost certainly not lie in gently taking over the reins of existing workflows and running them as they are. It will lie in enabling different experiments, better ones, ones with higher predictive validity, at a scale that would be impossible without automation. And this doesn’t require any suspension of disbelief about what ‘autonomous science’ or something akin to it may one day bring! The arguments are fairly mundane.

In the same Scannell paper, he argues that companies should be pharmacologically calibrating their decision tools, as in, running panels of known drugs, with known clinical outcomes, through their assays to measure whether the assay can actually distinguish hits from misses. Almost nobody does this, because it is expensive, tedious, and produces neither a publication nor a patent. But if per-experiment costs drop far enough, if they no longer require expensive human hands to perform, calibration becomes economically rational, and the industry could move from assuming that a given assay is predictive to measuring whether it is. Similarly, given the 50% irreproducibility rate in preclinical research, it may be the case that many otherwise ‘normal’ assays are yielding useless results, entirely because they are performed manually, by individual researchers with slightly different techniques, in labs with slightly different conditions, who did not have the instruments needed to validate their reagents. Sufficiently good cloud automation could free these assays from their dependence on individual hands, and allow higher-standard experimentation to be reliably performed at scale.
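As a minimal sketch of what ‘pharmacological calibration’ might produce, assuming a binary outcome per reference drug (worked in patients or did not), one simple summary is the probability that a known success scores higher in the assay than a known failure, i.e. the ROC AUC in its Mann-Whitney form. Note that this swaps in AUC for the correlation-based measure the paper uses, and the panel values below are made up.

```python
def discrimination_auc(scores_hits, scores_misses):
    """Probability that a random known hit outscores a random known miss (ties count half).

    1.0 means the assay perfectly separates hits from misses;
    0.5 means it cannot tell them apart at all.
    """
    if not scores_hits or not scores_misses:
        raise ValueError("need at least one hit and one miss")
    wins = 0.0
    for hit in scores_hits:
        for miss in scores_misses:
            if hit > miss:
                wins += 1.0
            elif hit == miss:
                wins += 0.5
    return wins / (len(scores_hits) * len(scores_misses))


# Hypothetical assay readouts for a reference panel of drugs with known clinical outcomes.
approved_drugs = [0.82, 0.75, 0.91, 0.60]        # worked in patients
failed_drugs = [0.70, 0.55, 0.88, 0.40, 0.65]    # failed in patients
print(f"AUC = {discrimination_auc(approved_drugs, failed_drugs):.2f}")  # AUC = 0.75
```

An assay whose AUC sits near 0.5 on a calibration panel is telling you little about clinical outcomes, however many compounds you push through it.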

In other words: if you follow the trend-lines, if per-experiment costs continue to fall, if the translation layers keep getting better, if the cloud labs keep centralizing and vertically integrating and driving prices down further still, then at some point, perhaps not far from now, it becomes rational to do the things that everyone already knows they should be doing but can't currently justify. And this alone, despite its relative banality, may be enough to alter how drug discovery as a discipline is practiced.

Shout out to Cristian for sending me this paper!

Questions to ask when evaluating neurotech approaches

Abhishaike Mahajan — Sun, 25 Jan 2026 16:11:30 GMT

Note: Extraordinarily grateful to Milan Cvitkovic, Sumner Norman, Ben Woodington, and Adam Marblestone for all the helpful conversations, comments, and critiques on drafts of this essay.

Second note: I am co-hosting an NYC Biotech x ML meetup on Feb 11th, here is the link.

Introduction

Neurotech is complicated. This is because you need to understand at least five fields at once to actually grasp what is/isn’t possible: electrical engineering, mechanical engineering, biology, neuroscience, and computer science. And, if you’re really trying to cover all the bases: surgery, ultrasound and optical physics as well. And I’ve met relatively few people in my life who can operate at the intersection of three fields, much less eight! As a result, I’ve stayed away from the entire subject, hoping that I’d eventually learn what’s going on via osmosis.

This has not worked. Each time a new neurotech startup came out, I’d optimistically chat about it with some friend in the field, and they’d inevitably wave it off for some bizarre reason that I would never, ever understand. But the more questions I asked, the more confused I would get. And so, at a certain point, I’d just start politely nodding to their ‘Does that make sense?’ questions.

I have, for months, been wanting to write an article to codify the exact mental steps these people go through when evaluating these companies. After talking to many experts, I have decided that this is a mostly impossible task, but that there are at least a few, small, legible fractions of their decision-making framework that are amenable to being written out. This essay is the end result.

My hope is that this helps set up the mental scaffolding necessary to triage which approaches are tractable, and which ones are more speculative. Obviously, take all of my writing with a grain of salt; anything that touches the brain is going to be complicated, and while I will try to offer as much nuance as possible, I cannot promise I will offer as much as an Actual Expert can. Grab coffee with your local neurotech founder!

Questions

How relevant are the state measurements to the application?

At least some forms of neurotech, like brain-computer interfaces, perform some notion of ‘brain state reading’ as part of their normal functionality.

Well, what exactly is ‘brain state’?

Unfortunately for us, ‘brain state’ lies in the same definitional scope as ‘cell state’. As in, there isn’t really a great ground truth for the concept. But there are things that we hope are related to it! For cells, those are counts of mRNA, proteins found, chromatin landscape of the genome, and so on. For brains, there are four main possibilities to get at a notion of state:

Measure the spiking activity of individual neurons (very invasive)
Measure the activity of local field potentials (can be slightly less invasive)
Measure hemodynamics (blood flow or oxygenation) changes (can be non-invasive, though higher-res invasive)
Measure electromagnetic fields outside the skull (usually non-invasive)

There is an ordering here; at the top, we have measurements that are closest to the actual electrical signaling that (probably) defines moment-to-moment neural computation. As we move down the list, each method becomes progressively more indirect, integrating over larger populations of neurons, longer time windows, and/or more layers of intermediary physiology.

This is perhaps overcomplicating things, but there’s also one slightly more exotic approach not mentioned here (and that I won’t mention again), called biohybrid devices. In these systems, neurons grown ex vivo are engrafted into a brain, and those neurons are measured directly; so it’s sort of an aggregate measure like LFP, but it’s also technically able to measure single spikes.

But keep in mind: none of these actually work at understanding the full totality of every single neuron firing in a brain, which is a largely physically intractable thing to perform. Which is fine and fair! Understanding totalities is a tall bar to meet. But it does mean that whenever we stumble across a new company, we should ask the question: how relevant is their method of understanding brain state to the [therapeutic area] they actually care about? Superficial cortical hemodynamics won’t reveal hippocampal spiking, 2-channel EEG won’t decode finger trajectories, and so on.

With this context, let’s consider Kernel, a neurotech company founded by the infamous Bryan Johnson in the mid-2010s. Their primary product is called Kernel Flow, a headset that measures brain state using time-domain functional near-infrared spectroscopy (TD-fNIRS), which tracks blood oxygenation by measuring how light scatters through the skull. In other words, this is a hemodynamics measurement device.

It is non-invasive, portable, and looks like a bike helmet (which is an improvement compared to many other neurotech headsets!).

One common thing you’ll find on most neurotech websites is a ‘spec sheet’ of their device. For most places, you’ll need to formally request it, but Kernel helpfully provides easy access to it here.

In it, they note that the device has an imaging rate of 3.76Hz, which means it’s taking a full hemodynamic measurement about every 266 milliseconds across the surface of the brain. This is fast in absolute terms, but slow on the level of (at least some) cognitive processes, which often unfold on the order of tens of milliseconds. For example, the neural signatures involved in recognizing a face or initiating a movement can happen in less than 100 milliseconds. And to be clear, this is not something that can be altered by increasing the sampling rate; the slowness is inherent to hemodynamic measurements in general.

This means that by the time Flow finishes one hemodynamic snapshot, many of the neural events we care about have started and finished.
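The timing mismatch is just arithmetic on the spec-sheet rate. The sketch below converts the 3.76 Hz imaging rate into a frame period and asks how many full frames fit inside a 100-millisecond neural event; the helper names are mine. Keep in mind this only captures sampling: as noted above, the sluggishness of the hemodynamic response itself is the deeper limit, and no sampling rate fixes it.

```python
FNIRS_RATE_HZ = 3.76  # Kernel Flow imaging rate, per the spec sheet discussed above


def frame_period_ms(rate_hz: float) -> float:
    """Time between successive full measurements, in milliseconds."""
    return 1000.0 / rate_hz


def frames_during(event_ms: float, rate_hz: float) -> float:
    """How many full frames fit inside an event of the given duration."""
    return event_ms / frame_period_ms(rate_hz)


print(f"{frame_period_ms(FNIRS_RATE_HZ):.0f} ms per frame")  # 266 ms per frame
print(f"{frames_during(100, FNIRS_RATE_HZ):.2f} frames per 100 ms event")
```

Well under half a frame per 100-millisecond event means the event can begin and end entirely between two snapshots.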

The spec sheet also notes that the device comes with 4 EEG electrodes, which have a far higher sampling rate of 1 kHz, or 1,000 measurements per second. At first glance, this seems like it might compensate for the sluggish hemodynamic signal by offering access to fast electrical activity. But in practice, 4 channels are entirely insufficient for learning really anything about the brain. Keep in mind that clinical-grade EEG usually operates at 32 channels and above!

I found one paper that investigated the localization errors of EEG (as in, whether you can correctly place where in the brain a spike is occurring) across a range of channel counts: 256, 128, 64, 32, and 16. Not even 4! Yet, even at the 16-channel level, spatial localization was incredibly bad; in one failure case, it mis-localized a temporal-lobe spike to the frontal lobe. Beyond that, noise like muscle and eye-movement artifacts often dominates the EEG signal at the lowest channel counts.

And, again, this was on 16 channels! One can only imagine how much worse 4 channels is.

Of course, 4 channels of EEG data clearly offer something. In the context of the device, they may serve as a coarse sanity check or a minimal signal for synchronizing with the slower hemodynamic measurements. Which maybe is enough to be useful?

But we may be getting ahead of ourselves by getting lost in these details. It is entirely irrelevant to consider the absolute merit of any given measurement decision made here, because, again, what actually matters is the relevance of those measurements to whatever the intended use case is. Clearly the device’s measurements are, at least, trustworthy. But what is it meant to be used for?

Well…it’s vague. Kernel’s public messaging has shifted over the years—from “neuroenhancement” and “mental fitness” to, most recently, “brain biomarkers.” I am not especially well positioned to answer whether this final resting spot is relevant to what Kernel is measuring, but it feels like it is? At least if you look at their publications, which do show that the device is capable of capturing global brain state changes under the influence of psychoactive substances, e.g. ketamine. So, even if hemodynamics doesn’t meet the lofty goal of being able to detect face recognition, that’s fine! Biomarkers that stay static on the order of minutes are fully within its measuring purview.

Does that make Kernel useful? I don’t know the answer to that, but we’ll come back to the subject in a second.

What are the costs and burdens for the user?

In short: a device must earn its place in a patient's life.

The historical arc of neurotech companies has mainly been about serving desperate people who have literally no other options: ALS, severe spine damage, locked-in syndrome, and the like. The giants of the field—Synchron, Blackrock Neurotech, and Neuralink—have all positioned themselves around these, and so their maximally invasive nature is perfectly acceptable to their patients. Now, to be fair, Synchron apparently doesn’t have the greatest reputation and Blackrock is somewhat old-fashioned, so Neuralink could be considered the only giant, but all three did pop up a lot during my research!

Blackrock Neurotech makes the Utah Array, which remains the gold standard for invasive, in-vivo neural recording. Neuralink, the newest and most hyped, has iterated on the approach, developing ultra-thin probes that are inserted into the brain to record signals directly. Synchron has the least invasive approach: its primary device is an endovascular implant called the Stentrode, which reads neural signals from a blood vessel in the brain rather than from within the parenchyma, making it less invasive than a Utah Array or Neuralink, though at a severe cost in signal quality.

You could find faults with these hyper-invasive neurotech companies on the basis of ‘how realistically large is the patient population?’, but you can’t deny that amongst the patient population that does exist, they’d certainly benefit!

So…if you do spot a neurotech company that is targeting a less-than-desperate patient population, you should ask yourself: why would anyone sign up for this? Why would an insurance company pay for it? And most importantly, why would the FDA ever approve something with such a lopsided risk-reward ratio? This is also why you see a lot of neurotech companies pivot toward “wellness” applications when their original clinical thesis doesn’t pan out. Wellness doesn’t require FDA approval or insurance reimbursement! But it also doesn’t require the device to actually work.

But even if a neurotech company is targeting a less-than-desperate patient population and isn’t trying to push them towards surgery, it’s still worth thinking about the burdens its device poses!

Neurotech devices can be onerous in more boring ways too, so much so that they can completely kill any desire a non-desperate person has to use them. One example is a device we’ve already talked about: the Kernel Flow. Someone I chatted with for this essay had tried it, and had this to say about it:

“[the headset] weighs like 4.5lbs. That is so. fucking. uncomfortable.”.

Now, it may be the case that the information that the device tells you is of such importance that it is worth putting up with the discomfort. Is the Kernel Flow worth it? I don’t know, I haven’t tried it! But in case you ever do personally try one of these wellness-focused devices, it is worth pondering how big of a chore it’d be to deal with.

How much is the approach ‘fighting physics’?

Speaking of ‘building things for less desperate patients’, two big neurotech names that often come up are Nudge and Forest Neurotech (one of whose founders I talked to for this article, and who has since moved to Merge Labs).

Both of these startups are focusing on brain stimulation for mental health, though Forest’s ambitions also include TBI and spinal cord injuries. Depression, anxiety, and PTSD can be quite awful, but only the most severely affected patients (single-digit percentages of the total patient population) would likely be willing to receive a brain implant. And both of these companies are fully aware of that, which is why neither of them do brain implants.

But, even if you aren’t directly placing wires into the brain, there is still some room to play with how invasive you actually are. I think it’d be a useful exercise to discuss both Nudge and Forest’s approaches—the former non-invasive, the latter invasive (albeit slightly less invasive than a Neuralink, which goes directly into the brain parenchyma)— because they illustrate an interesting dichotomy I’ve found amongst neurotech startups: the degree to which they are attempting to ‘fight’ physics.

At the more invasive end, there’s Forest Neurotech. Forest was founded in October 2023 by two Caltech scientists—Sumner Norman and Tyson Aflalo—alongside Will Biederman from Verily. They’re structured as a nonprofit Focused Research Organization and backed by $50 million from Eric and Wendy Schmidt, Ken Griffin, ARI, James Fickel, and the Susan & Riley Bechtel Foundation. Their approach relies on an ultrasound device, built on Butterfly Network’s ultrasound-on-chip technology, that sits inside the skull but outside the brain’s dura mater; this is also called an ‘epidural implant’. Still invasive, but again, not touching the brain!

At the less invasive end, there’s Nudge, which raised $100M in July 2025 and has Fred Ehrsam, the co-founder of Coinbase, as part of its founding team. They also have an ultrasound device, but theirs is entirely non-invasive, and comes with a nice blog post describing exactly what it is: …a high channel count, ultrasound phased array, packed into a helmet structure that can be used in an MRI machine.

So, yes, both of these are essentially focused ultrasound devices meant for neural stimulation, though I should add that Forest’s device is also capable of imaging. But, despite the surface similarities, one distinct split between the two is that Nudge is attempting to fight physics a lot more than Forest is.

Why? Because they must deal with the skull.

Nudge’s device works by sending out ultrasound waves from an array of transducers, timed so precisely that the waves constructively interfere at a single millimeter-scale point deep in the brain, stimulating a specific neuron population (usually millions of neurons). It is not dissimilar to the principle behind noise-cancelling headphones, but in reverse: instead of waves cancelling each other out, they add up. The hope is that all the peaks of the waves arrive at the same spot at the same moment (constructive interference), producing a region of high acoustic pressure that can change brain activity. As a side note: you’d think this works by stimulating neurons! But apparently it can work via either stimulation or inhibition, depending on how the ultrasound is set up.
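The timing trick described above can be sketched in a few lines. This is a minimal 2D illustration under loud assumptions: a uniform medium with a single speed of sound (1540 m/s, a common soft-tissue value), a made-up 8-element linear array, a 500 kHz drive frequency, and no skull at all. The helper names are hypothetical, and this is not Nudge’s actual beamforming code; it only shows why firing each element with the right delay makes the waves add up at one point.

```python
import cmath
import math

SPEED_OF_SOUND = 1540.0  # m/s; assumed uniform soft tissue, no skull
FREQUENCY = 500e3        # Hz; assumed drive frequency

def focusing_delays(elements, focus):
    """Per-element firing delays (s) so all wavefronts reach `focus` together."""
    distances = [math.dist(e, focus) for e in elements]
    farthest = max(distances)
    # The farthest element fires first (zero delay); nearer elements wait.
    return [(farthest - d) / SPEED_OF_SOUND for d in distances]

def focal_amplitude(elements, focus, delays):
    """Magnitude of the summed unit-amplitude waves at `focus`.

    Each element contributes a phasor whose phase is set by its arrival time;
    in-phase arrivals add up to len(elements), out-of-phase ones partly cancel.
    """
    total = 0j
    for e, delay in zip(elements, delays):
        arrival = delay + math.dist(e, focus) / SPEED_OF_SOUND
        total += cmath.exp(2j * math.pi * FREQUENCY * arrival)
    return abs(total)

# Hypothetical 8-element linear array along x (metres), focus 6 cm deep.
elements = [(-0.035 + 0.01 * i, 0.0) for i in range(8)]
focus = (0.0, 0.06)

focused = focal_amplitude(elements, focus, focusing_delays(elements, focus))
unfocused = focal_amplitude(elements, focus, [0.0] * len(elements))
print(f"focused: {focused:.2f} / {len(elements)}, unfocused: {unfocused:.2f}")
```

With the computed delays every wave arrives in phase, so the amplitude at the focus equals the element count; firing all elements at once gives a much weaker sum, because the different path lengths leave the waves out of phase.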

How is the Nudge approach fighting physics?

First, there’s absorption. The skull soaks up a substantial chunk of the emitted ultrasound energy and converts it into heat. One study found that the skull causes 4.7 to 7 times more attenuation than the scalp and brain tissue combined.

Second, aberration. Because the skull varies in thickness, density, and internal structure across its surface, different parts of your ultrasound wavefront travel at different speeds, so, by the time the waves reach the brain, they’re no longer in phase. If the whole point of focused ultrasound is getting all your waves to constructively interfere at a single point, the skull messes that up, and the intended focal spot gets smeared, shifted, or might not form properly at all.

And, finally, the skull varies enormously between individuals. The “skull density ratio”—a metric that captures how much trabecular (spongy) bone versus cortical (dense) bone you have—differs from person to person, and it dramatically affects how well ultrasound gets through.

Now, to be clear, Nudge is aware of all of these things, and they’ve structured their device to fight all of these problems. For example, Nudge talks a fair bit about how their device is MRI-compatible. This is great! If you want to correct for aberrations (and for everyone’s brain being a different shape), you need to know what you’re correcting for, which means you need a detailed 3D model of that specific patient’s skull, which means you need an MRI (or, better, a CT scan). You image the skull, you build a patient-specific acoustic model, you compute the corrections needed to counteract the distortions, and then you program those corrections into your transducer array. Problem solved!

Well, maybe. Fighting physics is a difficult problem, and we’ll see what they come up with. There is already an FDA-approved focused ultrasound device, similar to Nudge’s, that has been used in thousands of surgeries and can target the brain with millimeter-scale accuracy (albeit for ablating brain tissue, not stimulating it, but the physics are the same!). Still, it is an open question whether Nudge can dramatically improve on its precision and convenience, as would be needed to make the approach useful for mental health applications.
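The correction steps described above (image the skull, build a patient-specific model, compute per-element corrections) can be illustrated with a deliberately crude straight-ray model. Everything here is an assumption for illustration: the speeds of sound (1540 m/s in soft tissue, 2800 m/s in bone, though real skull values vary a lot), the 500 kHz frequency, and the per-element path lengths and skull thicknesses, which are invented stand-ins for what a CT or MRI would provide. It ignores absorption, refraction, and scattering, so it shows only the phase-aberration part of the problem.

```python
import cmath
import math

C_TISSUE = 1540.0  # m/s; assumed soft-tissue speed of sound
C_SKULL = 2800.0   # m/s; assumed bone speed of sound (real skulls vary widely)
FREQUENCY = 500e3  # Hz; assumed drive frequency

def travel_time(path_length, skull_thickness):
    """Straight-ray travel time: part of the path in bone, the rest in tissue."""
    return skull_thickness / C_SKULL + (path_length - skull_thickness) / C_TISSUE

def delays_for(travel_times):
    """Fire the slowest path first so every wave arrives at the same moment."""
    slowest = max(travel_times)
    return [slowest - t for t in travel_times]

def focal_amplitude(delays, travel_times):
    """Magnitude of the summed unit waves at the focus (max = element count)."""
    return abs(sum(cmath.exp(2j * math.pi * FREQUENCY * (d + t))
                   for d, t in zip(delays, travel_times)))

# Hypothetical per-element geometry, standing in for a patient's CT-derived model.
path_lengths = [0.065, 0.062, 0.060, 0.060, 0.062, 0.065]           # metres
skull_thickness = [0.0050, 0.0072, 0.0061, 0.0085, 0.0058, 0.0069]  # metres

actual = [travel_time(p, s) for p, s in zip(path_lengths, skull_thickness)]
naive = delays_for([travel_time(p, 0.0) for p in path_lengths])  # pretends there is no skull
corrected = delays_for(actual)                                   # uses the skull model

print(f"skull ignored:   {focal_amplitude(naive, actual):.2f} / {len(actual)}")
print(f"skull corrected: {focal_amplitude(corrected, actual):.2f} / {len(actual)}")
```

On these invented numbers, ignoring the skull leaves the focus at a bit over half its possible amplitude, while the model-based delays restore it fully. In reality the correction is only as good as the skull model, which is exactly where the difficulty lies.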

On the other hand, Forest, by bypassing the skull, is nearly assured of hitting the brain regions they most want, potentially reaching accuracies at the micron scale. Remember that these differences cube: a 1.5-millimeter-wide voxel holds (1500/150)^3 = 1,000 times as many neurons as a 150-micron-wide voxel. So it’s safe to say that the Forest device is, theoretically, 2-3 orders of magnitude more precise in the volumes it interacts with than Nudge is. Now, Forest still isn’t exactly an easy bet, given that they have to power something near an organ that really, really doesn’t like to get hot, figure out implant biocompatibility, and solve a bunch of other problems that come alongside invasive neurotech devices. But they at least do not have to fight the skull, and are thus assured a high degree of precision.
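The cubing argument above, written out (assuming, for simplicity, that neuron density is the same in both voxels, so neuron count scales with volume):

```latex
\frac{N_{1.5\,\text{mm}}}{N_{150\,\mu\text{m}}}
  = \left(\frac{1500\,\mu\text{m}}{150\,\mu\text{m}}\right)^{3}
  = 10^{3} = 1{,}000
```

A tenfold difference in linear resolution is therefore three orders of magnitude in volume; a smaller fivefold linear gap would still give 5^3 = 125, roughly two orders, which is one way to read the 2-3 range.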

There is, of course, a reward for Nudge’s trouble. Nudge, if they succeed, also gets access to a much larger potential patient population, since no surgery is needed, whereas Forest must limit itself to a smaller, more desperate demographic.

As with anything in biology, there is an immense amount of nuance I am missing in this explanation. People actually in the neurotech field are likely at least a little annoyed with the above explanation, because it does leave out something important in this Nudge versus Forest, non-invasive versus invasive, physics-fighting versus physics-embracing debate: how much does it all matter anyway?

Do they know whether their advantages translate to clinical benefit?

The brain-computer interface field is in a strange epistemic position where devices are being built to modulate brain regions whose exact anatomical boundaries aren’t agreed on (and may even diverge between individuals!), using mechanisms that aren’t fully understood, for conditions whose neural circuits are still being studied.

Because of this, despite all the problems I’ve listed out with going through the skull, Nudge will almost certainly have some successful clinical readouts. Why? It has nothing to do with the team at Nudge being particularly clever, but rather, because there is already existing proof that non-invasive ultrasound setups somehow work for some clinically relevant objectives.

Nudge is fun to refer to because there’s a lot of online attention on them, but there are other players in the ultrasound stimulation space too, ones who are more public with their clinical results. SPIRE Therapeutics is one such company, and they, or at least people associated with the company (Thomas S. Riis), have papers demonstrating tremor alleviation (n=3), chronic pain reduction (n=20), and, most relevant to this whole discussion, depressive symptom improvement (n=22, randomized and double-blind!), all using their noninvasive ultrasound device.

How is this possible? How do these successful results square with the skull problems from earlier?

Clearly, something is getting through the skull, and it seems to be having some clinically significant effect. Because of this, it may well be that the relatively broad stimulation of Nudge, SPIRE, and others like them is, in fact, perfectly fine, and that being incredibly precise is simply not worth the effort. All that said, it is hard to give Forest a fair trial here, since they are basically the only ones going the invasive route for ultrasound, and their clinical trials (which use noninvasive devices) only began in early 2025. Maybe their results will be spectacular; I’d recommend watching the appearance of Sumner (Forest’s former CEO) on Ashlee Vance’s podcast to learn more about early results there.

But really, this debate between invasive and non-invasive belongs in the previous section, because the point I am trying to make here is broader than these two companies. What I’m really gesturing at is that being really good at [X popular neurotech metric] doesn’t alone equal something better! This is as true for precision as it is for everything else.

Staying on the example of precision, consider the absolute dumbest possible way you could approach brain stimulation: simply wash the entire brain with electricity and hope for the best.

This is, more or less, what electroconvulsive therapy (ECT) does. Electrodes are placed on your scalp, a generalized seizure is induced, and you repeat this a few times a week. You are, in the most literal sense, overwhelming the entire brain with synchronized electrical activity. And yet, despite the insane lack of specificity, ECT remains the single most effective treatment we have for severe, treatment-resistant depression. Response rates hover around 50-70% in patients for whom nothing else has worked, and some outcomes are remarkable; one review paper states: “For the primary outcome of all-cause mortality, ECT was associated with a 30% reduction in overall mortality.” For some presentations, like depression with psychotic features, catatonia, or acute suicidality, it is essentially first-line.

This should be deeply humbling for anyone looking into the neuromodulation space. There are companies raising hundreds of millions of dollars to hit specific brain targets with millimeter, even micron precision, and meanwhile, the most effective neurostimulation-for-depression approach we’ve ever discovered involves no targeting whatsoever. Now, of course, there are genuine downsides to the ECT approach (cognitive side effects, the need for anesthesia, the inconvenience of repeated hospital visits, obviously doesn’t work for every neuropsychiatric disorder) that make it worth pursuing alternatives! But it does suggest that the relationship between targeting precision and clinical outcome is much more complex than you’d otherwise assume.

Consider the opposite failure mode. The early depression trials of deep brain stimulation (DBS), the most spatiotemporally precise neurostimulation method currently available, are instructive here. Researchers identified what they believed was “the depression circuit,” implanted electrodes in that exact area, delivered stimulation, and then watched as several major trials burned tens of millions of dollars on null results. Most infamously, the BROADEN trial (targeting the subcallosal cingulate) and the RECLAIM trial (targeting the ventral capsule/ventral striatum) both failed their primary endpoints.

Yet DBS is FDA-approved for Parkinson’s treatment and is frequently used to treat OCD. Each indication is a world unto itself in how amenable it is to ‘precision’ being a useful metric.

But again, this point extends beyond precision.

As a second example, consider the butcher number, a metric coined by the Caltech neuroscientist Markus Meister, which captures the number of neurons destroyed for each neuron recorded. Now, you’d ideally like to reduce the butcher number, because killing neurons is (probably) bad. And one way you could reliably reduce it is by simply making your electrodes thinner and more flexible. This is at least part of Neuralink’s thesis: their polymer threads are 5 to 50 microns wide and only 4 to 6 microns thick (dramatically smaller than the Utah array’s 400-micron-diameter electrodes!), and thus almost certainly have a low butcher number.

Here’s the Neuralink implant:

And here’s the Utah array:

But does having a lower butcher number actually translate to better clinical outcomes? As far as I can tell, nobody knows! It’s largely unstudied! It’s conceivable that yes, lowering this number is useful, but surely there is a point where the priority of the problem dramatically drops compared to the litany of other small terrors that plague most neurotech startups.

The point here is not that the butcher number is useless. The point also isn’t that precision is useless. The point is that the relationship between any given engineering metric and clinical success (in your indication) is rarely as straightforward as anyone hopes, and it’s worth considering whether that relationship has actually been established before believing that success on the metric is at all useful.

Could this be done without touching the central nervous system?

Finally, something that came up repeatedly among the neurotech folks I talked to was that people consistently underestimate how extraordinarily adaptable the peripheral nervous system is. For example, a company that claims to, say, automatically interpret commands to a digital system via EEG should probably make absolutely certain that attaching an electromyography device to a person’s forearm (and training them to use it) wouldn’t accomplish the exact same thing.

In fact, there was a company that did exactly this: CTRL-labs, a New York City-based startup. They came up over and over again in my conversations as a prime example of a team solving something very useful in a way that completely avoided the horrifically challenging parts of touching the brain. Their device was a simple wristband that read neuromuscular signals at the wrist (via electromyography, or EMG) to control external devices. Here’s a great video of it in action.

Now, if CTRL-labs was so great, what happened to their technology? They were acquired by Meta in 2019, joining Facebook Reality Labs. And if you look at the Twitter account of CTRL-labs’ ex-CEO (now a VP at Meta), you can see that he recently retweeted a September 2025 podcast with Mark Zuckerberg, in which Mark says that their next generation of glasses will include an EMG band capable of allowing you to type, hands free, purely by moving your facial muscles.

It’s not much of a stretch to imagine that this is based on CTRL-labs’ work! And, by the time I finally finished this essay, the device had a dedicated Meta page!

What about something that exists today?

Another startup that multiple people were exuberant about was Augmental. Their device is called the ‘MouthPad^’, and a blurb from their site best describes it:

The MouthPad^ is smart mouthwear that allows you to control your phone, computer, and tablet hands-free. Perched on the roof of your mouth, the device converts subtle head and tongue gestures into seamless cursor control and clicks. It’s virtually invisible to the world — but always available to you.

And here’s a wild video of a 19-year-old quadriplegic using this device to interact with a computer and even write code.

Isn’t this insane? I remember being shocked by the Neuralink demo videos showing paralyzed patients controlling cursors on screens. But this is someone doing essentially the same thing! All of it works by exploiting both the tongue, which happens to have an extremely high density of nerve endings and remarkably fine motor control, and the brain, which is strikingly adaptable to novel input/output channels.

Now, fair enough, a device like Augmental’s cannot do a lot of things. For someone with complete locked-in syndrome, there really may be no alternative to inserting a wire into the brain. And in the limit case of applications that genuinely require reading (or modifying!) the content of thought, the periphery again won’t cut it. But for a surprising range of use cases, the peripheral route seems to offer a dramatically better risk-reward tradeoff, and it feels consistently under-appreciated when people are mentally pricing how revolutionary a new neurotech startup is.

Conclusion

This piece has been in production for the last five months and, as such, lots of discarded bits of it are lying on the cutting room floor. There are other things, not mentioned in this essay, that I think are also worth really pondering, but either I couldn’t come up with a big, universal statement about the takeaway, or the point applies only to a small subset of devices. I’ve attached three such things in the footnotes.

Before ending, I’d like to repeat the sentiment I mentioned at the start: this field is complicated. A lot of the readers of this blog come from the more cell-biology or drug-discovery side of the life sciences, and may naturally assume that they can safely use that mental framework to grasp the neurotech field. I once shared this optimism, but I no longer do. After finishing this essay, I now believe that the relevant constraints in this domain come from such an overwhelming number of directions that assessing a neurotech company bears little resemblance to most other questions in biology, and more closely resembles assessing a small nation’s chances of surviving a war. The personality required to perform such a feat matches the archetype of person I’ve found working in this field, all of whom display a startling degree of scientific omniscience that, in any other field, would be considered extraordinary, but here amounts to basic competence. It would be impossible to recreate these people’s minds in anything that isn’t a seven-hundred-page text written in ten-point font, but I hope this essay serves as a rough first approximation.

Think about how they are powering the device. Brains really, really don’t like heat. The FDA limit is that an implant in or touching the brain can be at most 1°C warmer than the surrounding tissue. So, if a device promises to do a lot of edge compute and is even slightly invasive, it is worth worrying about this.

Think about whether they are closed-loop or open-loop. An open-loop technology intervenes on the brain without taking brain state into account, like ECT or Prozac. A closed-loop device reads neural activity and adjusts its intervention in real time. Many companies gesture toward closed-loop as a future goal without explaining how they’ll get there. You might think this is a reason to be especially optimistic about devices that can easily handle both reading and writing at the same time, because their pathway to closed-loop is technically much cleaner. But again, how much does ‘continuous closed loop’ matter, as opposed to a write-only device that is rarely calibrated via an MRI? Nobody knows!

Think about how they plan to deal with the specter of China’s stranglehold on the parts they need, and with China’s rapidly advancing neurotech industry. This is a surprisingly big problem, and while there is almost certainly enough material here for its own section, I ended up not feeling super confident about the takeaway. Free article idea for those reading!

And there’s almost certainly a lot more that I’m not even thinking about, because I’m just not aware of it.
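To make the open-loop versus closed-loop distinction from the notes above concrete, here is a toy simulation. The ‘brain’ is a hypothetical single biomarker that drifts upward each step and is pushed down by stimulation; nothing about it is physiological, and the dose, drift, target, and gain values are arbitrary. The closed-loop controller is a plain proportional controller, chosen only because it is the simplest way to show ‘read, then adjust’, not because any company uses it.

```python
from dataclasses import dataclass

@dataclass
class ToyBrain:
    """Hypothetical biomarker that rises by `drift` each step unless stimulated.
    Purely illustrative; not a model of real neural dynamics."""
    biomarker: float = 10.0
    drift: float = 0.5

    def step(self, stimulation: float) -> float:
        self.biomarker = max(0.0, self.biomarker + self.drift - stimulation)
        return self.biomarker

def open_loop(brain: ToyBrain, steps: int, dose: float) -> list[float]:
    """Same dose every step, without ever reading the brain's state."""
    return [brain.step(dose) for _ in range(steps)]

def closed_loop(brain: ToyBrain, steps: int, target: float, gain: float) -> list[float]:
    """Read the biomarker, then dose in proportion to how far it sits above target."""
    readings = []
    for _ in range(steps):
        error = brain.biomarker - target
        readings.append(brain.step(max(0.0, gain * error)))
    return readings

open_trace = open_loop(ToyBrain(), steps=50, dose=0.3)
closed_trace = closed_loop(ToyBrain(), steps=50, target=2.0, gain=0.5)
print(f"open loop ends at {open_trace[-1]:.2f}, closed loop ends at {closed_trace[-1]:.2f}")
```

Here the fixed dose is too small for the drift, so the open-loop biomarker climbs to 20, while the proportional controller settles at 3 (a classic steady-state offset from fighting constant drift, slightly above its target of 2). A fixed dose that happened to equal the drift (0.5) would hold the biomarker flat at its starting value with no feedback at all, which is one way of seeing why the value of continuous closed-loop control is an empirical question rather than a given.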

The truth behind the 2026 J.P. Morgan Healthcare Conference

Abhishaike Mahajan — Mon, 12 Jan 2026 16:40:20 GMT

Note: I am co-hosting an event in SF on Friday, Jan 16th.

In 1654, a Jesuit polymath named Athanasius Kircher published Mundus Subterraneus, a comprehensive geography of the Earth’s interior. It had maps and illustrations and rivers of fire and vast subterranean oceans and air channels connecting every volcano on the planet. He wrote that “the whole Earth is not solid but everywhere gaping, and hollowed with empty rooms and spaces, and hidden burrows.” Alongside comments like this, Athanasius identified the legendary lost island of Atlantis, pondered where one could find the remains of giants, and detailed the kinds of animals that lived in this lower world, including dragons. The book was based entirely on secondhand accounts (travelers’ tales, miners’ reports, classical texts), so it was as comprehensive as it could possibly have been.

But Athanasius had never been underground and neither had anyone else, not really, not in a way that mattered.

Today, I am in San Francisco, the site of the 2026 J.P. Morgan Healthcare Conference, and it feels a lot like Mundus Subterraneus.

There is ostensibly plenty of evidence to believe that the conference exists, that it actually takes place from January 12 to January 16, 2026, at the Westin St. Francis Hotel, 335 Powell Street, San Francisco, and that it has done so for the last forty-four years, just like everyone has told you. There is a website for it, there are articles about it, there are dozens of AI-generated posts on LinkedIn about how excited people were about it. But I have never met anyone who has actually been inside the conference.

I have never been approached by one, or seated next to one, or introduced to one. They do not appear in my life. They do not appear in anyone’s life that I know. I have put my boots on the ground to rectify this, and asked around, first casually and then less casually, “Do you know anyone who has attended the JPM conference?”, and then they nod, and then I refine the question to be, “No, no, like, someone who has actually been in the physical conference space”, then they look at me like I’ve asked if they know anyone who’s been to the moon. They know it happens. They assume someone goes. Not them, because, just like me, ordinary people like them do not go to the moon, but rather exist around the moon, having coffee chats and organizing little parties around it, all while trusting that the moon is being attended to.

The conference has six focuses: AI in Drug Discovery and Development, AI in Diagnostics, AI for Operational Efficiency, AI in Remote and Virtual Healthcare, AI and Regulatory Compliance, and AI Ethics and Data Privacy. There is also a seventh theme, ‘Keynote Discussions’, made up of three talks: The Future of AI in Precision Medicine, Ethical AI in Healthcare, and Investing in AI for Healthcare. Somehow, every single thematic concept at this conference has converged onto artificial intelligence as the only thing worth seriously discussing.

Isn’t this strange? Surely, you must feel the same thing as me, the inescapable suspicion that the whole show is being put on by an unconscious Chinese Room, its only job to pass semi-legible symbols over to us with no regard for what they actually mean. In fact, this pattern is consistent across not only how the conference communicates itself, but also how biopharmaceutical news outlets discuss it.

Each year, Endpoints News and STAT and BioCentury and FiercePharma all publish extensive coverage of the J.P. Morgan Healthcare Conference. I have read the articles they have put out, and none of it feels like it was written by someone who actually was at the event. There is no emotional energy, no personal anecdotes, all of it has been removed, shredded into one homogeneous, smoothie-like texture. The coverage contains phrases like “pipeline updates” and “strategic priorities” and “catalysts expected in the second half.” If the writers of these articles ever approach a human-like tenor, it is in reference to the conference’s “tone”. The tone is “cautiously optimistic.” The tone is “more subdued than expected.” The tone is “mixed.” What does this mean? What is a mixed tone? What is a cautiously optimistic tone? These are not descriptions of a place. They are more accurately descriptions of a sentiment, abstracted from any physical reality, hovering somewhere above the conference like a weather system.

I could write this coverage. I could write it from my horrible apartment in New York City, without attending anything at all. I could say: “The tone at this year’s J.P. Morgan Healthcare Conference was cautiously optimistic, with executives expressing measured enthusiasm about near-term catalysts while acknowledging macroeconomic headwinds.” I made that up in fifteen seconds. Does it sound fake? It shouldn’t, because it sounds exactly like the coverage of a supposedly real thing that has happened every year for the last forty-four years.

Speaking of the astral body I mentioned earlier, there is an interesting historical parallel to draw there. In 1835, the New York Sun published a series of articles claiming that the astronomer Sir John Herschel had discovered life on the moon. Bat-winged humanoids, unicorns, temples made of sentient sapphire, that sort of stuff. The articles were detailed, describing not only these creatures’ appearance, but also their social behaviors and mating practices. All of these cited Herschel’s observations through a powerful new telescope. The series was a sensation. It was also, obviously, a hoax, the Great Moon Hoax as it came to be known. Importantly, the hoax worked not because the details were plausible, but because they had the energy of genuine reporting: Herschel was a real astronomer, and telescopes were real, and the moon was real, so how could any combination that involved these three be fake?

To clarify: I am not saying the J.P. Morgan Healthcare Conference is a hoax.

What I am saying is that neither I nor anybody else can tell the difference between the conference coverage and a very well-executed hoax. Consider that the Great Moon Hoax walked a very fine tightrope between giving the appearance of seriousness and not giving away too many details that’d let the cat out of the bag. Here, the conference rhymes.

For example: photographs. You would think there would be photographs. The (claimed) conference attendees number in the thousands, many of them with smartphones, all of them presumably capable of pointing a camera at a thing and pressing a button. But the photographs are strange, walking that exact snickering line that the New York Sun walked. They are mostly photographs of the outside of the Westin St. Francis, or they are photographs of people standing in front of step-and-repeat banners, or they are photographs of the schedule, displayed on a screen, as if to prove that the schedule exists. But photographs of the inside with the panels, audience, the keynotes in progress; these are rare. And when I do find them, they are shot from angles that reveal nothing, that could be anywhere, that could be a Marriott ballroom in Cleveland.

Is this a conspiracy theory? You can call it that, but I have a very professional online presence, so I personally wouldn’t. In fact, I wouldn’t even say that the J.P. Morgan Healthcare Conference is not real, but rather that it is real, but not actually materially real.

To explain what I mean, we can rely on economist Thomas Schelling to help us out. Sixty-six years ago, Schelling proposed a thought experiment: if you had to meet a stranger in New York City on a specific day, with no way to communicate beforehand, where would you go? The answer, for most people, is Grand Central Station, at noon. Not because Grand Central Station is special. Not because noon is special. But because everyone knows that everyone else knows that Grand Central Station at noon is the obvious choice, and this mutual knowledge of mutual knowledge is enough to spontaneously produce coordination out of nothing. Grand Central Station, and places just like it, are what’s known as Schelling points.

Schelling points appear when they are needed, burnt into our genetic code, Pleistocene subroutines running on repeat, left over from when we were small and furry and needed to know, without speaking, where the rest of the troop would be when the leopards came. The J.P. Morgan Healthcare Conference, on the second week of January, every January, Westin St. Francis, San Francisco, is what happened when that ancient coordination instinct was handed an industry too vast and too abstract to organize by any other means. Something deep drives us to gather here, at this time, at this date.

To preempt the obvious questions: I don’t know why this particular location or time or demographic were chosen. I especially don’t know why J.P. Morgan of all groups was chosen to organize the whole thing. All of this simply is.

If you find any of this hard to believe, observe that the whole event is, structurally, a religious pilgrimage, and has all the quirks you may expect of a religious pilgrimage. And I don’t mean that as a metaphor, I mean it literally, in every dimension except the one where someone official admits it, and J.P. Morgan certainly won’t.

Consider the elements. A specific place, a specific time, an annual cycle, a journey undertaken by the faithful, the presence of hierarchy and exclusion, the production of meaning through ritual rather than content. The hajj requires Muslims to circle the Kaaba seven times. The J.P. Morgan Healthcare Conference requires devotees of the biopharmaceutical industry to slither into San Francisco for five days, nearly all of them—in my opinion, all of them—never actually entering the conference itself, but instead orbiting it, circumambulating it, taking coffee chats in its gravitational field. The Kaaba is a cube containing, according to tradition, nothing, an empty room, the holiest empty room in the world. The Westin St. Francis is also, roughly, a cube. I am not saying these are the same thing. I am saying that we have, as a species, a deep and unexamined relationship to cubes.

This is my strongest theory so far. That the J.P. Morgan Healthcare Conference isn’t exactly real or unreal, but a mass-coordination social contract that has been unconsciously signed by everyone in this industry, transcending the need for an underlying referent.

My skeptical readers will protest at this, and they would be correct to do so. The story I have written out is clean, but it cannot be fully correct. Thomas Schelling was not so naive as to believe that Schelling points spontaneously generate out of thin air; there is always a reason, a specific, grounded reason, that they become the low-energy metaphysical basins that they are. Grand Central Station is special because of the cultural gravitas it has accumulated through popular media. Noon is special because that is when the sun reaches its zenith. The Kaaba was worshipped because it was not some arbitrary cube; the cube itself was special because it contained the Black Stone, set into the eastern corner, a relic that predates Islam itself and that some traditions claim fell from heaven.

And there are signs, if you know where to look, that the underlying referent for the Westin St. Francis’s status as a gathering place is physical. Consider the heat. It is January in San Francisco, usually brisk, yet the interior of the Westin St. Francis maintains a distinct, humid microclimate. Consider the low-frequency vibration in the lobby that ripples the surface of water glasses, but doesn’t seem to register on local, public seismographs. There is something about the building itself that feels distinctly alien. But, upon standing outside the building for long enough, you’ll have the nagging sensation that it is not something about the hotel that feels off, but rather, what lies within, underneath, and around the hotel.

There’s no easy way to sugarcoat this, so I’ll just come out and say it: it is possible that the entirety of California is built on top of one immensely large organism, and the particular spot in which the Westin St. Francis Hotel stands—335 Powell Street, San Francisco, 94102—is located directly above its beating heart. And that this is the primary organizing focal point for both the location and entire reason for the J.P. Morgan Healthcare Conference.

I believe that the hotel maintains dozens of meter-thick polyvinyl chloride plastic tubes that have been threaded down through the basement, through the bedrock, through geological strata, and into the cardiovascular system of something that has been lying beneath the Pacific coast since before the Pacific coast existed. That the hotel is a singular, thirty-two-story central line. That, during the week of the conference, hundreds of gallons of drugs flow through these tubes, into the pulsating mass of the being, pouring down arteries the size of canyons across California. The dosing takes five days; hence the length of the conference.

And I do not believe that the drugs being administered here are simply sedatives. They are, in fact, the opposite of sedatives. The drugs are keeping the thing beneath California alive. There is something wrong with the creature, and a select group of attendees at the J.P. Morgan Healthcare Conference have become its primary caretakers.

Why? The answer is obvious: there is nothing good that can come from having an organic creature that spans hundreds of thousands of square miles suddenly die, especially if that same creature’s mass makes up a substantial portion of the fifth-largest economy on the planet, larger than India, larger than the United Kingdom, larger than most countries that we think of as significant. Maybe letting the nation slide off into the sea was an option at one point, but not anymore. California produces more than half of the fruits, vegetables, and nuts grown in the United States. California produces the majority of the world’s entertainment. California produces the technology that has restructured human communication. Nobody can afford to let the whole thing collapse.

So, perhaps it was decided that California must survive, at least for as long as possible. Hence Amgen. Hence Genentech. Hence the entire biotech revolution, which we are taught to understand as a triumph of science and entrepreneurship, a story about venture capital and recombinant DNA and the genius of the California business climate. The story is not false, only incomplete. The revolution happened, above all else, because the creature needed medicine, the old methods of making medicine were no longer adequate, and someone decided that the only way to save the patient was to create an entire industry dedicated to its care.

Why is drug development so expensive? Because the real R&D costs are for the primary patient, the being underneath California, and human applications are an afterthought, a way of recouping investment. Why do so many clinical trials fail? For the same reason: the drugs are not meant for our species. Why is the industry concentrated in San Francisco, San Diego, Boston? Because these are monitoring stations, places where other intravenous lines have been drilled into other organs, other places where the creature surfaces close enough to reach.

Finally, consider the hotel itself. The Westin St. Francis was built in 1904, and, throughout its entire existence, it has never, ever, even once, closed or stopped operating. The 1906 earthquake leveled most of San Francisco, and the Westin St. Francis did not fall. It was damaged, yes, but it did not fall. The 1989 Loma Prieta earthquake killed sixty-three people and collapsed a section of the Bay Bridge. Still, the Westin St. Francis did not fall. It cannot fall, because if it falls, the central line is severed, and if the central line is severed, the creature dies, and if the creature dies, we lose California, and if we lose California, our civilization loses everything that California has been quietly holding together. And so the Westin St. Francis has hosted every single J.P. Morgan Healthcare Conference since 1983, has never missed one, has never even come close to missing one, and will not miss the next one, or the one after that, or any of the ones that follow.

If you think about it, this all makes a lot of sense. It may also seem very unlikely, but unlikely things have been known to happen throughout history. Athanasius Kircher’s Mundus Subterraneus had a section on the “seeds of metals,” a theory that gold and silver grew underground like plants, sprouting from mineral seeds in the moist, oxygen-poor darkness. This was wrong, but the intuition beneath it was not entirely misguided. We now understand that the Earth’s mantle is a kind of eternal engine of astronomical size, cycling matter through subduction zones and volcanic systems, creating and destroying crust. Athanasius was wrong about the mechanism, but right about the structure. The earth is not solid. It is everywhere gaping, hollowed with empty rooms, and it is alive.

A 2026 look at three bio-ML opinions I had in 2024

Abhishaike Mahajan — Wed, 07 Jan 2026 18:30:26 GMT

Note: I am in San Francisco right now and, in an extraordinary coincidence, I stumbled across two of the people whose work I mention in this article! Very grateful to John Bradshaw for chatting about reaction prediction and Gina El Nesr for chatting about molecular simulation.

A second note: while here in SF, I will be co-hosting an event on Friday, Jan 16th, from 6-9pm, w/ Tamarind Bio! It will be at Southern Pacific Brewing; here is the link to the invite. You should come by!

Introduction

There are two memories that I have to imagine are particularly heartwarming for any parent. One, seeing their child for the first time, and two, gleefully showing photographs of that child to an older version of that child, shouting, look how small you used to be! So small! Do you know how hard I worked to take care of you? You were so difficult! But it’s okay, because you were so, so tiny.

I will do something similar to this today. This blog has been operating for the exceptionally long period of 1.7~ years, which means I finally have blog posts that I wrote back in 2024 to resurface, dust off, and proudly present back to you, giving you an update on how things have shifted in the 1~ years since they were written.

I will do this for three articles, all from back when my cover images were stranger.

It’s fun looking back at these three in particular, because they all feel intellectually significant. All of them were, essentially, predictions of where a specific subfield of bio-ML may go. The first was the first time I’d ever seriously engaged with the small-molecule design space, the second was the same for the molecular dynamics space, and the third was about which startup plays are durable. Each one required multiple conversations with multiple people, many of whom I was talking to for the first time, and some of whom I continue talking to today. Nostalgic!

But why do this at all? It would be easy to write confidently about the future and then quietly memory-hole the predictions when they don’t pan out, which, to be clear, there’s nothing wrong with, and which I will likely do many times. This is a blog; nobody cares that much. Still, it is worth doing, purely because it forces me to wrap my head around what has changed since I last covered something, not merely everything that is new and exciting. This is a little boring, but it does feel like an important muscle to flex, for the same reason that it is important to do your A-B-C’s every few months: just making sure you’re still capable of accomplishing the fundamentals.

As for format: for each article, I’ll briefly recap the original thesis, look at what has actually happened since, and render some kind of verdict as to what went right/wrong. I’ll also attach a tl;dr at the top of each section.

Generative ML in chemistry is bottlenecked by synthesis

tl;dr: I was correct in a contrived sense. Arbitrary molecular synthesis is still hard, and the models still aren’t perfect at telling you good synthesis routes for whatever they produce. But what has changed is that a lot more money has flowed into making synthesis better outright, and, much more importantly, the space of ‘easily synthesizable molecules’ has slowly expanded from ~40B to ~80B, and will likely continue to climb. At a certain point, who cares about what is outside of that? Is it actually bottlenecking anyone?

Back in September 2024, I wrote an article arguing that generative ML in chemistry is bottlenecked by synthesis being slow, costly, or outright impossible. The thesis was not original in the slightest, and was clowned on in the r/chemistry subreddit for being so patently obvious that nobody could see how someone had possibly written 4,400~ words about it. This was very rude, but sadly, they were not wrong. It is pretty obvious.

My basic argument went something like this: creating proteins is easy. Every time I design a protein, I can just send the sequence to Twist Bioscience, have them create a plasmid, and cells will (almost) always pump out my protein. The same is not true for the rest of chemistry.

Of the 10^60 small molecules that theoretically exist, there is no ‘ribosome’ for creating them; each must undergo at least a somewhat custom synthesis process. Some chemicals are impossibly hard to create, some are easy to create, and lots lie on the spectrum between. A fun example of the former that I used in the article was erythromycin A, a now-common antibiotic that was originally isolated from a bacterium. From beginning to end, figuring out how to synthesize this molecule took 9 years.

And some molecules are even harder! After 40 or so years, a practical total synthesis of Paclitaxel, a chemotherapy agent, is still an ongoing research effort. In the meantime, we mostly rely on yew trees (and cultured yew cells) to make it for us.

This makes machine-learning in chemistry somewhat annoying, because it means your generative model can happily spit out thousands of candidate molecules to bind to some input protein, and 99% of them, perhaps 100% of them, are intractable to create given your non-infinite budget and time constraints.

When I first wrote the article, it seemed like there were two possible fixes on the horizon.

The first fix was that small-molecule models would become more ‘synthesis-aware’. What does that mean? Quoting from my article:

One definition could be low-step reaction pathways that require relatively few + commercially available reagents, have excellent PMI, and have good yield. Alongside this, we’d also like to know the full reaction pathway too, along with the ideal conditions of the reaction! It’s important to separate out this last point from the former; while reaction pathways are usually immediately obvious to a chemist, fine-tuning the conditions of the reactions can take weeks.

In this world, you may personally not know how to synthesize the bizarre stuff your model spits out, but, if your model is smart enough, perhaps it’d be able to helpfully provide all the steps needed to make it.
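The quoted criteria (few steps, good yield, good PMI) can be made concrete with a little arithmetic. Below is a minimal sketch, assuming a strictly linear route: overall yield is the product of the per-step yields, and process mass intensity (PMI) is the total mass of everything charged into the process (reactants, reagents, solvents, water) divided by the mass of product. The `Step` class, the example route, and the filter thresholds are all hypothetical, chosen for illustration rather than taken from any real synthesis-planning tool.

```python
from dataclasses import dataclass
from math import prod
from typing import List


@dataclass
class Step:
    """One step of a hypothetical linear synthesis route."""
    name: str
    yield_fraction: float  # isolated yield of this step, between 0 and 1
    input_mass_kg: float   # everything charged in: reactants, reagents, solvents, water


def overall_yield(route: List[Step]) -> float:
    # In a linear route losses compound, so overall yield is the product of step yields.
    return prod(step.yield_fraction for step in route)


def process_mass_intensity(route: List[Step], product_mass_kg: float) -> float:
    # PMI = total mass of all inputs / mass of final product; 1.0 would be a perfect process.
    return sum(step.input_mass_kg for step in route) / product_mass_kg


def passes_filter(route: List[Step], product_mass_kg: float,
                  max_steps: int = 5, min_yield: float = 0.2, max_pmi: float = 200.0) -> bool:
    # Hypothetical thresholds for a "synthesis-aware" filter, not taken from any real tool.
    return (len(route) <= max_steps
            and overall_yield(route) >= min_yield
            and process_mass_intensity(route, product_mass_kg) <= max_pmi)


route = [
    Step("amide coupling", 0.90, 50.0),
    Step("Suzuki coupling", 0.80, 40.0),
    Step("deprotection", 0.70, 30.0),
]
print(f"overall yield: {overall_yield(route):.3f}")      # overall yield: 0.504
print(f"PMI: {process_mass_intensity(route, 1.0):.1f}")  # PMI: 120.0
print(passes_filter(route, product_mass_kg=1.0))         # True
```

The compounding is why step count matters so much: ten steps at 80% each leave you with roughly 11% of what a perfect route would give (0.8^10 ≈ 0.107).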

The second fix was that arbitrary synthesis of molecules would simply become a lot easier, and that something akin to a ‘ribosome for chemical synthesis’ would miraculously be invented.

Now, there’s the obvious steelman to both of these: chemical screening libraries—or, chemical space that is known to be easily synthesizable—are quite large, and potentially contain all the useful stuff. So, maybe you don’t need models that generate molecules, or better synthesis; you just need models that can filter this pre-existing, known set of ‘reachable’ chemical space. From my article:

One example is Enamine REAL, which contains 40 billion compounds. And as a 2023 paper discusses, these ultra-large virtual libraries display a fairly high number of desirable properties. Specifically, dissimilarity to biolike compounds (implying a high level of diversity), high binding affinities to targets, and success in computational docking, all while still having plenty of room to expand.

With all this background context: how has this field changed in the 1.5 years since this article was published?

On the synthesis-aware modeling front: it may be somewhat interesting for you to learn that a single MIT professor named Connor Coley was—circa 2024 when I wrote that article—responsible for a rather significant chunk of the ML x synthesis literature I came across. As of 2026, this continues! And, unfortunately, a paper he published in mid-2025 that attempted to evaluate possible failure modes of these synthesis-aware generative molecular models contains this paragraph:

It is also natural to wonder if the task of reaction prediction has been “solved” to a meaningful degree. When using these models in practice, it quickly becomes apparent that the answer is a resounding no. In fact, when using reaction predictors in new domains, not only might a model make an incorrect prediction, it might hallucinate a product preposterous to a human chemist.

So, it does not immediately feel like there have been major breakthroughs in the past year, which I further confirmed with John Bradshaw, the first author of the paper, whom I coincidentally met up with the other day. Now, of course, I’m certain there has been some material progress, but little that is immediately legible to me.

Let’s move on. How are we doing with the second problem: improving our ability to synthesize arbitrary chemicals?

Curiously, there has been a huge flurry of startup activity here over the last year. onepot raised a $15M Series A in November 2025 to synthesize arbitrary molecules, Chemify raised a $50M Series B in October 2025 to synthesize arbitrary molecules, Excelsior Sciences raised a $95M Series A in December 2025 to synthesize arbitrary molecules. A pattern is emerging here, and I’m not even naming everyone who has started something in this space!

The optimistic read is that we’re witnessing the early stages of the long-prophesied chemical synthesis revolution, and that the combination of better robotics, improved reaction prediction, and some clever engineering is finally bearing fundamental fruit. Is that true? I don’t know! These things take time to play out.

Speaking of onepot—which is a synthesis-on-demand service startup—there’s a particularly illuminating text interview that the renowned Corin Wagen had with the founders of onepot (Daniil Boiko and Andrey Tyrin) back in December 2025 that may teach us something useful. A few interesting excerpts are as follows:

Andrei: I also think that synthesis is a very complicated problem, and I think we have unique insight on how to solve it from an interdisciplinary standpoint. So you can imagine the company would be trying to solve it just by improving the organic chemistry side of things, and that's a very reasonable approach, or there are companies that would really invest in developing very sophisticated models for synthesis, but in our perspective it's important to have all of the components, if that makes sense.
Both the organic chemistry side of things and the computer science side of things are intertwined. So that is very important here for the success of making synthesis automated basically.

This is, I think, a very fun take. Maybe there is no magic sauce that needs to be invented; rather, all the ML and chemical tools for (largely) solving synthesis are already out there, and someone just needs a broad-enough knowledge base to glue them all together. Happily, Corin presses on this a bit further:

Corin: Where do you guys see that your big advances have happened so far? So are you inventing new reactions? New instruments? Is the magic in the integration? What are you doing that other folks haven’t figured out yet?
Daniil: Yeah, it’s a good question. So, automation is really straightforward. When you start doing something very complex, I always think that I’m going to hit a wall in what I’m doing—everybody says it’s very hard—and then we start doing it and it just never happens. We just do it and still it’s fine all the way down. It’s a lot of work but still totally fine. So we don’t see much of a problem on the automation side there. We have to customize existing hardware a little bit, but it’s fine.
We do see a lot of gains on this tool ML layer. So Andrei probably could tell more about this, but it’s the reason why we have success-based pricing. So if we fail an experiment, we’re the ones who have to pay for it, which is very unfortunate. So there is very clear value from these models. I mean, you can literally calculate the economic impact of increasing your accuracy of the model by another 5%. It’s very clearly translated.
And on the agentic side, it’s another thing: if you could make a thought experiment and let’s say replicate one of the largest companies that works in enumerated library space, you would need to get all the reactions, all the protocols, and develop them from scratch. Just imagine the amount of effort that will go on there. You would need chemists working on hardware, setting up the reactions, doing hundreds of experiments for every single reaction, then analyzing the data and making conclusions, then optimizing again. It’s absolutely ridiculous.
Andrei: I also think that with our approach, we have the pieces that improve one another. So you know: one direction is we have the agents that have this broad intelligence, that have this literature knowledge, that can design high-level conditions for experiments. This is an exploration phase, a creative phase.
And then we also have another direction with custom models that are highly specialized to capture these patterns hidden in reaction data. They just complement each other really well: we have this general-layer intelligence and this specialized intelligence that captures things at the atom level. You know, you place a nitrogen somewhere, reactivity changes suddenly, and you need to find another route, maybe even get more steps there. It’s a bit counterintuitive for humans and for LLMs as well, so having this specialized intelligence is very important.

Neat! There is a bet here on natural-language LLMs being essential to the whole process. And, for what it is worth, there is plenty of precedent suggesting this is a directionally correct idea, here, and here, and here. Cool that someone started a company with that as an explicit part of the thesis.
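Daniil’s point about success-based pricing can be made concrete with some back-of-envelope arithmetic. If the vendor eats the cost of failed attempts, and each attempt independently succeeds with probability p, the expected number of attempts per delivered compound is 1/p, so the expected cost per compound is the attempt cost divided by p. The sketch below assumes, hypothetically, that a 5-point gain in model accuracy translates directly into a 5-point gain in experimental success rate; the dollar figures and order volume are made up.

```python
def expected_cost_per_success(cost_per_attempt: float, success_rate: float) -> float:
    """Expected spend per delivered compound when failed attempts are not billed.

    Assumes attempts are independent with a fixed success probability, so the
    number of attempts per success is geometric with mean 1 / success_rate.
    """
    if not 0.0 < success_rate <= 1.0:
        raise ValueError("success_rate must be in (0, 1]")
    return cost_per_attempt / success_rate


def savings_from_gain(cost_per_attempt: float, base_rate: float,
                      gain: float = 0.05, n_orders: int = 10_000) -> float:
    """Total saving over n_orders if the success rate rises by `gain`.

    Hypothetical simplification: a model-accuracy gain is treated as an equal
    gain in experimental success rate.
    """
    before = expected_cost_per_success(cost_per_attempt, base_rate)
    after = expected_cost_per_success(cost_per_attempt, base_rate + gain)
    return (before - after) * n_orders


# Made-up numbers: $1,000 per attempt, 60% baseline success, 10,000 orders.
print(f"{expected_cost_per_success(1000, 0.60):.2f}")  # 1666.67
print(f"{expected_cost_per_success(1000, 0.65):.2f}")  # 1538.46
print(f"{savings_from_gain(1000, 0.60):,.0f}")         # 1,282,051
```

Because cost scales as 1/p, the same five points are worth more the lower your baseline success rate is.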

Finally: how is the steelman I presented in the old article doing? Do we need arbitrary molecular synthesis? The definition of ‘easily accessible chemical space’ changes as reaction pathway design as a discipline gets better and better, and, when I last looked into the subject, this space was hovering at ~40B molecules.

As of September 2025, this space has been expanded to 83B, which specifically refers to the chemical space that Enamine, a very well-established chemical supplier, considers easy to make within 2-3 weeks. And, in April 2025, Recursion and Enamine announced a partnership to use Recursion’s MatchMaker tool (a model that assesses whether a small molecule is compatible with a specific protein binding pocket) to intelligently filter Enamine’s compound library (65B at the time) down to 10 enriched screening libraries, comprising over 15,000 newly synthesized compounds, designed to find binders to 100 drug targets.

This is, I think, exactly what you’d expect to happen if the steelman is correct. If easily accessible chemical space is large enough to contain all the good stuff, then you don’t need fancy synthesis; you simply need fancy filtering. As of today, I do not think there has been an update to this project, but I’m excited to see where this goes! If anyone from Recursion is reading this and would allow me to break the scoop here, contact me!
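The ‘fancy filtering’ idea has a simple computational shape: one streaming pass over an enormous enumerated library, keeping only the best-scoring compounds for each target. Below is a minimal sketch of that shape, not of MatchMaker itself; `score` is a hypothetical stand-in for whatever model predicts compatibility between a compound and a target’s binding pocket, and the SMILES strings, target names, and scores in the usage example are made up.

```python
import heapq
from typing import Callable, Dict, Iterable, List, Tuple

Scored = Tuple[float, str]


def top_k_per_target(
    library: Iterable[str],
    targets: List[str],
    score: Callable[[str, str], float],
    k: int,
) -> Dict[str, List[Scored]]:
    """Keep the k best-scoring compounds per target in a single pass over `library`.

    `score(smiles, target)` is a hypothetical stand-in for a trained model.
    Memory stays O(k * len(targets)) no matter how large the library is.
    """
    heaps: Dict[str, List[Scored]] = {t: [] for t in targets}
    for smiles in library:
        for target in targets:
            item = (score(smiles, target), smiles)
            heap = heaps[target]
            if len(heap) < k:
                heapq.heappush(heap, item)  # min-heap: heap[0] is the worst kept so far
            elif item > heap[0]:
                heapq.heapreplace(heap, item)  # evict the worst, keep the better one
    return {t: sorted(h, reverse=True) for t, h in heaps.items()}


# Made-up scores standing in for model predictions.
toy_scores = {
    ("CCO", "T1"): 0.10, ("CCO", "T2"): 0.90,
    ("c1ccccc1", "T1"): 0.80, ("c1ccccc1", "T2"): 0.20,
    ("CC(=O)O", "T1"): 0.50, ("CC(=O)O", "T2"): 0.60,
}
library = (smi for smi in ["CCO", "c1ccccc1", "CC(=O)O"])  # a generator, as if streamed from disk
picks = top_k_per_target(library, ["T1", "T2"], lambda s, t: toy_scores[(s, t)], k=2)
print(picks["T1"])  # [(0.8, 'c1ccccc1'), (0.5, 'CC(=O)O')]
print(picks["T2"])  # [(0.9, 'CCO'), (0.6, 'CC(=O)O')]
```

A bounded min-heap per target keeps memory proportional to k times the number of targets rather than to the library size; a real pipeline over tens of billions of compounds would additionally batch the scoring and spread it across many machines.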

So, to summarize: ML is still technically bottlenecked by synthesis, but there is an increasingly aggressive effort both to fix the synthesis problem outright and to figure out whether the ‘hard’ synthesis is even needed. Now, similar efforts almost certainly existed back when I wrote the article, but they seem much better capitalized and more numerous today.

Molecular dynamics data will be essential for the next generation of ML protein models

tl;dr: My thesis was somewhat accurate. Some ML + molecular dynamics (MD) models have been trained since the article, and the synthetic data used for them was somewhat helpful. But the core dataset problem that comes alongside MD, timescales that are too short, has not been solved, and it prevents MD from being extremely useful as training data. There is good effort being put towards fixing this, though! But my thesis was also too absolutist; non-MD models will likely continue to have their place.
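To give a sense of why short timescales are the sticking point, here is some back-of-envelope arithmetic on how many integration steps an MD simulation needs to cover a given span of physical time. It assumes a 2 fs timestep, a common choice when bonds to hydrogen are constrained; many of the slower events people care about, such as large conformational changes or ligand unbinding, are commonly quoted as taking microseconds to seconds or longer. Step count is only a proxy for cost, since wall-clock time also depends on system size and hardware.

```python
TIMESTEP_S = 2e-15  # assumed 2 fs integration timestep


def steps_needed(simulated_time_s: float, timestep_s: float = TIMESTEP_S) -> float:
    """Number of integration steps to cover `simulated_time_s` of physical time."""
    return simulated_time_s / timestep_s


for label, seconds in [("1 ns", 1e-9), ("1 us", 1e-6), ("1 ms", 1e-3), ("1 s", 1.0)]:
    print(f"{label}: {steps_needed(seconds):.0e} steps")
# 1 ns: 5e+05 steps
# 1 us: 5e+08 steps
# 1 ms: 5e+11 steps
# 1 s: 5e+14 steps
```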

The origin of rot

Abhishaike Mahajan — Tue, 30 Dec 2025 17:29:55 GMT

Note: I spent my holidays writing a bunch of biology-adjacent, nontechnical pieces. I’ll intermittently mix them between whatever technical thing I send out, much like how a farmer may mix sawdust into feed, or a compounding pharmacist, butter into bathtub-created semaglutide. This one is about history!

The book ‘Death with Interruptions’ is a 2005 speculative fiction novel written by Portuguese author José Saramago. It is about how, mysteriously, on January 1st of an unnamed year in an unnamed country, death ceases to occur. Everyone, save the Catholic Church, is initially very delighted with this. But as expected, the natural order collapses, and several Big Problems rear their ugly heads. I recommend reading it in full, but the synopsis is all I need to mention.

The situation described by Saramago is obviously impossible. Cells undergo apoptosis to keep tissues healthy; immune systems kill off infected or malfunctioning cells; predators and prey form a food chain that only works because things end.

But what you may find interesting is that what exactly happens after death has not always been so clear-cut. I don’t mean the religious aspect: the so-called thanatomicrobiome, the community of microbes that colonize and decompose a body after death, is not necessarily a given. And there is some evidence that, for a very, very long time, it simply did not exist at all. Perhaps for much of the planet’s lifetime, the earth was a graveyard of pristine corpses, forests of bodies, oceans of carcasses, a world littered with the indigestible dead.

Implausible, yes, but there is some evidence for it: the writings of a fifteen-year-old apprentice scribe named Ninsikila-Enlil, who was born in 1326 BCE and lived at a temple in ancient Babylon. Ninsikila-Enlil kept a diary, inscribed in tight, spiraling cuneiform on long clay tablets. These tablets record his daily life, which primarily consisted of performing religious rituals for what has been loosely translated as the ‘Pit of Eternal Rest’. The purpose of this pit was precisely what the name implies: to store the deceased. The writings do not make clear how deep the hole went, only that it was monstrously deep, so deep that centuries of bodies slid down into it continued to slip into the nearly liquid darkness, the sounds of their eventual impact never rising back to the surface.

But the particular curiosity was the bodies themselves.

Here I shall present two passages from Ninsikila’s writings, the first from early in his service, the second from a year later. The former is as follows:

The bodies wait in the preparation hall for seven days before consignment. I am permitted to visit after the second washing. My mother’s mother has been waiting for three days. She is the same as the day she passed. [The chief priest?] says the gods have made a gift of flesh. That it will remain this way even after she enters the pit. Her hands were always cracked from work, and they are still cracked.

There are many, many other paragraphs throughout his tablets that parallel this. An amber-like preservation is referenced repeatedly, described variously as “the stillness of resins” or “flesh locked in golden sap.” But, later, Ninsikila recorded the first observation of something new occurring among the bodies waiting to be placed in the pit. The second passage is this:

The wool-merchant [deposited?] on the third of Nisannu, and had been waiting for some time now. I pressed his chest and the flesh moved inward and did not return. Fluid on my hand. A smell I have not encountered before. Small, ebony things in his eyes, moving. I washed with būrtu-water seven times. I do not know what this is.

Rot, decomposition, it seemed, had finally arrived in a world that had not yet made room for it.

We know from Ninsikila’s writings that the wisest of the period, in search of what could have caused this, posited that the whole world had been tricked. That the flesh had once made a pact with time to remain eternally perfect, and time, in its naivety, had agreed. But something in the ink, some theorized, had curdled. Some insects had crawled across the tablet while the covenant was still wet, dragging one word into another and rendering the entire contract void.

Of course, it is worth raising some doubt about this. Ninsikila is a child, albeit clearly an erudite one, and would be prone to some flights of fantasy. How could we trust his retelling of the story? Unfortunately, we cannot, not fully, at least if our standard of proof here is having multiple, corroborating writings from the same period. But what we do have is historical evidence, or, at least, what some have argued is corroborating historical evidence.

Just a month after the initial finding of decomposition, Ninsikila’s writings cease. Moreover, this ending coincided with the beginning of the Hittite plague, an epidemic that, depending on which Assyriologist you consult, began somewhere between 1322 and 1324 BCE. And there is evidence suggesting that the true geographic origin of the plague was, in fact, the exact site of the pile of bodies watched over by Ninsikila. Some historians will protest at this, claiming that the Hittite plague was primarily a disease of the Anatolian heartland, far removed from Babylonian temple complexes. They will point to the well-documented military campaigns and the movement of prisoners of war.

But they all fail to account for the fact that, during the years in which the plague is believed to have started, there were multiple independent accounts of the skies of Babylon turning nearly ebony with flies, a canopy so dense it shaded the temple courtyards and drowned out religious chants with its own droning liturgy—a wet, collective susurration, the sound of ten billion small mouths working. The air turned syrupy, clinging to the skin, the foulness so thick it could nearly be chewed, metallic and rotten-fruit sweet. And the closer one got to Babylon, the more it drowned them beneath this sensory weight. We have records from a trade caravan whose leader—a merchant of salted fish and copper ingots—noted in his ledger that he could smell the city three days before he could see it. At one day’s distance, he could taste it, the foulness nearly making him retch.

The concentration of bodies in the Babylonian pile was higher than anything that had ever existed, not just in Babylon, not just in Mesopotamia, but in the entire known world. Tens of thousands of bodies stacked, pressed, pooled together in heat and humidity; an unprecedented density of biological matter that, prior to the centuries-long effort to gather it together, had never existed. Is it not possible that in this particular place, in the wet anaerobic environment, new forms of life emerged? It feels obvious to posit that something was created here, something that consumed the pile, infected the air, and gorged itself on so much biological matter that it survives to this day, still swimming in our land and oceans.

Ninsikila-Enlil’s final entry is not particularly illuminating, but what is worth mentioning is where his resting place lies. Ninsikila was born with a birth defect: his sternum never fused, a fact we know from his writings. A soft hollow where his chest should have been, the bones bowing outward like the peeled halves of a pomegranate, exposing a quivering pouch of skin that pulsed visibly with his heartbeat. He noted that his priest-physicians, embarrassed, called it a divine aperture. His mother bound the hollow in layers of linen and never spoke of it again.

This is important, since it allowed us to place Ninsikila’s skeleton, which lies not at the top of the pile—as one may expect of a child succumbing to disease—but near the bottom. Endless bodies lay above him, centuries of death, likely nearly liquefied when he encountered them. But his position is not passive; rather, his arms are outstretched, fingers cracked and blackened, the bones of his hands splintered at the ends, as though he had clawed his way down through thousands of corpses. Ninsikila was a child of God, born into the priesthood, who spent his short life in faithful rituals to the divine, and it is perhaps only expected that his final moments were spent in desperate excavation, believing that somewhere below, at the base, lay the answer as to what had been corrupted, and whether it could be undone.

The ML drug discovery startup trying really, really hard to not cheat (Leash Bio)

Abhishaike Mahajan — Tue, 23 Dec 2025 13:05:02 GMT

Note: I’ll be in Austin until Jan 3rd, and in San Francisco (for JPM) from Jan 3rd-17th; message me on X/email to hang out! Also, thank you to Ian Quigley and Andrew Blevins, the two co-founders of Leash Bio, for answering the many questions that arose while writing this essay.

Introduction

What I will describe below is a rough first approximation of what it is like to work in the field of machine-learning-assisted small-molecule design.

Imagine that you are tasked with solving the following machine-learning problem:

There are 116 billion balls of varying colors, textures, shapes, and sizes in front of you. Your job is to predict which balls will stick to a velcro strip. To help start you off, you’re given a training set of 10 million balls that have already been tested, labeled with which ones stuck and which ones didn’t. Your job is to predict the rest. You give it your best shot, train a very large transformer on 80% of the (X, Y) labels, and discover that you’ve achieved an AUC of .76 on a held-out 20% set of validation balls. Not too shabby, especially given that you only had access to .008% of the total space of all balls. But, since you’re a good hypothetical scientist, you look more closely at which balls you did well on and which balls you did not. You do not immediately find any surprises; the error is mostly uniform across colors, textures, shapes, and sizes, which are all the axes of variation you’d expect to exist in the dataset. But perhaps you’re a really good hypothetical scientist, and you decide that to be certain of the accuracy here, you’ll need to fly in the top ball-velcro researcher in the world to get their take on it. You do so. They arrive, take one look at your results, and burst out in laughter. ‘What,’ you stutter, ‘what’s so funny?’ In between tears and convulsions, the researcher manages to blurt out, ‘You fool! You absolute idiot! Nearly all the balls in both your training set and test set were manufactured between 1987 and 2004, using a process that was phased out after the Guangzhou Polymer Standardization Accords of 2005! Your ball-velcro model is not a ball-velcro model at all, but rather a highly sophisticated detector of Guangzhou Polymer Standardization Accords compliance!’ The researcher collapses into a chair, still wheezing.
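The Guangzhou Accords failure mode is easy to reproduce in miniature. The toy simulation below is entirely hypothetical: whether a ball sticks depends only on its texture, but in the collected data, sticky balls happen to be mostly pre-2005. A ‘model’ that reads nothing but the era flag (no training involved, the shortcut is hard-coded) then scores well on a random held-out split drawn from the same collection, and collapses to chance once era and stickiness are decoupled. AUC is computed with the standard Mann-Whitney rank formulation, averaging ranks over ties.

```python
import random


def roc_auc(labels, scores):
    """ROC AUC via the Mann-Whitney U statistic, using average ranks for ties."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of tied scores starting at position i.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(1 for y in labels if y)
    n_neg = len(labels) - n_pos
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def make_balls(n, era_tracks_label, rng):
    """Hypothetical balls as (texture, is_pre_2005, sticks) tuples."""
    balls = []
    for _ in range(n):
        texture = rng.random()
        sticks = texture > 0.5  # the real (hidden) physics: only texture matters
        if era_tracks_label:
            # Collection bias: sticky balls are mostly pre-2005, non-sticky mostly not.
            pre_2005 = rng.random() < (0.9 if sticks else 0.1)
        else:
            pre_2005 = rng.random() < 0.5  # era carries no information about sticking
        balls.append((texture, pre_2005, sticks))
    return balls


def shortcut_score(ball):
    """A 'model' that ignores texture and reads only the era flag."""
    _texture, pre_2005, _sticks = ball
    return 1.0 if pre_2005 else 0.0


def auc_of(balls):
    return roc_auc([b[2] for b in balls], [shortcut_score(b) for b in balls])


rng = random.Random(0)
collected = make_balls(2000, era_tracks_label=True, rng=rng)
rng.shuffle(collected)
held_out = collected[1600:]  # the 20% a random split would hold out
deployment = make_balls(2000, era_tracks_label=False, rng=rng)

print(f"held-out AUC:   {auc_of(held_out):.2f}")    # high, close to the theoretical 0.90
print(f"deployment AUC: {auc_of(deployment):.2f}")  # close to chance (0.5)
```

Nothing in the held-out number alone distinguishes this shortcut from real learning; only a split or test set that breaks the confound does.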

Actually, this hypothetical situation is easier than the real one, since there are several orders of magnitude more possible small molecules than there are balls here, and there are also a few tens of thousands of possible velcro strips (binding proteins), each with its own unique preferences.

Given the situation here, there is a fair bit of cheating that goes on in this field. Most of it is accidental and maybe even unavoidable, and truthfully, it is difficult to not feel at least some sympathy for the researchers here. There is something almost cosmically unfair about trying to solve a problem where the axes of variation you don’t know about vastly outnumber the axes you do, making it so the space of possible ways you could be wrong is practically infinite. Can we fault these people for pretending that their equivalent of the compliance-detection machine is actually useful for something?

Well, yes, but we should also understand that the incentives aren’t exactly set up for being careful, thinking really hard, and trying to ensure that the model did the Correct Thing. This is true even in the private sector, where the end utility of these models is far off on the horizon, and where the feedback loops are so long that by the time anyone discovers your model was secretly a Guangzhou Accords detector, there are no meaningful consequences for anybody involved.

This is why I think it is important to shine a spotlight on people trying to, despite the situation, do the right thing.

And this essay is my attempt to highlight one such party: Leash Bio.

Leash Bio is a Utah-based, 9-person startup founded in 2021 by two ex-Recursion Pharmaceuticals folks: Ian Quigley and Andrew Blevins. My usual biotech startup essays are about places that have strange or especially out-there scientific theses, so I spend a long time focusing on the details of their work, where it may pay off big, and the biggest risks ahead.

I will not do this here, because Leash Bio actually has both a very well-trodden scientific thesis (build big datasets of small-molecule x protein interactions and train a model on them) and a very well-trodden economic thesis (use the trained model to design a drug). There’s clearly some value here, at least to the extent that any ML-for-small-molecule-development play has value. There’s also some external validation: a recent partnership with Monte Rosa Therapeutics to develop binders to novel targets.

Really, what makes Leash unique is almost entirely that, despite how hard it is to do so, they have a nearly pathological desire to make sure their models are learning the correct thing. They have produced a lot of interesting artifacts from this line of research, many of which I think deserve more eyes. This essay will dig deep into a few of them. If you’re curious to read more about their research, they also have their own fascinating blog here.

Some of Leash’s research

The BELKA result

You may recall an interesting bit of drama that occurred just about a year back between Pat Walters (one of the chief evangelists of the ‘many people in the small-molecule ML field are accidentally cheating’ sentiment) and the authors of DiffDock, a (very famous!) ML-based small-molecule docking model.

The drama originally kicked off with the publication of Pat’s paper ‘Deep-Learning Based Docking Methods: Fair Comparisons to Conventional Docking Workflows’, which claimed to find serious flaws in DiffDock’s train/test splits. Gabriel Corso, one of the authors of the DiffDock paper, responded here, basically saying ‘yeah, we already knew this, which is why we released a follow-up paper that directly addressed these issues’. After many comments back and forth, the saga mostly ended with this paragraph being appended to Pat’s original paper:

The analyses reported here were based on the original DiffDock report [1], with performance data provided directly by authors of that report, corresponding exactly to the published figures and tables. Subsequently, in February 2024, a new benchmark (DockGen) and a new DiffDock version (DiffDock-L) was released by the DiffDock group [21]. This work post-dated our analyses, and we were unaware of this work at the time of our initial report, whose release was delayed following completion of the analyses.

All’s well that ends well, I suppose.

But what was the big deal with the train/test splits anyway?

To keep it simple: the original DiffDock paper trained on pre-2019 protein-ligand complexes and tested on post-2019 protein-ligand complexes. This may not sound too terrible, but one failure mode is easy to imagine: there is a lot of conservation in the chemical composition of binding domains, so many post-2019 pockets closely resemble pre-2019 ones, and the model can get away with memorizing binding-pocket-y residues rather than learning the actual physics of docking. So, when presented with a brand-new binding pocket, it’d fail. And indeed, this is the case.

In the follow-up DiffDock-L paper, the authors moved to a benchmark (DockGen) that ensured proteins with the same binding domains appear either only in the train set or only in the test set. Performance fell, but the resulting model generalized much better to a broader range of proteins.
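The shift from a date-based split to a pocket-based one is easy to state in code. Below is a minimal sketch of a group-disjoint split, assuming every complex has already been assigned a pocket-cluster label by some upstream similarity clustering; the function name and labels are hypothetical, and this is not DiffDock’s or DockGen’s actual pipeline. Note that `test_fraction` here is a fraction of groups, not of complexes.

```python
import random
from collections import defaultdict


def group_disjoint_split(items, group_of, test_fraction=0.2, seed=0):
    """Split items so that every group lands entirely in train or entirely in test.

    items:    list of hashable records (here, complex IDs)
    group_of: dict mapping each item to a group label (here, a pocket-cluster ID)
    """
    groups = defaultdict(list)
    for item in items:
        groups[group_of[item]].append(item)

    # Shuffle the group labels (not the items) and hold out whole groups.
    labels = sorted(groups)
    random.Random(seed).shuffle(labels)
    n_test_groups = max(1, round(test_fraction * len(labels)))
    test_groups = set(labels[:n_test_groups])

    train = [i for i in items if group_of[i] not in test_groups]
    test = [i for i in items if group_of[i] in test_groups]
    return train, test
```

Because whole clusters are held out, a model can no longer score well on the test set simply by recognizing a pocket it has already seen.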

Excellent! Science at work. But there is an unaddressed elephant in the room: what about chemical diversity? DiffDock-L may very well generalize to unseen protein binding pockets, but can it do well on ligands that are very structurally different from ligands it was trained on? This isn’t really a gotcha for DiffDock, because it turns out that the answer is ‘surprisingly, yes’. From a paper that studied the topic:

Diffusion-based methods displayed mixed behavior. SurfDock showed declining performance with decreasing ligand similarity on Astex, but surprisingly improved on PoseBusters and DockGen, suggesting resilience to ligand novelty in more complex scenarios. Other diffusion-based and all regression-based DL methods exhibited decreasing performance on Astex and PoseBusters, but remained stable—or even improved slightly—on DockGen, likely implying that unfamiliar pockets, rather than ligands, pose the greater generalization barrier.

But docking is not the big problem, not really.

The holy grail for protein-ligand-complex prediction is predicting affinity: not only where a small molecule binds, but how tightly. And it turns out to be incredibly easy to mislead oneself about how well models can do at this. In an October 2025 Nature Machine Intelligence paper titled ‘Resolving data bias improves generalization in binding affinity prediction’, the authors write:

This large gap between benchmark and real-world performance [of binding affinity models] has been attributed to the underlying training and evaluation procedures used for the design of these scoring functions. Typically, these models are trained on the PDBbind database³⁷,³⁸, and their generalization is assessed using the comparative assessment of scoring function (CASF) benchmark datasets¹⁰. However, several studies have reported a high degree of similarity between PDBbind and the CASF benchmarks. Owing to this similarity, the performance on CASF overestimates the generalization capability of models trained on PDBbind¹⁰,³⁹,⁴⁰. Alarmingly, some of these models even perform comparably well on the CASF datasets after omitting all protein or ligand information from their input data. This suggests that the reported impressive performance of these models on the CASF benchmarks is not based on an understanding of protein–ligand interactions. Instead, memorization and exploitation of structural similarities between training and test complexes appear to be the main factors driving the observed benchmark performance of these models³⁵,³⁶,⁴¹,⁴²,⁴³.

What a pickle!

Now, the paper goes on to build its own split from the PDB that takes into account a combination of protein similarity, binding conformation similarity, and, most relevant to us, ligand similarity. How do they judge ligand similarity? With the Tanimoto score, which, per another Pat Walters essay, seems like a pretty decent route to better generalization.
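For reference, the Tanimoto score between two binary fingerprints is the number of bits they share divided by the number of bits set in either. Here is a minimal sketch of that score plus a nearest-neighbour filter on a test set. It assumes fingerprints are already computed (e.g. ECFP bit vectors from a cheminformatics toolkit) and stored as sets of on-bit indices; the 0.7 threshold and the function names are illustrative assumptions, not the values used in the paper.

```python
def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto (Jaccard) similarity between two sets of 'on' fingerprint bits."""
    if not a and not b:
        return 1.0  # convention: two empty fingerprints are identical
    inter = len(a & b)
    return inter / (len(a) + len(b) - inter)


def filter_similar(test_fps: dict, train_fps: dict, threshold: float = 0.7) -> dict:
    """Keep only test molecules whose nearest training neighbour is below `threshold`.

    Both arguments map a molecule name to its fingerprint; train_fps must be non-empty.
    """
    kept = {}
    for name, fp in test_fps.items():
        nearest = max(tanimoto(fp, t) for t in train_fps.values())
        if nearest < threshold:
            kept[name] = fp
    return kept
```

Anything this filter lets through is "dissimilar" only in the bit-overlap sense.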

Well, that’s that, right? Have we solved the problem?

Not quite. Tanimoto-based filtering is an improvement, but it is still an exercise in carving up existing public data more carefully. Why is that a problem? Because public data are not random samples from chemical space, but rather the accumulated residue of decades of drug discovery programs and academic curiosity. Because of that, even if you filter out molecules with Tanimoto similarity above some threshold, you might still be left with test molecules that are “similar” in ways that Tanimoto doesn’t capture: similar pharmacophores, similar binding modes, similar target classes. A model might still be learning something undesirable, like “this looks like a kinase inhibitor I’ve seen before”, and there is really no way to stop that no matter how you split up the public data.

How worried should we be about this? Surely at a certain level of scale, the Bitter Lesson takes over and our model is learning something real, right?

Maybe! But we should test that out, right?

Finally with this background context, we can return to the subject of this essay.

In late 2024, Leash Bio, in one of the most insane public demonstrations I have yet seen from a biotech company, issued a Kaggle challenge to all comers: here are 133 million small molecules generated via a DNA-encoded library (which we’ll discuss more later) that we’ve screened against three protein targets, and here are binary binding labels for all of them. The problem statement is as follows: given this dataset, also known as ‘BELKA’ (Big Encoded Library for Chemical Assessment), predict which ones bind.

How large is this dataset in relative terms? In the introductory post for the dataset, Leash stated this:

The biggest public database of chemistry in biological systems is PubChem. PubChem has about 300M measurements (11), from patents and many journals and contributions from nearly 1000 organizations, but these include RNAi, cell-based assays, that sort of thing. Even so, BELKA is >10x bigger than PubChem. A better comparator is bindingdb (12), which has 2.8M direct small molecule-protein binding or activity assays. BELKA is >1000x bigger than bindingdb. BELKA is about 4% of the screens we’ve run here so far.

As for the data splits, Leash provided three:

A random molecule split. The easiest setting.
A split where a central core (a triazine) is preserved but there are no shared building blocks between train and test.
A split based on the library itself. In other words, it was a test set with entirely different building blocks, different cores, and different attachment chemistries, molecules that share literally nothing with the training set except that they are, in fact, molecules. The hardest setting.

Here is the hilarious winning result from the Kaggle competition, where ‘kin0’ refers to the 3rd data split:

In other words, a model was trained on a dataset that is an order of magnitude larger than any dataset that has come before it. And it completely failed to generalize in any meaningful capacity, being nearly perfectly equivalent to random chance. In turn, Leash’s blog post covering the whole matter was titled ‘BELKA results suggest computers can memorize, but not create, drugs’.

Now, it is worth pushing back on this result. Chemistry is complex, yes, but it is almost certainly bounded in its complexity. So, one defense here is that diversity matters more than scale, and that, say, bindingdb’s ~2.8 million data points, despite being far fewer, span far more of chemical space than BELKA’s 133 million. Moreover, bindingdb contains hundreds of targets, whereas BELKA contains only 3. By comparison, BELKA is, chemically speaking, incredibly small. Is it any wonder that models trained on it, and it alone (as these were the rules of its Kaggle competition), don’t generalize well?

These are all fair arguments. Is this entire thing based on a contrived dataset?

There is an easy way to assuage our concerns. We can just load up a state-of-the-art binding affinity model, one that has been trained on vast swathes of publicly available data out there, and try it out on a BELKA-esque dataset. Say, Boltz2. How does that model perform?

The Hermes result

Well, BELKA can’t just be used out of the box. To ensure that they were truly testing ligand generalization, Leash first curated a subset of their data that has no molecules, scaffolds, or even chemical motifs in common with the data used to train Boltz2. This shouldn’t be any trouble for a model that has sufficiently generalized!

At the same time, they put Boltz2 in a head-to-head comparison against Hermes, a lightweight, sequence-only, 50M-parameter (!!!) transformer trained by the Leash team. Given 71 proteins, 7,515 small-molecule binders, and 7,515 negatives, the task was to predict the likelihood of binding for each protein/small-molecule pair.

But before we talk about the results, let’s quickly discuss Hermes. Specifically, Hermes was not trained on any public data, but rather on the sum total of all the binding affinity data that Leash has produced. How much data is that? At the time Hermes was trained, just shy of 10B ligand-protein interactions. By the time this essay was published, it had grown to 50B interactions. Both of these numbers are several orders of magnitude higher than any other ligand x protein dataset in existence.

To note: BELKA is not included in these numbers, because it is not actually a dataset they use to train their models; it pairs an extremely large number of ligands with only a few proteins, rather than mixing diversity across both. But the same DNA-encoded library process is used to generate it!

Finally, we can move onto the results.

Hermes did decently, with an average AUROC of .761. Notably, the validation set here is meant to have zero chemical overlap with Hermes’s training set (something we’ll talk about more in the next section), which makes the result even more striking.

On the other hand, Boltz2 scores .577.

Hmm. Okay.
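For readers less familiar with the metric: AUROC is the probability that a randomly chosen binder is scored above a randomly chosen non-binder, so 0.5 is a coin flip and 1.0 is perfect ranking. A minimal pairwise sketch of that definition follows. It is quadratic in dataset size, so it is fine for illustration but not for benchmark-scale data, where a rank-based implementation from a standard ML library would be used instead.

```python
def auroc(labels, scores):
    """Probability that a random positive outscores a random negative (ties count half).

    This pairwise probability equals the area under the ROC curve.
    labels: iterable of 0/1; scores: parallel iterable of model scores.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

On this scale, .577 means that a binder is ranked above a non-binder only about 58% of the time.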

You could imagine that one pointed critique of this whole setup is that the validation dataset is private. Who knows what nefarious things Leash could be doing behind the scenes? Also, it may be the case that Leash is good in whatever space of chemistry they have curated, whereas Boltz2 is good in whatever space of chemistry exists in public databases. The binding affinity results in the Boltz2 paper are clearly far above chance, so this seems like a perfectly reasonable reconciliation of the results.

Well, Leash also curated a subset of data from Papyrus, a publicly available dataset of binding affinity data, and threw both Boltz2 and Hermes at that.

From their post:

Papyrus is a subset of ChEMBL and curated for ML purposes (link). We subsetted it further and binarized labels for binding prediction. In brief, we constructed a ~20k-sample validation set by selecting up to 125 binders per protein plus an even number of negatives for the ~100 human targets with the most binders, binarizing by mean pChEMBL (>7 as binders, <5 as non-binders), and excluding ambiguous cases to ensure high-confidence, balanced labels and protein diversity. Our subset of Papyrus, which we call the Papyrus Public Validation Set, is available here for others to use as a benchmark. It’s composed of 95 proteins, 11675 binders, and 8992 negatives.
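The binarization rule in that quote is simple enough to sketch. Here is a minimal version that assumes records arrive as (protein, molecule, pChEMBL) triples. The function name, tie-breaking by sorted ID, and capping each protein’s negatives at its number of binders are my assumptions, not Leash’s actual code.

```python
from collections import defaultdict
from statistics import mean


def binarize_papyrus_like(records, max_binders=125, top_n_targets=100, hi=7.0, lo=5.0):
    """records: iterable of (protein_id, molecule_id, pchembl_value) triples.

    Returns (protein_id, molecule_id, label) rows with label 1 = binder, 0 = non-binder.
    """
    values = defaultdict(list)
    for protein, molecule, pchembl in records:
        values[(protein, molecule)].append(pchembl)

    binders, negatives = defaultdict(list), defaultdict(list)
    for (protein, molecule), vals in values.items():
        m = mean(vals)  # binarize on the mean pChEMBL of repeated measurements
        if m > hi:
            binders[protein].append(molecule)
        elif m < lo:
            negatives[protein].append(molecule)
        # means in [lo, hi] are ambiguous and dropped

    # Keep the targets with the most binders.
    top = sorted(binders, key=lambda p: len(binders[p]), reverse=True)[:top_n_targets]
    rows = []
    for protein in top:
        pos = sorted(binders[protein])[:max_binders]
        neg = sorted(negatives[protein])[:len(pos)]  # at most as many negatives as binders
        rows += [(protein, m, 1) for m in pos] + [(protein, m, 0) for m in neg]
    return rows
```

Dropping the 5 to 7 band is what buys the "high-confidence" labels. The cost is that the hardest, borderline cases never appear in the benchmark.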

On this benchmark, Boltz2’s AUROC rose to .755, while Hermes stayed in roughly the same territory as before: .703, with a confidence interval slightly overlapping Boltz2’s.

So, yes, Boltz2 does edge out Hermes here, but given that the chemical space of Papyrus substantially overlaps with the ChEMBL-derived binding data Boltz2 was trained on, this is to be expected.

So, to summarize where we are at: Leash’s in-house model, trained exclusively on their proprietary data, performs about as well on public benchmarks as a model that was partially trained on those benchmarks.

And on Leash’s private data, which, crucially, has little overlap with public training sets as measured by Tanimoto scores (also included in their post), their model handily beats the state of the art.

This is all very exciting! But I want to be careful here and explicitly say that the story is far from complete. What we can say, with confidence, is that Leash has demonstrated something important: a lightweight model trained on dense, high-quality, internally consistent data can compete with architecturally sophisticated models trained on the sprawling, noisy, heterogeneous corpus of public structure databases. This is made even more interesting by the fact that Hermes is not structure-based, which makes it ~500x faster than Boltz2, the advantages of which are discussed in this other Leash post.

But it is not yet clear that Leash has cracked the generalization problem. I think they are asking the right questions, and perhaps have early results suggesting that the answers are interesting, but chemical space is large, far larger than anybody could ever imagine, and it would be naive of anyone to claim that the two simple benchmarks here are sufficient to declare anything for either side.

But even after tempering my enthusiasm, I still find the results fascinating. The only outstanding question is: where does this seemingly strong generalization performance actually come from? Is it from the extremely large dataset? Surely partially, but, again, chemical space is so extraordinarily vast that a few tens of millions of (sequence-only!) samples from it are surely a drop in the bucket. Is it perhaps from the Hermes architecture? Also unlikely, because remember, the model itself is dead simple: just a transformer that uses the embeddings of two pre-trained models (ESM2-3B and ChemBERTa).

What’s going on? Where is the generalization coming from? Well, we’ll get back to that, because first I want to talk about how Leash curated their own train/test splits.

The train/test split result

As I’ve been repeating throughout this essay, Leash’s model is trained using DNA-encoded chemical libraries. These are combinatorial libraries where each small molecule is tagged with a unique DNA barcode that identifies its structure. The molecules themselves are built up from discrete building blocks. You have a central scaffold, and then you attach different pieces at different positions. A typical DEL molecule might have three attachment points, each of which can hold one of hundreds of different building blocks. Multiply those possibilities together and you can get millions of unique compounds from a relatively small set of starting materials.
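The multiplication is where the scale comes from. Here is a toy sketch that treats each molecule as nothing more than a tuple of building-block IDs, one per attachment position, and ignores the actual chemistry and DNA tags. The building-block counts in the note below are made-up illustrative numbers.

```python
from itertools import product
from math import prod


def enumerate_library(building_blocks_per_position):
    """Yield every combination of one building block per attachment position."""
    yield from product(*building_blocks_per_position)


def library_size(building_blocks_per_position):
    """Number of distinct molecules = product of the per-position building-block counts."""
    return prod(len(bbs) for bbs in building_blocks_per_position)
```

With hypothetical counts of 200, 150, and 300 building blocks at three positions, `library_size([range(200), range(150), range(300)])` gives 9,000,000 molecules from only 650 starting pieces.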

It feels wrong to give this explanation without an associated graphic, so I asked Gemini to create one:

This is great for generating diverse molecules, but also for splitting a chemical dataset, because it allows you to split molecules by the building blocks they share. If each molecule is assembled from, say, 3 building blocks, a rigorous way to split is to ensure that no building block in the test set appears anywhere in the train set.

But you may immediately see a problem here: what if two different building blocks have very similar chemical properties? This can be remedied by not only ensuring that there are no building-block overlaps, but also checking that the chemical fingerprints of building blocks in the train set are sufficiently dissimilar from those in the test set. In other words, you cluster the building blocks by chemical similarity, and then make sure no cluster contributes building blocks to both the train and the test set.

And they did exactly this. From their post:

Our Leash private validation set is this last category: it’s made of molecules that share no building blocks with any molecules in our training set, and also the training set doesn’t have any molecules containing building blocks that cluster with validation set building blocks. It’s rigorous and devastating: splitting our data this way means our training corpus is roughly ⅓ of what would be if we didn’t do a split at all (0.7 of bb1*0.7 of bb2*0.7 of bb3 = 0.343)…
In exchange for losing all that training data, we now have a nice validation set where we can be more confident that our models aren’t memorizing, and we can use it to make an honest comparison to other models that have been trained on public data.
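Here is a minimal sketch of that kind of cluster-aware split. It assumes the building blocks have already been clustered by fingerprint similarity upstream; the clustering method, the function name, and the test fraction are all assumptions, not Leash’s pipeline. Molecules that mix train-side and test-side clusters are thrown away, which is where the lost training data goes.

```python
import random


def cluster_disjoint_bb_split(molecules, bb_cluster, test_fraction=0.3, seed=0):
    """Split DEL-style molecules so that no building-block cluster appears on both sides.

    molecules:  list of tuples of building-block IDs (one per attachment position)
    bb_cluster: dict mapping building-block ID -> similarity-cluster ID
    """
    clusters = sorted(set(bb_cluster.values()))
    rng = random.Random(seed)
    n_test = max(1, round(test_fraction * len(clusters)))
    test_clusters = set(rng.sample(clusters, n_test))

    train, test = [], []
    for mol in molecules:
        # True for each building block whose cluster was assigned to the test side.
        sides = {bb_cluster[bb] in test_clusters for bb in mol}
        if sides == {False}:
            train.append(mol)   # every building block is train-side
        elif sides == {True}:
            test.append(mol)    # every building block is test-side
        # Molecules mixing both sides are discarded.
    return train, test
```

A molecule survives only if all of its building blocks land on the same side, so the retained fraction shrinks multiplicatively with the number of positions. That is the 0.7 × 0.7 × 0.7 arithmetic in the quote above.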

Using this dataset, they applied Hermes (and XGBoost as a baseline) to four increasingly difficult splits of the data: a naive split based on chemical scaffold, 2 building blocks shared split, 1 building block shared split, and 0 building blocks shared + no chemical fingerprint clusters shared. The results are as follows:

Here, simple XGBoost beats Hermes on almost every split other than the hardest one. Only when you ensure that there are zero shared building-block clusters, when you truly force the model into chemically novel territory, does the more complex Hermes pull ahead.
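Here is a rough sketch of how test molecules could be sorted into the shared-building-block tiers. It assumes each molecule is a tuple of building-block IDs and that sharing is counted per attachment position; both are my assumptions. The hardest tier in Leash’s setup also requires the cluster check described earlier, which this sketch omits.

```python
def shared_bb_count(molecule, train_bbs_by_position):
    """Number of positions at which this molecule's building block also appears in training."""
    return sum(bb in seen for bb, seen in zip(molecule, train_bbs_by_position))


def bucket_by_overlap(test_molecules, train_molecules):
    """Group test molecules by how many building blocks they share with the training set."""
    n_pos = len(train_molecules[0])
    # One set of observed building blocks per attachment position.
    seen = [set(m[i] for m in train_molecules) for i in range(n_pos)]
    buckets = {}
    for mol in test_molecules:
        buckets.setdefault(shared_bb_count(mol, seen), []).append(mol)
    return buckets
```

In the results above, the 2-shared and 1-shared buckets are where XGBoost holds its own, and the 0-shared bucket is where Hermes pulls ahead.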

Okay, this is a fine result, and it does rhyme with the theme of the essay w.r.t ‘being rigorous’, but this should raise more questions than it answers. As a result of how they have constructed the training dataset for Hermes, wouldn’t we expect it to have a relatively small area of ‘chemical space’ to explore? By going through this building-block and cluster filtering, surely the training data is almost comically O.O.D from the test set! And yet, as we mentioned in the last section, Hermes seems to display at least some heightened degree of chemical generalizability compared to state-of-the-art models! How is this possible?

It may have to do with the nature of the data itself: DNA-encoded libraries. Leash writes in their blog post that this particular type of data is perhaps uniquely suited to forcing a model to actually learn some physical notion of what it means to bind to something:

Our intuition is that by showing the model repeated examples of very similar molecules - molecules that may differ only by a single building block - it can start to figure out what parts of those molecules drive binding. So our training sets are intentionally stacked with many examples of very similar molecules but with some of them binding and some of them not binding.
These are examples of “Structure-activity relationships”, or SAR, in small molecules. A common chemist trope that illustrates this phenomena is the “magic methyl” (link), which is a tiny chemical group (-CH3). Magic methyls are often reported to make profound changes to a drug candidate’s behavior when added; it’s easy to imagine that new greasy group poking out in a way that precludes a drug candidate from binding to a pocket. Remove the methyl, the candidate binds well.
DELs are full of repeated examples of this: they have many molecules with repeated motifs and small changes, and sometimes those changes affect binding and sometimes they don’t.
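To make that intuition concrete, here is a minimal sketch of mining "differ by one building block, differ in binding" pairs from a DEL-style dataset. It assumes each unique molecule is a tuple of building-block IDs with a binary label. The representation and function name are hypothetical, for illustration only; none of this is Leash’s code.

```python
from collections import defaultdict


def single_bb_cliffs(molecules, labels):
    """Find pairs of molecules that differ at exactly one building-block position
    but have different binding labels.

    molecules: list of unique tuples of building-block IDs; labels: parallel list of 0/1.
    """
    label_of = dict(zip(molecules, labels))
    n_pos = len(molecules[0])
    pairs = []
    for pos in range(n_pos):
        # Molecules that are identical everywhere except `pos` share the same key.
        groups = defaultdict(list)
        for mol in molecules:
            groups[mol[:pos] + mol[pos + 1:]].append(mol)
        for members in groups.values():
            for i in range(len(members)):
                for j in range(i + 1, len(members)):
                    a, b = members[i], members[j]
                    if label_of[a] != label_of[b]:
                        pairs.append((a, b))
    return pairs
```

Each returned pair is a tiny controlled experiment: everything is held fixed except one building block, so any change in binding can be attributed to that swap.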

Neat! This all said, the use of DELs is at least a little controversial: the method often produces false negatives, covers a limited region of chemical space, and its hits tend not to be particularly high-affinity. Given that I do not actively work in this area, it is difficult for me to give a deeply informed take here. But it is worth mentioning that even if the assay has its faults, the fact that Hermes performs competitively on Papyrus, a public benchmark derived from ChEMBL that has nothing to do with DEL chemistry, suggests that whatever Leash’s models are learning cannot purely be an artifact of the DEL format. Of course, Hermes almost certainly has its own failure modes, and time will tell what those are.

And with this, we arrive at the present day, and a very recent finding from Leash about something completely unrelated to Hermes.

The ‘Clever Hans’ result

Truthfully, I’ve wanted to cover Leash for a year now, ever since the BELKA result. But what finally got me to sit down and do it was an email I received on November 27th, 2025 from Ian Quigley, a co-founder of Leash. In this email, Ian attached a preprint he was working on, written alongside Leash’s cofounder Andrew Blevins, that described a phenomenon he dubbed ‘Clever Hans in Chemistry’. The result contained in the article was such a perfect encapsulation of the cultural ethos I (and many others) have come to associate with Leash that I finally wrote the piece I’d been putting off.

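So, what is the ‘Clever Hans’ result? Simple: it is the observation that molecules created by humans will necessarily carry with them the sensibilities, preferences, and quirks of the humans who made them. (The name comes from Clever Hans, the horse that appeared to do arithmetic but was actually reading subtle cues from its handler.)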

For example, here are some molecules created by Tim Harrison, a distinguished medicinal chemist at Queen’s University Belfast.

And here are some other molecules made by Carrie Haskell-Luevano, a chemical neuroscientist and professor at the University of Minnesota’s College of Pharmacy.

I don’t know any medicinal chemistry! You may not either! And yet, you can see that there is an eerie degree of same-ness within each chemist’s portfolio. And if we can see it, can a model?

Yes.

Using ChEMBL, Leash compiled a list of chemists they considered prolific (>30 publications, >600 molecules contributed), scraped all their molecules, and then trained a very simple model to play Name That Chemist.

Across 1,815 chemists, their trained model named the creator of an arbitrary input molecule with a top-1 accuracy of 27% and a top-5 accuracy of 60%.
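For a sense of scale, suppose all 1,815 authors were equally likely (a simplifying assumption, since real class sizes differ). Uniform random guessing would then score a top-1 accuracy of about 1/1,815 ≈ 0.06% and a top-5 accuracy of about 5/1,815 ≈ 0.28%. Top-k accuracy itself is straightforward; here is a minimal sketch with hypothetical names:

```python
def top_k_accuracy(true_labels, ranked_predictions, k):
    """Fraction of examples whose true label is among the model's k highest-ranked guesses.

    ranked_predictions: for each example, a list of candidate labels ordered best-first.
    """
    hits = sum(truth in ranked[:k] for truth, ranked in zip(true_labels, ranked_predictions))
    return hits / len(true_labels)
```

Against that baseline, 27% top-1 and 60% top-5 are enormous signals.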

If curious, Leash also set up a leaderboard for you to see how distinctive your favorite chemist is! And while some chemists’ molecules are far harder to suss out than others, the vast majority of them did leave a perceptible residue on their creations.

This may seem like a fun weekend project, but the implications start to get a little worrying when you realize that the similarity among a chemist’s molecules is less an idiosyncrasy and more the product of a career-long optimization process of creating molecules that do X, and molecules that do X may very well end up looking a particular way. Which means that if a model can detect the author, it can infer the intent. And if it can infer the intent, it can predict the target. And if it can predict the target, it can predict binding activity. All without ever learning a single thing about why molecules actually bind to proteins.

Is this actually true, though? It seems so. Using a split based on chemical scaffold (a pretty common, though increasingly discouraged, practice), Leash found that there is no functional difference in accuracy between giving a model a rich molecular description of the small molecule (ECFP) and giving it only the name of the author who made it. Even worse, both seem to encode roughly the same information.

They have a few paragraphs from their preprint that I really want to repeat here:

Put differently, much of the information that a simple structure-based model exploits in this setting is explainable by chemist style. The activity model does not need to infer detailed chemistry to perform well; it can instead learn the sociology of the dataset—how different labs behave, which series they pursue, and which targets they favor.
…
We interpret this as evidence that public medicinal-chemistry datasets occupy a narrow “chemist-style” manifold: once a model has learned to recognize which authors a molecule most resembles, much of its circular-fingerprint representation is already determined. This reinforces our conclusion that apparent structure–activity signal on CHEMBL-derived benchmarks is tightly entangled with chemist style and data provenance.

Now wait a minute, you may cry, this is just repeating the same point made in the last section about rigorous train/test splitting! And yes, this result certainly rhymes with that. But the difference here is that the author signal seems to slip right past the standard deconfounding technique of filtering by Tanimoto similarity. Consider the following plot from the paper:

If chemist style were simply “chemists make similar-looking molecules,” you’d expect clear separation here—intra-author pairs clustering at low distances, inter-author pairs at high distances. But the distributions almost completely overlap. Both peak around 0.85-0.9 Tanimoto distance. The intra-author distribution has a slightly heavier left tail, but the effect is marginal. By the standard metric the field uses to assess molecular similarity, molecules from the same author are barely more similar to each other than molecules from different authors.
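The comparison in that plot is easy to reproduce in miniature. Here is a minimal sketch that assumes fingerprints are already computed as sets of on-bits (e.g. ECFP from a cheminformatics toolkit, not shown) and grouped by author; the function names are hypothetical. It enumerates every pair, which is only practical for small samples. A real analysis would subsample pairs.

```python
from itertools import combinations


def tanimoto_distance(a: frozenset, b: frozenset) -> float:
    """1 - Tanimoto similarity; two empty fingerprints are treated as identical."""
    union = len(a | b)
    return 1.0 - (len(a & b) / union if union else 1.0)


def author_distance_distributions(fps_by_author):
    """Split all pairwise Tanimoto distances into intra-author and inter-author lists.

    fps_by_author: dict mapping author -> list of fingerprints (sets of on-bit indices).
    """
    tagged = [(author, fp) for author, fps in fps_by_author.items() for fp in fps]
    intra, inter = [], []
    for (a1, fp1), (a2, fp2) in combinations(tagged, 2):
        (intra if a1 == a2 else inter).append(tanimoto_distance(fp1, fp2))
    return intra, inter
```

If the two returned lists have nearly identical histograms, as in the plot, a Tanimoto threshold cannot separate authors, even though a trained classifier can.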

And yet, models can detect it. And it is almost certainly the case that binding affinity models trained on human-designed molecules are exploiting it.

But it gets worse. Authorship is just one axis of ‘human-induced chemical bias’ that we can easily study! There is a much more subtle one that Leash mentioned in a blog post on the subject: stage of development. Unfortunately, this type of data is a fair bit harder to get. They put it best themselves:

One dataset we wish we had includes how far along the medicinal chemistry journey a particular molecule might be. As researchers grow more confident in a chemical series, they’ll start putting more work into it, and this often includes more and more baroque modifications: harder synthesis steps, functional groups further down the Topliss tree, that kind of stuff.

Leash doesn’t need to worry about any of these issues for its own work, since its molecules are synthesized in parallel by the millions, tested once, and either bind or don’t; the human intent that saturates public datasets simply isn’t present. So overall, this is a win for the ‘generate your own data’ side!

Either way, I still hope they study more and more ‘bizarre confounders in the public data’ phenomena in the future. How many other things like this exist beyond authorship and stage of development? What about institutional biases? The specifics of which building blocks happened to be commercially available where? Subscribe to the Leash blog to find out!

Conclusion

One may read all this and say, well, this is all well and good for Leash, but does every drug discovery task require genuine generalization to novel chemistry? Existing chemical space probably isn’t too bad to explore in!

And yes, I agree, and I think the founders of Leash would also. If a team is developing a me-too drug in well-explored chemical territory, a model cheating may be, in fact, perfectly fine. Creating a Guangzhou Polymer Standardization Accords detector would actually be useful!

But there is an awful lot of chemical space that is entirely unexplored, and almost certainly useful. What’s an example? I discuss this a little in an old article I wrote about the challenges of synthesizability in ML drug discovery, if you’re curious; an easy example is natural products, which can serve as excellent starting points for drug discovery and are known to differ systematically in structure from classic, human-produced molecules. Because of these differences, I would bet that the vast majority of small-molecule models out there would be completely unable to grasp the binding behavior of this region of chemical space, which, to be clear, almost certainly includes the current version of Hermes.

So, to be clear, as fun as it would be to imagine Leash doing all this model and data exploration work out of a deep spiritual commitment to epistemic hygiene, the actual reason is almost certainly more pragmatic.

I gave the Leash founders a chance to read this article to ensure I didn’t make any mistakes in interpreting their results (nothing significant was changed based on their comments), and Ian offered an interesting comment: ‘While this piece is about us chasing down these leaks, I do want to say that we believe our approach really is the only way to enable a world where zero-shot creation of hit-to-lead or even early lead-opt chemical material is possible, particularly against difficult targets, allosterics, proximity inducers, and so on. Overfit models are probably best for patent-busting, and the past few years suggest to us that’s a losing battle for international competition reasons.’

In other words, if the future of medicine lies in novel targets, novel chemotypes, and novel modalities, you need models that have learned something fundamental about what causes molecules to bind to other molecules. They cannot cheat, they cannot overfit; they must really, genuinely, within their millions of parameters, craft a low-dimensional model of human-relevant biochemistry. And given how much Leash empirically cares about finding ‘these leaks’, as Ian puts it, it’s difficult not to be optimistic that Leash’s philosophy is the best positioned to come up with the right solution to exactly this.

What if we could grow human tissue by recapitulating embryogenesis? (Matthew Osman & Fabio Boniolo)

Abhishaike Mahajan — Wed, 17 Dec 2025 14:41:26 GMT

Note: Thank you to latch.bio for sponsoring this episode!

LatchBio is building agentic scientific tooling that can analyze a wide range of scientific data, with an early focus on spatial biology. Check out their agent at agent.bio! Clip on them in the episode.

If you’re at all interested in sponsoring future episodes, reach out!

Introduction

This is an interview with Matthew Osman and Fabio Boniolo, the co-founders of Polyphron.

The thesis behind Polyphron is equal parts nauseating and exciting in how ambitious it is: growing ex-vivo tissue to use in organ repair.

And, truthfully, it felt so ambitious as to not be possible at all. When I had my first (of several) pre-podcast chats with Matt and Fabio to understand what they were doing, I expressed every ounce of skepticism I had about how this couldn’t possibly be viable. Everybody knows that complex tissue engineering is something akin to how fusion is viewed in physics; theoretically possible, but practically intractable in the near-term. What we can reliably grow outside of a human body are simple structures—bones, skin, cartilage—but anything beyond that is surely decades away.

But after the hours of conversation I’ve had with the team, I’ve begun to rethink my position. As Eryney Marrogi lays out in his Core Memory article on Polyphron, there is an engineering system that has reliably produced viable human tissue for eons: embryogenesis.

What if you could recapitulate this process? What if you could naturally get cells to arrange themselves into higher-order structures by following the exact chemical guidelines that are laid out during embryonic development? And, most excitingly, what if you didn’t need to understand any of these overwhelmingly complex developmental rules, but could outsource it all to a machine-learning system that understood which chemical perturbations are necessary at which timepoints?

This does not exist today, but Polyphron has given early proof points that it is possible. In their most recent finding, which we talk about on the podcast, their models have discovered a distinct set of chemical perturbations that force developing neurons to arrange themselves with a specific polarity: just shy of 90°, arranged like columns. This is obviously still a simple structure (though a difficult one to create, given that even an expert could not arrive at that level of polarity), but it represents proof that you can use computational methods to discover the chemical instructions that guide tissue self-assembly.

We discuss this recent polarity result, what the machine-learning problems at Polyphron look like, and the genuinely insane economics of the whole endeavour. The last of these is especially exciting; it is rare to hear biotech founders talk about ‘expanding the Total Addressable Market’ and actually believe them. But here, it is a genuine possibility if the Polyphron approach ends up working.

Enjoy!

Links

Spotify

Apple Podcasts

Substack/Transcript

Timestamps

(00:02:16) Introduction
(00:02:37) Why replace tissue rather than the whole organ?
(00:10:34) Why not do simple stem/progenitor cell injections?
(00:13:51) Can organs repair themselves naturally?
(00:18:21) What does “structure” actually mean in tissue engineering?
(00:21:04) Why are skin and bone the only FDA-approved tissues today?
(00:23:45) What exactly are tissue scaffolds?
(00:27:52) Why are organoids a “dead end” for this field?
(00:35:08) The argument for recapitulating developmental biology
(00:40:28) Walk us through the Polyphron experimental loop
(00:47:56) Can you simulate morphogenesis with only small molecules?
(00:49:49) How large is the set of possible tissue scaffolds?
(00:52:32) How reliable are developmental atlases?
(00:56:45) What is the machine learning model actually optimizing for?
(01:04:04) Polyphron’s first big tissue engineering result: polarity
(01:15:33) What comes after polarity?
(01:17:09) Why is vascularization the hardest problem of tissue engineering?
(01:20:33) Why can’t you just wash angiogenesis factors over the tissue?
(01:22:25) How does the graft integrate with the host’s blood supply?
(01:25:45) How do you validate tissue function before implantation?
(01:29:01) How do you design a clinical trial for a biological pacemaker?
(01:37:01) The argument for being a pan-tissue company
(01:41:57) What are the biggest scientific and economic risks?
(01:45:23) Who are Polyphron’s competitors?
(01:47:07) Expanding the TAM beyond transplant lists
(01:52:28) Autologous vs. Allogeneic approaches
(01:55:07) Is a 3-year timeline to the clinic realistic?
(01:56:28) Cross-species translation
(01:58:05) What would you do with $100M equity free?

Transcript

[00:02:16] Introduction

Abhi: Today I’ll be talking to Matthew Osman and Fabio Boniolo, who are co-founders of Polyphron, a startup working to grow ex vivo tissue to use in organ repair. We’ll be talking about the history of the tissue engineering field, the science of existing approaches, and their argument for why computation is necessary for the field to move forwards. Thank you for coming onto the show.

Matthew: Thank you for having us. Happy to be here.

Fabio: Happy to be here.

[00:02:37] Why replace tissue rather than the whole organ?

Abhi: The first question I have is somewhat of the obvious one: why do tissue replacement as opposed to either cell replacement or full-out organ replacement?

Matthew: Right. So, first, let’s give an overview of the kinds of conditions and diseases where tissue replacement is potentially warranted as a strategy. There are a lot of them. So it’s any indication where there has been tissue loss, 3D architecture loss, and where function is downstream of that 3D architecture. You’re looking at indications where really it’s not plausible that you could drug your way out of the fibrotic tissue. So you just can’t plausibly drug scar tissue in the heart back into being a beating heart.

Now on the other end, you have kind of whole organ transplantation, which is the existence proof that tissue replacement at all—writ large—works as a strategy. There are obviously some serious limitations there, particularly around supply, eligibility, of course, the need for lifelong immune suppression, and ICU time. Organ transplantation is enormously invasive; thoracic or abdominal surgery.

And so our thesis is that actually for certain indications—a lot of the chronic age-related diseases—what you should really try and do is build functional units of tissue and intervene a lot earlier. So instead of waiting for total organ collapse, which is basically the organ transplantation model, what you do is you identify early focal forms of damage, which are highly predictive of eventual failure, and intervene at that level. With modular tissue replacement, that can essentially prevent the collapse of the organ occurring in the first place.

So places where this makes sense: heart (so scarring after a myocardial infarction, left ventricle tissue there), advanced CKD, liver, fibrosis in the lung, COPD, pancreatitis, a bunch of CNS disorders around trauma, and in the retina. Am I missing any? There are like a hundred million people in the US with these chronic diseases that involve organ damage. Drugs, we don’t think, can modulate this; devices, we think, can route around the problem, but never restore full function.

Abhi: It makes instinctive sense why you would want to replace individual aspects of an organ if most of it works fine and there are just a few bits of it that don’t... The reason I thought people usually opt for full organ replacement is more that they don’t understand which bits are damaged and which are not, and so they just want to replace everything outright. But you told me that that is in fact not true. There are usually pretty clear markers of damage that you could just excise and replace, and I’d love to hear you repeat that.

Matthew: Yeah. So, that’s absolutely the case. So you are really looking for places where there is focal damage before it gets fully diffused. Now, obviously when an organ is close to full collapse, damage is probably so widespread that it would be difficult to select individual portions to resect and replace with replacement tissue. But there are many, many steps before that.

So you will see clearly in imaging, damage to heart tissue in ischemic cardiomyopathy, and also all the other indications that I mentioned, with a couple of exceptions. So you should be able to locate focal lesions, focal damage that you could potentially treat with this replacement strategy—maybe across more than one site, right? We’re not saying it’s necessarily one site per organ, although within the heart I think it probably would be one site, but in the kidney it’s probably multiple sites. But you would intervene early enough that it becomes a tractable problem.

Abhi: And for the example of left ventricular damage... those are pacemaker cells, correct?

Matthew: Uh, so left ventricular tissue is actually not pacemaker cells. Left ventricular tissue is the tissue that is very often damaged in a myocardial infarction, in a heart attack. So what you get is reduced ejection fraction, so your heart stops pumping enough blood out of the ventricle. And that causes obviously a huge amount of health problems down the line, leading eventually to death.

Abhi: And the damage you would see is so visually apparent that a doctor could just see like black speckles or something and just extract that out?

Matthew: It could be visually apparent to a surgeon. It’s certainly apparent to all of the diagnostic tools currently used in modern surgery. One of the things that we wanted to do when trying to build a platform—because we are trying to pioneer tissue blocks, functional replacement tissue as a new modality—is require as few people to change what they do as possible. And so we’re always looking for ways that we can piggyback off existing reimbursement pathways, surgical workflows, et cetera.

So yeah, all the indications that we’re initially looking at have whole organ transplantation as a covered treatment. Because what you really want to do is intervene kind of much, much earlier. Just to make it super concrete: there were 48,000 organ transplants last year, give or take a few hundred. Of those, maybe 10,000 might have been heart transplants. You have 6.7 million people in the US with heart failure. So that’s anything from New York Heart Association category one all the way up to category four.

What we are proposing initially doing just in the heart case—and we have multiple tissue types we’re going after—is to intervene just before they would need a heart transplant. So it would be a deferral strategy for that specific product. We have other heart products we’re working on, including a pacemaker actually, which we can chat about. But for that product, what we’re trying to do is act as a bridge to prevent them from needing a heart transplant. So it’s either a deferral strategy for someone who needs more time, or it is an alternative strategy for someone who is not able to get a heart transplant—so either for comorbidities, for age, for adherence, or psychosocial reasons, which is one of the eligibility criteria that would prevent you from getting a heart transplant.

And then what you would do is move earlier and earlier in the progression of the disease, and much more minimally invasively. So all of these tissue products are delivered through surgical workflows that are considered to be minimally invasive. In our case, it’s not a full thoracic procedure; you’re not cutting open the sternum. So it’s much, much easier to slot into that existing surgical workflow. And it’s already reimbursed and there’s an anchor price. So you know how much a heart transplant is going to cost, which is $1.6 million in the US plus everything else. So you have, for the insurer, a really strong argument about “this is how much you should pay to defer that happening.” And it means that you get a product in the sort of low to mid six figures, which is important because you’re having to figure out how to manufacture this stuff at scale.
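Using only the rough figures Matthew quotes (about 10,000 heart transplants a year against 6.7 million people in the US with heart failure), the share of heart-failure patients reached by transplantation each year works out to well under one percent. Treating a yearly procedure count and a prevalence figure as directly comparable is itself a simplification, so this is an order-of-magnitude illustration only:

```latex
\frac{\text{heart transplants per year}}{\text{US patients with heart failure}}
\approx \frac{10{,}000}{6{,}700{,}000}
\approx 1.5 \times 10^{-3}
\approx 0.15\%
```

Under these estimates that is roughly 1 patient in 670 per year, which is the untouched population Abhi refers to next.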

[00:10:34] Why not do simple stem/progenitor cell injections?

Abhi: I think the economics here are really crazy in the sense that you get access to this entire patient population that currently no one is really able to touch. I think that is something I will want to talk about a little bit later.

I think the craziest part about Polyphron is this extreme importance of tissue structure. Which I did not naively appreciate before talking to you. Before speaking to especially Fabio, I kind of had the belief of, well, why can’t we just squeeze in some progenitor or stem cells into the site of the tissue damage? Like excise the tissue, squeeze in the stem cells. The body will figure out how to work with it. Why doesn’t that work? Obviously it doesn’t seem to work, but *why* doesn’t it work?

Fabio: No, absolutely. It would be great if it worked. Unfortunately, it does not. And it has been tried, especially in the heart. So people have tried to inject cardiomyocytes into the infarcted area to see whether there was any regain of function. And this did not happen. And it really goes back to the fact that in nature, in vivo, cells really exist in a specific environment within which they can perform whatever function they’re supposed to perform, meaning they can proliferate, they can commit or differentiate into a specific lineage, they can grow, they can assemble, so on and so forth.

And this relation between the microenvironment and the architecture within which these cells grow, and the cells themselves, is something that is established throughout development. And of course, this is something that is lacking at the injury site after any type of traumatic event. Therefore, cells that are just injected there are not able to assemble properly, are not able to signal their presence to the microenvironment properly, and are not able to learn what the microenvironment is telling them properly. And therefore they’re just basically unable to understand what to do. They’ll either be washed away, or they will start moving around and then either die or be killed by the organ.

And this is quite important because, again, one of the things that we really have at the core of the company is this idea that structure is the fundamental underpinning for any tissue engineering approach that has the potential to cure people.

Matthew: I would just kind of piggyback off that—Fabio mentioned the work that was done in cardiomyocytes. So structure is important for function; it is also incredibly important for safety. So particularly where you derive the function of the tissue from its structure—so anything that has signaling or conductivity—if you don’t have structured tissue, what you get is incredibly aberrant effects that are really, really damaging. So in the iPSC cardiomyocyte work that was done, you get arrhythmias because heart tissue is part of an electrical system. And likewise, if you try and inject excitatory neurons that aren’t in a proper structure, you get epilepsies. So it’s very, very important from a safety profile to have as close as possible to native in vivo morphology.

[00:13:51] Can organs repair themselves naturally?

Abhi: Is there any organ that, if damaged, will be able to repair itself to some reasonable degree? Like most of organ development happens while you’re an embryo... is there any ability for repair to happen after you’re born?

Matthew: I mean, the liver is probably the case of fairly persistent regenerative capacity. It is an extreme outlier. You just do not see that in the heart or the lung, for example, to nearly the same extent. So the liver is a very specific case. For what it’s worth, the liver is a really interesting example of an existence proof of not needing to replace the entire organ in order to get clinical benefit, because you have had successful liver segment transplants for a while. So that actually gave us sort of comfort that there’s not something that we would be missing by not having to fully recapitulate the entire organ before doing the transplant. There are also examples in the intestine, I think, and a couple other organ systems as well where segmental transplantation gives you a huge clinical uplift.

Fabio: So we... I think we will be talking about quite a few different axes of complexity today, but I think one of the most interesting ones is the regenerative potential of these different tissues. You can put them on a continuum going, as Matt was saying, from the liver to the heart. Now the interesting thing is that people noticed quite early that a few organs could actually be intervened on by having some type of structured framework scaffold with isolated cells coupled in. So much so that the first real applications of tissue engineering, which were probably in the 1980s, were all about getting specific plastics or biomaterials, filling them with cells and putting them in animals to see whether there was any regenerative potential.

And you know, these applications were called *Chimeric Morphogenesis*, to evoke this idea of trying to recapitulate what happens in development that leads to tissue formation. So all of these kind of small elements point to the fact that we can indeed regenerate organs. And I think the exciting thing and cool thing about tissue engineering is that we are not necessarily asking the organs themselves to regrow tissue, but we’re rather using engineered solutions to support regain of functionality.

Abhi: The primary thing I was trying to question was: on one hand, the heart is not able to repair an aorta by itself, so you need to go in and replace it. On the other hand, given the fact that sometimes you do see this—like the liver is able to partially repair itself—is there any pathway to being able to convince an organ, primarily through genetic or chemical means, to repair itself? Or is that just out of the picture? Is there no developmental pipeline that does that, or is it kind of unknown?

Matthew: I mean, it hasn’t worked. I can tell you that. I think that if it were to work, it is most likely to work in the liver, but unlikely to work in other organ systems where the niche of the damage is so fundamentally changed from the developmental program that it’s hard to know how you would kind of act on it in the way that you’ve just described reliably. So it’s somewhat of an unknown right now, but I can tell you it hasn’t worked.

Fabio: I would speculate there is some threshold below which the regenerative potential is not enough to actually bring back functionality. In many of the chronic inflammation indications that Matt was mentioning, and especially in their acute phases, there is necrosis happening. So the actual focal location in the organ dies, it stops working. Therefore, there is not much the body can do.

[00:18:21] What does “structure” actually mean in tissue engineering?

Abhi: That makes sense. And when we like vaguely gesture to “structure” and “tissue”, what does that actually mean? One axis is clearly that there exist multiple cells and multiple cell types in this environment. What other types of structure exist?

Fabio: This is quite important, and actually it is so important that at Polyphron we are trying to establish metrics that can tell us how close our grafts are to in vivo structure. The way we look at structure is really at the three-dimensional architecture and composition of the tissue. By this we mean how different cell types are patterned within the tissue, how they locate themselves with respect to each other, and whether they have specific orientations or polarities, meaning specific distributions of proteins that tell us what is up and what is down. And we can see that these elements (polarity, multicellularity and cell composition, and of course also the layering and the geometry of the tissues) occur across different organs and different tissues. Of course with different features, but macroscopically, we can identify these. And one of our hypotheses is that we should really try to recapitulate each of these steps on the way to recapitulating tissue structure.
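Polarity is one of the structural features Fabio says Polyphron tries to quantify. As a minimal sketch of how an orientation metric of that kind could be computed (an illustrative assumption, not Polyphron’s actual metric), a standard approach for undirected cell axes is to double each angle, average the doubled angles as unit vectors, and halve the mean direction back. The function name `axial_order` and the example angles are hypothetical.

```python
import math


def axial_order(angles_deg):
    """Mean orientation and coherence for axial (undirected) angles in degrees.

    Each angle is doubled so that theta and theta + 180 deg describe the same
    axis, the doubled angles are averaged as unit vectors, and the mean
    direction is halved back. Returns (mean_angle_deg in [0, 180), coherence),
    where coherence is 1.0 for perfect alignment and near 0 for isotropy.
    """
    if not angles_deg:
        raise ValueError("need at least one angle")
    n = len(angles_deg)
    c = sum(math.cos(math.radians(2 * a)) for a in angles_deg) / n
    s = sum(math.sin(math.radians(2 * a)) for a in angles_deg) / n
    coherence = math.hypot(c, s)  # length of the mean resultant vector
    mean_angle = (math.degrees(math.atan2(s, c)) / 2) % 180
    return mean_angle, coherence


# Hypothetical per-cell long-axis angles, measured from the tissue plane.
mean_angle, coherence = axial_order([80, 85, 90, 95, 100])
print(f"mean axis {mean_angle:.1f} deg, coherence {coherence:.2f}")
```

A coherence near 1 with a mean axis close to 90° would correspond to the column-like arrangement described in the introduction. Doubling the angles is what makes an axis drawn at 10° and one drawn at 190° count as the same orientation; averaging raw angles instead would wrongly penalize such pairs.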

Abhi: Do you think there are dimensions of structure that are either unknown today or not entirely legible, such that you need a model to encompass it all for you?

Fabio: As with many things in biology, it really depends on the resolution at which you look at tissue. We have had decent ways to measure tissue morphology or tissue structure for quite some time, either imaging-based or fluorescence-based. And what we’re learning with more and more powerful technologies such as electron microscopy, for example, is that we can really go down and look at the nanoscale organization of these tissues. Now one question is how relevant it is to understand all of these different scales to actually be able to recapitulate structure in the lab or manufacture it. But it is for sure true that we can see this continuum of complexity scales. And as for many complex systems, the macro features and behaviors we see really arise from this continuous scale of complexity.

[00:21:04] Why are skin and bone the only FDA-approved tissues today?

Abhi: There is an existing proof point today—beyond just an organ’s ability to regenerate or organ transplantation—that you can do this sort of fractional replacement, and it has only popped up in, as far as I can tell, three areas: skin, cartilage, and bone grafts. These all exist. There seem to be FDA-approved products in the market that do this. Why hasn’t the tissue engineering field moved beyond these three?

Matthew: So, I can give the very naive response, which is that those are easier to engineer for a couple of important reasons. One, skin is very thin, which means you don’t have to solve the vascularization problem of perfusing vasculature that you have to solve for thick tissue. I mean, something like cartilage as well often doesn’t have blood vessels, so there you aren’t having to solve the vascularization problem either. They’re metabolically less demanding tissues to produce as well. And you can do a lot of this work in thin 2D sheets, which is what some of the original skin work was done with.

Fabio: Following up on what Matt is saying, he has identified two more axes of complexity. The first is metabolic demand: we don’t need vascularization as much for these products, and vascularization is a bottleneck for any tissue engineering approach, which I’m sure we’ll discuss. The second is that they have a relatively simple structure. There’s a relatively small number of cell types, organized in a relatively simple configuration: simple layers.

And for example, for bones... I actually had my first experience in tissue engineering in a bone graft production company. And the incredible thing about bones is that there is this mineral inorganic component that we can find elsewhere in nature. Bovine bone is basically the same as human bone in terms of mechanical features. Corals can be used as bone graft, and they’re just the perfect scaffold to put in the body. There are maxillofacial applications, there are spinal cord applications for the bone component of course. And it is quite simple to insert them and have the body repopulate them and basically make them their own. So you know, in a way they were the low-hanging fruit of tissue engineering. And now the challenge is kind of on us to go to the more complex structures.

[00:23:45] What exactly are tissue scaffolds?

Abhi: We’ve mentioned scaffolds a few times in this conversation. I probably should have asked this question earlier. What *is* a scaffold in the world of tissue engineering?

Fabio: So the scaffold is basically one of the key elements of tissue engineering. If we look at the field in general, you basically need three elements, and every single approach combines these in different ways. These elements are: some type of isolated cells [whether iPSC-derived or primary cells], some type of bioactive factors that help make whatever you want to make, and this kind of scaffold, which is basically the framework or structure that cells need to develop, mature, and make the tissue you want.

Now, in tissue engineering, this scaffold is looked at as a structure that can be either a plastic or a biopolymer that can be transplanted, or a nature-derived material such as collagen. And you can imagine them as sponges or basically 3D porous matrices that you can use to seed cells in. You can use them to create gradients. And you can also use them to tune the mechanical and physical properties of the sponges so that cells receive very well-defined stimuli. And the hypothesis in the tissue engineering field has been: if we give the initial structure to the cells and then we let them do their thing, basically, they will remodel the scaffolds on their own. And then what we will get out at the end is the desired graft that might then be transplanted. And what people quickly realized is that, unfortunately, this is too much of an artificial setup for the cells. So they will not be able to actually go and remodel and restructure these scaffolds. They will kind of go midway, and the product will not be as effective as a real graft might be.

Abhi: What do you mean when you say the scaffold is “too artificial”? Like what does artificial concretely mean?

Fabio: So in this case, what I mean is that what happens naturally in vivo—once again looking at development—is that the scaffold [which in this case for natural tissues means the extracellular matrix and the environment within which cells grow and proliferate] evolves and changes together with the cells that are developing and committing to specific cell lineages.

Abhi: So it’s not just secreted during morphogenesis and then populated by the cells and it stays static?

Fabio: No, it actually changes throughout development. So the physical properties, the stiffness of the scaffold changes because it has to support different cell types and different functions. And this change will be impacting developing cells, but will also be impacted by developing cells. And what we are seeing and what we hypothesize is that this complexity is really a process that basically reaches equilibrium through these different steps in vivo. While what I was describing earlier of this artificial setup where we give the scaffold from the outside and hope the cells will grow inside, is a very kind of non-natural setup where we’re trying to define complexity from the top down and not having it grow and stabilize on its own.

And one of the things we’re trying to do at Polyphron, or that we’re actually building with our technology, is really a way to allow cells to create their own three-dimensional microenvironment, their own scaffold, and therefore also the structures that are relevant for function.

[00:27:52] Why are organoids a “dead end” for this field?

Abhi: And I think, gesturing back to the current FDA-approved products, most of them work via something like bioprinting: layering one layer of cells at a time works well for those particular cell types. Obviously it doesn’t scale to more complicated ones. One of the other approaches people seem to be working on is organoids, and Matthew does not have positive opinions about organoids for tissue engineering, and I’d love to get your take on them.

Matthew: Uh, so I think that organoids are a dead end for therapeutics. I think that as a strategy, it is just not gonna lead to meaningful therapies that could bring the need for organ transplantation to an end. I think that there are definitely some useful drug screening use cases with organoids, but for a bunch of reasons they lack the complexity and in vivo structure that you would need to get any of these functional restoration effects that we think that we need.

Abhi: But you do see the organoids are willing to like mangle themselves into some sort of structure. And so you do get something that’s clearly better than single cell replacement. Why is that not enough? Like where does that start falling apart?

Fabio: Yeah, so you are right in saying that what we’re seeing with organoids is some type of self-assembling behavior. Um, and this is mainly dictated by intervening on, typically, pluripotent stem cells in a way that simulates how different cell lineages come to be during development. There are multiple issues with this approach, though. Um, the first one is that, once again, development really is successful because cells develop not in a vacuum, but in a very specific microenvironment. And this is not recapitulated in typical organoid cultures. Cells will grow in collagen or in some type of extracellular matrix, but they will not be receiving the chemical and especially the mechanical stimuli that cells need to create structure in vivo.

Secondly, what you typically see in organoids is that, due to this self-assembling behavior, sporadic structures arise. And here by structure, I mean micro features that resemble natural structure. What we’re missing, though, is the macro features that put all of these smaller components together. To give an example: if you take kidney organoids, you will see within an organoid some cells that make glomeruli-like structures, and some other features that make renal tubule-like structures. But we will be missing the union between structure one and structure two. And this, of course, is quite important for what we want to put in vivo, because we need that graft to be able to accomplish its function.

And this is not always possible, actually, it is not possible with organoids. Um, and one more complexity is that organoid cultures are still an intrinsically stochastic process. And of course this puts quite some limitation in terms of approval and in terms of manufacturing. So one of the things that we’re thinking quite often at Polyphron is how to make the whole process of production of replacement tissues as robust as possible so that it can be, you know, a proper technology that can scale.

Matthew: Sorry... I was just going to—maybe we’ll come to some of the commercial manufacturing challenges later, but one of the things I also wanted to point out that I think is unusual about the way that we are designing the various loops that will allow us to build functional tissue units is that we are taking cost into account as part of the cost function of the overall loop. So we’ll talk about like how we try and recapitulate morphogenesis, I’m sure. But one of the things we’re really trying to do as well is to select the cheapest way down the mountain. So we take into account the price of reagents, we’re optimizing the pathways to recapitulate morphogenesis because the last thing we want is to pull off this kind of technical miracle and have a commercially non-viable product at the end of it. So we’re like trying to build in manufacturing COGS viability even in our initial ML approach.
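Matthew’s point about folding reagent cost into the loop’s objective can be illustrated with a toy penalized score. This is a minimal sketch under stated assumptions, not Polyphron’s actual cost function: `Protocol`, `structure_score`, the reagent names, the prices, and the linear `cost_weight` penalty are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Protocol:
    """A hypothetical candidate perturbation schedule (illustrative only)."""
    name: str
    structure_score: float  # assumed similarity to an in vivo reference, 0..1
    reagent_amounts: dict = field(default_factory=dict)  # reagent -> amount used


def reagent_cost(protocol, prices):
    """Total reagent spend: sum of amount times unit price."""
    return sum(prices[r] * amount for r, amount in protocol.reagent_amounts.items())


def objective(protocol, prices, cost_weight):
    """Higher is better: reward structure, penalize reagent cost linearly."""
    return protocol.structure_score - cost_weight * reagent_cost(protocol, prices)


def pick_best(candidates, prices, cost_weight):
    """Return the candidate with the highest penalized score."""
    return max(candidates, key=lambda p: objective(p, prices, cost_weight))


prices = {"factor_A": 10.0, "small_molecule_B": 1.0}  # made-up unit prices
candidates = [
    Protocol("growth-factor heavy", 0.90, {"factor_A": 2.0}),
    Protocol("small-molecule only", 0.85, {"small_molecule_B": 2.0}),
]
print(pick_best(candidates, prices, cost_weight=0.0).name)   # prints "growth-factor heavy"
print(pick_best(candidates, prices, cost_weight=0.01).name)  # prints "small-molecule only"
```

With no cost penalty, the higher-scoring but pricier protocol wins; a small penalty flips the choice to the cheaper one, which is the sense of selecting “the cheapest way down the mountain.” A linear penalty is only one option; a hard budget constraint or a Pareto front over structure and cost would be reasonable alternatives.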

Abhi: Do organoids not work at all? Has a transplantation ever happened where it just didn’t take, where the native functionality was not restored, or has it never been tried?

Fabio: There are examples in rats, specifically in the intestine, where there seems to be integration. Restoration of function, though, has not been proven yet. So these organoids are recognized as self, they’re integrated, but, you know, there is no real restoration of the functions that they’re supposed to carry out.

Abhi: Yeah. So it sounds like there’s multiple hard problems. You first need to grow the tissue in the first place. And then the second hard problem is you need the body to be willing to accept that tissue and to integrate into the rest of the body. Is that connected to the structure problem or is that an independent thing?

Fabio: No, it is fundamentally connected. I would actually say it is... you know, structure is the underpinning element to being recognized and integrated properly. And this is because if you think of organs, they work as this extremely well integrated kind of setup. And the question is what is the smaller functional unit we can use that can be recognized, connect and basically restore function. And we believe that, you know, in order for this to happen, the graft should be as similar as possible to what was lost. In terms of structure, in terms of cellular composition. And this should ease the way that the body integrates and makes the graft its own.

[00:35:08] The argument for recapitulating developmental biology

Abhi: Okay. So it seems like bioprinting is too simple for more complicated things. Decellularized scaffolds are also potentially too simple to do the most complicated things. Like scaling up organ transplantation via xenotransplantation is both super technically risky and also... you don’t wanna do organ transplants for everyone. What is the way out of this conundrum that you’ve set up where every approach is either too simplistic or too complicated?

Matthew: Our approach...

Abhi: What is it?!

Matthew: It’s to make much smaller functional units of tissue that are recognized as self, integrate, and restore function. To give you a rough sense of the order of magnitude, these are tissue chunks of about a cubic centimeter in volume or less, right? We’re not trying to build entire hearts in terms of biomass. So that’s the way out of the conundrum. You have to be able to do it in a repeatable way with exceptionally low variability. You need to be able to control COGS [cost of goods sold]. And ideally you should try to get as much clinical effect as you possibly can from the smallest unit of tissue, because the less biomass you have to produce, the cheaper it’s going to be. Meanwhile, we think pricing will probably stay where it is, because there are anchor prices for transplantation and for assist devices.

Abhi: But it seems like the core tenet is almost: everything else is too simple to recapitulate developmental biology, so your answer is basically “let’s just recapitulate it, let’s just do developmental biology straight up.”

Matthew: If I had to sum up one of the precepts of the company, it’s that there is an engineering system that has already produced functional human tissue, and that’s human development. So why don’t we try to recapitulate that as much as possible? Over the past few years, the data sets that give us at least a fuzzy starting prior on what development does have come online and become available. These are developmental atlases that are often multi-omics based and increasingly include spatial transcriptomics. The prior set of tissue engineering approaches involved a highly mechanistic understanding, where you try to smooth out the complexity of what’s happening in development so that it fits in the heads of the scientists working on the problem, and so that it fits experimentally.

Our view was that now the data sets are rich enough and wide enough that you can start throwing them at some of the exciting new architectures we are seeing, and have models learn latent rules of morphogenesis in a way that doesn’t need to be legible to a human. This obviously needs to be very, very tightly paired with wet lab validation, which is something we are super explicit about. We have a closed loop between the developmental references, which are our fuzzy priors, and what’s being tried and validated in the wet lab.

But our view is that there’s now a plausible path to starting from what happens in development, potentially finding alternative pathways to the same goal (which is super, super exciting), and eventually ending up with a functional unit that is similar to what nature produces. One advantage, and I’m sure we’ll talk about model architecture and some choices we’ve made there, is that we have a strong hypothesis that there is an ultimate latent logic of development that models will learn as they see more and more tissues. There are already hints of that in some of our experimental data, suggesting you might be able to transfer things across tissues. I should be clear that we don’t yet know whether you only need to do three tissues and then you can do everything, or whether doing 10 means you can do 20. But we do fully expect there to be transfer learning across tissues as we go.

Abhi: I think the pan-tissue aspect of Polyphron is something I really wanna talk about because I think it’s one of those crazier...

Matthew: It’s kind of insane. But I actually think it makes sense, because the human body doesn’t produce a liver separately from a kidney, right? The human body is a holistic, comprehensive engineering system, so you would expect redundancies and the same techniques across different tissue types. And it’s very, very important for us that the space of interventions you can use to manipulate morphogenesis is bounded. It is finite, and in natural development it is finite by definition.

[00:40:28] Walk us through the Polyphron experimental loop

Abhi: To look at Polyphron’s experimental loop through a more concrete lens... What do you start with? Say you have a box of collagen or something as the existing scaffold, and you seed that with induced pluripotent stem cells, iPSCs. What’s the next step? Let’s say you’re trying to produce some functional heart tissue. What would you do next?

Fabio: Yeah, so the first step really is to look at developmental atlases: single-cell atlases and spatial transcriptomic atlases of the developing human heart. This first lets us understand which developmental time points have been sampled, which then dictates what types of dynamics and lineages we can try to recapitulate in vitro. We then use different computational approaches to mine these high-dimensional data sets and extract the temporal trajectories and dynamics we care about: specific lineages, when they arise and when they commit, or specific microenvironmental interventions or perturbations.

And then we move to our in vitro setup, where we have, as you’re saying, these three-dimensional boxes within which we can use different types of extracellular matrices, depending on which developmental microenvironment we want to simulate. We then seed these scaffolds with progenitors, pluripotent cells, or committed cells, again depending on which cell type and which structure we want to achieve. Once we have our cells in this 3D box, we can start perturbing them using the same kinds of perturbations that, based on our atlases, we have learned are effective during development.

Abhi: But as far as I know, the developmental atlases tell you, say, the ligands present on day 15 of heart development. How do you relate that back to a causal relationship, i.e., that these ligands were essential to day 15 of heart development?

Fabio: So, a couple of things. What state-of-the-art computational biology approaches allow you to do right now is to go from a discrete sampling of a developmental trajectory to a continuous trajectory. So you can really start to see continuous dynamics: whether there are peaks and valleys, or whether there is a steady state at some point in development. This then tells us which molecules to apply when. But, to your point, one important thing is that we really do not care about understanding that relationship. What we want to do is define the broadest set of interventions that might matter for the structure we want to recapitulate, see how those perform on our cultures, and then optimize based on that. That is what Matt was saying: our atlases become our first, initial prior, and then we quickly move into the lab, start generating relevant data, and optimize on that data only. That way we can see how different interventions, combined in a certain way, give us certain structures that we can then optimize on, select for, et cetera.
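Fabio doesn't name the methods used to turn discrete atlas samples into continuous trajectories. As a minimal stand-in (real pipelines typically rely on trajectory or pseudotime inference rather than this), the sketch below linearly interpolates one ligand's hypothetical expression between sampled days and reports when it peaks, which is the kind of "which molecule when" signal described above. The function names and all numbers are invented for illustration.

```python
def interpolate_daily(samples):
    """Linearly interpolate (day, value) samples onto every integer day.

    samples: list of (day, value) pairs with unique integer days.
    Returns a dict mapping each day from the first to the last sample to a value.
    """
    points = sorted(samples)
    trajectory = {}
    for (d0, v0), (d1, v1) in zip(points, points[1:]):
        for day in range(d0, d1):
            frac = (day - d0) / (d1 - d0)
            trajectory[day] = v0 + frac * (v1 - v0)
    last_day, last_value = points[-1]
    trajectory[last_day] = last_value
    return trajectory


def peak_day(trajectory):
    """Day with the highest interpolated value."""
    return max(trajectory, key=trajectory.get)


# Hypothetical expression levels of one ligand; each day comes from a separate donor.
ligand_samples = [(20, 0.1), (25, 0.6), (32, 0.9), (40, 0.3)]
trajectory = interpolate_daily(ligand_samples)
print(peak_day(trajectory))      # 32
print(round(trajectory[28], 3))  # 0.729
```

Linear interpolation can only peak at a sampled day, so a denser or smoother model would be needed to locate peaks that fall between donors.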

Abhi: So you have a box of scaffold with cells on top of it. You have this developmental atlas that tells you which ligands were observed in the developmental environment at each time point. And you sample those, apply them to your Polyphron sample, and see how well it recapitulates the native tissue.

Fabio: That is correct.

Potentially, you could even do this at whole-transcriptome scale; you don’t necessarily need to focus on ligands. We focus on ligands because, again, we have this quite strong hypothesis that the microenvironment is what matters, not only cell-intrinsic, transcription-factor-related dynamics. The other advantage of ligands is that you can often get small molecules that mimic their activity. So we can actually go out, buy them, and apply them to our experimental setup.

Abhi: My impression is that there are thousands upon thousands of small molecules at work inside an embryo while it’s developing. Do you have the ability to also put thousands upon thousands of small molecules into your tissue sample? Or did you pick, like, a dozen?

Fabio: Yeah. Okay. Let me first say what happens in vivo, and then there’s a jump to what we do in the lab. Spoiler: the jump is mainly due to our current constraints in terms of team and instrumentation. What happens in development, and Matt was referring to this too, is that there is actually quite a finite space of molecules and pathways that is activated at specific time points to get cells through morphogenesis. The other interesting thing is that there is a lot of redundancy. There are parallel pathways where one might be active and the other might not. All of this makes for a quite constrained space to start from.

From this list of activated molecules and proteins, we then pick the ones we can easily source, the ones we can cheaply source, and the ones with quite a clear MOA [mechanism of action], so that we can at least predict what we are really perturbing in vitro. Say this brings us to a hundred molecules: we can create specific combinations from these hundred molecules and use them to perturb our cultures. Moving forward, we want to scale up in terms of automation. The more robots we have that allow us to make more and more complex interventions, the more regions of this space we can explore. For now, we’re limited because we’re doing this semi-manually. But the idea is to eventually browse the space using automation and robots.
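The funnel described here (activated molecules, filtered to those that are easy to source, cheap, and have a clear MOA, then combined into sets) can be sketched in a few lines. The `Candidate` fields and molecule names are hypothetical; the only figures taken from the conversation are the example of a hundred molecules and, later, sets of three. Even with those modest numbers, the count of combinations shows why automation matters.

```python
from dataclasses import dataclass
from itertools import combinations
from math import comb


@dataclass
class Candidate:
    name: str
    easy_to_source: bool
    cheap: bool
    clear_moa: bool  # is the mechanism of action well characterized?


def shortlist(candidates):
    """Keep molecules that pass all three practical filters."""
    return [c.name for c in candidates if c.easy_to_source and c.cheap and c.clear_moa]


# Hypothetical candidates pulled from a developmental atlas.
candidates = [
    Candidate("ligand_a", True, True, True),
    Candidate("ligand_b", True, False, True),   # too expensive
    Candidate("ligand_c", True, True, True),
    Candidate("ligand_d", False, True, True),   # hard to source
    Candidate("ligand_e", True, True, True),
    Candidate("ligand_f", True, True, True),
]
names = shortlist(candidates)
print(names)                              # ['ligand_a', 'ligand_c', 'ligand_e', 'ligand_f']
print(len(list(combinations(names, 3))))  # 4

# Size of the design space for a hundred shortlisted molecules.
print(comb(100, 2))  # 4950 pairs
print(comb(100, 3))  # 161700 triplets
```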

[00:47:56] Can you simulate morphogenesis with only small molecules?

Abhi: My impression is that while morphogenesis is going on, there’s a lot more happening than just small molecules. There are electrical fields, there are mechanical forces. Right now, are you just thinking “well, small molecules get us 80% of the way there, and we’ll deal with the other 20% later”? Or what are your thoughts on the subject?

Fabio: Yeah, it is exactly as you’re saying. We were discussing this earlier, when we were talking about organoids: what really pushes tissues across the line in terms of functionality and maturity goes beyond chemical perturbations, namely mechanical or electrical stimuli. This has actually been shown; for example, to get mature cardiomyocyte fibers, you need periodic electrical or mechanical stimulation to bring your tissues to function.

We are fully aware of that, and it’s something we are taking into account for our next-generation experimental setup, where we will try to integrate chemical perturbations with mechanical and electrical ones. What we can do for now is push as far as we can with small molecules and be smart about how we design our extracellular matrix. One of the cool things about these gels or plastics is that you can play with their chemical structures, or embed them with different molecules, so that their chemical or physical features change. By combining different types of molecules with different types of ECMs, we’re actually able to find proxies for most of the knobs one might want to tune while growing a tissue graft.

[00:49:49] How large is the set of possible tissue scaffolds?

Abhi: I can vaguely understand the idea that there is this universe of small molecules in developmental biology, so let’s just recreate those for ours. For ECMs, how much do you need to stick to the world of natural materials versus venturing into novel chemical territory?

Fabio: Yeah. There’s a trade-off and a fine line to walk there, in the sense that cells grow better in natural ECMs. It is easier to culture and grow cells in them; they recognize the familiar microenvironment, so they’re basically happy and they grow. The other side of the coin is that you don’t have the ability to really fine-tune the features you might want, as you would with, for example, biopolymers or any other plastic that can be used. So for now, it has been easier for us to use naturally derived extracellular matrices. It might be that at some point we start playing with biomaterials, which I find a very, very interesting avenue of research and application. But for us, natural extracellular matrices have worked quite well, and we can now control and engineer them quite well.

Abhi: How large is like the universe of natural ECMs? Are there like a flat dozen or like... hundreds?

Fabio: Yeah, so there are macro categories, or buckets, but then there is a plethora of modifications you can apply. You can take different types of collagen, or any extracellular matrix that you know is present in other tissues, and then start tuning how cross-linked it is, which determines its physical properties. You can also embed different types of molecules, or vary the porosity of the matrix. All of these move along different continuums, right? So you can potentially tune it for as long as you want, and to whatever resolution you want.

[00:52:32] How reliable are developmental atlases?

Abhi: You’re treating the developmental cell atlases almost as a ground truth of the real system. I’m curious how trustworthy they are. How heterogeneous are they across different embryos? Do you have one gold-standard data set that you can derive everything from, or do you need to average across hundreds of these?

Fabio: Yeah. So, just a few words on how these data sets come to be. For the most studied tissues, such as the brain, or really the cortex, there is quite a lot of publicly available single-cell developmental data out there. What you can do is compile all of these data sets into a very large atlas, where different samples come from different studies, and of course you have multiple individuals from each study, right? So you have a multi-donor, multi-study starting atlas that gives you some confidence that you’re actually capturing enough heterogeneity and cell types for whatever your goal is. Of course, this is also a function of the number of cells in the data set, which depends on the single-cell technology itself. And, due to random sampling, you will have an overrepresentation of some cell types versus others. So the hope is that by accumulating different data sets and different donors, you’ll get a good representation of the cell population of the tissue of interest.
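A minimal sketch of the sanity check implied here: after pooling cells from several studies, count how much of the atlas each cell type makes up and how many distinct donors contribute to it, flagging types that rest on too few donors. The tuple layout, the `min_donors` threshold, and the cell-type labels are assumptions for illustration, not a description of Polyphron's pipeline. Donors are keyed by study because the same label (say `d1`) can refer to different people in different studies.

```python
from collections import defaultdict


def summarize_atlas(cells):
    """Summarize a pooled atlas.

    cells: iterable of (study, donor, cell_type) tuples, one per cell.
    Returns {cell_type: (fraction_of_cells, number_of_distinct_donors)}.
    """
    counts = defaultdict(int)
    donors = defaultdict(set)
    total = 0
    for study, donor, cell_type in cells:
        counts[cell_type] += 1
        donors[cell_type].add((study, donor))  # donor labels are only unique within a study
        total += 1
    return {ct: (counts[ct] / total, len(donors[ct])) for ct in counts}


def flag_sparse(summary, min_donors=2):
    """Cell types supported by fewer than `min_donors` donors."""
    return sorted(ct for ct, (_, n_donors) in summary.items() if n_donors < min_donors)


# Hypothetical pooled cells from two studies.
cells = (
    [("study1", "d1", "radial_glia")] * 3
    + [("study1", "d2", "radial_glia")] * 2
    + [("study2", "d1", "excitatory_neuron")] * 4
    + [("study2", "d1", "interneuron")]
)
summary = summarize_atlas(cells)
print(summary["radial_glia"])  # (0.5, 2)
print(flag_sparse(summary))    # ['excitatory_neuron', 'interneuron']
```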

Unfortunately, this is not the case for every tissue. Incredibly, there are some tissues that are quite important for human disease, such as the kidney, for which there is not much data out there. Other tissues, such as the heart, sit somewhere between the cortex and the kidney in terms of representation. So a lot of the work right now goes into identifying which data sets we can use and how to integrate them properly into our technology.

Matthew: I’ll just add that, as a company, we’ve signed a partnership agreement with what I think is one of only two places where you can get developmental tissue as a commercial entity. It’s extremely hard to get. We have tissue that arrived maybe a week ago, after an extraordinary freight process. So for some of the tissue types we’d want to tackle, like the kidney (CKD by itself would be a mega-blockbuster product as a tissue construct), we may want to create our own atlases.

Abhi: Is the usual process for creating these developmental atlases that you get an embryo at, say, day 30 of development and then sequence it? Or do you sequence continuously while the embryo is developing?

Fabio: No, these assays are destructive. You get one sample per developing tissue. You can get a couple of samples if the tissue is big enough, but once it is sequenced, it is over. So when you look at different time points, each time point typically comes from a different donor. That’s also because developmental tissues are, of course, under very, very tight regulation and control, so they’re not easy to source. They are also quite brutal tissues to handle, so there are specific sequencing protocols one needs to apply, and there are consortia out there that have optimized the whole process. The idea is: you get the tissue, isolate the cells, sequence them, and then you have these gigantic tables that you somehow have to make sense of.

[00:56:45] What is the machine learning model actually optimizing for?

Abhi: So far we’ve mostly discussed the data collection problem, where you pick from many ligands across the days of development you’re trying to recapitulate. Is the machine learning task selecting the minimum number of ligands you need to reconstruct the native tissue? Is that the primary problem?

Fabio: Actually, it is quite the opposite, in the sense that if we had infinite manpower and infinite automation, we would want to try every possible combination of ligands. What we mostly use the developmental reference for is to go from all the potential ligands in the human genome to the set of ligands that matters for that cell type in that tissue, right? So there’s already a funnel happening there. From there, we don’t really care about understanding what each ligand does; it’s rather: can we try them in our cultures and see how they perform? In my opinion this is quite important, because we want to extend our protocols to different cell lines, for example, right? To different iPSC lines. And it has been shown in different settings that they respond differently to the same ligands. So the question is: can you build enough redundancy into your set of interventions to account for, say, donor variability or interpersonal differences?

Abhi: Actually, suppose a model is trying to help you decide which ligands to introduce at this time point, for this specific tissue, to reach this final outcome. What is this model actually trained on in terms of labels? What is the final readout of this whole tissue creation process?

Fabio: Yeah. So there are two things to be clear about here. On the one hand, we have the reference developmental atlas, which is only used to create this list of perturbations and maybe their order. After that, we move to the lab, right? We move to generating our own data from the 3D structures and boxes we were discussing earlier, and this is where the actual optimization and model training happens. We have ways to non-destructively monitor what happens in our cultures. Right now we are mainly using microscopy, but the idea is to extend this to other modalities to get more and more complex readouts.

We semi-continuously take pictures and videos of our cultures, and then use these images and this data to build a digital twin of our experiments. We look at embeddings, we see how different cultures and cells are growing, and we see how these differences are linked to different ligands. Then we define one direction that matters for us: basically, one cost function that we want to minimize to get to the tissue of interest.

How do we do that? I think this is one of the most innovative aspects of our platform. In order to identify what we want to maximize or minimize, we first have to identify which feature of natural tissue we want to recapitulate. Take the heart, for example, since we’ve been discussing it quite a few times today. Say we want to recapitulate fiber orientation and alignment. We quantify what these measures look like in a mature, healthy human heart, and that becomes the quantity we want to get as close as possible to. Then we use an active learning setup to optimize what happens in our cultures, finding whichever chip or combination of interventions brings us closer and closer to this quantity of interest.
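Fabio doesn't specify how fiber alignment is scored. One common choice for axial data like fibers, used here purely as an assumption, is the 2D nematic order parameter S, which is 1 for perfectly aligned fibers and near 0 for random ones; the loss is then the squared gap to a reference value measured in native tissue. The `TARGET_S` value and the culture angles below are made up, not measurements from a human heart.

```python
import math


def alignment_order_parameter(angles_deg):
    """2D nematic order parameter S for fiber orientations in degrees.

    Fibers are axial (0 and 180 degrees describe the same fiber), so angles are
    doubled before averaging. S is 1 for perfect alignment and near 0 when random.
    """
    doubled = [math.radians(2 * a) for a in angles_deg]
    c = sum(math.cos(t) for t in doubled) / len(doubled)
    s = sum(math.sin(t) for t in doubled) / len(doubled)
    return math.hypot(c, s)


def alignment_loss(culture_angles, target_s):
    """Squared gap between a culture's alignment and the native reference value."""
    return (alignment_order_parameter(culture_angles) - target_s) ** 2


TARGET_S = 0.9  # hypothetical reference value, not a real heart measurement
culture = [85, 95, 88, 92, 80, 100]  # hypothetical fiber angles from one well
print(round(alignment_order_parameter(culture), 3))
print(round(alignment_loss(culture, TARGET_S), 4))
```

Doubling the angles is what makes 10 and 190 degrees count as the same orientation; averaging raw angles would wrongly treat them as opposite.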

Abhi: Is the cost function, or whatever the model is optimizing for, usually a single metric of interest as of today? So in the future you’ll extend to multiple things, but for now it’s just a single metric?

Fabio: Yes, it is a single structural feature.

Abhi: Somewhat relatedly, this is something the conversation didn’t naturally lead to, but I think it’s fascinating enough that I want to circle back to it: in the limit, you can imagine that whatever Polyphron comes up with will create the same end result that developmental biology does, but get there in a way that’s potentially more compressed and cheaper than in the real world...

Matthew: We hope.

Abhi: Are there existing proof points that this is indeed possible? Like, could you have a model that, instead of these tens of thousands of ligands, compresses them down to 50 or so that do most of the work?

Matthew: I mean, iPSC differentiation protocols are probably an existence proof. So is transdifferentiation: these are non-developmental pathways that get you to a cell that, on qPCR and sequencing, looks like a cardiomyocyte. So that’s a pretty strong existence proof in our view.

Abhi: And with regards to the experimental loop, how long does each loop take? Is it the sort of thing where each one costs a million dollars, so you really need to think about it before you go in? Or can you just try things and see what comes out?

Fabio: Yeah. I should preface this: what we’ve been trying to do so far is recapitulate one specific feature of structure, which is polarity. What we set out to do to de-risk our platform was to ask: can we control polarity across different cell types? Once we had picked polarity, we could first decide which tissues to try, and then define the set of ligands and the differentiation protocols that let us try to control this feature in the cell type of interest.

This is where the time comes in. Depending on which tissue we’re looking at and what polarity looks like in that tissue, the time for one experimental loop, or round, might change. What we’re actually seeing is that it is pretty fast. We are running two programs right now, and we’ll be publishing about them: one is in the cortex, and one is in the heart, with cardiomyocytes. For both of them, we can run hundreds of experiments in the span of one week, and within that week we basically try the first pass of our developmentally inspired interventions.

[01:04:04] Polyphron’s first big tissue engineering result: polarity

Abhi: I think this leads naturally to the next question: what is the first interesting or promising result that Polyphron is willing to share? It sounds like it is along the lines of this polarity work.

Matthew: Yeah. To put some more numbers on it: we started with the cortex as our first program. We have a cortical program, a cardiac program, and potentially a couple of other programs that we’re not being public about right now. We started with the cortex as a proof of concept, in part because of data availability. It is the most mapped tissue type, with really, really beautiful, deep developmental atlases. So if you wanted to prove that developmental data could be your sole prior, it’s a good place to start.

What we did was identify a key feature of native morphology in the cortex, both developing and adult, which is neurite orientation. The neurites of excitatory neurons in the cortex have a polarity: an angle relative to the apical surface of the developing or adult cortex of about 90 degrees, so it looks like a beautiful row of neurons. This is a specific subtype of neuron, not all neurons in that section of tissue. So if you were ever going to try to recapitulate cortical tissue (which we actually don’t think is a good initial therapeutic, as we can discuss later, but it’s a very, very good proof of concept for the platform), you would need those neurites to be at 90 degrees, give or take five degrees, to the apical surface, and only in those neuronal subtypes.

So we basically created a developmental atlas out of all the available public data. The total experiment took 10 weeks and comprised three loops. Orientation is measured as an angle: in vivo, it’s 90 degrees. An organoid (going back to our favorite approach) gets you about 45 degrees on average; the neurites are random, so it averages out to 45 degrees. We took it from 45 degrees to 82.2 degrees, which is damn near in vivo morphology, in a three-iteration loop that took 10 weeks. We’re extending that right now to the heart. That experiment is ongoing, and it looks like we will have sped up experimentally, which is good. One of the things we’re trying to do here is make it easier and cheaper to onboard each incremental tissue.
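Why do random neurites average out to 45 degrees? A quick simulation, assuming 2D images in which each neurite's direction is uniformly random and the reported value is its acute angle (0 to 90 degrees) to the apical surface. Under that assumption the folded angle is uniform on [0, 90], so its mean is 45. The last function expresses the reported jump from 45 to 82.2 degrees as a fraction of the gap to the in vivo 90 degrees. All function names are hypothetical.

```python
import random
import statistics


def acute_angle_to_surface(direction_deg):
    """Fold a 2D direction (degrees) into its acute angle (0 to 90) to the surface."""
    a = direction_deg % 180
    return a if a <= 90 else 180 - a


def fraction_of_gap_closed(start, current, target):
    """Share of the distance from `start` to `target` covered by `current`."""
    return (current - start) / (target - start)


rng = random.Random(0)  # fixed seed for reproducibility
random_neurites = [acute_angle_to_surface(rng.uniform(0, 360)) for _ in range(100_000)]
print(statistics.mean(random_neurites))
print(round(fraction_of_gap_closed(45, 82.2, 90), 3))  # 0.827
```

With 100,000 samples the simulated mean should land within a fraction of a degree of 45, and going from 45 to 82.2 degrees closes roughly 83% of the gap to 90. In 3D, uniformly random directions would not give a uniform angle to a plane, so this sketch only covers the 2D case.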

Abhi: Is polarity an important phenomenon in a lot of tissues beyond the brain and the heart?

Matthew: I mean, it’s important in all tissues, but the polarity of specific cell types is definitely super important to any tissue that has an outside and an inside, for example, or any tissue that conducts a signal, be it electrical or otherwise. That’s most of them. It’s also one of the first macro features of tissue-ness that emerges during a classic developmental pathway. One of the first things laid down is orientation: in development, you’re just this one long tube, and you need to figure out which way is up and which way is down. So it made sense to start there for a bunch of reasons.

Abhi: And how long did it take? So you had a scaffold, you seeded it with these neuronal subtypes, and eventually polarity emerged after experimental iteration. What was that iteration process like, in terms of time and in terms of cycles? And is there a single foundation model at Polyphron that decides all ligands, or do you have a new model for each new experimental loop?

Fabio: Yeah. Keep in mind this was our first program, so there was no model before this one. The whole idea is that we trained our first model, let’s call it V0, after the end of the first iteration of the neuronal program. This was a model based on imaging data that was meant to learn features across all of our experiments in a self-supervised way. We then used this model to dictate what the next set of interventions might be to optimize for polarity. That led us to round two, and the same happened between round two and round three, which was our final round. That’s where we got to: we went from a median of 45 degrees in round one to a max of 82 in iteration three, all within 10 weeks. Of course, we are still working in a relatively low-data regime, so we’re not using the most cutting-edge architectures. But one of the very cool things about the way the active learning field is moving is that these approaches are relatively data-efficient, so they’re really effective even when they don’t see vast amounts of data.

Matthew: I just want to add something about why we started with polarity. I think we’ve covered why it’s useful from a technical perspective, but we always have our eye on the clinic. So something else we considered was: are there tissue types, and cell types within those tissues, where solving polarity gives you a huge clinical unlock that was otherwise not available? That’s why we have a cardiac program. A couple of things are interesting about myocardium. One is that you have to have the right alignment; if you have incorrect alignment, you get arrhythmias. Also, the contractile tissue has this helical arrangement, which is kind of interesting. Most of the approaches in cell replacement for heart failure, and also some of the engineered muscle patches, have not solved this alignment problem. We believe we will actually have a number of advantages relative to those approaches beyond solving alignment, but if we can solve the alignment problem alone, we’ll have a much, much better safety profile. So even though it’s the first element of tissue-ness, just solving it gets you something that is potentially clinically transformative.

Abhi: Sorry, I may be a bit confused. Is alignment equivalent to polarity here?

Matthew: In this case, alignment is a sub-feature of polarity. Polarity is a broad category of directionality.

Abhi: Getting near-native polarity after three experimental cycles feels crazy given how large the initial search space is. Do you suspect polarity is a pretty low-dimensional thing? Because the way I’m imagining it, the model gets one data point per experimental loop, so three data points in total. That’s a lot of extrapolation.

Matthew: Well, I should point out that this is being done in a high throughput chip.

Abhi: Oh, so this is not like you apply a bunch of perturbations and get a single readout at the end?

Matthew: Sorry, no, no, no. This is relatively high throughput. Right now it’s a microfluidic system, and we’re going to build our own system that’s slightly larger than microfluidic, a mesofluidic system. So at each round, the model is seeing, per plate, what, 40 conditions? And we do it in duplicate or triplicate, so you’re not just seeing one per round. It is still a significant compression. I think we worked out that the total combination space of all the ligands and conditions (cell density, ECM, et cetera) was something like 1.7 million, and in total the model probably saw around 90 different experimental conditions. That’s well under 0.01% of the space.
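The compression Matthew describes is easy to check using only his two rough figures: about 90 conditions tested out of a combination space of about 1.7 million. The helper name is hypothetical.

```python
def fraction_explored(conditions_tested, design_space_size):
    """Share of a combinatorial design space actually tested."""
    return conditions_tested / design_space_size


frac = fraction_explored(90, 1_700_000)
print(f"{frac:.6%}")      # 0.005294%
print(f"{1 - frac:.4%}")  # 99.9947%
```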

Abhi: So when you refer to three experimental loops, what does the three refer to?

Fabio: Yeah, absolutely. Take the mental framework of what we are doing. We start from a developmental atlas, and we have this list of molecules we might care about. At the very beginning, we define random sets of these interventions. Say we start from a hundred different molecules and want to perturb our cells with three molecules at a time. We define all the potential triplets from this list of a hundred ligands, right? Our high-throughput microfluidic setup lets us try as many of these triplets in parallel as possible, across however many plates it takes. This gives us many, many data points from which we can learn which interventions are more efficacious, and which are less efficacious or even deadly for the cells. That’s what we then use to optimize. And that’s why the active learning setup is so useful: not only will it tell us which of the triplets it has seen are interesting, it will also predict which unseen triplets might be very cool to try. There is, of course, this interplay between exploitation and exploration. But all in all, we can get enough triplets to go to round two, we already see an improvement in round two, and then we get further improvement going to one additional round, round three.
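The loop Fabio outlines (test triplets, learn which molecules help, then propose unseen triplets while balancing exploitation and exploration) can be illustrated with a deliberately simple heuristic. The source doesn't describe Polyphron's actual model beyond a self-supervised imaging model feeding an active learning setup, so the sketch below swaps in an assumption: an additive per-molecule surrogate scored with an upper confidence bound (UCB). The molecule labels, scores, and the `beta` exploration weight are invented.

```python
import itertools
import math


def molecule_stats(observations):
    """Per-molecule score totals and counts from observed triplets."""
    sums, counts = {}, {}
    for triplet, score in observations.items():
        for m in triplet:
            sums[m] = sums.get(m, 0.0) + score
            counts[m] = counts.get(m, 0) + 1
    return sums, counts


def propose_batch(molecules, observations, batch_size, beta=1.0):
    """Pick the next untested triplets by an upper confidence bound.

    Each molecule scores mean + beta * uncertainty bonus; a triplet's score is
    the sum over its molecules (an additive surrogate). Untested molecules get
    an infinite bonus, so they are explored first. Observation keys must be
    sorted tuples so they match the generated combinations.
    """
    sums, counts = molecule_stats(observations)
    total = sum(counts.values())

    def ucb(m):
        n = counts.get(m, 0)
        if n == 0:
            return math.inf
        return sums[m] / n + beta * math.sqrt(math.log(total + 1) / n)

    scored = []
    for triplet in itertools.combinations(sorted(molecules), 3):
        if triplet in observations:
            continue  # already measured in an earlier round
        scored.append((sum(ucb(m) for m in triplet), triplet))
    # Highest score first; ties broken alphabetically for reproducibility.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [triplet for _, triplet in scored[:batch_size]]


# Hypothetical round-one scores (higher = closer to the target structure).
round_one = {("A", "B", "C"): 0.9, ("C", "D", "E"): 0.1, ("A", "D", "E"): 0.2}
print(propose_batch("ABCDE", round_one, batch_size=1, beta=0.0))  # [('A', 'B', 'D')]
print(propose_batch("ABCDEF", round_one, batch_size=2))  # [('A', 'B', 'F'), ('A', 'C', 'F')]
```

With `beta=0.0` the proposal is pure exploitation and picks the untested triplet built from the best-scoring molecules; adding an untested molecule `F` sends the next batch toward triplets containing it, which is the exploration half of the trade-off. An additive surrogate ignores interactions between molecules, which a real model would need to capture.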

Abhi: And for the case of polarity, is that a single-step perturbation, in that you’re not doing one set of perturbations today and then another set tomorrow?

Fabio: So right now, yes, we’re looking at one time point, and at that time point it’s one set of perturbations that lasts a few days. In the future, the more complex the structure we have to recapitulate, the more complex the protocol will be. So we’ll have different interventions at different time points. We are already playing with this a little, but the idea for now was: okay, let us see if the active learning setup and the closed-loop system can actually work.

[01:15:33] What comes after polarity?

Abhi: So you mentioned that one of the reasons you opted for polarity is that polarity alone is cool; it’s almost sufficient as an MVP in some capacity. What is the next lowest-hanging fruit that you would want to optimize for after polarity?

Fabio: So there are three things I discussed at the very beginning that we believe define structure. One is polarity, two is multicellularity, and three is reaching the size and shape the graft needs to be clinically meaningful, which involves vascularization. Our next step will for sure be multicellularity. Again, these things will not happen sequentially, right? It won’t be polarity first and then multicellularity; they will rather happen together and be optimized as one system. But conceptually, it makes sense to think of them as three things we need to care about. So yeah, we’ll basically start adding multiple cell types and asking: can we preserve the polarity structure we defined in our first programs while having multiple cell types that interact with each other?

Abhi: At what point do you imagine you’ll need to move into the realm of multiple time points? Or is that still unclear?

Fabio: It’s already happening. As soon as you go beyond polarity... and for some tissues, we’re already past that point: even just for polarity, you need multiple time points.

Matthew: Lots of robot arms.

[01:17:09] Why is vascularization the hardest problem of tissue engineering?

Abhi: Automation seems pretty essential for this. It’s a very interesting direction and a cool initial result. You mentioned vascularization as the final thing that needs to be solved for any tissue engineering company to eventually take off. And while I was researching for this, it seemed like everyone talks about vascularization as a fundamentally unsolved problem in the tissue engineering field. Why is it so hard?

Fabio: Yeah. So let me go back once again to our reference, which is human development. Vascularization is an incredible system, and the way it arises throughout development is remarkable, because, as you can imagine, every developing organ needs nutrients. As soon as the first stage of development is over, the stage when diffusion is basically enough, organs and tissues need vascularization. So it is incredibly complex. It has very specific phases that it uses first to define the general vascular framework of the body, and then to generate all the capillaries and this incredibly dense network. And once again, we have not so far been able to recapitulate this complexity, or to trace it properly, in the lab. All the approaches that have been tried so far have been quite simplistic and reductionist, so they have not achieved the vascular complexity needed to feed growing tissue grafts and bring them to the necessary size and shape.

Interestingly, what has been happening in the space is that people have started to understand how to use different approaches to grow vessels and how to engineer them. The key insight has been that we need a clear starting point and a clear end point toward which the vessels can grow, right? We basically need some attractor that the cells can point toward. Our insight is, once again, that you cannot recreate this complexity artificially or top down. You have to grow it. So, just as we are growing different tissue structures by tracing development, we are also trying to grow vessels in these three-dimensional boxes full of some type of extracellular matrix. And again, the way this will happen is that computational models will try to drive vascularization and optimize for different features. This is important because one of the very cool things about vascularization is that it varies across organs. The brain, with its blood-brain barrier, needs a very specific type of vascular network. The heart needs a different one. Kidney, pancreas, liver: different ones again. So there is a lot of optimization to be done there as well.

Matthew: And we have a vascularization program underway. Like, we know it’s a showstopper. We’re working on it. It’s not an afterthought.
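Fabio’s point that early development can rely on diffusion, and Abhi’s later remark that thin skin gets by without much vasculature, both follow from how diffusion scales: the characteristic time to diffuse a distance L grows roughly as L²/(2D). The sketch below assumes an oxygen diffusion coefficient of about 2×10⁻⁹ m²/s in tissue (an order-of-magnitude value, not a figure from this conversation) and ignores oxygen consumption by cells, so it illustrates the scaling only.

```python
# Order-of-magnitude assumption for O2 diffusivity in tissue (m^2/s); not from the interview.
D_OXYGEN = 2e-9


def diffusion_time_s(distance_m: float, diffusivity: float = D_OXYGEN) -> float:
    """Characteristic 1-D diffusion time t ~ L^2 / (2 D); ignores consumption by cells."""
    if distance_m < 0 or diffusivity <= 0:
        raise ValueError("distance must be non-negative and diffusivity positive")
    return distance_m ** 2 / (2 * diffusivity)


for label, distance in [("100 um", 100e-6), ("1 mm", 1e-3), ("1 cm", 1e-2)]:
    t = diffusion_time_s(distance)
    print(f"{label:>6}: ~{t:,.1f} s (~{t / 3600:.2f} h)")
```

Going from 100 µm to 1 cm is a 100-fold increase in distance but a 10,000-fold increase in diffusion time (seconds versus hours), which is why thick grafts need a vascular network rather than diffusion alone.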

[01:20:33] Why can’t you just wash angiogenesis factors over the tissue?

Abhi: I spiritually get why you guys want to mirror developmental biology, ’cause that’s kinda the thesis of Polyphron as a company. But naively, why can’t you just seed the whole thing with endothelial cells at the very beginning and wash angiogenesis factors over it to create the vessels? Why doesn’t the naive solution work?

Fabio: Yeah. So the problem is that cells need to interact in a very specific way in order to create the structure first and then to gain functionality. If the first steps are skipped or not properly recapitulated, you will not be able to obtain a functional network by the end. That’s why the majority of vascularization approaches first require some kind of blob of endothelial cells. And then from this blob, the vessels start to arise and spread throughout the growing graft.

Abhi: Is there any world in which you grow the vessels separately and then join them back in?

Fabio: It is extremely difficult. Imagine that for many tissues there is basically one capillary per cell.

Abhi: I was not aware of the complexity. It’s not like flooding a rough neighborhood of cells...

Fabio: It’ll change from tissue to tissue, but the density you need to achieve is astounding. So I think the chances of being able to do this outside the graft and then plug it in are relatively low. And again, it’s really difficult to reach equilibrium in a complex system by combining parts. It’s just easier to have the features arise together with the complexity.

[01:22:25] How does the graft integrate with the host’s blood supply?

Abhi: Makes sense. And let’s say you do solve this grand challenge of the field. You’re able to get vascularization working. You give the graft to a surgeon, and they’re about to implant it into a patient. Once it’s in the patient, it isn’t yet integrated with the rest of that patient’s vascular system. How does that integration occur?

Fabio: Yeah, so there is a process that needs to happen, and it regularly happens in the operating room during organ transplants or any type of surgery: anastomosis, where the vessels in the incoming graft have to be connected to the existing vessels. One way to do so is surgically. The other way is to try to exploit the natural way the body reacts to foreign bodies, which is basically by perfusing them with blood or other fluids carrying the body’s own immune cells. These cells try to colonize the graft and find anchor points that can be used to integrate this addition.

Abhi: By anchor points... is that like a physical vein that they’ll attach to?

Fabio: So it’ll be cells first, and then some type of fibrotic tissue. I should preface that I’m kind of speculating here; I don’t think it has been fully proven across grafts. But what I can tell you is that in bone replacement, for example, this is exactly what happens. Bone replacements are designed to expose anchor points that the body sees, recognizes, and latches onto, and this helps the body integrate the new graft. Right? So we can imagine something similar happening with our grafts: either we insert these kinds of anchor points (these might be peptides, so nothing too problematic for the body), or we find ways to stimulate angiogenesis, which of course is tricky due to its oncogenic potential. But one way to approach the problem will be: how can we exploit the way the body reacts and use it as a way in?

Abhi: I didn’t know about that bone thing. That’s interesting. I know skin doesn’t really need vascularization all that much, ’cause it’s thin enough that diffusion just works fine. I didn’t know bone grafts even needed blood flow.

Fabio: They do. Imagine that a bone graft is basically, again, a rigid sponge. It’s immediately perfused, and it has these tiny exposed pores; cells just pass by and latch onto them. There’s a short period of inflammation. Then other host bone cells colonize the graft and start remodeling it, turning it into new bone from the host.

Abhi: This is just my own curiosity, but like, in bone grafts, do they also replace the stem cells within it, or is that ignored?

Fabio: No, no. You can just put in the graft, this mineral kind of thing.

[01:25:45] How do you validate tissue function before implantation?

Abhi: Interesting. Okay, I want to zoom out a bit, more toward the future. Right now you don’t necessarily have functional tissue in an absolute sense, but someday you will. When you get to that point, are there functional assays you can use to prove that, oh, this chunk of cardiovascular tissue is actually gonna be useful once I implant it into someone? How do you prove that out in advance? Beyond measuring polarity and checking that the proteins are there, is there anything else?

Matthew: Yeah, I mean, in vitro you have cell identity sequencing, qPCR, all of that stuff. But you also have electrophysiology, and functional readouts where you can measure the function of the tissue: contractility in heart tissue, pacing if it’s pacemaker-type tissue. So that’s all the stuff you have in vitro. And then obviously you have animal models. In the heart, you can’t really do functional readouts in rodents because the physiology is so different; the heart rates are about an order of magnitude apart in beats per minute. So rodents are a very poor translational model for most cardiac interventions, and you would probably do it in a pig. So there is a panel of fairly robust functional assays you can do at the in vitro level, and then you would do a large-animal study in a pig before you put it in a human.

Abhi: I remember when I interviewed Hunter, the Until Labs guy, in my last episode, he said, “Oh, we’re good on functional assays because the organ transplant field has already figured out most of them.” Is that also true on the tissue side? I know there are discrete tissues that are transplanted from one person to another. Are those metrics pretty well fleshed out?

Matthew: So my understanding, and I’m not an expert in organ transplantation to the degree that Hunter must be by now, is that pretty minimal assays are done on those organs before they’re transplanted. You’ll probably have an ischemia time window check, probably a biopsy, maybe some imaging. But these are basically being done in a helicopter, so your QC/QA is: “it’s human tissue, and human tissue is good, so let’s put it in, because this person’s gonna die otherwise.” That obviously changes your risk profile.

Abhi: He did mention that the big advantage of cryopreservation is that you get to do more testing.

Matthew: And that’s actually the equivalent advantage of being able to do these ex vivo grafts: you get to do QC/QA really, really deeply.

Abhi: Like as much as you want.

Matthew: And I think that’s gonna be a core advantage going forward.

[01:29:01] How do you design a clinical trial for a biological pacemaker?

Abhi: Makes sense. The other big question I have is that this is a brand new therapeutic modality in basically every fashion. So how do clinical trials work for this sort of thing, where you’re going up to a patient who, in the pacemaker case, already has a pacemaker? How do you convince them: “Oh, can we put this engineered tissue into you to see if you can go without the pacemaker?” How do you recruit these patients in the first place?

Matthew: Yeah. So before I get to that step, I just want to touch on how we think about indication selection, ’cause I think it’s quite important. Because, as you correctly pointed out, we are trying to pioneer a completely new therapeutic modality, we have a barbell strategy for every new tissue we approach: we are looking for two potential products. One we call a Regulatory Pathfinder. The other is a Commercial Workhorse.

So in the cardiac case, our regulatory pathfinder is a biological pacemaker. What you’re looking for in a regulatory pathfinder product is an unbelievably unambiguous clinical readout, incredibly fast, in circumstances that allow a very, very small clinical trial. And then the commercial engine, which in our case is a left ventricular muscle patch for heart failure with reduced ejection fraction in stage three and stage four, is basically 95% of the revenue. But its readouts take a lot longer, because the clinical readout the reimburser wants for your commercial engine is reduced hospitalizations, which you have to measure over the course of a year or even two. We need to find proof of efficacy as quickly and as unambiguously as we possibly can, to give a kind of halo effect to both products in the barbell strategy.

So in the biological pacemaker, what you are looking for, and this goes back to the initial challenge you gave, is actually not patients who have something that’s working perfectly well. You are deliberately trying to find cases where this can be a salvage therapy. And because you have the commercial engine, you don’t really mind the size of the patient population you’re going after there, because these things work together in tandem. So you are really trying to optimize for: can I have a first phase one trial that is almost an N of 1, potentially even compassionate use, so that you get as much regulatory speed-up as possible and an unbelievably clear binary signal?

So for the biological pacemaker, what we’re looking for is “hardware exhausted” patients. They’re often pediatric or neonatal, sometimes adult. These are patients who have a device that is going to fail, either because of infection or device rejection in some form. They’ve had multiple surgeries. There’s nowhere left to put a lead, for example because of occluded veins. There are a bunch of reasons this could happen, but basically their quality of life is gonna be anything from very, very bad to palliative care. And you need to find one of those people.

From a patient recruitment perspective, a lot of these cases are clustered in the same three or four surgical centers in the US. So you need to build inroads with the cardiologists doing these lead extraction and lead placement operations. But it’s essentially the same surgical workflow they’re already doing, so no one needs to be retrained on anything. The trial would be designed as a weaning trial. So you would have someone who is on a device, who has a pacemaker right now, but they are hardware exhausted: that device is gonna fail.

Abhi: And that’s usually predictable.

Matthew: It’s incredibly predictable, right? They will be classified as a hardware exhausted patient. That is known; there’s a code, so you can find them. And you essentially run a parallel trial: you engraft our biological pacemaker, and the patient still has the device there as a safety backup. That should make it a lot easier to get buy-in from patient groups, the FDA, and cardiologists, because you have a baked-in safety valve: you already have this device, and what you do is toggle the device on and off and see how much time the patient can spend off the device. From that readout, you should find out whether you’ve built a functional tissue that does what we said it would do within hours or days, certainly within weeks. So it is the fastest possible way to get a human clinical readout. And for patient recruitment, you are really looking for one, maybe two people for that trial.
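The weaning readout Matthew describes (toggle the backup pacemaker off and measure how long the patient can go without it) reduces to a simple time-off-device fraction. The sketch below assumes a hypothetical log of contiguous intervals, each marked device on or off; the `Interval` type and its field names are invented for illustration, and an actual trial endpoint would be defined in the trial protocol.

```python
from dataclasses import dataclass


@dataclass
class Interval:
    """One contiguous stretch of monitoring (hypothetical log format)."""
    start_h: float    # hours since engraftment
    end_h: float
    device_on: bool   # True while the backup pacemaker is switched on


def fraction_off_device(intervals: list[Interval]) -> float:
    """Share of monitored time spent with the backup device switched off."""
    for iv in intervals:
        if iv.end_h < iv.start_h:
            raise ValueError(f"interval ends before it starts: {iv}")
    total = sum(iv.end_h - iv.start_h for iv in intervals)
    if total == 0:
        raise ValueError("no monitored time")
    off = sum(iv.end_h - iv.start_h for iv in intervals if not iv.device_on)
    return off / total


log = [
    Interval(0, 24, device_on=True),    # first day: device on
    Interval(24, 30, device_on=False),  # six-hour trial off the device
    Interval(30, 48, device_on=True),   # device switched back on
]
print(fraction_off_device(log))  # 0.125
```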

Abhi: Why are these types of patients concentrated at three medical centers? Is it just the rarity of this condition?

Matthew: Yes, because so much of it is pediatric. So it tends to be clustered where there are pediatric cardiologists, which is obviously a specialism within a specialism.

Abhi: These may be difficult questions to answer, because it’s something that’s gonna take a while for you guys to get to, but finding these sorts of patients, or convincing clinicians that they should be using this, seems challenging. How much do you need to convince the cardiologists themselves that this is a worthwhile approach, versus just filing a clinical trial and, if the parents of the child are interested, they’ll sign up for it?

Matthew: So, I mean, we think there’s a pretty good shot that... Obviously we want to engage early and often with the cardiology community, and we want them to be very, very supportive of this, as I think they should be. When it comes to the first use, it could be done through a compassionate use pathway, which has a much, much lower regulatory bar. There are a large number of functional assays we would do before even attempting to put this in a human, including successful large-animal trials, obviously. And the good thing about the way the pacemaker trial would likely be designed is that there are no other options for that patient, and there’s already a device in place that you could switch on to take over from the graft if something were to go wrong. So in these regulatory pathfinder indications, which we have for every tissue, we are looking for something with these features, where safety is built in almost by definition. Looking to take someone off a device is a good way to look at indications.

Abhi: Is there an analog to that anywhere else? Like, “You are on a device right now; we’re gonna introduce an intervention to try and get you off the device”?

Matthew: Is there an analog? I mean, I would guess that in kidney there must be something, you know, to get you off dialysis. Time off dialysis is an endpoint, and for what it’s worth, that would probably be our endpoint for nephron units and stuff.

[01:37:01] The argument for being a pan-tissue company

Abhi: This is something we touched on earlier, but right now you have cortical programs going on, and you have heart programs going on. And the hope for Polyphron is that you become a pan-tissue company that is involved in every single organ, every single system in the body at once. When I first talked to you guys, it seemed clearly insane. After you explained the rationale to me, it made a bit more sense. I’d love for you to just repeat that.

Matthew: I mean, there’s a technical dimension to this, and there’s a commercial dimension as well. From a technical perspective (and Fabio can chime in if I absolutely mangle this), as we said before, we have a relatively strong hypothesis that there are latent rules of morphogenesis and development that extend across tissues. And we think there’s a critical mass of tissue systems and tissue products that we can onboard, after which we could probably do most other products. So we actually think there’s a race to reach that critical mass, and that the company that gets there first likely has a very strong competitive position in the market. That’s helped by the fact that most of the data is actually being generated by the lab. So after you have this initial bootstrap from reference data, it’s really, really hard to come in as a fast follower and try to produce this stuff. You would have to build the loop and run through it the same way we’ve done, and we think that would take a lot of guts and capital, and maybe wouldn’t even work if someone tried it afterwards. So in our view, the race is on to get to that critical mass of tissues.

Not only that, but from an organizational perspective, you need to master the manufacture, production, and deployment of new tissue products at a commercially reasonable cost of goods, which is something we’ve built into the loop from day one, and we’ll get better and better at that over time. We’ve also built these nested flywheels, where we should get better, at both the company level and the model level, not only with each new tissue type we onboard but with each graft, each tissue product that is sent out into the world. We’ll be very, very careful about collecting super deep telemetry data about what these grafts are doing (engraftment rates, clinical readouts, et cetera), all of which will be incorporated back into our loop.

And the last thing I’ll say is that, from a company compounding perspective, we believe these tissues will not suffer from the patent cliff problem. The value is not in a composition of matter, which you would normally protect with a patent. That’s the grand bargain of pharma: you get 10 to 12 years of monopolistic profits, but you have to disclose exactly how you’re doing it. Process power beats patents basically every time, and ours is a very process-power-driven company. So we anticipate each product having quasi-perpetual revenue streams at the limit. A bunch of stuff can happen, people can produce better products than us, but I don’t think we will have this kind of drop-off. And we should get faster and faster with every new tissue type and every new tissue product we produce, and the company should get more and more defensible as we go.

Abhi: Is that last point along the lines of: the defensibility of what Polyphron produces will be a combination of, one, it’s just very hard to produce, and two, how you made it is a trade secret?

Matthew: Basically. So it’s model weights and process. It’ll... it’s closer to a semiconductor fab, honestly.

Abhi: That’s interesting. I do have a friend who thinks this will happen to the biology field in general because... if China can just pick up the molecule, then why would you ever give away the... yeah.

Matthew: I would actually anticipate that across the field, classic pharma and biotech approaches working on conventional modalities will move more and more toward trade secrets, particularly with all of the target crowding we’re seeing.

[01:41:57] What are the biggest scientific and economic risks?

Abhi: What is the most risky thing that could possibly happen at Polyphron that is either scientific or economic or both?

Fabio: Scientific... I think we made this explicit, but we have quite a few technical problems to solve. One is adding complexity in terms of structure: we will have to see how well our platform gets to that level, or which knobs we will have to tune. Then of course there is vascularization. We have a very strong hypothesis, but I can’t lie, it is something we’ll have to face, and we are already starting to think about it. On the more theoretical side, I find the idea of learning these latent rules of development quite an interesting challenge. It’s not really a roadblock, but I think it’s something very fascinating that we will try to prove.

Matthew: Economic... I think I’m gonna give a kind of non-answer. Here’s my non-answer. When investors invest in biotech or techbio companies, I think there is a delusion, and I don’t know who it serves, that you are only taking technical risk and not market risk; that that’s what deep tech investment is, what biotech investment is. That’s absolutely not true. You’re taking both. And if you are not appreciating the commercial risk you’re taking, then... well, it’s not ideal.

So for a company that has pulled off technical miracles (and we’ve pulled off maybe one or two of the technical miracles we need to, but there are plenty more we need to pull off to make this successful), the worst thing that could happen is that we end up in a position where we don’t know how to manufacture this profitably, or how to sequence the regulatory strategy and the market rollout strategy. Bluebird Bio is now, I think, the canonical example of what happens: you can do unbelievably good science and build an unbelievably impactful product that can meaningfully benefit lots and lots of patients, and still get sold in a fire sale for something like 5% of the money you raised. So as much as possible, not just in how we build the organization and how we think strategically, but in what we are actually trying to optimize in the lab, we are baking in, as I said before, COGS, vendor redundancies, and where we source this stuff from, to ensure as much as possible that we’re building a technically viable and commercially viable organization at the same time. So I suppose the risk is that I’m wrong about any of that. But we’re trying as much as possible to think around the corner, well in advance of when this would actually go up for reimbursement by a payer.

[01:45:23] Who are Polyphron’s competitors?

Abhi: Relatedly, who do you need to worry about as competitors in the tissue engineering space? I can’t really think of anyone. You’re the only...

Matthew: So... I mean, there are people working on tissue. I think we are the only ones with this pan-tissue approach, probably because it does seem insane to people initially, and you have to think about it a little to get more comfortable with it. But there are tissue-specific companies working on things. It’s sometimes difficult to find out exactly what their approach is from the outside. But Bob Langer has a company called Satellite Bio that, as far as I know, is doing liver. Aspect Biosystems, I think, is taking a bioprinting approach. They’re doing some endocrine stuff, liver as well. And they did a Novo Nordisk deal, which I think is obviously gonna be obesity-related. And then, I suppose at the other end, you’ve got the xeno companies as well. Again, I believe that if you could choose between human tissue and pig tissue, you’d pick human tissue every time. And I actually believe that the pace of human tissue engineering is gonna exceed xenotransplantation pretty quickly in terms of technical maturity. So I think you’ll always want human tissue. In a head-to-head, you’re always gonna want human tissue for a bunch of reasons, not least of which is the genetic engineering, which adds some safety concerns and may not even be transferable from patient to patient.

[01:47:07] Expanding the TAM beyond transplant lists

Abhi: You mentioned earlier that by going after patients who aren’t yet at the super-advanced stage where they need an organ transplant, you get access to a patient population for whom there is not really any therapeutic intervention available other than perhaps medication. I’d love to get your take on that, because I think it dramatically changes how I think about the economics of Polyphron.

Matthew: Yeah. I think that’s something you probably have to believe in order to be bullish on the extremely successful case of Polyphron. Let’s take an example: the heart, which I know we’ve been talking about a lot, but we know a fair amount about it. There are 6.7 million Americans with heart failure. It is the second biggest cause of death in the US, I think. And of those 6.7 million, maybe roughly 10,000 will get a heart transplant. A large number of people just cannot get a heart transplant even if they were on the list, whether for frailty or age (you’re just not gonna give it to someone over a certain age), comorbidities, or lifestyle. There are potentially other exclusionary factors too: say you can’t take immunosuppression, for example, or you can’t deal with being in the ICU, because a heart transplant involves cutting open the full sternum.

So of those 6.7 million, only about 10,000 will make it to a heart transplant. But roughly 670,000, about 10% of them, are in stage three or stage four of the classification used to categorize heart failure. And of those 670,000, we think our first product can automatically start to address most of that population. These are people who would not be able to get a heart transplant, or for whom this could defer one potentially forever. So you can meaningfully expand the population by, I mean, roughly 10 to 20x.

It remains to be seen how far you can push it, but to give you a sense of the TAM expansion: let’s imagine you start with a patient population who have heart failure with reduced ejection fraction below 35%, so there will be a cutoff, and these are people in stage three or stage four. Your initial patient population is gonna be about 40,000 people. For the pricing of that product, you would probably anchor against a left ventricular assist device, so call it 150 to 200,000 dollars; let’s say 200,000 dollars. Again, you’re probably anchoring against deferring a heart transplant and not needing a left ventricular assist device. Taking the midpoint between 20,000 and 40,000 for the number of these patients, you have a TAM right there of about 6 billion dollars in the US alone. That’s probably 10 billion globally. But you can meaningfully expand that just by going up. You can potentially even get to 50-plus billion dollars of US TAM by addressing a fraction of the people who have progressive chronic heart failure. So just by expanding the patient population window slightly, to say a hundred or two hundred thousand people, you begin to be in the revenue range of Eli Lilly. And we believe that for most of the tissues on the list I gave, the 10 different tissues we could go after, there are these mega-blockbuster products waiting if you can just slightly, not even fully, increase the patient population window. And at the limit, we think we can increase it significantly.
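Matthew’s TAM figures are simple multiplication, which makes them easy to check. The sketch below uses only numbers he quotes ($200,000 per patient, the 20,000 to 40,000 patient range, 6.7 million Americans with heart failure, roughly 10% in stage three or four); the helper name is illustrative, and the result is a US-only, single-price estimate.

```python
PRICE_USD = 200_000  # per-patient price anchored against an LVAD, as quoted


def tam_usd(patients: int, price: int = PRICE_USD) -> int:
    """Total addressable market = addressable patients x price per patient."""
    return patients * price


heart_failure_us = 6_700_000
stage_3_4 = heart_failure_us // 10          # ~10% in stage three or four
initial_patients = (20_000 + 40_000) // 2   # midpoint of the quoted range

print(f"stage 3-4 patients: {stage_3_4:,}")                      # 670,000
print(f"initial TAM: ${tam_usd(initial_patients) / 1e9:.0f}B")   # $6B
for patients in (100_000, 200_000):
    print(f"{patients:,} patients: ${tam_usd(patients) / 1e9:.0f}B")  # $20B, then $40B

# Patients needed for a $50B US TAM at the same price.
print(f"{50_000_000_000 // PRICE_USD:,}")  # 250,000
```

At a fixed $200,000 price, the 100,000 to 200,000 patient window works out to $20 to $40 billion, and $50 billion corresponds to about 250,000 patients.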

Abhi: Is this particularly true in heart or do you imagine like similar dynamics would occur in almost every organ? Maybe except the brain?

Matthew: Almost every organ except the brain. Maybe eventually the brain, but that requires a bunch of technical work. There’s no brain transplant, right? So you take a lot of reimbursement risk and then there’s a huge amount of translational risk that you probably don’t wanna layer on top of the other risk that you’re taking here. But yeah, conceivably for pretty much every other organ that we currently have transplantation for.

[01:52:28] Autologous vs. Allogeneic approaches

Abhi: In the limit case where Polyphron is open to patients in need, would you go autologous or allogeneic for your tissue construct, and why?

Matthew: So I think it depends on the indication. There are obviously manufacturing considerations around autologous. The optimal COGS scenario is probably immune-cloaked allogeneic, so that you can amortize the cost across multiple units. Because our system is designed from the get-go to be resistant to heterogeneity in the starting donor line, you could just as easily imagine doing HLA-matched allogeneic. There are roughly 40,000 different subtypes, but they’re not evenly distributed. There’s a fat head of subtypes that most people belong to. You could have an inventory of something like a hundred different cardiac products and cover 70 to 80% of the US population. The number is lower in more genetically homogeneous countries: in Japan, I think you can do something like 20 and cover 80% of the population. So if you’re gonna be able to produce this at scale, I think you are better off going with an immune-matched or immune-cloaked allogeneic product.

Our system is designed to be able to produce both, hypothetically. Now this is a bit more of a sci-fi case, but if Hunter is successful, you could imagine biobanking frozen tissue well ahead of time, right? We know that there’s cluster dysfunction; that’s how age-related chronic diseases work. We don’t necessarily know the order, but we know that your kidneys are gonna fail, your lungs are gonna fail, your heart’s gonna fail. And you could imagine a future where you create a bank of constructs that you think you might need, kept on ice for you. But for near-term commercial solutions, I think you would want to do immune-compatible allogeneic.
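The HLA point above rests on the shape of the frequency distribution: if a few subtypes account for most people, a small inventory covers a large share of the population. The sketch below illustrates that logic with made-up Zipf-distributed frequencies over 40,000 subtypes. The exponent, the helper names, and the one-subtype-per-person simplification are all assumptions; real HLA matching (two haplotypes, multiple loci, match tolerances) is considerably more involved.

```python
# Illustration only: frequencies follow a made-up Zipf-like law, NOT real HLA data,
# and each person is simplified to carry exactly one "subtype".

def zipf_frequencies(n_types: int, s: float) -> list[float]:
    """Normalized Zipf weights: the k-th most common type (1-indexed) has weight 1 / k**s."""
    weights = [1.0 / (k ** s) for k in range(1, n_types + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def inventory_coverage(freqs: list[float], inventory_size: int) -> float:
    """Share of people whose subtype is among the `inventory_size` most common subtypes."""
    return sum(sorted(freqs, reverse=True)[:inventory_size])

freqs = zipf_frequencies(n_types=40_000, s=1.2)  # s = 1.2 is an arbitrary assumption
for k in (20, 100, 1_000):
    print(f"{k:>5} products -> {inventory_coverage(freqs, k):.0%} of people covered")
```

A steeper distribution (larger s) reaches a given coverage with fewer products, which is the same intuition behind the smaller inventory quoted for a more genetically homogeneous population.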

[01:55:07] Is a 3-year timeline to the clinic realistic?

Abhi: On the future of Polyphron: when will the first clinical trial start? When I first talked to you, I thought, okay, it’s gonna be like six to 10 years away. You had a rather aggressive timeline: three years. What’s the rationale for that?

Matthew: So, just to be clear about what the claim is: I am not claiming that within three years we could produce every single tissue. I’m also not claiming that all of the possible tissues could get into a human within three years. I am claiming that if you are very specific about indication and tissue product selection, and you are properly financed to allow you to parallel track some stuff, then yes, you could put the Regulatory Pathfinder product in a human within three years. So I think you would do, you know, large animal takes you some amount of that, and then you would jump straight to a large animal, and then you would go into humans from there. And I think you could do it in less than three years under compassionate use.

[01:56:28] Cross-species translation

Abhi: This isn’t something we actually talked about: how big is the translational risk that you optimize something that works really well in a large animal but not very well in humans?

Fabio: So our whole computational setup is designed to optimize for human structure. We are using human cell lines. There’s always some type of translational gap; there are the limits to the predictive validity of animal models. But in terms of what we’re trying to recapitulate, we’re looking at human tissue.

Abhi: But in that case, doesn’t that make the large animal study very difficult, since you don’t know how to create tissue for their species? Is that not fair?

Matthew: I mean, you’ve got a decent similarity, at least in the cardiac case, between pigs and humans. The heart is similar enough in terms of size. Alignment is not perfect, for what it’s worth, which is another issue with xenotransplantation: there is a slightly non-natural morphology that you have to deal with. But I believe that would be considered the best-in-class model. Porcine... maybe, but probably pig.

[01:58:05] What would you do with $100M equity free?

Abhi: The last question I have is: if someone gave you a hundred million dollars equity free tomorrow, where would that be best allocated to make the company move faster?

Matthew: Doing that thing I said in three years? No, in all seriousness: because we believe that this company is truly a platform, and that some very particular moats will emerge at a certain scale, I think a hundred million dollars would be best spent on scaling up the automation fairly aggressively early on. It can be sequenced; it doesn’t all need to be done at once, but a lot of the investment would go there. A lot would go on compute, honestly. And the remainder would go to the steps that you cannot skip and should not skip: CMC, QC/QA, and all of the regulatory work needed to put these products in humans. With a hundred million dollars equity free, we imagine we could try to have two products in humans, one with an IND underway. And those two products would be in two different tissue systems. We want to prove that it’s not something weird about the heart that allows this to work, and that it transfers across tissue systems and across germ layers. So that would be my use of funds.

Abhi: I’m a little bit curious about your take. On one side, I can imagine this hundred million dollars being really well spent on gathering a lot more embryo data. But is that actually not super useful? Do you already have plenty of the data that you need?

Fabio: Yeah. So, we definitely do not have it right now. There are gaps in terms of stages for one specific tissue, and we have pretty much nothing for other tissues. But I do agree. My sense is that after a certain data set size, the amount of information you can extract kind of plateaus; you need the feedback loop instead. So what we really have to do is scale up our experiments as much as possible. First, that means having automation that allows us to create very, very complex intervention groups. And second, it’s how many readouts we can get out of our experiments. What we’re doing right now is mainly imaging-based, and that of course has its limitations. There are solutions out there that allow you to do spectroscopy or collect omics or any other type of sensor-based data. And by building this kind of 360-degree view of our experiments, we can really start exploiting our computational approach and try to optimize for the structures we want to get.

Abhi: Okay. Cool. I think those are all the questions I had. Thank you so much for coming on, Matthew and Fabio.

Matthew: Thanks for having us.

Fabio: It was a pleasure. Very fun. Thank you.

We don't know what most microbial genes do. Can genomic language models help? (Yunha Hwang, Ep #7)

Abhishaike Mahajan — Mon, 08 Dec 2025 22:04:11 GMT

Note: Thank you to rush.cloud and latch.bio for sponsoring this episode!

Rush is augmenting drug discovery for all scientists with machine-driven superintelligence.

LatchBio is building agentic scientific tooling that can analyze a wide range of scientific data, with an early focus on spatial biology. There’s a clip on them in the episode.

If you’re at all interested in sponsoring future episodes, reach out!

Introduction

This is an interview with Yunha Hwang, an assistant professor at MIT (and co-founder of the non-profit Tatta Bio). She is working on building and applying genomic language models to help annotate the function of the (mostly unknown) universe of microbial genomes.

There are two reasons you should watch this episode.

One, Yunha is working on an absurdly difficult and interesting problem: microbial genome function annotation. Even for E. coli, one of the most studied organisms on Earth, we don’t know what half to two-thirds of its genes actually do. For a random microbe from soil, that number jumps to 80-90%. Her lab is one of the leading groups applying deep learning to the problem, and last year it released a paper that increasingly feels foundational to the field (with prior Owl Posting podcast guest Sergey Ovchinnikov as an author!). We talk about that paper, its implications, and where the future of machine learning in metagenomics may go.

And two, I was especially excited to film this so I could help bring some light to a platform that she and her team at Tatta Bio have developed: SeqHub. There’s been a lot of discussion online about AI co-scientists in the biology space, but I have increasingly had a vague suspicion that people are trying to be too broad with them. It feels like the value of these tools lies not in general scientific reasoning, but in deep integration with how a specific research domain engages with its open problems. SeqHub feels like one of the few systems that mirrors this viewpoint, and while it isn’t something I can personally use (its use case is primarily annotating and sharing microbial genomes, neither of which I work on!), I would still love for it to succeed. If you’re in the metagenomics space, you should try it out at seqhub.org!

Transcript: https://www.owlposting.com/p/we-dont-know-what-most-microbial

Timestamps

00:02:07 – Introduction

00:02:23 – Why do microbial genomes matter

00:04:07 – Deep learning acceptance in metagenomics

00:05:25 – The case for genomic “context” over sequence matching

00:06:43 – OMG: the only ML-ready metagenomic dataset

00:09:27 – gLM2: A multimodal genomic language model

00:11:06 – What do you do with the output of genomic language models?

00:17:41 – How will OMG evolve?

00:20:26 – Why train on only microbial genomes, as opposed to all genomes?

00:22:58 – Do we need more sequences or more annotations?

00:23:54 – Is there a conserved microbial genome ‘language’?

00:28:11 – What non-obvious things can this genomic language model tell you?

00:33:08 – Semantic deduplication and evaluation

00:37:33 – How does benchmarking work for these types of models?

00:41:31 – Gaia: A genomic search engine

00:44:18 – Even ‘well-studied’ genomes are mostly unannotated

00:50:51 – Using agents on Gaia

00:54:53 – Will genomic language models reshape the tree of life?

00:59:18 – Current limitations of genomic language models

01:08:54 – Directed evolution as training data

01:12:35 – What is Tatta Bio?

01:19:02 – Building Google for genomic sequences (SeqHub)

01:25:46 – How to create communities around scientific OSS

01:29:06 – What’s the purpose in the centralization of the software?

01:35:37 – How will the way science is done change in 10 years?

Transcript

[00:02:07] Introduction

Abhi: Today I’m gonna be talking to Yunha Hwang, an assistant professor at MIT, applying machine learning to microbial genomes. She’s also the co-founder and chief scientist at Tatta Bio, a scientific nonprofit dedicated to building tools for genomic AI. Welcome to the show, Yunha.

Yunha: Thank you. Thank you for having me here.

[00:02:23] Why do microbial genomes matter

Abhi: First question, what makes microbial genomes so interesting to you?

Yunha: So yeah, I get this question a lot. If we think about the history of life, microbes have dominated that history of life, which means it’s the most diverse, it’s the most flexible, and in terms of the chemistry that it can do, it’s the most divergent you can possibly imagine. When we think about diversity of sequences, that’s where you’re gonna find most of the diversity of sequences, in microbial genomes. Yeah.

Abhi: And so it feels like a natural place to take AI and ML tools to just throw at it.

Yunha: Yeah. That’s one way to look at it. I think when we think about using biology to do cool things, I think about doing cool chemistry. So there’s like a utility aspect there as well.

Abhi: Were you focused on this topic since your undergrad days, or was it something you switched to during your PhD?

Yunha: Yeah, so I was a computer science student in undergrad, and I learned about the human genome. So I was interested in biology, but I got really hooked when I learned about this field of environmental microbiology, which sounds really niche, but it’s essentially... you’re looking at life in very extreme environments or places that you wouldn’t really typically look for life, such as the deep sea or deserts and so on. And then you’re finding new types of life, all through sequencing, through different kinds of methods. And that’s when I really got hooked in terms of scientific interest.

[00:04:07] Deep learning acceptance in metagenomics

Abhi: I think an interesting trend among a lot of people applying AI to somewhat niche fields in biology is that they are usually among the first people to stand up and say, “Hey, deep learning could be really useful here.” And the culture around that field is usually not very accepting of deep learning. How much did you find that when you were applying AI to metagenomics?

Yunha: Yeah, that’s a good question. I think... I think at the beginning, people were a little skeptical. But people were also quite open to it, because when you’re studying metagenomics, you basically scoop up dirt, sequence everything out of it, and use computation to piece it together. So essentially you’re looking at billions of base pairs, and there’s no way a human can do it. There are people who are really good at it, pattern recognition geniuses who can piece together entire genomes using manual curation. But for the most part, we’ve been using computation to study these billions of base pairs of divergent data. So in that sense, people are not so opposed to the idea that, “Wow, maybe we’re not very good at doing this. Maybe we do need machines. We do need some extra layer of understanding in order to understand this massive amount of divergent data.”

[00:05:25] The case for genomic “context” over sequence matching

Abhi: Traditionally—I’m not super familiar with the field—but my interpretation is that the traditional bioinformatic tools for studying metagenomics are like... you’re literally matching nucleotides between sequences that you found in one pile of dirt to another pile of dirt. What is your pitch for a better way to do it?

Yunha: Yeah, that’s a great question. So sequence matching is definitely part of the workflow. I think what’s really interesting is when you can look at a sequence and also consider the context it’s found in, and then understand that sequence within that context. And then also do basically comparative work between that sequence found in different contexts, and how the differences in the sequence can be made sense of using that information. So if you just take out the sequences, then these are just two sequences that are a few mutations apart. But then if you consider the full context of either the sample or the genomic context or the taxonomic context, then you’re actually answering a much more biologically relevant question.

Abhi: So you’re adding multiple layers of information on top of the raw sequences alone? And seeing what else you can pattern match from that?

Yunha: Exactly. Yeah.

[00:06:43] OMG: the only ML-ready metagenomic dataset

Abhi: And I think that leads well to perhaps your first big paper in the space. Maybe there are others, but the first one that I was made aware of is a paper that introduces two things. One is a really large metagenomic data set called OMG. The second, included in the same paper, is gLM2, a genomic language model. I’ll separate my questions for the two. The first one is OMG. Why did you release another metagenomic data set? From my outside view, there are already a few out there. Why was there a need for another one?

Yunha: Yeah, that’s a great question. I would argue there were none out there, none that was useful for machine learning. There are public data sets, but that doesn’t mean they’re useful, that they can be used immediately for machine learning purposes, for language modeling, for instance. For example, metagenomic sequences can be very poor in quality, so you do need to do a lot of quality filtering. There’s also a distribution effect where you have a lot of really short sequences, ’cause, as I said, you’re doing shotgun sequencing and piecing them together. So if you look at the length distribution, it’s heavily skewed toward short sequences: you get a lot of really short sequences that don’t even contain a single gene, so you have to throw them out. If you modeled using those, you’d basically be modeling nothing. So there is some filtering that you need to do for quality control.

There are also two major public databases: one is JGI’s IMG database, and the other is EMBL’s [European Molecular Biology Laboratory] MGnify database. There is overlap between the two, and also a lot of biases. For instance, it’s much easier to sample human feces than the deep sea, even though the deep sea has a lot more diversity. So you get hundreds of samples of the same sort of human gut sample, but very few of the very diverse deep sea samples. So by putting them together and then doing dereplication, semantic deduplication, and various other methods to de-bias the data set, we’re making it a resource that’s actually useful for machine learning, as opposed to its raw state, which was not really useful.
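The two preprocessing ideas described here, dropping short or low-quality contigs and deduplicating redundant sequences so that oversampled environments don’t dominate, can be sketched as below. This is a minimal illustration, not OMG’s actual pipeline: the thresholds are hypothetical, the embeddings are toy 3-d vectors standing in for model-derived ones, and the greedy cosine-threshold rule is a simple stand-in for semantic deduplication.

```python
import math

# Step 1: length/quality filtering. Both thresholds are hypothetical.
MIN_LENGTH_BP = 2_000   # long enough to plausibly contain at least one full gene
MAX_N_FRACTION = 0.05   # tolerate at most 5% ambiguous 'N' base calls

def passes_filters(contig: str) -> bool:
    """Keep a contig only if it is long enough and not dominated by ambiguous bases."""
    if len(contig) < MIN_LENGTH_BP:
        return False
    return contig.upper().count("N") / len(contig) <= MAX_N_FRACTION

contigs = {
    "short_fragment": "ATGC" * 50,                 # 200 bp: too short, dropped
    "long_clean": "ATGAAACGC" * 300,               # 2,700 bp, no Ns: kept
    "long_ambiguous": "ATGC" * 300 + "N" * 1_000,  # 2,200 bp, ~45% N: dropped
}
kept_contigs = [cid for cid, seq in contigs.items() if passes_filters(seq)]
print(kept_contigs)  # ['long_clean']

# Step 2: greedy embedding-based ("semantic") deduplication.
def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two non-zero, equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def semantic_dedup(embeddings: dict[str, list[float]], threshold: float = 0.95) -> list[str]:
    """Keep an item only if its similarity to every already-kept item is below `threshold`."""
    kept: list[str] = []
    for item_id, vec in embeddings.items():
        if all(cosine(vec, embeddings[k]) < threshold for k in kept):
            kept.append(item_id)
    return kept

# Toy embeddings: three near-identical gut sequences and one distinct deep-sea sequence.
embeddings = {
    "gut_a": [1.0, 0.0, 0.1],
    "gut_b": [0.98, 0.02, 0.1],
    "gut_c": [0.99, 0.01, 0.12],
    "deep_sea": [0.1, 1.0, 0.3],
}
print(semantic_dedup(embeddings))  # ['gut_a', 'deep_sea']
```

Greedy deduplication like this is order-dependent and quadratic in the number of items; at metagenomic scale one would likely cluster the embeddings first and deduplicate within clusters.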

Abhi: So OMG was for the most part a combination of the existing data sets with a huge amount of pre-processing on top.

Yunha: Yeah.

[00:09:27] gLM2: A multimodal genomic language model

Abhi: I think that dovetails well, and you mentioned semantic deduplication. I’ll have questions about that later. But first, maybe we can start with... you created this data set, you built a model on top of this data set called gLM2. What is gLM2?

Yunha: gLM2 is a genomic language model, but it’s actually not a DNA language model. It’s trained on metagenomic data, and it’s multimodal in that the intergenic regions are encoded as DNA nucleotides and the coding sequences are encoded as amino acids. There was a reason why we did that: we wanted to make sure that we could model amino acid interactions across protein sequence boundaries. A protein language model is not gonna learn protein-protein interactions, because it never sees multiple proteins in the same context. Whereas with a genomic language model that sees multi-protein context, you’re actually able to model multi-protein interactions, or interactions between intergenic regions and multiple proteins. That was what we wanted to do from the beginning, and that’s why we modeled it that way.

Abhi: What is the actual task for this language model?

Yunha: It’s a masked language model.

Abhi: So, given this protein sequence, intergenic sequence, protein sequence, and so on, you mask out something like 15% of that, and the job is reconstructing it?

Yunha: Yeah, exactly.
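The masked-language-modeling objective described here can be sketched in a few lines: hide a random ~15% of the tokens and keep the originals as prediction targets, so the loss is computed only at the masked positions. This is a simplified version under stated assumptions (character-level tokens, a single hypothetical mask token, and no BERT-style 80/10/10 replacement rule), not gLM2’s training code.

```python
import random

MASK = "<mask>"  # hypothetical mask token name

def mask_tokens(tokens: list[str], mask_rate: float = 0.15,
                seed: int = 0) -> tuple[list[str], dict[int, str]]:
    """Replace a random `mask_rate` fraction of tokens with MASK.

    Returns the corrupted sequence and a {position: original_token} dict of
    reconstruction targets; unmasked positions contribute nothing to the loss.
    """
    rng = random.Random(seed)
    n_mask = max(1, round(mask_rate * len(tokens)))
    positions = rng.sample(range(len(tokens)), n_mask)
    corrupted = list(tokens)
    targets: dict[int, str] = {}
    for pos in positions:
        targets[pos] = tokens[pos]  # the model is trained to predict this original token
        corrupted[pos] = MASK
    return corrupted, targets

# Character-level tokens from a toy mixed protein/DNA string (hypothetical tokenization).
tokens = list("MKRaatttaaggMLG")
corrupted, targets = mask_tokens(tokens)
print(corrupted)
print(targets)
```

With 15 tokens and a 15% rate, two positions are masked; swapping each target back into its position recovers the original sequence.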

[00:11:06] What do you do with the output of genomic language models?

Abhi: At inference time, what do you do with the output of the model?

Yunha: Yeah, so we were mostly interested in representation learning as opposed to generation, for instance. Because our goal was... there were two main tasks. One was we wanted to see if it learns inter-element interaction. So that’s one thing we wanted to learn.

Abhi: By inter-element, does that mean inter-protein...

Yunha: Protein-protein is definitely one, so multi-protein interactions. But we also wanted to see whether we can actually detect RNA-protein interactions. That’d be pretty cool, because then you can find new types of RNA-guided systems. Or can we just find promoters for sequences, which we should be able to do but still don’t know how to do for a lot of divergent sequences? So that was our primary task.

The secondary task was improving sequence representations so that we can propagate annotations better. Basically, we have this problem where we have a lot of proteins and sequences, but we know the function of less than 1% of them, because less than 1% of these proteins have been validated in the laboratory. There’s no way we’re gonna be able to validate all of these functions in the laboratory when we don’t even know what the assay is. So what you have to do is propagate that information as much as possible, and then let that information guide the next set of experiments. That’s the only thing we know how to do. And the only method we’ve been doing it with was sequence similarity-based propagation: if things are decently similar, we just call them the same thing, which is true to a certain degree. Now you can also do it with structure, with Foldseek and so on: if things are similar in structure, we just call them the same thing, which is also not always true, but it’s the best attempt at doing what we have to do.

So you can think of it as compressing information along different axes: one is sequence, and the other is structure. The question is, can we do that across context? That was the motivating factor for genomic language modeling. Can we infuse contextual information such that things that are similar in context get pushed together in representation space, so that we can propagate information from one protein to another because they share the same semantics in terms of context? That was the main motivator for why we wanted to do representation learning.
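Annotation propagation in representation space can be sketched as nearest-neighbor label transfer: each unannotated protein inherits the label of its most similar annotated protein, but only above a similarity cutoff. Everything here is a hypothetical stand-in: the 3-d vectors play the role of model embeddings (context-aware ones, in the gLM2 framing), and the labels and the 0.9 threshold are made up for illustration.

```python
import math
from typing import Optional

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two non-zero, equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def propagate_annotations(
    annotated: dict[str, tuple[list[float], str]],
    unannotated: dict[str, list[float]],
    min_similarity: float = 0.9,
) -> dict[str, Optional[str]]:
    """Give each query the label of its nearest annotated neighbor, or None if nothing is close enough."""
    results: dict[str, Optional[str]] = {}
    for query_id, query_vec in unannotated.items():
        best_label, best_sim = None, -1.0
        for ref_vec, label in annotated.values():
            sim = cosine(query_vec, ref_vec)
            if sim > best_sim:
                best_label, best_sim = label, sim
        results[query_id] = best_label if best_sim >= min_similarity else None
    return results

annotated = {
    "ref_1": ([1.0, 0.0, 0.0], "ABC transporter"),
    "ref_2": ([0.0, 1.0, 0.0], "CRISPR-associated protein"),
}
unannotated = {
    "query_1": [0.95, 0.1, 0.0],  # close to ref_1
    "query_2": [0.5, 0.5, 0.7],   # not close to anything
}
print(propagate_annotations(annotated, unannotated))
# {'query_1': 'ABC transporter', 'query_2': None}
```

Raising min_similarity trades coverage for reliability, which mirrors the caveat that “similar” sequence or structure does not always mean the same function.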

Abhi: And instinctively what’s the intuition for why just because two proteins are near each other, it means anything?

Yunha: Yeah, that’s a great question. This actually goes back to why microbial genomes are cool. Unlike mammalian genomes, or anything eukaryotic, microbial genomes can exchange DNA almost stochastically. That’s just part of their evolution. So things that are really far apart can exchange genomic information, which is not something that humans can do. We cannot exchange DNA with plants, right? What that means is that there are all these stochastic processes happening at scales we can’t even think about, because there are just so many microbes, with much shorter lifespans than ours, and these processes have been happening for billions of years.

So there’s selection pressure keeping these sequences together in a certain order. Some of these are things we can rationally understand, as in: these three proteins must be kept together because they literally form a complex, and if one fell apart by chance, that organism just would not live and therefore would not propagate that particular arrangement of the genome. So the ways in which genomes are arranged, gene content and genomic organization, all have some sort of meaning. Some of them we can’t understand; some of them we might be able to understand. But there are patterns there, and they are selected upon. So how do we extract those patterns? Some of them are random, so what we’re assuming is that the salient patterns the language model finds are probably not gonna be random. So how do you separate signal from noise using language models?
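One classical way to surface selected-upon gene arrangements like these, without a language model, is to count how often pairs of gene families sit near each other across many genomes; pairs that co-occur far more often than chance are candidates for a functional link. The sketch below does only the counting step, on four made-up genomes with hypothetical family labels, and uses a raw count cutoff rather than any statistical test. A genomic language model is meant to pick up such patterns implicitly, and more flexibly, from the sequence itself.

```python
from collections import Counter

def neighborhood_pairs(genome: list[str], window: int) -> set[tuple[str, str]]:
    """Unordered pairs of distinct gene families within `window` positions of each other."""
    pairs: set[tuple[str, str]] = set()
    for i, fam_a in enumerate(genome):
        for fam_b in genome[i + 1 : i + 1 + window]:
            if fam_a != fam_b:
                a, b = sorted((fam_a, fam_b))
                pairs.add((a, b))
    return pairs

def conserved_pairs(genomes: list[list[str]], window: int = 2,
                    min_genomes: int = 3) -> dict[tuple[str, str], int]:
    """Count how many genomes contain each neighbor pair; keep pairs seen in >= min_genomes."""
    counts: Counter = Counter()
    for genome in genomes:
        counts.update(neighborhood_pairs(genome, window))  # a set, so each genome counts once
    return {pair: n for pair, n in counts.items() if n >= min_genomes}

# Hypothetical genomes as ordered lists of gene-family labels.
genomes = [
    ["famA", "famB", "famC", "x1", "x2"],
    ["y1", "famB", "famA", "famC", "y2"],
    ["famC", "famA", "famB", "z1", "z2"],
    ["w1", "famA", "w2", "w3", "famB"],  # famA and famB present but too far apart
]
for pair, n in sorted(conserved_pairs(genomes).items()):
    print(pair, n)
# ('famA', 'famB') 3
# ('famA', 'famC') 3
# ('famB', 'famC') 3
```

The fourth genome contains famA and famB but outside the window, so the window size is a modeling choice, much like a model’s context length.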

Abhi: Yeah, it makes sense: protein-coding genes existing near each other has some functional meaning. Alternatively, I could imagine the microbial genome just being filled with a bunch of nonsense. One explanation is that the nearness of protein-coding genes means something because they need to travel together. The alternative is that even if one of them traveled to another bacterial genome, it might just not be used and stick around there, taking up space. Is that ever a concern?

Yunha: I think it’s less of a concern. So we talk about this junk DNA; we don’t really know what they do in like human genomes. I think for microbial genomes... there is... so no one really knows what junk DNA does, so that’s a separate conversation. For microbial genomes, if you have a gene that is not being used, there is a cost. So in order to be able to carry this forward, there is energy that’s required. There’s information burden, there’s just mutational burden. It’s just better to get rid of it.

Abhi: Yeah. That does make sense.

Yunha: Yeah. So I think it’s really difficult to conceptualize this because we’re thinking of it as, oh, like there’s gonna be so many random things that happen. But if you look at it from across samples, across history, the patterns that get picked up... there is a reason for that pattern.

[00:17:41] How will OMG evolve?

Abhi: Going back to the OMG data set, ‘cause I realized I have more questions about it. I imagine OMG is not gonna be like the final iteration, like the final metagenomic database. What do you wanna improve about the next version?

Yunha: Yeah, that’s a great question. Metagenomic databases are growing exponentially, so there is the size consideration. I forget exactly when OMG came out, I think it was about a year ago, but since then the data has almost doubled. So you can imagine that being a big piece of what OMG-2 might be.

I think there are also new types of data being generated. When it comes to things like epigenetics, like methylation signal, it’s not as prevalently available as the raw sequence data or the assembled genomic data. But the subset of data that has methylation calling done by the sequencing technology itself is a really interesting data set to include or to subset. So ideally, OMG extends beyond genomic data into transcriptomic data and other types of omics data. That’s the vision we have down the line, but it does require many more iterations.

Abhi: Are you not a “DNA is everything you need” maximalist?

Yunha: No.

Abhi: I guess, has anyone trained a DNA-plus-epigenetic model, or a model with some other modality, and seen vast improvements in being able to represent something? You did that with genomic and proteomic data, but has anyone else extended beyond that?

Yunha: Yeah, I think there was a new paper that came out recently for human and mouse genomes, where they included a bunch of functional genomic data. I think it came outta Genentech, actually. That was an interesting paper, and I think it’s exciting, because you are basically augmenting genomic data with a bunch of other tracks of information. The limitation there is that you can’t do that for the vast majority of branches of life. So you can’t call it a foundation model for biology, because we simply would not have that data for most branches of life: basically everything except human and mouse and maybe a few things that we can culture.

[00:20:26] Why train on only microbial genomes, as opposed to all genomes?

Abhi: Why—this is almost like a cultural question—why is there the separation of like metagenomics and human genetics? Why isn’t there, like, why isn’t gLM2 trained on all genomes?

Yunha: Yeah. That’s a good question. So, all genomes as in mammalian and... yeah. Okay. So I think there are some practical reasons why we didn’t extend our model to eukaryotic genomes. One reason is that for plants, you can’t even call the vast majority of their genes. And calling genes is not a trivial task even for some microbes, actually.

Assuming that the sequence you currently have in front of you is a protein-coding sequence is not an assumption we can always make for a lot of genomes. Given our data structure, we couldn’t make that assumption for plant, fungal, or mammalian genomes. There’s that consideration. Also, microbial genomes are really tightly packed, so there are very few, or very small, intergenic regions to consider, whereas eukaryotic genomes have really long intergenic regions. So in order to model multiple proteins at the same time, your context length has to increase significantly, and that was just not practical for our model.

And I think if you’re thinking about data in a bang-for-the-buck way, you’re getting so much more from microbial data, not just because things are more packed, but because it’s way more diverse. If you had a pool of data organized by diversity and you were picking things out, the vast majority would just be microbial genomes and microbial genes. So why inject human bias by adding the human genome when the model’s purpose isn’t understanding human genomes? That was the reason why we didn’t include mammalian genomes.

[00:22:58] Do we need more sequences or more annotations?

Abhi: Is it fair to say at this point, the thing you need to turn up is quality of the existing data rather than quantity of like more sequences? Or is it like non-obvious?

Yunha: I think it’s very obvious we need more labeled data and I think everyone probably agrees there. The question whether we need more metagenomic data or more unlabeled data... I’m probably... it’s probably nice to have. It can’t hurt. But it’s just a matter of... you have a lot of metagenomic data and then you find patterns that are becoming more and more salient because you have data that’s less sparse and therefore you are recognizing cooler patterns. But there’s no way of understanding what those patterns are if you can’t match it to any labels. So that labeled data is a lot more valuable, in my opinion.

[00:23:54] Is there a conserved microbial genome ‘language’?

Abhi: What do you think is... I guess this is a good question the protein people have also, but I imagine like proteins are a little bit more conserved. There’s 20 possible amino acids. Maybe not. Maybe that’s also a contentious point, but... At like gLM2, how close to like full universal microbial... or how close is it to like fully understanding the universe of microbial genomes? Like if we take gLM2 and we apply it to say the genome of like a hydrothermal vent bacteria... how good is it at representing that particular genome?

Yunha: Yeah. So if it’s in the training dataset...

Abhi: Sure.

Yunha: It will be better at it than when it’s not in the training dataset, as with any language model including protein language models. If you throw in a sequence that is very different from seen sequences, then ESMFold will fail. AlphaFold will fail. Same with representations for gLM2 and so on. So yeah, I think there is value in training this sort of like base layer, going from sequence to some sort of representation or some sort of understanding. Because yeah, if you have a really divergent sequence that’s out of the training set, then it will not generalize to that particular sequence.

Abhi: Yeah, I guess the dream for the non-MSA protein language models is that they have a universal understanding of proteins, regardless of how deep an MSA actually exists for a given protein. For AlphaFold, as the MSA depth goes down, performance gets worse. Do you see something like that for gLM2, where performance also goes down as a sequence gets further and further away from the training dataset?

Yunha: Yeah.

Abhi: Do you think you’ll ever escape that? That you’ll ever discover some universal grammar for microbial genomes? Or is it just so diverse that it’s unlikely?

Yunha: I think it’s probably the latter, but maybe there are cool new advances that prove me otherwise.

Abhi: Moving away from the dataset and closer to the model... what was the context size for gLM2, and why did you pick it?

Yunha: Yeah, I forget the exact context size, but the target was to include about 10 genes. The reasoning there was that we looked at the average length of operons, or the average number of genes per operon, and we wanted to cover multiple operons. That came out to about nine or 10 genes.

Abhi: Do you see, or intuitively expect, better and better representation performance as the context window expands? Or does it probably max out?

Yunha: Yeah, that’s a good question. We experimented a little with varying the context length for the tasks we benchmarked against, and we did not see a significant improvement as we increased it. But that’s on the benchmarks we used, which are limited because what we know is limited. It all depends on what you’re measuring. If you’re measuring something that’s super obvious, then the model is going to learn it without needing a lot of context. But if you’re measuring things that require context across multiple proteins, across interactions that are really far apart, then maybe it actually benefits from that extra context. I think the things we’re measuring are too shallow. We’re trying to understand biology, chipping away at emergent properties that come from biology, and these are really obvious patterns that we’ve already observed. So no wonder these obvious patterns are the first ones to be picked up, without necessarily requiring a large context.

[00:28:11] What non-obvious things can this genomic language model tell you?

Abhi: When you say obvious patterns, I’m curious: what did gLM2 tell you about microbial genomes that was interesting? You mentioned it was able to pick up intergenic elements and what each of those intergenic elements potentially means. What could it do besides that?

Yunha: Yeah. One thing we were able to showcase was protein interaction. It’s not just “oh, these genes co-occur.” These genes actually have co-evolving residues that go across multiple proteins and that map to known protein interfaces. So if you apply that to things we don’t know much about, you can actually resolve new types of PPI [protein-protein interaction] interfaces.

Abhi: Can you walk me through like how you extract PPI information from a model like gLM2?

Yunha: Yeah. This was actually in collaboration with Sergey’s lab. Sergey’s lab showed that you can use a method called the Categorical Jacobian to get out co-evolving residues within a protein. You can basically use that to identify residues that are co-evolving and therefore likely close together. And then you’re basically turning the 3D structure into a 2D contact map.

But you could technically do the same thing for protein-protein interaction. It’s just two things folding together. But in order to do that, you need to know which of these protein variants co-occur in the same genome, right? If you just have protein A with 50 variants and protein B with 50 variants, but you don’t know how these two sets are connected, that signal goes away. But if you know that A′ and B′ go together, and A″ and B″ go together, then you can actually resolve the statistical signal of residues co-evolving across protein A and protein B.
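To make the pairing point concrete, here is a minimal Python sketch. It assumes toy, hypothetical aligned variants of two proteins keyed by the genome they came from, pairs them by genome, and measures mutual information between one column in A and one column in B. Mutual information is a classical covariation statistic used here only for illustration; it is not how gLM2 or the Categorical Jacobian compute anything. Shuffling the pairing, which mimics not knowing which A variant goes with which B variant, makes the signal collapse.

```python
import math
import random
from collections import Counter


def mutual_information(col_x, col_y):
    """Mutual information (in nats) between two aligned columns of residues."""
    n = len(col_x)
    px, py = Counter(col_x), Counter(col_y)
    pxy = Counter(zip(col_x, col_y))
    mi = 0.0
    for (x, y), count in pxy.items():
        p_joint = count / n
        mi += p_joint * math.log(p_joint / ((px[x] / n) * (py[y] / n)))
    return mi


def pair_by_genome(variants_a, variants_b):
    """Concatenate the A and B variants that were found in the same genome."""
    shared = sorted(set(variants_a) & set(variants_b))
    return [variants_a[g] + variants_b[g] for g in shared]


# Hypothetical toy data: position 1 of A (K/E) and position 1 of B (E/K)
# swap together across genomes, as an interface salt bridge might.
variants_a, variants_b = {}, {}
for i in range(40):
    genome = f"genome_{i:02d}"
    variants_a[genome] = "MKL" if i % 2 == 0 else "MEL"
    variants_b[genome] = "GES" if i % 2 == 0 else "GKS"

LEN_A = 3
paired = pair_by_genome(variants_a, variants_b)
col_a = [row[1] for row in paired]          # position 1 of protein A
col_b = [row[LEN_A + 1] for row in paired]  # position 1 of protein B
paired_mi = mutual_information(col_a, col_b)

# Without genome context we don't know which B goes with which A;
# mimic that by shuffling the B column.
shuffled_b = col_b[:]
random.Random(0).shuffle(shuffled_b)
unpaired_mi = mutual_information(col_a, shuffled_b)

print(f"paired MI = {paired_mi:.3f} nats, unpaired MI = {unpaired_mi:.3f} nats")
```

With the true pairing the two columns are perfectly coupled, so the mutual information is ln 2 (about 0.693 nats); after shuffling it drops toward zero.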

Abhi: Yeah. When you say they co-evolve... does that translate to them being close in the embedding space?

Yunha: No. Co-evolve literally means that if one residue changes in A, then another residue it’s in contact with changes too, because of the biophysical sort of...

Abhi: Oh, so this is like relying a little bit on gLM2’s ability to generate genomic sequences or...

Yunha: It doesn’t generate. So basically, PLMs like ESM essentially learn a compressed MSA [Multiple Sequence Alignment], right? gLM2 also learns a compressed MSA, but it’s paired. Which means... if you took A and B together, concatenated them, and then built an MSA, you would get similar signals. So you’re finding that kind of signal because you’re incorporating genomic context into the modeling.

Abhi: Sorry, I’m just mentally trying to walk through this, because I think... I may be incorrect, but Sergey’s Categorical Jacobian paper was mutating residues and seeing what the model thinks. But here it seems like it’s something different. Oh, it is the same thing.

Yunha: It is the same thing. Yeah. It’s just... think of it as like a sort of interpretability method.
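A minimal Python sketch of the Categorical Jacobian idea, as we understand it from the description above: substitute every token at every position, record how the model’s output logits shift at every other position, and summarize each position pair by the norm of that shift. The `toy_model` here is a hypothetical hand-built stand-in for a real model such as gLM2 or ESM, with one planted coupling between position 1 of a toy protein A and position 1 of a toy protein B (concatenated, as in the paired setting). The centering and average product correction (APC) steps follow common practice for contact prediction; details of the published method may differ.

```python
import itertools

ALPHABET = "ACDE"  # toy token set
L = 6              # positions 0-2: toy protein A; positions 3-5: toy protein B

# Hypothetical planted coupling between position 1 (in A) and position 4 (in B).
COUPLING = [[2.0, -1.0, 0.0, -1.0],
            [-1.0, 2.0, -1.0, 0.0],
            [0.0, -1.0, 2.0, -1.0],
            [-1.0, 0.0, -1.0, 2.0]]
COUPLED_PAIRS = {(1, 4), (4, 1)}


def toy_model(seq):
    """Hypothetical stand-in for a language model: L x |ALPHABET| output logits."""
    logits = []
    for j in range(L):
        row = []
        for b in range(len(ALPHABET)):
            value = 0.1 * b  # position-independent bias
            for i in range(L):
                if (i, j) in COUPLED_PAIRS:
                    value += COUPLING[ALPHABET.index(seq[i])][b]
            row.append(value)
        logits.append(row)
    return logits


def categorical_jacobian(model, seq):
    """jac[i][a][j][b]: change in logit (j, b) when position i is set to token a."""
    n_tok = len(ALPHABET)
    base = model(seq)
    jac = [[None] * n_tok for _ in range(L)]
    for i, a in itertools.product(range(L), range(n_tok)):
        out = model(seq[:i] + ALPHABET[a] + seq[i + 1:])
        jac[i][a] = [[out[j][b] - base[j][b] for b in range(n_tok)] for j in range(L)]
    return jac


def coupling_scores(jac):
    """Centered Frobenius norm per (i, j) block, symmetrized, then APC-corrected."""
    n_tok = len(ALPHABET)
    raw = [[0.0] * L for _ in range(L)]
    for i, j in itertools.product(range(L), range(L)):
        if i == j:
            continue
        block = [[jac[i][a][j][b] for b in range(n_tok)] for a in range(n_tok)]
        # Double-center so per-row and per-column offsets don't count as coupling.
        row_m = [sum(r) / n_tok for r in block]
        col_m = [sum(block[a][b] for a in range(n_tok)) / n_tok for b in range(n_tok)]
        grand = sum(row_m) / n_tok
        raw[i][j] = sum((block[a][b] - row_m[a] - col_m[b] + grand) ** 2
                        for a in range(n_tok) for b in range(n_tok)) ** 0.5
    sym = [[(raw[i][j] + raw[j][i]) / 2 for j in range(L)] for i in range(L)]
    row_avg = [sum(r) / (L - 1) for r in sym]  # diagonal entries are zero
    total = sum(row_avg) / L
    if total == 0:
        return sym
    return [[0.0 if i == j else sym[i][j] - row_avg[i] * row_avg[j] / total
             for j in range(L)] for i in range(L)]


seq = "ACDEAC"  # hypothetical wild type: A = "ACD", B = "EAC"
scores = coupling_scores(categorical_jacobian(toy_model, seq))
best_score, best_i, best_j = max((scores[i][j], i, j) for i in range(3) for j in range(3, L))
print(f"strongest A-B coupling: A position {best_i} <-> B position {best_j - 3}")
```

On a real model you would replace `toy_model` with a forward pass that returns per-position logits. The cost is one forward pass per (position, token) pair, which is why this kind of analysis is usually run on modest sequence lengths.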

Abhi: Okay. Okay. That makes sense. Yeah. I would describe the Categorical Jacobian as a kind of mech interp. Outside of that, was there anything else interesting you could pull out? You also mentioned you were able to derive RNA-protein interactions. Is that using the exact same method?

Yunha: So we have seen some evidence of RNA-protein interactions... yes, using the same method. But we haven’t been able to validate them, so I can’t speak to it.

Abhi: I was going to ask... identifying protein-coding genes is perhaps a little bit easy. How do you identify purely RNA-coding regions?

Yunha: Yeah. Small RNAs and tRNAs have such conserved structure that it’s actually very easy to spot them with gLM2’s way of looking at the data. If we ran the Categorical Jacobian on a stretch of DNA that contains an RNA-coding region, I guess an RNA sequence, it lights up immediately because of the hairpin structures. That’s really salient in RNA.
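The hairpin signal is easy to see even without a model: in a stem-loop, one arm of the stem is the reverse complement of the other, so pairs of distant positions are strongly dependent, which is exactly the kind of pairwise structure a contact-style map picks up. The sketch below is a classical inverted-repeat scan in Python, not anything gLM2 does internally. The stem length, loop range, and example sequence are hypothetical, and it ignores G-U wobble pairs and mismatches.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Reverse complement of a DNA string (A/C/G/T only)."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def find_hairpins(dna, stem_len=5, min_loop=3, max_loop=8):
    """Return (start, stem, loop) for each perfect stem-loop in `dna`.

    A hit is stem + loop + reverse_complement(stem): Watson-Crick pairs only,
    no mismatches, no G-T (G-U in RNA) wobble pairs.
    """
    hits = []
    for start in range(len(dna) - 2 * stem_len - min_loop + 1):
        stem = dna[start:start + stem_len]
        target = reverse_complement(stem)
        for loop_len in range(min_loop, max_loop + 1):
            arm_start = start + stem_len + loop_len
            arm = dna[arm_start:arm_start + stem_len]
            if len(arm) < stem_len:
                break  # ran off the end of the sequence
            if arm == target:
                hits.append((start, stem, dna[start + stem_len:arm_start]))
    return hits


# Hypothetical sequence with one designed hairpin: GCGCC + TTCG loop + GGCGC
dna = "AAAAAA" + "GCGCC" + "TTCG" + "GGCGC" + "AAAAAA"
print(find_hairpins(dna))  # [(6, 'GCGCC', 'TTCG')]
```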

[00:33:08] Semantic deduplication and evaluation

Abhi: One thing we have been continuously talking about is that models like these are potentially really useful for annotating existing metagenomic sequences. And I think there was a really interesting thing you did with the OMG dataset, called semantic deduplication, that relied on a gLM2-style model. Would you be able to walk through what you did there?

Yunha: Yeah. This was to tackle the exact problem that we have arbitrary chunks of DNA. The classical way of deduplicating would be sequence alignment; that’s what protein language models do. You cluster using sequence similarity, and then you pick from each cluster. That’s one way to make sure you’re not over-representing any one cluster in your training data. You can’t really do that with arbitrary chunks of DNA, because... assume you have one chunk covering one region, and another chunk that only partly overlaps it. They will align over the overlap, but not elsewhere. And also, because the chunks are so long, you can’t cluster DNA quickly, because alignment gets really expensive as you increase the length of the DNA.

So that was a problem we needed to solve in order to de-duplicate, or de-bias, the dataset as much as possible. We actually looked at the computer vision literature, and they have the same sort of problem: there are just a lot of images, so how do you make sure your model isn’t only trained on cat and dog images, because that’s what people like to take photos of? You need to either classify them or... but then if you just classify everything as dogs, maybe you still want to keep some of the diversity within dogs. So the question is: how do we de-bias the dataset with as little bias as possible, as little human bias as possible?

One method people have used in DNA language models is: okay, let’s just use taxonomy as a label, and sample one from this genus, one from that genus, which I think is a fair thing to do. But the problem with metagenomic sequences is that you don’t always have taxonomic labels. You’re literally getting sequence from a pile of dirt.

Abhi: Which may include brand-new genera, is that right?

Yunha: Exactly. And you don’t want to bias against those either. So we trained a small model that was essentially the same as gLM2, embedded all of these contexts, and then sampled from those embedded contexts in order to de-bias the model as much as possible.
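A minimal Python sketch of deduplicating in embedding space rather than by alignment. It assumes each DNA chunk has already been embedded (the vectors below are hypothetical stand-ins for model embeddings) and uses a simple greedy rule with a hypothetical cosine-similarity threshold: keep a chunk only if it is not too similar to anything already kept. This illustrates the idea only; it is not the OMG pipeline, whose exact clustering and sampling procedure isn’t described here.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def semantic_dedup(embeddings, threshold=0.95):
    """Greedily keep items whose similarity to every kept item is below threshold.

    Returns the indices of kept items, in input order.
    """
    kept = []
    for idx, emb in enumerate(embeddings):
        if all(cosine(emb, embeddings[k]) < threshold for k in kept):
            kept.append(idx)
    return kept


# Hypothetical embeddings: three near-copies of one over-sampled context,
# plus two distinct contexts (e.g. from an unclassified soil microbe).
embeddings = [
    [1.00, 0.00, 0.00],
    [0.99, 0.05, 0.00],
    [0.98, 0.00, 0.06],
    [0.00, 1.00, 0.00],
    [0.10, 0.20, 0.97],
]
print(semantic_dedup(embeddings))  # [0, 3, 4]
```

The greedy loop compares each item against everything kept so far, which is fine for a toy but far too slow at metagenomic scale. Vision-side methods such as SemDeDup first cluster the embeddings (for example with k-means) and only compare items within a cluster.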

Abhi: How do you judge whether this like works?

Yunha: Yeah, so we had a benchmark. We designed a benchmark, which was actually quite a lot of work, because if you just rely on existing benchmarks, the model seems to do worse once you make the dataset more diverse. The reason is that the data itself is over-represented with, e.g., E. coli, because that’s what’s most studied, but the benchmarks are also based on E. coli. So then you might as well just train an E. coli model. Why even go about training a metagenomic model? So before we even trained the model, we worked on putting together a really diverse set of embedding benchmarks. When we sample sequences for them, we’re sampling across the tree of life, not just from E. coli. That was a very deliberate thing we did before we even started training gLM2.

[00:37:33] How does benchmarking work for these types of models?

Abhi: What does benchmarking even look like for a microbial language model? Are you purely measuring yourself by your ability to reconstruct the genome or is there something else?

Yunha: No. We don’t even consider perplexity a good metric. What we did was look at how good the representations, that is the embeddings, were for various tasks. One is the classic task: does it actually capture phylogenetic relationships between sequences? There are statistical models you can use to resolve the phylogenetic distances between sequences, and the important thing there is to make sure the sequences are sampled across the tree of life. So we did that, and then we compared the embedding distances to the phylogenetic distances between the sequences. That’s one benchmark. Another benchmark is: can this representation space actually compress information such that sequences that are far apart in sequence space or structure space, but actually do the same thing functionally, are brought closer together? There you’re using a “nearest thing in space” metric in order to retrieve. So it’s a retrieval-based benchmark: can you find things that are similar in function, which we’ve hand-curated across the tree of life, using embeddings only? And we benchmark against ESM and other types of embeddings to see how each performs on that retrieval task.
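Both benchmark styles can be sketched in a few lines of Python. The first function measures how well pairwise embedding distances track pairwise phylogenetic distances, using Spearman rank correlation as an assumed choice of statistic (the conversation doesn’t name one). The second measures top-1 retrieval: for each sequence, does its nearest neighbor in embedding space share its hand-curated function label? The embeddings, tree distances, and labels are hypothetical toy values; to compare models, you would run the same functions on, say, gLM2 and ESM embeddings of the same sequences.

```python
import itertools
import math


def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def ranks(values):
    """1-based ranks, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


def phylo_agreement(embeddings, phylo_dist):
    """Rank correlation between embedding distances and tree distances over all pairs."""
    pairs = list(itertools.combinations(range(len(embeddings)), 2))
    emb_d = [euclidean(embeddings[i], embeddings[j]) for i, j in pairs]
    tree_d = [phylo_dist[i][j] for i, j in pairs]
    return spearman(emb_d, tree_d)


def top1_retrieval_accuracy(embeddings, labels):
    """Fraction of items whose nearest other item shares their function label."""
    hits = 0
    for i, emb in enumerate(embeddings):
        nearest = min((j for j in range(len(embeddings)) if j != i),
                      key=lambda j: euclidean(emb, embeddings[j]))
        hits += labels[nearest] == labels[i]
    return hits / len(embeddings)


# Hypothetical toy data: two clades of two sequences each.
embeddings = [[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [1.1, 1.0]]
phylo_dist = [[0, 1, 4, 4],
              [1, 0, 4, 4],
              [4, 4, 0, 1],
              [4, 4, 1, 0]]
labels = ["transporter", "transporter", "nuclease", "nuclease"]
print(f"phylogeny agreement (Spearman): {phylo_agreement(embeddings, phylo_dist):.2f}")
print(f"top-1 retrieval accuracy: {top1_retrieval_accuracy(embeddings, labels):.2f}")
```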

Abhi: The thing I would be very... I think protein RMSD benchmarks are fairly trustworthy, ’cause you can trust that the X-ray crystallography was correct. With function annotation, how much can you trust that the papers you’re pulling the functional annotations from actually did their job correctly?

Yunha: Yeah. That’s a good question. I mean, it’s really hand-curated. We do look at the papers, and we make sure the function we’re looking at is correct. Take enzyme functions, for instance; I think that was actually one of the benchmarks. Given the sequence, you’re trying to predict the EC number, the Enzyme Commission number, which represents what kind of reaction an enzyme can catalyze. But the problem is that there is only positive data, and one enzyme can actually catalyze multiple reactions depending on the context. Just because it wasn’t documented doesn’t mean it’s not possible, right? It’s actually very common for a single sequence to carry multiple EC numbers within a certain hierarchy: the same class, but different substrates. So there are cases where our model predicted, “Oh, this sequence is likely to conduct both of these reactions with equal probability, or similar weight on each.” And there’s data for only one, not the other. So we can’t really say for sure that the prediction is wrong. There are definitely gaps in the data we need to be aware of, even when you’re carefully curating the dataset. But that’s also an interesting case to look into, because it’s spotting things we hadn’t spotted before.

[00:41:31] Gaia: A genomic search engine

Abhi: That makes sense. Yeah. And actually OMG and gLM2 are actually some of your earlier work. I think your latest paper is about another genomic language model called Gaia. Could you walk me through what Gaia exactly is?

Yunha: Yeah. So Gaia is actually not a genomic... it’s... I would call it more of a system that’s built on top of gLM2. Gaia is essentially a search engine. What we wanted to do was demonstrate that gLM2 embeddings can be used to find sequences that are similar in function. The way we did that was: okay, it definitely needs to find sequences that are similar in sequence, because that’s the least you can do. Then it should find sequences that are also similar in structure. But it should also find sequences that are similar in context. That’s what we wanted to do, and gLM2 representations were suited for that because they have all that information as part of training. Gaia stands for Genomic AI Annotator. The first thing it does is retrieve sequences that are similar in gLM2 embedding space. The next thing it does is map that embedding to a text descriptor, so that we can annotate more rapidly.

Abhi: So you input in a genomic sequence, you find all the nearest proteins via gLM2 embeddings. And then how do you convert that to text? You just like pick the closest protein?

Yunha: Yeah, so we use Swiss-Prot as our sort of golden dataset. That’s probably the best-curated dataset we currently have, and it pairs protein sequences with text descriptors, right? So we train a CLIP model on top of that.
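For readers unfamiliar with CLIP, here is a minimal Python sketch of its symmetric contrastive (InfoNCE) loss on a toy batch. It assumes each row pairs a sequence embedding with an embedding of its Swiss-Prot text description; the vectors and the temperature are hypothetical, and the encoders and projection layers that a real CLIP model trains are left out. The loss is low when each sequence is most similar to its own description and vice versa.

```python
import math


def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def log_softmax(row):
    """Numerically stable log-softmax of a list of scores."""
    m = max(row)
    log_z = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - log_z for x in row]


def clip_loss(seq_embs, text_embs, temperature=0.07):
    """Symmetric InfoNCE: row i of each batch is a matched (sequence, description) pair."""
    s = [normalize(v) for v in seq_embs]
    t = [normalize(v) for v in text_embs]
    n = len(s)
    # logits[i][j] = scaled cosine similarity of sequence i and description j.
    logits = [[sum(a * b for a, b in zip(s[i], t[j])) / temperature for j in range(n)]
              for i in range(n)]
    # Sequence -> text: each sequence should pick out its own description.
    loss_s2t = -sum(log_softmax(logits[i])[i] for i in range(n)) / n
    # Text -> sequence: each description should pick out its own sequence.
    cols = [[logits[i][j] for i in range(n)] for j in range(n)]
    loss_t2s = -sum(log_softmax(cols[j])[j] for j in range(n)) / n
    return (loss_s2t + loss_t2s) / 2


# Hypothetical toy batch of three (sequence, description) embedding pairs.
seq_embs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
matched = [[0.9, 0.1, 0.0], [0.1, 0.9, 0.0], [0.0, 0.1, 0.9]]
mismatched = matched[1:] + matched[:1]  # every description shifted to the wrong sequence
print(f"matched: {clip_loss(seq_embs, matched):.3f}, "
      f"mismatched: {clip_loss(seq_embs, mismatched):.3f}")
```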

Abhi: Yeah. Okay. And so like... so you’re relying on the full universe of proteins in Swiss-Prot to represent also the full universe of possible functions.

Yunha: Yes.

Abhi: While that very well may be valid... do you suspect there are microbial proteins or intergenic elements that are not cataloged within Swiss-Prot?

Yunha: Yes, certainly. Vast majority. That’s the whole point. Yeah. And we also choose not to... there is a threshold where we say “no function” or “no known function.”
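Once the two embedding spaces are aligned, the “no known function” threshold can be sketched as a simple rule: take the best-scoring description for a query, and refuse to annotate if its similarity falls below a cutoff. In this Python sketch the descriptions, their embedding vectors, and the 0.8 cutoff are all hypothetical; Gaia’s actual scoring and threshold aren’t specified in the conversation.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def annotate(query_emb, description_embs, threshold=0.8):
    """Return (description, score) for the best match, or ("no known function", score)."""
    best_desc, best_score = max(
        ((desc, cosine(query_emb, emb)) for desc, emb in description_embs.items()),
        key=lambda pair: pair[1],
    )
    if best_score < threshold:
        return "no known function", best_score
    return best_desc, best_score


# Hypothetical description embeddings in the shared (aligned) space.
descriptions = {
    "ABC transporter permease": [0.9, 0.1, 0.0],
    "tRNA methyltransferase": [0.0, 1.0, 0.2],
}
print(annotate([0.95, 0.05, 0.0], descriptions)[0])  # ABC transporter permease
print(annotate([0.0, 0.0, 1.0], descriptions)[0])    # no known function
```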

[00:44:18] Even ‘well-studied’ genomes are mostly unannotated

Abhi: I am not aware of this literature at all. How often is it that people find some weird microbe able to do something that no other microbe can do?

Yunha: Very often. If you look at a microbial genome, even for really well-studied microbes such as E. coli and Mycobacterium tuberculosis, you find that half to two-thirds of their genes are unannotated: we just don’t know what they do. And that’s not even counting things annotated as just “this is a membrane protein,” which still doesn’t tell us anything about the function. So there’s that problem. But if you look at a random microbe from soil, 80%, 90%, if not 95% of its genes will have no annotated function using basic sequence-based methods.

Abhi: ...when a microbe can do something that’s never been observed before.

Yunha: Yeah. So that happens. I would say that’s why environmental microbiology was so interesting. Microbes were literally being discovered left and right that can do crazy chemistry. Like, it literally breathes rock as opposed to oxygen. Or it disproportionates sulfur, converting elemental sulfur into sulfide and sulfate... that kind of reaction. We just don’t even know how to do that. What else? Things that live off of uranium, harnessing that energy to live. Microbes that just live for a million years, and we don’t know why or how.

Abhi: There seem to be two elements here. One is trying to annotate functional genomic elements that we reasonably understand... like, “what’s this?” “This exists somewhere else in the microbial kingdom. Maybe this does it in a different way, but the function is conserved across other domains of life.” And on the other side, which feels like the far more interesting bit, there is microbial functionality that exists uniquely within a species and nowhere else. How common is that latter bucket? You mentioned uranium-eating bacteria, rock-eating bacteria. Is it usually that a very specific species does this exact thing and nothing else does?

Yunha: Interestingly, there are more and more cases of convergent evolution, where there are multiple ways of doing the same thing, which is not that surprising if you think about it. That’s why I think this idea of compression is interesting. If there’s another layer to biology that we don’t fully understand... we know how to look at sequence pretty well now. But if there are patterns underlying sequence, and we can use those patterns to match functions, then we can actually discover new functions that have convergently evolved to do the same thing. That would be really cool.

Abhi: Going back to Gaia now... I imagine you have this setup for turning pre-trained gLM2 embeddings into functional annotations for this dark universe of microbial genomes. Have you done that? Have you gone through every single unannotated genome, applied Gaia to it, and is that all stored somewhere?

Yunha: Yeah. We did that experiment with Mycobacterium tuberculosis, where we don’t know what two-thirds of the genes do. It was hard to do this manually, because you’re still looking at 2,000 to 3,000 genes and trying to figure out what each one is using the Gaia annotation. So we built a “Gaia agent,” which tries to validate the Gaia annotations given the context. We ran the whole pipeline to discover new sequence functions in this really well-studied microbe that thousands of labs have studied for tens to hundreds of years. And yeah, we were able to find four proteins that we could actually validate in silico. And I’m like, “Why didn’t we know this before?”

One example is two proteins that were each annotated as “uncharacterized protein” in literally every single database we looked at. When you search them individually, you don’t get any matches. But if you fold them together and search, you get a match to a protein from Archaea, an entirely different domain of life that diverged billions of years ago. There’s so little sequence similarity that you won’t be able to find it using typical tools. But if you look at the structure, it’s almost identical. And that’s a membrane transport protein complex. Another one was really interesting because it was a very small ORF that was never annotated in Mycobacterium tuberculosis because it was so small, but there were also two other proteins that transform that tiny peptide into something antimicrobial. So that’s a three-protein system we weren’t able to identify previously, because we were only looking at each piece separately instead of looking at the full picture.

[00:50:51] Using agents on Gaia

Abhi: Gaia as a platform makes a lot of sense to me. Could you walk me through what exactly the Gaia agent does?

Yunha: Yeah. The Gaia agent does what a really good microbiologist would do in silico, but it automates the whole thing. It looks at the full context, which is what microbiologists do. You see a protein and you look at its annotation. You look at all the motifs this protein has, alongside all the motifs of the other proteins and all the DNA sequence motifs. Then you look for patterns across the tree of life: oh, these two things co-occur, or there’s co-orientation and very small spaces between the genes, which likely means they actually travel together. And then you reason across the functions: “this reaction happens and this reaction happens, so most likely this gene is doing the reaction that goes from this substrate to this product.” If you have a reaction chain, for instance, you can actually figure it out. Say substrate A goes all the way to product D, through intermediates B and C. We have enzymes for the reactions in the first part and the last part, but we don’t know what’s doing the middle part. You can make a reasonable guess that a protein found near those two enzymes in the genome might be doing that particular reaction. You can use that kind of reasoning to essentially fill the gaps and de-orphan that particular enzyme reaction.
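The gap-filling argument can be sketched as a rule-based Python function, with the caveat that the real Gaia agent reasons with a language model over many tools; this is only an illustration. It assumes a hypothetical gene table (names, strands, coordinates, annotations), groups genes into putative operons using the co-orientation and small-spacing cue described above (the 50 bp gap cutoff is an assumption), and proposes unannotated genes that share an operon with the enzymes for the neighboring steps as candidates for the orphan step.

```python
from dataclasses import dataclass


@dataclass
class Gene:
    name: str
    strand: str       # "+" or "-"
    start: int
    end: int
    annotation: str   # empty string means unannotated


def putative_operons(genes, max_gap=50):
    """Group consecutive same-strand genes separated by small intergenic gaps."""
    if not genes:
        return []
    ordered = sorted(genes, key=lambda g: g.start)
    operons = [[ordered[0]]]
    for prev, gene in zip(ordered, ordered[1:]):
        if gene.strand == prev.strand and gene.start - prev.end <= max_gap:
            operons[-1].append(gene)
        else:
            operons.append([gene])
    return operons


def orphan_step_candidates(genes, pathway, known, max_gap=50):
    """Propose unannotated genes co-located with the enzymes flanking each unassigned step.

    pathway: ordered reaction steps, e.g. ["A->B", "B->C", "C->D"].
    known: maps a step to the name of the gene annotated as catalyzing it.
    """
    operons = putative_operons(genes, max_gap)
    candidates = {}
    for idx, step in enumerate(pathway):
        if step in known:
            continue
        flanking = pathway[max(0, idx - 1): idx + 2]
        anchors = {known[s] for s in flanking if s in known}
        hits = []
        for operon in operons:
            if anchors & {g.name for g in operon}:
                hits.extend(g.name for g in operon if not g.annotation)
        candidates[step] = hits
    return candidates


# Hypothetical locus: three co-oriented, tightly spaced genes plus one on the other strand.
genes = [
    Gene("gene1", "+", 100, 1000, "A->B enzyme"),
    Gene("gene2", "+", 1020, 1900, ""),
    Gene("gene3", "+", 1930, 2800, "C->D enzyme"),
    Gene("gene4", "-", 3500, 4200, ""),
]
pathway = ["A->B", "B->C", "C->D"]
known = {"A->B": "gene1", "C->D": "gene3"}
print(orphan_step_candidates(genes, pathway, known))  # {'B->C': ['gene2']}
```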

Abhi: So does the reasoning... so the Gaia agent treats gLM2 as a tool alongside the rest of the literature?

Yunha: Yeah. And also other tools, such as Foldseek. We give it Foldseek, and other types of bioinformatic tools that you can access in silico. Ideally you’d also have access to automation labs. We’re not quite there yet.

Abhi: Why not... is it just too computationally expensive to let this rip over the entirety of all unannotated microbial genomes?

Yunha: Yes. It’s not cheap to run, and we’re looking at a lot of genes. One thing we’re actually looking into right now is running it on a few hundred to a few thousand genomes that are on the wishlists of all of these biologists. We’re just going to run it, see what comes out, and also share the results so that people can use them.

Abhi: I’m curious... I’m completely unfamiliar with what the typical metagenomic workflow of a biologist looks like. What’s the fundamental difference between just feeding a gene sequence into gLM2, seeing which Swiss-Prot proteins are nearby in the embedding space, and taking the nearest Swiss-Prot protein as “okay, this is what this protein does”... versus using the Gaia agent? Why do you need reasoning on top of that?

Yunha: Yeah. So if it’s a sequence that has a good match to a Swiss-Prot sequence, then you know...

Abhi: You go home after that.

Yunha: Yeah, you don’t even need to run Gaia; you can just do this with BLAST. I think the problem is that for the vast majority of genes, you don’t even have that match. That’s why when you run a typical genome through a genome annotation tool that relies on BLAST, you get 80 to 90% of the genes as unannotated or annotated with something meaningless. So how do we bring that down to 50% or 40%? That’s done by compressing that space so that we can make more associations faster.

[00:54:53] Will genomic language models reshape the tree of life?

Abhi: You had this offhand comment about how you discovered an archaeal-like protein within this very well-studied microbe that is distinctly not an archaeon. And you’ve also mentioned in the past how models like these could potentially change, dramatically, our understanding of what the Tree of Life, or phylogeny in general, looks like. I’d love to get your take on that subject.

Yunha: Yeah, so I guess on the sort of Tree of Life side... So I don’t think the language models will replace phylogenetic trees. Phylogenetic trees are a lot more complex... I mean this is a whole discipline that’s built on top of like how things mutate, what are sort of models of mutation that we should be using...

Abhi: But still all sequence based, right?

Yunha: It’s all sequence based, yeah. But there’s just a lot of modeling there. I think you should almost see phylogenetic trees as ground truth for how things evolve. Also, these trees take a long time to compute. So I think there is a future where we can get cheap and easy phylogenetic trees using language models and embedding spaces, and that would be an easy way to get a quick look at how things are related. But in the end, phylogenetic analysis has its own place in the scientific literature and in scientific analysis.

I think what’s changing, though, is that as new sequences come about and as we sample more, the tree is shifting. You’re only constructing trees based on what we can sample right now, right? But if you add new branches, the branch structure changes. For instance, we don’t know if eukaryotes... the traditional way of thinking about the Tree of Life is that there’s Bacteria, there’s Archaea, and then there’s a special branch of eukaryotes. What we’ve actually been realizing is that the Eukarya are just a single branch within Archaea. And that’s a fundamental change in how we think about the Tree of Life. That only happened because we actually sampled a hydrothermal vent that contained archaea that were closer to eukaryotes but still part of the archaeal tree. So now humans, and the entire branch of eukaryotes, technically belong to Archaea.

Abhi: That sounds like a dramatic reshaping of how we think about... so in that sense, why don’t you think the same thing will happen if you bring in genomic language models? Like why won’t it dramatically change that tree of life in a similar way to that Archaea discovery?

Yunha: Yeah. Because for that kind of discovery, the amount of information that both models have access to, whether it’s a language model or a phylogenetic model, is the same.

Abhi: So sequence alone gets you like 80% of the way there and like whatever genomic language models bring to the table... it’s probably not like a massive amount...

Yunha: Yeah. I don’t think it’s gonna shift the shape of how things evolved. And we also don’t have a way to validate any of that.

Abhi: Interesting. Do you think you’ll ever want to do phylogenetic research?

Yunha: So I did some of that when I was more in the environmental microbiology research. I think it’s really fascinating, the kind of work that you can do in retracing what happened across the tree of life and the history of Earth. I think that’s really cool. I do also find it a little bit frustrating that you can’t be entirely sure, because you can’t go back in time. But it’s... I think there’s really cool science that comes out of doing phylogenetics.

[00:59:18] Current limitations of genomic language models

Abhi: It’s interesting, ’cause I think Sergey [Ovchinnikov] also has an evolutionary biology background. It’s interesting how these paths are converging a little bit. One thing I did want to ask: we’ve talked a lot about the extreme promise that all of these models have. Where do they currently fall apart? Which particular species, genomes, or problems do these models not work well on today?

Yunha: Generally they don’t do well on a problem, or a genome, that’s not well sampled in the training set. I think everyone knows that now. There’s no surprise there.

I think in genomic language modeling, DNA language modeling, what we want to do with these genomic language models is still not clear. And I think that’s largely because we don’t have a lot of paired data. When we think about protein language models, it’s pretty clear how you can assess their quality, because there’s paired structure data, right? You have a lot of protein sequences, and there’s a really good set of structures from very different systems and so on, so you can benchmark against structure. But for genomic language models, I would argue we don’t have that data to benchmark against. Everyone likes to talk about function, but I think that dataset is still very much limited and extremely biased. It doesn’t do justice to showing that gLMs are learning functional information. And it’s just impossible to utilize these models, because there’s nothing to pair them with. For protein language models, you can use them to design a new structure or a new sequence. But for genomic language models, because we don’t have another modality to condition on, we don’t know how to use them yet.

Abhi: Do you think we’ll ever get to a world with a single “model to rule them all”? Like, maybe gLM2 also spits out protein structure, and maybe that’s an area you can check. Does that make sense? You’d have these auxiliary outputs that help you ground... help you understand what the model is able to understand versus where it’s a little bit up to vibes and you’re unsure whether it understands.

Yunha: Yeah. I think that’s how we’ve been benchmarking a lot of these models, right? Like Evo and gLM2... we can make gLM2 generative as well, then generate a protein, see how good the protein is, and benchmark against protein language models. We can do all of these things, but what’s the point? You could also just have a protein language model. So I think we’re still figuring out: what is the problem we’re trying to solve with genomic language models? For us, the focus has been annotation. How do we make annotations better? How do we make representations better? But one thing we’ve realized is that, yes, we can make representations really good, but we still need a better golden dataset in order to make a bigger dent in how we understand genomes. So you need to attack it from both angles, more labeled data and better models, and keep going in both directions. That’s one area people can work on. Genome design is another, and I think the same problem comes into play: what is a “better” genome? For proteins, there’s an axis you can optimize on. I don’t know, binding affinity or something. Thermostability. Things like that. For genomes, I think there are ways to fine-tune a model to do one thing, but there are no general axes that you can optimize generations for.

Abhi: I know this is something you’ve mentioned in the past: microbes are often capable of chemistry that is either almost impossible or straight-up impossible for us to do. Isn’t that a clear benchmark: being able to generate a microbial genome that innately allows you to sustainably produce something we otherwise cannot make outside of that microbe? Do you think we are close to that at all? For gLM2, how good is it at generating microbial genomes outright?

Yunha: So in order to do what you said just right before—which is, wouldn’t it be the benchmark to be able to show like, “oh, this generation can do something that nature cannot do, or something that we wanted it to do, that doesn’t already exist”—then you need to be able to condition.

Abhi: It needs to be in your train set.

Yunha: Yeah, but what I’m trying to say is that conditioning signal or conditioning dataset doesn’t quite exist at its full scale to be able to do that.

Abhi: Let’s say that you just wanna replicate something. Like there is like this one microbe that like feeds off of uranium. You wanna be able to create a microbe that is very much like it, but perhaps is as easy to grow as E. coli or something. How well can you do that today?

Yunha: Yeah. That’s a great question. I think that still comes back to the annotation problem. Given your microbe that can feed off of uranium, we don’t know which parts are important and which parts are not.

Abhi: Yeah. I guess this is why you potentially would want to max out the context length of a model like this. So you can just feed in... either you can get the model to spit out an entire genome and then you don’t need to know what is important, what isn’t important. Is that a fair way to think about that?

Yunha: Yeah. So then... what would the training objective look like? You will have genomes that can do chemistry X. And then you need to generate a sequence given this chemistry X, and then you need to make it also like E. coli.

Abhi: Yeah. I think that second part’s a bit difficult.

Yunha: Because otherwise, if you just say, okay, we already know this genome Y can do chemistry X, and you tell the model to build a genome that does chemistry X, it will just output something that’s similar to genome Y, and you could say, “Oh, that works.” Maybe you get really lucky and it’s a few mutations, synonymous mutations, away, such that it doesn’t actually change the biology at all. But all you’ve done is, I don’t know, learn synonymous mutations.

Abhi: One thing I was surprised by in the Evo 2 paper, and perhaps in all genomic language models, is that there’s currently no way to condition them on anything other than sequence. Why hasn’t someone built a model that could be conditioned on function?

Yunha: Yeah. Because there are no good paired datasets.

Abhi: But there’s some. You’re just saying like there’s not enough?

Yunha: Yeah. There’s not enough. And also, I think paired datasets exist for proteins. Not really for genomes or segments of genomes, right?

Abhi: Especially for segments of genomes. But if you have a model ingest the entire genome, maybe the functional annotation could just be like: “Eats this, grows this amount.”

Yunha: Yeah, I think that... so if somebody curated that data set and did it, and it’s accurate, which I think is a big if, then I think it’s possible. You can basically build a database of natural language description of a genome. But that also relies on us understanding the genome, right? So okay, so you have a genome and you’re like, okay, there’s a cellulose degradation pathway. There is like a carbon fixation pathway. So you already know okay, this organism is gonna grow like this. So then in order to condition a generation on that function, then the only vocabulary that you can use is the vocabulary that you’ve used to annotate that genome. So you’re completely limited by the capacity to be able to annotate that genome, which comes back to the annotation problem.

[01:08:54] Directed evolution as training data

Abhi: Have you heard of Pioneer Labs? They force microbes to evolve down a certain path and then observe what the genome looks like after that. Do you think that’s a particularly interesting way to gather data, and maybe what more people should be doing?

Yunha: Yeah, I think... so like more on the directed evolution side?

Abhi: Maybe I’ll give a quick description of what Pioneer Labs is. It’s a company that basically wants to create microbes that are able to survive in, I think, Mars-like environments, which are basically just extreme environments in general.

Yunha: Yeah. I think it’s really interesting because it gives another sort of dimension to the data that we didn’t have readily available. So it’s the same thing as if you’re learning how to drive a car, it’s much better to see how the car drives than see the final state of where the car is. Like I think you could potentially learn how the car drives by seeing a lot of photos of cars in different contexts.

So that’s what we’re doing. But if you had more trajectories and you learned from trajectories, I think there is a path forward to learning something that’s more meaningful and that can be modeled better. So I think there’s a lot of potential there. One caveat is that you can’t do this kind of directed evolution for all types of functions, nor for all types of organisms. But I think that’s fine. It depends on the question. If your application is in an organism that can be cultivated, and for a function that can be optimized for, then it’s the right approach. You just can’t apply it to archaea that don’t grow in the lab.

Abhi: Makes sense. How much of your research... I think you’ve focused on two different axes of this genomic language modeling problem. One, the data’s not fantastic; we need better data. Actually, maybe three. The second is the modality: we need more modalities of microbial genomic data. And the third is the models, where Gaia agent is maybe an improvement over gLM2 alone. Which of these three are you most interested in personally pushing forward?

Yunha: Sorry. The three were... one, what was... yeah, sorry.

Abhi: The one is like the total quantity of like labeled genomic data.

Yunha: Oh, quantity of labeled genomic data. Yeah.

Abhi: Or potentially unlabeled as well.

Yunha: Oh, yeah.

Abhi: The second one is like modalities beyond genomics. Third is like the model itself and pushing on that direction.

Yunha: I think they’re all tied. Because with labeled data... you’re labeling, and therefore you’re adding another modality to your dataset.

Abhi: That’s fair. Yeah.

So yeah. One and two are the same.

Yunha: Yeah. Yeah. So I think for me, I guess adding new sort of data modalities to genomic data, I think is the most exciting path forward because then you can start actually conditioning things on function, like you can actually imagine being able to do things that we can’t do with the toolkits that we currently have and the knowledge that we currently have. I think that’s just the most exciting path forward.

[01:12:35] What is Tatta Bio?

Abhi: Yeah. That makes sense. And so yeah, we’ve talked about OMG, gLM2, Gaia and also Gaia agent. Many of these things were spawned from Tatta Bio, which you’re one of the co-founders of. It’s a scientific nonprofit dedicated to developing like tools for genomic intelligence. Why is it a nonprofit?

Yunha: Yeah. Tatta Bio is a nonprofit because we’re trying to tackle a problem that may be too big for an academic lab in an academic setting. It’s also very interdisciplinary: it requires a lot of software talent and machine learning talent, of which there is plenty in academia, but it’s difficult to organize that team in an academic setting. And there’s no immediate incentive for market forces to solve this problem. Say, for instance, the annotation problem... It’s clearly a really important problem because it limits what we can study and what we can understand, and it’s obviously gonna underpin new research directions of unknowable value. But neither the market nor academia is tackling this at the scale that we wanted to tackle it at. So that’s the reason why we are a nonprofit.

Abhi: And what is the actual... like I mentioned like Tatta Bio is developing “genomic intelligence.” I think that’s straight up like on the website. What is the... what do you consider the purpose of Tatta Bio to be in terms of what is it delivering to people?

Yunha: So what it’s delivering to people right now is helping them better understand their genomic sequences. I think it’s clear that genomic sequences cannot be understood by humans alone. So human-machine collaboration has always been the case for understanding genomic sequences. And how do we make that better? How do we augment that? So that’s the big mission that we have. That’s what we mean by genomic intelligence. Being able to truly, truly understand genomes, but not necessarily in the rational sense that we have, like “this part does this, and this evolved because of that.” It’s really being able to harness the genomic information that’s currently available and engineer it and modify it in the way that makes sense for applications. So yeah, that’s what we are currently doing. I think within that there’s tool building, there’s infrastructure building, there’s community orientation. All of those things are part of our mission.

Abhi: Actually, one question I’ve wanted to ask for a while: why is it called Tatta Bio? Because when I’ve brought up the company to other people, they thought, “oh, is it tied to that one Indian consultancy company?” [Tata Group]. Why that name?

Yunha: Yeah. It’s... I guess it’s a reference to the “TATA box”. The TATA box is literally a sequence motif in DNA that’s rich in T and A (T-A-T-T-A in our case) that signals the start of a gene, or like a reading frame.

Abhi: Yeah, that was a good name.

Yunha: I don’t think everyone got that memo.

Abhi: What would you... what would make you think that you’ve succeeded at Tatta?

Yunha: Yeah. For us, say, for instance, if we could double the number of sequences that can be annotated, I think that is a success.

Abhi: To some degree it feels like with Gaia agent you can do that today; you’re almost just compute limited. Is that fair to say? What else really needs to be done?

Yunha: Yeah. I think there are just really dark patches of the sequence space that we haven’t fully explored. If you imagine it was literally just a map with completely dark patches, then if we can figure out a way to generate hypotheses for any one of those sequences, that’s gonna make a big impact, because we’ve already built a very good way to compress that information so that we can propagate it really quickly. So there are definitely areas that we should really be studying because they’re gonna make a big impact in how we understand sequences. That is how I see the next step: how do we identify those areas that are really poorly characterized but have high impact potential, and go about experimentally validating some of these sequences and functions?

Abhi: So is it like... I guess I keep returning to this question. The reason you don’t wanna let Gaia agent just run over the entirety of un-annotated sequences is that you’re unsure about the validity of any one of those predictions, and there’s more work to do to figure out where Gaia agent is reliable and where it’s not? Or is there something else?

Yunha: So well, I guess like you can always generate hypotheses. But the question is how many of these can we actually validate? And how many of these is it worth validating given the sort of resource limitations that we currently have?

Abhi: Like I imagine one thing you could do is like let it run across all microbial genomes and then just give that information to the community. And see what they’re able to come up with.

Yunha: Yeah. Yeah. So we are basically trying to do that. But we can’t do it across the entire trillions of genes. So we’re making... we’re trying to make a good selection of either genomes or genes that are like on the wishlist of people and scientists.

[01:19:02] Building Google for genomic sequences (SeqHub)

Abhi: Do you imagine... FROs [Focused Research Organizations] have a specific length of time they exist, after which they become for-profit? Or they just die entirely, because they’ve fulfilled their mission. What do you think the future of Tatta is? Does it eventually become a for-profit, or at the end of it does it just wind down because you’ve annotated the sequences and you’re done?

Yunha: If we could figure out a way that we annotate every single sequence, which I think is very ambitious and probably not possible in the next X years, then that should be our goal. We take a stance that this is going to be an evolving like database of sequences to function and how do we best optimize this database so that things don’t get lost and things are optimally propagated across scientific literature and across scientific discourse.

One of the latest projects that we’ve been working on is called SeqHub. It’s literally like GitHub for sequences, or Google for sequences. So in an ideal world, you can type in a sequence and get all the information: not just the annotation, but what papers refer to it, who are the best people to ask about it, what kind of discussions have been had about this particular sequence, and obviously what other sequences exist in that provenance and what genomic context it’s found in. So with Gaia, we tackle the genomic context problem. With SeqHub, we’re basically tackling other types of infrastructure problems, because way too often people make discoveries, but that information can’t be propagated readily, because it doesn’t fit into a certain database that people built 10 years ago. It just doesn’t fit. And that database doesn’t get propagated to what people use all the time.

So how do we build this more real-time understanding of sequences? That’s a big part of our mission. How do we build better software infrastructure for sequence understanding and data sharing? And if we want to fully fulfill this mission, and we assume it’s gonna take a long time, we actually want to maintain this infrastructure for as long as it takes to fulfill it. So as part of that, what we still need to figure out is how to build sustainability into our operation and business model. Our goal is to remain fully non-profit and still build in ways to generate enough revenue so that we can maintain this scientific software and infrastructure, which, by the way, has been very difficult to maintain in the current funding environment. Traditionally, I think it was funded by the government. But that also makes certain types of innovation difficult. You can’t build a fast-paced team in a lab that isn’t getting enough funding to do this kind of work. So we are also thinking really creatively about how to maintain scientific infrastructure and software infrastructure, because so often good software gets made but isn’t maintained. Or good ideas turn into okay software that doesn’t get scaled up and deployed as production-level software. So this is another aspect of the work that we’re currently doing.

Abhi: I’m not sure if you’re able to talk about this, but... PyMOL was a really great piece of software. Schrödinger acquired it... they have a private version that you have to pay Schrödinger to use, but they also have this very nice open-source version [PyMOL]. Could you imagine Tatta Bio going down that route, where it’s acquired by some existing company like Basecamp, or someone who really cares about the information that Tatta is gathering, and they maintain a pared-down open-source version?

Yunha: Yeah. I don’t know. We haven’t fully thought about that. What we’re more focused on right now is how to become entrenched in this scientific ecosystem. And I think a key difference here is that it’s not just software. If it’s software, then you can just copy it, improve it, and share it. But if it’s infrastructure that needs the community to deposit and share data, then as soon as you close-source any part of it, the value of that infrastructure goes away. The only parallel that I can think of is PDB. Or you could argue the same thing about Google. If you didn’t have Google that was free... depositing to the internet was free. And you can’t build LLMs without that, the internet. Same thing with AlphaFold and PDB. So yeah...

Abhi: Like, all of it needs to be open-sourced for the network effects to actually start kicking in...

Yunha: Yeah.

That’s how I think about it. That’s why I think it’s really important for us to stay open and stay like free for the vast majority of the functionality.

Abhi: Have you seen the XKCD comic? That’s like, you identify some universal problem everyone has and that says “I’m gonna build a solution to it”... and now you’ve just added another universal standard to the 13 others that existed prior. Like what other quote-unquote universal standards are there besides SeqHub and like where do you think they fall short?

Yunha: So in the space of like sequences, I think UniProt is a great example. It’s what people go to when you have a protein sequence.

Abhi: Sorry, specifically for genomes.

Yunha: Oh, genomes. Oh, like specifically like a... Oh, I see.

Abhi: What almost like network territory is SeqHub encroaching on? Are there any... or is like SeqHub unique and there is no other... there’s no other platform for something like this?

Yunha: The only other genome centric like existing platform that’s widely used is NCBI.

Abhi: And that’s not... there’s not really network effects there.

Yunha: No. Yeah.

[01:25:46] How to create communities around scientific OSS

Abhi: Okay. That makes sense. Okay then yeah, it seems like ripe territory to capitalize on. How do you... how have you typically found the process of gathering a community around a brand new piece of open source software? I imagine it’s like a relatively new experience for you.

Yunha: Yes. Yes. Certainly.

Abhi: How has that been?

Yunha: Oh, very interesting. A lot of learning on our side. It’s different in that it is self-serve software. And it is also B2C in some ways.

But it’s a very small community of people. We’re not tackling the general public here. We’re also currently really focused on microbiology community. And hopefully we can expand out to other communities like in plants and fungi and so on. So that’s our sort of roadmap.

Yeah, but we need to get in the heads of scientists and think about why we do what we do and why we would want to contribute. How do we contribute, and where do I spend most of my time? And what are the biggest pain points that we have? These are all things we need to think about when we design the software and the platform. Building good software is one thing, but building a community is an entirely new thing that we’re literally just figuring out as we speak.

Abhi: Especially if it’s yeah, like you mentioned, the community is so small. Like I can’t imagine the people who like would actively be power users of the software number more than a few thousand people worldwide. How do you like... how do you get in touch with all of those people and tell them like, “oh, you should be using this thing that we built.” Like how do you convince them that this is worth their time?

Yunha: Yeah. For us, it’s truly... so I think there have been a lot of attempts at encouraging people to deposit data better, add more data, metadata, blah blah blah. I think one thing is we need to make it really easy. Depositing data should be super easy.

And we shouldn’t require them to do a bunch of things, so that’s just a basic thing that we can build in. Another is we need to give them what they really want the most, and for us it’s better annotations. When I was a student, it’s like the most frustrating thing when you have sequences that you’ve waited so long to get into your hands and you look at it and so much of it is just hypothetical and you’re like just banging your head against the wall to understand what these sequences do. And that is the biggest motivator. If we can give them better annotations, if we can give them more insight into what they’re looking at, that’s what’s gonna bring them here. And those are gonna be the people who are gonna be the most incentivized to contribute because it will come back to benefit them and the community. So that’s our hypothesis. We’ll see how that goes.

[01:29:06] What’s the purpose in the centralization of the software?

Abhi: That’s fun. Like you have this platform which is really hard to populate to start off with, but the draw... like the reason you’d want to interact with that at all is because you get access to Gaia, basically. As like a way to help you interpret what’s going on.

Why... what’s... this is maybe something I should have asked before. Why even care about having something like SeqHub? Is it like... yeah, like maybe you want more people to use Gaia, but like alternatively Gaia could just be like a standalone GitHub thing? Like why do you want a central place to deposit sequences?

Yunha: Yeah. Yeah. That’s a great question, because we’re trying to expand this labeled dataset. The gold-standard dataset that we have is currently Swiss-Prot... and we think there is actually quite a lot of information that’s outside of Swiss-Prot. Swiss-Prot is human curated, by the way, which is incredible. There are curators whose full-time job is to look at papers and validate, “oh, this is a new sequence. We should add this to Swiss-Prot.” I think there’s just a lot of knowledge that’s hiding in labs, in people’s brains, and in papers and supplements that can be organized a lot better so that we can actually improve sequence annotation without even having to do any experiments. If we organize ourselves properly, with infrastructure that is up to date and with the correct incentive schemes, then we might be able to double the number of sequences that we can annotate without even having to do any experimental workflows. And that is what we’re trying to build right now.

Abhi: What’s the... yeah, you said Swiss-Prot is human annotated, which makes sense why it’s so low throughput. I’m curious like how much realistically... how much knowledge is like hiding in the heads of people at these microbial genomic labs who simply like don’t have the results necessary to write a paper about it and get it like deposited somewhere? So like how strong... what... when you talk to these people, is it usually that they have like tons of things in their head that like they’ve been thinking about it for decades, but like they just don’t care enough to write a paper about it?

Yunha: Yeah, I think that definitely exists. And I think this is also a byproduct of the publication system. As in, if it’s not a big story, then where do you share this information? When it’s not gonna be cited much, and when things aren’t gonna be discoverable, there’s no incentive to write a single paper just to say “this is something.” You might be able to say, “oh, we have experimental results.” But it’s just not gonna be a very highly cited paper. So what typically happens is it becomes a tiny little section in a large paper. You write a whole paper, and then there’s a tiny little thing: “oh, we think this is this, or we have high confidence this is this, based on this tiny little supplemental figure that no one looks at.” And that never gets propagated to a central database.

Abhi: Is it that the Swiss-Prot annotators just have so many other things to do?

Yunha: Yeah. So there’s that. And then there’s just internal knowledge. People do experiments all the time. We do a lot more experiments than what gets published in the paper. So I think both of those are at play, in terms of what a publishable unit is and how we can make knowledge transfer more efficient across people. Imagine if you had to write a publication for every single bug fix in software. That just doesn’t make sense.

Abhi: And so like SeqHub, I think you guys officially released a month ago. Am I correct? And so a month has passed. What’s next on the roadmap? What do you... what have you seen the use cases are so far?

Yunha: Yeah. So what we... so we launched SeqHub about a month ago. And a key sort of difference between SeqHub and Gaia is that SeqHub can do like whole genome annotation. And it’s also a place where you can deposit data.

Abhi: Sorry, how does it do whole genome annotation? Just split it up into...

Yunha: Yeah. So basically, you can pull a... so Gaia is a sequence search, like protein search. But we’ve extended it across the full genome. So if you put in multiple sequences, that is, a genome, then it does automated annotation.

Abhi: Gotcha. Okay.

Yunha: So now you can automatically create collections or datasets, right? You have a dataset for each genome, and now we’ve integrated Gaia agent into SeqHub agent, which can do multi-gene reasoning over a genome from your own data. So, say I have a genome that I’ve sequenced from soil, and I have high conviction that this genome can produce or degrade a molecule. I can ask SeqHub agent, “go through these 5,000 genes that I’ve sequenced here, in this particular order that is found in... use all the tools that you have and find me the set of genes that’s gonna be involved in degradation of this particular compound, or synthesis of this particular compound.”

Or “this thing is found in this kind of environment.” So you basically can do reasoning that’s a lot more complex than “what does this protein do?” So that’s something that we’ve implemented for SeqHub. Essentially all of that is just... it’s aligned with our mission and that we wanna help people understand their sequences better, but it’s also to make sure that we can bring in this community of people who really care about their sequences and want to share their knowledge. So the next step for us is to build this community of scientists who will generate this paired information with sequences to either human understanding or experimental data or sample data. We’re just trying to get as much information as possible publicly for sequence to a label that matters in science.

[01:35:37] How will the way science is done change in 10 years?

Abhi: When we last spoke, you mentioned that you think the way that science gets done will look very different in 10 years. What do you think changes?

Yunha: So one idea that I have... I don’t know, like this is changing all the time... but I think there’s been a lot of focus on scientific narrative. So, how you tell the scientific story is really important in science, in the scientific enterprise. So even when it’s like a small finding, you write a whole like narrative...

Abhi: Amplify it.

Yunha: I think... you contextualize it so that it’s impactful, and that’s really important. You might find that “this protein does something,” and alone that’s just “okay, sure.” But if “this thing does something, then this means it can do something else, and that means we can use it to fix this particular problem.” So that’s contextualization of scientific discovery. And that narrative has been really important. And I think almost overemphasized, maybe to the extent that it’s overdone.

And I think in the future as machines are more involved in scientific discovery, perhaps data is gonna be a lot more important. And how we... I think currently the narrative is more important than the data. Data is just like a zip file, and then people read the narrative and AI agents read the narrative, right? So that is... that’s become really important part of science. But I think as we do more science with the data itself, not with the narrative linking, I think the data sets are gonna be a lot more important. And maybe in the future we’re just gonna be like depositing data and calling that a scientific product, which is not something that’s being done today. And the sort of innovation is in how you generated that data, how meticulous you are, how innovative you are. I don’t think like the human role is gone, but it’s just the data generation is done in a way that’s so sophisticated that it has a big impact on the conclusions that we can draw from that particular data. That is like scientifically salient.

Abhi: Do you think we’re currently poking at that with the release of FutureHouse’s Kosmos? Like the existing AI co-scientist stuff, where you just plug in your data? Have you used those? How much do you trust them today?

Yunha: Yeah. So I think it goes back to the same question of human language and narrative, and how much emphasis we wanna put there. I agree that language is a very important medium through which we understand things and link concepts. But with overemphasis on narrative and using only natural language agents... I’m not saying the current agents are like this... the worst-case scenario is that the AI agents only read and don’t do any data analysis. I’m sure they’re still gonna find something new, right? It just read a lot of papers, and then you chat with it and you’re like, “oh, what does this protein do?” It probably doesn’t... it probably does this.

I think in an ideal world, there’s more emphasis on the data part and the understanding of the data without the sort of biases of language. Whereas the language is how it communicates with humans. So I think we’re not quite there yet in terms of how do we build like scientific systems.

I’m not even gonna call them agents because I think that places too much emphasis on the narrative. But how do we build systems that can conduct science and scientific inquiry that can go beyond like human narrative and human understanding. So that’s... yeah, I don’t know. I still think about it a lot.

Abhi: In some sense, I almost imagine the natural language agents, and perhaps also Gaia or Gaia agent, are somewhat poisoned by the fact that they have read narratives and have hyper-focused on certain things that perhaps aren’t actually that useful or interesting. When you look at Gaia agent’s reasoning traces, how much do you see this, where it’s focusing on things you personally would not have focused on?

Yunha: I see. Okay. Yeah. And sometimes that’s a good thing. Sometimes it’s not. So I’ve seen cases where Gaia agent just doesn’t focus on what it’s supposed to focus on. And there’s no reason for it; it’s just doing what it wants to do. I don’t know if this is something that can be solved with better prompt engineering or by giving it more tools. How do you rescue it from going down a path that is just too obvious? How do you make it more rebellious against existing knowledge? I don’t know, because it’s so reliant on what it knows. I’m sure there’s a lot of agent-based research on how to make agents more, yeah, more creative, I guess. So there’s definitely work that can be done.

Abhi: Have you seen that one like Andrej Karpathy tweet about him really desiring some LLM that knows nothing about the world, but is like maximally intelligent and is able to go out and gather information as it needs?

Yunha: Yeah.

Abhi: And I heard that like GPT-OSS was actually like this, it had incredibly low benchmarks on like general world knowledge. But it was really good at math. And it was really good at just like the CodeBench or the software engineering stuff. I’m curious, have you tried GPT-OSS in Gaia agents?

Yunha: Okay. I have not.

That would be pretty interesting.

Yeah.

Abhi: Cool. I think that’s all the questions I have. Thank you so much for coming on.

Yunha: Cool. Thank you.

Human art in a post-AI world should be strange

Abhishaike Mahajan — Tue, 02 Dec 2025 23:21:41 GMT

Bubble Tanks is a Flash game originally released on Armor Games, a two-decade-old online game aggregator that somehow still exists. In the game, you pilot a small bubble through a procedurally generated foam universe, absorbing smaller bubbles to grow larger, evolving into increasingly complex configurations of spheres and cannons. Here is a reasonably accurate video of the gameplay, recreated in beautiful high-definition:

Bubble Tanks was first released in 2007, with a sequel out in 2009, and another sequel in 2010. Back when I first played it as a child, I was convinced, absolutely convinced, that there was someone in the world whose entire life was nothing but Bubble Tanks. This person—and I took it on faith that they were real—woke each morning and immediately, before coffee, before the basic animal functions of evacuation and sustenance, played Bubble Tanks. They posted on obscure forums, arguing bitterly over tank builds and bubble physics with three other people who had the same devotion. I knew that their room was disgusting, repulsive. This was essential to the vision, that their stained clothes lay across their floor, worms crawling over them. They were either skeletal or enormously bloated, monastic asceticism or excess gluttony, one or the other. Bubble Tanks was single-player, so they did not do all this for fame or glory, but for love, or for something even deeper than love. Everything had been sacrificed for this game, and excelling at it would be all that they had ever done, all that they would ever do.

And what if this person were just the start? What if this Flash game became the organizing principle of human civilization? The economy would shift to accommodate. Bubble Tanks coaching would become a viable career path. Parents would discuss their children’s talents at the game over dinner. Political candidates would be asked about their Bubble Tanks records during debates, and one would lose an election after it emerged that he never evolved past the third tank configuration.

Looking back on this fever dream I came up with a decade-and-a-half back, one thing that immediately strikes me is that being creative in such a world must be monstrously difficult. Not because all creativity must be ultimately tailored towards Bubble Tanks enthusiasts—that much is obvious, and is not especially different from creativity in the real world, which must tailor itself towards enthusiasts of human-understandable concepts—but rather because there would be astronomical amounts of Bubble Tanks content already in existence. In the latest stages of this civilization, billions of people have devoted their lives to Bubble Tanks. Millions of them are creative. Hundreds of thousands have genuine talent. Tens of thousands have produced work that is, by any reasonable measure, brilliant. The Bubble Tanks epic poem exists in fourteen languages. The Bubble Tanks symphony has been performed at concert halls on every continent. There are Bubble Tanks novels that have won Pulitzers, Bubble Tanks paintings that hang in the Louvre, Bubble Tanks films that win Oscars. All the obvious ideas have been executed. All the non-obvious ideas have also been executed.

I have wonderful news. You live in the earliest innings of this universe, at the start of it all, just as more and more of the population is beginning to wake up to how great this Flash game is. Even more fortunate for you, it is not only Bubble Tanks that is the object of human devotion. It is everything.

Humanity has been producing art for somewhere between 45,000 and 100,000 years, depending on how generously you define “art” and “humanity.” For most of this period, the constraint on creative output was not imagination, but production capacity. The printing press changed this, then radio, then television, then the internet, and at each stage the volume of creative work accessible to any given person increased by orders of magnitude. Today there are more novels published each year than a human being could read in a lifetime. There are more films, more paintings, more poems, more essays, more podcasts, more YouTube videos, more TikToks, more tweets, more everything than anyone could ever hope to consume.

And the more art is produced, the more we must learn to discriminate. Consider stories. They existed for millennia in the form of epics, religious hymns, folk tales. But with the rise of printing presses that allowed a wider variety of stories to circulate, we were forced to develop something very dangerous: filtering technologies. Genre is a filtering technology. It emerged because no one could read everything, and so readers needed a way to predict whether a given text was likely to satisfy their immediate demands. “Romance” is a promise: there will be a love story, probably with a happy ending. “Mystery” is a different promise: there will be a puzzle, and it will be solved. Neither is really a description; each is more accurately a contract signed by the author about what the book will do for you. And like all technologies, genre has evolved to become more precise as the volume it must filter has grown. “Sci-fi” was once sufficient. Then it fractured into hard sci-fi and soft sci-fi, into space opera and cyberpunk. Brand awareness is a different filtering technology. Netflix originals have the flavor of something that will likely be decent, but also homogeneous, whereas A24 movies have an art-house sensibility with a certain color palette. Each subdivision represents a refinement of the filtering mechanism, a narrower promise to a narrower audience.

Why are filtering technologies a problem? Aren’t they great? We’re getting increasingly good at giving people what they want!

Well, it wouldn’t be an issue if the creative process were limited by human scale, but we’re getting close to leaving that world. I feel pretty comfortable saying that, at this point, LLMs can handle nearly every sufficiently-chunked-up bit of music production, graphic design, video editing, background illustration, character concept art, voice-acting, essay writing, and a lot more. The list extends as far as creative production itself extends, which is to say: everywhere. Every domain that humans have developed aesthetic traditions within is a domain where AI can now perform the components of that tradition with reasonable competence.

One could imagine that in the near future, there will be a new button on your television, one with a sparkle animation. Clicking on it will bring up a QR code, politely asking you to scan it with your phone. Upon doing so, you will be given one of the ultimate promises of our new frontier-AI-lab-centric economy: a text box that will generate a feature-length film from whatever prompt you enter into it. We have arrived. The long march is over. This is the ultimate final utopia that our filtering technologies have been building towards since the first monk started to distribute the Gutenberg Bible. What will we make? What wonders await us?

I suspect the answer is: mostly nothing. Or rather, mostly more of what we already have.

The problem with filtering technologies, one that becomes catastrophic precisely at the moment of their perfection, is that they assume you know what you want. The entire apparatus presupposes a subject who arrives at the interface with desires already formed, preferences already crystallized, a little homunculus sitting in their skull who knows exactly what kind of story it wants to hear tonight. And, to be clear, this actually works remarkably well in the case of a finite set of existing objects. When there are ten thousand, even a hundred thousand films in the Netflix library, the algorithm’s job is merely to surface the handful you’re most likely to enjoy from a pool that already exists. You don’t need to know what you want with any precision. You only need to recognize it when it appears before you, to say “yes” or “no.” Really, the algorithm is not an algorithm at all, but something even more basic: an ophthalmologist. It flips between lenses: better, or worse? This one, or that one? You do not need to understand the properties of curved glass or the anatomy of your own defective eyes. You simply must obediently respond to the question you are asked.

This all breaks down the second you are placed in the driver’s seat, because you do not actually know what you want. How could I make such a proclamation so confidently? I can’t, but I will anyway: what you want most, more than anything else in the world, is stuff that you never realized you wanted.

I realize that this is a tired sentiment, subtweeting the apocryphal Henry Ford line about faster horses. “If I had asked people what they wanted, they would have said faster horses.” The implication being: I, the visionary, know what you want better than you do. And I, despite the dullness of my audience, will give you the automobile. You would think, reading this essay, that I am making a case for the artist: the sacred figure who reaches into the void and pulls out something none of us knew we needed.

But I am saying something much worse, which is that nobody knows. Not you nor the visionary. The Ford line is wrong not because customers actually do know what they want, but because, if we’re being honest with one another, Ford didn’t know either. It was a happy accident that he later (again, apocryphally, because I don’t think he actually said it) narrativized into inevitability, because that is what popular culture does with fixations that turn out well.

You may guess where this is heading. It’s time to discuss Being John Malkovich.

Being John Malkovich is a nearly two-hour movie, released in 1999, directed by Spike Jonze, written by Charlie Kaufman. It stars John Cusack as a failed puppeteer named Craig who takes a job as a filing clerk on the seven-and-a-half floor of a Manhattan office building—a floor with ceilings so low that everyone must walk in a permanent stoop. This detail is never really explained, other than a vague mention of how the original building owner had a wife who was a dwarf, which raises far more questions than it answers. Did he build the entire floor for her? Did she work in this office? Was this an act of love or an insult? By the time these questions have been raised, the film has already moved on, and it is never mentioned again. One day, while filing, Cusack’s character discovers a small door behind a cabinet. He crawls through. In doing so, he finds himself inside the head of John Malkovich, the actor, experiencing fifteen minutes of Malkovich’s life from behind his eyes, before being ejected onto the muddy shoulder of the New Jersey Turnpike.

This is the basic premise, all introduced within the first half-hour of the film. And I have not yet mentioned the chimpanzee.

There is a chimpanzee, who has a reasonable amount of screen time. She belongs to Lotte, Craig’s wife, played by Cameron Diaz. The chimpanzee has intense psychological trauma as a result of being torn from her mother at an especially early age, a fact shown entirely through flashbacks. What is the purpose of the chimpanzee being traumatized? It is unclear, because it is never actually a relevant plot point. Why is Lotte taking care of this chimpanzee? Is she an animal therapist? She is not. She works at a pet store, and keeps a wide variety of animals beyond just the chimpanzee in her (and Craig’s) small New York apartment for seemingly no reason at all. Why is the chimpanzee in the film? It seems to be for the sole purpose of a pivotal moment in the film which requires using the chimpanzee’s cage, but this moment does not actually need the prodigiously large cage for it to work, and one could imagine a thousand other more reasonable ways to accomplish the same narrative beat. Despite all this, the chimpanzee is there.

How did Charlie Kaufman, the relatively unknown screenwriter and driving force for the film, even come up with this plot line? In an interview, he says this:

I wrote Being John Malkovich while I was waiting for [the next sitcom] hiring season. My idea was that I would write a script and use it to get work. I had this idea that someone finds a portal into someone’s head, and I had another idea that somebody has a story about someone having an affair with a co-worker. And neither one was going anywhere, so I just decided to combine them.

Oh yes, there’s an affair too. But it gets even funnier. Why is John Malkovich the chosen victim of the portal? Kaufman also gave the answer in a different interview:

I don’t know... I thought it was funny. It’s hard to explain, but I thought it was funny, but not jokey. Because [John Malkovich] is a serious actor, he is a great actor, but there is something odd about him and there is something behind his eyes that you can’t see. And I thought that was a good person for this.
And then I think his name is perfect for the title...

Being John Malkovich is a worrying movie for a filtering technology maximalist, because it is incredibly good, benefitting from both the insane premise and the bizarre details, and is also something that nobody ever asked for. What is the film really about? What is the emotion that it is intended to evoke? It is about identity, I suppose. Also about desire, and the way desire makes puppets of us all. It is about the loneliness of being trapped behind your own eyes. It is also about John Malkovich, specifically, for no reason other than his being an apparently funny choice. There are a lot of very strange, but ultimately invaluable, stylistic decisions made for this movie, all of which were ostensibly made because Kaufman got a kick out of them.

To be clear, I am not saying something like ‘a sufficiently well-prompted AI could not come up with Being John Malkovich’. What I am saying—which actually feels like a pretty defensible viewpoint—is that very few people would ever think to assemble a prompt that creates Being John Malkovich. This opinion does not require any sort of humanist romanticism, or belief in some vague notion of ‘soul’. What it is grounded in, really, is a fairly basic observation about the structure of human desire: that desire is not a fixed quantity that exists prior to its satisfaction, but something that is frequently created retroactively by the very thing that satisfies it.

This would not be so bad if it were the only thing happening. The Kaufmans of the world would continue to write their chimpanzees, and the prompt boxes would continue to produce competent variations on existing themes. The two would simply coexist, one deposited directly in the multiplex, the other in the art-house cinema, each serving its respective demographic.

But there is a second thing happening, and it is happening simultaneously.

Since AI is quite good at producing the art that isn’t too strange, I imagine nearly everyone will, in due time, be happy to hand direction of their consumption over to it. Soon, Suno will produce everyone’s music, Midjourney will make everyone’s phone backgrounds, and so on. Yes, it will be slop, not because it is bad, but because it repeats. Generative models are, by their nature, interested in modeling distributions—trained on everything, they converge toward the most likely areas of their distribution, which means that even when you prompt for something unusual, you are pulling against a gravitational force that wants to return you to the most common areas. The result is that the most common forms of AI output have a flavor, a kind of statistical residue that accumulates across pieces. But most people don’t mind this. They are happy to let the model play the same ophthalmology game with them, because they know they can play that game well, and the results will probably be roughly as good as the last algorithm they played the game with.

And herein lies the problem. Now, there is no longer any reason for the multiplex to exist, because the multiplex is not meant to be genuinely unique, and whatever is not unique can instead be entrusted to our own personal, finely tuned filtering technologies, combined with the infinitely patient AI. Is this bad? Not for the consumer! But it does put the artist in a pickle, because it now means their last remaining way of being seen at all, much less standing out, is to create something like Being John Malkovich. This cannot be easily made by the AI alone, because it does not submit to the ophthalmology game as easily. And creating something like Being John Malkovich is, I imagine, challenging.

Of course, strangeness has always been a useful strategy for art. Even beyond Charlie Kaufman, the greatest artists from the last century were all a bit off. Joan Didion had an unnerving flatness, describing a woman’s suicide from a sexless marriage in the same sentence as a shopping list. Hunter S. Thompson decided that the reporter should be more interesting than the thing being reported on, and shoved his demented, drug-addled brain into everything he wrote. David Lynch made movies in which the nightmarish mysteries refused to be made legible, and were instead something you were forced to marinate in for the length of the film. Importantly, nobody was strange in the same way. Didion’s strangeness is one of temperature. Thompson’s is one of proportion. Lynch’s is one of epistemology. What they share is not really a style, but a willingness to identify the thing that everyone else in their field was doing automatically, unconsciously, and to ask: what if I didn’t?

During their times, these people were made rich, famous, immortalized for doing something as brave as this. Things are different today. Now, to be above the crowd is the minimum required to be visible within it.

This is a very stressful situation, and one that all young artists born into the Bubble-Tanks-obsessed universe could likely sympathize with. They too live in a civilization that is utterly consumed with infinite creative production alongside the dimensions that matter—for them, Bubble Tanks—and are forced to produce something underneath a sky that has already seen it all. One can only imagine how strange their work is. Importantly, we occupy the antechamber of this world. What is coming next can be seen from where we stand, and our distance from it is decreasing at a rate that makes projection trivial; five years, ten, and the gap collapses into nothing. There is genuine cause for throat-closing anxiety at this prospect.

You can imagine a rather bleak future as the end result of this. One in which someone sits at their screen, asks their friend.com pendant to create an eleven-season series about a 45-year-old Japanese woman and her tsundere relationship with a coworker at the glue factory where she works, and watches the end result with rapt attention. In this hyperatomized future, capital flows only to the frontier model companies and no one else, and nobody has a common language to describe the media they consume to anyone else, since every single piece of media has a single person as part of both its creation and distribution.

But perhaps something better is possible. Consider an alternative future. This one is exactly the same as the first one, with one minor difference: people have moments of intense boredom with what the machine spits out to them, and they decide to go out searching for something else that someone else has made, one that does not taste like something that they, in a million years, could’ve ever come up with themselves. Not because they do not have the technical talent to do so—technical talent is precisely the thing that has been commoditized—but because they lack the particular configuration of a life that would lead someone to write that, to make that choice, to include that detail that seems inexplicable, right up until they encounter it and realize it was obvious all along.

I am increasingly optimistic that the second version is the more likely one, only because it feels as if popular art is being increasingly dominated by the strange, the unmistakable, the ones that have an auteur-esque energy infused into them. To be clear, this is not new. But it used to be a privileged position, something you earned after decades of clawing your way through the studio system, or something you were granted by virtue of being from the correct lineage. Now the privilege has inverted. Now everyone must leave their own distinctive, strange smack on their work, or else disappear entirely. Just take a look around you. The auteur is increasingly colonizing forms of media that once operated on entirely different principles. Substacks, podcasts, technical news; many of the most promising ones today are largely held up by the specific and irreplaceable neuroses of the person producing them. This is strange and new and also very old. It is a return to something like the bardic tradition, in which the story and the storyteller were inseparable.

Of course, none of this is to say that the auteurs are rejecting AI. In fact, the best ones may use it more than anybody else does, as the speed at which production will be demanded in the new world will necessitate it. What makes auteurs so special? It is not the case that they, in their production of the strange, have any claim to particularly fine taste or even soul. In fact, their primary good fortune, often their only one, is that they want something, that they desire to tie great iron chains around some particular, ugly concept and drag it behind them wherever they go, clanking and scraping against the pavement, alerting everyone to their embarrassing presence. The machine has no such desire. It is capable of anything and interested in nothing. And the desire for what is uncommon, it turns out, is the only part of this whole system that struggles to be automated.

Bringing organ-scale cryopreservation into existence (Hunter Davis, Ep #6)

Abhishaike Mahajan — Mon, 24 Nov 2025 14:56:44 GMT

Sponsor note: the supporter of this video is rush.cloud. If you are at all involved with doing preclinical drug discovery and would benefit from computational tools, you should check out their platform + beautiful website here: rush.cloud.

If you’re at all interested in working together for future episodes, reach out!

Listen on Spotify/Apple Podcasts/Youtube:

Introduction

This is an interview with Hunter Davis, the CSO and co-founder (alongside Laura Deming) of Until Labs, which you may also know by its prior name, Cradle. They are a biotech startup devoted to organ-scale cryopreservation. They raised a $58M Series A back in September 2025, and are backed by Founders Fund (especially interesting!), Lux Ventures, and others.

In this interview, we mainly talk about the engineering and scientific difficulties in the cryopreservation field, including some background details on their September 2024 progress report on neural slice rewarming, how they characterize tissue damage in their attempts to do kidney cryopreservation, the potential economics of future cryopreservation protocols, and lots more.

One of the most interesting conversations I’ve had in a long time. If any of this work seems interesting, Until Labs is actively and aggressively hiring!

Enjoy!

Timestamps

[00:01:50] Introduction

[00:05:00] Why don’t we have reversible cryopreservation today?

[00:07:05] Why is freezing necessary at all for preservation?

[00:08:23] Let’s discuss cryoprotectant agents

[00:14:09] Until Labs’ 2024 progress report on neural tissue cryopreservation

[00:20:28] How do you measure cryopreserved tissue damage?

[00:22:34] Translation across species

[00:26:04] Why was the cryopreservation storage time so short in the progress report?

[00:30:47] Nuances of loading cryoprotectants into tissue

[00:37:03] Let’s discuss rewarming

[00:43:02] What scientific problems amongst vitrification and rewarming keep you up at night?

[00:45:58] Why are there so few cryoprotectants?

[00:48:11] How can you improve rewarming capabilities?

[00:53:03] What are the experimental costs of running cryopreservation studies?

[00:57:49] What happens to the cryoprotectants and iron oxide nanoparticles after the organ has been thawed?

[01:01:34] Cryopreservation and immune response

[01:03:25] How do you filter through the cryopreservation literature?

[01:05:54] How much is molecular simulation used at Until Labs?

[01:10:04] What are the (expected) economics of Until Labs?

[01:14:49] How much does cryopreservation practically solve the organ shortage problem?

[01:17:04] Synergy between xenotransplantation and cryopreservation

[01:21:12] How much will the final cryopreservation protocol likely cost?

[01:21:58] Who ends up paying for this?

[01:23:28] What was it like to raise a Series A on such an unorthodox thesis?

[01:27:49] What are common misconceptions people have about cryopreservation?

[01:29:58] The beginnings of Until Labs

[01:34:07] What expertise is hardest to recruit for?

[01:39:27] What personality type do you most value when hiring?

[01:44:17] Why work in cryopreservation as opposed to anything else?

[01:46:26] Until Labs’ competitors

[01:49:30] What would an alternative universe version of Hunter have worked on?

[01:51:33] What would you do with $100M?

Transcript

[00:01:50] Introduction

Abhi: Today I’ll be talking to Hunter Davis, the Chief Scientific Officer of Until Labs, previously known as Cradle, which is a startup working to build reversible cryopreservation for use in organ transport, and eventually medical hibernation. Prior to starting a company, Hunter received a PhD in physical chemistry from Caltech and got a postdoc in neuroscience at Harvard. Today we’ll be talking about the recent progress report that Until Labs put out, what cryopreservation problems keep Hunter up at night, and a lot more. Thank you for coming onto the podcast, Hunter.

Hunter: Thanks for having me.

Abhi: So just to set the stage, I’d love if you could give me a very general overview of what cryopreservation exactly is and the primary roadblocks to outright solving the problem.

Hunter: Sure. If we look back to the origins of using temperature as a knob, you can go all the way back to 1776. At that point it was observed that sperm under a microscope could be cooled down to hypothermic temperatures and then motility would decrease. And then when it was rewarmed, it would come back. And this led to some hypothesizing about what might be causing this. Fast forward to around 1950, and people discovered that not only could you cool things to hypothermic temperatures, you could add cryoprotectants to these cells and cool them much further, down to the point where molecular motion is completely arrested. Fast forward again to more contemporary processes. And what you’re trying to do is slow down molecular motion inside of cells in a variety of different contexts. So one that you might envision is for a hypothermic use case. You have a patient who needs a surgery. In many of these cases, the ischemic time is far too short to complete the surgery at 37 degrees.

One example of this is aortic arch surgery. This is a heart surgery that can’t be bypassed and by default what would happen is the brain would go through ischemia because you can’t actively perfuse oxygenated solution into the brain during the surgery. So what the surgeon will do is they’ll take the patient and they’ll cool them down to around 15 degrees Celsius, and then they can complete the heart surgery and then rewarm the patient. And it shows no long-term neurological side effects. Then you can look at something like an organ. Here we might want to slow down the metabolism of an organ during transport to be able to increase its viability window. Here we can take the organ and literally just put it on ice. And what this does is it just decreases the rate of all these metabolic processes that are happening inside of the organ or in the case of the patient, inside of the body. Just reduces all of those by reducing the temperature.
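To put a rough number on “decreases the rate of all these metabolic processes,” here is a minimal sketch using the Q10 temperature-coefficient rule of thumb, which is our own addition and not something Hunter invokes: a biological rate changes by a factor of `q10` for every 10 °C change in temperature. The default `q10 = 2.5` is an assumed, illustrative value (commonly quoted values sit around 2 to 3), and `relative_rate` is a hypothetical helper name.

```python
# Q10 rule of thumb (an assumption added for illustration, not from the interview):
# a reaction rate changes by a factor of q10 for every 10 degC temperature change.
def relative_rate(temp_c: float, ref_temp_c: float = 37.0, q10: float = 2.5) -> float:
    """Metabolic rate at temp_c as a fraction of the rate at ref_temp_c."""
    return q10 ** ((temp_c - ref_temp_c) / 10.0)


if __name__ == "__main__":
    for label, temp_c in [("body temperature", 37.0),
                          ("deep hypothermia", 15.0),
                          ("static cold storage on ice", 4.0)]:
        print(f"{label} ({temp_c:g} degC): {relative_rate(temp_c):.2f}x baseline rate")
```

Under the assumed `q10` of 2.5, cooling a patient to 15 °C leaves roughly 13% of the baseline rate. That cuts the rate substantially, but it does not stop it, which is why hypothermia and cold storage only stretch the ischemic window instead of pausing it indefinitely.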

You can think of it as: between zero and minus 130 degrees Celsius is a danger zone where ice can form; below minus 130 is safe perpetually.

So what does the process look like in practice? You could take an organ, you load it up with some cryoprotectant molecules that reduce the rate of ice formation during this danger zone. Then you rapidly cool your sample from zero to below minus 130, store it there, rewarm, and then unload the cryoprotectant from the organ or tissue. And then you can bring it back up to 37 degrees and biological function will resume.

[00:05:00] Why don’t we have reversible cryopreservation today?

Abhi: What, why don’t we have this today?

Hunter: Yeah. There’s a lot of challenges with scaling this up. We do have it today in really simple systems. One example of this would be cryopreservation of embryos for in vitro fertilization. Here you have something that’s very small. Normally they’re operating on embryos at this stage where it’s four to six cells. And here you can take the embryo, dip it into DMSO, dimethyl sulfoxide, and then quickly dip it into liquid nitrogen. This rapidly cools it through this danger zone, and embryos can be stored for decades in this state and then rewarmed and implanted. Using the contemporary methods, the viability of embryos that go through this process is a little over 95%. We have also seen the live birth rate after transfer of vitrified embryos start to exceed that of fresh implantation. And the reason for this is the allowance for additional genetic testing while the embryo is cryopreserved.

So this does exist for really simple systems. There’s also been some proofs of concept of bringing it up to more complex systems, like a rat kidney. The John Bischof lab at the University of Minnesota has shown that you can take a full rodent kidney, vitrify it, bring it down below this minus 130 degree state, bring it up, reimplant it into the rodent, and then it’ll support life for this rat. The challenge is that as you scale up, as you try to go up to something that’s the size of a human kidney, all the thermal transport becomes much more complicated. If you try to imagine cooling something that’s the size of an embryo, I can go really fast. If I do something that’s the size of a human kidney, which is around 150 grams, that’s going to be much slower. So then in this competition between you getting down to the safe zone, minus 130, and the rate of continuous ice formation in that zero to minus 130 range, you just start to lose out because you can’t cool fast enough. And similarly, rewarming becomes a challenge.
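To see why scale hurts so much, here is a rough order-of-magnitude sketch, written under our own assumptions rather than as Until Labs’ thermal model. It assumes heat leaves purely by conduction, so the characteristic cooling time scales as t ≈ L²/α, where L is the distance heat must travel (about half the sample’s smallest dimension) and α is thermal diffusivity, taken to be roughly that of water (~1.4e-7 m²/s). The sample half-thicknesses are hypothetical round numbers.

```python
# Order-of-magnitude conduction estimate (assumptions, not measured values):
#   t ~ L**2 / alpha, with L = half the smallest dimension of the sample and
#   alpha ~ thermal diffusivity of water, assumed to approximate tissue.
ALPHA_WATER = 1.4e-7   # m^2/s, approximate thermal diffusivity of water
DANGER_ZONE_K = 130.0  # width of the 0 to -130 degC range, in kelvin


def conduction_time_s(half_thickness_m: float, alpha: float = ALPHA_WATER) -> float:
    """Characteristic time for heat to diffuse across the given half-thickness."""
    return half_thickness_m ** 2 / alpha


def mean_cooling_rate_k_per_s(half_thickness_m: float) -> float:
    """Average core cooling rate if crossing the danger zone takes about one conduction time."""
    return DANGER_ZONE_K / conduction_time_s(half_thickness_m)


# Hypothetical half-thicknesses chosen for illustration only.
samples = {
    "embryo (~100 um across)": 50e-6,
    "human kidney (~3 cm thick)": 15e-3,
}

if __name__ == "__main__":
    for name, half_thickness in samples.items():
        t = conduction_time_s(half_thickness)
        rate = mean_cooling_rate_k_per_s(half_thickness)
        print(f"{name}: ~{t:.3g} s to conduct heat out, ~{rate:.3g} K/s at the core")
```

Because the time grows with the square of the size, a sample 300 times thicker takes about 90,000 times longer to cool through its core. That quadratic penalty is the rough shape of the embryo-versus-kidney gap described above. The same scaling applies in reverse to rewarming.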

[00:07:05] Why is freezing necessary at all for preservation?

Abhi: When you mentioned earlier about how you can’t just perfuse a transplant organ with oxygenated blood to prevent ischemia, intuitively, why is that the case? Why do you need to freeze to preserve the cellular state of the organ and why can’t you just continuously pump oxygenated blood through?

Hunter: Yes, that’s a great question. There are technologies on the market right now that will perfuse hyper-oxygenated fluid, traditionally in a cold perfusion context. There’s a bunch of these products on the market that can increase the viability of the organs that will then be donated. But all of them still have time... organs still time out at some point.

Abhi: I guess, what are the specific reactions?

Hunter: What are the mechanisms that account for this?

So one thing is that now the organ is in isolation. Let’s think of a kidney in isolation. Now I don’t have a liver, so any toxins that need to be cleared by the liver, the circuit isn’t going to be able to take care of. We don’t have dialysis for a liver, for example. So I think part of this is that you have taken the organ out of a multi-organ system that is responsible for clearing many of the toxins generated by each organ’s metabolism; on its own, an organ no longer has the others working in concert with it.

[00:08:23] Let’s discuss cryoprotectant agents

Abhi: That makes sense. Moving on to the cryopreservation step: first you have the vitrification step, and second you have the rewarming step. Focusing on vitrification, you mentioned that you physically stop the water molecules from interlocking with one another. One way you can do this is by cooling really fast, so that they don’t have time to lock into one another. Another way is adding cryoprotectant agents, which prevent the water molecules from doing that. What are cryoprotectants doing beyond that very simple mental model?

Hunter: Yeah. That’s great. I think there are a couple of things to keep in mind here. Most cryoprotectants you add are not going to completely prevent the formation of ice in perpetuity. You can think of it as reducing a reaction rate, if you treat the conversion of liquid water to solid ice as a first-order rate equation. There’s this thing called classical nucleation theory that predicts the rate of conversion from liquid water to solid ice.

Cryoprotectants increase the activation energy of that liquid-to-solid transition. There’s still the very molecular question of how they do that.
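As a rough illustration of “reducing a reaction rate”, here is a sketch that treats liquid-to-ice conversion as a first-order process with an Arrhenius-type rate constant k = A·exp(−Eₐ/RT). This is a deliberate simplification: in classical nucleation theory the barrier itself depends on temperature and interfacial energy. The prefactor, barrier, temperature, and the 10 kJ/mol added by the cryoprotectant are all hypothetical.

```python
import math

GAS_CONSTANT = 8.314  # J/(mol K)


def rate_constant(prefactor: float, barrier_j_per_mol: float, temp_k: float) -> float:
    """Arrhenius-type first-order rate constant k = A * exp(-Ea / (R T))."""
    return prefactor * math.exp(-barrier_j_per_mol / (GAS_CONSTANT * temp_k))


def fraction_liquid(k: float, seconds: float) -> float:
    """First-order kinetics: the unconverted (liquid) fraction decays as exp(-k t)."""
    return math.exp(-k * seconds)


# Hypothetical numbers, chosen only to show the scaling.
PREFACTOR = 1e13   # 1/s
BARRIER = 60e3     # J/mol without cryoprotectant
CPA_EXTRA = 10e3   # J/mol added by the cryoprotectant
TEMP = 233.15      # K, about -40 C

k_plain = rate_constant(PREFACTOR, BARRIER, TEMP)
k_cpa = rate_constant(PREFACTOR, BARRIER + CPA_EXTRA, TEMP)

print(f"rate reduced {k_plain / k_cpa:.0f}-fold")
for t in (1, 10, 100):
    print(f"after {t:>3} s: liquid {fraction_liquid(k_plain, t):.3f} "
          f"-> {fraction_liquid(k_cpa, t):.3f} with CPA")
```

The rate ratio is exp(ΔEₐ/RT), independent of the prefactor, so even a modest increase in the barrier slows conversion more than a hundredfold here, which is the “buying time” Hunter describes next.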

There are a couple of different mechanisms of action here. One is direct hydrogen bonding to water molecules. Imagine the rotational tumbling that’s required for water molecules to align into hexagonal ice. If I can slow that down by hydrogen bonding to the water, then I reduce the rate of that alignment and buy myself more time to cool down below minus 130. Another is just generally increasing the viscosity of the solution. If you look at the Stokes-Einstein model for diffusion in a fluid, the diffusion coefficient is inversely proportional to the viscosity of the fluid. So these cryoprotectant molecules tend to both hydrogen bond directly to water and be viscous.
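The Stokes-Einstein relation is D = k_BT / (6πηr) for a sphere of radius r diffusing in a fluid of viscosity η. A short sketch makes the inverse proportionality concrete; the ~0.14 nm molecular radius and the 100-fold more viscous solution are illustrative assumptions, and Stokes-Einstein is only approximate at molecular scales.

```python
import math

BOLTZMANN = 1.380649e-23  # J/K


def stokes_einstein(temp_k: float, viscosity_pa_s: float, radius_m: float) -> float:
    """Diffusion coefficient (m^2/s) of a sphere of radius_m in a fluid of given viscosity."""
    return BOLTZMANN * temp_k / (6 * math.pi * viscosity_pa_s * radius_m)


TEMP = 298.15              # K
WATER_VISCOSITY = 0.89e-3  # Pa*s, water at 25 C
RADIUS = 0.14e-9           # m, rough size of a water molecule (assumed)

d_water = stokes_einstein(TEMP, WATER_VISCOSITY, RADIUS)
# Hypothetical cryoprotectant solution 100x more viscous than water.
d_viscous = stokes_einstein(TEMP, 100 * WATER_VISCOSITY, RADIUS)

print(f"D in water:        {d_water:.2e} m^2/s")
print(f"D in 100x viscous: {d_viscous:.2e} m^2/s")
```

Raising the viscosity 100-fold cuts D 100-fold, so a molecule needs 100 times longer to travel the same distance.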

There are also some interesting alternative mechanisms of action you can explore. Looking to nature, for example, there are these things called antifreeze proteins. Instead of hydrogen bonding directly to liquid water, they preferentially bind to solid-phase water. So they allow some ice nuclei to form, and then, by cooperatively hydrogen bonding, these macromolecules stick to the ice surface and prevent it from extending. So there are a few different mechanisms of action, and you could think to exploit all of them together in a cocktail.

Abhi: And I imagine the type of cryoprotectant agent that Until is most concerned with is the former category, the ones that hydrogen bond directly to water, and you don’t care too much about the antifreeze proteins.

Hunter: Yeah, I think we’re open to exploring all of these mechanisms of action. But in the end, when you want to do vitrification, the thing that’s going to do the dominant work is these colligative agents: the small molecules that interact directly with liquid water. If you think about the antifreeze proteins that nature uses, almost always, these organisms don’t care about surviving down to minus 130. What they care about is surviving in equilibrium at minus five degrees. These are very different processes. Preventing the extension of ice in a supercooled state just below zero is a fundamentally different problem from surviving all the way down to minus 130 and back. So yeah, the primary focus is on these colligative agents.

Abhi: When you look at the cryopreservation literature, is there any evidence that using antifreeze proteins, and being okay with just mildly arrested, or mild, cryopreservation, is good enough in some cases? Or hasn’t there been much work in that direction?

Hunter: Not specific to antifreeze proteins, though. There are some applications where people have been trying to use things inspired by antifreeze proteins, mostly. But there is a whole stack of products that will hit the market that are these supercooled organ solutions. I think of this as an extension of hypothermia, where instead of going to four degrees, now you can go to minus four. There are a few different tricks people have used here, and all of them have their trade-offs. Similar to how Until will be bringing a vitrification product to market, I would imagine there will be some near-subzero storage products as well, with their own trade-offs. In the end, I think what will decide which becomes the dominant way of transporting organs is whichever gives patients the best outcomes.

Abhi: Yeah, that makes sense. Amongst the cryoprotectant agents that you guys are actively developing, how much work goes into improving those agents versus improving the next step, which is rewarming?

Hunter: Yeah. We view this as a very multifaceted problem, and one of the things that originally fascinated me about it is that it’s one of the few scientific problems I’ve seen that brings together applied physics, chemistry, transplantation, biology, all the way to hardcore electrical engineering and power electronics. And we work on all of these things simultaneously. I view them as relaxing each other’s constraints. If you have devices that are good at rewarming quickly, then my molecular agent doesn’t need to be quite as performant, and vice versa, right? If I have the killer molecular agent, then maybe I only need a very simple device. In the end, I think these constraints are pretty hard, so it’s going to be some combination of these solutions in the middle. But yeah, we work pretty intensively on both improving the cryoprotectant agents and improving the devices that do the cooling and rewarming.

[00:14:09] Until Lab’s 2024 progress report on neural tissue cryopreservation

Abhi: Okay. I’m going to wrap back to these questions about cryoprotectants and rewarming in a bit. But what I really wanted to talk about, what actually started this conversation to begin with, was the progress report you released in September 2024, alongside your Series A announcement. It described how you took slices of rat cerebellum, I think, vitrified and rewarmed them, and observed some level of electrical activity in the neurons there. This is obviously incredibly impressive work. But you did mention in the progress report that there is more to neural functionality than simply recovering neural activity. What else is there?

Hunter: Yeah. First of all, I think it’s worthwhile to go into why we did that as our first POC. When Lauren and I met and were talking about the idea of reversibly pausing biological function, eventually for a whole organism, we had the initial conversation of: what are the falsifiable hypotheses here? What are the experiments we could do that would prove this will not work? One of the first things we wanted to do was look at very delicate tissue, namely neural tissue. The easiest thing we could come up with was cerebellar slices. It’d be very easy to load them by diffusion; we didn’t need to do any perfusion. You could cool and rewarm them very quickly. And the cerebellum is known for having very periodic firing, so at least at the single-cell level, you would be able to tell whether you were getting action potentials that made sense. So we performed that experiment, and like you were saying, we saw the recovery of some electrical activity, action potentials, in the slice.

But yeah, there’s the question of what comes next. Actually, a very impressive piece of work along this axis came out of Alex German’s lab in Germany this year. He was able to show, using a very similar protocol, that you can recover long-term potentiation in these acute slices. One canonical example is the Schaffer collateral pathway in the hippocampus. There, a bundle of axons all synapse in a very similar location, and you look for potentiation of those synapses. This is like a bit flip for memory. What he was able to show was that you can recover LTP in these slices. So this is all really interesting, and I think it’s useful as a micro-circuit-level problem. But if you ask what it takes to preserve full neural function, that is a much more complex question than the micro-circuit inside a slice. Eventually you have to go to the brain as a whole and to the organism as a whole. My presumption is that this is a very deep challenge, and it will take quite a bit of time before we can make traction against it.

Abhi: Yeah. One pretty shocking thing I found after our initial conversation a while back is that the ultimate goal, or at least the short-term goal, of Until Labs right now is not whole brain preservation. Right now it is, I think you said, kidney preservation. Why switch to kidney? When I read the report, I thought, okay, this is one step toward whole brain cryopreservation.

Hunter: Yep.

Abhi: Why move to a different organ?

Hunter: Yeah. So I think the other thing that we have been interested in as the moonshot from the beginning is reversible cryopreservation of an entire patient. And this would involve allowing someone to pause their biological time as a whole. There are some really useful things about going through the process of doing this on an organ-by-organ basis. First one, this allows us to get therapies to patients in need in a very immediate way. You can deliver care to people who need it and use cryopreservation to do it. There’s also a natural scientific roadmap that’s built out of this, where you can start to learn technologies on isolated test beds where you can learn about how to preserve kidney, heart, lung, these kinds of things. And we view this as a natural foundational platform on which we can build towards our eventual goal of being able to do hibernation of an entire person.

Abhi: Why choose kidney over anything else?

Hunter: Yeah. So we’re actually pretty organ-agnostic.

Abhi: Okay.

Hunter: I think kidney is the one that’s commonly talked about, and I’m happy to discuss nephrology as a particular application. But one thing that’s nice about vitrification is that, because it leverages physics instead of going after very specific biomolecular pathways, it’s somewhat organ-agnostic. That is, I think, exceptional compared to some other chemical strategies for preservation.

Abhi: Yeah. Returning to the progress report, because I imagine a lot of the questions external people may have about cryopreservation are answered, or at least raised, by the report. In the report, you did cryopreservation across four rat samples. I think the primary results were from one rat, but there was discussion of the other three as well. How much heterogeneity was there in your ability to recover electrical activity across all four rats?

Hunter: Yeah. We were able to see electrical activity recovered from all four rats. There was a large degree of heterogeneity in the amount of electrical activity we recovered. The traces we placed in the report are, I think, representative of the group. But particularly in that iteration of the device, we were very early, and I definitely think our QC was not as dialed in. So there was quite a bit of heterogeneity in things like the cooling rates and rewarming rates coming out of this little cartridge we had.

Abhi: What would you imagine is the primary axis of variability? How much of it is just an experimental batch effect versus some rats perhaps being better at being cooled than others?

Hunter: I wish that I could tell you that it was down to the rats. I’m pretty sure at this point, at least particularly at the point that we were filing the report, I would chalk it up to experimental variability.

[00:20:28] How do you measure cryopreserved tissue damage?

Abhi: Okay. That makes sense. Returning back to the kidney cooling and rewarming moonshot, what is your metric for... you freeze a whole kidney, you rewarm it back up. How do you tell whether this kidney is good?

Hunter: That’s a great question, and this is something I didn’t appreciate until we started working on it: there’s a whole literature and an entire field of organ evaluation in isolation, outside of a body. This is normothermic machine perfusion. For the electronics people, it’s like a test bench for your organ. You can hook fluidic circuits up to it. Imagine a kidney has one inlet and two outlets.

It’s got its arterial inlet, and it’s got a ureter and a venous outlet. I can pump certain fluids into the arterial inlet and then measure the fluids coming out of both the ureter and the venous effluent side.

This allows you to measure things like the uptake of glucose in the organ, or the lactate concentration in what comes out of the venous side. There’s a whole host of these biomarkers that have been established by the transplant community, and I think one marker of good cryopreservation work, as you look through the literature, is how well it reproduces the metrics that are established and known to work as predictors of transplant outcomes. There’s a whole literature on that.

Abhi: Are these metrics for kidney damage established by the transplant community fully dialed in, as good as it gets? Or is there still room to improve?

Hunter: I would argue that there’s still room to improve. Improving the correlation of these NMP assays to transplant outcomes is still an active area of research. I think it’s established that it is possible to correlate these things, but no one would claim that we are done figuring out how to draw the best conclusions.

[00:22:34] Translation across species

Abhi: And right now are you still doing rodent kidneys, or have you moved on to other species?

Hunter: I think you need to move up to pig as the standard preclinical model, and that’s because of size. So many of these constraints have to do with matching the size of the organs you’re interested in.

Abhi: Maybe this is an unanswered question in the transplant field, but how much does what you learn from pig kidneys transfer to human kidneys?

Hunter: Yeah, so this is a classic question of the translatability of any of these assays. And I think that particularly for something like vitrification, we’re going to have to see and we’re going to have to be intelligent with our trial design. I think that there are ways to access things like human organs that would otherwise be discarded to check for some of these questions. But this is a standard translatability question and particularly because we’re going to be pressing vitrification through for the first time, I think that we’re going to get to learn the answer to that question.

Abhi: I’ll just ask the question: are you currently operating on slices of kidney, or are you doing the full kidney in each run?

Hunter: I think you need to be doing the full kidney.

Abhi: Gotcha.

Hunter: And the reason for this is that, although there are ways to chop this up and do these things on tissue, in the end vasculature is a really critical part of this process. This is probably something I should have gone through in more detail when I was describing the original protocol, but the only reason this is possible is the vasculature.

You need to be able to do mass transport of the cryoprotectant deep into tissue. The heat-diffusion problem I was complaining about earlier, because this kidney is 150 grams, applies in exactly the same way to loading the cryoprotectant if you didn’t use the vasculature. If I tried to load it from the outside, it would take forever. But we perfuse it through the vasculature. So I think a lot of the development work that needs doing is on how you efficiently transport cryoprotectant agents into a kidney or another organ.
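The diffusion scaling Hunter describes for heat also governs solute loading, t ≈ L²/D. The sketch below compares diffusing cryoprotectant in from the organ’s surface with diffusing it only the short distance out from the nearest capillary. The diffusivity (about 5 × 10⁻¹⁰ m²/s) and both distances are rough assumptions for illustration, and the estimate ignores membrane crossing and concentration-dependent permeability.

```python
# Sketch: diffusive loading time t ~ L**2 / D from the surface vs from a capillary.
# The diffusivity and distances are illustrative assumptions.

CPA_DIFFUSIVITY = 5e-10  # m^2/s, small-molecule cryoprotectant in tissue (assumed)


def loading_time(distance_m: float, diffusivity: float = CPA_DIFFUSIVITY) -> float:
    """Characteristic time (s) for solute to diffuse distance_m."""
    return distance_m ** 2 / diffusivity


from_surface = loading_time(1.5e-2)   # ~1.5 cm to the core of a kidney (assumed)
from_capillary = loading_time(50e-6)  # ~50 um to the nearest capillary (assumed)

print(f"from the surface: {from_surface / 3600:.0f} h")
print(f"from a capillary: {from_capillary:.0f} s")
```

Days from the outside versus seconds through the vascular tree is the gap that makes perfusion essential.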

Abhi: Where do most of the ideas in the room usually come from? Are they from the nephrologists? The electrical engineers? The physics people? What plays into pushing forward the kidney preservation goal?

Hunter: Yeah. The answer is all of them, and that’s the only reason we’re able to make any progress. There are examples of times where really great ideas have come from unexpected places. For example, Andrew Ted, who’s our head of applied physics, was previously running battery materials research at Tesla.

Abhi: Interesting.

Hunter: And he has a very interesting perspective on materials discovery, which is critical to what we’re working on. Then there’s input from Gerald Brander, who just joined as our Chief Medical Officer and has decades of experience in transplant. He obviously has his own lens on the way these things need doing inside of organs. So the magic I have seen always happens when these people get in a room, have conversations, and relax each other’s constraints. Another field can look like a black box, and you think, oh, this seems quite challenging or hard, because you may be overestimating how constrained that collaborator actually is.

We just get these people in a room and they have a conversation. I think that’s where you can unlock a lot of upgrades to protocol.

[00:26:04] Why was the cryopreservation storage time so short in the progress report?

Abhi: Returning to the report: you said you kept the neural tissue at minus 196 degrees Celsius for about a minute. I imagine you’re going to do the exact same for kidneys as well. Why not extend that out to a day, a month, a year? Why choose a minute? Was it just that you don’t expect anything to change after that point, or...

Hunter: Yeah, that’s a great question. For the tissue slices, it was simply the way we built the device. I literally cannot express to you how quickly this device was thrown together for this report, although it took a lot of optimization on the protocol side. We had this idea for a device and threw it together mostly with screening cryoprotectants in mind, not with the thought of, hey, we need to hit this milestone and report it out. But for the kidney, I imagine we’ll store it for longer. There’s a natural question at the core of what you’re getting at, though, which is why storage time does or doesn’t matter.

Abhi: Yeah.

Hunter: And I would contend that it actually doesn’t matter.

Abhi: Okay.

Hunter: You can store it down there to prove out the theory. But the reality is that, because water is 13 orders of magnitude more viscous in this vitrified state than at room temperature, there’s a massive time dilation. Think seconds at room temp versus millions of years in the vitrified state for equivalent diffusion distances. So on any reasonable timescale, days, months, years, it doesn’t really matter, because nothing is going to have moved in this glassy state.
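A back-of-the-envelope check of the “seconds to millions of years” figure, taking Hunter’s 13 orders of magnitude at face value. By Stokes-Einstein, D ∝ T/η, and the time to diffuse a fixed distance scales as 1/D. The sketch assumes storage at liquid-nitrogen temperature (77 K, i.e. minus 196) and treats the glass as an extremely viscous liquid, which is an extrapolation.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def equivalent_glass_seconds(room_seconds: float,
                             log10_viscosity_ratio: float = 13.0,
                             room_temp_k: float = 293.15,
                             glass_temp_k: float = 77.0) -> float:
    """Time in the glass needed to diffuse as far as in room_seconds at room temperature.

    D is proportional to T / viscosity (Stokes-Einstein), and the time to cover a
    fixed distance is proportional to 1 / D.
    """
    viscosity_ratio = 10.0 ** log10_viscosity_ratio
    return room_seconds * viscosity_ratio * (room_temp_k / glass_temp_k)


years = equivalent_glass_seconds(1.0) / SECONDS_PER_YEAR
print(f"1 s of diffusion at room temperature ~ {years:.1e} years in the glass")
```

Under these assumptions, one second at room temperature corresponds to roughly a million years in storage, consistent with the point that days or years in the glassy state are negligible.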

Abhi: Yeah. One instinctive thought I have is that the reason you may care about a day or a month is this: when you’re freezing something, you remove all of the blood from it and replace it with cryoprotectant. At least that’s my mental model of it. You might care about testing it for a day in case you left a little bit of blood in there, and that blood continues to accrue damage over time. Is that at all...

Hunter: I don’t think so. Let’s say I left some random red blood cells inside the vasculature. What’s going to happen? First of all, the water inside those red blood cells is still going to exchange, so they would still equilibrate with all the cryoprotectant you’re perfusing in. And let’s say there’s a water pocket in there. If I got any ice on the way down and induced damage as a result, that damage is now there.

Abhi: Okay.

Hunter: It’s not something that accumulates over a period of time, because even if I have nucleated ice, it’s not extending below minus 130. It really is the case that all the damage gets done in this minus 130 to zero range.

That definitely doesn’t mean you’re okay. If you have nucleated ice on the way down, there’s actually an annoying/interesting asymmetry in the physics: you tend to form more nuclei as you’re cooling, and then you tend to extend them while you’re warming. That extension is really what kills the tissue, because that’s where you start to tear through quite a lot of cells and tear things up.

Abhi: One question I initially had when we first had this conversation was, let’s say centuries from now, we’re all cryo-frozen on a ship going through the stars. How much do we need to worry about background radiation of the universe still affecting our genomes?

Hunter: Yeah. Vitrification can’t prevent the DNA from getting nicked by radiation. But by the time we have all the technology available for vitrification and deep space exploration, I would sincerely hope we could figure out how to line the ship with lead or something.

Abhi: Yeah. Yeah. Yeah. That makes sense. Returning to the question of which damage metrics you look for when you cryopreserve an organ: for the brain, it’s neural activity, it’s LTP. For kidneys, it’s this platform you just talked about. Do you think you’ll have to custom-make this for every single organ, or does pretty much every organ in the human body have some sort of well-established protocol for assessing damage?

Hunter: I think for at least the donor organs, which are the first ones we’ll be interested in, there are pretty canonical ways of evaluating their function. The things we have to build out bespoke are not so much for proving out a milestone of whether the organ is healthy or not. There we can always reference back to the transplant community.

Abhi: Yeah.

[00:30:47] Nuances of loading cryoprotectants into tissue

Hunter: Where we’re both working on our own and also looking to the cryobiology community is in how you assess things that are very specific to the cryo process. There are damage mechanisms specific to vitrification and cryoprotectant loading that you want to be looking for and optimizing against.

Abhi: What’s an example?

Hunter: An example here might be figuring out the time constant over which you can load the cryoprotectant. Obviously you would like to get the cryoprotectant in there very quickly. But you don’t want to put it in too quickly.

Abhi: Why not too quickly?

Hunter: Yeah, that’s a great question. Let’s do a deliberately simple thought experiment first. I have a cell in a test tube, by itself, just a cell. To start, it’s in a solution that is isotonic with the interior of the cell, so water is exchanging into the cell just as quickly as it’s exchanging out. A key thing about cell membranes is that they’re really good at exchanging water. There’s a protein in the membrane called aquaporin, and its entire job is to be a water-specific transporter: water diffuses freely through it, in and out. Now let’s say I take half the water out of the extracellular space and replace it with cryoprotectant.

Even very permeable cryoprotectants are still going to be 100x slower at crossing the cell membrane than water. So what invariably happens is that initially you have an efflux of water: water runs out of the cell. Then the cryoprotectant slowly gets into the cell.

So the cell immediately shrinks and then starts to re-expand. If it shrinks too much, it dies of osmotic shock. The same goes for unloading, which is actually an even stiffer constraint: if the cell is loaded with cryoprotectant and I add a bunch of water to the extracellular space, the cell inflates like a balloon and pops. So there’s a constraint on not loading or unloading too quickly. You can extend this very simple single-cell thought experiment to the first layer of the vasculature, the endothelial layer. That layer sees the maximum osmotic shock as you increase the cryoprotectant. So when you’re loading, you don’t just go from zero, an isotonic solution, to full-strength cryoprotectant. The protocol looks like a ramp, where you’re linearly increasing the cryoprotectant concentration as a function of time.
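The shrink-then-swell behavior can be sketched with the standard two-parameter membrane transport model (water plus one permeating solute). This is a normalized toy, not Until’s protocol: osmolalities are in multiples of isotonic, volumes are fractions of the isotonic cell volume, 30% of the cell is assumed osmotically inactive, the cryoprotectant’s own volume is ignored, the target concentration and ramp length are hypothetical, and the cryoprotectant rate constant is set 100x below water’s to mirror the ratio Hunter quotes. It compares a sudden step to full concentration with a linear ramp.

```python
from typing import Callable

INACTIVE = 0.3    # osmotically inactive volume fraction (assumed)
ISO_WATER = 0.7   # isotonic water volume
TARGET_CPA = 5.0  # external CPA, in multiples of isotonic osmolality (hypothetical)


def simulate(external_cpa: Callable[[float], float],
             t_end: float = 3000.0, dt: float = 0.01,
             k_water: float = 1.0, k_cpa: float = 0.01) -> tuple[float, float]:
    """Euler-integrate a normalized two-parameter model; return (min, final) cell volume."""
    water, cpa = ISO_WATER, 0.0
    salt = ISO_WATER  # non-permeating solute, so salt / water = 1 (isotonic)
    min_volume = INACTIVE + water
    t = 0.0
    while t < t_end:
        c_out = external_cpa(t)
        inside = (salt + cpa) / water            # intracellular osmolality
        outside = 1.0 + c_out                    # isotonic salt + external CPA
        d_water = k_water * (inside - outside)   # water moves toward higher osmolality
        d_cpa = k_cpa * (c_out - cpa / water)    # CPA moves down its own gradient
        water += d_water * dt
        cpa += d_cpa * dt
        t += dt
        min_volume = min(min_volume, INACTIVE + water)
    return min_volume, INACTIVE + water


def step(t: float) -> float:
    """Jump straight to full-strength cryoprotectant."""
    return TARGET_CPA


def ramp(t: float) -> float:
    """Increase cryoprotectant linearly over 500 time units, then hold."""
    return TARGET_CPA * min(t / 500.0, 1.0)


step_min, step_final = simulate(step)
ramp_min, ramp_final = simulate(ramp)
print(f"step: min volume {step_min:.2f}, final {step_final:.2f}")
print(f"ramp: min volume {ramp_min:.2f}, final {ramp_final:.2f}")
```

With these assumed parameters the step shrinks the cell to under half its isotonic volume, while the ramp keeps the dip much shallower; both end near isotonic volume once the cryoprotectant equilibrates. Lowering `k_cpa` mimics a cell type that the cryoprotectant crosses more slowly, which deepens the ramp’s dip.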

Abhi: Is there established theory as to what the slope of this line should look like or is it empirically determined?

Hunter: Yeah, you can calculate what it should be. The challenge is that you don’t necessarily know all the transport parameters you need. But I think it’s still useful to go through the first principles of how you would think about this. It comes down to the permeability of the cryoprotectant relative to that of water. And then you also need some transport model of how the cryoprotectant gets through the vasculature and perfuses along the flow path.

If you have these together, you can establish a rough heuristic for what it should look like. One thing that makes this really challenging is not having the permeability coefficients for every cryoprotectant in the formulation, for every cell type it’s going to see. That really ends up mattering. The other is that for some of these cryoprotectants, the permeability is actually a function of the cryoprotectant concentration, and then it’s just incredibly hard to model. So I think one of the interesting biophysics questions left in cryopreservation is good transport models for cryoprotectant into tissue.

Abhi: In practice, how much do you use models versus just determining it empirically?

Hunter: Yeah, I think you want to use both; you definitely don’t want to just work it out with pen and paper. The biologists in the company would definitely prefer that I used a whiteboard marker less than I do. But oftentimes you can make these things sing together. For example, we really prioritize making simple assays, not just to see how well we’re doing, but also to establish these metrics and build models, because otherwise you can’t answer questions like: how do I know what slope to load the organ at? We’re pretty committed to having a first-principles hypothesis built off some simple cellular model and some simple reduced experiment that can be done at high throughput. Then we make a best guess and adjust from there. But it always requires adjustment in the context of an organ, because it’s so complex.

Abhi: How much do different cell types matter? Are all human cells, let’s say, all happy with the exact same slope or are some cell types especially sensitive to osmotic pressure and they will almost always pop if you...

Hunter: Yeah. Talking like a physicist, I would say that to first order you can consider them all the same, and then there are corrections. One way of thinking about this is that aquaporin is the dominant way water gets into and out of all cells.

So that story that I told you, that’s true across all the cells.

Abhi: Yeah.

Hunter: But the specific permeability of a given cryoprotectant is going to be different for different cell types because, for example, cryoprotectants tend to hijack things like urea transporters to get into the cell, and the density of urea transporters differs between cell types. So the relative permeability of these cryoprotectants is going to differ across cell types.

Abhi: Yeah. What if you move beyond organs and start to consider something that isn’t an organ, like blood vessels, for example? There probably are blood vessel transplants.

Hunter: Yeah.

Abhi: But it doesn’t sound super trivial to just perfuse blood through and see what pops out the other side. It’s more of a morphology question. Are there good metrics there? And if not, do you generally have to create your own?

Hunter: Yeah. Fortunately, again, vessel cryopreservation is its own field.

Abhi: Oh, okay.

Hunter: It exists. Yeah. And things like H&E staining are really useful. This is also a place where you can lean on the medical field, look at established protocols for histology, and ask what the inner lumen is supposed to look like. There’s a bunch of these structural assays you have to lean on, because you’re right: you can’t do the same thing of asking whether the organ is functional. It’s just a pipe, so that’s not going to give you any information.

[00:37:03] Let’s discuss rewarming

Abhi: Yeah. We’ve spent a long time talking about the freezing process itself. Moving on to the rewarming side. What does that actually look like in practice?

Hunter: Yeah. First, let’s start with why you need to rewarm quickly. We already talked about this, but we want to outcompete the formation of ice on the way up. This is particularly challenging because very small ice nuclei that formed on the way down are going to extend really quickly during rewarming. So this put huge pressure on the field to create methods that would rewarm the tissue homogeneously and rapidly. You can envision a few different ways to rewarm something the size of a kidney. Maybe the most naive one is to just put it in a warm solution. Here again, we have the problem of a heat diffusion equation: everything tries to diffuse in from the outside, and it’s going to be way too slow. Maybe you’d be able to warm the surface really quickly, but the core is going to stay cold. Similarly, you could think, okay, throw it in a microwave. But there again, you get cold and hot spots. If you’ve ever tried to reheat something large in your microwave, the surface gets warm, but the core does not.

So one of the innovations that came out of the Bischof lab at the University of Minnesota was using biocompatible iron oxide nanoparticles perfused into the vasculature of the kidney. You fill it with metal and then put the entire kidney into an alternating magnetic field. This rewarms the kidney relatively homogeneously, much like an induction heater in your kitchen: you’re flipping the magnetic dipoles back and forth, and that generates heat throughout the organ. As it happens, this was actually the topic of the second half of my doctorate. Not for organ rewarming specifically; the application I was looking at was cancer hyperthermia. But it’s a full-circle moment for me that we’ve come back to this. So one thing you could use is alternating magnetic field heating. This is one thing we’re studying at Until, and we’re also studying some other methods for volumetric rewarming. But the canonical approach the field is settling on is alternating magnetic fields in combination with magnetic nanoparticles.
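To see why heating inside the tissue sidesteps the diffusion problem, here is a rough estimate of the bulk heating rate from nanoparticle heating: power per volume = SAR × iron concentration, and dT/dt = power per volume / (density × specific heat). The specific absorption rate (SAR), the iron dose, and the solution’s density and specific heat below are hypothetical placeholders, and the estimate ignores latent heat, heat losses, and the fact that the particles sit only in the vasculature.

```python
def heating_rate_c_per_min(sar_w_per_g_fe: float, fe_mg_per_ml: float,
                           density_g_per_ml: float, cp_j_per_g_k: float) -> float:
    """Adiabatic bulk heating rate (degrees C per minute) from nanoparticle heating."""
    power_w_per_ml = sar_w_per_g_fe * fe_mg_per_ml / 1000.0  # mg -> g of iron
    heat_capacity_j_per_ml_k = density_g_per_ml * cp_j_per_g_k
    return 60.0 * power_w_per_ml / heat_capacity_j_per_ml_k


# Hypothetical inputs, for illustration only.
rate = heating_rate_c_per_min(sar_w_per_g_fe=500.0, fe_mg_per_ml=5.0,
                              density_g_per_ml=1.1, cp_j_per_g_k=2.5)
minutes_through_danger_zone = 130.0 / rate

print(f"heating rate: {rate:.0f} C/min")
print(f"time from minus 130 to zero: {minutes_through_danger_zone:.1f} min")
```

Nothing in this formula depends on organ size: heat is generated wherever the particles are, rather than having to conduct in from the surface.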

Abhi: Just to set the order of operations: the cryoprotectant agent and the iron oxide nanoparticles are both given at the same time?

Hunter: So yes. Think of it as a ramp of cryoprotectant that has to go into the organ. During the last part of that ramp, you also dope the iron oxide nanoparticles into your solution. So it’s a colloidal suspension of iron oxide nanoparticles in cryoprotectant.

Abhi: Is there no interaction between the iron oxide and how the cryoprotectant actually works?

Hunter: That’s a great question. Iron oxide nanoparticles can be ice-nucleating agents if they’re not coated properly. So the surface chemistry here becomes really interesting, because you need particles that are colloidally stable in cryoprotectants, which tend to be very high in salt and kind of messy. Or, I should say, a high concentration of non-water substances. And you also need something that’s not going to nucleate ice: you don’t want to create surfaces that allow this nucleation process to occur.

Abhi: That makes sense.

You mentioned something earlier that I hadn’t naively thought of. As you freeze, I can vaguely understand water molecules interlocking with one another to create these ice crystals. But how come there’s also a chance for nucleation to happen during rewarming? It just feels...

Hunter: Yeah, it’s deeply counterintuitive, right? But you actually get more ice forming on rewarming. We can walk through why.

Abhi: Sure.

Hunter: Basically, between zero and minus 130, the chemical potential of ice, solid water, is lower than that of liquid water. That is true in that temperature range whether you’re warming through it or cooling through it. But there’s an asymmetry that we’ve been getting at, and we can go into a little more detail. The temperature at which you get maximum nucleation, meaning the formation of these tiny nuclei, is actually colder than the temperature at which nuclei extend at the maximum rate.

So what ends up happening is that on the way down, you first pass through the temperature zone of maximum extension, and then you get to the temperature of maximum nucleation.

So you go through extension, but there’s nothing to extend yet. And then you nucleate. On the way up, I nucleate a little bit more, and then I extend everything that I nucleated on the way down and on the way up. So because on the way up you pass through nucleation first and then through the maximum extension zone, which is relatively warmer, you produce the majority of the volume of ice on rewarming, assuming symmetric cooling and warming rates.

Abhi: Okay. So if you have no ice crystals in your solution and you rewarm, there’s no chance for further crystals to form?

Hunter: Oh no, you can. Sorry, just to clarify: the rate at which you’re warming or cooling is a separate question from whether, at a given absolute temperature, the equilibrium state of water is ice or liquid. Not at a given dT/dt, not at a given rate of change. Just think about the physics here: a water molecule at minus 80 degrees doesn’t know or care whether you are currently cooling it or warming it.

So at that temperature it is equally likely to nucleate or extend, whether it’s in the cooling or the rewarming phase. You can think of it as running a path integral over the entire temperature-time curve to get the amount of ice that is formed.
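The path-integral picture can be made concrete with a toy model. The sketch below assumes Gaussian-shaped nucleation and growth rate curves, peaking at a hypothetical -90 °C and -40 °C respectively so that nucleation peaks colder than growth, steps along a cooling leg from 0 to -130 °C and a warming leg back, and integrates both rates along the path. The rates depend only on the current temperature, never on whether the sample is cooling or warming. All parameters are invented and dimensionless; only the qualitative asymmetry is meant to carry over.

```python
import math

# Hypothetical, dimensionless parameters chosen only to illustrate the asymmetry.
T_NUC_PEAK_C = -90.0   # temperature of maximum nucleation rate
T_GROW_PEAK_C = -40.0  # temperature of maximum growth ("extension") rate
WIDTH_C = 15.0         # width of both rate curves


def nucleation_rate(t_c):
    return math.exp(-0.5 * ((t_c - T_NUC_PEAK_C) / WIDTH_C) ** 2)


def growth_rate(t_c):
    return math.exp(-0.5 * ((t_c - T_GROW_PEAK_C) / WIDTH_C) ** 2)


def ice_volume(rate_c_per_s=1.0, step_c=0.5):
    """Return (ice after cooling, ice after rewarming), in arbitrary units.

    Each temperature step spawns a cohort of nuclei; every existing cohort's
    radius grows at the current growth rate. Volume ~ sum(count * radius**3).
    """
    n = int(round(130.0 / step_c))
    cooling = [-i * step_c for i in range(n + 1)]  # 0 -> -130 C
    warming = cooling[::-1][1:]                     # -130 -> 0 C, no repeated point
    dt = step_c / rate_c_per_s
    cohorts = []  # each entry: [number of nuclei, radius]

    def advance(t_c):
        g = growth_rate(t_c) * dt
        for cohort in cohorts:
            cohort[1] += g
        cohorts.append([nucleation_rate(t_c) * dt, 0.0])

    def volume():
        return sum(count * radius ** 3 for count, radius in cohorts)

    for t_c in cooling:
        advance(t_c)
    after_cooling = volume()
    for t_c in warming:
        advance(t_c)
    return after_cooling, volume()


if __name__ == "__main__":
    for rate in (1.0, 10.0):
        cool, warm = ice_volume(rate)
        print(f"rate {rate:>4} C/s: ice after cooling {cool:.3g}, after rewarming {warm:.3g}")
```

With nucleation peaking colder than growth, almost no ice volume accumulates on the way down, while nuclei from both legs grow through the growth peak on the way up. Raising the rate shrinks both the number of nuclei and how far they grow, which is the motivation for rewarming fast.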

[00:43:01] What scientific problems in vitrification and rewarming keep you up at night?

Abhi: You’re combining what seem like two very difficult problems: building very good cryoprotectant agents, and building very good ways to rewarm a frozen tissue or organ. Which of those two axes do you consider hardest or easiest, and what problems in each keep you up at night?

Hunter: Yeah, both keep me up at night. One thing that is nice about the rewarming system is that we can bring to bear some pretty strong power electronics expertise that we’ve brought into the company to build out a few different technologies there. And we’re pretty committed to building out a wide technology platform, so we look at a few different mechanisms of action. On the molecular side, there’s some give and take, because the reality is that we, the field, have been using the same few dozen cryoprotectants since the 1950s. Glycerol and then DMSO were discovered early on, and a lot of people still use glycerol and DMSO.

Abhi: Yeah.

Hunter: There’s obviously a very large chemical space that you could search for cryoprotectants.

And I think that you can find things that work better, but the question is how much better and how much less toxic can you get? And so because these things relax each other’s constraints, I don’t think of one as necessarily being harder than the other because if I were able to get traction on it and make progress on it, then I would just demand more of it.

Abhi: Sure.

Hunter: Yeah. And that actually builds a really cool team environment, because for our applied physicists who are thinking about these molecular interactions, there is no “we’re done.” You just keep chasing

Abhi: Yeah.

Hunter: continuously better versions of this. For the people who work on our cooling systems, there is no “we’re done” with improving the homogeneity and rate of cooling. I think this is true of the field writ large. It’s why cryobiology is such an interdisciplinary, multidisciplinary field as a whole: everybody can find an angle through which their expertise can help.

Abhi: Yeah. On the development of better cryoprotectants: I think the one you used in the September 2024 report was a well-established formulation with one particular ingredient removed. I’m curious about the rationale. Was it just that you wanted to get the report out, and there wasn’t much work put into fiddling around with the cryoprotectants?

Hunter: Yeah. So the report actually came out before we had a molecular development team.

Abhi: Okay. Okay.

Hunter: So we now have a molecular development team that looks at novel cryoprotectants. But for the report, we were working off a list of things that had been used previously.

[00:45:58] Why are there so few cryoprotectants?

Abhi: When you say that only a handful of cryoprotectant agents have been developed since the 1950s, is that because finding better ones has been too difficult or more because it’s been pretty high-hanging fruit that people haven’t really been attracted to solving?

Hunter: You know, it is a little bit complicated for me to understand the social milieu that has driven this, because there actually has been quite good work on screening new cryoprotectant agents. I’m thinking of people like the Toner lab and the Higgins lab. There are labs that have been really focused on how to screen for better cryoprotectant agents.

Abhi: Yeah.

Hunter: And yet everybody’s still using ethylene glycol, propylene glycol, formamide, DMSO. There are various combinations of these, so you’ll see VS3, VMP, all these things thrown around in the literature. But fundamentally we’re just mixing around a few different agents, and occasionally a new agent or a new additive will come in. Some of them are quite toxic. But yeah, it is surprising to me, because there has been really high-quality academic work on screening these things.

And yet people are still using the same stuff. So it’s an interesting contradiction.

Abhi: Is it potentially because there aren’t that many groups focused on actively translating this to a full, actual organ?

Hunter: I think it’s a big lift to do this in whole organs. There are examples of people doing the full stack. So there’s great work coming out of a few different labs to try to scale this up. I think that John Bischof’s lab at UMN would be a good example of this. The Toner and Tessier labs at Harvard would be another good example.

Abhi: On the flip side, how do you improve the rewarming process? Is iron oxide... will we ever get better than that?

Hunter: Yeah. I mean, there are examples of... Or do you mean, is iron oxide it? Meaning, is magnetic warming it, or...

[00:48:11] How can you improve rewarming capabilities?

Abhi: Yeah. Is magnetic warming it? And even within magnetic warming, is the agent you use as optimized as it could be?

Hunter: No, I think you can continue to make better versions of this. John Bischof’s group has continued to work on improving the core compositions, and we have done our own work on improving core composition materials. There’s obviously a whole pre-existing field on how to couple magnetic fields really well with nanoparticles; this is not specific to cryobiology. Honestly, as with many questions in cryobiology, and this is probably true of any discipline in science, you need only look outside the narrow aperture of your field and you will find someone else who has solved a similar problem. This is a good example of that.

Abhi: I guess even outside of magnetic induction, is there any other upcoming or perhaps already well-established approach that you view as particularly promising?

Hunter: Something I was interested in that came out recently was the use of electric fields, actually radio-frequency fields in the tens of megahertz rather than microwaves, for rewarming a rabbit kidney. Greg Fahy led this study, and I just don’t see how this scales up.

Abhi: Okay.

Hunter: I think that maybe one day it will be doable, but it is quite challenging to think of how do you scale this from something that’s rabbit-sized up to something the scale of a human organ?

Abhi: Is that because, as you mentioned earlier, microwaving can introduce such large thermal gradients, or is it something else?

Hunter: Some of the physics here is a little bit subtle, but yeah, there’s a tendency for hotspots to form inside the organ when you’re using electric fields. The intuitive way to think about it is that in one case, I’m trying to couple directly to water, the thing that is everywhere inside the organ. So by definition, my energy is going to get attenuated a little more the deeper it tries to go. Whereas iron oxide is an exogenously introduced material with a very high coupling coefficient that is very localized to the particle itself. So there are some arguments that using it increases the penetration depth.
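One way to picture the penetration-depth part of this argument is a simple exponential (Beer-Lambert-style) decay of deposited power with depth. The sketch below assumes hypothetical power penetration depths and compares heating at the center of an organ to heating at its surface. It captures only attenuation, not hotspot formation, and ignores standing waves, dielectric changes with temperature, and real geometry. As a crude stand-in for the nanoparticle case, where the field is not meant to be absorbed by water and heat is generated at the particles, it also includes a very large penetration depth.

```python
import math


def relative_power_at_depth(depth_m, penetration_depth_m):
    """Fraction of surface power density deposited at a given depth,
    assuming simple exponential attenuation: P(x) = P0 * exp(-x / delta)."""
    return math.exp(-depth_m / penetration_depth_m)


if __name__ == "__main__":
    center_depth_m = 0.02  # hypothetical: center of an organ about 4 cm thick
    for delta_m in (0.01, 0.05, 1.0):  # hypothetical penetration depths
        frac = relative_power_at_depth(center_depth_m, delta_m)
        print(f"delta = {delta_m * 100:.0f} cm: center gets {frac:.0%} of surface power")
```

When the penetration depth is comparable to the organ’s size, the core heats far more slowly than the surface; when it is much larger, heating is close to uniform in this idealization.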

Abhi: For the magnetic stuff, what are the waves actually interacting with? If you replace all the water...

Hunter: What are the waves interacting with? With the magnetic fields? Yeah. Yeah. So they’re interacting with the nanoparticles.

Abhi: Oh, okay.

Okay. They’re still... they’re still using the iron oxide or something...

Hunter: In the... sorry. In the case of any magnetic field stimulation, you’re going to have to use iron oxide nanoparticles.

Abhi: Okay. Okay. Yeah.

Hunter: For electric field stimulation, you can directly excite water molecules themselves.

Abhi: Gotcha.

Abhi: Okay. One thing you’ve hinted at, or perhaps explicitly mentioned at one point: all of this inherently depends on the vascular system distributing everything nicely and equally. How does this work for tissues that aren’t as heavily vascularized? Even within an organ, I’m sure not every single bit of it is; there’s potentially fascia that doesn’t have much vasculature attached to it. How much work goes into thinking about that? Is that a long-term thing that you don’t need to worry about right now, or is it on your mind?

Hunter: Yeah, I mean, it is on the mind insofar as we are continuously thinking about the entire scientific program and not just what is right in front of us, but it is not right in front of us. There are areas of the kidney that are less vascularized than others, but all of them are sufficiently vascularized that, with some intelligent protocol development, you can get the cryoprotectant where it needs to go, and get the heating power where it needs to go. But yeah, you’re right that vascular density is not homogeneous even inside a given organ.

Abhi: You’ve mentioned a few metrics you use for measuring the function of the organ post-warming. Are there metrics to establish how well the perfusion process works in the first place?

Hunter: Yeah, there are a few different ways of doing this. One is that I can just look at the organ when I cool the entire thing down and vitrify it. This is a midpoint assay. You can actually use micro-CT to figure out whether there are ice nuclei somewhere inside the organ.

Abhi: That’s interesting.

Hunter: Because you can look for the crystals themselves. There are also some crazy applications of MRI where you can look for chemical exchange with your cryoprotectant agent inside the scanner, and that can give you an idea of the local concentration of a given molecule. Actually, Alex German, the guy I referenced previously, was a radiologist before becoming a cryobiologist. He published a really cool paper on using something called CEST (chemical exchange saturation transfer) imaging to measure the concentration of given cryoprotectant species deep in extended tissue.

[00:53:03] What are the experimental costs of running cryopreservation studies?

Abhi: Okay. One question I had while reading the September 2024 report: this experiment was done on four mice, which is a relatively low N. And that made me think, how expensive was this whole process? How many shots on goal do you have at Until Labs? Do you all sit in a room, think very hard, and then run a million-dollar experiment? Or are there a bunch of intermediate assays you can do as sanity checks?

Hunter: Yeah. This is a great question. The mice are definitely less expensive than the larger preclinical models. But the way I like to think about this is as a funnel. At the top of the funnel you have in silico models. You want to do as well as you can with in silico models because they are very cheap to run.

Abhi: Sure.

Hunter: Compared to everything that’s going to come downstream in this pipeline. Below the in silico models, you have things that you can do in a test tube, things that don’t require any interaction with biology. A canonical example would be differential scanning calorimetry. Here you can take a cryoprotectant in water, put it in a tiny sample, and cool it down to minus 130 degrees. Then you look for an exothermic peak, which is heat flowing out of your sample during the cooling or rewarming process. If you see this exothermic peak, that’s indicative of ice formation, because the liquid-to-solid transition in that temperature range is exothermic. So you’ll be able to tell, oh, I got ice in this little solution, and the solution still hasn’t seen cells yet.

You can then imagine screening many such compounds in cultured cells.
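A DSC trace is heat flow versus temperature, so the “look for an exothermic peak” step can be sketched as baseline subtraction plus peak finding. This is a minimal illustration on synthetic data, assuming an exo-up sign convention and a straight-line baseline; real instrument software, sign conventions, and baseline methods vary, and the threshold is an arbitrary placeholder.

```python
import math
from statistics import mean


def find_exotherms(temps_c, heat_flow_mw, threshold_mw=0.05, edge=10):
    """Return temperatures of exothermic peaks in a DSC trace.

    Assumes an "exo up" sign convention (heat released by the sample is
    positive). A straight baseline is drawn between the mean of the first
    and last `edge` points; a peak is a local maximum whose height above
    that baseline exceeds threshold_mw.
    """
    n = len(heat_flow_mw)
    y_start = mean(heat_flow_mw[:edge])
    y_end = mean(heat_flow_mw[-edge:])
    excess = [y - (y_start + (y_end - y_start) * i / (n - 1))
              for i, y in enumerate(heat_flow_mw)]
    return [temps_c[i] for i in range(1, n - 1)
            if excess[i] > threshold_mw
            and excess[i] >= excess[i - 1]
            and excess[i] > excess[i + 1]]


if __name__ == "__main__":
    # Synthetic cooling scan from -20 to -130 C; all values are made up.
    temps = [-20.0 - 0.5 * i for i in range(221)]
    no_ice = [0.1 for _ in temps]
    with_ice = [0.1 + 0.5 * math.exp(-0.5 * ((t + 60.0) / 2.0) ** 2) for t in temps]
    print("no ice:", find_exotherms(temps, no_ice))
    print("ice:   ", find_exotherms(temps, with_ice))
```

A formulation whose trace returns no peaks through both the cooling and warming scans is a candidate to move down the funnel to cell assays.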

Abhi: Yeah.

Hunter: And that gives you some idea of toxicity. And then from there, and this is where a lot of our work is done, the question is: what is the gap? How do you bridge between cells and going all the way to an organ? Because organ experiments, like you were saying, are expensive in terms of capital, but mostly they’re very time-expensive. These are very long protocols. I’m incredibly proud of our team for their ability to work through some of these arduous protocols, but you don’t want to waste that.

Abhi: Yeah.

Hunter: So I think that there’s a lot of focus on how can we move stuff up that funnel to more translational assays.

Abhi: To give some sense of how expensive this protocol is from a time perspective, what did it take to get these rat cerebellums frozen and rewarmed?

Hunter: Yeah. So the rat stuff, the rat acute slices, that’s not so bad.

Abhi: Okay. Okay.

Hunter: That’s not so bad. The challenge is when you want to do a preclinical model of an organ. That obviously requires actually getting the organ out of the animal. Then you have to flush the blood out of the organ, load it with cryoprotectant, do the vitrification process, do the rewarming process, unload the cryoprotectant, and then do the assay.

And the assays that evaluate the function of the organ take several hours themselves. So it’s just a very long process if you want to evaluate an entire organ.

Abhi: That makes sense. One question that immediately popped up in my head: I imagine you want to rewarm uniformly to prevent thermal gradients and mechanical stress. Is there any world in which you could inject a molecule alongside the cryoprotectants to help with that? Are there such things as cryoprotectants that help prevent mechanical stress?

Hunter: Yeah, mechanical stress is one that would be interesting to look into. I haven’t looked at it specifically, but what we have looked at is a set of molecules that don’t necessarily block ice but help alleviate any damage that is induced.

Abhi: Yeah.

Hunter: And we’re definitely not the only ones looking at this. The field as a whole is looking for additive molecules that can help with things like the shock induced by going down and coming back up again. There are also some cryoprotectant agents whose mechanism of action is strictly what I would call biomechanical: instead of blocking ice, they insert into the biological membrane and strengthen it. So you can actually have these polymer molecules that Medevelop developed that will sit in the membrane and strengthen it. This obviously works much better in small-cell or embryo contexts. It’s pretty hard to get that to work at the scale of an organ, but it is an active area of research.

Abhi: Why is it difficult to get to work at the scale of an organ? Just...

Hunter: Loading things that are that large homogeneously outside of the vasculature becomes challenging. The transport here is one of the key limiting factors for any of these approaches.

[00:57:49] What happens to the cryoprotectants and iron oxide nanoparticles after the organ has been thawed?

Abhi: When we first talked, one of the most interesting questions I had, and one of the most interesting answers you gave, was about what happens after you fill this organ with cryoprotectant and iron oxide nanoparticles and rewarm it. How does the body deal with all the stuff that’s left over?

Hunter: Yeah. Yeah. So you really don’t want to leave anything behind at 37.

Abhi: Yeah, for sure.

Hunter: And so a lot of the work that we do is making sure that you’re able to effectively clear all of these cryoprotectants. There’s a beautiful symmetry here that is sometimes misunderstood. When you go down to around four degrees, which is where we load the cryoprotectant into the organ, metabolism is obviously much slower.

Abhi: Yeah.

Hunter: And as a result, the toxicity of any of these molecules is substantially reduced, as long as you’re loading down at four degrees. We load between four and 10. So you hold there and load your cryoprotectant up. Similarly, when you want to go back and unload, there’s often this misconception of, oh, I need to add a chelator or something to take the cryoprotectant out. But you can unload the cryoprotectant the exact same way that you loaded it,

Abhi: Just flush...

Hunter: by slowly reducing the concentration of cryoprotectant in what’s being flowed through the vasculature. Then passive diffusion starts to take it out.

But yeah, critically, we don’t want to leave cryoprotectant behind and then warm the organ back up to 37 degrees Celsius, because then we eat all of the toxicity that comes from being warm and loaded with cryoprotectant.
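The load/unload symmetry can be written down as a schedule. This is a hypothetical sketch: the step count, hold time, relative concentration levels, and 4 °C temperature are placeholders (not Until’s protocol), with nanoparticles added only in the final loading step as described earlier. Unloading is the loading ramp run in reverse, ending at zero cryoprotectant while still cold, before any rewarming to 37 °C.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    cpa_fraction: float     # cryoprotectant concentration relative to full strength
    nanoparticles: bool     # whether iron oxide nanoparticles are in this perfusate
    temp_c: float = 4.0     # perfusion temperature (hypothetical; 4-10 C in the interview)
    hold_min: float = 10.0  # hold time per step (hypothetical)


def loading_ramp(n_steps=5):
    """Stepwise increase to full-strength cryoprotectant; nanoparticles are
    doped in only at the last step (hypothetical step count and hold time)."""
    return [Step(cpa_fraction=(i + 1) / n_steps, nanoparticles=(i == n_steps - 1))
            for i in range(n_steps)]


def unloading_ramp(load):
    """Mirror of the loading ramp: step the concentration back down to zero,
    without nanoparticles, while still cold."""
    levels = [s.cpa_fraction for s in load[:-1]][::-1] + [0.0]
    return [Step(cpa_fraction=c, nanoparticles=False) for c in levels]


if __name__ == "__main__":
    load = loading_ramp()
    for label, steps in (("load", load), ("unload", unloading_ramp(load))):
        print(label, [(s.cpa_fraction, s.nanoparticles) for s in steps])
```

Stepping gradually in both directions limits how much any single change in concentration forces water and cryoprotectant across cell membranes at once; no chelator or separate clearance chemistry appears anywhere in the schedule.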

Abhi: Yeah.

Hunter: Similarly on the nanoparticle side, I think a lot of work in the field has been focused on passivating the surface of the nanoparticles to make sure you’re not leaving iron oxide behind in the kidney.

Abhi: You’ve mentioned a few times that you don’t want your cryoprotectant agent to be too toxic, but why does toxicity actually matter if it’s so cold that no reactions are occurring?

Hunter: Oh, interesting. Yeah. So the thing is that some reactions are occurring at four.

Abhi: Okay.

Hunter: That’s the trick. Gotcha. Yeah. It doesn’t matter if it’s toxic once you’re down at minus 130.

Abhi: Okay.

Hunter: For sure. And there have been some strategies proposed where you can use very toxic cryoprotectant agents that are loaded even colder. One example is M22, a formulation that must be loaded at minus 22 degrees, which is where it got its name. Very effective cryoprotectant, also very toxic.

Abhi: What does toxicity usually... yeah. What does toxicity usually manifest as?

Hunter: Yeah, there’s a variety of different... When you say manifest as, do you mean in a cultured cell context, or...

Abhi: In full organ context, or I guess has there ever been a particularly toxic cryoprotectant used in organs?

Hunter: I think one tends to not use a really toxic cryoprotectant in whole organs. Obviously, the outcomes here are going to be different based on different organs.

Abhi: Yeah. That makes sense.

Moving on, perhaps relatedly: there was this mouse kidney transplant study that you discussed earlier. How did that go? It was all autologous: they took the kidney out, froze it, rewarmed it, and put it back in. Was the mouse just perfectly fine, or... I think, was it a rabbit or a mouse?

Hunter: It was a rat.

Abhi: Rat. Okay. A rat. Okay. Did the rat live?

Hunter: Yes, the rat in fact lived. It was a cool study. They showed that you can actually regain normal function in the kidney. If you look at the recovery time, it took a matter of weeks for the organ to fully recover normal function. But they were able to get a viable kidney through the cryopreservation and rewarming process.

[01:01:33] Cryopreservation and immune response

Abhi: Gotcha.

And I remember you mentioned at some point that there was a concern about immune epitope exposure as a result of the freezing and rewarming process. Would you be able to walk through that again? I thought that was a very interesting anecdote.

Hunter: Yeah. Anytime you induce cell lysis in tissue, you can invoke an immune response. There have actually been some examples of this in human pediatric cases, where heart tissue is taken out of a person and later goes back into the same person. So obviously there’s no self-versus-other response. But the strategy you use for cryopreservation affects the outcomes of these very small tissue extracts and their reimplantation. One hypothesis for the mechanism of action is that because you get some cellular lysis during the cryopreservation process, you expose a bunch of intracellular molecules that then induce an immune interaction. That can cause rejection of the graft, even though it’s an autologous transplant.

Abhi: I imagine there haven’t been enough cryopreservation transplant studies for it to be clear when this will and won’t happen.

Hunter: Yeah. I think that this is such a nascent field that I wouldn’t say... The thing that is not nascent is embryo cryopreservation.

Abhi: Okay.

Hunter: There, I think we really have a large N. This year, I think there will be something like 150,000 live births in the US alone from embryos that were previously vitrified, which is a wild number to me. And they can be stored for 30 years, which is crazy to me.

[01:03:24] How do you filter through the cryopreservation literature?

Abhi: Cryopreservation is a nascent field, but it does seem to be ramping up in recent years, and there’s probably now a deluge of cryopreservation papers popping up all the time. I’m curious, what is your own filter when you’re reading through these papers? What marks a paper as particularly high quality and worth close attention, versus something you can probably just skim?

Hunter: Yeah. There are fortunately a whole host of groups doing things that would meet this bar. But the bar I’m particularly interested in, for work that’s going to translate into an organ, is this question: are the assays done to demonstrate the viability of this organ in line with what has been established by the field they’re trying to interact with? For example, if they’re doing kidneys, are they doing standard assays that would be accepted in nephrology to evaluate the viability of the organ? And if you go all the way down to molecular studies, the ones I’m particularly interested in don’t do random scattershot screening, but instead really get into the mechanism of the interaction between the cryoprotectant and the water molecule. Water is an under-hyped molecule. It’s actually incredibly subtle and complicated to interact with. I’m biased, but it is currently becoming my favorite molecule. And you can write really deep interrogations of the interactions of these cryoprotectant agents with water molecules, which can be deeply informative for follow-on studies.

Abhi: And sorry, are these papers specifically trying to create new cryoprotectants? Or, even in the absence of novel cryoprotectant work, do you still want to know how the cryoprotectant interacts with water?

Hunter: I learn stuff even when people just look at DMSO interacting with water. People have done great work studying the Raman spectroscopy of water and DMSO interacting, to try to figure out the actual coordination: how is this hydrogen bond actually coordinating? Sometimes these things are academic and can’t be translated into better engineering, but oftentimes they lay better foundations for our mental models of how to come up with better cryoprotectants, and we can definitely come up with more interesting metrics to calculate in silico. That’s where a lot of these physical chemistry studies become really helpful: how can I think about simulating that? How can I think about driving this onto a chip?

[01:05:54] How much is molecular simulation used at Until Labs?

Abhi: On the topic of in silico simulation, how much molecular dynamics, molecular simulation goes on internally at Until?

Hunter: Yeah. We think a lot about how to do in silico screening well, and there’s a variety of ways you can do this. One that is quite common is molecular dynamics simulations, where you look at the extension rate of an ice nucleus placed inside some solution. These are coexistence simulations. The way to think about it is: you set up a box with hexagonal ice in one half and some mixture of cryoprotectant and water in the other half. Then you look at whether the ice tends to extend or tends to melt.

This is a good way of measuring the equilibrium between the liquid and solid state for a given cryoprotectant mixture, and we’ve found it somewhat useful to look at in terms of ice formation.
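A common way to read out such direct-coexistence runs is to track the fraction of the box that is ice over time. A positive slope means the ice front advances, a negative one means it melts, and the temperature where the slope changes sign estimates the coexistence (melting) temperature for that mixture. The sketch below shows only that analysis step, on synthetic ice-fraction traces with invented numbers; it assumes you already have a per-frame ice fraction from some order-parameter classifier, which is not shown.

```python
def slope(xs, ys):
    """Ordinary least-squares slope of ys against xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den


def coexistence_temperature(temps_k, growth_slopes):
    """Linearly interpolate the temperature where the ice-fraction slope crosses zero.
    Assumes temps_k is sorted ascending and the slope changes sign once."""
    pairs = list(zip(temps_k, growth_slopes))
    for (t0, s0), (t1, s1) in zip(pairs, pairs[1:]):
        if s0 > 0 >= s1:
            return t0 + (0.0 - s0) * (t1 - t0) / (s1 - s0)
    raise ValueError("no growth-to-melting crossover in the scanned range")


if __name__ == "__main__":
    times_ns = [float(t) for t in range(10)]
    temps_k = [250.0, 260.0, 270.0, 280.0]
    # Synthetic traces: ice fraction drifts up below ~265 K and down above it.
    traces = {T: [0.5 + 0.001 * (265.0 - T) * t for t in times_ns] for T in temps_k}
    slopes = [slope(times_ns, traces[T]) for T in temps_k]
    print("slopes per ns:", [round(s, 4) for s in slopes])
    print("estimated coexistence temperature:",
          round(coexistence_temperature(temps_k, slopes), 2), "K")
```

Comparing that crossover temperature across cryoprotectant mixtures gives one cheap, nucleation-free number to rank candidates by, with exactly the limitation discussed next.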

Abhi: I’m obviously not at all a molecular simulation expert. My impression was that phase changes, especially of water, are really gnarly and not very well modeled in molecular simulation. Do you see that? And even if you do, do you still think these simulations are pretty predictive of what happens in real life?

Hunter: Yeah. I think one nice thing about trying to be an engineer here and not trying to be a scientist is that you just want to find things that correlate.

Abhi: Okay.

Hunter: And we have the real thing in the lab to compare against. What you’re getting at is true, and it’s a real problem for the field as a whole: it’s quite hard to simulate actual ice nucleation. The thing I just pitched you is super contrived, right? I literally just set up a boundary of ice and look at what’s interacting with it. This obviously has some deep limitations in translating to the actual test tube I’m doing the experiment in, because I haven’t allowed for nucleation. Nucleation has been taken completely out of the equation.

And that’s the whole thing you’re trying to get at: how to suppress nucleation. The reason nucleation is hard to simulate is that it’s too fast for us, but too slow for molecular dynamics. If your time steps are one femtosecond and you need to simulate for a nanosecond or something to see nucleation, for every cryoprotectant you want to interrogate, this is not tractable. So a challenging question to get around is: what is the right thing to simulate? That’s what our applied physics group spends a lot of time thinking about: what are the right things to simulate to have some predictive power around the efficacy of a given cryoprotectant?
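The tractability problem is largely arithmetic. With a 1 fs time step, the number of integration steps scales with trajectory length, and wall-clock cost scales with how many nanoseconds per day the hardware achieves for a given system size. The sketch below uses a hypothetical throughput of 100 ns/day and compares a few trajectory lengths. How long you actually need to run to see spontaneous nucleation depends strongly on system size, water model, and degree of supercooling, and nucleation is stochastic, so repeated replicas are usually needed; the `replicas` parameter is included for that reason.

```python
FS_PER_NS = 1_000_000


def md_steps(trajectory_ns, timestep_fs=1.0):
    """Number of integration steps for a trajectory of the given length."""
    return round(trajectory_ns * FS_PER_NS / timestep_fs)


def screening_days(n_candidates, trajectory_ns, ns_per_day, replicas=1):
    """Wall-clock days to run every candidate, one trajectory at a time.
    ns_per_day is a hypothetical throughput for a single simulation."""
    return n_candidates * replicas * trajectory_ns / ns_per_day


if __name__ == "__main__":
    for length_ns in (1.0, 100.0, 1000.0):
        steps = md_steps(length_ns)
        days = screening_days(n_candidates=100, trajectory_ns=length_ns, ns_per_day=100.0)
        print(f"{length_ns:>6.0f} ns: {steps:,} steps per run, "
              f"{days:,.0f} days for 100 candidates")
```

Every extra factor of ten in required trajectory length, or in replicas per candidate, multiplies the screening cost by ten, which is why a cheaper proxy such as the coexistence setup is attractive even though it removes nucleation.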

Abhi: How much do you personally or perhaps anyone in the company pay attention to the neural network potential research that’s coming out?

Hunter: Yeah, I think this is actually really helpful. Some versions of this are more useful than others, as is always the case. Some of the neural network potentials are still very costly to simulate compared to classical force fields, so you want to pick and choose which ones you use and in what context. At least in our hands, there is no skeleton key where you run this exact simulation and it works. It’s more of an intelligent ensemble of simulations to try to get some interpretation. But yeah, I think the work coming out of ML-related tools for better simulating interactions is going to affect cryopreservation, just like it affects drug discovery. This is fundamentally improving our ability to model the physical world in silico. I think that is great, and I would encourage those academics to continue to push hard.

Abhi: Maybe a naive question: how much does QM matter for these sorts of simulations? Or is molecular mechanics fine?

Hunter: Yeah. There are some things you really care about quantum mechanics for, and this is where neural network-based potentials can be helpful. The actual hydrogen bonding of these molecules to water involves effects that are not well simulated with a standard Lennard-Jones potential alone. So there, I think it is useful to think through more complex interactions than a simple Lennard-Jones system.
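For reference, the 12-6 Lennard-Jones potential between two sites at distance r is V(r) = 4ε[(σ/r)^12 − (σ/r)^6]. It depends only on r, with no angular term and no electronic polarization, which is one way to see why directional hydrogen bonds need more than Lennard-Jones alone: classical water models add fixed point charges, and neural network potentials trained on quantum-mechanical data aim to capture more. The sketch uses generic reduced units (ε = σ = 1), not any specific force field.

```python
def lennard_jones(r, epsilon=1.0, sigma=1.0):
    """12-6 Lennard-Jones pair energy; depends on distance only (isotropic)."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def lennard_jones_minimum(epsilon=1.0, sigma=1.0):
    """Location and depth of the well: r_min = 2**(1/6) * sigma, V(r_min) = -epsilon."""
    r_min = 2.0 ** (1.0 / 6.0) * sigma
    return r_min, lennard_jones(r_min, epsilon, sigma)


if __name__ == "__main__":
    r_min, v_min = lennard_jones_minimum()
    print(f"r_min = {r_min:.4f} sigma, V(r_min) = {v_min:.4f} epsilon")
```

Because the energy is the same in every direction at a given distance, nothing in this function can prefer the roughly linear donor-hydrogen-acceptor geometry of a hydrogen bond.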

[01:10:04] What are the (expected) economics of Until Labs?

Abhi: Okay. Yeah.

That makes sense. Moving on to almost non-scientific questions about Until. One immediate question I had upon learning about you guys, when you were named Cradle, was: what are the economics of this setup? My mental model of organ transplantation is that there is an organ waiting list. When your name gets called, you go and get your organ. There’s no direct-to-consumer setup. Who is buying these organs?

Hunter: Yeah. So the transplant pipeline is very complex, as I have been learning. And the thing is, there is no direct-to-consumer. I can’t call up and say, “Hey, I need a kidney.” That’s not how this works. As you were mentioning, there is a transplant list. There are the organ procurement organizations, which are responsible for facilitating the transaction: the organ comes out of this donor and goes into this recipient. And there are a host of companies that specifically handle the logistics of transporting the organ viably from one place to another. A really big player in this space would be TransMedics, a publicly traded company, who literally... they have a suite of private jets that will fly around, pick up the organ from the donor, and bring it to the recipient. This is all very heavily coordinated and the logistics are certainly not trivial.

Abhi: We were just talking about this before we started this conversation. Right now, putting an organ on ice is incredibly cheap. Perfusing with oxygen is incredibly cheap. What’s the value proposition of going for something like Until Labs, a potentially much more expensive protocol? And you mentioned that you’d need to rely less heavily on this incredibly expensive transport chain.

Hunter: Yeah. I think that one of the nice things about doing something like organ vitrification is that because it takes urgency out of the process, it just relaxes all of these logistical constraints. So for example, I don’t need to have a private jet to go get the organ anymore. I also don’t need to wake up a transplant surgeon at 2:00 AM because that’s when the organ became available. Yeah. We now have in our vision a process that can be much more, let’s say, disciplined about bringing the organ from the donor to the recipient. And this has a bunch of knock-on effects.

So one, for example, is that it could allow more testing. If you look at the outcomes for living versus dead donors for kidneys, specifically 10-year graft survival rates, the 10-year graft survival rate in the US for a kidney recipient from a dead donor is around 50%. From a living donor, so this would be if you get it from your brother or something... Sure. It’s about 60%. So literally just the increase in immune matching from getting it from a living donor... and there may also be some other logistical factors there, since you can literally do it on the table next to the person. But there’s that much benefit to get just from improved matching in the biological sense.

Look at the reference I made previously about in vitro fertilization of embryos: you now have a higher chance of getting a live birth from a cryopreserved embryo than from a freshly implanted embryo. And the reason for this is, again, increased ability for testing. So we think, one, the cryopreservation process can lead to better outcomes for patients because of the time we have bought to improve matching. We think it can improve the equity of organ allocation by allowing us to respect the transplant list more and have fewer open calls where the organ just needs to go to someone because we don’t want it to get wasted. And then, yeah, no private jets required, because we can get them there. No surgeons woken up in the middle of the night.

Abhi: When it comes to what the supply chain would look like if Until Labs ruled the entire system, how careful do you need to be with a vitrified organ? Can I put it in a truck and just have the truck go? Or does it need to be in a very specific, very special container?

Hunter: So I think we would manufacture a container that was sufficient for it to go on a truck. The significant things are that you don’t want thermal gradients to be able to come in, and you never want to thermally cycle the organ above minus 130 degrees Celsius. There’s some stuff around tight temperature control.

But I think that these are all highly solvable problems. And I think the marginal cost of doing this on a program basis is pretty trivial.
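The two constraints Hunter names, staying below -130 °C and avoiding thermal gradients, are the kind of thing a shipping container’s temperature log would be checked against. The sketch below is a hypothetical illustration only: `excursions` and `gradient_c` are made-up names, the readings are invented, and since the interview gives no gradient tolerance, the code reports the sensor spread rather than judging it.

```python
from typing import Sequence

# Threshold quoted in the interview: never let the organ warm above -130 degrees C.
GLASS_CEILING_C = -130.0


def excursions(readings_c: Sequence[float], ceiling_c: float = GLASS_CEILING_C) -> list[int]:
    """Indices of logged temperatures (deg C) that rose above the ceiling."""
    return [i for i, t in enumerate(readings_c) if t > ceiling_c]


def gradient_c(snapshot_c: Sequence[float]) -> float:
    """Spread between the warmest and coldest sensors at a single instant."""
    return max(snapshot_c) - min(snapshot_c)


# Invented readings for illustration; not real shipping data.
log_c = [-150.2, -149.8, -131.0, -128.5, -145.0]
print("samples above ceiling:", excursions(log_c))
print("sensor spread:", gradient_c([-151.0, -147.5]))
```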

[01:14:49] How much does cryopreservation practically solve the organ shortage problem?

Abhi: Yeah.

And again, in this hypothetical where Until Labs is everywhere, how much... is the organ shortage problem solved overnight?

Hunter: No. Okay. Unfortunately not. I wish that were the case. But in the end, this is still a supply-limited market.

Abhi: What gets better? Let’s say 10% of people who need... 10% of names are crossed off every year from the organ transplant list and everyone else dies. What does that number rise up to, at least for kidneys?

Hunter: Yeah. So I think what’s going to end up happening is that initially you’re going to have, let’s say, a few thousand organs that should be going into patients, that would be viable, but that get lost due to logistics. It’s like the organ is on the plane, the plane needs to get de-iced, and the organ expires while it’s on the plane.

Abhi: Does that happen?

Hunter: Yeah.

Actually, Laura, my co-founder, was literally talking to a transplant surgeon the other day who was recounting this exact story. I’m not making this up. This is an actual thing.

Abhi: And this is not a particularly rare incident?

Hunter: I think this particular annoyance of plane de-icing maybe is rare. But the idea is that you can get a few thousand additional organs if you reduce logistical constraints. And these are publicly available figures.

Abhi: Yeah.

Hunter: The thing that is an interesting unlock in the long term is if you can start to relax the supply constraint.

Things like being able to get organs from... there’s a concept of DCD versus DBD donors: donation after circulatory death versus donation after brain death. A DCD donor is one whose death is declared after the heart and circulation stop.

These are, let’s say, more logistically challenging to get out. The time constraints are tighter.

Abhi: And that’s just because ischemia happens immediately...

Hunter: Okay. Exactly. It’s an ischemia issue. The clock starts earlier. As a result, that is a nascent field where people are trying to push into these DCD organs to increase organ availability. The other long-term direction I’m particularly excited about is xenotransplantation.

[01:17:04] Synergy between xenotransplantation and cryopreservation

Abhi: I was going to ask about that.

If we had sufficiently good xenotransplantation, do we need cryopreservation?

Hunter: I think sufficiently good xenotransplantation mandates cryopreservation, but that’s my perspective.

Abhi: What’s the rationale there?

Hunter: I want to envision a future for xeno that is maybe more aggressive than has previously been talked about. I want to envision a world where someone who has a heart attack, and maybe would not even have time to get a heart transplant, can now get a heart transplant. Where you go into an ER and there is a liquid nitrogen dewar sitting there, filled with hearts that are ready to be transplanted. All of a sudden, organ donation is no longer something only for chronic conditions. It is now also something for acute conditions. And the thing that would dramatically increase the availability of organs is if xenotransplantation got solved. I think there is some reasonable, let’s say, skepticism about the timetable on which xenotransplantation will come online. But yeah, let’s just say I’m cautiously optimistic that those guys will make progress. And I think cryopreservation would be a natural fit for their logistics supply chain.

Abhi: I think bringing acute conditions onto the table is really fascinating. I had never really thought about it that way. What do you think of the xeno... I’m not super familiar with it. I just know there was that pig heart that was CRISPRed to be a little bit more humanized. It was implanted into a human patient. The patient ended up dying, I think, but potentially for reasons unrelated to the heart.

Hunter: Yep.

Abhi: Do you think that field is going to rapidly mature over the next five years, or are there some big insurmountable problems there?

Hunter: Yeah. I should clarify. I’m certainly not an expert on the process or the progress of that field. And I think that if you ask five people, you might get six opinions on the future of xeno. I’ve heard everything from, “Oh yeah, it’s right around the corner.” To one transplant surgeon told me, “Xeno is the future of transplantation and it always will be.”

Abhi: It’s perpetually five years away.

Hunter: Exactly. It’s perpetually... Yeah. I think that was his perspective. Okay. So I think that there’s a variety of perspectives on that one.

Abhi: One thing I was really curious about: X number of people die per year because they are unable to receive an organ. How does X change if Until Labs really succeeds? The answer you gave was, oh, several thousand, because we’re still figuring things out. I’m not sure if you had an exact... if there have been models drawn up as to how much we can put a dent in the organ shortage crisis if this really takes off.

Hunter: Yeah. Unfortunately, this is a super complicated question to answer, and an even more complicated question to answer well. The reason is that the data here is very challenging. You have to tease out the counterfactual of the organ going into a person or not going into the person. We have some statistics on the expiry of organs during transit, which is where I got the few-thousand-organs metric. You can actually look at a pie chart of the outcomes of various organs. If you take the slice of organs that were basically viable but expired in transit, and add some other slices that are clearly logistically related, that’s where you arrive at a few thousand organs a year.

Abhi: Gotcha.

And so it’s not necessarily the case that, let’s say Until succeeds and 10 years go by, we’ll have a surplus of organs, where every organ that’s currently in transport will be given to someone.

Hunter: Yeah. I think that for the time being, this is still going to be a supply-limited problem.

Abhi: Gotcha. Okay.

Hunter: So it will still be the case that there will be a waiting list. Unfortunately, it’ll still be the case that there will be people on dialysis in this country. And I think that it will have to be cryo plus some other technology that will need to come online for that to not be the case anymore.

[01:21:12] How much will the final cryopreservation protocol likely cost?

Abhi: Yeah, that makes sense. How much do you envision the Until Labs protocol costing? Is it an undecided figure or...

Hunter: It’s an undecided figure. Partially because we don’t know what the Until Labs protocol will be.

Abhi: That’s fair.

Hunter: I think... we have such a large suite of potential technologies that could be brought to bear, but what I can say is that I don’t see any obvious place where this is going to be a gene therapy that costs a million dollars a patient. That’s not what we’re talking about here. I think the parts that are expensive are primarily the devices.

Which will be amortized over many organs. These perfusion devices we’re talking about will have a disposable component for sure. But that’s not the expensive part... the expensive part is the part that’s reusable.
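The amortization argument reduces to one line of arithmetic: the per-organ cost is the single-use disposable plus the reusable device’s price divided by the number of organs it processes over its life. The function name and the dollar inputs below are placeholders for illustration, not Until Labs figures.

```python
def cost_per_organ(disposable: float, device: float, organs_per_device: int) -> float:
    """One single-use disposable set plus an equal share of the reusable device cost."""
    if organs_per_device < 1:
        raise ValueError("a device must process at least one organ")
    return disposable + device / organs_per_device


# Placeholder inputs, not real figures: watch the device share shrink as N grows.
for n in (10, 100, 1000):
    print(n, cost_per_organ(disposable=1_000.0, device=200_000.0, organs_per_device=n))
```

The device share falls as 1/N, so once a device has handled enough organs, the disposable dominates the per-organ cost.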

[01:21:58] Who ends up paying for this?

Abhi: When it comes to companies like TransMedics, who’s paying? Are they primarily contracting with insurance companies, and are you also planning on contracting with insurance companies? Whose job is it to ensure there’s alignment between what the patient wants and the money you expect to get from this?

Hunter: Yeah. So there are Medicaid reimbursement rates that are set up for organs, basically. And so there’s a public payer. Obviously that’s setting some market baseline.

And then there are insurance companies, and you have to figure out what they’re willing to compensate the transplant centers for. The customers you’re actually working with, though, are transplant directors, and you’re going to be working with some OPOs. Those are the people you’d be directly interacting with.

Abhi: And what’s the approval process for this? Because it’s not a drug, it’s almost a procedure more than anything else.

Hunter: Yeah.

Abhi: Is it a... what is it classified as exactly?

Hunter: Yeah. So we don’t know yet. But I think that medical device is probably how it’ll be classified. Yeah.

Abhi: And I imagine there’s not that much precedent for something like cryopreservation. Did the ice stuff also have to go through its own approval process?

Hunter: Yeah, I think there are going to be different approval processes for each of these. There is some precedent if you look at things like hypothermic machine perfusion, which could reasonably set some precedent for the vitrification process. But again, we’ll have to leave that to the FDA to decide.

[01:23:28] What was it like to raise a Series A on such an unorthodox thesis?

Abhi: Yeah, that makes sense. Moving on to the actual raising journey for Until Labs. What was the Series A again?

Hunter: We did 58 million.

Abhi: Okay. Okay. Until Labs is a little bit of a strange thesis for a company. It’s a biotech, but it’s not a therapeutics company. It’s serving transplantation, which I had not conceptualized as a place where there could be a for-profit company really playing and doing innovative biomedical research. And as far as I can tell, you guys are one of the very few companies playing in that area. What was the fundraising journey like?

Hunter: Yeah, so I think that, first of all, I would say that while it may appear that we are one of the few people working in the area, my guess is that is not for long.

Abhi: Okay.

Hunter: And in terms of the fundraising journey, I think it started a little over a year ago, though not with us actively looking to raise. We had just completed our seed, and the neural slice paper came out right after we announced our seed.

Abhi: So what neural... the German lab?

Hunter: The one... no, the one you were referencing.

Abhi: Gotcha.

Hunter: The one where we did the rat cerebellum. We used that and announced the seed at the same time. And that was 15 million for the seed. So, yeah, about a year, a little over a year ago, we had this idea that maybe we wanted to start playing around with this donor organ problem.

And as I said, there had already been some of this work from the University of Minnesota, which we had been looking at. It just became clear that this was going to be on the roadmap anyway, on the scientific trajectory we would want to be on, and we’d get to start helping patients. So Isla, who’s currently our director of preclinical research, came to me and pitched this whole cloth, saying: look, we have this long-term roadmap, there’s this obvious use case in getting this into patients, we should go after this seriously, and we should start scaling right now to go into preclinical models. That kicked off this journey where we initially just kicked the tires on it: could we take our protocols and move them over? Could we task our engineering team with this? And now the dominant focus of the company is how to get this translated.

One thing we wanted to do there was raise to accelerate the process of getting this done and getting to the point of a first-in-human trial. So I think the primary purpose of the raise is to bring on additional capital to parallelize a lot of these processes and get things to market quicker.

Abhi: Was... what was raising like? How many questions did you get over the economic thesis versus the scientific thesis versus some other thesis?

Hunter: Yeah. I think we got questions along both axes. Okay. And I think we are very fortunate to have excellent partners, excellent capital partners who understand the thesis really deeply. In one of the pre-conversations you asked me what advice I would give to people who are in similar positions, and I don’t think I gave you a particularly satisfying answer, because I don’t think I have one. The reason is that we were really fortunate in that the people who... so the raise was led by Founders Fund, with Field Ventures and Lux joining.

And all three of these groups are able to do exceptionally detailed diligence on their own. Having technical conversations with their technical teams was like you and me sitting here talking about science, at the level of depth that was required. It wasn’t some story that needed to be packaged. For them, it was just an actual conversation about the technical risk.

Abhi: So it wasn’t necessarily that, oh, we want to do brain preservation and we’re moving over to kidney preservation because we’re bringing in outside investors...

Hunter: Yeah. No, that was... it was very much the reverse process there. It’s like we went to go get more capital to accelerate being able to get the donor stuff to market.

[01:27:49] What are common misconceptions people have about cryopreservation?

Abhi: What are common misconceptions that people have about the cryo field, especially people who are external to the field? Perhaps not even laymen, but people you consider smart: what are their misconceptions?

Hunter: Yeah, I think that I had a misconception when I originally jumped into this, which was that it was not doable because of ice. That you would always get ice that would form and irreversibly damage tissue. And in reality, you have this minimum temperature for ice formation at minus 130 degrees Celsius where water turns to glass, not ice. So if you can traverse this danger zone between zero to minus 130, then the tissue will be safe and you can rewarm it without damage.

Abhi: But you did mention about how these cryoprotectants have existed since the 1950s.

Hunter: Yeah.

Abhi: So I imagine there should have been time for the rest of the scientific field to become aware that, oh, ice nucleation is a solvable problem. Why do you think there is still this misconception that it’s fundamentally unsolvable?

Hunter: I think part of it probably has to do with the fact that when we cryopreserve things like cultured cells in the lab every day, we use a completely different process that does allow ice to form, which would not be compatible with tissue. So imagine you take some cultured cells and you want to store them in vapor-phase storage.

Abhi: Yeah.

Hunter: There you use a process called slow freezing. You allow ice to form in the extracellular space; it slowly expands and hyper-concentrates the cryoprotectant elsewhere, which then forms the glass. I think intuitively people understand this should not work for tissue, because you’ll tear up the extracellular matrix. Yeah. So I think that has probably facilitated a misunderstanding of how the cryopreservation process fundamentally works when you want to go to these more tissue-specific or organ-specific processes.
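The “hyper-concentration” in slow freezing follows from a simple mass balance. In an idealized picture where ice forms from pure water and excludes all solute, freezing a fraction f of the water leaves the same amount of cryoprotectant in (1 - f) of the liquid, so its concentration rises by a factor of 1 / (1 - f). This sketch ignores the volume of the solute itself and any solute trapped in ice, and `concentration_factor` is a hypothetical name.

```python
def concentration_factor(frozen_fraction: float) -> float:
    """Idealized solute enrichment when a fraction of the water freezes out as pure ice.

    Assumes all solute stays in the unfrozen liquid and the liquid volume scales
    with the remaining water (solute volume neglected).
    """
    if not 0.0 <= frozen_fraction < 1.0:
        raise ValueError("frozen_fraction must be in [0, 1)")
    return 1.0 / (1.0 - frozen_fraction)


for f in (0.5, 0.8, 0.9):
    print(f"{f:.0%} of water frozen -> solute concentrated {concentration_factor(f):.0f}x")
```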

Abhi: Does... it sounds like people don’t actively think of the vascular system as a very good transportation medium. Is that fair to say?

Hunter: So I think that’s also a common misconception. It’s, oh, if I need to load this object by diffusion, isn’t it going to take forever? Because people think about the equivalent of, okay, in in vitro fertilization I have a tiny little embryo that’s six cells that needs to load cryoprotectant. People can envision that in their head: okay, I load the cryoprotectant, I cool really fast. That makes sense. And I think a key unlock is hijacking the vascular system for mass transport.
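Why vascular loading matters can be seen from the characteristic diffusion time, t ≈ L² / D, which grows with the square of distance. The sketch below assumes D ≈ 10⁻⁹ m²/s (a rough order of magnitude for a small molecule in water; real cryoprotectants in tissue will differ), an embryo-scale distance of about 100 µm, a whole-organ distance of about 2 cm, and roughly 100 µm from a cell to its nearest perfused vessel. All of these lengths are illustrative assumptions, not numbers from the interview.

```python
# Assumed diffusivity: rough order of magnitude for a small molecule in water.
D_SMALL_MOLECULE = 1e-9  # m^2/s


def diffusion_time_s(length_m: float, diffusivity: float = D_SMALL_MOLECULE) -> float:
    """Characteristic diffusion time t ~ L^2 / D (order-of-magnitude estimate only)."""
    return length_m ** 2 / diffusivity


# Illustrative length scales (assumptions, not measurements).
cases = {
    "embryo-scale, ~100 um": 100e-6,
    "whole organ from the outside, ~2 cm": 2e-2,
    "cell to nearest perfused vessel, ~100 um": 100e-6,
}
for name, length in cases.items():
    print(f"{name}: ~{diffusion_time_s(length):,.0f} s")
```

The roughly 400,000 s for the whole-organ case is on the order of days, while perfusing through the vasculature puts every cell back at an embryo-like diffusion distance.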

[01:29:58] The beginnings of Until Labs

Abhi: Why... at the very early beginning when you and Laura first founded this company, was the plan, oh, we’re just going to push forward on this brain thing until it’s done?

Hunter: Yeah. So when Laura and I met... that’s a funny story. Laura and I met via her cold-emailing me while I was a postdoc in Adam Cohen’s lab at Harvard.

And she hit me with a very open-ended question, which was if I thought it was possible to reversibly pause biological function. And my initial response to her was the same response that any reasonable person has, which is basically, are you kidding me? Of course not.

Abhi: Yeah.

Hunter: And there were a couple of days after that where something was eating at the back of my mind, which is... I was a physicist by origin. And it’s like, okay, if you have this kind of response to someone who has this history, right? Laura had a very established history at that point of making strong scientific bets.

Abhi: Yeah.

Hunter: It’s like, okay. I should be questioning my assumptions here. And so I went back and looked at it and was like, oh, this problem is fascinating. And I think that there’s real traction that’s been made recently and I have some ideas about how we might be able to continue to do that.

And so initially it was very much curiosity that drove me to join up with Laura and get this going. And that curiosity slowly expanded as I realized the places where this could help. As a bit of personal context, in 2016, my father-in-law, Mark, was diagnosed with a terminal case of cancer and was given a six-month prognosis. He lived almost exactly six months. Near the end of his life, a clinical trial for Keytruda came online for his disease, but he was too sick to qualify for it. And in these conversations with Laura, something that hit me was that this is a technology that could help people like Mark.

People like Mark and their families who they don’t need some hundred-year jump into the future. They need six months.

They need a year. And continued conversations with Laura led me to understand, oh, Mark’s case is not isolated. This is not some exceptional thing. Oncology drugs are forever getting better and survival rates are increasing. Things like pandemics, think about AIDS. Okay. In the 10 years between the onset of the AIDS pandemic and the creation of combination antiretroviral therapies, 9.7 million people died. 10 years, 9.7 million people. And there was just this overwhelming sense that nothing else I had touched scientifically had this kind of lever arm, the ability to affect this many people just by buying time. It was a very skeleton-key solution to healthcare. So yeah, I flew out to meet with Laura, and basically flew back, quit my job, wrapped things up, and moved out. And initially we really were committed, and still remain very committed to this day, to hibernation as the long-term goal.

And this is the idea of taking an entire person, taking an entire patient who would be terminal with some disease and giving them access to cures that are right around the corner.

We started with that in mind, and that was why the neural stuff was the first thing we wanted to do: the hypothesis was, oh, that’s going to be the hardest thing. Yeah. That’s going to be the thing that for sure breaks.

These are insanely delicate tissues. In the lab we would have challenges just keeping them viable on their own, much less cryopreserved. So yeah. That’s what we came out to originally do.

Abhi: I think prior to even learning that Cradle existed, whenever I thought of cryopreservation, I just assumed, oh, that’s thermodynamically impossible to do. No one’s ever going to crack the problem. Everyone working on it is a grifter.

Hunter: Yeah.

[01:34:07] What expertise is hardest to recruit for?

Abhi: Yeah. I think when I read that progress report in preparation for this interview, I was like, oh my God. Obviously it’s small scale, and there’s the question of how you scale it up, but it’s still a crazy achievement. One question I had: you’ve mentioned over and over again how multidisciplinary cryopreservation is as a field. What expertise is hardest to recruit for?

Hunter: Yeah, I think that again, you want top talent across everything, and getting top talent in any field takes a disciplined approach. You want to show people that this is for real and show them how they can contribute. I think the thing that is practically very hard to onboard is medical talent and expertise.

And I think this is primarily driven by the labor and economic incentives of being a doctor in the US. No one wants to stop doing their clinical practice, which means it is really challenging to get good medical talent to come and work on this. We’ve been very fortunate to recently bring on some awesome talent, not necessarily MDs, but PhDs who’ve been working at Hopkins, for example. Dr. Amanda Lofton, who recently joined us, is a PhD at Hopkins who studied things like the perfusion of these donor organs at hypothermic temperatures. So you can find it. It takes a disciplined approach. And you’re definitely looking for personalities that have a high degree of, let’s say, openness to change.

Abhi: How important is it that the people that you recruit for these positions have actual... have spent several years in an organ transplant clinic versus they have done an MD and know the biology?

Hunter: Yeah. So I think that across the board we tend to recruit for people who understand the platform, who understand the biology as a platform, and not something very specific... I think hiring too specifically in a company like this is a deep mistake.

Abhi: Really? Why? Why is that?

Hunter: For example, you could go and say, oh, we’re just going to hire... let’s say we shifted over to the molecular development side. Sure. You’d say, oh, we’re exclusively going to hire people who have developed cryoprotectants before. Yeah. It’s reasonable to hire people who have developed cryoprotectants before, but those are certainly not the only people capable of contributing well to this problem.

And so our approach to recruiting has been to find not people who have specifically thought about this problem, but people who have thought about the facet of the problem we need, maybe through a different lens. Andrew is maybe a great example of this: a very talented materials physicist, previously thinking about both x-ray scattering and...

Abhi: This is the Tesla guy?

Hunter: This is... Yeah, exactly. He was doing battery research previously. You wouldn’t naturally think that thinking about lithium ions is very similar to cryopreservation, but he’s been a very effective leader for that group. And I think that is just one of many examples of highly agentic people coming into our group and making real progress against this problem.

Abhi: I imagine the mental leap to finding Andrew out in the wild working at Tesla and thinking, oh, maybe he’d be really useful for a cryoprotectant group is a big jump. Did he reach out to you first? Did you reach out to him first?

Hunter: Andrew and I go way back, so that was helpful. We actually used to race bicycles together when we were at Chicago. And honestly, that was one of the benefits of having come up through a scientific background that kept me in academia for a long time. Yeah. I have the benefit of knowing who is very good from having seen them work before. Andrew always struck me as someone who was both exceptional at physics and deeply operationally minded. A very no-BS problem solver. He was a natural early hire for the company.

Abhi: How much of your recruiting process is reaching out to people who seem hyper-talented in this orthogonal discipline to cryopreservation and saying, Hey, have you ever thought about working at Until versus them reaching out to you and saying, I want to turn my talents into this field I know nothing about.

Hunter: Yeah. I think in the beginning it was almost all the former, because we were in stealth mode and no one knew who we were. I think we are slowly seeing it shift to the latter, where Laura and I are getting cold emails from some thermodynamics expert who’s like, “Hey, I saw this.” Yeah. “And this is crazy. Can you tell me more about the glass transition in water?”

And I think that’s part of why I want to have conversations like this. This is part of why I think it’s my job to go around and talk about what we’re working on: I do think there are likely very many intelligent physicists, biologists, and chemists who would find this problem interesting but maybe don’t see themselves as able to contribute to it meaningfully, and I want to make the vociferous pitch that that is not the case.

Abhi: Okay. That’s an open call to everyone watching this to apply to Until.

Hunter: Indeed.

[01:39:27] What personality type do you most value when hiring?

Abhi: What, amongst all the talented people who end up in the recruitment pipeline for Until, is there a particular personality archetype that you think is most useful to have around?

Hunter: Yeah, I think the first thing I’m looking for is a highly agentic personality. By this I mean a willingness to take on a high degree of responsibility for solving a really hard problem. That’s one. The other thing I always look for is people who understand failure deeply, because the one thing we pride ourselves on at Until is failing really fast. When you’re building out a really wide tech platform, the way to do that effectively is: if there’s an idea that seems good, design the experiment that would prove it’s bad, and do that at a very high rate. I’ve found that people who simultaneously have a high degree of agency and a low enough ego to design something to prove themselves wrong are the people who end up able to move really quickly and contribute really meaningfully to the problem.

Abhi: You had, I’m not sure if I’m hallucinating this, but I think you mentioned your postdoc advisor or your PhD advisor had this one quote about, you should strive to treat everything as the same field or something along those...

Hunter: Oh, okay. Adam. Yeah. Okay. So I, this is... I have deep respect for the people I’ve gotten to work for. My first research job was with an ultrafast optics person, a professor at the University of Chicago, Greg Engel. Greg was great and gave me a job when I knew less than nothing. When you’re an undergrad you think you know something, so you’re actually less useful than a completely naive human being. Then in grad school, I got to work for Mikhail Shapiro at Caltech, an exceptional scientist, who was the one that converted me from physics over to biology. And for my postdoc, I actually got my dream postdoc, which was to go work for this guy named Adam Cohen. Adam is both an exceptional physicist and a great biologist and neuroscientist, and there’s this interdisciplinary way his mind works that just fascinated me. I had been following his publication history since even before he was a professor, and I saw him as someone who had been able to traverse these discipline boundaries.

And so when I showed up, I asked him, “How do you do this? How is it possible that you seem to be pressing things all the way from single-molecule biophysics all the way up to solving neural circuits to building crazy microscopes?” I was an optics... I thought of myself as an optics expert and I’m looking at the microscopes that are in this guy’s lab and it’s completely blowing me away. The answer that he gave has stuck with me so deeply, which is that nature does not care. Nature doesn’t respect these boundaries. There is no such thing as biology to nature, or chemistry or physics. This is all just fundamentally the same at its core. And I think he and I share a physics background, so I think for him it’s just physics at the core, with more scaffolding built up around it. And I think oftentimes when physicists say that, they’re trying to give themselves priority or something, but that’s very much not how he meant it. What he meant was these discipline boundaries are just purely human-imposed. Yeah. Yeah. So yeah, that has totally stuck with me. And Adam’s mentorship in particular was incredibly helpful in setting me up for what I’m doing now.

Abhi: Do you think the only... the primary way one can imbibe that mindset is just to simply learn about many different fields? Or... I’m curious, if you were mentoring someone how do you encourage them to think about the scientific process?

Hunter: Yeah. I think that in terms of learning many different fields, first of all, just get comfortable with being an idiot about different disciplines, particularly early on, because you have to be deeply comfortable with the idea that if you’re going to move through a discipline with which you are unfamiliar, you will be a fish out of water in that space for a while. But I think that what... and this is one of the things that I particularly love about physics, is that it sets this beautiful foundation through which you can attempt to apply analogy to a whole host of different disciplines. And this isn’t always functional. This obviously breaks down in some very complex systems. But it gives you a language for thinking about the physical world. But I’ve also seen people who are very good biologists use their own version of analogy to try to understand nature, and have similar conversations. Take Mandy, the person I was talking about who does a bunch of these organ perfusions for us.

She and I can have a conversation, and it’s interesting because we will find a common language to describe something despite coming from completely different backgrounds. Yeah. So that’s how I think about it.

[01:44:17] Why work in cryopreservation as opposed to anything else?

Abhi: And you’ve mentioned you’ve worked in a lot of different fields. I went through your Google Scholar and I saw papers on biophysics, neuroengineering, bioimaging, molecular engineering, and nanotechnology. Why... you did mention this thing about your father and the field that Laura pitched to you as especially interesting, but I’m curious why start a cryonics company as opposed to any other company?

Hunter: Yeah, critically, we consider what we do slightly different from cryonics, which is a different process. But why start a cryopreservation company as opposed to any other company? A few things, as we’ve gone over previously. I think one of them comes down to impact. I had actually not even thought about starting a company. I thought I was going to be an academic when Laura and I met. I was looking forward to starting my own lab, becoming a professor. I think that it was really a pull and not a push. I enjoyed doing academic research. But there was the sense of, oh, the scale at which we can have impact with this problem, and the level of neglect it has had in terms of amassing the amount of capital required to really go after it hard. These things were really contributing.

And then there’s this... and again, I learned this next part from Adam. Adam always applies a three-part test to figure out if he should work on something. The first one is, do I find it interesting? Do I think that it’s something that I want to work on? The second is, do I have the skills that are required to execute it well? And then the final one is the removal test. If you deleted me from the Earth, is someone else likely to do it? And I applied the same criteria here in my own way. And I think for me it passed that three-part test of: should I pivot into this? I don’t think that I could have appreciated what it would look like when I did. I really was casting myself into an unknown. But yeah, that was fine.

[01:46:26] Until Labs’ competitors

Abhi: One thing I perhaps should have asked earlier is, I kind of view Until Labs as existing as perhaps one of the very few cryopreservation companies. What does the landscape of competitors look like? Do you view TransMedics, the current dominant organ supplier company, as competition? Or do you view other cryopreservation companies as competition?

Hunter: That’s a great question. I think that in the end, there’s going to be a marketplace for all of these technologies to coexist and there will be a question of which ones are used in what context. That’s all going to come down to the relative viabilities of all of these strategies. But yeah, there’s a whole host of companies coming up that do near-subzero storage. These store at minus five degrees. There are a lot of already established devices that work from zero degrees up, like four degrees: hyper-oxygenated machine perfusion at cold temperatures. That’s very well established, all the way up to normothermic machine perfusion, which is at 37 degrees, actively perfusing the organ during transport. These all have different trade-offs. Some of them are organ-specific trade-offs. So yeah, it’s hard to say yet who our competition is because it’s hard to say yet who is going to be able to press those viability metrics as high as possible.

Abhi: If you view what Until is working on as the very bottom of the hierarchy, where you can store an organ indefinitely, you can leave it for centuries and it’ll come back just as fine. Yeah. And the very top is, the organ needs to immediately go into a patient, otherwise it will go ischemic and just die. Yes. What does the time look like for each? If you put an organ on pure ice, how long does that organ last versus if you go a little bit up or down the hierarchy?

Hunter: Yeah, so it’s obviously very organ-dependent. Okay. And so for something like a kidney, kidneys are actually incredibly robust to ischemia. So you can keep a kidney on ice for 72 hours and it will still pink back up and recover. If you go to tissues like the lungs, that is not the case. You have a few hours to get it transplanted because of really short ischemic windows. So it’s going to be a very organ-specific question.

Abhi: And even if you have this hypothermic machine perfusion, it’s still max 72 hours?

Hunter: Yeah. I’m not actually sure if there’s an established machine for hypothermic machine perfusion of lungs. If there is, I’m not familiar with it.

Abhi: Oh, even for kidney?

Hunter: Oh, for kidney?

Abhi: Yeah. You’re saying it’s extending beyond 72?

Hunter: So I think that there’s no... yeah, I think there’s not a ton of research on pressing kidneys out beyond the 72 hours. There is some, but in terms of established places where people are really looking hard at perfusion, it’s actually mostly the liver that has had quite a lot of perfusion work done on it. But with all these things, you’re trying to press out those time windows to make them progressively longer. Yeah. But in the end, if you get to vitrification, then that just... you take time completely out of the equation. Yes. Yeah. Yeah.

[01:49:30] What would an alternative-universe version of Hunter have worked on?

Abhi: That makes sense. I’m curious what... yeah, you’ve worked in a lot of different fields. What would an alternative Hunter have done, if not for cryopreservation?

Hunter: Yeah. It’s a challenging question because it was such a sliding-doors moment for my life that sometimes it’s hard to look back and think about it. But I guess in a complementary world, the things that I was interested in at the time I was working with Adam mostly revolved around two questions. One was advanced sensing and the other was neural computation. I think I was drifting very much towards advanced concepts of sensing. So something I did in my PhD was building out small-scale devices to look at really tiny magnetic fields inside of cells.

That was mostly an academic study at the time that I did it. But I think that there are some interesting downstream applications for being able to do something like MRI on a single cell, to look at label-free diffusive transport of tiny molecules inside of a cell. I think that this is one of those things where it was beautiful and academic, and I think that the relative impact was not quite as high as what I could be doing outside. Yeah. And I think Until is a really natural fit and I feel really blessed to have had it walk into my life.

Abhi: Do you think if this Until Labs opportunity did not pop up, you would’ve felt pretty content not doing a ton of translation work throughout the rest of your career, or you think there was always something in the back of your head thinking, I should turn this into something that’ll reach a patient?

Hunter: Yeah. I think that was one of the things that was always challenging for me in academia actually, was that there was always this biting thing in me that was a need for a large impact. I think all of us want this, right? Sure. All of us want to make a larger impact. And I’m not alone in being an academic who thinks, I wish that I could find some way of translating this work. I also think a lot of academics do a really good job of eventually finding that in their career. But yeah, that opportunity just presented itself to me directly.

[01:51:33] What would you do with $100M?

Abhi: Yeah. That makes sense. And I think that perhaps the last question I have is: if someone handed you a hundred million dollars equity-free, you could spend it on Until or you could spend it on some other scientific field that you’re very interested in. Yeah. Where would you allocate the money?

Hunter: Right now? Right now I’d allocate it directly into Until. Okay. And I’m not saying that as a cop-out. I actually think that having seen the interior of the organization, it’s like... for me, this is the obvious place where we can make rapid advancements for humanity.

And yeah, that’s where I would want to spend it. I think that in terms of where we would allocate it inside of Until, I think parallelizing more to be able to go after more kinds of organs and get these things to clinic faster. Because I think there is some real urgency... there are a hundred thousand people right now in this country on an organ waiting list. And I think that I have... my personality is very, let’s say I have a bias toward urgency.

Abhi: When it comes to actually using the hundred million dollars, what is the primary bottleneck that the money will be used to help solve?

Hunter: Yeah. So I think that there are a couple of primary bottlenecks. One is in vivo experiments are hard. It’s not even in vivo experiments; organ experiments are complex, require lots of people, and are capital intensive. And in the end, some of these things don’t translate unless you scale up to there. So it’s a deep focus on that translational research. I also think that it allows us just to explore more of these fundamental questions. Work with academics to explore more of these fundamental questions, and build out the foundation, which actually substantially predates Until, around some of these more fundamental scientific questions, to try to figure out what is the best avenue for us to build off of as we go forward. Because I think that even the solution of doing this on isolated organs does not solve the long-term problem. I think that the capital injection is also quite helpful for helping us lay a foundation to be able to eventually do this on an entire organism.

Abhi: Sure.

Is the... does it make sense to say that... the way that you’ve explained this sounds like both sides of... you need people to figure out the theory for everything and then you also need a small army of RAs to actually do the experiments.

Hunter: Yep.

Abhi: Would you allocate the money equally, or do you think the empirical animal studies are just infinitely more valuable than people theorizing?

Hunter: I think that it’s a weird cost equation, because I wouldn’t say that one is more valuable than the other, but empirically one is much more expensive than the other.

Abhi: Yeah.

Hunter: That’s... and it is simply the case that doing preclinical model experiments is much more expensive.

Abhi: Okay. Cool. Hopefully someone gives you a hundred million dollars equity...

Hunter: I think we’re pr... I think we’re pretty good on capital for a while, but thanks. If you’re showing up with equity-free checks, I’ll take them.

Abhi: Okay. Thank you so much for coming onto the podcast, Hunter.

Hunter: Thank you.

Abhi: Okay. Think we’re good.