<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:googleplay="http://www.google.com/schemas/play-podcasts/1.0"><channel><title><![CDATA[Owl Posting: Primers]]></title><description><![CDATA[These are 'primer' posts. They are intended to be long, extensively researched deep-dives into specific scientific topics. I stick to the facts as much as possible, but also offer my own opinion pretty frequently. 

]]></description><link>https://www.owlposting.com/s/primers</link><image><url>https://substackcdn.com/image/fetch/$s_!-IFA!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F621a39d3-39fa-4593-acf7-b271d3eedf1a_399x399.png</url><title>Owl Posting: Primers</title><link>https://www.owlposting.com/s/primers</link></image><generator>Substack</generator><lastBuildDate>Tue, 28 Apr 2026 15:12:50 GMT</lastBuildDate><atom:link href="https://www.owlposting.com/feed" rel="self" type="application/rss+xml"/><copyright><![CDATA[Abhishaike]]></copyright><language><![CDATA[en]]></language><webMaster><![CDATA[abhishaike@gmail.com]]></webMaster><itunes:owner><itunes:email><![CDATA[abhishaike@gmail.com]]></itunes:email><itunes:name><![CDATA[Abhishaike Mahajan]]></itunes:name></itunes:owner><itunes:author><![CDATA[Abhishaike Mahajan]]></itunes:author><googleplay:owner><![CDATA[abhishaike@gmail.com]]></googleplay:owner><googleplay:email><![CDATA[abhishaike@gmail.com]]></googleplay:email><googleplay:author><![CDATA[Abhishaike Mahajan]]></googleplay:author><itunes:block><![CDATA[Yes]]></itunes:block><item><title><![CDATA[Curious cases of financial engineering in biotech]]></title><description><![CDATA[7k words, 32 minutes reading time]]></description><link>https://www.owlposting.com/p/curious-cases-of-financial-engineering</link><guid isPermaLink="false">https://www.owlposting.com/p/curious-cases-of-financial-engineering</guid><dc:creator><![CDATA[Abhishaike Mahajan]]></dc:creator><pubDate>Mon, 27 Apr 2026 12:29:18 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!1yJN!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feaa075cf-253c-4e2e-b2f5-605bfa9af0bd_2912x1632.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" 
href="https://substackcdn.com/image/fetch/$s_!1yJN!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feaa075cf-253c-4e2e-b2f5-605bfa9af0bd_2912x1632.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!1yJN!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feaa075cf-253c-4e2e-b2f5-605bfa9af0bd_2912x1632.png 424w, https://substackcdn.com/image/fetch/$s_!1yJN!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feaa075cf-253c-4e2e-b2f5-605bfa9af0bd_2912x1632.png 848w, https://substackcdn.com/image/fetch/$s_!1yJN!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feaa075cf-253c-4e2e-b2f5-605bfa9af0bd_2912x1632.png 1272w, https://substackcdn.com/image/fetch/$s_!1yJN!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feaa075cf-253c-4e2e-b2f5-605bfa9af0bd_2912x1632.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!1yJN!,w_2400,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feaa075cf-253c-4e2e-b2f5-605bfa9af0bd_2912x1632.png" width="1200" height="672.5274725274726" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/eaa075cf-253c-4e2e-b2f5-605bfa9af0bd_2912x1632.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;large&quot;,&quot;height&quot;:816,&quot;width&quot;:1456,&quot;resizeWidth&quot;:1200,&quot;bytes&quot;:7176439,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.owlposting.com/i/193109540?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feaa075cf-253c-4e2e-b2f5-605bfa9af0bd_2912x1632.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:&quot;center&quot;,&quot;offset&quot;:false}" class="sizing-large" alt="" srcset="https://substackcdn.com/image/fetch/$s_!1yJN!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feaa075cf-253c-4e2e-b2f5-605bfa9af0bd_2912x1632.png 424w, https://substackcdn.com/image/fetch/$s_!1yJN!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feaa075cf-253c-4e2e-b2f5-605bfa9af0bd_2912x1632.png 848w, https://substackcdn.com/image/fetch/$s_!1yJN!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feaa075cf-253c-4e2e-b2f5-605bfa9af0bd_2912x1632.png 1272w, https://substackcdn.com/image/fetch/$s_!1yJN!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feaa075cf-253c-4e2e-b2f5-605bfa9af0bd_2912x1632.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container 
restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p><em>Note: finance topics are slightly sensitive, so, while nothing in this article contains proprietary information, I will not include the names of people I talked with for this piece. I appreciate everyone who reached out to help me put this together! 
</em></p><div><hr></div><ol><li><p><a href="https://www.owlposting.com/i/193109540/introduction">Introduction</a></p></li><li><p><a href="https://www.owlposting.com/i/193109540/finance-tries-to-make-failure-survivable-the-andrew-lo-thesis">Finance tries to make failure survivable: the Andrew Lo thesis</a></p></li><li><p><a href="https://www.owlposting.com/i/193109540/finance-makes-future-success-tradable-royalties-and-synthetic-royalties">Finance makes future success tradable: royalties and synthetic royalties</a></p></li><li><p><a href="https://www.owlposting.com/i/193109540/finance-rewrites-the-incentives-prvs-and-cvrs">Finance rewrites the incentives: PRVs and CVRs</a></p></li><li><p><a href="https://www.owlposting.com/i/193109540/finance-reaches-failure-itself-zombie-biotechs">Finance reaches failure itself: zombie biotechs</a></p></li><li><p><a href="https://www.owlposting.com/i/193109540/conclusion-what-does-finance-teach-biotech-to-value-and-should-we-worry">Conclusion: what does finance teach biotech to value, and should we worry?</a></p></li></ol><h1>Introduction </h1><p>For $250 million and ten years of your life, you may purchase a lottery ticket. The ticket has a 5% chance of paying out. When it does pay out, it pays roughly $5 billion. A quick calculation will show you that the expected value of the ticket is $250 million. This is essentially what drug development is. Or rather, it&#8217;s what drug development was, twenty years ago. The upfront payments have been climbing, the hit rates falling, and expected values have, at best, held flat. Should you buy a ticket?</p><p>Perhaps not. In fact, any reasonable player should have long since stopped playing this stupid game. Unfortunately, we still need drugs. 
People have cancer, and heart failure, and Alzheimer&#8217;s, and a thousand genetic diseases that nobody has ever heard of, and the only industry on Earth currently set up to do anything about any of this is the same industry running the lottery-ticket business described above. The game is dumb and we need it played anyway.</p><p>So the real question is not whether to play, but how to make playing less awful for those involved. And the answer, increasingly, is &#8216;<em>financial engineering</em>&#8217;: a set of structural tricks that let people hold more tickets than they otherwise could, or buy a fraction of the winning tickets after they&#8217;ve been drawn, or do some other strange, clever thing that financiers find obvious and nobody else has heard of. All of this is done to trade and barter over the risk inherent in the whole enterprise, slicing it into pieces small enough that someone, somewhere, is willing to hold each one in exchange for <em>something</em>. </p><p>I&#8217;ll walk through a handful of these, the people who invented them, and case studies of each in action. And at the end, we&#8217;ll ask whether all these tricks are, in aggregate, altering what the pharmaceutical industry decides to value. </p><p>The first such trick, and the one that perhaps kicked off the whole effort, was dreamed up by a man named Andrew Lo.</p><h1>Finance tries to make failure survivable: the Andrew Lo thesis</h1><p><a href="https://www.globenewswire.com/news-release/2020/06/24/2052635/0/en/BridgeBio-Pharma-Inc-Appoints-Biotech-Trailblazers-Brent-Saunders-and-Randy-Scott-and-Renowned-Economist-Andrew-Lo-to-Board-of-Directors.html">Andrew Lo is a finance professor at MIT's Sloan School of Management</a>. Among all TED talks that have ever been produced, there are few worth watching. 
Andrew&#8217;s talk, which has the wonderful title &#8216;<em><a href="https://www.youtube.com/watch?v=xu86bYKVmRE">Can Financial Engineering Cure Cancer?</a></em>&#8217;, is one of them:</p><div id="youtube2-xu86bYKVmRE" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;xu86bYKVmRE&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/xu86bYKVmRE?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><p>I recommend you listen to the full thing, because it really is quite good. If you&#8217;re strapped for time, the core thesis is as follows:</p><p>Individual drug programs fail about 95% of the time. But this doesn&#8217;t mean the expected value of a single program is necessarily <em>bad</em>. Recall the lottery ticket from the start: a 5% shot at a $5 billion blockbuster is worth $250 million in expectation, so against, say, a $200 million development cost, a program is technically positive EV on paper. But collecting that value requires surviving the 95% of outcomes in which you lose everything, and most investors, reasonably, cannot.</p><p><a href="https://www.nature.com/articles/nbt.2374">Lo's insight, published in a 2012 Nature Biotechnology paper</a>, was simple. Just bundle 50 or so drug programs into a single entity, one with a war chest of $5 to $15 billion, and roll the dice. The individual drug programs are still terrible standalone bets, but if they're sufficiently uncorrelated, at least <strong>one</strong> is very likely to hit (about 92% for 50 independent 5% shots), and it will hit big enough to pay off all the programs that failed. Which means you can keep playing, forever. Of course, the &#8216;<em>uncorrelated</em>&#8217; bit is the &#8216;<em>spherical cow</em>&#8217; part of all this. 
It&#8217;s impossible to do it perfectly, but it can be done well enough for risk to fall dramatically. </p><p>There&#8217;s an extra layer of complexity here: if you can get the portfolio risk low enough, you can <a href="https://www.nature.com/articles/nbt.2374">issue debt </a><em><a href="https://www.nature.com/articles/nbt.2374">against</a></em> the portfolio and sell it as bonds, which unlocks a much larger pool of non-venture capital that wants more stable returns. This is arguably the most interesting thing that Andrew believed in, but this particular bit never really went anywhere. We&#8217;ll discuss the obvious &#8216;why not?&#8217; question at the end of this section. </p><p>The direct descendant of this whole thesis&#8212;at least the &#8216;drug portfolio&#8217; part&#8212;is <a href="https://bridgebio.com/?">BridgeBio Pharma</a>, founded in 2015 by <a href="https://www.linkedin.com/in/neil-kumar-6b460119">Neil Kumar</a>, who was Andrew&#8217;s student at MIT. It is structured almost identically to Andrew&#8217;s original thesis: a central holding company that creates subsidiary companies, each focused on a single rare disease. Each subsidiary has its own equity structure, its own management team, and 1-2 drug programs. If a subsidiary's drug fails, the subsidiary dies, but BridgeBio survives. If the drug succeeds, the parent holds enough equity to capture massive upside. The company IPO&#8217;d in 2019, is now worth billions, and has a pretty good stock trend for a biotech. 
</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!QaaU!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fad29e2b7-2590-410a-a6ec-21aca7613f2a_1372x944.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!QaaU!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fad29e2b7-2590-410a-a6ec-21aca7613f2a_1372x944.png 424w, https://substackcdn.com/image/fetch/$s_!QaaU!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fad29e2b7-2590-410a-a6ec-21aca7613f2a_1372x944.png 848w, https://substackcdn.com/image/fetch/$s_!QaaU!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fad29e2b7-2590-410a-a6ec-21aca7613f2a_1372x944.png 1272w, https://substackcdn.com/image/fetch/$s_!QaaU!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fad29e2b7-2590-410a-a6ec-21aca7613f2a_1372x944.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!QaaU!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fad29e2b7-2590-410a-a6ec-21aca7613f2a_1372x944.png" width="447" height="307.5568513119534" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ad29e2b7-2590-410a-a6ec-21aca7613f2a_1372x944.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:944,&quot;width&quot;:1372,&quot;resizeWidth&quot;:447,&quot;bytes&quot;:144169,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.owlposting.com/i/193109540?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fad29e2b7-2590-410a-a6ec-21aca7613f2a_1372x944.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!QaaU!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fad29e2b7-2590-410a-a6ec-21aca7613f2a_1372x944.png 424w, https://substackcdn.com/image/fetch/$s_!QaaU!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fad29e2b7-2590-410a-a6ec-21aca7613f2a_1372x944.png 848w, https://substackcdn.com/image/fetch/$s_!QaaU!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fad29e2b7-2590-410a-a6ec-21aca7613f2a_1372x944.png 1272w, https://substackcdn.com/image/fetch/$s_!QaaU!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fad29e2b7-2590-410a-a6ec-21aca7613f2a_1372x944.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" 
viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>There are spiritual cousins as well, such as <a href="https://en.wikipedia.org/wiki/Roivant_Sciences">Roivant Sciences, founded in 2014 by Vivek Ramaswamy</a>. It has a nearly identical corporate structure to BridgeBio&#8212;<strong>what&#8217;s come to be known as a &#8216;hub-and-spoke&#8217; model</strong>&#8212;but whereas BridgeBio does de novo drug development in rare diseases, <a href="https://adus.substack.com/p/how-does-roivant-work">Roivant in-licenses drugs that big pharma has abandoned for </a><em><a href="https://adus.substack.com/p/how-does-roivant-work">non-scientific</a></em><a href="https://adus.substack.com/p/how-does-roivant-work"> reasons</a>: portfolio reprioritization, executive turnover, M&amp;A reshuffling, quarterly earnings pressure. 
There are lots of these molecules floating around, and if you hire good enough people, you have the ability to spot them before anyone else. <a href="https://www.dcatvci.org/top-industry-news/roivant-sciences-in-spac-deal-valuing-the-company-at-7-3-bn/">Roivant went public in 2021 at a $7.3 billion valuation</a>, and its subsidiaries have completed<a href="https://investor.roivant.com/news-releases/news-release-details/roivant-and-priovant-announce-positive-phase-3-valor-study"> </a><strong><a href="https://investor.roivant.com/news-releases/news-release-details/roivant-and-priovant-announce-positive-phase-3-valor-study">twelve</a></strong><a href="https://investor.roivant.com/news-releases/news-release-details/roivant-and-priovant-announce-positive-phase-3-valor-study"> consecutive positive Phase 3 studies</a>. And it has an even better stock history!</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!BQkE!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F004a72d2-cfa2-49cf-b9f5-990523949c14_1148x962.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!BQkE!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F004a72d2-cfa2-49cf-b9f5-990523949c14_1148x962.png 424w, https://substackcdn.com/image/fetch/$s_!BQkE!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F004a72d2-cfa2-49cf-b9f5-990523949c14_1148x962.png 848w, https://substackcdn.com/image/fetch/$s_!BQkE!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F004a72d2-cfa2-49cf-b9f5-990523949c14_1148x962.png 1272w, 
https://substackcdn.com/image/fetch/$s_!BQkE!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F004a72d2-cfa2-49cf-b9f5-990523949c14_1148x962.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!BQkE!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F004a72d2-cfa2-49cf-b9f5-990523949c14_1148x962.png" width="467" height="391.3362369337979" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/004a72d2-cfa2-49cf-b9f5-990523949c14_1148x962.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:962,&quot;width&quot;:1148,&quot;resizeWidth&quot;:467,&quot;bytes&quot;:123791,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.owlposting.com/i/193109540?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F004a72d2-cfa2-49cf-b9f5-990523949c14_1148x962.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!BQkE!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F004a72d2-cfa2-49cf-b9f5-990523949c14_1148x962.png 424w, https://substackcdn.com/image/fetch/$s_!BQkE!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F004a72d2-cfa2-49cf-b9f5-990523949c14_1148x962.png 848w, 
https://substackcdn.com/image/fetch/$s_!BQkE!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F004a72d2-cfa2-49cf-b9f5-990523949c14_1148x962.png 1272w, https://substackcdn.com/image/fetch/$s_!BQkE!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F004a72d2-cfa2-49cf-b9f5-990523949c14_1148x962.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>This solves the fundamental problem of biotech, no? 
Really, in retrospect, it&#8217;s astonishing that we let anybody create a non-hub-and-spoke biotech. You have a set of bets, each one of which is individually stupid, and then you put them in a bag, and the bag becomes smart, because the insane variance of each bet washes out across the bag. It is the obvious thing to do. </p><p>Unfortunately, upon trying this out, we will run into two big problems. The first one is that running many drug programs at the same time is really hard. And the second one is that people <em>know</em> running many drug programs at the same time is really hard, and they will price any attempt to do so accordingly. </p><p>An exemplar of the first problem is <a href="https://centessa.com/">Centessa Pharmaceuticals</a>. Centessa was founded in late 2020 by <a href="https://www.medicxi.com/">Medicxi</a>, a life-sciences venture firm, as another implementation of this thesis: ten private biotech companies, each with its own single asset, combined under one holding entity, taken public in May 2021 at $20 a share. Though they are often held up as paragons of the Andrew Lo thesis (including by me!), Roivant and BridgeBio weren&#8217;t <em>real</em> hub-and-spoke enthusiasts. Centessa was. Whereas Roivant in-licensed abandoned pharma assets and BridgeBio concentrated almost entirely on rare genetic disease, Centessa bravely stuck to the Lo script: a portfolio of genuinely uncorrelated clinical risk. Their spokes covered <em>hemophilia, oncology, pulmonary hypertension, narcolepsy, fibrotic disease, autoimmune disease</em>. If there was any correlation risk, it was that drug development was occurring at all.</p><p>The model did not work. Within eighteen months Centessa was shutting down spokes. By 2023, they had abandoned the hub-and-spoke model entirely and pivoted to a single-asset company focused on orexin agonists for sleep disorders. That pivot, to be clear, worked spectacularly. 
<a href="https://investor.lilly.com/news-releases/news-release-details/lilly-acquire-centessa-pharmaceuticals-advance-treatments-sleep">Lilly bought them for $6.3 billion in early 2026</a>, making Centessa one of the more successful biotech exits of the decade. <strong>But they got there by becoming a single-asset company</strong>. What had gone so wrong with the original thesis? The surface answer is a mix of capital and luck. Several spokes failed on their own merits, and the 2022-ish biotech market crash closed the door on funding whatever was left. Centessa shareholders ended up all right in the end, but hub-and-spoke models are empirically not silver bullets for the hard problem of drug development. </p><p>The second problem here is that people simply may not believe in your so-called &#8216;<em>uncorrelated risk portfolio</em>&#8217;. This will obviously happen when you raise money to pursue the venture, and it will, surprisingly, happen again once you go public. </p><p>As an example: did you notice that big drop in BridgeBio&#8217;s stock in late 2021? That is when their lead candidate acoramidis&#8212;a treatment for a rare heart condition called transthyretin amyloid cardiomyopathy&#8212;failed to beat placebo on its primary endpoint in a Phase 3 trial. <a href="https://www.nasdaq.com/articles/is-bridgebio-stock-a-buy-following-heart-disease-drug-fail-analyst-weighs-in">The stock dropped 72% in a single day</a>. This was not the tidy portfolio-theory response. The rational response would have been "<em>well, BridgeBio has <a href="https://www.globenewswire.com/news-release/2021/12/27/2358009/0/en/BridgeBio-Pharma-Reports-Month-12-Topline-Results-from-Phase-3-ATTRibute-CM-Study.html">four other clinical-stage programs and $800 million in cash</a>, so the diversified portfolio thesis should protect us</em>." 
The market said "<em>holy shit, the lead asset is dead, the portfolio theory behind this company is nonsense, sell it</em>," and priced that sentiment accordingly. </p><p>The funny part of all this is that BridgeBio kept running the trial. The 12-month primary endpoint had failed, but the study was designed to run to 30 months, with a harder secondary endpoint: death and cardiovascular hospitalization. <a href="https://finance.yahoo.com/news/bridgebio-bbio-heart-drug-meets-135300848.html">In July 2023, the longer-term data read out, and acoramidis </a><em><a href="https://finance.yahoo.com/news/bridgebio-bbio-heart-drug-meets-135300848.html">worked</a></em>, with the secondary endpoint being met. The stock surged 76% in a day, BridgeBio eventually won FDA approval, and <a href="https://www.biospace.com/drug-development/attr-cm-approval-for-bridgebio-could-trigger-tight-race-with-pfizer">the drug&#8212;now on the market&#8212;is called Attruby</a>. Stressful! </p><p>Well, that&#8217;s that. But we should return to Andrew Lo for a second. The part of Lo&#8217;s idea that did not arrive, at least not in its original form, was the bond-market part. Why has no one implemented what was arguably the cleverest part of his pitch: <strong>issuing debt against your drug portfolio, allowing you to access vast sums of risk-averse institutional capital?</strong> </p><p>Well, to some degree, someone has, but only for <em>approved</em> drugs. <a href="https://bpcruk.com/">BioPharma Credit</a> is one such institution, and makes secured loans to commercial-stage biotechs, typically collateralized by the revenue stream of one or more approved products. </p><p>But nothing like this exists for clinical-stage assets. Why not? Happily, <a href="https://carlsonschool.umn.edu/sites/carlsonschool.umn.edu/files/2023-03/JFI%20Review%20Lo%20Thakor%20Final.pdf">Lo himself offered an answer</a>, almost a decade after his first paper. 
First, biotech is simply not used to that type of financing, so it doesn&#8217;t use it. Second, the extreme scale of financing that this unlocks has simply not yet been needed, so nobody has raised it. <strong>But the third, and most important, point is a lack of institutional support.</strong> There is no biomedical Moody's&#8212;no quantitative, authoritative voice that can tell a pension fund how risky a portfolio of drug assets is. And even if there were, there is no biomedical Fannie Mae&#8212;no government-backed entity that acquires biopharma loans and securitizes them into something an institutional allocator would actually buy. Our field is in the same state mortgages were in during the 1930s: considered too risky for banks to buy, until the federal government built the infrastructure to make them safe. </p><p>But, Lo posits, the need for capital <em>eventually</em> changes behaviors, biology is poised to grow far larger than it is today, and models for drug portfolio risk adjustment are only getting better. Four years after the paper, I am unsure whether much has changed, but we&#8217;ll see what the future holds. </p><h1>Finance makes future success tradable: royalties and synthetic royalties</h1><p>Drug royalties are pretty simple. You discover an interesting target or chemical, but don&#8217;t want to bother with developing it further. So you pawn it off on a big pharmaceutical company with a lot of resources, alongside a contractual agreement that you&#8217;ll receive, say, 3% of net sales if a drug based on your work is eventually approved and commercialized. <strong>And like any contractual agreement, it can be bought and sold.</strong></p><p><a href="https://www.royaltypharma.com/">Royalty Pharma</a>, founded in 1996 by Pablo Legorreta, is the largest company in this market and possibly the purest expression of financialized drug development that exists. 
It has no labs, no therapeutics arm, and no ambition to discover drugs itself. It buys royalties, from universities, academic medical centers, small biotechs, individual inventors, and holds them. The portfolio includes claims on <a href="https://www.nytimes.com/2017/07/08/business/dealbook/drug-prices-private-equity.html">7 of the 30 most-prescribed drugs</a> in the United States. It reported $2.38 billion in revenue in 2025 from what is, spiritually, a filing cabinet.</p><p>Is this rent-seeking? If you look at the details, it actually feels pretty fair to all parties involved. A university that has a royalty over some particular drug developed by a professor has no ability&#8212;or desire!&#8212;to forecast its chance for success, its revenue if approved, or how to hedge the risk that a competitor enters the market. It also very likely has a preference for less money today than more money over the ten years of a drug&#8217;s exclusivity period. Royalty Pharma and its competitors have the opposite preferences and all the abilities the university lacks. The university gets liquidity and certainty; Royalty Pharma gets a claim on an approved drug's revenue stream at a discount to its expected value. Both sides win.</p><p>But the more interesting recent development is the rise of <em>synthetic</em> royalties.</p><p>A traditional royalty is a pre-existing legal right. It exists because someone did the original research and negotiated a licensing agreement. A synthetic royalty is different. <strong>It&#8217;s a manufactured financial claim on future drug revenues that didn&#8217;t previously exist.</strong> Consider an example: a biotech company has a drug in clinical development, one that it owns entirely. It needs money. It doesn&#8217;t want to issue equity (dilutive) or take on debt (requires collateral). So it invents a drug royalty from scratch, an entirely new obligation that did not previously exist, and sells that. 
Now the company no longer owns the drug&#8217;s upside entirely: some other party is owed 3% of its future sales if it ever succeeds, and the biotech gets non-dilutive capital today.</p><p>What&#8217;s the difference between these increasingly complex synthetic royalty agreements and typical, bespoke pharma deals? They feel similar. And yes, they are functionally equivalent in terms of cash flow or deal structure. The difference lies in each party&#8217;s intent. In typical pharma deals, the buyer cares about something <em>strategic, </em>say, operational control over a drug&#8217;s development journey. Buyers of royalties, synthetic or otherwise, do not care about that. They care entirely about the probability-weighted present value of the future payments, and you can imagine how useful this decoupling of capital from often burdensome partnership demands can be. </p><p>The royalty market is, in some sense, <strong>a secondary market for the financial value typically embedded in pharma licensing agreements</strong>. And it's still early.</p><p><a href="https://biotechbriefings.gibsondunn.com/royalty-report-royalty-finance-transactions-in-the-life-sciences-2020-2024/">One report found that there were 102 major royalty transactions from 2020 to 2024</a>, noting that synthetic royalties are growing at 33% annually. The buyer pool includes not only royalty-centric funds like Royalty Pharma, but increasingly <strong>pension funds and private equity as well. </strong>The same institutions Andrew Lo wanted to be in on biotech<strong> are</strong> getting in on the game, just in a different way.</p><p>This whole class of synthetic royalties is growing more complex over time, with some even including milestones built into the sold contract, such that the seller (the biotech) receives additional capital upon the achievement of Phase 2 trial success or outright drug approval. The whole concept is also growing <em>physically</em> larger. 
<a href="https://www.royaltypharma.com/news/royalty-pharma-and-revolution-medicines-enter-into-funding-agreements-for-up-to-2-billion/">In June 2025, Royalty Pharma and Revolution Medicines announced a $2 billion funding agreement</a>&#8212;$1.25 billion of which was structured as a synthetic royalty&#8212;to fund the development of daraxonrasib, the largest-ever transaction in this particular asset class. </p><p>But at the same time, within synthetic royalties, you can see the beginnings of a financial instrument that is strange enough that one cannot easily predict its second- or third-order effects. Pharma partnership agreements can be burdensome in the demands they make, but they are at least &#8216;time-bounded&#8217; in ways that are easy to plan for. <strong>Synthetic royalties follow a company around forever, as long as a drug is under patent, actively extracting value all the way, their only contribution being an initial surge of capital</strong>. This is nobody&#8217;s fault of course, least of all the royalty holders. &#8216;<em><a href="https://www.youtube.com/watch?v=ag14Ao_xO4c">We are selling to willing buyers at the current fair market price</a>&#8217; </em>and all. But the cumulative effect, as more drugs carry more synthetic royalty obligations, is a pharmaceutical economy where an increasingly large fraction of every dollar of drug revenue is pre-committed to financial intermediaries before the drug reaches a single patient. </p><p>But it&#8217;s not as cut and dried as &#8216;<em>synthetic royalties are bad</em>&#8217; because of this. Consider the Revolution Medicines case from earlier. Their drug daraxonrasib has a strong chance of being a blockbuster, and so scaling global commercialization will be enormously expensive. 
An equity raise would have diluted ownership right before value-inflecting Phase 3 readouts (<a href="https://ir.revmed.com/news-releases/news-release-details/daraxonrasib-demonstrates-unprecedented-overall-survival-benefit">which were excellent</a>!), traditional debt at that scale would be impractical, and a pharma partnership would surrender commercial rights to what could be a decade-long franchise of label expansions. The synthetic royalty allowed Revolution to sidestep all three, largely as a result of the royalty being <a href="https://ir.revmed.com/news-releases/news-release-details/revolution-medicines-enters-2-billion-flexible-funding-agreement">tiered, decreasing with sales volume, and dropping to zero above $8 billion in annual net sales</a>. If daraxonrasib becomes a true blockbuster, the royalty burden effectively caps out and becomes negligible as a percentage of revenue. </p><p>But why would Royalty Pharma agree to this at all? Isn&#8217;t this clearly <em>not</em> in their favor? Not at all: they likely just did the numbers, and anything above some certain threshold in yearly sales is both unlikely and unneeded for their portfolio math, so they are happy to give the tail scenario away for free. </p><p>All of this, only possible because there is an entity willing to buy a manufactured claim on future revenue that didn't exist until someone decided to create it. The royalty market shows the basic pattern: once a future drug cash flow becomes legible, someone will turn it into a security. </p><h1>Finance rewrites the incentives: PRVs and CVRs</h1><p>What we&#8217;ve discussed so far assumes some degree of intentionality. Andrew Lo purposefully came up with his thesis, Royalty Pharma deliberately built a business around drug royalties, and so on. But there are two particular financial instruments that were intentionally designed at the start, but have slowly begun to display an unpredictable life of their own once deployed. 
I&#8217;d like to discuss them because I think they do a great job in demonstrating not only how tradable instruments in finance can have material impact in how drug development works, but also how those impacts can be very difficult to predict in advance. </p><p>The two are PRVs, or <strong>Priority Review Vouchers, </strong>and CVRs, or <strong>Contingent Value Rights. </strong></p><p>We&#8217;ll start with PRVs. </p><p>In 2006, three professors at Duke published a paper titled &#8220;<a href="https://people.duke.edu/~dbr1/research/developing-2006-preprint.pdf">Developing Drugs for Developing Countries</a>". In it, they discuss a well-trodden problem: infectious and parasitic diseases create enormous health burdens in the developing world, but because the people suffering from them are poor, there's essentially no commercial incentive to develop treatments. Of course, ideally there would be some way to incentivize for-profit companies to do it. But financial incentives require money, and money requires Congress, and Congress requires political will that rarely materializes for diseases affecting people who can't vote in U.S. elections. </p><p>The fix, the authors argued, is to use a <em>logistical</em> incentive instead. If you are willing to develop a drug for a neglected disease, the government ought to help you out somewhere <em>else</em> in your drug portfolio. </p><p>How? By offering you a PRV. But what use is the PRV? Once a pharmaceutical company has wrapped up their clinical trial work and submits an application to the FDA for official approval, they must wait 10 months for FDA review. But if they submit this one-time-use-only voucher <em>alongside</em> the application, FDA should be forced to give you a review within 6 months. And just in case you don&#8217;t actually have an internal portfolio of drugs to allocate this PRV to, the voucher should also be <strong>sellable</strong>. 
Four months of earlier market entry for a &#8216;top-decile&#8217; drug can be worth an awful lot, <a href="https://pubmed.ncbi.nlm.nih.gov/16522573/">around $300 million according to the authors</a>. </p><p>You can imagine a very neat feedback loop from all this. For instance, a small nonprofit or academic group develops a river blindness treatment, receives a voucher, and can then sell the voucher to Pfizer and use the proceeds to fund more neglected disease work. </p><p>In a rather astonishing act of &#8216;<em>listening to healthcare economists</em>&#8217; that I don&#8217;t believe ever occurred thereafter, <a href="https://en.wikipedia.org/wiki/Priority_review">Congress enacted the program in 2007</a>, just a year after the paper&#8217;s publication. It was expanded in 2012 to include rare pediatric diseases, and again in 2016 to include medical countermeasures against biological/chemical/radiological threats. </p><p>There are two things I find very interesting about PRVs. </p><p>The first is that, as the title of this section implies, PRVs have gained secondary-market price dynamics that their creators never intended. The buying cost of a PRV at any given moment is a function of how many are floating around, how many blockbuster drugs are approaching FDA submission, the competitive landscape, and whether Congress has recently done something to expand or contract the program. <a href="https://www.wesa.fm/2015-08-19/price-rises-for-ticket-to-a-quicker-drug-review-by-fda">AbbVie paid $350 million for a single voucher in 2015</a>&#8212;the all-time high, driven by the voucher being the only one out there <em>and</em> by a competitor releasing a drug similar to theirs. <a href="https://www.fiercebiotech.com/biotech/novartis-buys-priority-review-voucher-pharming-discount-price-21m">Novartis picked one up in 2023 for $21 million</a>&#8212;the all-time low. </p><p>How did Novartis get one so cheap? 
Funnily, that particular story <em>also</em> illustrates the increasingly complex financialization of biotech quite well. When Novartis licensed a particular drug to a particular biotech back in 2019, it baked a &#8220;<em><a href="https://www.pharming.com/news/pharming-announces-sale-priority-review-voucher">pre-agreed, contractually defined percentage of the PRV value</a></em>&#8221; into the licensing agreement four years before the voucher existed, and, in fact, <strong>before the drug itself had even been approved</strong>. And when the biotech got the drug approved and received the voucher in 2023, Novartis simply exercised the option to purchase it for a ridiculously low price. </p><p>Imagine being the person behind that deal!</p><p>The second thing, even <em>more</em> interesting, is that the whole program increasingly bears no fixed relationship to the social good it was meant to incentivize. Why? Because even at the voucher's peak secondary market value of $350 million&#8212;though it usually oscillated around the $100M mark&#8212;<strong>it</strong> <strong>was not enough to shift a large pharma company's portfolio allocation in any meaningful way</strong>. In the few cases where it did, it shifted the allocation towards the absolute bare minimum: approval of the drug, not <em>utility</em> of the drug. The voucher pays for the regulatory event, not the public-health outcome. 
In a great paper titled &#8216;<a href="https://pmc.ncbi.nlm.nih.gov/articles/PMC11624706/">The priority review voucher: a misconceived quid pro quo</a>&#8217;, the authors say this: </p><blockquote><p><em>&#8230;the PRV, except few examples, has largely failed to deliver medical benefits for patients suffering from neglected diseases <strong>because it rewards obtaining FDA marketing authorisation without regard for the products actually being</strong> <strong>available, affordable and equitably accessible for people.</strong></em></p></blockquote><p>Now, it would be lying to tell you that PRVs have not helped anyone. They have! But there have been enough cases of bad behavior here that it is worth wondering if there is something better that is possible. This is, in fact, being worked on, but it takes us off topic, so I&#8217;ve put some details about it in the footnotes<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-1" href="#footnote-1" target="_self">1</a>.</p><p><strong>The second financial instrument I want to discuss is the CVR, or Contingent Value Right.</strong> </p><p>CVRs are simple. When an acquirer and a target company cannot agree on what a drug-in-development is worth&#8212;which is most of the time&#8212;they structure a simple conditional payment. If the acquired drug(s) hits a specified milestone, the acquirer pays the target's former shareholders an additional sum. Most CVRs are structured like normal pharma partnerships, as in, a closed, non-transferable contract between two partners. A small minority of them are structured as tradable securities, listed on the NYSE or Nasdaq with their own ticker symbols, but these aren&#8217;t particularly special beyond their raw size. </p><p>What is most interesting about CVRs is the perverse incentives they sometimes create. 
</p><p><a href="https://pharmaphorum.com/news/sanofi-settles-dispute-with-genzyme-investors-over-ms-drug">When Sanofi acquired Genzyme for $20 billion in 2011</a>, Sanofi issued CVRs tied to the regulatory approval and commercial success of Lemtrada (alemtuzumab), an MS drug that Genzyme had been developing. Up to $3.8 billion was on the table if the drug hit its milestones. <strong>But Sanofi was </strong><em><strong>also</strong></em><strong> simultaneously developing its own MS drug, Aubagio</strong>. Aubagio had no CVR obligations attached to it. </p><p>Sanofi was now contractually obligated to compete vigorously against itself, on behalf of strangers, for free. Predictably, it did not.</p><p>Obviously, Sanofi was sued for this. The former shareholders alleged that Sanofi deliberately slow-walked Lemtrada's FDA submission and under-invested in its commercialization to minimize CVR payouts. But deliberate sabotage is hard to distinguish from ordinary sluggishness. <a href="https://www.biopharmadive.com/news/sanofi-pay-315-million-settle-lemtrada-cvr-go-slow-claims/566350/">Sanofi settled in 2019 for $315 million</a>&#8212;well short of the $708 million in missed payouts the shareholders claimed&#8212;without admitting wrongdoing.</p><p>The pattern repeated more recently and at even larger scales in 2019, with <a href="https://www.fiercepharma.com/pharma/as-expected-former-celgene-shareholders-sue-bristol-myers-squibb-for-6-4b-claiming-blatant">BMS's $74 billion acquisition of Celgene</a>, in which $6.4 billion in CVR payouts hinged on three drugs hitting FDA approval by fixed deadlines. Two were approved on time. The third missed by thirty-six days. As a result, the entire CVR expired worthless. As you may expect, former shareholders again sued. </p><p>If we were to generalize this, the structural problem is simple: a CVR can make the buyer responsible for creating a payout that the buyer would rather not pay. 
But if that&#8217;s the case, why are CVRs&#8212;<a href="https://www.biopharmadive.com/news/cvr-biotech-pharma-deals-contingent-value-right-price-acquisitions/806612/">which are accelerating in their popularity</a>&#8212;done at all? For one, the above case studies are very much not the norm; most go perfectly fine. And two, the value of CVRs as a coordination mechanism, even when they go wrong, empirically outweighs the later headaches they cause. </p><h1>Finance reaches failure itself: zombie biotechs</h1><p>Royalty Pharma is not the only player in the royalty space. There are a few others, one of them named <a href="https://investors.xoma.com/">XOMA Royalty</a>. XOMA is especially interesting, because it was once a traditional biotech company that developed and licensed drugs. Then, <a href="https://www.thepharmaletter.com/biotechnology/xoma-the-royalty-aggregator-that-thinks-like-a-biotech">in 2017, it pivoted to become a royalty aggregator</a>. And starting in 2024, it began to poke at the business of buying up, and liquidating, &#8216;<a href="https://www.statnews.com/2025/02/20/why-biotechs-future-is-threatened-by-zombies/">zombie biotechs</a>&#8217;. </p><p>Zombie biotechs are publicly traded companies whose market value is below the cash on the balance sheet. This translates to investors saying that their IP, patents, clinical data, team, all of it, is not only worthless but is actively destroying value by burning through cash that would be better deployed sitting underneath a bed. Roughly 300 companies fit this description in mid-2024, most of them casualties of the 2020-2021 IPO bubble, when a lot of biotechs went public that had no business doing so.</p><p>These companies can&#8217;t raise equity (who would buy?), can&#8217;t take on debt (against what collateral?), and can&#8217;t be bought by or merged with anyone (who would want them?). 
In an ideal world, the founders would simply put the whole business out of its misery, but they are collecting a paycheck anyway with their dwindling cash reserves <em>and</em> closing down a public company is a surprisingly legally fraught thing to do. So they just wander around as zombies. </p><p><a href="https://www.biopharmadive.com/news/xoma-royalty-zombie-biotechs-liquidate-wind-down/760535/">XOMA&#8217;s insight was that this particular purgatory may itself be an asset class.</a> They step in, acquire the company at or below cash value, and return cash to the shareholders who have been trapped in a slowly deflating stock for years. Then, they take a close look at everything the company created&#8212;patents, clinical data packages, licensing rights, partially completed regulatory filings&#8212;and sell it, keeping the profits for themselves. Or simply hold it, just in case it&#8217;ll be useful elsewhere in their portfolio. </p><p>The concept itself is not new. This is <a href="https://en.wikipedia.org/wiki/Vulture_fund">vulture investing</a>, translated into biotech. But whereas a typical vulture investor&#8217;s goal is to flip an entire <em>company</em> onto someone else, the biotech vulture capitalist&#8217;s hope is to sell off <em>pieces</em> of the company. And the pieces can be surprisingly valuable. A drug candidate that failed a Phase 2 trial in one indication can be worth millions to, say, some of the hub-and-spoke companies we discussed earlier. Maybe Roivant believes that the endpoint was misspecified, or the indication was wrong, or that the drug is indeed useless, but that the PK/PD, safety signals, biomarker responses, regulatory responses, and dose-response curves uncovered during the trial are useful, and they&#8217;d be willing to pay vast sums for that data. What XOMA does here is make this information legible to potential buyers. </p><p>To help illustrate this, let&#8217;s consider a case study: Kinnate Biopharma. 
</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!dGh0!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F96636301-fa2c-4f5d-824f-ff5ab83946b4_1792x736.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!dGh0!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F96636301-fa2c-4f5d-824f-ff5ab83946b4_1792x736.png 424w, https://substackcdn.com/image/fetch/$s_!dGh0!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F96636301-fa2c-4f5d-824f-ff5ab83946b4_1792x736.png 848w, https://substackcdn.com/image/fetch/$s_!dGh0!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F96636301-fa2c-4f5d-824f-ff5ab83946b4_1792x736.png 1272w, https://substackcdn.com/image/fetch/$s_!dGh0!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F96636301-fa2c-4f5d-824f-ff5ab83946b4_1792x736.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!dGh0!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F96636301-fa2c-4f5d-824f-ff5ab83946b4_1792x736.png" width="724" height="297.35714285714283" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/96636301-fa2c-4f5d-824f-ff5ab83946b4_1792x736.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:598,&quot;width&quot;:1456,&quot;resizeWidth&quot;:724,&quot;bytes&quot;:623185,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.owlposting.com/i/193109540?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F96636301-fa2c-4f5d-824f-ff5ab83946b4_1792x736.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!dGh0!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F96636301-fa2c-4f5d-824f-ff5ab83946b4_1792x736.png 424w, https://substackcdn.com/image/fetch/$s_!dGh0!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F96636301-fa2c-4f5d-824f-ff5ab83946b4_1792x736.png 848w, https://substackcdn.com/image/fetch/$s_!dGh0!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F96636301-fa2c-4f5d-824f-ff5ab83946b4_1792x736.png 1272w, https://substackcdn.com/image/fetch/$s_!dGh0!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F96636301-fa2c-4f5d-824f-ff5ab83946b4_1792x736.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Kinnate was an oncology company developing kinase inhibitors for cancer patients with specific genetic mutations. As the story goes for many companies of that era, they went public and by early 2024 were trading below their cash balance. There was no outright clinical trial failure, they simply ran out of money to develop their drugs further. <a href="https://investors.xoma.com/news-events/press-releases/detail/447/xoma-enters-into-agreement-to-acquire-kinnate">In February 2024, XOMA announced it would acquire Kinnate for roughly $2.50 per share in cash, or $126 million</a>. Then, over the next year, XOMA sold all five of Kinnate&#8217;s pipeline assets to other companies. 
<a href="https://investors.xoma.com/news-events/press-releases/detail/472/xoma-royalty-completes-sale-of-kinnate-pipeline-assets">In April 2025, they announced the completion of these sales</a>, with terms entitling XOMA to <strong>up to $270 million in upfront and milestone payments</strong>, plus, keeping to their name, <strong>ongoing royalties ranging from low single digits to mid-teens on commercial sales. </strong>Kinnate&#8217;s shareholders received most of the upfront payment, and XOMA got to <strong>double</strong> its money in flipping the assets. </p><p>What would the counterfactual be if XOMA had not stepped in? Kinnate would&#8217;ve continued to bleed cash until it ran out. At that point, the IP would have been worth even less&#8212;the utility of biological information depreciates fast!&#8212;and the shareholders would have gotten back even less, perhaps nothing at all. </p><p>There is another player in this space worth discussing: Kevin Tang, through <a href="http://linkedin.com/company/tang-capital-management">Tang Capital</a> and its acquisition vehicle <a href="https://www.concentrabiosciences.com/">Concentra Biosciences.</a> By mid-2025, Concentra had become one of the busiest buyers in biotech, making repeated bids for distressed public companies, with the explicit intention of closing them down, selling whatever assets could still be sold, returning some cash to shareholders, and keeping whatever spread remained. </p><p>Isn&#8217;t this quite similar to XOMA? Yes, both XOMA and Concentra are buyers of distressed (sometimes very clearly distressed) biotechs. But the difference is <em>when</em> they arrive. XOMA typically shows up at the doorstep of companies that are clearly on death's door. <strong>But Concentra often arrives earlier</strong>, while the public company is technically alive and its board is still weighing bad alternatives: reverse merger, dilutive financing, slow wind-down, strategic review, or sale. 
And Concentra aggressively attempts to force the board&#8217;s hand into a sale to <em>them</em>. </p><p>To be fair, &#8216;<em>force</em>&#8217; is a bit too strong a word here. A better term would be &#8216;<em>an offer they can&#8217;t (easily) refuse&#8217;. </em>Concentra&#8217;s pattern is to accumulate a large minority stake, make an unsolicited bid, and dare the board to explain why shareholders should keep funding the burn instead of taking cash now. </p><p>Why can&#8217;t they refuse it? </p><p>Consider <a href="http://linkedin.com/company/jounce-therapeutics">Jounce Therapeutics</a>. In February 2023, Jounce <a href="https://www.fiercebiotech.com/biotech/jounce-pounces-exit-opportunity-laying-57-staff-and-agreeing-reverse-merger-redx">announced a reverse merger</a> with <a href="https://www.redxpharma.com/">Redx Pharma</a>, alongside a 57% workforce reduction. This was not exactly a happy ending, but it was at least a <em>biotech</em> ending: Redx&#8217;s pipeline would become the core of the combined company, Jounce shareholders would own a minority stake, and some version of the organization would continue to exist. <a href="https://www.reuters.com/markets/deals/jounce-dumps-redx-pharma-acquisition-by-concentra-cut-84-jobs-2023-03-27/">Then Concentra appeared</a> with an offer that promised even more immediate cash to the shareholders, but one that would completely strip-mine Jounce to sell off as parts. </p><p>Tang is not doing anything illegal here, nor are boards literally compelled to accept every higher bid that comes along. But once a company has put itself in sale mode, the board starts to look less like a steward of a scientific project and more like <em>an auctioneer for whatever value remains. </em>This creates a bleak asymmetry. 
A reverse merger can be better for the people inside the company, better for the local biotech ecosystem, and perhaps even better for the vague moral category of &#8220;<em>letting the science continue</em>.&#8221; But furthering those things is not the board&#8217;s job. Its job is to ensure the shareholders are best served, and for them, Concentra&#8217;s highly liquid offer is difficult to argue against. In Jounce&#8217;s case, the Concentra transaction also came with an 84% workforce reduction. <strong>The board went with the Concentra offer.</strong> </p><p>Curiously, there are ways for companies to fight back against Concentra, and fight back they have. <a href="https://www.biospace.com/business/pliant-pops-poison-pill-as-concentra-threat-looms">Their weapon is colloquially referred to as a &#8216;poison pill&#8217;,</a> and goes as follows: if Tang keeps buying shares and crosses a threshold, usually around 10%, then every <em>other</em> shareholder receives the right to buy more stock at a discount, instantly diluting Tang. This does not resurrect the company, and it does not make Tang go away. It simply prevents Tang from buying enough stock in the open market to make liquidation feel inevitable before the board has itself decided it is inevitable. </p><p>This is all quite interesting. But it is likely a transient phenomenon. The zombie biotech liquidation market is a finite resource; the 300 companies trading below cash are overwhelmingly a product of the 2020-2021 vintage, a specific historical moment when the bar for going public was unusually low. That cohort is being worked through. Some will be acquired by the players discussed here. Others will manage to raise capital and survive. Most will simply wind down on their own, returning whatever cash remains to shareholders without the intermediation of a vulture buyer. 
Unless there&#8217;s another IPO bubble of comparable scale soon, the supply of zombie biotechs will shrink over the next few years, and the opportunity that is currently being exploited will narrow. </p><p>So why do I mention this at all? </p><p>The zombie biotech business is worth dwelling on because it marks a kind of endpoint. Whereas every other instrument in this essay financializes drugs that might still become therapies, these are different. XOMA financializes drugs that won&#8217;t, and Concentra financializes drugs that <em>likely</em> won&#8217;t. If the frontier of financial creativity has reached the dead and dying, it tells you something about how thoroughly every other surface has already been colonized. </p><h1>Conclusion: what does finance teach biotech to value, and should we worry?</h1><p><a href="https://pubmed.ncbi.nlm.nih.gov/23023199/">Andrew Lo&#8217;s original insight</a> was not that finance could make drug development easy. Nothing can make drug development easy. His insight was that finance might make failure <em>survivable.</em> I think this is directionally correct. Financialization is just the process of making implicit economic relationships explicit and tradable, and more liquid markets for biotech risk are almost certainly better than fewer. And what has happened since Lo&#8217;s 2012 paper is that financial engineering has been applied not just to the drug portfolio problem, but to every conceivable surface of the drug development process: partnerships, mergers, royalties, and even the death of a company. </p><p>Is this a bad thing? Probably not. Objecting to the decoupling of finance from therapeutic value is a bit sentimental, since, in theory, financial incentives <em>should</em> track therapeutic value. But how confident are we about that? Are we boiling a frog here? 
And if we are, what exactly is the frog?</p><p>Like I said at the start, it is important to understand that financial engineering is happening for a reason: this whole industry is excruciatingly difficult to build anything in. It&#8217;s only getting worse too. With the industry starved of capital, clever people will figure out ways to offer it in increasingly exotic forms to increasingly desperate scientists or companies, and little can prevent these two from finding each other at a bar. The alternative to a financialized biotech industry is not some prelapsarian era of pure scientific inquiry. It is the same industry, with the same problems, but less money and fewer ways to deploy it.</p><p>But let&#8217;s say we are being idealistic here. What, then, should worry us about financialization squishing its way deeper into drug development? I&#8217;m happy to raise my hand first: I&#8217;m a little worried about whatever <a href="https://hms.harvard.edu/news/what-happens-when-private-equity-takes-over-hospital">private equity did to hospitals</a> happening, in slower and less visible ways, to molecules themselves. Yes, I realize the nature of drug discovery imposes a constraint that most financialized industries don&#8217;t have: <strong>the thing has to actually work</strong>. The FDA is a binary filter that no amount of financial engineering can route around, and as long as that&#8217;s true, the typical finance-driven enshittification story shouldn&#8217;t apply here. </p><p>But &#8220;<em>working</em>&#8221; and &#8220;<em>mattering</em>&#8221; are not the same thing. For instance, you&#8217;ll notice that both Roivant&#8217;s and BridgeBio&#8217;s drug pipelines share a similarity: <strong>a focus on rare diseases.</strong> Finance people love rare diseases. Small trials, clear genetic etiology, often no existing standard of care, accelerated approval pathways, and excellent unit economics. 
This is fantastic for the several hundred, perhaps several thousand, patients helped by this work, and I don&#8217;t intend to minimize it. But would GLP-1s come out of this process? Would <a href="https://en.wikipedia.org/wiki/Lenacapavir">PrEP</a>? </p><p>This doesn&#8217;t <em>have</em> to be a big deal. All of these could coexist. Big pharma and startups continue to have high variance bets, the financialized folks stay low variance, they work together when needed, the world is at peace. But capital is finite, and drug development keeps getting more expensive and less predictable. My worry is not that BridgeBio and Royalty Pharma are doing something bad. They aren&#8217;t, and are in fact doing something very good. The worry is that they are doing something so legible, so well-suited to the preferences of the capital markets, <strong>that the money increasingly, naturally flows to them and nowhere else.</strong> </p><p>Is this a real worry?</p><p>On one hand: obviously not. The sort of financialized rare disease work presented here may look quite good, but it still makes up an extremely small portion of biotech funding&#8212;<a href="https://www.biospace.com/business/facing-a-dearth-of-big-pharma-interest-rare-disease-players-get-creative-to-fund-r-d">around 2%</a>. And it is not like Roivant or BridgeBio are poking at some genuinely undiscovered alpha. They are about a decade old, and despite their success, still don&#8217;t have many peers. Maybe this market is self-limiting. Maybe there are only so many BridgeBio-shaped opportunities in the world, and the rest of the biotech-earmarked dollars must go towards the higher-variance stuff. </p><p>On the other hand, the counterargument is the patent cliff. Between 2025 and 2030, <a href="https://deepceutix.com/insights/patent-cliff-reformulation">patents for nearly 200 drugs are set to expire</a>, including roughly 70 blockbusters. 
More than $300 billion in revenue is at risk, about one-sixth of the industry&#8217;s annual revenue. Patent cliffs are normal, but this one is unusually large, weighing in at <a href="https://www.drugpatentwatch.com/blog/beyond-the-patent-cliff-15-strategies-for-pharmaceutical-lifecycle-management/">three times the size of the cliffs of the 2010s in lost revenue.</a> Five of the top 10 pharmaceutical firms face a potential hit exceeding 50% of their current revenue. </p><p>What changes after an event like that? Perhaps Big Pharma will increasingly look towards easier, lower-risk/lower-reward diseases. Maybe they&#8217;ll be increasingly sympathetic to royalty and synthetic royalty funding agreements, further cutting into the economics of a drug. Maybe this leaks over into the public markets, and a thousand allocators, with their diffuse preferences, would rather fund the pharmaceutical companies that go down that path than those that continue with the status quo. </p><p>The frog is not any single drug or company. It is the industry&#8217;s willingness to fund biology that is illegible, expensive, and likely to fail, which is to say, the kind that occasionally changes the world. Again: financial engineering did not create this problem&#8212;that fault can be attributed entirely to R&amp;D productivity decline. In fact, the financiers may even be an especially brave vanguard in giving biotech the veneer of being a viable asset class. But they still may wind up making the <em>response</em> to the underlying problem worse by offering a way to achieve returns that slowly diminishes our institutional capacity to create the next generation of revolutionary medicines. </p><p>To close: I have deliberately left out China, which may be the most aggressive current example of financial architecture shaping a drug pipeline. That deserves its own essay, and will get one soon. 
</p><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-1" href="#footnote-anchor-1" class="footnote-number" contenteditable="false" target="_self">1</a><div class="footnote-content"><p>In June 2025, the CNPV (Commissioner&#8217;s National Priority Voucher) <a href="https://www.fda.gov/industry/commissioners-national-priority-voucher-cnpv-pilot-program">was</a> announced by FDA Commissioner Makary. It represents a brave new direction for the concept: a non-transferable voucher that can be used for a <strong>1-2 month review period</strong> and is awarded based on alignment with &#8220;<em>critical U.S. national health priorities.</em>&#8221; What does this mean? Nobody knows! </p><p>What we do know is that 18 vouchers have been awarded so far, 4 products have been approved through the program, and the whole thing has basically zero external visibility. If you go online, there is a lot of distaste for the whole thing, including from two lawmakers who said that the program could &#8220;<em><a href="https://www.fiercepharma.com/pharma/fda-solicits-feedback-controversial-national-priority-voucher-review-pathway">enable corruption by creating a new, lucrative gift for drugmakers and allies politically favored by President Trump</a></em><a href="https://www.fiercepharma.com/pharma/fda-solicits-feedback-controversial-national-priority-voucher-review-pathway">.</a>&#8221; I get it. But I think there is actually some utility in drug approval processes that are bespoke enough to let the federal government both accommodate practical constraints&#8212;manufacturing limitations, supply chain fragility&#8212;and extract concessions like price adjustments in return for regulatory speed. It is obviously not ideal that such a program exists in the context of the volatile current administration, but I&#8217;m not especially opposed to a &#8216;<em>we&#8217;ll fast-track good stuff through an opaque review process</em>&#8217; setup. 
</p><p></p></div></div>]]></content:encoded></item><item><title><![CDATA[On creating 'new knobs of control' in biology]]></title><description><![CDATA[4.9k words, 22 minutes reading time]]></description><link>https://www.owlposting.com/p/on-creating-new-knobs-of-control</link><guid isPermaLink="false">https://www.owlposting.com/p/on-creating-new-knobs-of-control</guid><dc:creator><![CDATA[Abhishaike Mahajan]]></dc:creator><pubDate>Fri, 10 Apr 2026 12:41:38 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!kB4A!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdda37299-49c5-488c-8735-4f1008962589_2912x1632.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!kB4A!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdda37299-49c5-488c-8735-4f1008962589_2912x1632.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!kB4A!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdda37299-49c5-488c-8735-4f1008962589_2912x1632.png 424w, https://substackcdn.com/image/fetch/$s_!kB4A!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdda37299-49c5-488c-8735-4f1008962589_2912x1632.png 848w, https://substackcdn.com/image/fetch/$s_!kB4A!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdda37299-49c5-488c-8735-4f1008962589_2912x1632.png 1272w, 
https://substackcdn.com/image/fetch/$s_!kB4A!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdda37299-49c5-488c-8735-4f1008962589_2912x1632.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!kB4A!,w_2400,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdda37299-49c5-488c-8735-4f1008962589_2912x1632.png" width="1200" height="672.5274725274726" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/dda37299-49c5-488c-8735-4f1008962589_2912x1632.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;large&quot;,&quot;height&quot;:816,&quot;width&quot;:1456,&quot;resizeWidth&quot;:1200,&quot;bytes&quot;:7212366,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.owlposting.com/i/178373718?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdda37299-49c5-488c-8735-4f1008962589_2912x1632.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:&quot;center&quot;,&quot;offset&quot;:false}" class="sizing-large" alt="" srcset="https://substackcdn.com/image/fetch/$s_!kB4A!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdda37299-49c5-488c-8735-4f1008962589_2912x1632.png 424w, https://substackcdn.com/image/fetch/$s_!kB4A!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdda37299-49c5-488c-8735-4f1008962589_2912x1632.png 848w, 
https://substackcdn.com/image/fetch/$s_!kB4A!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdda37299-49c5-488c-8735-4f1008962589_2912x1632.png 1272w, https://substackcdn.com/image/fetch/$s_!kB4A!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdda37299-49c5-488c-8735-4f1008962589_2912x1632.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p><em>Note: I&#8217;ll be releasing a ~2-hour-long <a href="https://www.youtube.com/@owl_posting">podcast</a> in a few weeks, 
interviewing an early-stage founder working at the extremely niche intersection of (biomanufacturing x printed circuit boards). Please reach out to me at abhishaike@gmail.com or on <a href="https://x.com/owl_posting">X</a> if you&#8217;d be interested in sponsoring it.</em></p><div><hr></div><ol><li><p><a href="https://www.owlposting.com/i/178373718/introduction">Introduction</a> </p></li><li><p><a href="https://www.owlposting.com/i/178373718/examples-of-new-knobs-of-control">Examples of new knobs of control</a></p><ol><li><p><a href="https://www.owlposting.com/i/178373718/synthetic-cell-receptors">Synthetic cell receptors </a></p></li><li><p><a href="https://www.owlposting.com/i/178373718/exotic-physical-sensors">Exotic physical sensors</a></p></li><li><p><a href="https://www.owlposting.com/i/178373718/bioorthogonal-chemistry">Bioorthogonal chemistry</a></p></li></ol></li><li><p><a href="https://www.owlposting.com/i/178373718/the-future">The future</a></p></li></ol><h1>Introduction</h1><p><a href="https://en.wikipedia.org/wiki/Atorvastatin">Lipitor</a> is a statin. Until it went off-patent in 2011, it was the best-selling drug of all time, and it continues to be among the most prescribed. How does it work? After we swallow a pill of the stuff, it worms its way into our liver cells and crawls into the active site of a particular enzyme, HMG-CoA reductase. Blocking that site turns down the rate of cholesterol synthesis in the liver, which leads to reduced cholesterol, which leads to saved lives. </p><p>But it is worth remembering that nobody is a <em>willing</em> participant here. Neither HMG-CoA reductase nor the liver is aware of this cholesterol-reduction game that we humans are playing, and both would almost certainly take great offense if alerted to it. 
The statin works not because our biology has agreed to cooperate, but because it was intentionally made to impersonate something else: the thing that HMG-CoA reductase is <em>actually</em> looking for. The impersonator, however, is biochemically incapable of participating in what the reductase wants to do with it. As a result, the therapeutic benefit is achieved: lowered cholesterol. </p><p>Our body never, ever intended for you, <em>you</em> that is, to take any part whatsoever in its maintenance. Our physiologies were built for evolution to handle, and it is only through the tools of evolution that we are allowed to intervene in the process at all. It is entirely by accident that the HMG-CoA reductase active site is available for us to touch, and without it, our body would happily let our arteries choke on their fatty deposits. </p><p>This clearly isn&#8217;t ideal. </p><p>Biology is uniquely limited amongst all scientific fields in that the &#8216;bottom&#8217; of the subject rushes up to meet you very, very fast: the fundamental barriers are our bodies&#8217; presuppositions about what things <em>ought</em> to look like, rather than what is physically possible. Materials scientists, electrical engineers, and mathematicians are not forced to suffer this indignity! Their bottom is the physical laws of the universe. Ours is the pre-existing biological communication networks that evolution could scrounge up given the deadlines it was under, and though it clearly tried to be clever during the process, the results are nowhere near as flexible as I, at least, would want them to be. </p><p>The whole situation feels very claustrophobic. Paternalistic even! Is there any way out? How can we gain more control over our poorly built physiology? Or, in other words, how do we install more <strong>&#8216;knobs of control&#8217;</strong>? </p><p>I had this question too! 
It turns out there are a lot of emerging therapeutic modalities that fit this description, and I decided to turn my research on them into an essay. </p><h1>Examples of new knobs of control</h1><h2>Synthetic cell receptors </h2><p>Here&#8217;s one simple way to install a new knob: stick a new receptor onto your cell membranes, something that responds to a chemical that only <em>you</em>, and not your body, have access to. </p><p>This was the thought process behind the incredibly named &#8216;DREADD&#8217;, or <em>Designer Receptor Exclusively Activated by Designer Drugs</em>, line of research. In the early 2000s, <a href="https://pubmed.ncbi.nlm.nih.gov/25292433/">Bryan Roth&#8217;s lab at UNC Chapel Hill started mutating G-protein coupled receptors on neurons</a> to see if they could make versions that lost all sensitivity to endogenous ligands, but <strong>gained</strong> sensitivity to synthetic ones. They succeeded. Through several rounds of directed evolution, they created receptors that no longer responded to acetylcholine (or any other natural neurotransmitter) but responded potently to clozapine-N-oxide, or CNO. <strong>CNO doesn&#8217;t naturally exist in your body</strong>.</p><p>Your cells don&#8217;t make it, and it doesn&#8217;t bind meaningfully to any endogenous receptor. It&#8217;s a synthetic orphan chemical that, until DREADDs came along, had no biological partner.</p><p>In practice, the system works like this: you use a viral vector to deliver the DREADD gene to whatever cells you want to control. Once the DREADD is expressed on the cell surface, it just sits there, inert. Then you administer CNO, which, ideally, binds only to the DREADDs. When it binds, the receptor activates its coupled G-protein just like any normal GPCR would, doing whatever you engineered the GPCR to do. And, in the original reports, there were no obvious off-target effects, although later work showed that CNO can be converted back into clozapine in the body, which has targets of its own.</p><p><strong>This is unambiguously a new knob</strong>. 
The receptor didn&#8217;t exist in your body before. Its binding partner (CNO) doesn&#8217;t exist endogenously. <strong>As a result, you, and you alone, get to decide when and where it is turned on.</strong> </p><p>That said, one not-new-knob aspect of DREADDs is that they ultimately rely on GPCR-coded logic. Once CNO binds, the receptor couples to endogenous G-proteins that plug into endogenous signaling cascades. <strong>There certainly is a novel </strong><em><strong>input</strong></em><strong>, but the </strong><em><strong>output</strong></em><strong> is still entirely native biology.</strong></p><p>Thankfully, we need not have too much anxiety here. This was addressed in 2016, when Wendell Lim&#8217;s lab at UCSF <a href="https://limlab.ucsf.edu/pdfs/lm_2016.pdf">published a paper in </a><em><a href="https://limlab.ucsf.edu/pdfs/lm_2016.pdf">Cell</a></em><a href="https://limlab.ucsf.edu/pdfs/lm_2016.pdf"> demonstrating a synthetic cell receptor, known as </a><em><a href="https://limlab.ucsf.edu/pdfs/lm_2016.pdf">SynNotch</a></em>, that made both the input <em>and</em> the output programmable. Here, every component was modular and swappable. The extracellular domain could be any desired binding protein: single-chain antibody fragments, nanobodies, designed binding proteins. This determined what the receptor detected. When that sensor domain bound its target, it triggered the release of an intracellular domain. <strong>And that intracellular domain could be just as varied as the extracellular one.</strong> </p><p>Very cool! But where is this all actually useful? </p><p>One unintuitive place is in CAR-T therapy. The original CAR-T cells were programmed with a single receptor that recognized a single antigen on tumor cells, and when they found it, they killed. 
This worked, but it also had a few problems: it costs six figures a dose, it sometimes melts the insides of patients through immune overreaction, and it stops working once the chimeric T cells become exhausted from overactivation. While throwing in something like SynNotch potentially makes the first problem worse, it may actually <em>alleviate</em> the latter two issues. </p><p>To understand how, let&#8217;s first consider what a SynNotch CAR-T would look like. Here, we have crafted an <em>if&#8594;then logic</em> system where we can control the <em>if</em> and we can control the <em>then</em>.</p><p><strong>The &#8220;</strong><em><strong>if</strong></em><strong>&#8221; is the priming antigen, or whatever the SynNotch extracellular domain is tuned to respond to.</strong> This could be a tumor-specific neoantigen, a tissue-specific marker, or really anything you can build a binder for. Once the antigen is bound, the receptor can, as we&#8217;ve discussed, release whatever intracellular domain we like; in our case, that will be a transcription factor. </p><p><strong>The &#8220;</strong><em><strong>then</strong></em><strong>&#8221; is whatever gene you put downstream of the transcription factor.</strong> In the simplest case, that&#8217;s a CAR, but it doesn&#8217;t have to be. You could induce cytokine secretion directly, or expression of a checkpoint inhibitor, or a suicide gene, or all of the above in some combination. </p><p>Why is this useful?</p><p>For the &#8216;<em>sometimes melts the insides of patients</em>&#8217; issue, the problem with basic CAR-T cells is that they are <strong>always</strong> armed. Every cell expressing your target antigen, anywhere in the body, is a potential trigger for activation. When millions of CAR-T cells encounter their target simultaneously, they release a flood of inflammatory cytokines, producing systemic inflammation that can progress to organ failure. 
SynNotch constrains this <em>geographically</em>: if the priming antigen is tumor-localized, then the T cells only arm themselves inside the tumor microenvironment. To be clear, this is not a hypothetical on my end: <a href="https://www.science.org/doi/10.1126/science.aba1624">this actually exists circa 2022!</a></p><p>For the &#8216;<em>exhaustion</em>&#8217; problem, the issue with modern CAR-Ts is that they have a habit of signaling even when there&#8217;s nothing to kill, a phenomenon called <a href="https://pmc.ncbi.nlm.nih.gov/articles/PMC5669999/">tonic signaling</a>. The receptor is always sitting in the membrane, and so many CAR constructs exhibit some baseline activation even without antigen binding. Over days and weeks, this chronic low-level stimulation pushes T cells toward exhaustion, and, by the time they encounter the actual tumor, they may be largely inactive. <strong>SynNotch sidesteps this because there is no CAR until the priming event.</strong> Until the T cells find the tumor-specific antigen that causes them to express the CAR (or whatever else), they stay &#8216;fresh&#8217;. And, again, this is not a clever second-order belief about what <strong>may</strong> happen to SynNotch&#8217;d CAR-T, <a href="https://pmc.ncbi.nlm.nih.gov/articles/PMC8362330/">but an established finding with rather striking results</a> (albeit in mice).</p><p>To be fair, both of these presume a bog-standard CAR as the <em>output</em>, but still, that&#8217;s something that wouldn&#8217;t have been possible without the modularity of the system! But if this is so promising, where are our mutant, hyper-engineered SynNotch&#8217;d CAR-Ts?</p><p>Happily, they have not ended up in some valley of death. <a href="https://neurosciencenews.com/synnotch-cart-glioblastoma-25954/">There are two ongoing phase 1 clinical trials right now</a>, both run by UCSF, to test these types of constructs out in glioblastoma patients. 
Looking forward to seeing the results!</p><h2>Exotic physical sensors</h2><p><a href="https://www.owlposting.com/p/optogenetics-could-change-the-world">We&#8217;ve discussed optogenetics on this blog before</a>, in a post guest-written by <a href="https://www.linkedin.com/in/pelagia-martin-ab0558253/">Pelagia Martin</a>. But optogenetics really belongs to a much broader class of synthetic biology methods that seek to give cells entirely new sensing modalities, ways of perceiving the world that evolution never bothered to install.</p><p>To start off, let&#8217;s re-explain optogenetics, which is perhaps the first ever instantiation of this concept. The basic idea is to take light-sensitive proteins from algae or bacteria (<a href="https://en.wikipedia.org/wiki/Channelrhodopsin">channelrhodopsins</a>, <a href="https://en.wikipedia.org/wiki/Halorhodopsin">halorhodopsins</a>, and their many cousins), stick them into neurons, and then control neural activity with light. Shine blue light, neurons fire. Shine yellow light, neurons silence. Your neurons did not previously respond to light. Now they do!</p><p>What other physical sensing modalities could we force into cells? </p><p>The answer seems to be basically &#8216;anything&#8217;. Mechanosensitive receptors can be installed (sonogenetics), temperature-sensitive receptors can be installed (thermogenetics), even <strong>magnetically sensitive</strong> receptors can be installed (magnetogenetics). All of these work on the same fundamental principle as optogenetics. </p><p>You can do all sorts of interesting things with these. </p><p>With sonogenetics, you can engineer mice whose neurons express mechanosensitive channels, allowing their brains to be modulated via noninvasive ultrasound <a href="https://pmc.ncbi.nlm.nih.gov/articles/PMC10235981/">to affect disease-relevant brain circuitry</a>, which you may recognize as one of the theses of <a href="https://merge.io/blog">Merge Labs</a>. 
As a point of nuance, this isn&#8217;t exactly a fully new knob of control, since <a href="https://en.wikipedia.org/wiki/Mechanosensitive_channels">neurons are already slightly mechanosensitive</a>. That&#8217;s why noninvasive ultrasound already works for modulating unmodified human neurons! But it is <em>slightly</em> new in the sense that engineering new mechanosensitive receptors has a few upsides: only your transfected neurons respond, the engineered channels are more sensitive than endogenous ones, and you can much more reliably choose between excitation and inhibition (typical ultrasound can do both, <a href="https://www.nature.com/articles/s41467-021-22743-7">but it&#8217;s difficult to pick</a>). </p><p>With thermogenetics, you can do something not dissimilar to the SynNotch CAR-T geographic activation, except that instead of activating in the presence of a local antigen, the cells activate only under mild elevations in local temperature. <a href="https://www.biorxiv.org/content/10.1101/2020.04.26.062703v1.full">From a 2020 paper:</a></p><blockquote><p><em>To enable CAR T cells to respond to heat, we construct synthetic thermal gene switches that trigger expression of transgenes in response to mild elevations in local temperature (40&#8211;42 &#176;C) but not to orthogonal cellular stresses such as hypoxia. We show that short pulses of heat (15&#8211;30 min) lead to more than 60-fold increases in gene expression without affecting key T cell functions including proliferation, migration, and cytotoxicity&#8230;<strong>In mouse models of adoptive transfer, photothermal targeting of intratumoral CAR T cells to control the production of an IL-15 superagonist significantly enhances anti-tumor activity and overall survival.</strong></em></p></blockquote><p>But while this is interesting, it feels a bit unsatisfying to repeat the same CAR-T trick again. 
Cell therapies suck for a lot of reasons, and it would be putting a lot of eggs in the same basket if all our &#8216;new knob&#8217; ideas revolved around simply treating cancer better. </p><p>This leads us to magnetogenetics, which is perhaps the most interesting of the bunch, and in fact what drove me to write this article in the first place. </p><p>If you&#8217;re as online as I am, you may remember that in early summer 2024, Andrew York, a scientist at Calico Labs, posted a <a href="https://x.com/AndrewGYork/status/1797408565742776348">Twitter thread that is burned into my mind</a>. It was about his team&#8217;s discovery of MagLOV, which is a fluorescent protein that is also magnetically sensitive. </p><div class="twitter-embed" data-attrs="{&quot;url&quot;:&quot;https://x.com/AndrewGYork/status/1797408565742776348&quot;,&quot;full_text&quot;:&quot;Meet MagLOV: an engineered protein that responds STRONGLY to magnetic fields.\n\nThis is a fluorescence timelapse of MagLOV in E. coli. We're waving a (small) magnet under the plate.\n\nCan you tell where the magnet is?\n\nWant some? It's on Addgene now! 
<a class=\&quot;tweet-url\&quot; href=\&quot;https://www.addgene.org/219957/\&quot;>addgene.org/219957/</a> &quot;,&quot;username&quot;:&quot;AndrewGYork&quot;,&quot;name&quot;:&quot;Andrew York&quot;,&quot;profile_image_url&quot;:&quot;https://pbs.substack.com/profile_images/934291215469514752/4-dXXuOF_normal.jpg&quot;,&quot;date&quot;:&quot;2024-06-02T23:22:50.000Z&quot;,&quot;photos&quot;:[{&quot;img_url&quot;:&quot;https://substackcdn.com/image/upload/w_1028,c_limit,q_auto:best/l_twitter_play_button_rvaygk,w_88/fuxajbybmeruoajqssxn&quot;,&quot;link_url&quot;:&quot;https://t.co/45UVihbQKU&quot;}],&quot;quoted_tweet&quot;:{},&quot;reply_count&quot;:40,&quot;retweet_count&quot;:295,&quot;like_count&quot;:1029,&quot;impression_count&quot;:201441,&quot;expanded_url&quot;:null,&quot;video_url&quot;:&quot;https://video.twimg.com/ext_tw_video/1797407215935983617/pu/vid/avc1/484x558/0GP2SCRwemfYgKtw.mp4?tag=12&quot;,&quot;belowTheFold&quot;:true}" data-component-name="Twitter2ToDOM"></div><p>To understand the significance of this, we need some extra context. Prior to this result, magnetogenetics papers claimed things like "<em>we fused ferritin to an ion channel and now magnetic fields open it</em>." This is problematic, because it is literally physically impossible to do anything useful via this route &#8212; a fact that was explained at length in a great 2016 paper titled <a href="https://elifesciences.org/articles/17210">&#8216;Physical limits to magnetogenetics&#8217;:</a></p><blockquote><p><em><strong>These [above] calculations show that none of the biophysical schemes proposed in these [magnetogenetics]</strong></em><strong> </strong><em><strong>articles is even remotely plausible,</strong> and a few additional proposals were eliminated along the way. The forces or torques or temperatures they produce are too small by many orders of magnitude for the desired effects on molecular orientation or on membrane channels. 
If the phenomena occurred as described, they must rely on some entirely different mechanism. </em></p><p><em>Barring dramatic new discoveries about the structure of biological matter, the proposed routes to magnetogenetics, based on either pulling or heating a ferritin/channel complex with magnetic fields, <strong>have no chance of success.</strong></em></p></blockquote><p>Because of this, the field of magnetogenetics is a bit of a mess, with at least one <a href="https://pmc.ncbi.nlm.nih.gov/articles/PMC7103519/">high-profile failure to replicate</a> in 2020. </p><p>So, what changed? How is MagLOV somehow responsive to magnetic fields? </p><p>Well, for one, MagLOV is not really <em>mechanically</em> responding to magnetic fields in the way those earlier magnetogenetics papers claimed; rather, it alters its own fluorescence in response to a magnetic field. How does it do this? It&#8217;s beyond me, given that the underlying mechanism has something to do with <a href="https://www.biorxiv.org/content/10.1101/2024.11.25.625143v3">&#8216;</a><em><a href="https://www.biorxiv.org/content/10.1101/2024.11.25.625143v3">radical pair mechanisms</a></em><a href="https://www.biorxiv.org/content/10.1101/2024.11.25.625143v3">&#8217; and &#8216;</a><em><a href="https://www.biorxiv.org/content/10.1101/2024.11.25.625143v3">quantum spin dynamics</a></em><a href="https://www.biorxiv.org/content/10.1101/2024.11.25.625143v3">&#8217;,</a> and I am not going to pretend I have any real intuition for either of these. But it does seem reasonable to boil the whole process down to two observations.</p><p>One, introducing a magnetic field alters an ongoing photochemical reaction, which changes the fluorescence of the protein (MagLOV). </p><p>And two, fluorescence can change a protein&#8217;s conformation. 
</p><p>This means that you can alter the distribution of conformational states in a solution of [optosensitive protein fused with MagLOV] by altering nearby magnetic fields. Why is this useful? <strong>Because now we have a way to use the whole tech stack that optogenetics has built up over the last fifteen years, which has largely been squandered because getting light inside the body is really, really hard. </strong></p><p>As of 2026, there is a company formed around this idea: <a href="https://www.nonfictionlaboratories.com/">Nonfiction Labs</a>&#8212;co-founded by Richard Fuisz, who gave a really great <a href="https://www.corememory.com/p/richard-fuisz-nonfictionlabs">Core Memory podcast interview about his work there</a>&#8212;which has developed the first-ever magnetically sensitive antibody, or &#8216;MagBody&#8217;.  </p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!64nM!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc25036b3-93f7-45d6-b2ad-7a5ea7f53d16_2872x1240.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!64nM!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc25036b3-93f7-45d6-b2ad-7a5ea7f53d16_2872x1240.png 424w, https://substackcdn.com/image/fetch/$s_!64nM!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc25036b3-93f7-45d6-b2ad-7a5ea7f53d16_2872x1240.png 848w, https://substackcdn.com/image/fetch/$s_!64nM!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc25036b3-93f7-45d6-b2ad-7a5ea7f53d16_2872x1240.png 1272w, 
https://substackcdn.com/image/fetch/$s_!64nM!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc25036b3-93f7-45d6-b2ad-7a5ea7f53d16_2872x1240.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!64nM!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc25036b3-93f7-45d6-b2ad-7a5ea7f53d16_2872x1240.png" width="724" height="312.77197802197804" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c25036b3-93f7-45d6-b2ad-7a5ea7f53d16_2872x1240.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:629,&quot;width&quot;:1456,&quot;resizeWidth&quot;:724,&quot;bytes&quot;:674687,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.owlposting.com/i/178373718?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc25036b3-93f7-45d6-b2ad-7a5ea7f53d16_2872x1240.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!64nM!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc25036b3-93f7-45d6-b2ad-7a5ea7f53d16_2872x1240.png 424w, https://substackcdn.com/image/fetch/$s_!64nM!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc25036b3-93f7-45d6-b2ad-7a5ea7f53d16_2872x1240.png 848w, 
https://substackcdn.com/image/fetch/$s_!64nM!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc25036b3-93f7-45d6-b2ad-7a5ea7f53d16_2872x1240.png 1272w, https://substackcdn.com/image/fetch/$s_!64nM!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc25036b3-93f7-45d6-b2ad-7a5ea7f53d16_2872x1240.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>You could imagine that a pretty simple application of this work is to do the same cancer tricks as before; only activating a drug at 
specific regions, or even suppressing a drug in especially sensitive areas. <a href="https://www.corememory.com/p/exclusive-fridge-magnet-medicine-nonfiction-labs-cancer">From another Core Memory article covering Nonfiction:</a> </p><blockquote><p><em>HER2 is a useful example. Herceptin, Kadcyla, and Enhertu are all FDA-approved drugs targeting this antigen, commonly found in breast cancers. All produce distinct toxicity because HER2 is also expressed in healthy tissue, particularly the heart. A magnetically controllable HER2 therapy could, in principle, be active at the tumor and silent in tissues prone to damage.</em></p></blockquote><p>But while cancer is neat and all, we should think bigger. Where else would something like this, &#8216;<em>this</em>&#8217; meaning magnetically sensitive proteins, be useful? It is common for new therapeutic modalities to tout their universal applicability, but there is a genuine reason to believe that magnetically sensitive proteins may be able to boast that without accusations of hyperbole. Because it is useful for basically any situation where you want external, non-invasive, temporal control over a protein&#8217;s function inside a living body, <strong>and that&#8217;s a </strong><em><strong>lot</strong></em><strong> of situations.</strong></p><p>As an example of how creative one can get here: consider chronic pain. You may be aware that tools for managing this today are pretty bad. On one end, you have opioids, which work great, but are also systemic, addictive, tolerance-building, and responsible for a <a href="https://www.vox.com/policy-and-politics/2017/6/6/15743986/opioid-epidemic-overdose-deaths-2016">crisis that has killed more Americans than every war since Vietnam combined</a>. On the other end, you have local anesthetics, which are geographically precise, but wear off in hours, require repeated injections, and are often not useful for many types of pain. And in between, you have things like gabapentin. 
Which sort of work, sometimes, for some people, while also making them foggy and fatigued, because, like opioids, they&#8217;re systemically active and nonspecific.</p><p><strong>The core issue, which by now should sound familiar, is that we have no knob to tune how pain-reduction medications work.</strong> Once the analgesic is in you, it does its thing everywhere, at all times, at whatever dose your last pill provided. If there is any knob available, it is a single, coarse, delayed-feedback dial for adjusting drug dosage.</p><p>If the world that Nonfiction Labs hopes to usher in comes to pass, the future may look very different. In this setting, a patient receives a single systemic administration of a magnetically controllable protein that inhibits pain signaling. Maybe it&#8217;s a MagLOV-fused nanobody against a sodium channel like Nav1.7, <a href="https://www.frontiersin.org/journals/pharmacology/articles/10.3389/fphar.2025.1573254/full">which is famously specific to pain sensation</a>. In the absence of a magnetic field, the nanobody is inert, or at least has dramatically reduced binding affinity. Then the patient puts on a wearable device that generates a local magnetic field over the affected area. The field activates the nanobody, which binds Nav1.7, which drops pain signaling. And if the patient needs the pain to return, for, say, a physical therapy appointment, turning the magnet off suffices. </p><p><strong>The patient now has, for the first time in the history of pain medicine, a knob for their own analgesia, one that is spatially and temporally specific in a way that was previously impossible.</strong> </p><p>Of course, many questions must be answered. How long will the MagBody persist in the body? How spatially precise a magnetic field can a portable device generate? Will the nanobodies be affected by external magnetic fields, like those from an MRI scanner? 
But the trajectory here feels fascinating to me, and I look forward to learning more about what Nonfiction ends up doing. </p><h2>Bioorthogonal chemistry </h2><p>What actually happens when we ingest a drug? Say, aspirin. As you may expect, it begins chattering with the other biomolecules in our body. This has some upsides in the sense that interacting with native biology is typically the primary way that drugs exert their therapeutic effect, but the downside is that native biology talks <em>back</em>. Aspirin inhibits the COX-2 enzyme, which is what you want&#8212;less prostaglandin synthesis, less inflammation, less pain&#8212;but it also inhibits the closely related COX-1 enzyme in your platelets, where it helps with clotting, which means that people with bleeding disorders cannot take it. </p><p>Now, fairly, how much demand for aspirin exists amongst hemophiliacs? Probably not much. But the broader point holds: <strong>every drug that works by engaging endogenous biology runs into the inconvenient fact that its target is also expressed in tissues we&#8217;d rather leave alone</strong>. </p><p><strong><a href="https://en.wikipedia.org/wiki/Bioorthogonal_chemistry">Bioorthogonal chemistry</a></strong> <strong>offers a way out of this headache</strong>. The idea, which won the 2022 Nobel Prize in Chemistry, is defined as &#8216;<em>any chemical reaction that can occur inside of living systems without interfering with native biochemical processes</em>&#8217;. </p><p>Well, wait a minute. Attempting to deviate from our plumbing <em>entirely</em> seems like a slightly contrived problem, no? If the ultimate purpose of any of these systems is to <em>eventually</em> interact with our native biology, why would we care about anything that <em>doesn&#8217;t</em>? Let&#8217;s suspend disbelief for the moment; I&#8217;ll explain the actual utility of solving the problem later. For now, let&#8217;s assume it&#8217;d be useful. </p><p>How would you do this so-called bioorthogonal chemistry? 
</p><p>Well, the 2022 Nobel Prize was not awarded for bioorthogonal chemistry alone; it was shared with an adjacent idea called &#8216;<em><a href="https://en.wikipedia.org/wiki/Click_chemistry">click chemistry</a></em>&#8217;. Click chemistry is a much broader concept and has to do with designing chemical reactions that are modular, high-yielding, and work reliably under mild conditions. And as it turns out, the most therapeutically relevant click chemistry reactions happen to also be bioorthogonal. The two concepts are deeply intertwined, which is why three researchers&#8212;chemists and biologists alike&#8212;shared the prize.</p><p>For our purposes, there is one particularly important reaction class here: the &#8216;<em>inverse electron-demand Diels-Alder reaction between a tetrazine and a trans-cyclooctene (TCO)</em>&#8217;. </p><p>&#8216;<em>What is that?</em>&#8217;, you may ask. I&#8217;m not quite sure, but all you really need to understand is three things: neither tetrazine nor TCO reacts with anything in the human body, they react incredibly fast with one another, and the byproducts of their reactions are entirely innocuous (nitrogen and carbon dioxide). Ah, and I forgot the most important thing: both the tetrazine and TCO are very amenable to having external molecules bolted onto <em>them</em>, which, given some clever chemical engineering, would fall off after the (tetrazine x TCO) reaction occurs. </p><p>Do you see the therapeutic relevance here? </p><p><strong>The utility of bioorthogonal chemistry is to create better </strong><em><strong>prodrugs</strong></em><strong>.</strong> </p><p>Prodrugs are defined as anything pharmacologically inert that, upon being introduced to the body, will undergo some form of chemical rearrangement (cleavage, addition of groups, and so on) that converts it into an active drug. 
They aren&#8217;t new either; the first prodrug appeared a century back, and, circa 2018, <a href="https://pubmed.ncbi.nlm.nih.gov/29700501/">12% of all approved small molecule therapeutics are prodrugs</a>. What actually triggers a given prodrug&#8217;s conversion into something biologically active is heterogeneous, but can be grouped into one of a few categories: metabolism (e.g. phosphorylation), pH (e.g. acidic environments), or interaction with endogenous biomolecules (e.g. <a href="https://pmc.ncbi.nlm.nih.gov/articles/PMC11784115/">trypsin</a>). </p><p><strong><a href="https://www.nature.com/articles/s41573-024-00914-7.epdf?sharing_token=BLYHI8BEWwad8EDmQmjH-NRgN0jAjWel9jnR3ZoTv0Mwlv56StAaOaz9S1b32Hf4vecu6CNvzIM6DEwqcEsri5twgpuCeNrcHbv2PNjRzkz5Vyx0kgzROXcFppr3l6T8JoEDFqHsQBueYxh-lc8aMUe4CKAbbOoyhWOhQBbcZKs%3D">Importantly, whatever actually causes the conversion is typically the answer for why you&#8217;d use a prodrug at all</a>.</strong> If activation depends on liver metabolism, it usually means the prodrug form survives the digestive tract better than the active drug would, buying you oral bioavailability. If activation depends on an enzyme enriched in a specific tissue, it means you get some geographic targeting for free. If activation depends on acidic pH, it means you&#8217;re exploiting the metabolic quirks of a particular tissue microenvironment to concentrate drug activity there. The trigger is the therapeutic logic.</p><p>The problem is that none of these triggers are <em>yours</em>. They are all endogenous, which means they are all leaky, which means you are playing a statistical game of relative improvement on a particular axis, not absolute. </p><p><strong>The proposal for bioorthogonal chemistry prodrugs is that they offer </strong><em><strong>absolute</strong></em><strong> control; a prodrug will become active exactly where and when you want it to.</strong> </p><p>How? 
</p><p>Consider <a href="https://en.wikipedia.org/wiki/Doxorubicin">doxorubicin</a>, one of the most relied-upon chemotherapeutics ever developed, part of a fairly high number of cancer treatment regimens, and also one of the most unpleasant. It is one of the few drugs, cancer or otherwise, with a foreboding nickname: &#8216;Red Devil&#8217;. Perhaps accordingly, doxorubicin is extremely cardiotoxic. Alongside causing you nausea, hair loss, and immunosuppression, it will, at some point, likely be responsible for giving you irreversible heart damage. The game of chemotherapy has always been one of hoping it kills the cancer before it kills you, and it is empirically the case that administration of the 'Red Devil' results in clinical heart failure in somewhere between<a href="https://www.nature.com/articles/s41514-024-00135-7"> 5% and 26% of patients</a>, increasing based on the dose, and that if you are unlucky enough to be in that group, <a href="https://pmc.ncbi.nlm.nih.gov/articles/PMC2848530/">you face roughly 50% mortality within a year</a>. In fact, there are <a href="https://pubmed.ncbi.nlm.nih.gov/291473/">case reports of patients dying of cardiac arrhythmias </a><em><a href="https://pubmed.ncbi.nlm.nih.gov/291473/">during the infusion itself. </a></em></p><p>Now imagine a version of doxorubicin that has been linked with TCO. </p><p>In this form, the active site of doxorubicin is capped, so the drug cannot interact with anything, including your heart. Meanwhile, at the tumor site, you have arranged for a tetrazine to be waiting. How? The simplest version, and the one furthest along clinically, is almost comically literal: you inject a tetrazine-modified biopolymer directly into the tumor. The (TCO x doxorubicin) conjugate is then administered systemically, circulates through the body, eventually stumbling onto the tetrazine deposit you made. The two react, and active doxorubicin is released locally, <strong>only at the site of the tumor</strong>. 
</p><p>Happily, we needn&#8217;t merely imagine this. In June 2025, the company <a href="https://www.shasqi.com/">Shasqi</a> published the results of a <a href="https://pubmed.ncbi.nlm.nih.gov/40522144/">first-in-human Phase 1 clinical trial in </a><em><a href="https://pubmed.ncbi.nlm.nih.gov/40522144/">Clinical Cancer Research</a></em>, <strong>the first time bioorthogonal chemistry had ever been used therapeutically inside a human being</strong>. Patients with advanced solid tumors received up to fifteen-fold the conventional doxorubicin dose as prodrug, and no dose-limiting toxicities were reported. </p><p>Given the theory we&#8217;ve established here, you may imagine that this could be <em>the</em> cure for cancer. Chemotherapy works extremely well, and the only reason it can&#8217;t work even better is that it also works very well at killing healthy cells. So, if bioorthogonal chemistry chemotherapy offers us a way to nearly perfectly prevent off-target effects, can&#8217;t we just fill someone up to the brim with it and call it a day? </p><p>Unfortunately, no, at least according to the trial results. <strong>Fifteen-fold the dose does not mean fifteen-fold the active drug at the tumor.</strong> At a certain point, the biopolymer tetrazine&#8217;s ability to capture circulating prodrug saturates, and (TCO x doxorubicin) that doesn&#8217;t react will simply drift through the body inertly. And while the acute safety data looked clean, doxorubicin cardiotoxicity is notorious for appearing years after treatment (<a href="https://pmc.ncbi.nlm.nih.gov/articles/PMC3368447/">in one case, 17 years later!</a>), so the long-term picture remains unknown. And most importantly, <strong>no objective tumor responses were observed, </strong>only stable disease rates that are not out of line with typical levels. 
Now, fairly, this trial was on a heavily pretreated, refractory population, so while stable disease is not meaningless, it is a long way from a cure.</p><p><strong>But there is a proof of concept here.</strong> And you could imagine that an easy way to improve on it is, instead of relying on a human-legible injection site for a tetrazine polymer to sit at, to attach the tetrazine to something that finds the tumor on its own&#8212;which may vastly increase the &#8216;surface area&#8217; for the (TCO x chemotherapy) conjugate to react with. </p><p>And this is exactly what the other company in the bioorthogonal chemistry space, <a href="https://www.tagworkspharma.com/">Tagworks Pharmaceuticals</a>, does. The Tagworks thesis is simple: conjugate a TCO-linked payload to an antibody meant to bind to tumor markers, inject it systemically, and allow it to park itself within a tumor. Then, some time later, administer a tetrazine trigger intravenously, which reacts with the TCO on the tumor-bound antibody and releases the payload right there in the microenvironment. Their lead program, <a href="https://www.tagworkspharma.com/tagworks-fda-clearance-phase1-tgw101-keith-orford-cmo">TGW101</a>, targets TAG-72&#8212;a marker on solid tumor cells&#8212;and entered a Phase 1 dose-escalation trial in 2025.</p><p>But we&#8217;re getting trapped in the cancer bubble again. Where else can bioorthogonal chemistry theoretically be used?</p><p>Similarly to MagBodies: anywhere you want geographic or temporal precision in drug activity, which is essentially everywhere. I&#8217;ll leave an exact definition as an exercise for the reader, but one hint is immunosuppression in autoimmune conditions. Say, joint pain? Could a tetrazine scaffold be injected into a <a href="https://en.wikipedia.org/wiki/Synovial_joint">synovial cavity</a>, and a TCO bearing the immunosuppressant be administered systemically? Of course, who knows whether this would have any advantages over standard of care, but it&#8217;s a fun idea! 
</p><h1>The future</h1><p>We began this essay with a complaint: that biology is uniquely claustrophobic amongst the sciences, that the floor rushes up to meet you, that the barriers are not physical law but evolutionary happenstance. All of this is true. But there is also an upside here. The very thing that makes biology so frustrating to work with is also what makes it so astonishingly <em>extensible</em>. The sections above are, I think, the very earliest results of what happens when clever people notice this.</p><p>What feels particularly interesting about the modalities discussed here is that none of them feels like it could have emerged from within a single field. To pull off something like <em>bioorthogonal-chemistry-for-tumors</em>, you would&#8217;ve needed a physical chemist&#8217;s knowledge of click chemistry, a medicinal chemist to design the prodrug linkage, and an oncologist to understand where such an innovation is best deployed. The low-hanging fruit of simple binders has been largely picked, and the drug discovery field at large has been increasingly eyeing stranger modalities such as <a href="https://en.wikipedia.org/wiki/Proteolysis_targeting_chimera">PROTACs</a> and <a href="https://en.wikipedia.org/wiki/Allosteric_modulator">allosteric modulators</a>, both of which require rethinking what a &#8220;drug&#8221; even looks like. And the modalities in this essay are stranger still.</p><p>The discovery of more like these may depend less on searching known chemical space faster and more on the kind of lateral, cross-domain synthesis that has historically been bottlenecked by the simple fact that very few people are deep in so many fields at once. I don&#8217;t want to plug LLMs into an article where there&#8217;s really no need to do it, but it&#8217;s tough not to think of the potential here. This interdisciplinary bottleneck is loosening fast! It is exciting to think about what else is on the horizon. 
</p>]]></content:encoded></item><item><title><![CDATA[Reasons to be pessimistic (and optimistic) on the future of biosecurity]]></title><description><![CDATA[13.2k words, 59 minutes reading time]]></description><link>https://www.owlposting.com/p/reasons-to-be-pessimistic-and-optimistic</link><guid isPermaLink="false">https://www.owlposting.com/p/reasons-to-be-pessimistic-and-optimistic</guid><dc:creator><![CDATA[Abhishaike Mahajan]]></dc:creator><pubDate>Mon, 16 Mar 2026 15:25:15 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!eKaJ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F125e23ad-3ec0-47b3-bede-34bdc51e7203_2912x1632.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!eKaJ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F125e23ad-3ec0-47b3-bede-34bdc51e7203_2912x1632.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!eKaJ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F125e23ad-3ec0-47b3-bede-34bdc51e7203_2912x1632.png 424w, https://substackcdn.com/image/fetch/$s_!eKaJ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F125e23ad-3ec0-47b3-bede-34bdc51e7203_2912x1632.png 848w, https://substackcdn.com/image/fetch/$s_!eKaJ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F125e23ad-3ec0-47b3-bede-34bdc51e7203_2912x1632.png 1272w, 
https://substackcdn.com/image/fetch/$s_!eKaJ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F125e23ad-3ec0-47b3-bede-34bdc51e7203_2912x1632.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!eKaJ!,w_2400,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F125e23ad-3ec0-47b3-bede-34bdc51e7203_2912x1632.png" width="1200" height="672.5274725274726" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/125e23ad-3ec0-47b3-bede-34bdc51e7203_2912x1632.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;large&quot;,&quot;height&quot;:816,&quot;width&quot;:1456,&quot;resizeWidth&quot;:1200,&quot;bytes&quot;:7091815,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.owlposting.com/i/145813239?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F125e23ad-3ec0-47b3-bede-34bdc51e7203_2912x1632.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:&quot;center&quot;,&quot;offset&quot;:false}" class="sizing-large" alt="" srcset="https://substackcdn.com/image/fetch/$s_!eKaJ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F125e23ad-3ec0-47b3-bede-34bdc51e7203_2912x1632.png 424w, https://substackcdn.com/image/fetch/$s_!eKaJ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F125e23ad-3ec0-47b3-bede-34bdc51e7203_2912x1632.png 848w, 
https://substackcdn.com/image/fetch/$s_!eKaJ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F125e23ad-3ec0-47b3-bede-34bdc51e7203_2912x1632.png 1272w, https://substackcdn.com/image/fetch/$s_!eKaJ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F125e23ad-3ec0-47b3-bede-34bdc51e7203_2912x1632.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p><em>Note: this essay required conversations with a lot of people. 
I&#8217;d like to thank<a href="https://www.linkedin.com/in/patrick-boyle-a790094a"> Patrick Boyle</a> (ex-CSO of Ginkgo Bioworks),<a href="https://www.harm0n.com/"> Harmon Bhasin</a> (founder of a stealth biosecurity startup),<a href="https://www.linkedin.com/in/bryanlehrer/"> Bryan Lehrer</a> (ex-Blueprint Biosecurity),<a href="https://vgel.me/"> Theia Vogel</a> (ex-SecureDNA),<a href="https://ifp.org/author/jake-swett/"> Jacob Swett</a> (founder of Blueprint Biosecurity),<a href="https://www.mitre.org/who-we-are/our-people/matthew-watson"> Matt Watson</a> (ex-MITRE),<a href="https://sentinelbio.org/people/"> Janika Schmitt</a> (Program Officer at Sentinel Bio),<a href="https://www.linkedin.com/in/hmusu/"> Harshu Musunuri</a> (PhD student at UCSF),<a href="https://www.linkedin.com/in/liyam-chitayat-b92973160/"> Liyam Chitayat</a> (PhD student at MIT),<a href="https://www.linkedin.com/in/jakeradler/"> Jake Adler</a> (founder of Pilgrim Labs),<a href="https://www.linkedin.com/in/dianzhuowang/"> Dianzhuo (John) Wang</a> (PhD student at Harvard),<a href="https://www.linkedin.com/in/jassipannu/"> Jassi Pannu</a> (Assistant Professor at Johns Hopkins),<a href="https://www.linkedin.com/in/charliepetty"> Charlie Petty</a> (many biosecurity-related positions),<a href="https://nishy.business/"> Nish Bhat</a> (VC at Carbon Silicon Ventures),<a href="https://www.linkedin.com/in/sarahcarter/"> Sarah Carter</a> (Senior Advisor at Federation of American Scientists), and<a href="https://www.linkedin.com/in/james-black-b98939217/?originalSubdomain=uk"> James Black</a> (Scholar at Johns Hopkins) for speaking with me. All opinions in this essay are my own.</em></p><p><em>Second note: This essay is very long. 
While it can be read from top-to-bottom&#8212;and is written assuming you will&#8212;you will lose little by simply choosing specific sections you find interesting and reading only those.</em></p><div><hr></div><ol><li><p><a href="https://www.owlposting.com/i/145813239/introduction">Introduction</a></p></li><li><p><a href="https://www.owlposting.com/i/145813239/some-thoughts">Some thoughts</a></p><ol><li><p><a href="https://www.owlposting.com/i/145813239/the-business-case-for-biosecurity-requires-another-pandemic-for-it-to-work">The business case for biosecurity requires another pandemic for it to work</a></p></li><li><p><a href="https://www.owlposting.com/i/145813239/the-preventative-architecture-assumes-a-chokepoint-thats-disappearing">The screening architecture assumes a chokepoint that&#8217;s disappearing</a></p></li><li><p><a href="https://www.owlposting.com/i/145813239/targeting-humans-with-bioweapons-is-probably-genuinely-difficult">Targeting humans with bioweapons is (probably) genuinely difficult</a></p></li><li><p><a href="https://www.owlposting.com/i/145813239/agricultural-bioterrorism-is-probably-really-easy">Agricultural bioterrorism is (probably) really easy</a></p></li><li><p><a href="https://www.owlposting.com/i/145813239/the-monitoring-architecture-is-useful-for-detection-but-not-defense">The monitoring architecture is useful for detection, but not defense</a></p></li><li><p><a href="https://www.owlposting.com/i/145813239/machine-learning-may-be-very-useful-for-rapid-response-therapeutics">Machine learning may be very useful for rapid-response therapeutics</a></p></li><li><p><a href="https://www.owlposting.com/i/145813239/pathogen-agnostic-defenses-are-extraordinary-but-who-pays-for-it">Pathogen-agnostic defenses are extraordinary, but who pays for it?</a></p></li></ol></li><li><p><a href="https://www.owlposting.com/i/145813239/conclusion">Conclusion</a></p></li></ol><h1><strong>Introduction</strong></h1><p>It is easy to scare yourself about 
biosecurity, and it is getting easier every day. Everyone has their moment when the fear first crept into their throat. Mine was when I read the article titled &#8216;<em><a href="https://substack.com/home/post/p-161981504?utm_campaign=post&amp;utm_medium=web">AIs can provide expert-level virology assistance</a></em>&#8217;, which found that LLMs&#8212;even ones as ancient as Gemini 1.5 Pro&#8212;are more than capable of happily providing the knowledge needed to debug BSL-4-sounding questions about wet-lab experiments.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!cqBf!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4350a8cf-9f91-4a9c-a7da-0fe0070dc0a7_1290x1320.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!cqBf!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4350a8cf-9f91-4a9c-a7da-0fe0070dc0a7_1290x1320.jpeg 424w, https://substackcdn.com/image/fetch/$s_!cqBf!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4350a8cf-9f91-4a9c-a7da-0fe0070dc0a7_1290x1320.jpeg 848w, https://substackcdn.com/image/fetch/$s_!cqBf!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4350a8cf-9f91-4a9c-a7da-0fe0070dc0a7_1290x1320.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!cqBf!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4350a8cf-9f91-4a9c-a7da-0fe0070dc0a7_1290x1320.jpeg 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!cqBf!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4350a8cf-9f91-4a9c-a7da-0fe0070dc0a7_1290x1320.jpeg" width="623" height="637.4883720930233" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4350a8cf-9f91-4a9c-a7da-0fe0070dc0a7_1290x1320.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1320,&quot;width&quot;:1290,&quot;resizeWidth&quot;:623,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!cqBf!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4350a8cf-9f91-4a9c-a7da-0fe0070dc0a7_1290x1320.jpeg 424w, https://substackcdn.com/image/fetch/$s_!cqBf!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4350a8cf-9f91-4a9c-a7da-0fe0070dc0a7_1290x1320.jpeg 848w, https://substackcdn.com/image/fetch/$s_!cqBf!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4350a8cf-9f91-4a9c-a7da-0fe0070dc0a7_1290x1320.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!cqBf!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4350a8cf-9f91-4a9c-a7da-0fe0070dc0a7_1290x1320.jpeg 1456w" sizes="100vw"></picture>
</div></a></figure></div><p>As with any paranoia worth having, there are good objections to it. </p><p>Most recently, the non-profit<a href="https://activesite.bio/"> Active Site</a> published the largest<a href="https://arxiv.org/pdf/2602.16703"> randomized controlled trial of its kind</a>&#8212;153 novices, 8 weeks, a BSL-2 lab in Cambridge&#8212;studying how much &#8216;uplift&#8217; access to frontier LLMs (Opus 4, o3, Gemini 2.5, all with safety classifiers <em>off</em>) gave participants in performing a set of viral genetics workflows (including virus production), compared to internet access alone. Their conclusion: &#8216;<em>We observed no significant difference in the primary endpoint of workflow completion (5.2% LLM vs. 
6.6% Internet; P = 0.759), nor in the success rate of individual tasks</em>&#8217;, with the caveat that the LLM group had numerically higher success rates on 4 out of 5 tasks, just not by enough to reach statistical significance. <strong>YouTube, not the LLMs, was rated most helpful by both groups.</strong> </p><p>So, while frontier models are theoretically capable of providing virology assistance, it doesn&#8217;t immediately seem like they can bootstrap someone into wet-lab competence; the hands are still the hard part. There are counterpoints to this as well, like &#8216;<em>LLMs probably help non-novices a lot!&#8217; </em>and &#8216;<em>the study is underpowered!&#8217; </em>and so on. I agree with some of these. The truth is almost certainly somewhere in the middle: LLMs can help a novice with wet-lab work, but they don&#8217;t help an infinite amount.</p><p><strong>Yet I still believe it is hard to actually turn all this into something evil.</strong> And no, I do not think that gesturing towards &#8216;automated labs&#8217; is a good counterargument. Doing things in the world of atoms is difficult. Especially here. Why?<a href="https://www.owlposting.com/p/heuristics-for-lab-robotics-and-where"> Didn&#8217;t I just write a month back about how cloud labs are the final end-state of lab automation plays</a>, so can&#8217;t they be hacked into doing something ulterior? Man, maybe. But you should consider the fact that these cloud labs are, at the moment, barely functional enough to do the things their paying customers want them to do, let alone serve as unwitting accomplices in a bioterror plot. Yes, they will improve, but their improvement is on a <em>very</em> jagged frontier. Liquid handler automation is going splendidly. Automation of liter-scale creation, purification, and aerosolization of BSL-4 substances is not going so splendidly. 
Also, even in the case where automation suddenly accelerates, it is almost certainly <em>not</em> economically viable for these labs to care about servicing the likely small consumer market of &#8216;large-scale non-therapeutic virus creation&#8217;.</p><p>I&#8217;ll discuss this more deeply in the upcoming sections, but it seems that something as ambitious as bioweapon creation will likely be extremely annoying to do for the foreseeable future, and I am consistently on the side that only a well-funded actor would be capable of such a thing. And why wouldn&#8217;t those actors opt for much simpler acts of violence that would roughly accomplish the same thing?</p><p>This all said: I sympathize with the bioterrorism-phobia that is sweeping my simcluster. If you stare for long enough at the AI trendlines, and also observe the increasingly WW3-y vibes that the world is emanating, it is difficult to not feel at least some worry. Maybe a genuine bioterrorism incident is not too far away. And maybe it will be far, far worse than anyone can imagine.</p><p>Or maybe not. Biosecurity is one of those topics that can either feel extraordinarily bleak in its prognosis, or like things are obviously going to be fine. As with many things in the world, I think both sides have a bit of a point, and I think holding them in tension is the only honest way to consider how the future may go. In this essay, I&#8217;ll share some of my own thoughts on the field at large, and the specific themes that arose in my discussions with people.</p><h1><strong>Some thoughts</strong></h1><h2><strong>The business case for biosecurity requires another pandemic for it to work</strong></h2><p>As with all problems that, if not solved, may lead to the depopulation of the planet, we can depend on venture capitalists to search for a market opportunity. 
A few companies have emerged in the last few months as the vanguard of this effort:<a href="https://valthos.com/"> Valthos</a> ($30M Series A for being the Palantir of biosecurity),<a href="https://www.redqueen.bio/"> Red Queen Bio</a> ($15M seed for designing therapeutics against bioterrorism threats), and<a href="https://www.aclid.bio/"> Aclid</a> ($4M seed for DNA synthesis screening infrastructure). There are others too, but we&#8217;ll stick with these for now for illustration purposes.</p><p>I have zero doubt that these companies, or something akin to them, are worth having around. What I cannot quite figure out is the business model. The usual answer to the &#8216;<em>who pays for this</em>?&#8217; question in these sorts of public-goods situations is government agencies: BARDA, DoD, DHS, CDC, and so on. This is not such a bad idea.</p><p><a href="https://www.astho.org/advocacy/federal-government-affairs/leg-alerts/2025/white-house-releases-additional-fy26-budget-materials/">Let&#8217;s take a look at the United States&#8217; 2026 budget proposal for the biodefense-adjacent areas</a> to get a sense of these agencies&#8217; funding.</p><p><em>In the proposal, BARDA is being cut by $361 million, a roughly 36% decrease from its prior level. Project BioShield, the program that actually buys finished countermeasures, is on track to lose $100 million. The CDC budget is halved, a loss of around $5.4 billion. DTRA is down $150 million.<a href="https://councilonstrategicrisks.org/2025/09/04/mixed-signals-on-biodefense-in-trumps-fy26-budget-request/"> One article analyzing the many other biodefense cuts in more depth had this to say about it:</a></em></p><blockquote><p><em>With the Trump Administration&#8217;s priority of reducing federal spending, the funds requested for biodefense have been significantly reduced. 
<strong>Very few biodefense programs saw increases in their funding or even a continuation at their previous funding levels.</strong></em></p></blockquote><p>How about the Department of Defense (War)? Mixed picture: while the overall budget of the department was increased, the bio-adjacent programs within it saw a drop.</p><blockquote><p><em>One notable example is the Defense Threat Reduction Agency (DTRA), a key government agency that prevents and mitigates deliberate biological threats to the US globally, for which the PBR requests<a href="https://comptroller.defense.gov/Portals/45/Documents/defbudget/FY2026/budget_justification/pdfs/01_Operation_and_Maintenance/O_M_VOL_1_PART_1/DTRA_OP-5.pdf"> $708 million</a>. This is $150 million less than the $858 million requested in FY25. Similarly, the $1.61 billion FY26 request for the<a href="https://comptroller.defense.gov/Portals/45/Documents/defbudget/FY2026/budget_justification/pdfs/02_Procurement/PROC_CBDP_PB_2026.pdf"> DoD Chemical and Biological Defense Program</a> is $46 million less than requested for FY25.</em></p></blockquote><p>In other words: the agencies that would theoretically buy tools from, say, Valthos, are the same agencies that the current administration is intending to either gut or barely increase the budget of.</p><p>There is good news: this budget did not come to pass.</p><p>Congress rejected nearly every one of the proposals: the CDC&#8217;s budget was not reduced, while BARDA, Project BioShield, and NIH&#8217;s budget actually slightly increased. There is one unfortunate budget stain&#8212;<a href="https://www.bbc.com/news/articles/c74dzdddvmjo">Kennedy pulling $500 million from a BARDA program developing mRNA vaccines against various respiratory viruses</a>&#8212;but things overall turned out fine, though I cannot find specific numbers on how things fared on the DoD end. But it is a little worrying that the administration is not particularly sympathetic to biosecurity concerns. Why? 
Because if your primary customer is prone to wild financial swings, and it is only thanks to the grace of Congress that those sentiments are not actively reflected in its budget, it almost certainly hurts these companies&#8217; capacity to plan for the future.</p><p>Earlier I mentioned that Valthos intends to be the Palantir for biosecurity. This is not a presumption on my end; they have basically said this. The CEO (<a href="https://www.linkedin.com/in/kmcmahon320/">Kathleen McMahon</a>) is an alum of Palantir, and has stated that Valthos <a href="https://www.bloomberg.com/news/articles/2025-10-24/openai-backs-a-new-venture-trying-to-thwart-ai-bio-attacks">plans to apply</a> &#8220;<em>many of the same principles she learned at Palantir, about working with officials as well as commercial customers</em>&#8221;. <strong>But an easy counterargument to this is that Palantir&#8217;s government business was built during the post-9/11 spending surge, when homeland security funding went from $16 billion to $69 billion.</strong> Biodefense is holding steady, for now, but not seeing the same dramatic jumps.</p><p>You could imagine that a pretty simple steelman against these objections is not dissimilar to the usual AI-wrapper-SaaS advice people give: <strong>build not for where the models are today, but where they are going.</strong> And if you trust the trend-lines, it is not inconceivable that there is a catalyzing event in our near future&#8212;a genuine, bona-fide bioterror incident&#8212;which will unlock massive government spending the way 9/11 created the entire homeland security industry overnight. In this setting, the companies that already have working products and government relationships when that moment arrives will be the Palantir of biosecurity. The ones that don&#8217;t will be too late.</p><p>The game, then, is to survive until this catalyzing event occurs. 
If it does, Valthos may be able to gobble up all the government contracts it wants, Red Queen Bio may find the DoD suddenly desperate to fund therapeutics platforms that have a biosecurity veneer to them, and Aclid may discover that its few dozen synthesis company customers grow to have even stricter compliance requirements. If it doesn&#8217;t, it is tough to imagine that these companies don&#8217;t go either bankrupt or stay growth-capped. Because of this, you shouldn&#8217;t be surprised at all that these companies acquired the funding that they did! <strong>The game of venture capital is to play &#8216;</strong><em><strong>big if true</strong></em><strong>&#8217; bets, continuously, forever, and few areas are as well-shaped to that as biosecurity.</strong></p><p>Well, maybe. You could argue that the SARS-CoV-2 virus maybe couldn&#8217;t be <em>the</em> catalyzing incident for the government, since it is still unclear whether it was a lab-leak or not, but what about the 2001 anthrax attacks? How come that didn&#8217;t spur a massive amount of increased federal biodefense funding? In fact, it did. Total US biodefense funding jumped from roughly <a href="https://pmc.ncbi.nlm.nih.gov/articles/PMC8364771/">$700 million in 2001 to about $4 billion in 2002, peaking at nearly $8 billion in 2005</a>. What was the money used on? A fairly large chunk was put into anthrax-specific [stuff]. As a case study, consider Emergent BioSolutions, the sole producer behind &#8216;BioThrax&#8217;, the only FDA-approved anthrax vaccine. <a href="https://www.fiercepharma.com/vaccines/u-s-government-instills-1b-confidence-emergent-biosolutions-anthrax-vax">They received</a>: one $1.6B contract for a second-generation anthrax vaccine, one $1.25B five-year contract for delivering 44.75 million doses of an older vaccine candidate, followed by a $911 million CDC contract for another 29.4 million doses. 
<a href="https://www.cnbc.com/2021/04/20/congressional-investigation-launched-into-emergent-biosolutions-federal-vaccine-contracts-.html">A 2021 Congressional investigation found that</a>, for the past decade, nearly half of the Strategic National Stockpile&#8217;s budget had gone to purchasing this anthrax vaccine, a product whose price had been raised 800% since 1998. And it is still ongoing, <a href="https://www.globenewswire.com/news-release/2026/01/08/3215816/33240/en/Emergent-BioSolutions-Receives-Delivery-Order-up-to-21-5-Million-to-Supply-BioThrax-Anthrax-Vaccine-Adsorbed-to-the-U-S-Department-of-War-in-2026.html">with a $21.5 million delivery order to the Department of War issued as recently as January 2026.</a></p><p>This is, on some level, completely understandable. You prepare for the thing that just happened to you. But it should make us a little nervous about the &#8220;catalyzing event&#8221; thesis for biosecurity companies, because the empirical reality is that such an event may not unlock general biodefense spending so much as lock in countermeasures that are overly anchored on the specific threat and threat vector of that particular incident.</p><p>So, perhaps it is worth exploring outside of this customer base. While governments are the biggest buyer, they surely are not the only one. After all, didn&#8217;t Kathleen&#8217;s comment mention commercial buyers too? There is another group on the table: DNA synthesis companies. A fairly high fraction of the current biosecurity framework rests on a pretty simple idea: that biological information must pass through synthesis companies to become biological reality. To actually make a [thing], you need physical DNA, and to get physical DNA, you order it from a commercial provider. Why not create a layer to screen the DNA being ordered, ensuring that whatever it is, it isn&#8217;t dangerous? 
This is, as previously mentioned, Aclid&#8217;s business model, alongside<a href="https://www.twentytwo.bio/"> TwentyTwo</a>,<a href="https://securedna.org/"> SecureDNA</a> (a non-profit), and likely others.</p><p>How is that going?</p><h2><strong>The preventative architecture assumes a chokepoint that&#8217;s disappearing</strong></h2><p>There seem to be three problems with DNA-screening-as-biosecurity.</p><p><strong>The first is that screening only works if you&#8217;re ordering sequences long enough to screen.</strong> According to<a href="https://aspr.hhs.gov/S3/Pages/Synthetic-Nucleic-Acid-Screening.aspx"> HHS guidelines, the current screening threshold is 50 nucleotides,</a> but oligonucleotides&#8212;short DNA fragments often used in legitimate research&#8212;can be ordered, assembled, and stitched together into longer sequences. This is not theoretical. <a href="https://www.science.org/content/article/how-canadian-researchers-reconstituted-extinct-poxvirus-100000-using-mail-order-dna">In 2018, Canadian researchers synthesized a functional horsepox virus from mail-ordered DNA fragments for about $100,000</a>. To be fair, this is annoying to do, but a sufficiently dedicated adversary may be happy to do annoying things.</p><p><strong>The second is that screening assumes you&#8217;re looking for </strong><em><strong>known</strong></em><strong> threats, which is to say, sequences with similarities to characterized pathogens.</strong> But if AI biological design tools enable the creation of <em>de novo</em> pathogens, things that don&#8217;t have a match in any database because they&#8217;ve never existed before, then the screening becomes useless. 
And you needn&#8217;t even hop your way to truly <em>de novo</em> stuff; you could just redesign the existing bad pathogens in ways that make them invisible to screening tools.<a href="https://www.microsoft.com/en-us/research/story/the-paraphrase-project-designing-defense-for-an-era-of-synthetic-biology/"> For example, Microsoft has a &#8220;paraphrasing&#8221; paper that did exactly this</a>, redesigning known, toxic proteins in ways that evade sequence-based screening while preserving function. To counter this, you&#8217;d need to predict function from sequence alone, which is one of the hardest open problems in the field, especially because &#8216;function&#8217; in biology is one of those super fuzzy, contextual words that can have a bunch of different meanings. It is certainly possible to do&#8212;<a href="https://www.youtube.com/watch?v=w6L9-ySnxZI">see the podcast I did with Yunha Hwang, an MIT professor creating tools to automatically annotate the function of metagenomes</a>&#8212;but it&#8217;s not easy.</p><p><strong>The third problem is the biggest, and it is that</strong> <strong>benchtop DNA synthesizers are getting longer-range. </strong>In other words, you could neatly side-step all these screening checks by buying your own DNA-creation machine, and running synthesis in your bedroom. Right now, the best commercially available benchtop synthesizers top out at about 120 base pairs per well, which, given that real viruses are on the order of dozens of kilobases, means we&#8217;re safe for now. But there is no fundamental reason they cannot get better. In fact, according<a href="https://ifp.org/securing-benchtop-dna-synthesizers/"> to a fantastic Institute for Progress (IFP) report</a>, it&#8217;s just around the corner. Enzymatic (as opposed to chemical) DNA synthesis is likely less than a decade off, comfortably pushing DNA synthesis capabilities to the kilobase realm. 
This all said: a few people I talked to mentioned that &#8216;<em>long-range DNA synthesis has been a few years away for a decade-plus now</em>&#8217;, so maybe we can discount this a <em>little</em>, but it&#8217;s worth paying attention to, especially because, as we mentioned earlier, a DNA synthesizer needn&#8217;t be capable of <em>full</em> viral genome synthesis to be dangerous, since you can simply splice its outputs together.</p><p>This is all quite a pickle.</p><p>Yes, you could lock down the benchtop synthesizers, such that any attempt to use them would involve making an external call to some pathogen database to screen your request. But if the ML design tools get good enough, you can just do continuous zero-shot designs of something that doesn&#8217;t match anything in the database, and iterate from there. And even if the models don&#8217;t get good at that sort of in-vivo prediction behavior&#8212;which, despite what you may hear, is a genuine possibility for at least some time&#8212;you could simply split your order across multiple machines, synthesizing fragments that are each too short to trigger any screening individually, but that assemble into something very much on a select agent list once stitched together.</p><p>This last approach is called a split-order attack. The IFP report discusses it as well, and is refreshingly blunt about the prognosis.</p><blockquote><p><em>Moreover, an offline device is vulnerable to the whole class of split-order attacks, whereby the adversary can combine the outputs of two or more devices that are small enough to evade screening in isolation, but together would be recognized. <strong>Without some centralized connectivity, such an attack is impossible to defend against.</strong></em></p></blockquote><p>Are we doomed?</p><p>Maybe. The optimistic angle is that the government can be awfully good at shutting things down when it wants to, and the track record in other domains is quite encouraging. 
When the Combat Methamphetamine Epidemic Act passed in 2005, putting pseudoephedrine behind the counter and requiring ID and purchase logs,<a href="https://www.chpa.org/about-consumer-healthcare/activities-initiatives/preventing-illegal-meth-production"> domestic meth lab incidents dropped by over 65% within two years</a>. Nuclear materials are an even stronger case: the NRC administers over<a href="https://www.nrc.gov/about-nrc/radiation/protects-you/reg-matls"> 20,000 active licenses for radioactive materials in the US alone</a>, coordinated across 40 states and backed by the international IAEA safeguards regime. This has almost certainly contributed to the fact that there has not been a single case of nuclear terrorism. When the government decides something absolutely cannot be allowed to proliferate, and builds the institutional machinery to back that up, it can, against all odds, work.</p><p>But the fundamental problem here is that preventing bioterrorism requires a level of governmental diligence <strong>that is closer to nuclear-level than meth-level</strong>, and right now it is far behind both. To be fair, there are clear structural differences between biology and nuclear/meth, the biggest one being that biology is much more dual-use. Benchtop synthesizers have far, far more legitimate uses than malevolent ones, and the upside of restricting them is a lot harder to argue for than, say, restricting access to pseudoephedrine.</p><p>Well, what <em>should</em> be done?<a href="https://ifp.org/securing-benchtop-dna-synthesizers/"> The IFP proposal</a>, to its credit, has some pretty clear demands: a mandatory Biosecurity Readiness Certification before any benchtop synthesizer can be legally sold, standardized customer screening for both devices and reagents, and a reagent track-and-trace system modeled on the Drug Supply Chain Security Act for pharmaceuticals. 
<strong>None of this is crazy, and it rhymes with what has already been done for meth and nuclear material.</strong></p><p>What is actually being done? Unfortunately for all of us, every federal document governing DNA synthesis security in the United States right now is (somewhat) voluntary, though there is a nuance here we&#8217;ll get to in a bit. The only binding rules are export controls, which, as of 2026, have already been violated. The IFP essay from earlier reports that <a href="https://telesisbio.com/">Telesis</a> disclosed in its SEC filings that its DNA assembly systems <strong>have accidentally ended up in<a href="https://www.sec.gov/Archives/edgar/data/1850079/000119312524107257/d820064dars.pdf"> embargoed countries through distributors.</a></strong></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!vZ9q!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F48d06c89-8986-4530-a8dd-7bf32e3b9121_1456x414.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!vZ9q!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F48d06c89-8986-4530-a8dd-7bf32e3b9121_1456x414.jpeg 424w, https://substackcdn.com/image/fetch/$s_!vZ9q!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F48d06c89-8986-4530-a8dd-7bf32e3b9121_1456x414.jpeg 848w, https://substackcdn.com/image/fetch/$s_!vZ9q!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F48d06c89-8986-4530-a8dd-7bf32e3b9121_1456x414.jpeg 1272w, 
https://substackcdn.com/image/fetch/$s_!vZ9q!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F48d06c89-8986-4530-a8dd-7bf32e3b9121_1456x414.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!vZ9q!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F48d06c89-8986-4530-a8dd-7bf32e3b9121_1456x414.jpeg" width="1456" height="414" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/48d06c89-8986-4530-a8dd-7bf32e3b9121_1456x414.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:414,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!vZ9q!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F48d06c89-8986-4530-a8dd-7bf32e3b9121_1456x414.jpeg 424w, https://substackcdn.com/image/fetch/$s_!vZ9q!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F48d06c89-8986-4530-a8dd-7bf32e3b9121_1456x414.jpeg 848w, https://substackcdn.com/image/fetch/$s_!vZ9q!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F48d06c89-8986-4530-a8dd-7bf32e3b9121_1456x414.jpeg 1272w, 
https://substackcdn.com/image/fetch/$s_!vZ9q!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F48d06c89-8986-4530-a8dd-7bf32e3b9121_1456x414.jpeg 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Oops! Does uranium ever accidentally end up in embargoed countries?</p><p>Well, actually, yes. 
The IAEA has logged over<a href="https://www.iaea.org/newscenter/pressreleases/more-than-145-reports-added-to-iaea-incident-and-trafficking-database-in-2024"> 4,300 incidents of nuclear material outside regulatory control since 1993</a>, 353 of which involved trafficking or malicious use, 13 of which involved highly enriched uranium, and 2 of which involved plutonium. <strong>But importantly, the last time someone got their hands on<a href="https://www.iaea.org/sites/default/files/25/03/itdb-factsheet.pdf"> kilogram quantities of weapons-usable material was 1994</a></strong>. The system leaks at the margins, but it doesn&#8217;t leak at the catastrophic level.</p><p>The security model that you&#8217;ll continually hear repeated among biosecurity experts is the &#8216;Swiss-cheese&#8217; model, in which the purpose of the regulatory apparatus is to present enough overlapping layers of defense such that no actors, other than the absolute most determined, are willing to go through the trouble. The defenses against nuclear material and meth are Swiss-cheese-y, and the ideal solution for DNA screening will likely be similar. Possible to defeat, but difficult, annoying, and legally scary to attempt.</p><p>And at least one layer of cheese is present: I mentioned that screening is largely voluntary on the part of the synthesis companies, but there is an important caveat. <strong>Federally funded entities</strong> are required to purchase synthetic nucleic acids only from providers or manufacturers that adhere to the<a href="https://genesynthesisscreening.centerforhealthsecurity.org/for-customers"> US Framework for Nucleic Acid Synthesis Screening.</a> In other words, if a DNA synthesis company wants to sell to the enormous market of federally funded researchers (most of the U.S. life sciences market), they effectively must implement screening.</p><p>Well&#8230;kinda. 
This particular screening requirement was the intended purpose of one piece of legislation that was passed in 2024, but the current administration issued an executive order in 2025 to replace it with something [better] within 90 days. These 90 days have come and gone, and nothing has yet passed to mandate it again. That said, the <em>biggest</em> DNA synthesis providers (Twist and the like) see the writing on the wall, and have already implemented the screening that they imagine will be required of them, but it is unlikely that smaller DNA synthesis providers have. Circa February 2026,<a href="https://www.nti.org/news/nti-endorses-biosecurity-modernization-and-innovation-act-of-2026/"> there is a bill going through the Senate to address this current regulatory gap</a>.</p><p>But what about all the problems from before? Split-order screening, AI-assisted genome redesign, and DNA benchtop synthesizers? Legally mandated screening is surely useless given those. We need more layers of cheese to defend against these!</p><p><strong>Many smart people have thought about these challenges, and there are ways to solve them </strong><em><strong>if</strong></em><strong> you can get widespread buy-in from the synthesis providers.</strong> You could create centralized repositories of DNA orders that are aggregated from multiple providers, you could assemble private saturation mutagenesis viral datasets to catch most attempted redesigns from bad actors, and you could install hardware locks on benchtop synthesizers to prevent them from being used without connection to the aforementioned centralized repository.</p><p>None of this is science fiction! There are groups actively working on all of these, and some are even wrapped up in the Feb 2026 bill I just mentioned. 
But we&#8217;ll see how realistic they are to implement in practice.</p><h2><strong>Targeting humans with bioweapons is (probably) genuinely difficult</strong></h2><p>There is something under-appreciated worth discussing: making and spreading bioweapons is not easy. I mentioned this at the start, but there is a lot more color to add.</p><p>If you talk to biosecurity folks for long enough, they will eventually mention<a href="https://en.wikipedia.org/wiki/Aum_Shinrikyo"> Aum Shinrikyo</a>. Aum is a Japanese doomsday cult that, in the 1990s, had everything a would-be bioterrorist could ask for: hundreds of millions of dollars, a graduate-trained virologist who had studied at Kyoto University running their bioweapons program (<a href="https://en.wikipedia.org/wiki/Seiichi_Endo">Seiichi Endo</a>), dedicated lab facilities, and years of total freedom from law enforcement scrutiny. They believed the end of the world was upon them, and that their mission was to hurry the whole thing along. On their journey to do exactly this, they attempted ten biological attacks.</p><p><strong>Every single one failed.</strong> Their most ambitious effort was the<a href="https://pmc.ncbi.nlm.nih.gov/articles/PMC88589/"> 1993 anthrax attack on Kameido, Tokyo</a>, where cult members sprayed a liquid suspension of <em>Bacillus anthracis</em> spores, or anthrax, from a cooling tower on the roof of their headquarters onto the streets below. Nothing happened. 
It turned out they&#8217;d<a href="https://pmc.ncbi.nlm.nih.gov/articles/PMC7123542/"> acquired a vaccine strain of anthrax</a>, one that is, to quote the CDC&#8217;s postmortem, &#8220;<em>generally regarded as nonpathogenic for immunocompetent people</em>.&#8221; Even if they&#8217;d had the right strain, the spore concentration in their slurry was about 10&#8308; per milliliter, versus the 10&#8313; to 10&#185;&#8304; considered optimal for a liquid bioweapon.<a href="https://www.nti.org/analysis/articles/revisiting-aum-shinrikyo-new-insights-most-extensive-non-state-biological-weapons-program-date-1/"> They had a botulinum toxin program too</a>, in which they attempted multiple attacks using vans fitted with sprayers. Once again, zero effect. The toxin was likely either degraded during processing, too dilute to have any effect, or produced from a non-toxigenic strain because they couldn&#8217;t maintain proper anaerobic fermentation conditions. It is unclear as of today.</p><p>An account of the many difficulties the group faced in actually creating usable bioweapons is<a href="https://www.nti.org/analysis/articles/revisiting-aum-shinrikyo-new-insights-most-extensive-non-state-biological-weapons-program-date-1/"> well-described in this 2011 report</a>, which has some real comedic gems:</p><blockquote><p><em>Mice on which the yellow liquid [Botulinum Neurotoxin] was tested showed no toxic effects, and one cult member reportedly slipped into a fermenting tank and nearly drowned, but subsequently showed no signs of illness.</em></p></blockquote><p>The same report notes that even Aum&#8217;s manner of spreading their pathogens may have interfered with their efficacy:</p><blockquote><p><em>In the even more unlikely event that Aum had produced and successfully stored volumes of a virulent strain, it is possible that poor dissemination capabilities might have damaged the material or failed to aerosolize it so that sufficient quantities could be 
inhaled.</em></p><p><em>For example, the cult employed a homemade nozzle that reportedly spouted rather than sprayed and dispersed material during the day, exposing it to UV radiation and thermal updrafts that would have reduced concentrations at ground level.</em></p></blockquote><p>The group did finally end up partially succeeding, but only after switching to chemical weapons: sarin nerve gas,<a href="https://en.wikipedia.org/wiki/Tokyo_subway_sarin_attack"> which ended up killing 13 people and injuring thousands on the Tokyo subway in 1995.</a></p><p>But, you may protest, the 1990s were a long time ago. We have nanopores now. We have AlphaFold 3 now. We have a (somewhat) mature field of synthetic biology.</p><p>All very true, but consider what actually went wrong for Aum. They used the wrong strains, their fermentation got contaminated, their concentrations were off by five orders of magnitude, their aerosolization likely didn&#8217;t work, and a guy fell into a fermenter and was fine. These were problems of bioprocess engineering, strain selection, maintaining sterile culture conditions, building dissemination devices that produce the right particle size, and overall wet-lab competence. Some of these are pure information problems, yes, and some of them are fixed by using easier-to-produce viruses (rather than bacteria), yes. But others are iterative, hands-on, tacit protocol development work. Of those, none would be aided by the current generation of structural biology models, and only some would be aided by LLMs given the<a href="https://activesite.bio/"> Active Site</a> results I mentioned at the start of this essay.</p><p>There are other case studies to consider too. Canonically, there are three other historical bioweapons programs of note: the Soviet Union&#8217;s in the 1970s, Iraq&#8217;s program under Saddam in the 1980s, and the US&#8217;s own Cold-War-era investigation into bioweapons in the 1960s. 
Unlike Aum, all three had one thing in common: they were<em> state programs</em>, with thousands of employees, dedicated production facilities, and decades of institutional knowledge.</p><p>How did these groups fare?</p><p>Iraq&#8217;s program, despite Saddam&#8217;s enthusiasm,<a href="https://pubmed.ncbi.nlm.nih.gov/9244334/"> produced anthrax and botulinum toxin of such inconsistent quality</a> that US intelligence assessments after the Gulf War concluded the weapons would have been largely ineffective in most deployment scenarios.</p><p><a href="https://en.wikipedia.org/wiki/United_States_biological_weapons_program">The US program</a>&#8212;which weaponized anthrax, botulinum toxin, tularemia, brucellosis, and Q fever&#8212;had a slightly different takeaway, but one that&#8217;s still directionally aligned with what we&#8217;ve discussed. After nearly three decades of comically dangerous acts like<a href="https://www.businessinsider.com/biological-agents-were-tested-on-the-new-york-city-subway-2015-11"> releasing simulant organisms in the San Francisco Bay Area and the New York subway</a> to study how pathogens would move through civilian infrastructure, the conclusion wasn&#8217;t exactly that bioweapons <em>didn&#8217;t work</em>,<a href="https://nationalinterest.org/blog/buzz/does-america-still-have-bioweapons-program-hk-092525"> it was that they were strategically irrelevant</a>. By this point, the US already had a nuclear arsenal that could glass a continent in an afternoon, and the marginal value of a weapon that is unpredictable, uncontrollable, and might blow back on your own population became effectively zero. Nixon shut the program down in 1969, and there were few complaints about the decision.</p><p>Next, the Soviet program, also known as &#8216;Biopreparat&#8217;. 
It was the largest biological weapons program in human history,<a href="https://www.frontiersin.org/journals/political-science/articles/10.3389/fpos.2025.1654084/full"> employing over 60,000 people at its peak</a>, and spent years trying to weaponize smallpox and plague. And it worked.<a href="https://www.frontiersin.org/journals/political-science/articles/10.3389/fpos.2025.1654084/full"> Some insane lines from a Frontiers article about the program attached here, bolding by me:</a></p><blockquote><p><em>Some Biopreparat and military facilities continuously produced agents and filled the delivery systems kept on standby. <strong>For example, the Soviets annually made about two metric tons of antibiotic-resistant pneumonic plague and 20 tons of liquid smallpox grown in eggs. Refrigerated bunkers stored the bulk smallpox, which had a 6 to 12-month shelf life, and also contained filling lines for munitions and spray tanks.</strong></em></p><p><em>&#8230;.The Corpus One building of The State Scientific Center of Applied Microbiology at Obolensk contains <strong>42-story tall fermenters</strong>, separated into different biosafety containment zones, to make plague and other agents.</em></p><p><em><strong>Building 221 at The Scientific Experimental and Production Base at Stepnogorsk housed 10 four-story-high, 20,000-liter fermenters and could make 300 metric tons of anthrax in 10 months.</strong> Other production lines at Kurgan, Penza, and Sverdlovsk could add hundreds more tons to the USSR&#8217;s prodigious capability to make biowarfare agents and fill munitions on short notice.</em></p></blockquote><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!h-AS!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1a63fe7c-53aa-4f91-afa8-25df2be4141e_838x674.png" data-component-name="Image2ToDOM"><div 
class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!h-AS!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1a63fe7c-53aa-4f91-afa8-25df2be4141e_838x674.png 424w, https://substackcdn.com/image/fetch/$s_!h-AS!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1a63fe7c-53aa-4f91-afa8-25df2be4141e_838x674.png 848w, https://substackcdn.com/image/fetch/$s_!h-AS!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1a63fe7c-53aa-4f91-afa8-25df2be4141e_838x674.png 1272w, https://substackcdn.com/image/fetch/$s_!h-AS!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1a63fe7c-53aa-4f91-afa8-25df2be4141e_838x674.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!h-AS!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1a63fe7c-53aa-4f91-afa8-25df2be4141e_838x674.png" width="548" height="440.7541766109785" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/1a63fe7c-53aa-4f91-afa8-25df2be4141e_838x674.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:674,&quot;width&quot;:838,&quot;resizeWidth&quot;:548,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!h-AS!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1a63fe7c-53aa-4f91-afa8-25df2be4141e_838x674.png 424w, https://substackcdn.com/image/fetch/$s_!h-AS!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1a63fe7c-53aa-4f91-afa8-25df2be4141e_838x674.png 848w, https://substackcdn.com/image/fetch/$s_!h-AS!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1a63fe7c-53aa-4f91-afa8-25df2be4141e_838x674.png 1272w, https://substackcdn.com/image/fetch/$s_!h-AS!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1a63fe7c-53aa-4f91-afa8-25df2be4141e_838x674.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" 
stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Fortunately for us, the Soviet economy collapsed before this stockpile could be used for anything world-ending.</p><p>I think there are a few takeaways here. One&#8212;from the US&#8217;s experience&#8212;is that bioweapons are fundamentally not worth it if the end goal is to wag a very large stick at your enemy. Two&#8212;from Aum&#8217;s and Iraq&#8217;s experience&#8212;is that bioweapons are genuinely hard to create and disperse, even with significant resources and time. And three&#8212;from the Soviets&#8217; experience&#8212;is that if you throw enough of a country&#8217;s industrial base at the problem, the engineering/scientific barriers <em>can</em> be overcome, but the scale of effort required is immense.</p><p>These are, alongside Aum, four isolated cases from decades back. How much could we learn from such an isolated slice of history? Should we really let our mental models be informed by this?</p><p>Unfortunately, it is the best we&#8217;ve got. We do know there are other ongoing bioweapons programs today.<a href="https://www.state.gov/wp-content/uploads/2024/04/2024-Arms-Control-Treaty-Compliance-Report.pdf"> In an April 2024 compliance report</a>, the U.S. Department of State says that North Korea and Russia are definitely running bioweapons programs, and that it is possible Iran and China are as well. Should this freak us out? Maybe. On one hand, we should take seriously the US opinion that bioweapons kind of suck, and that there are easier ways to kill many people. On the other hand, the strategic value of bioweapons is not just in killing many people, but also in plausible deniability. 
Either way, whether these programs perform as intended in a real-world deployment scenario is a very different question, and one that neither the compliance report nor this essay is positioned to answer. </p><h2><strong>Agricultural bioterrorism is (probably) really easy</strong></h2><p>Unfortunately, most of what I said earlier referred to pathogens meant to target <em>humans</em>. The calculus changes dramatically when your targets are cows or a wheat field, which is what people mean by &#8216;<em>agroterrorism&#8217;</em>. This isn&#8217;t great news, especially because if you spend any time reading the biosecurity discourse, you will notice that relatively few people discuss this topic, and, of the folks who mention it, the word &#8216;<em>overlooked</em>&#8217; pops up a worrying amount.</p><p>Over the next few paragraphs, I&#8217;ll try to give some intuition as to why agroterrorism is uniquely challenging to combat.</p><p>First, the actual design of the pathogen.</p><p>Unlike most of the other, nastier viruses and bacteria that cause humans to bleed from every orifice, many incredibly dangerous agricultural pathogens do not require BSL-3/4 equipment to safely create. As a result, the barrier to entry in agroterrorism is incredibly low. While the Soviet Union bioweapons program had to regularly deal with unfortunate cases of accidental Marburg, smallpox, and anthrax leaks&#8212;even while having BSL-3-ready labs!&#8212;a bad actor here can freely muck around with designing whatever they want with little threat to themselves. And if you&#8217;re feeling especially thrifty, you don&#8217;t even need a novel gain-of-function chimera. 
You need<a href="https://pmc.ncbi.nlm.nih.gov/articles/PMC9145556/"> foot-and-mouth</a> disease, which already exists in nature, is endemic in parts of Africa and Asia, and is one of the most contagious diseases known to veterinary medicine.</p><p>In fact, we know this because a former Soviet Union bioweapons producer&#8212;<a href="https://en.wikipedia.org/wiki/Ken_Alibek">Kenneth Alibek</a>&#8212;told us. In a 2006 report, he extensively discussed his work, with one paper having a particularly good paraphrasing:</p><blockquote><p><em>Alibek describes the Soviets as producing anti-livestock, anti-crop, and combined anti-livestock/anti-personnel pathogens. During the course of its existence, the Soviet&#8217;s anti-agricultural bioweapons program produced and weaponized the anti-crop pathogens Wheat Rust, Rye Blast, and Rice Blast; the anti-livestock pathogens African Swine Fever, Rinderpest, and foot-and-mouth disease&#8230;</em></p><p><em>&#8230;The Soviets used simple, rudimentary techniques to develop these effective antiagriculture pathogens. They developed anti-crop fungal pathogens through a simple ground cultivation technique, while anti-livestock pathogens were developed in live animals&#8230;</em></p><p><em><strong>All of these techniques, as Alibek points out, could easily be utilized by unsophisticated terrorist organizations to develop bioweapons designed to cause mass casualties of agriculture.</strong></em></p></blockquote><p>Next, distribution.</p><p>If you want to cause a human pandemic, you need aerosolization, you need to calculate incubation times, you need sophisticated delivery mechanisms. 
Agricultural pathogens require none of this.<a href="https://www.degruyterbrill.com/document/doi/10.1515/jbbbl-2019-0010/html?lang=en&amp;srsltid=AfmBOoqihe-QYQxoYSFURfEafvtCrCE3hG51FUezhzl2B9yrBD665aG8"> As one paper puts it</a>, deploying plant or animal pathogens could be as simple as &#8220;<em>atomizing unprocessed pathogen near the target organisms or, in the case of animals, directly applying the pathogen to the nose and mouth of the organisms.</em>&#8221; Why is it so easy? Is there something special about agricultural pathogens? <strong>No, but there is something special about how modern agriculture is done, in that it involves thousands of nearly-genetically-identical plants and animals in astonishingly dense conditions</strong>. The environment does the work. All this, with virtually zero risk to the adversary, given that this would not be done in crowded cities with cameras on every corner, but on sprawling, isolated farms that have essentially zero security infrastructure.</p><p>Finally, detection.</p><p>Human disease surveillance benefits from the fact that sick people tend to show up at hospitals and demand attention. Cows and wheat do not. As a result, detecting an agricultural disease relies on an error-prone sequence of steps: one, the farmer noticing something is wrong with their animals, two, the farmer reporting it to the government, and three, the authorities being dispatched.</p><p>We&#8217;re going to spend the next few paragraphs discussing these three steps, because each step is a point of failure, and they fail constantly.</p><p>First, the farmer notices something is wrong. This is hard. You have to realize the scale that modern agriculture operates at. 
</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!FNk1!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6512432a-f0b2-404d-b401-801b162b2404_1040x695.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!FNk1!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6512432a-f0b2-404d-b401-801b162b2404_1040x695.jpeg 424w, https://substackcdn.com/image/fetch/$s_!FNk1!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6512432a-f0b2-404d-b401-801b162b2404_1040x695.jpeg 848w, https://substackcdn.com/image/fetch/$s_!FNk1!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6512432a-f0b2-404d-b401-801b162b2404_1040x695.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!FNk1!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6512432a-f0b2-404d-b401-801b162b2404_1040x695.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!FNk1!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6512432a-f0b2-404d-b401-801b162b2404_1040x695.jpeg" width="1040" height="695" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/6512432a-f0b2-404d-b401-801b162b2404_1040x695.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:695,&quot;width&quot;:1040,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Worker in factory 
farm surrounded by hundreds of chicken &quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Worker in factory farm surrounded by hundreds of chicken " title="Worker in factory farm surrounded by hundreds of chicken " srcset="https://substackcdn.com/image/fetch/$s_!FNk1!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6512432a-f0b2-404d-b401-801b162b2404_1040x695.jpeg 424w, https://substackcdn.com/image/fetch/$s_!FNk1!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6512432a-f0b2-404d-b401-801b162b2404_1040x695.jpeg 848w, https://substackcdn.com/image/fetch/$s_!FNk1!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6512432a-f0b2-404d-b401-801b162b2404_1040x695.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!FNk1!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6512432a-f0b2-404d-b401-801b162b2404_1040x695.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 
6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p><a href="https://insideanimalag.org/poultry-factory-farm-sizes/">A single large-scale poultry operation</a> can house 50,000 turkeys or hundreds of thousands of laying hens in a single building. A feedlot might hold 100,000 head of cattle. The average dairy herd in states like California or Idaho now exceeds a thousand cows. And the trend is accelerating: <a href="https://www.ers.usda.gov/amber-waves/2020/february/consolidation-in-u-s-agriculture-continues">U.S. livestock production has been consolidating into fewer, much larger operations for decades</a>, with economies of scale constantly pushing toward ever-increasing density. As an example: an outbreak of H5N1 among cattle populations in the United States began in December 2023. How long was the lag between initial infection and actual detection?<a href="https://www.science.org/doi/10.1126/science.adq0900"> According to a Science paper from April 2025</a>, <strong>the virus circulated entirely undetected for over 4 months.</strong> Clinical signs&#8212;reduced milk production, decreased feed intake, and changes in milk quality&#8212;were first noticed by veterinarians in late January 2024. 
Only on March 25, 2024 was the virus confirmed to exist, after genetic sampling of the cows&#8217; milk. By that point, the virus had already reached 26 dairy cattle premises across eight states and six poultry premises in three states.</p><p>Let&#8217;s say the farmer eventually realizes that something is wrong. Now they need to report it to the correct authorities. But why would they? There is something extraordinarily perverse about the reporting incentives at play here: <strong>farmers are actively disincentivized from flagging unusual disease, because a confirmed outbreak of a notifiable disease may wipe out their entire livelihood</strong>. Remember: these pathogens are often so virulent, so adaptive, that mass culling of their herd will be what is demanded of them. So, if you&#8217;re a rancher staring at a few sick animals, the economically rational move is to wait and see if they get better, not to call a vet and risk having your entire herd destroyed. <strong>Once again, there is empirical proof here: how Indonesian farmers handled avian flu in 2006.</strong><a href="https://www.ncbi.nlm.nih.gov/books/NBK215309/"> A paragraph from a zoonotic disease book is instructive</a>:</p><blockquote><p><em>Those smallholder poultry keepers questioned the severity of the avian influenza threat to their birds&#8230;.Some continued to consume and sell diseased dead birds. <strong>Small to medium-sized contract poultry farmers feared that government officials might cull their birds before definitive laboratory confirmation of the disease, and they were skeptical of compensation schemes or believed compensation was too low.</strong> These poultry farmers reported the deaths of chickens to contractors, who in turn sought the services of private veterinarians to determine the causes of bird death, making effective disease surveillance difficult. Smallholder poultry farmers and keepers feared reporting incidents directly to the government. 
This fear was not limited to a concern about losing their own birds, but also to the social risk of angering nearby neighbors, whose birds would be subject to culling within a 2&#8211;5 km radius of an outbreak location.</em></p></blockquote><p>You may ask: in the case of animals, why can&#8217;t we just vaccinate them? You can! But export regulations prevent most farmers from doing so, because standard vaccines make it impossible to distinguish a vaccinated animal from an infected one. Vaccines that include marker proteins, allowing serological tests to tell vaccinated animals apart from infected ones, do exist (so-called <a href="https://en.wikipedia.org/wiki/Marker_vaccine">DIVA vaccines</a>), but adoption has been glacial.</p><p>Finally, let&#8217;s say, against their better judgement, the farmer reports it. What happens then?</p><p>How the U.S. government actually responds to agricultural threats is theoretically fairly straightforward. Human pathogens fall under HHS, via the CDC. Agricultural pathogens fall under the USDA, via its Animal and Plant Health Inspection Service (APHIS). There is a select agent list for each, plus an overlap category for things that threaten both. The jurisdictional lines are reasonably clear. <strong>The problem with the agency technically in charge, the USDA, is that it is also the agency whose mission includes promoting the very industry it would need to disrupt in a crisis.</strong></p><p>To understand this better,<a href="https://www.vanityfair.com/news/story/inside-the-bungled-bird-flu-response"> we can look at a fascinating Vanity Fair investigation</a> that interviewed over 55 people across USDA, CDC, HHS, and the White House, all of whom were involved in the same H5N1 cattle outbreak we just discussed. 
From the moment the virus was first confirmed in 2024, the White House and the USDA were barely aligned: the White House was planning a public-health-directed response, while the USDA was prioritizing the needs of the dairy industry.</p><p>Within weeks of the diagnosis, <strong>APHIS employees began calling state veterinarians from personal cell phones to confide that they had been instructed not to discuss, not to engage, and to discontinue even routine conversations with health officials in the field unless talking points were pre-approved</strong>. The USDA sat on genetic sequencing data for weeks, sharing samples an average of 24 days after collection&#8212;compared to 8 days for the CDC&#8212;and without basic metadata like the date or state of collection, rendering the data effectively useless for real-time monitoring. <strong>The same farmer incentive problem from before reared its ugly head too: dairy farmers simply opted not to test, and some forced veterinarians off their property.</strong> At least five veterinarians who had been outspoken in responding to the outbreak were fired from their jobs. By the time a Federal Order requiring pre-movement testing was issued, the virus had already spread across multiple states. And the testing regime was widely regarded as obviously insufficient: just 30 animals per herd, with farmers reportedly prescreening in private labs to cherry-pick healthy animals.</p><p>Because why not? 
Who was going to stop them?</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!sJXI!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4bee507c-a1e4-4e8d-8165-1101ec35613a_1600x1052.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!sJXI!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4bee507c-a1e4-4e8d-8165-1101ec35613a_1600x1052.png 424w, https://substackcdn.com/image/fetch/$s_!sJXI!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4bee507c-a1e4-4e8d-8165-1101ec35613a_1600x1052.png 848w, https://substackcdn.com/image/fetch/$s_!sJXI!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4bee507c-a1e4-4e8d-8165-1101ec35613a_1600x1052.png 1272w, https://substackcdn.com/image/fetch/$s_!sJXI!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4bee507c-a1e4-4e8d-8165-1101ec35613a_1600x1052.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!sJXI!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4bee507c-a1e4-4e8d-8165-1101ec35613a_1600x1052.png" width="1456" height="957" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4bee507c-a1e4-4e8d-8165-1101ec35613a_1600x1052.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:957,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!sJXI!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4bee507c-a1e4-4e8d-8165-1101ec35613a_1600x1052.png 424w, https://substackcdn.com/image/fetch/$s_!sJXI!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4bee507c-a1e4-4e8d-8165-1101ec35613a_1600x1052.png 848w, https://substackcdn.com/image/fetch/$s_!sJXI!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4bee507c-a1e4-4e8d-8165-1101ec35613a_1600x1052.png 1272w, https://substackcdn.com/image/fetch/$s_!sJXI!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4bee507c-a1e4-4e8d-8165-1101ec35613a_1600x1052.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>This was a naturally occurring virus, both in its origin and in how it spread. Yet, the federal response still took months to coalesce into something real.</p><p>And as much as you may think APHIS bungled this, it is difficult to imagine their future responses will look much better.<a href="https://www.thepoultrysite.com/news/2025/05/veterinary-exodus-leaves-us-vulnerable-to-animal-disease-threats"> As of mid-2025, APHIS had lost roughly 1,377 staff under the administration&#8217;s workforce reduction push</a>, about 16% of its employees. The USDA also accidentally fired several employees working on the H5N1 response, and<a href="https://www.nbcnews.com/politics/doge/usda-accidentally-fired-officials-bird-flu-rehire-rcna192716"> had to scramble to rescind those termination letters within days</a>. Yes, it may be the case that the organization is bloated beyond a reasonable doubt, and the cuts were deserved. 
But the cuts have not been accompanied by any visible effort to fix the structural problem here: the fact that the USDA is simultaneously the regulator of and the lobbyist for the industry it oversees.</p><p>But there is an important question to ask. What is the ultimate impact of all this? What actually happens if a successful agroterrorism attack occurs? Because if it&#8217;s insignificant, just a rounding error, then none of this should be a concern.</p><p>It is not a rounding error. The 2001 foot-and-mouth disease (FMD) outbreak in the UK resulted in over 6 million animals culled,<a href="https://www.nao.org.uk/reports/the-2001-outbreak-of-foot-and-mouth-disease/"> cost the public sector &#163;3+ billion and the private sector &#163;5+ billion</a>, was severe enough to delay that<a href="https://en.wikipedia.org/wiki/2001_United_Kingdom_foot-and-mouth_outbreak"> year&#8217;s general election by a month</a>, and led to the<a href="https://en.wikipedia.org/wiki/Ministry_of_Agriculture,_Fisheries_and_Food_(United_Kingdom)"> dissolution</a> of the Ministry of Agriculture entirely. Simulation models for the United States are even uglier.<a href="https://journals.sagepub.com/doi/full/10.1177/104063871102300104"> A study modeling an FMD outbreak</a> originating in a single California dairy farm found that median national agricultural losses ranged from $2.3 billion to $69.0 billion depending on detection delay, with every additional hour of delay at the 21-day mark costing roughly $565 million and requiring another 2,000 animals to be slaughtered. 
What about a deliberate, state-actor attack?<a href="https://www.omicsonline.org/economic-impacts-of-potential-foot-and-mouth-disease-agroterrorism-in-the-usa-a-general-equilibrium-analysis-2157-2526.S12-001.php?aid=11430"> Another simulation model</a>, which estimated the economic impact of an FMD agroterrorism scenario (vast, widespread dispersal of the pathogen), put possible losses between $37 billion and $228 billion across three scenarios, from a contained state-level outbreak to a large multi-state attack.</p><p>But there is at least some argument that, under some mental models, it actually is a rounding error. The United States&#8217; agricultural GDP is roughly $1.4 trillion, while the overall GDP is $29 trillion. Even the worst-case FMD simulation represents about a 16% hit to agriculture, and a hit of just under 1% to the broader US economy. This is not nothing, and it may completely devastate the nation, but it is also not civilization-ending.</p><p>Yet, while agroterrorism perhaps isn&#8217;t a standard x-risk scenario, when evaluated against the <em>&#8220;is this a serious national security threat&#8221;</em> standard, the answer feels like it is an obvious yes. This raises a rather important question. If everything I&#8217;ve said is true&#8212;and I&#8217;m pretty sure it is&#8212;why hasn&#8217;t there been a significant agroterrorism event&#8230;ever? I have no idea, and it too was a point of confusion among most of those I talked to. The best argument I&#8217;ve heard is that, if the ultimate goal of bioterrorism is to either terrify a nation or outright end the world, neither the aesthetics nor the net effect of agroterrorism is well suited to either. 
</p><p>However, one person I talked to <em>did</em> say there has, in fact, been one case of minor agroterrorism they are aware of: <a href="https://www.scmp.com/news/china/society/article/3042991/china-flight-systems-jammed-pig-farms-african-swine-fever">in late 2019,</a> drones controlled by gangs dropped [items] infected with African swine fever into commercial pig farms in China. Why were the gangs trying to spread swine fever? So that the farmer would be forced to sell their potentially infected meat cheaply to the gangs, who would then sell it on as healthy stock. This feels like a rather roundabout way to make money, but it happened. Moreover, it may be the case that stuff like this occurs far more than anyone realizes, since the whole racket was only discovered because Chinese farmers resorted to radio jammers to prevent the drones from flying near the farms, which ran afoul of the regional aviation authority. </p><h2><strong>The monitoring architecture is useful for detection, but not defense</strong></h2><p>The United States has two main systems for detecting biological threats in the environment: one that watches the air, and one that watches the sewage.</p><p>Let&#8217;s start with the air.<a href="https://en.wikipedia.org/wiki/BioWatch"> BioWatch</a> is a federal program to detect the release of pathogens into the air as part of a terrorist attack on major American cities, created in 2001 in response to the anthrax attacks.<a href="https://www.ncbi.nlm.nih.gov/books/NBK499702/"> Here is how it works:</a></p><blockquote><p><em>As currently deployed, BioWatch collectors draw air through filters that field technicians collect daily and transport to laboratories, where professional technicians analyze the material collected on the filter for evidence of biological threats [via PCR]. 
The entire collection and analysis process takes up to 36 hours to detect the presence of a potential pathogen of interest.</em></p><p><em>A positive result triggers what is known as a BioWatch Actionable Result (BAR), an indication that genetic material consistent with a target pathogen was present on a BioWatch filter. Upon declaration of a BAR, local, state, and federal officials then assess relevant information and determine the course of action to pursue.</em></p></blockquote><p>Very cool, isn&#8217;t it? Here&#8217;s what one of the air filter boxes looks like:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!zZ3N!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faa312b74-7e14-4b36-a85d-535515adf9dc_1920x2560.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!zZ3N!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faa312b74-7e14-4b36-a85d-535515adf9dc_1920x2560.jpeg 424w, https://substackcdn.com/image/fetch/$s_!zZ3N!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faa312b74-7e14-4b36-a85d-535515adf9dc_1920x2560.jpeg 848w, https://substackcdn.com/image/fetch/$s_!zZ3N!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faa312b74-7e14-4b36-a85d-535515adf9dc_1920x2560.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!zZ3N!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faa312b74-7e14-4b36-a85d-535515adf9dc_1920x2560.jpeg 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!zZ3N!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faa312b74-7e14-4b36-a85d-535515adf9dc_1920x2560.jpeg" width="425" height="566.5693681318681" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/aa312b74-7e14-4b36-a85d-535515adf9dc_1920x2560.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1941,&quot;width&quot;:1456,&quot;resizeWidth&quot;:425,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;That mysterious Homeland Security box plugged into an SF utility pole is  a...&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="That mysterious Homeland Security box plugged into an SF utility pole is  a..." title="That mysterious Homeland Security box plugged into an SF utility pole is  a..." 
srcset="https://substackcdn.com/image/fetch/$s_!zZ3N!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faa312b74-7e14-4b36-a85d-535515adf9dc_1920x2560.jpeg 424w, https://substackcdn.com/image/fetch/$s_!zZ3N!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faa312b74-7e14-4b36-a85d-535515adf9dc_1920x2560.jpeg 848w, https://substackcdn.com/image/fetch/$s_!zZ3N!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faa312b74-7e14-4b36-a85d-535515adf9dc_1920x2560.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!zZ3N!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faa312b74-7e14-4b36-a85d-535515adf9dc_1920x2560.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" 
stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>The problem with the system, and this is a big one, is that it has literally never once been useful. Never. Not once. Every single time a BAR has been announced, the subsequent investigation has concluded that it was either a false positive or an environmental anomaly indistinguishable from something dangerous.<a href="https://www.dhs.gov/archive/news/2012/07/12/truth-about-biowatch"> A Department of Homeland Security page</a> has this helpful note about it:</p><blockquote><p><em>Out of these more than 7 million tests, BioWatch has reported 149 instances in which naturally-occurring biological pathogens were detected from environmental sources. Many of the pathogens the BioWatch system is designed to detect occur naturally in the environment, such as the bacteria that causes anthrax, which has been used as a weapon, but is also found in nature. For example, near the nation&#8217;s Southwest border there have been a number of instances where a bacterium that is endemic in the environment has been identified. Thankfully, none of the instances were actual attacks.</em></p></blockquote><p>It also has these lines that I thought were quite funny:</p><blockquote><p><em>The detection of commonly occurring environmental agents is not a &#8220;false positive.&#8221; Much like a home smoke detector goes off for both burnt toast and a major fire, the smoke detector is meant to notify you of a potential fire before it&#8217;s too late. 
BioWatch works very much the same way.</em></p></blockquote><p>A smoke detector that has gone off 149 times over two decades and never once for an actual fire is almost certainly not a functioning smoke detector. And this particular smoke detector cost hundreds of millions to set up, and tens of millions a year to maintain! To be clear: there is no technological reason that these can&#8217;t be made better, and there are startups, such as <a href="https://pilgrimlabs.com/">Pilgrim Labs</a>, that are working on improving similar air-detection systems. If curious, <a href="https://youtu.be/Vxj41-p8xyo?t=1318">I found this Pilgrim founder interview to be worth watching</a>.</p><p>On the sewage side, the whole endeavor is actually going fairly well. But before we go on: monitoring the air is obvious, but why monitor sewage? Because nearly every pathogen that infects a human being eventually ends up in the toilet. As a result, sewage is perhaps the most honest epidemiological data source available, since people cannot choose not to participate.</p><p>And we&#8217;re doing very well at monitoring this sludge, a practice known as &#8216;wastewater screening&#8217;. 
A lot of people in biosecurity complain that &#8216;<em>the federal government learned nothing from COVID</em>&#8217;, and they are mostly right, with one huge counterexample: <strong>the national wastewater surveillance infrastructure, which was largely built in response to the pandemic.</strong> The<a href="https://www.cdc.gov/nwss/about.html"> National Wastewater Surveillance System (NWSS)</a>, launched by the CDC in September 2020, established that you could detect community-level viral trends days before clinical cases appeared, using nothing more than the genetic material people flush down the toilet, without requiring any of them to consent to testing, show up at a clinic, or even know they&#8217;re sick.</p><p>But the problem with the NWSS, as it is currently deployed, is that it is a targeted system, relying on qPCR to identify specific, known threats. The 500-600 sites where NWSS monitoring stations are deployed all measure three things: COVID-19, Influenza A, and RSV. </p><p>Of those, 80% also measure three more things: Measles, H5N1, and Monkeypox.</p><p>There&#8217;s an awful lot missing, isn&#8217;t there? What about all the other types of Influenza? Norovirus? And the scarier ones too: Nipah, Ebola, Tularemia. All of them are entirely absent.</p><p>The answer is, in principle, to switch away from qPCR and do metagenomic sequencing: instead of looking for specific pathogens, you sequence <em>everything</em> in the sample and computationally figure out what&#8217;s there.<a href="https://www.owlposting.com/p/a-primer-on-why-microbiome-research?open=false#%C2%A7difficulty-of-characterization"> I&#8217;ve written about metagenomics in the context of microbiomes</a>, so you can look there for a deeper explanation of how it works.</p><p>Why isn&#8217;t anyone doing this?</p><p>In fact, there is someone doing this, and this leads us to what I&#8217;d consider one of the crown jewels of what the U.S. 
nonprofit-biosecurity-complex has managed to accomplish:<strong><a href="https://naobservatory.org/"> SecureBio Detection, previously known as the Nucleic Acid Observatory</a> (NAO), which has been building a pilot metagenomics-based wastewater screening network in the US since 2021</strong>.<a href="https://securebio.substack.com/p/nao-updates-november-2025"> As of November 2025, they maintain 31 sampling sites across the US, in 19 cities</a>, sequencing about 60 billion read pairs weekly. And they&#8217;ve already stumbled across a few interesting things, such as detecting measles in wastewater from Kaua&#699;i County, Hawaii and West Nile Virus in Missouri&#8212;<a href="https://content.govdelivery.com/accounts/MODHSS/bulletins/3f64540">the latter of which ended up having real, confirmed cases to go alongside it</a>! There is an ongoing effort to have something similar at the federal level&#8212;the<a href="https://naobservatory.org/blog/biothreat_radar/"> so-called &#8216;Biothreat Radar&#8217;</a>&#8212;but it doesn&#8217;t seem to actually exist yet. But SecureBio Detection continues!</p><p>This is quite promising. This is a bona fide, national-scale attempt to detect both known and unknown biological threats, and it works! They are also doing some interesting ML <a href="https://arxiv.org/abs/2501.02045">work on automatically detecting</a>, via a metagenomic language model, whether unknown metagenomes are simply uncharacterized, innocuous microbes (i.e. 
nearly all microbes) or human-targeting pathogens worth worrying about.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Qvr2!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5ba999a-61b8-438b-97a5-c6792b5fb09d_3014x796.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Qvr2!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5ba999a-61b8-438b-97a5-c6792b5fb09d_3014x796.png 424w, https://substackcdn.com/image/fetch/$s_!Qvr2!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5ba999a-61b8-438b-97a5-c6792b5fb09d_3014x796.png 848w, https://substackcdn.com/image/fetch/$s_!Qvr2!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5ba999a-61b8-438b-97a5-c6792b5fb09d_3014x796.png 1272w, https://substackcdn.com/image/fetch/$s_!Qvr2!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5ba999a-61b8-438b-97a5-c6792b5fb09d_3014x796.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Qvr2!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5ba999a-61b8-438b-97a5-c6792b5fb09d_3014x796.png" width="1456" height="385" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c5ba999a-61b8-438b-97a5-c6792b5fb09d_3014x796.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:385,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:386950,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.owlposting.com/i/145813239?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5ba999a-61b8-438b-97a5-c6792b5fb09d_3014x796.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Qvr2!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5ba999a-61b8-438b-97a5-c6792b5fb09d_3014x796.png 424w, https://substackcdn.com/image/fetch/$s_!Qvr2!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5ba999a-61b8-438b-97a5-c6792b5fb09d_3014x796.png 848w, https://substackcdn.com/image/fetch/$s_!Qvr2!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5ba999a-61b8-438b-97a5-c6792b5fb09d_3014x796.png 1272w, https://substackcdn.com/image/fetch/$s_!Qvr2!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5ba999a-61b8-438b-97a5-c6792b5fb09d_3014x796.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" 
viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>But, despite how good wastewater screening is, it is worth remembering that <strong>detection is not defense</strong>. This may seem like a semantic point, of course detection isn&#8217;t defense, but certainly it should allow you to defend faster or better.</p><p>But does it really?</p><p>If you&#8217;re detecting something known&#8212;a COVID variant, a resurgent influenza strain&#8212;then yes, detection may accelerate response, because you already know what to make against it. <strong>But if you&#8217;re detecting something novel, then what exactly happens next?</strong> Designing vaccines that elicit neutralizing antibodies is difficult in the best of circumstances, clinical trials take time, and, in the meantime, the underlying pathogen will continue to mutate, potentially diverging from whatever you&#8217;re designing against it. 
This is surprisingly under-discussed, but it is worth marinating in the fact that, yes, BioNTech&#8217;s and Moderna&#8217;s capacity to generate a COVID-19 vaccine so quickly was indeed an extraordinary feat of logistics and science,<a href="https://www.ama-assn.org/public-health/infectious-diseases/how-decade-coronavirus-research-paved-way-covid-19-vaccines"> </a><strong><a href="https://www.ama-assn.org/public-health/infectious-diseases/how-decade-coronavirus-research-paved-way-covid-19-vaccines">but</a></strong><a href="https://www.ama-assn.org/public-health/infectious-diseases/how-decade-coronavirus-research-paved-way-covid-19-vaccines"> </a><strong><a href="https://www.ama-assn.org/public-health/infectious-diseases/how-decade-coronavirus-research-paved-way-covid-19-vaccines">the usage of the spike protein segment as an immunogen in the vaccine was informed by two decades of prior coronavirus research</a>. </strong>In the case of a brand new, chimeric virus that has no immediate cousin, a few weeks of advance notice is just a longer window in which to watch the curve steepen.</p><p>Finally, in both cases, either a natural or engineered pathogen, there exists one last problem: coordination. There is no pre-negotiated decision tree for what happens after something scary is detected, no threshold that, once crossed, triggers automatic funding for therapeutic stockpiling or accelerated clinical development. There probably should be one! But there isn&#8217;t today and, as far as I can tell, there aren&#8217;t plans for one to exist. <strong>Ultimately, the value of early warning is bounded by the speed of the response it enables, and that speed seems extremely limited today.</strong></p><h2><strong>Machine learning may be very useful for rapid-response therapeutics</strong></h2><div><hr></div><p><em>This section is me going off-script from the experts I talked to. 
The pipeline I will describe below does not exist in any meaningful capacity, but there are inklings of it found across the therapeutics-for-biosecurity plays out there, so it feels like the mental framework is informative regardless. As in, the logical steps mentioned here may <strong>massively</strong> diverge from what will realistically occur, but the types of models, timelines, and decision calculus used likely will not.</em> </p><div><hr></div><p>The<a href="https://cepi.net/"> Coalition for Epidemic Preparedness Innovations</a>, or CEPI, has an initiative that identifies exactly what you&#8217;d want your government to be capable of in the case of a major pandemic: <a href="https://cepi.net/cepi-20-and-100-days-mission">the 100 Days Mission</a>. As in, from the day of realizing, &#8216;<em>we probably should mount a response to this weird sequence we found&#8217;,</em> therapeutic options should be ready to go within three months for population-scale deployment. It took 326 days to get the first COVID-19 vaccine authorized, and that was widely regarded as the fastest vaccine development in human history. How could 100 days be possible?</p><p><a href="https://www.mdpi.com/2076-393X/13/8/849">Luckily for us, they&#8217;ve defended the position at length in a paper.</a> Long story short: this is not an unreasonable timeline if you&#8217;re in a coronavirus-y situation, where your adversary is something that millions of hours of research has already gone into characterizing. Why? Because the second you can identify the ideal immunogen&#8212;or, what you should be sticking in your vaccine to elicit the antibody repertoire that neutralizes the virus&#8212;you&#8217;re done with the major technical design challenge. 
As I mentioned earlier, the spike protein was the obvious immunogen for SARS-CoV-2, informed by two decades of prior coronavirus research going back to SARS-1 and MERS.<a href="https://www.stvincentsspecialneeds.org/about-us/news-press/news-detail?articleId=30034"> Hence the fun little party story of BioNTech and Moderna having a vaccine candidate within </a><em><a href="https://www.stvincentsspecialneeds.org/about-us/news-press/news-detail?articleId=30034">days</a></em><a href="https://www.stvincentsspecialneeds.org/about-us/news-press/news-detail?articleId=30034"> of receiving the SARS-CoV-2 sequence.</a></p><p>So, mRNA basically hands us our vaccine. Now we just need to deal with the two other bottlenecks: manufacturing scale-up and clinical trials. I think it&#8217;s interesting to discuss how things may be sped up here&#8212;and the arguments for how you&#8217;d speed them up are within the realm of possibility&#8212;but it does lead us off-topic, so I&#8217;ll place those in a very long footnote.<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-1" href="#footnote-1" target="_self">1</a></p><p>But remember, what I&#8217;ve described so far is the rosy scenario, where we are dealing with something we already mostly understand. What about things that are wholly new? This includes not only <em>de novo</em> pathogens, but also mostly natural ones that have immune-escaped the established immunogens, whether through evolution or other means. For these cases, the same CEPI paper admits that things are harder, and that a 200- or 300-day turnaround time should be the goal.</p><p>But is <em>that</em> possible? Remember, now the vaccine design problem becomes quite difficult. Which viral protein subunit do you use as the immunogen? Which conformation elicits neutralizing versus non-neutralizing antibodies? 
Which epitopes are conserved enough that you&#8217;re not designing a vaccine that will be obsolete by the time it&#8217;s manufactured? These are not easy questions to answer! And if you get them wrong, you waste months manufacturing the wrong thing. The same CEPI paper from earlier optimistically states that immunogen/antigen design for these novel pathogens would take just a few months if we really worked hard at it.</p><p><strong>But it feels like getting to this speed of development would almost certainly require immense technological leaps. </strong>One of my favorite podcast episodes was my interview with a founder of a vaccine development startup: <a href="https://www.youtube.com/watch?v=CHokQ5dMxHQ">Soham Sankaran of PopVax</a>. In it, I ask a lot of questions about why immunogen design for vaccines is so hard, and I will paraphrase his answers in the footnotes<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-2" href="#footnote-2" target="_self">2</a>. To keep it short: it&#8217;s really, really hard. </p><div id="youtube2-CHokQ5dMxHQ" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;CHokQ5dMxHQ&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/CHokQ5dMxHQ?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><p>Now, the question of the evening: can machine-learning help us with this?</p><p>Probably not. At least not in a significant way anytime soon. 
ML seems useful at the margins for, say, figuring out how to scaffold specific immunogens of interest such that they are &#8216;correctly&#8217; presented to the immune system, but we are far off from a model being able to reliably respond to a query like &#8216;<em>here is the structure of the virus I am scared of, please design an immunogen that I can encode into an mRNA vaccine that will elicit broadly neutralizing antibodies</em>&#8217;.</p><p>At least, that&#8217;s the consensus from everyone I talked to. But if we&#8217;re willing to stretch our brains a little, I think one can imagine a scenario in which ML, as it exists <em>today</em>, may end up being extraordinarily useful for how we respond to pandemics. And it comes down to the fact that mRNA is such a stupidly, insanely versatile platform. You don&#8217;t need to encode an immunogen in the mRNA. <strong>Instead, you could simply encode the antibodies that you&#8217;d </strong><em><strong>want</strong></em><strong> the immunogen to elicit.</strong></p><p><em>What</em>, you may scream, <em>surely you can&#8217;t do that.</em> But you can! <strong>As far back as 2021,<a href="https://www.nature.com/articles/s41591-021-01573-6"> Moderna </a>injected adult humans with an mRNA product whose payload encoded a monoclonal antibody against the Chikungunya virus. And it worked quite well! </strong>Moderna <a href="https://www.fiercebiotech.com/biotech/molecular-glue-biotech-shutters-after-brutal-last-few-years-early-stage-companies">has since shelved this particular asset</a>, but for reasons that seem more portfolio-optimization-y than the drug not having enough efficacy. Luckily,<a href="https://www.nature.com/articles/s41467-025-65456-x"> there is ongoing work outside Moderna on mRNA-encoded nanobodies</a>, which have the advantage of being far smaller than typical antibodies, and so less stressful for our weak, mammalian cells to pump out. 
And upon looking it up, I have discovered that I am not the first one to find this absurdly relevant to biosecurity efforts! <a href="https://pubmed.ncbi.nlm.nih.gov/41521363/">One 2026 review paper echoes my sentiment</a>, and expands on it: &#8216;<em>mRNA-encoded antibody approaches have been explored in preclinical models of Zika virus, Ebola virus, and rabies, where a single intramuscular dose provided prophylactic and therapeutic benefits in animal models</em>&#8217;.</p><p>Insane, right? Now, you may immediately spot problems with this. For instance: antibodies don&#8217;t last very long in our bloodstream, on the order of 2-3 weeks. How useful could this possibly be in a pandemic, where circulating pathogenic material may linger for months? But fixing this is fully within the realm of possibility. Engineering the Fc region, or the bottom section of the &#8216;Y&#8217; shape of an antibody,<a href="https://www.nature.com/articles/s12276-022-00870-5"> can reliably and dramatically extend its therapeutic window</a>. <strong>In fact, we needn&#8217;t even theorize on this, because the same 2021 Moderna paper </strong><em><strong>also</strong></em><strong> included these Fc mutations: two alterations (M428L and N434S), leading to a 69-day half-life.</strong> And there is no reason to believe that this cannot be pushed even further,<a href="https://pmc.ncbi.nlm.nih.gov/articles/PMC7894971/"> given that at least one antiviral antibody has been shown to have a half-life on the order of 5-6 months.</a></p><p>The next question: where will we get useful antibodies from?</p><p>Modern ML methods for designing antibodies against arbitrary targets are not perfect, but they really are quite good. 
In 2025, the Baker lab published what is, to my knowledge, the most significant result in computational antibody design to date<a href="https://www.nature.com/articles/s41586-025-09721-5">: a fine-tuned version of RFdiffusion</a> that can generate <em>de novo</em> antibodies&#8212;VHHs, scFvs, and full antibodies&#8212;<strong>targeting user-specified epitopes</strong>. Most relevant for us, when the model was given a particular target and epitope&#8212;<em>C. difficile</em> toxin B and a specific epitope that had never had an antibody designed against it&#8212;the model generated moderate-affinity antibodies, with cryo-EM confirming its binding. Now, as I mentioned in the footnotes, binding to a virus is not the same thing as neutralization of a virus, and we usually only care about the latter. I agree that this is a bottleneck that ML cannot easily solve, but it also does not feel like a <em>huge</em> bottleneck, <strong>especially if these models work well</strong>. Consider the fact that binding is necessary, but not sufficient for neutralization, and if you just screen a bunch of binders, all generated for free, surely you can vastly speed up the process of identifying a neutralizing antibody. </p><p>Of course, in the case of a pandemic going on long enough, you could bypass all this by simply fishing out neutralizing antibodies from infected patients, or at least use those as a parent for further ML-driven optimization.</p><p>Our final problem is that pathogens usually mutate, which means that even if we turn every human into a factory of identical antibodies against a particular sequence, those same antibodies may soon become useless due to immune escape. 
This is why the natural immune response&#8212;as offered by either an immunogen or antigens from the pathogen itself&#8212;can be so efficacious, as the polyclonal antibody repertoire elicited by natural infection or vaccination targets dozens of epitopes simultaneously, making it extraordinarily difficult for the virus to escape all of them at once. This too is not theory:<a href="https://www.contagionlive.com/view/covid-19-mutations-render-all-monoclonal-antibody-treatments-ineffective"> </a><strong><a href="https://www.contagionlive.com/view/covid-19-mutations-render-all-monoclonal-antibody-treatments-ineffective">every single monoclonal antibody therapy authorized against SARS-CoV-2 was eventually rendered obsolete by Omicron and its descendants.</a></strong></p><p>Are we doomed?</p><p>Let&#8217;s not give up, and instead take a closer look at the two issues we need to solve to overcome this obstacle. First, we need to choose not just <em>any</em> neutralizing antibodies for our vaccine, but ones that target sites where escape is costly to the virus, that is, functionally constrained epitopes where mutations would compromise receptor binding or some other essential function. Second, we need to deploy <em>cocktails</em> of antibodies targeting non-overlapping epitopes, such that the probability of simultaneous escape across all of them becomes vanishingly small.</p><p>I propose to you that there are viable ML-based solutions to both of these.</p><p>For identification of immune-escape-y-epitopes, we can look to<a href="https://www.nature.com/articles/s41586-023-06617-0"> EVEscape</a>, a protein model from the<a href="https://www.deboramarkslab.com/"> Debora Marks lab at Harvard.</a> The model combines evolutionary sequence information with structural and biophysical data to predict, for a given viral protein, which mutations are most likely to emerge <em>and</em> evade existing immunity. 
<strong>Flip the interpretation and you get the inverse: sites where EVEscape predicts </strong><em><strong>low</strong></em><strong> escape potential are precisely the sites where you want your antibodies to bind, because the virus cannot easily mutate away from them without crippling itself.</strong> This is not a solved problem, but models like these are surely directionally useful, and certainly better than guessing.</p><p>For cocktail design, consider<a href="https://www.biorxiv.org/content/10.1101/2025.05.12.653592v2"> EscapeMap</a>. EscapeMap integrates deep mutational scanning (DMS) data from SARS-CoV-2 across dozens of neutralizing monoclonal antibodies with a generative sequence model to identify something very useful: <strong>negatively correlated escape routes</strong>. Two antibodies have negatively correlated escape if the mutations that evade one tend to make the virus <em>more</em> sensitive to the other. Cocktails built from such pairs are inherently resistant to simultaneous escape, because the virus cannot run from both at once. As published, EscapeMap is SARS-CoV-2-specific; the underlying DMS data took years to generate, and you wouldn&#8217;t have it on day one of a new pandemic. But the framework should, in principle, generalize to any pandemic, and a DMS-esque dataset will emerge if the pandemic goes on for long enough, allowing you to eventually design broadly neutralizing cocktails of antibodies. <strong>If we&#8217;re being especially galaxy-brained, given a sufficiently good protein model, perhaps you don&#8217;t need any DMS data at all!</strong> After designing your <em>de novo</em> antibodies, you could run in-silico DMS to predict how every possible mutation on the target surface would affect binding to each candidate, cross-reference those with EVEscape-style fitness predictions to filter for mutations the virus can actually tolerate, and look for the same negative correlations. 
I realize this isn&#8217;t <em>fully</em> possible today, that the impact of single-amino-acid substitutions is still poorly captured by these models, and that there is a<a href="https://www.biorxiv.org/content/10.64898/2026.02.25.708002v1.full"> whole host of other failure modes.</a> But the models will only get better.</p><p>When all of this is put together, this pipeline should allow us to do something extraordinary within weeks of a novel pathogen being sequenced:</p><ol><li><p>Discover neutralizing antibodies against it, either via ML or patient serum.</p></li><li><p>Create a cocktail of antibodies with negatively correlated escape routes via in-silico screening or a DMS dataset.</p></li><li><p>Fc-engineer them for a long half-life.</p></li><li><p>Encode the whole thing into mRNA.</p></li><li><p>Manufacture it.</p></li></ol><p>If we do this early enough, and distribute the vaccines fast enough, we could potentially kill the spread of even the most virulent pathogens. Of course, manufacturing is historically the next major bottleneck, but if our wastewater screening and ensuing rapid responses are quick enough, we may need to manufacture orders of magnitude fewer doses.</p><p><strong>I realize that there are many catches here, and that what I&#8217;ve presented is grossly optimistic. 
</strong>All of this is a multi-layered, mostly-computational solution, and every one of these layers is error-prone.<a href="https://medium.com/@enginyapici/i-tried-to-poke-holes-in-chai-2s-antibody-design-paper-here-s-what-i-found-7e51f5581c7d"> All antibody generation methods</a> have plenty of failure modes,<a href="https://www.biorxiv.org/content/10.1101/2025.07.31.667864v1.full"> EVEscape is not consistently useful across viruses</a> (though<a href="https://www.biorxiv.org/content/10.1101/2025.07.31.667864v1.full"> further lines</a> of research claim to have improved on it), EscapeMap is hyper-focused on SARS-CoV-2 and it may very well be that the framework cannot easily transfer to new pathogens, and antibody-encoded-into-mRNA is&#8212;however clever it may sound&#8212;still in its early days of efficacy-and-adverse-effect study.</p><p>But <em>each</em> one of these is improving, and I think the trendlines are promising. I am much more optimistic about the value of ML here than in perhaps any other layer of the biosecurity defense workflow, and time will tell how much that optimism is warranted.</p><h2><strong>Pathogen-agnostic defenses are extraordinary. But who pays for them?</strong></h2><p>Finally, the last section. This one will be short.</p><p>Everything discussed so far shares a common architectural assumption: that you know, or can figure out, what you&#8217;re looking for. This is hard! And it is made all the more difficult by the fact that the coordinated effort needed to <em>respond</em> to these discoveries is not something that we&#8217;re historically very good at. <strong>But there is one category of biosecurity defense that sidesteps this problem entirely, since the defenses in it work against </strong><em><strong>all</strong></em><strong> pathogens.</strong> And once they are deployed, they largely work for extended periods (months to years!) 
by themselves, with no logistical effort needed from anybody.</p><p>What are they?<a href="https://blueprintbiosecurity.org/works/far-uvc/"> Far-UVC</a> and<a href="https://blueprintbiosecurity.org/glycol-vapors/"> glycol vapors</a>.</p><p>I&#8217;m going to be honest: the more I looked into this subject, the more I found that every conceivable thing that could be written about it has been, and where it hasn&#8217;t, covering it would require conversations with a lot more people and would significantly lengthen this already long essay. So I&#8217;ll defer to other people here. For far-UVC I&#8217;d recommend visiting<a href="https://www.faruvc.org/"> faruvc.org</a> for an introduction, and, if you&#8217;re sufficiently convinced,<a href="https://aerolamp.net/"> aerolamp.net</a> to pick up a lamp for yourself. There is a lot less easy reading material on glycol vapors,<a href="https://blueprintbiosecurity.org/glycol-vapors/"> but there is one article published a year back by Blueprint Biosecurity</a>&#8212;a nonprofit that also funds far-UVC work&#8212;and <a href="https://www.jefftk.com/news/airquality">various related articles on the blog of Jeff Kaufman</a>, who works in biosecurity.</p><p><strong>To keep it short: if we could tile the interior of enough buildings with these solutions, we could, in theory, render the entire human indoor environment continuously hostile to airborne pathogens</strong>; far-UVC through physical degradation of their genetic material, and glycol vapors through (probably) desiccation. This would affect <em>all</em> airborne pathogens. Named ones, unnamed ones, engineered ones, ones that have never existed before and will never exist again except in the brief window between their release and their death to one of these two. And it would do all this with no harm to you. 
Of course, these technologies still have room to improve, but their problems are mostly ones of logistics, optimization, and scalability.</p><p>So why don&#8217;t we see these far-UVC lamps and glycol vapor fumers in every building in the world? Why aren&#8217;t we sterilizing our air the same way we sterilize our water?</p><p>You could quibble with the details here, about how far-UVC is still very expensive, the evidence base for glycol vapors is still being figured out, and the like. But it&#8217;s tough for me to consider the question of &#8216;<em>why isn&#8217;t this being massively funded</em>&#8217; without concluding that the problem is that there is no for-profit entity that really benefits from it. The benefits of clean air are diffuse, accruing to everyone who breathes in a building, none of whom are the institution writing the check. Hospitals are the one exception, but they are a sliver of all interior environments that humans reside in, and obviously will not offer the scale necessary to put a dent into pandemics. This means that these technologies can only be deployed and studied by a very small group of hobbyists, early adopters, and academic labs.</p><p>Okay, but isn&#8217;t this the point of governments? This is a clear public good! This is territory that is hard to get perfect visibility into, but my instinct is that the evidence base for governmental-buy-in is simply difficult to produce.</p><p>A <a href="https://worksinprogress.co/issue/the-death-rays-that-guard-life/">recent Works In Progress article over far-UVC had this to say:</a></p><blockquote><p><em>Measuring infection control is challenging and seldom undertaken, particularly in public spaces. Epidemiological data is expensive and difficult to gather, and there is currently no way to measure the amount of viable, infectious pathogens in the air in real time. 
Office attendance can be tracked, but controlling for how users mix outside the office space is immensely difficult, and measuring the real-world effect of small-scale deployments in public areas is almost impossible. Studies aiming to cause deliberate disease transmission in controlled environments have<a href="https://pubmed.ncbi.nlm.nih.gov/32658939/"> failed to work</a> in<a href="https://www.medrxiv.org/content/10.1101/2025.04.28.25326458v1.full"> practice</a> because they have been too small to generate enough infections.</em></p></blockquote><p>While this is a bitter pill, there is a sweet one that it offers us: the implementation problems with pathogen-agnostic defenses are extremely &#8216;money-shaped&#8217; in a way that few other biosecurity solutions are. All the subject needs is proof, in the form of randomized controlled trials, aggregated individual-use experiments, and subsidies for institutions to try it out. In other words, more money to push it over the &#8216;<em>this obviously works</em>&#8217; finish line. <strong>So, if there are any biosecurity-curious philanthropists reading this: I highly encourage you to explore far-UVC or glycol vapors.</strong></p><p>Especially because, unlike almost every other type of biosecurity solution we&#8217;ve discussed so far, <strong>these solutions will yield public benefits even in the </strong><em><strong>absence</strong></em><strong> of bioterrorism</strong>. In fact, the same Works In Progress article over far-UVC never even mentions biosecurity, and is focused more on public health, ending with this line: &#8216;<em>Tuberculosis and coronaviruses [may] join typhoid and cholera as tragedies of the past, and seasonal flu and common colds would become rare rather than routine if clean air were as universal and expected as clean water.</em>&#8217;</p><p>It&#8217;s a great pitch, and I am very excited to see more deployment of these technologies in the coming years. 
It just feels like one of the more obvious areas to push forward on in this field.</p><h1><strong>Conclusion</strong></h1><p>So, what should you be scared of?</p><p>I can&#8217;t speak for you, but I can say what <em>I&#8217;m</em> scared of. I am scared of a well-funded terrorist organization constructing their own lab, out of which they create natural pathogens&#8212;potentially with a few AI-assisted mutations to allow them to immune-escape existing defenses&#8212;using either split-order attacks or orders from DNA synthesis companies that don&#8217;t screen. I am scared of these groups spreading such a pathogen in well-populated cities or farmland. I am scared that it will kill several million people and/or cause billions in economic damage, and though its spread will be noticed by wastewater screening, it will be months until the necessary resources are allocated to defend against it. And I am scared that all of this will happen within the next few years.</p><p>What am I not scared of? I am not scared of state actors, because most states have too much to lose by violating the<a href="https://en.wikipedia.org/wiki/Biological_Weapons_Convention"> Biological Weapons Convention</a> and, if they are willing to let loose anyway, I believe they would opt for easier-to-use-and-control chemical or nuclear weapons instead. I am not scared of people creating extremely engineered pathogens that have capabilities <em>far</em> beyond existing ones&#8212;because the existing ones are already quite good and difficult enough to work with&#8212;especially because even if the AI tools get good enough to make it worth it, I believe the same AI tools will be just as useful in countermeasure design. And yes, I realize &#8216;<em>attack requires one success while defense requires comprehensive coverage</em>&#8217;, but I also believe the Swiss-cheese security model will prevail. 
Finally, I am not scared of individual actors, because the economics of bioweapons production likely do not work in their favor. Yes, they can rent upstream services&#8212;virus production, purification&#8212;but the downstream weaponization work requires custom protocols that CROs have no economic incentive to develop. Moreover, given that the weaponization will almost definitely be a bespoke, hands-on R&amp;D project and not one that is easily automated, it feels unlikely that nobody at the CRO will raise an eyebrow.</p><p>That&#8217;s my threat model at least. I realize it has holes. For example, it may be the case that state actors <em>are</em> worth worrying about, entirely because the appeal of bioweapons is that you can deploy them with plausible deniability. Hard to do that with a nuke! You may also accuse me of not paying close enough attention to the trendlines, and argue that maybe I am correct about the 2026 threats, but not the 2030 ones, so perhaps a disgruntled salaryman will really be able to someday easily design mega-Ebola to depopulate the planet. Maybe!</p><p>Ultimately, you can get infinitely paranoid about biosecurity if you really want to, or you can assume Nothing Ever Happens, and I think where I have landed is a comfy middle ground. I am grateful that there exist people who work in biosecurity who <em>are</em> infinitely paranoid, and through writing this essay, I have become far more sympathetic to their viewpoint.</p><p>To end this off: in all my conversations, everyone generally agreed that an honest-to-god bioterrorist attack is unlikely. It is a low-probability event. But low-probability events with civilizational consequences are still worth preparing for. The heartening thing is that the bottleneck to preparation is almost entirely institutional, economic, and coordinative, not scientific. 
The disheartening thing is that fixing these ultimately requires political will, and sans a catalyzing event to unlock it, that political will does not currently exist. Of course, one could argue that perhaps we will never need it, that the Pathogen that people in this space are breathlessly building defenses against will never arrive, that it is all paranoia, tech-rotted minds coming up with entirely hallucinated demons. But that argument feels far less convincing now than before I started writing this essay, and, if I did my job right, I hope it will feel less convincing to you, too.</p><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-1" href="#footnote-anchor-1" class="footnote-number" contenteditable="false" target="_self">1</a><div class="footnote-content"><p>Where are we at with manufacturing-maxxing? There are certainly more mRNA production facilities around. Moderna brought three new plants online in 2025 in the UK, Australia, and Canada. BioNTech has deployed modular, containerized manufacturing units called BioNTainers to Rwanda, the first mRNA plant on the African continent. But mRNA production is really, really complicated, and there&#8217;s all sorts of weird bottlenecks that can arise in its creation. 
If you&#8217;re curious to learn more here&#8212;since this is a surprisingly deep subject that could be its own essay&#8212;there are two really incredible articles on the whole logistical apparatus that goes into making one of these drugs: &#8216;<em><a href="https://blog.jonasneubert.com/2021/01/10/exploring-the-supply-chain-of-the-pfizer-biontech-and-moderna-covid-19-vaccines/">Exploring the Supply Chain of the Pfizer/BioNTech and Moderna COVID-19 vaccines</a></em>&#8217; and &#8216;<em><a href="https://arxiv.org/html/2602.08988">Analyzing Vaccine Manufacturing Supply Chain Disruptions for Pandemic Preparedness using Discrete-Event Simulation</a></em>&#8217;. The short version is that the specialty raw materials <em>and</em> quality-control personnel needed to actually produce and release vaccines at pandemic scale are in short supply and, as far as I can tell, remain so. People are working to change this though!</p><p>How about reducing the clinical trial bottleneck?<a href="https://www.mdpi.com/2076-393X/13/8/849"> The paper on the CEPI 100 Days Mission has a fun approach to it</a>: just immediately chuck the vaccine into a phase 2b/3 trial. Of course, the caveat is that this only applies in a COVID-y situation: known pathogens, available safety data from similar therapeutics, and the like. 
The trials you run could also be challenge trials, as in, deliberately infecting vaccinated volunteers with a pathogen in a controlled setting, allowing you to immediately observe the efficacy of the vaccine (which is, surprisingly,<a href="https://www.thelancet.com/journals/laninf/article/PIIS1473-3099(23)00294-3/fulltext"> a historically safe thing to do</a>).</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-2" href="#footnote-anchor-2" class="footnote-number" contenteditable="false" target="_self">2</a><div class="footnote-content"><p><em>Can&#8217;t you just fragment a bacterium or virus into a soup of proteins, and inject <strong>that</strong> alongside an adjuvant?</em> This is not terribly dissimilar to how traditional vaccines function, which is to say: this may work, but you&#8217;d forgo all the speed advantages of mRNA, and speed is ultimately what we need most here.</p><p><em>Okay, forget fragmentation. Can&#8217;t you identify conserved regions of a virus, and just use those fragments in your vaccine?</em> Sure, and maybe it&#8217;ll work. But maybe it&#8217;ll also massively backfire, and you&#8217;ll end up giving your patient antibody-dependent enhancement, or ADE: antibodies that bind tightly to sections of the pathogen, but don&#8217;t neutralize it in any meaningful way, crowding out the antibodies that would actually help.<a href="https://pmc.ncbi.nlm.nih.gov/articles/PMC4783420/"> ADE actually happened for an early RSV vaccine: injecting native proteins from the virus made the disease </a><em><a href="https://pmc.ncbi.nlm.nih.gov/articles/PMC4783420/">worse</a></em><a href="https://pmc.ncbi.nlm.nih.gov/articles/PMC4783420/">.</a> It took a structural biology breakthrough to get it to work:<a href="https://www.nature.com/articles/s41467-025-63084-z"> using the prefusion conformation of the RSV protein in the vaccine</a>. 
Crazily, the same conformation trick, by the same guy (<a href="https://molecularbiosci.utexas.edu/directory/jason-mclellan">Jason McLellan</a>), is what made the COVID-19 spike protein work as an immunogen.</p><p><em>But if we know which antibody we want, which we can grab from patients who naturally recover from the disease, can&#8217;t we just work backwards and find the immunogen that elicits it?</em> Perhaps! But did you know that there are patients with HIV who somehow have gained antibodies against the disease? <a href="https://www.aidsmap.com/about-hiv/faq/what-elite-controller">They are called &#8216;elite controllers&#8217;,</a> making up 0.5% of all HIV patients, and <strong>despite knowing exactly what antibodies these patients have, it has been a struggle to convert this finding into a vaccine.</strong> The path from immunogen to mature antibody involves cascading rounds of somatic hypermutation, cross-reactive antibody-antibody interactions, and a network of immune signaling that cannot be reliably predicted from binding data alone. In fact, from Soham&#8217;s perspective, it isn&#8217;t terribly hard to find an antibody that can neutralize a virus. What is hard is understanding which immunogen can reliably cause those antibodies to be elicited, and finding that out is almost entirely a matter of trial and error. 
Worst of all, it may be the case that <strong>some patients genuinely lack the immune repertoire necessary for those antibodies to </strong><em><strong>ever</strong></em><strong> be elicited.</strong></p><p></p></div></div>]]></content:encoded></item><item><title><![CDATA[Heuristics for lab robotics, and where its future may go ]]></title><description><![CDATA[8.4k words, 38 minutes reading time]]></description><link>https://www.owlposting.com/p/heuristics-for-lab-robotics-and-where</link><guid isPermaLink="false">https://www.owlposting.com/p/heuristics-for-lab-robotics-and-where</guid><dc:creator><![CDATA[Abhishaike Mahajan]]></dc:creator><pubDate>Mon, 09 Feb 2026 12:42:22 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!S1wJ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F448a6836-96f8-4631-a6f0-6207dd670dc6_2912x1632.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!S1wJ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F448a6836-96f8-4631-a6f0-6207dd670dc6_2912x1632.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!S1wJ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F448a6836-96f8-4631-a6f0-6207dd670dc6_2912x1632.png 424w, https://substackcdn.com/image/fetch/$s_!S1wJ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F448a6836-96f8-4631-a6f0-6207dd670dc6_2912x1632.png 848w, 
https://substackcdn.com/image/fetch/$s_!S1wJ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F448a6836-96f8-4631-a6f0-6207dd670dc6_2912x1632.png 1272w, https://substackcdn.com/image/fetch/$s_!S1wJ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F448a6836-96f8-4631-a6f0-6207dd670dc6_2912x1632.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!S1wJ!,w_2400,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F448a6836-96f8-4631-a6f0-6207dd670dc6_2912x1632.png" width="1200" height="672.5274725274726" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/448a6836-96f8-4631-a6f0-6207dd670dc6_2912x1632.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;large&quot;,&quot;height&quot;:816,&quot;width&quot;:1456,&quot;resizeWidth&quot;:1200,&quot;bytes&quot;:7128718,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.owlposting.com/i/184997794?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F448a6836-96f8-4631-a6f0-6207dd670dc6_2912x1632.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:&quot;center&quot;,&quot;offset&quot;:false}" class="sizing-large" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!S1wJ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F448a6836-96f8-4631-a6f0-6207dd670dc6_2912x1632.png 424w, 
https://substackcdn.com/image/fetch/$s_!S1wJ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F448a6836-96f8-4631-a6f0-6207dd670dc6_2912x1632.png 848w, https://substackcdn.com/image/fetch/$s_!S1wJ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F448a6836-96f8-4631-a6f0-6207dd670dc6_2912x1632.png 1272w, https://substackcdn.com/image/fetch/$s_!S1wJ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F448a6836-96f8-4631-a6f0-6207dd670dc6_2912x1632.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" 
y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p><em>Note: this article required conversations with a lot of people. A (hopefully) exhaustive, randomized list of everyone whose thoughts contributed to the article: <a href="https://www.linkedin.com/in/lachlan-munro/">Lachlan Munroe</a> (Head of Automation at <a href="https://www.linkedin.com/company/dtubiosustain/">DTU Biosustain</a>), <a href="https://science.xyz/company/team/max-hodak/">Max Hodak </a>(CEO of <a href="https://science.xyz/">Science</a>, former founder of <a href="https://www.ycombinator.com/companies/transcriptic">Transcriptic</a>), <a href="https://www.linkedin.com/in/djkleinbaum/">D.J. Kleinbaum</a> (CEO of <a href="https://www.emeraldcloudlab.com/">Emerald Cloud Labs</a>), <a href="https://keonigandall.com/">Keoni Gandall </a>(former founder of <a href="https://www.trilo.bio/">Trilobio</a>), <a href="https://www.linkedin.com/in/cristian-ponce5/">Cristian Ponce</a> (CEO of <a href="https://tetsuwan.com/">Tetsuwan Scientific</a>), <a href="https://www.linkedin.com/in/bronte-kolar/">Bront&#235; Kolar</a> (CEO of <a href="https://www.zeonsystems.ai/">Zeon Systems</a>), <a href="https://www.linkedin.com/in/jrkelly2/">Jason Kelly</a> (CEO of <a href="https://www.ginkgo.bio/">Ginkgo Bioworks</a>), <a href="https://www.linkedin.com/in/junaxup/">Jun Axup Penman</a> (COO of <a href="https://www.e11.bio/">E11 Bio</a>), <a href="https://nishy.business/">Nish Bhat</a> (current VC, ex-<a href="https://www.color.com/">Color</a> cofounder), <a href="https://www.linkedin.com/in/amulya-garimella-5b408a1b4/">Amulya Garimella</a> (MIT PhD student), <a href="https://www.linkedin.com/in/shelbynewsad/">Shelby Newsad</a> (VC at <a href="https://www.compound.vc/">Compound</a>), <a href="https://www.linkedin.com/in/amichlee/">Michelle Lee</a> (CEO of <a href="https://www.medra.ai/about">Medra</a>), <a href="https://www.linkedin.com/in/charlesxjyang/">Charles 
Yang</a> (Fellow at <a href="https://www.renaissancephilanthropy.org/">Renaissance Philanthropy</a>), <a href="https://www.linkedin.com/in/chasearmer/">Chase Armer</a> (Columbia PhD student), <a href="https://www.linkedin.com/in/ben-ray-410076b7/">Ben Ray</a> (current founder, ex-<a href="https://www.retro.bio/">Retro Biosciences</a> automation engineer), and <a href="http://linkedin.com/in/jacobfeala">Jake Feala</a> (startup creation at <a href="https://www.flagshippioneering.com/">Flagship Pioneering</a>).</em></p><div><hr></div><ol><li><p><a href="https://www.owlposting.com/i/184997794/introduction">Introduction</a></p></li><li><p><a href="https://www.owlposting.com/i/184997794/heuristics-for-lab-robotics">Heuristics for lab robotics</a></p><ol><li><p><a href="https://www.owlposting.com/i/184997794/there-are-box-robots-and-there-are-arm-robots">There are box robots, and there are arm robots</a></p></li><li><p><a href="https://www.owlposting.com/i/184997794/most-lab-protocols-can-be-automated-they-just-often-arent-worth-automating">Most lab protocols </a><em><a href="https://www.owlposting.com/i/184997794/most-lab-protocols-can-be-automated-they-just-often-arent-worth-automating">can</a></em><a href="https://www.owlposting.com/i/184997794/most-lab-protocols-can-be-automated-they-just-often-arent-worth-automating"> be automated, they just often aren&#8217;t worth automating</a></p></li><li><p><a href="https://www.owlposting.com/i/184997794/you-can-improve-lab-robotics-by-improving-the-translation-layer-the-hardware-layer-or-the-intelligence-layer">You can improve lab robotics by improving the translation layer, the hardware layer, or the intelligence layer</a></p></li><li><p><a href="https://www.owlposting.com/i/184997794/all-roads-lead-to-transcriptic">All roads lead to Transcriptic</a></p></li></ol></li><li><p><a href="https://www.owlposting.com/i/184997794/conclusion">Conclusion</a></p></li></ol><h1><strong>Introduction</strong></h1><p>I have never worked in a 
wet lab. The closest I&#8217;ve come to it was during my first semester of undergrad, when I spent 4 months in a neurostimulation group. Every morning at 9AM, I would wake up, walk to the lab, and jam a wire into a surgically implanted port on a rat&#8217;s brain, which was connected to a ring of metal wrapped around its vagus nerve, and deposit it into a <a href="https://en.wikipedia.org/wiki/Operant_conditioning_chamber">Skinner box</a>, where the creature was forced to discriminate between a dozen different sounds for several hours while the aforementioned nerve was zapped. This, allegedly, was not painful to the rat, but they did not seem pleased with their situation. My tenure at the lab officially ended when an unusually squirmy rat ripped the whole port system out of its skull while I was trying to plug it in.</p><p>Despite how horrible the experience was, I cannot in good conscience equate it to True wet lab work, since my experience taught me none of the lingo regularly employed on the <a href="https://www.reddit.com/r/labrats/">r/labrats</a> subreddit.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!7PPH!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2270419d-f1c5-4849-ade3-ed755b19a518_1456x293.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!7PPH!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2270419d-f1c5-4849-ade3-ed755b19a518_1456x293.jpeg 424w, https://substackcdn.com/image/fetch/$s_!7PPH!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2270419d-f1c5-4849-ade3-ed755b19a518_1456x293.jpeg 848w, 
https://substackcdn.com/image/fetch/$s_!7PPH!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2270419d-f1c5-4849-ade3-ed755b19a518_1456x293.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!7PPH!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2270419d-f1c5-4849-ade3-ed755b19a518_1456x293.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!7PPH!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2270419d-f1c5-4849-ade3-ed755b19a518_1456x293.jpeg" width="1456" height="293" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2270419d-f1c5-4849-ade3-ed755b19a518_1456x293.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:293,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!7PPH!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2270419d-f1c5-4849-ade3-ed755b19a518_1456x293.jpeg 424w, https://substackcdn.com/image/fetch/$s_!7PPH!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2270419d-f1c5-4849-ade3-ed755b19a518_1456x293.jpeg 848w, 
https://substackcdn.com/image/fetch/$s_!7PPH!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2270419d-f1c5-4849-ade3-ed755b19a518_1456x293.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!7PPH!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2270419d-f1c5-4849-ade3-ed755b19a518_1456x293.jpeg 1456w" sizes="100vw"></picture><div></div></div></a></figure></div><p>I mention my lack of background context entirely because it has had some unfortunate consequences on my ability to understand the broader field of lab automation. Specifically, that it is incredibly easy for me to get taken for a ride.</p><p>This is not true for many other areas of biology. I have, by now, built some of the mental scaffolding necessary for me to reject the more grandiose claims spouted by people in <a href="https://www.owlposting.com/p/questions-to-ponder-when-evaluating">neurotechnology</a>, <a href="https://www.owlposting.com/p/a-primer-on-why-computational-predictive">toxicology prediction</a>, <a href="https://www.owlposting.com/p/an-ml-drug-discovery-startup-trying">small molecule benchmarks</a>, and more. But lab robotics eludes me, because to understand lab robotics, you need to understand what <em>actually</em> happens in a lab&#8212;the literal physical movements and the way the instruments are handled and how materials are stored and everything else&#8212;and I do not <em>actually</em> understand what happens in a lab.</p><p>Without this embodied knowledge, I am essentially a rube at a county fair, dazzled by any carnival barker who promises me that their magic box can do everything and anything. People show me robots whirling around, and immediately my eyes fill up with light, my mouth agape. To my credit, I recognize that I am a rube. 
So, despite how impressive it all <em>looks</em>, I have shied away from offering my own opinion on it.</p><p>This essay is my attempt to fix this, and to give you an account of the heuristics I have gained from talking to many people in this space. It isn&#8217;t comprehensive! But it does cover at least some of the dominant strains of thought I see circulating among the domain experts of the world.</p><h1><strong>Heuristics for lab robotics</strong></h1><h2><strong>There are box robots, and there are arm robots</strong></h2><p>This is going to be an obvious section, but there is some groundwork I&#8217;d like to lay so that I can refer back to it throughout the rest of this essay. You can safely skip this part if you are already vaguely familiar with lab automation as a field.</p><p>In the world of automation, there exist boxes. Boxes have been around for a very, very long time and could be considered &#8216;mature technology&#8217;. Our ancient ancestors relied on them heavily, and they have become a staple of many, many labs.</p><p>For one example of a box, consider a &#8216;<a href="https://en.wikipedia.org/wiki/Liquid_handling_robot">liquid handler</a>&#8217;. The purpose of a liquid handler is to move liquid from one place to another. It is meant to take 2 microliters from this tube and put it in that well, and then to take 50 microliters from these 96 wells and distribute them across those 384 wells, and to do this fourteen thousand times perfectly, which is something that humans eventually get bored of doing by hand. 
They must be programmed for each of these tasks, which is a bit of a pain, but once the script is written, it can run forever, (mostly) flawlessly.</p><p>Here is an image of a liquid handler you may find in a few labs, a $40,000-$100,000 machine colloquially referred to as a &#8216;<a href="https://www.hamiltoncompany.com/automated-liquid-handling?srsltid=AfmBOor8C4KXQvDt0aBCJoIFQ76Yfz1xlZ7ldsWQ1seW2N5nuySOoZaO">Hamilton</a>&#8217;.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!0elK!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbe11f3f3-6c9f-4160-96f0-e5f5bcf952ac_600x400.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!0elK!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbe11f3f3-6c9f-4160-96f0-e5f5bcf952ac_600x400.png 424w, https://substackcdn.com/image/fetch/$s_!0elK!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbe11f3f3-6c9f-4160-96f0-e5f5bcf952ac_600x400.png 848w, https://substackcdn.com/image/fetch/$s_!0elK!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbe11f3f3-6c9f-4160-96f0-e5f5bcf952ac_600x400.png 1272w, https://substackcdn.com/image/fetch/$s_!0elK!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbe11f3f3-6c9f-4160-96f0-e5f5bcf952ac_600x400.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!0elK!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbe11f3f3-6c9f-4160-96f0-e5f5bcf952ac_600x400.png" width="600" height="400" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/be11f3f3-6c9f-4160-96f0-e5f5bcf952ac_600x400.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:400,&quot;width&quot;:600,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Automated Liquid Handling | Hamilton Liquid Handling Platforms&quot;,&quot;title&quot;:&quot;Automated Liquid Handling | Hamilton Liquid Handling Platforms&quot;,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Automated Liquid Handling | Hamilton Liquid Handling Platforms" title="Automated Liquid Handling | Hamilton Liquid Handling Platforms" srcset="https://substackcdn.com/image/fetch/$s_!0elK!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbe11f3f3-6c9f-4160-96f0-e5f5bcf952ac_600x400.png 424w, https://substackcdn.com/image/fetch/$s_!0elK!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbe11f3f3-6c9f-4160-96f0-e5f5bcf952ac_600x400.png 848w, https://substackcdn.com/image/fetch/$s_!0elK!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbe11f3f3-6c9f-4160-96f0-e5f5bcf952ac_600x400.png 1272w, 
https://substackcdn.com/image/fetch/$s_!0elK!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbe11f3f3-6c9f-4160-96f0-e5f5bcf952ac_600x400.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Why do this at all? Liquids are awfully important in biology. Consider a simple drug screening experiment: you have a library of 10,000 compounds, and you want to know which ones kill cancer cells. 
Each compound needs to be added to a well containing cells at multiple concentrations; let&#8217;s say eight concentrations per compound, enough to generate a dose-response curve. That&#8217;s 80,000 wells. Each well needs to receive a precise volume of compound solution, somewhere between 1 and 8 microliters, then incubate for 48 hours, then receive 10 microliters of a viability reagent (something to measure whether a cell is alive or dead), then incubate for another 4 hours, then get read by a plate reader. If you pipette 11 microliters into well number 47,832, your dose-response curve for that compound is wrong, and you might advance a false positive, or even worse, miss a drug candidate.</p><p>Difficult! Hence why automation may be useful here.</p><p>Many other types of boxes exist. There are autostainers for immunohistochemistry, which take tissue sections and run them through precisely timed washes and antibody incubations that would otherwise require a grad student to stand at a bench for six hours. Plate readers, often used within liquid handlers, measure absorbance or fluorescence or luminescence across hundreds of wells. And so on.</p><p>Boxes, which can contain boxes within themselves, represent a clean slice of a lab workflow, a cross-section of something that could be parameterized&#8212;that is, the explicit definition of the space of acceptable inputs, the steps, the tolerances, and the failure modes of a particular wet-lab task. <strong>Around this careful delineation, a box was constructed, and only this explicit parameterization may run within the box.</strong> And many companies create boxes! 
There are Hamiltons, created by a company called Hamilton, but there are also boxes made by <a href="https://www.beckmancoulter.com/">Beckman Coulter</a>, <a href="https://www.tecan.com/">Tecan</a>, <a href="https://www.agilent.com/en?srsltid=AfmBOooeZ-hEg3ZZsx49NVF3AuHBxn9rFQYdPGxcGJsMzy0fgfmgNg5k">Agilent</a>, <a href="https://www.thermofisher.com/us/en/home.html">Thermo Fisher</a>, <a href="https://opentrons.com/">Opentrons</a>, and likely many others. Which is all to say, the box ecosystem is mature, consolidated, and deeply boring.</p><p>But for all the hours saved by boxes, there is a problem with them. And it is the unfortunate fact that they, ultimately, are closed off from the rest of the universe. A liquid handler does not know that an incubator exists; a plate reader has no concept of where the plates it reads come from. Each box is an island, a blind idiot, entirely unaware of its immediate surroundings.</p><p>This is all well and good, but much like how <a href="https://www.ebsco.com/research-starters/economics/baumols-cost-disease">Baumol&#8217;s cost disease</a> dictates that the productivity of a symphony orchestra is bottlenecked by the parts that cannot be automated&#8212;you cannot play a Beethoven string quartet any faster than Beethoven intended, no matter how efficient your ticketing system becomes&#8212;the productivity of an &#8216;automated lab&#8217; is similarly bottlenecked by the parts that remain manual. A Hamilton can pipette at superhuman speed, but if a grad student still has to walk the plate from the Hamilton to the incubator, the lab&#8217;s throughput is limited by how fast the grad student can walk. 
An actual experiment is not a single box, but a <em>sequence</em> of boxes, and someone or something must move material between them.</p><p>Now, you could add extra parts to the box, expanding it until it is the size of a small building, but entering Rube Goldberg territory has some issues, in that you have created a new system whose failure modes are a combinatorial explosion of every individual box&#8217;s failure modes.</p><p>A brilliant idea may occur to you: could we connect the boxes? This way, each box remains at least somewhat independent. How could the connection occur? Perhaps link them together with some kind of robotic intermediary&#8212;a mechanical grad student&#8212;that shuttles plates from one island to the next, opening doors and loading decks and doing all the mindless physical labor? And you know, if you really think about it, the whole grad student is not needed. Their torso and legs and head are all extraneous to the task at hand. Perhaps all we need are their arms, severed cleanly at the shoulder, mounted on a rail, and programmed to do the repetitive physical tasks that constitute the majority of logistical lab work.</p><p>And with this, we have independently invented the &#8216;arm&#8217; line of lab robotics research. This has its own terminology: when you connect multiple boxes together with arm(s) and some scheduling software, the resulting system is often called a &#8220;workcell.&#8221;</p><p>As it turns out, while only one field benefits from stuff like liquid handlers existing&#8212;the life sciences&#8212;a great many other fields also have a need for arms. So, while the onus has been on our field to develop boxes, arms benefit from the combined R&amp;D efforts of automotive manufacturing, warehouse logistics, semiconductor fabs, food processing, and any other industry where the task is &#8220;<em>pick up thing, move thing, put down thing</em>.&#8221; This is good news! 
It means the underlying hardware&#8212;the motors, the joints, the control systems&#8212;is being refined at a scale and pace that the life sciences alone could never justify.</p><p>Let&#8217;s consider one arm that is used fairly often in the lab automation space: the UR5, made by a company called <a href="https://www.universal-robots.com/">Universal Robots</a>. It has six degrees of freedom, a reach of about 850 millimeters, a payload capacity of five kilograms, and costs somewhere in the range of $25,000 to $35,000. Here is a picture of one:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!fE3o!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb761e3bb-b63e-4207-8fdf-6b43a5137682_1089x1200.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!fE3o!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb761e3bb-b63e-4207-8fdf-6b43a5137682_1089x1200.png 424w, https://substackcdn.com/image/fetch/$s_!fE3o!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb761e3bb-b63e-4207-8fdf-6b43a5137682_1089x1200.png 848w, https://substackcdn.com/image/fetch/$s_!fE3o!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb761e3bb-b63e-4207-8fdf-6b43a5137682_1089x1200.png 1272w, https://substackcdn.com/image/fetch/$s_!fE3o!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb761e3bb-b63e-4207-8fdf-6b43a5137682_1089x1200.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!fE3o!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb761e3bb-b63e-4207-8fdf-6b43a5137682_1089x1200.png" width="1089" height="1200" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b761e3bb-b63e-4207-8fdf-6b43a5137682_1089x1200.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1200,&quot;width&quot;:1089,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!fE3o!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb761e3bb-b63e-4207-8fdf-6b43a5137682_1089x1200.png 424w, https://substackcdn.com/image/fetch/$s_!fE3o!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb761e3bb-b63e-4207-8fdf-6b43a5137682_1089x1200.png 848w, https://substackcdn.com/image/fetch/$s_!fE3o!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb761e3bb-b63e-4207-8fdf-6b43a5137682_1089x1200.png 1272w, https://substackcdn.com/image/fetch/$s_!fE3o!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb761e3bb-b63e-4207-8fdf-6b43a5137682_1089x1200.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset 
pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Upon giving this arm the grippers necessary to hold a pipette, to pick up a plate, and to click buttons, as well as the ability to switch between them, your mind may go wild with imagination. </p><p>What could such a machine do?</p><p>Arms within boxes? Wheels to the platform that the robot is mounted upon, allowing it to work with multiple boxes at once? So much is possible! You could have it roll up to an incubator, open the door, retrieve a plate, wheel over to the liquid handler, load it, wait for the protocol to finish, unload it, wheel over to the plate reader, and so on, all night long, while you sleep and dream. This is the future, made manifest.</p><p>Well, maybe. If this were all true, why are there humans in a lab at all? 
Why haven&#8217;t we outsourced everything to these cute robotic arms and a bunch of boxes?</p><h2><strong>Most lab protocols </strong><em><strong>can</strong></em><strong> be automated, they just often aren&#8217;t worth automating</strong></h2><p>If you speak to LLMs about the subject of lab robotics, you will find that they are pretty pessimistic about the whole business, mostly because of how annoying the underlying machines are to use. I believed them! Especially because it does match up with what I&#8217;ve seen. For example, there is a somewhat funny phenomenon that has repeated across the labs of the heavily funded biology startups I&#8217;ve visited: they have some immense liquid handler box lying around, I express amazement at how cool those things are, and my tour guide shrugs and says nobody really uses that thing.</p><p>But as was the case in an <a href="https://www.owlposting.com/p/what-happened-to-pathology-ai-companies?open=false#%C2%A7the-death-of-traditional-pathology-was-greatly-exaggerated">earlier essay I wrote about why pathologists are loath to use digital pathology software</a>, the truth is a bit complicated.</p><p>To start, I will explain, over the course of a very large paragraph, what it means to <em>work</em> with a liquid handler. You can skip it if you already understand it.</p><p>First, you must define your protocol. This involves specifying every single operation you want the machine to perform: aspirate 5 microliters from position A1, move to position B1, dispense, return for the next tip. If you are using Hamilton&#8217;s Venus software, you pipette from <em>seq_source</em> to <em>seq_destination</em>, and you must do something akin to this for every container in your system. Second, you must define your liquid classes. 
A liquid class is a set of parameters that tells the robot how to physically handle a particular liquid: the aspiration speed, the dispense speed, the delay after aspiration, the blow-out volume, the retract speed, and a dozen other settings that must be tuned to the specific rheological properties of whatever you&#8217;re pipetting. Water is easy, glycerol is apparently really hard, and you will discover where your specific liquid lies on this spectrum as you go through the heavily trial-and-error testing process. Third, and finally, you must deal with the actual physical setup. The deck layout must be defined precisely. Every plate, reservoir, and tip rack must be assigned to a specific position, and those positions must match reality. The dimensions of the wells, the height of the rim, and the volume must all be accurately detailed in the software. If you&#8217;re using plates from a different supplier than the one in the default library, you may need to create custom labware definitions. 
</p><p>And at any point, the machine may still fail, because a pipette tip failed to be picked up, the liquid detection meter threw a false negative, something clogged, or whatever else.</p><p>To help you navigate this perilous journey, Hamilton, in their infinite grace, offers seminars to teach you how to use this machine, and<a href="https://www.hamiltoncompany.com/services"> it only costs between 3,500 and 5,000 dollars.</a></p><p>And<a href="https://www.reddit.com/r/biotech/comments/145qiu4/comment/jnnvphl/?utm_source=share&amp;utm_medium=web3x&amp;utm_name=web3xcss&amp;utm_term=1&amp;utm_content=share_button"> here&#8217;s a Reddit post with some more details:</a></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!LH7s!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8c5ad7fd-0de8-4304-9f30-f677908e3694_988x796.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!LH7s!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8c5ad7fd-0de8-4304-9f30-f677908e3694_988x796.png 424w, https://substackcdn.com/image/fetch/$s_!LH7s!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8c5ad7fd-0de8-4304-9f30-f677908e3694_988x796.png 848w, https://substackcdn.com/image/fetch/$s_!LH7s!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8c5ad7fd-0de8-4304-9f30-f677908e3694_988x796.png 1272w, 
https://substackcdn.com/image/fetch/$s_!LH7s!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8c5ad7fd-0de8-4304-9f30-f677908e3694_988x796.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!LH7s!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8c5ad7fd-0de8-4304-9f30-f677908e3694_988x796.png" width="988" height="796" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8c5ad7fd-0de8-4304-9f30-f677908e3694_988x796.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:796,&quot;width&quot;:988,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!LH7s!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8c5ad7fd-0de8-4304-9f30-f677908e3694_988x796.png 424w, https://substackcdn.com/image/fetch/$s_!LH7s!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8c5ad7fd-0de8-4304-9f30-f677908e3694_988x796.png 848w, https://substackcdn.com/image/fetch/$s_!LH7s!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8c5ad7fd-0de8-4304-9f30-f677908e3694_988x796.png 1272w, 
https://substackcdn.com/image/fetch/$s_!LH7s!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8c5ad7fd-0de8-4304-9f30-f677908e3694_988x796.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Now, yes, this is annoying, especially if you compare it with manual pipetting. There, a trained researcher picks up a pipette, aspirates the liquid, watches it enter the tip, adjusts instinctively if something seems off, dispenses into the destination well, and moves on. The whole operation takes perhaps fifteen seconds. 
Perhaps the researcher gets really bored with this and doesn&#8217;t move particularly fast, but if you assemble enough of them together and call it graduate school or an RA position, you can scale things up quite a bit without needing to touch a robot at all. Oftentimes, that may not only be the more efficacious option, but also the cheaper one.</p><p><strong>But there is a very interesting nuance here: if the task is worth automating, it actually isn&#8217;t that big of a deal to automate.</strong></p><p>From talking to automation engineers, I get the distinct feeling that if we had an infinite number of them (and scientists to let them know their requirements) worming through our labs, nearly everything in an average wet-lab could be automated. After all, there are centrifuges, incubators, and so on that are all automation-compatible! <strong>And the engineers I talked to don&#8217;t actually mind the finicky process of tuning their boxes and arms </strong><em><strong>that</strong></em><strong> much</strong>. Yes, dialing in a protocol can be tough, but it is often a &#8216;<em>solvable over a few hours</em>&#8217; problem. In the edge case of dealing with genuinely strange protocols that bear little resemblance to what the automation engineer has seen before, it could take perhaps weeks, but that&#8217;s it.</p><p>So what&#8217;s the problem?</p><p><strong>Most protocols simply aren&#8217;t run enough times to justify the upfront investment.</strong></p><p>Let&#8217;s assume it takes an automation engineer forty hours to fully dial in a new protocol, which is a reasonable estimate for something moderately complex. At a loaded cost of, say, $100 per hour for the engineer&#8217;s time, that&#8217;s $4,000 just to get the thing working. Now, if you&#8217;re going to run this protocol the <strong>exact</strong> same way ten thousand times over the next month, that $4,000 amortizes to forty cents per run. Trivial! 
Also, it&#8217;d probably be nearly impossible to do via human labor alone anyway, so automate away. But if you&#8217;re going to run it fifty times? That&#8217;s $80 per run in setup costs alone, and then you might as well just have a human do it.</p><p>This is, obviously, an immense oversimplification. Even if a wet-lab task could be &#8216;automated&#8217;, most boxes/arms still need to be babysat a <em>little</em> bit. But still! The split between robot-easy problems and robot-hard problems&#8212;in the eyes of automation engineers&#8212;has a lot less to do with specific motions/actions/protocols, and a <strong>lot more to do with &#8216;</strong><em><strong>I will run this many times&#8217;</strong></em><strong> versus </strong><em><strong>&#8216;I will run this once&#8217;.</strong></em></p><p>And most protocols in most labs fall into the latter category. <strong>Research is, by its nature, exploratory</strong>. You run an experiment, you look at the results, you realize you need to change something, you run a different experiment. Some labs do indeed run their work in a very &#8216;<em>robot-shaped</em>&#8217; way, where the bulk of their work is literally just &#8216;<em>screening X against Y</em>&#8217;, and writing a paper about the results. They can happily automate everything, because even if some small thing about their work changes, it&#8217;s all similar enough that, say, their prior assumptions about the liquid classes in their liquid handler still hold.</p><p>But plenty of groups do not operate this way, maybe because they are doing such a vast variety of different experiments, or because their work is iterative and the protocol they&#8217;re running this week bears only passing resemblance to the protocol they&#8217;ll be running next week, or some other reason.</p><p>So, how do you improve this? 
How can we arrive at an automation-maxed world?</p><h2>You can improve lab robotics by improving the translation layer, the hardware layer, or the intelligence layer</h2><p>The answer is very obvious to those working in the space: <strong>we must reduce the activation energy needed to interface with robotic systems.</strong> But, while everybody seems to mostly agree with this, people differ in their theory of how such a future may come about. After talking to a dozen-plus people, I see three ideological camps, each proposing its own solution.</p><p>But before moving on, I&#8217;d like to preemptively clarify something. To help explain each of the ideologies, I will name companies that feel like they fall underneath that ideology, and <em>those</em> categorizations are slightly tortured. In truth, they all slightly merge and mix and wobble into one another. While they seem philosophically aligned in the camp I put them in, you should remember that I am really trying to overlay a clean map on a very messy territory.</p><p><strong>The first camp is the simplest fix: create better translation layers between what the human wants and what the machine is capable of doing.</strong> In other words, being able to automatically convert a protocol made for an intelligent human with hands and eyes and common sense into something that a very nimble, but very dumb, robot can conceivably do. The automation engineer then needn&#8217;t spend forty hours figuring this out, but maybe an hour, or maybe even just a minute. </p><p>This is an opinion shared by three startups of note:<a href="https://www.synthace.com/"> Synthace</a>,<a href="https://briefly.bio/"> Briefly Bio</a>, and<a href="https://tetsuwan.com/"> Tetsuwan Scientific</a>.</p><p><strong><a href="https://www.synthace.com/">Synthace</a></strong>, founded in London in 2011, was perhaps the earliest to take this seriously. 
They built Antha, which is essentially a device-agnostic programming language: a protocol written in Antha runs on a Hamilton or a Tecan or a Gilson without modification, because the abstraction layer handles the translation. You drag and drop your workflow together, the system figures out the liquid classes and deck layouts, and you go home while the robot pipettes.</p><p><strong><a href="https://briefly.bio/">Briefly Bio</a></strong>, which launched in mid-2024 and<a href="https://brieflybio.substack.com/"> has perhaps one of the best and least-known-about blogs I&#8217;ve seen from a startup</a>, started not as a translation layer between the scientist and the robot, but between the scientist and the automation engineer. Their software uses LLMs to convert the natural-language protocols that scientists can write&#8212;with all their implicit assumptions and missing parameters and things-that-must-be-filled-in&#8212;into structured, consistent formats that an automation team can work with. 
But since then, the team has expanded their purview to allow these auto-generated protocols (and edits made upon them) to be directly run on arbitrary boxes and arms, alongside a validation check to ensure that the protocol is actually physically possible.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!7kjS!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe9621c91-c1e4-4171-b89e-634c19834c9e_1456x741.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!7kjS!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe9621c91-c1e4-4171-b89e-634c19834c9e_1456x741.jpeg 424w, https://substackcdn.com/image/fetch/$s_!7kjS!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe9621c91-c1e4-4171-b89e-634c19834c9e_1456x741.jpeg 848w, https://substackcdn.com/image/fetch/$s_!7kjS!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe9621c91-c1e4-4171-b89e-634c19834c9e_1456x741.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!7kjS!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe9621c91-c1e4-4171-b89e-634c19834c9e_1456x741.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!7kjS!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe9621c91-c1e4-4171-b89e-634c19834c9e_1456x741.jpeg" width="1456" height="741" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e9621c91-c1e4-4171-b89e-634c19834c9e_1456x741.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:741,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!7kjS!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe9621c91-c1e4-4171-b89e-634c19834c9e_1456x741.jpeg 424w, https://substackcdn.com/image/fetch/$s_!7kjS!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe9621c91-c1e4-4171-b89e-634c19834c9e_1456x741.jpeg 848w, https://substackcdn.com/image/fetch/$s_!7kjS!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe9621c91-c1e4-4171-b89e-634c19834c9e_1456x741.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!7kjS!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe9621c91-c1e4-4171-b89e-634c19834c9e_1456x741.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p><strong><a href="https://tetsuwan.com/">Tetsuwan</a> </strong>is the newest of the trio, announced at the end of 2024, and operates at a higher level of abstraction than Briefly and Synthace. Users do not write commands for transfers between plates; instead, they define experiments by describing high-level actions such as combining reagents or applying transformations like centrifugations. Then they specify what their samples, variables, conditions, and controls are for that specific run. From this intent-level description, Tetsuwan fully compiles to robot-ready code, automatically making all the difficult downstream process engineering decisions, including mastermixing, volume scaling, dead volume, plate layouts, labware, scheduling, and liquid handling strategies. 
The result of this is fully editable by the scientist overseeing the process, allowing them to specify their preferences on costs, speed, and accuracy trade-offs.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!XZol!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faf79e9e8-de32-43b2-84e2-19ae0b885c39_1354x532.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!XZol!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faf79e9e8-de32-43b2-84e2-19ae0b885c39_1354x532.png 424w, https://substackcdn.com/image/fetch/$s_!XZol!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faf79e9e8-de32-43b2-84e2-19ae0b885c39_1354x532.png 848w, https://substackcdn.com/image/fetch/$s_!XZol!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faf79e9e8-de32-43b2-84e2-19ae0b885c39_1354x532.png 1272w, https://substackcdn.com/image/fetch/$s_!XZol!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faf79e9e8-de32-43b2-84e2-19ae0b885c39_1354x532.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!XZol!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faf79e9e8-de32-43b2-84e2-19ae0b885c39_1354x532.png" width="1354" height="532" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/af79e9e8-de32-43b2-84e2-19ae0b885c39_1354x532.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:532,&quot;width&quot;:1354,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!XZol!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faf79e9e8-de32-43b2-84e2-19ae0b885c39_1354x532.png 424w, https://substackcdn.com/image/fetch/$s_!XZol!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faf79e9e8-de32-43b2-84e2-19ae0b885c39_1354x532.png 848w, https://substackcdn.com/image/fetch/$s_!XZol!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faf79e9e8-de32-43b2-84e2-19ae0b885c39_1354x532.png 1272w, https://substackcdn.com/image/fetch/$s_!XZol!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faf79e9e8-de32-43b2-84e2-19ae0b885c39_1354x532.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path 
d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>And that&#8217;s the first camp.</p><p><strong>The second camp also admits that the translation layer must be improved, but believes that physical infrastructure will be an important part of that. </strong>This is a strange category, because I don&#8217;t view this camp as building fundamentally novel boxes or arms, like, say, <a href="https://www.unicornb.io/">Unicorn Bio</a>, but rather building out the physically tangible [stuff] that stitches existing boxes and arms together into something greater than the sum of their parts.</p><p>The ethos of this philosophy can be best viewed by what two particular companies have built:<a href="https://www.automata.tech/"> Automata</a> and<a href="https://www.ginkgo.bio/"> Ginkgo Bioworks</a>.</p><p><strong>Automata</strong> is slightly confusing, but here is my best attempt to explain what they do: they are a vertically-integrated-lab-automation-platform-consisting-of-modular-robotic-benches-and-a-scheduling-engine-and-a-data-backend-as-a-service business. 
They call the physical implementation of this service the &#8216;LINQ bench&#8217;, and it is designed to mirror the size and shape of traditional lab benches, such that it can be dropped into existing lab spaces without major renovation. It robotically connects instruments using a robot arm and a transport layer, for which they built a magnetic levitation system for high-speed multi-directional transport of plates across the bench. And the software onboard these systems handles workflow creation, scheduling, error handling, and data management. <a href="https://www.automata.tech/case-studies/how-we-made-a-6-hour-cell-culture-assay-into-a-70-minute-process">I found one of their case studies here quite enlightening</a> in figuring out what exactly they do for their clients. </p><p><strong>And of course, Ginkgo</strong>. Yes, Ginkgo is a mild memetic allergen to those familiar with its prior history, but I do encourage you to watch their<a href="https://www.youtube.com/watch?v=KkS58gonQAc"> 2026 JPM presentation on their recent push into automation</a>. It&#8217;s quite good! The purpose of the presentation is to push Ginkgo&#8217;s lab automation solution&#8212;<a href="https://www.ginkgo.bio/product/hardware">Reconfigurable Automation Carts</a>, or RAC&#8217;s&#8212;but it serves as a decent view into the pain points of building better lab automation. What are RAC&#8217;s anyway? 
Basically a big, modular, standardized cart that can have boxes (+other things) inserted in, and has an arm directly installed:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!epZ1!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F777adcd9-071c-470f-b3be-ea4b9ff0f106_1456x861.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!epZ1!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F777adcd9-071c-470f-b3be-ea4b9ff0f106_1456x861.jpeg 424w, https://substackcdn.com/image/fetch/$s_!epZ1!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F777adcd9-071c-470f-b3be-ea4b9ff0f106_1456x861.jpeg 848w, https://substackcdn.com/image/fetch/$s_!epZ1!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F777adcd9-071c-470f-b3be-ea4b9ff0f106_1456x861.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!epZ1!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F777adcd9-071c-470f-b3be-ea4b9ff0f106_1456x861.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!epZ1!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F777adcd9-071c-470f-b3be-ea4b9ff0f106_1456x861.jpeg" width="1456" height="861" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/777adcd9-071c-470f-b3be-ea4b9ff0f106_1456x861.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:861,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!epZ1!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F777adcd9-071c-470f-b3be-ea4b9ff0f106_1456x861.jpeg 424w, https://substackcdn.com/image/fetch/$s_!epZ1!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F777adcd9-071c-470f-b3be-ea4b9ff0f106_1456x861.jpeg 848w, https://substackcdn.com/image/fetch/$s_!epZ1!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F777adcd9-071c-470f-b3be-ea4b9ff0f106_1456x861.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!epZ1!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F777adcd9-071c-470f-b3be-ea4b9ff0f106_1456x861.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>There is software that comes onboard to help you use the machines (<a href="https://www.ginkgo.bio/product/software">Catalyst</a>), but their primary focus seems to be hardware. </p><p>This is Ginkgo&#8217;s primary automation play, though both the RAC&#8217;s and scheduling software were really <a href="https://medium.com/@ZymergenTechBlog/the-case-for-modular-lab-automation-c34f214e1276">first created by Zymergen</a>, which Ginkgo acquired. And, just the other day, they demonstrated this hardware-centricity by <a href="https://openai.com/index/gpt-5-lowers-protein-synthesis-cost/">partnering with OpenAI to run an autonomous lab experiment</a>: 36,000 conditions across six iterative cycles, optimizing cell-free protein synthesis costs. 
</p><p>Moreover, the RAC&#8217;s each include a transport track, making it so they can be daisy-chained together in case you need multiple instruments to run your particular experiment.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Ef7o!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6c39c58e-071b-46e9-93b6-e2514a2b317f_1456x805.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Ef7o!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6c39c58e-071b-46e9-93b6-e2514a2b317f_1456x805.jpeg 424w, https://substackcdn.com/image/fetch/$s_!Ef7o!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6c39c58e-071b-46e9-93b6-e2514a2b317f_1456x805.jpeg 848w, https://substackcdn.com/image/fetch/$s_!Ef7o!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6c39c58e-071b-46e9-93b6-e2514a2b317f_1456x805.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!Ef7o!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6c39c58e-071b-46e9-93b6-e2514a2b317f_1456x805.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Ef7o!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6c39c58e-071b-46e9-93b6-e2514a2b317f_1456x805.jpeg" width="1456" height="805" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/6c39c58e-071b-46e9-93b6-e2514a2b317f_1456x805.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:805,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Ef7o!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6c39c58e-071b-46e9-93b6-e2514a2b317f_1456x805.jpeg 424w, https://substackcdn.com/image/fetch/$s_!Ef7o!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6c39c58e-071b-46e9-93b6-e2514a2b317f_1456x805.jpeg 848w, https://substackcdn.com/image/fetch/$s_!Ef7o!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6c39c58e-071b-46e9-93b6-e2514a2b317f_1456x805.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!Ef7o!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6c39c58e-071b-46e9-93b6-e2514a2b317f_1456x805.jpeg 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>And that&#8217;s the second camp.</p><p><strong>The third and final camp believes the future lies in augmenting the existing systems with a greater degree of intelligence.</strong> This differs from the translation camp in that the latter is primarily concerned with the <em>input</em> side of the problem&#8212;converting human intent into robot-legible instructions before execution begins&#8212;while the intelligence camp is concerned with what happens <em>during</em> execution.</p><p>This is the newest group, and there are two companies here that feel most relevant: <a href="https://www.medra.ai/">Medra</a> and <a href="https://www.zeonsystems.ai/">Zeon Systems</a>.</p><p><a href="https://www.medra.ai/">Medra</a> is the older player here: founded in 2022, it has <a href="https://www.businesswire.com/news/home/20251211748411/en/Medra-Raises-%2452-Million-Series-A-to-Build-Physical-AI-Scientists">raised 63 million dollars in the years since</a>. 
Their pitch is that you already have the arms, you already have the boxes, and both are quite good at what they do. Really, what you need most is <em>intelligence</em>. Yes, perhaps that includes the translation layers that the first camp is building, but the Medra pitch is a bit more all-encompassing than that. Onboard robotic intelligence would not only make translation easier, but also let the robot recover from errors, adapt to novel situations, interface with arbitrary machines (even ones that are meant to be worked manually), autonomously optimize protocols, design its own experiments outright, and generally <em>handle</em> the thousand small variations that make lab work so resistant to typical, more brittle automation.</p><p><a href="https://www.zeonsystems.ai/">Zeon Systems</a> is our final company, and is fundamentally quite similar to Medra, but with a quirk that I find very endearing: their use of intelligence is not necessarily to make robots more capable, but to make them more forgiving. <a href="https://opentrons.com/">Opentrons</a> started in 2014, attempting to democratize automation by making the hardware cheap, but cheap hardware comes with cheap hardware problems&#8212;tips that don&#8217;t seat properly, positioning that drifts, calibration that goes wonky. <strong>The Zeon bet is that sufficiently good perception and intelligence can compensate for these shortcomings</strong>. If the robot can <em>see</em> that the tip didn&#8217;t seat properly and adjust accordingly, you no longer need the tip to seat properly every time. If the robot can <em>detect</em> that its positioning is off and correct in real time, you no longer need precision machining to sub-millimeter tolerances. Intelligence, in this framing, is not a way to make robots do more, but rather a way to get away with worse machinery. 
And worse machinery means cheaper machinery, which means more labs can afford to automate, which means more Zeon products (whether that takes the form of software or software + hardware) can be sold. </p><p>Okay, that&#8217;s that. Those are the three camps.</p><p>Now the obvious question: which one of them is correct?</p><p>The most nuanced take is: <strong>all of them</strong>. It feels at least somewhat obvious that all possible futures will eventually demand <em>something</em> of all of these camps, and the companies that thrive will be those that correctly identify which layer is the binding constraint for which customer at which moment in time.</p><p>But here is a more opinionated take on each one:</p><p><strong>The translation layer camp, to my eyes, has the most honest relationship with this problem.</strong> They are not promising to make robots smarter or to sell you better hardware; they are instead promising to make the existing robots easier to talk to, such that the activation energy required to automate a protocol drops low enough that even infrequently-run experiments become viable candidates. If we accept that this problem of protocol building is actually the real fundamental bottleneck to increasing the scale of automation, we should also trust the Tetsuwan/Synthase/Briefly&#8217;s of the world to have the best solutions. </p><p>A pretty easy failure case to imagine here is that frontier-model LLMs get infinitely better, negating any need for the custom harnesses these groups are building and slurping up any market demand that they would otherwise have. To be clear, I don&#8217;t really believe this will happen, for the same reason I think <a href="https://exa.ai/">Exa</a> and <a href="https://www.clay.com/">Clay</a> will stick around for a while; these tools are complicated, building complicated tools requires focus, and frontier model labs are not focused on these particular use-cases. 
And importantly, many of the problems that constitute translation are solved best through deterministic means (deck &amp; plate layouts, choosing liquid class parameters, pipetting strategies, the math of volume calculations). Opus 8 or whatever may be great and an important part of the translation solution, but it probably should not be used as a calculator.</p><p><strong>The hardware camp is curious, because, you know, it doesn&#8217;t actually make a lot of sense if the goal is &#8216;</strong><em><strong>democratizing lab automation</strong></em><strong>&#8217;.</strong> Automata&#8217;s LINQ benches and Ginkgo&#8217;s RACs are expensive&#8212;extremely expensive!&#8212;vertically-integrated systems. They make automation <em>better</em> for orgs that have already crossed the volume threshold where automation makes sense. But they don&#8217;t actually lower that threshold, nor do they add new capabilities that the prior systems lacked. If anything, they raise it! You need even more throughput to justify the capital expenditure! So, what, have they taken the wrong bet entirely? I think to a certain form of purist, yes. </p><p><strong>But consider the customer base these companies are actually chasing.</strong> Automata and Ginkgo alike are pitching their solutions to large pharma and industrial biotech groups. In other words, the primary people they are selling to are not scrappy academic labs, but rather organizations running thousands of experiments per week, with automation teams already on staff, that crossed the volume threshold years ago. Their problem has long gone past &#8216;<em>should we automate?</em>&#8217;, and has now entered the territory of &#8216;<em>how can we partner with a trusted institutional consultant to scale to even larger degrees?</em>&#8217;. To those folks, LINQ and RACs may make a lot of sense! 
But there is an interesting argument that, in the long term, these systems may end up democratizing automation in a roundabout way. We&#8217;ll discuss that a bit more in the next section.</p><p><strong>Finally, the intelligence camp.</strong> We can be honest with each other: it has a certain luster. It is appealing to believe that a heavy dose of intelligence is All You Need. In fact, visiting the <a href="https://www.medra.ai/">Medra</a> office earlier this year to observe their robots dancing around was the catalyst I needed to sit down and finish this article. Because how insane is it that a robotic arm can swish something out of a centrifuge, pop it into a plate, open the cap, and transfer it to another vial? Maybe not insane at all; maybe that&#8217;s actually well within what robotics can easily do, but that&#8217;s what the article was meant to discover. But after having talked to as many people as I have, I have arrived at a more granular view than &#8220;<em>intelligence good</em>&#8221; or &#8220;<em>intelligence premature</em>&#8221;. There are really two versions of the intelligence thesis. The near-term version is about perception and error recovery: the robot sees that a tip didn&#8217;t seat properly and adjusts, detects that its positioning has drifted and corrects in real time, recognizes that an aspiration failed and retries before the whole run is ruined. This feels quite close! The far-term version is something grander, where you can trust the robot to handle every step of the process, where you show a robot a video of a grad student performing a protocol and it just does it, perhaps even optimizing it, maybe even designing its own experiments&#8212;the intelligence onboard granting the robot all the necessary awareness and dexterity to complete anything and everything. </p><p>This future may well come! It is not an unreasonable bet. <strong>But, from my conversations, it does seem quite far away</strong>. 
Yes, it is easy to look at the results <a href="https://www.pi.website/research/human_to_robot">Physical Intelligence</a> is producing and conclude that things are close to being solved, but lab work is <em>very</em> out-of-distribution relative to what most of these robotics foundation models are learning (and what they have learned is often <em>still</em> insufficient for their own, simpler folding-laundry-y tasks!). I want to be careful not to overstate this, since this greater intelligence may arrive faster than anyone suspects, so perhaps this take will be out of date within the year. </p><p>And wait! Wait! Before you take any of the above three paragraphs as a statement on <strong>companies</strong> rather than <strong>philosophies</strong>, you should recall what I said in the second paragraph of this section: none of these companies, many of whose founders I talked to for this article, are so dogmatic as to be <strong>entirely</strong> translation/hardware/intelligence-pilled. They may <em>lean</em> that direction in the revealed preferences of how their companies operate, but they are sympathetic to each camp, and nearly all of them have plans to eventually play in sandboxes other than the ones they currently occupy.</p><p>Speaking of which, how are any of these companies making money?</p><h2><strong>All roads lead to Transcriptic</strong></h2><p>There is a phenomenon in evolutionary biology called <a href="https://en.wikipedia.org/wiki/Carcinisation">carcinization</a>, which refers to the fact that nature keeps accidentally inventing crabs. Hermit crabs, king crabs, porcelain crabs; many of these are not closely related to each other at all, and yet they all independently stumbled into the same body plan, because apparently being shaped like a crab is such an unreasonably good idea that evolution cannot help itself. It just keeps doing it. 
I propose to you that there is a nearly identical phenomenon occurring in lab robotics, where every startup, regardless of what its thesis is, will slowly, inexorably, converge onto the same form.</p><p>Becoming <a href="https://www.ycombinator.com/companies/transcriptic">Transcriptic</a>.</p><p>Transcriptic was founded in 2012 by <a href="https://www.linkedin.com/in/maxhodak/">Max Hodak</a> (yes, the same Max who co-founded <a href="https://neuralink.com/">Neuralink</a>, and then later <a href="https://science.xyz/">Science Corp</a>). The pitch of the company was simple: we&#8217;ll build out a facility stuffed with boxes and arms and software to integrate them all; you interact with it through a web interface, specify experiments in a structured format, and somewhere in that facility, the lab will autonomously execute your will (with humans alongside to pick up the slack). <strong>In other words, a &#8216;cloud lab&#8217;</strong>.</p><p>The upside is that the sales pitch basically encompasses the entirety of the wet-lab market: don&#8217;t set up your own lab, just rent the instruments in ours! And with sufficiently good automation, and software to use that automation, the TAM of this is a superset of a CRO&#8217;s.</p><p><strong>The obvious downside is that doing this well is really, really hard</strong>. Transcriptic later merged with the automated microscopy startup &#8216;3Scan&#8217;; the combined company rebranded as &#8216;Strateos&#8217;, which folded in 2023. This tells us something about the difficulty of this model. This said, <a href="https://www.emeraldcloudlab.com/">Emerald Cloud Labs</a> (ECL) is a startup that appeared two years after Transcriptic with similar product offerings, and they&#8217;ve held out, with a steady ~170 employees over the past two years. 
Yet, while they ostensibly <em>are</em> a cloud lab, they are not the platonic ideal of one in the same way Transcriptic was, in that anyone and everyone can simply log in, and run whatever experiment they&#8217;d like; ECL&#8217;s interface is gate-kept by a contact page.</p><p>Despite the empirical difficulty of making it work, it feels like going down the Transcriptic path is the logical conclusion of nearly any sufficiently good lab automation play.</p><p>Why?</p><p>Here, I shall refer to &#8216;<a href="https://synbio25.com/">Synbio25</a>&#8217;, a wonderful essay by<a href="https://keonigandall.com/"> Keoni Gandall</a> that I highly recommend you read in full. In this essay, Keoni discusses <em>many</em> things, but what I find most interesting is his comment on the immense economic efficiencies gained by batching experiments:</p><blockquote><p><em>Robots, in biotechnology, are shamefully underutilized. Go visit some biology labs &#8212; academic, industrial, or startup &#8212; and you are sure to see robots just sitting there, doing nothing, collecting dust&#8230;.</em></p><p><em>The benefit of aggregating many experiments together in a centralized facility is that we can keep robots busy. Even if you just want to run 1 protocol, there may be 95 others who want to run that 1 protocol as well &#8212; together, you can fill 1 robot&#8217;s workspace optimally. A centralized system lets you do this among many protocols &#8212; otherwise, you&#8217;d need to ship samples between labs, which is just too much. 
While the final step, testing your particular hypothesis, might still require customized attention and dedicated robot time, the heavy lifting &#8212; strain prep, validation, etc &#8212; can be batched and automated.</em></p></blockquote><p>And one paragraph that I really think is worth marinating in (bolding by me):</p><blockquote><p><em>The key, then, is to pull these robots towards projects and protocols that are closer and closer to the raw material side of biology, so that you can build everything else on top of those. For example, PCR enzyme, polymerase, is very widely used, but rather expensive if you buy proprietary enzymes. On the other hand, you can produce it for yourself very cheaply. If you utilize your robots to produce enzymes, you can then use this enzyme in all other experiments, dropping the costs of those experiments as well. <strong>The reason is quite simple: without a middleman, your costs approach chemical + energy + labor costs. A billion years of evolution made this, relative to other industries, very inexpensive. You just need to start from the bottom and move up.</strong></em></p></blockquote><p>There is a very neat logical train that arises from this!</p><p>If you are to accept that lab centralization (as in, cloud labs) means you can most efficiently use lab robotics&#8212;which feels like a pretty uncontroversial argument&#8212;it <em>also</em> means that the further you lean into this, <strong>the more able you are to vertically integrate upstream</strong>. If you&#8217;re running enough experiments such that your robots are constantly humming, you can justify producing your own reagents. If you&#8217;re producing your own reagents, your per-experiment costs drop. If your per-experiment costs drop, you can offer lower prices. If you offer lower prices, you attract more demand. If you attract more demand, your robots stay even busier. If your robots stay even busier, you can justify producing even more of your own inputs. 
And so on, ad infinitum, until you devour the entirety of the market, and the game of biology becomes extraordinarily cheap and easy for everyone to play in.</p><p>As an example, the Synbio25 essay<a href="https://synbio25.com/#chapter5"> offered this picture</a> showing the plasmid production cost differences between unoptimized and optimized settings (read: producing enzymes + cells in-house and using maximum-sized sequencing flow cells). Over twice as cheap!</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!uNWj!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61ccff25-9369-4e90-84f2-1265d21942b5_1338x768.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!uNWj!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61ccff25-9369-4e90-84f2-1265d21942b5_1338x768.jpeg 424w, https://substackcdn.com/image/fetch/$s_!uNWj!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61ccff25-9369-4e90-84f2-1265d21942b5_1338x768.jpeg 848w, https://substackcdn.com/image/fetch/$s_!uNWj!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61ccff25-9369-4e90-84f2-1265d21942b5_1338x768.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!uNWj!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61ccff25-9369-4e90-84f2-1265d21942b5_1338x768.jpeg 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!uNWj!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61ccff25-9369-4e90-84f2-1265d21942b5_1338x768.jpeg" width="563" height="323.15695067264573" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/61ccff25-9369-4e90-84f2-1265d21942b5_1338x768.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:768,&quot;width&quot;:1338,&quot;resizeWidth&quot;:563,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!uNWj!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61ccff25-9369-4e90-84f2-1265d21942b5_1338x768.jpeg 424w, https://substackcdn.com/image/fetch/$s_!uNWj!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61ccff25-9369-4e90-84f2-1265d21942b5_1338x768.jpeg 848w, https://substackcdn.com/image/fetch/$s_!uNWj!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61ccff25-9369-4e90-84f2-1265d21942b5_1338x768.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!uNWj!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61ccff25-9369-4e90-84f2-1265d21942b5_1338x768.jpeg 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>How dogmatic am I being here? Surely there are other business models that could work.</p><p>Perhaps for the next decade! <strong>But on a long-enough time horizon, it does feel like eventually everything becomes a cloud lab</strong>. Nothing else really seems to work, or, if it does, its upside is ultimately capped and not &#8216;<em>venture-scalable</em>&#8217;, which is to say it may work, but you had better not take any money to make it happen. Selling software means you&#8217;re easy to clone, being a CRO means you&#8217;re ultimately offering a subset of the services that a cloud lab can, and automation consulting has limited upside. 
The best potential alternative on the table is to become a Thermo-Fisher-esque entity, selling boxes and arms and reagents to those who want to keep things in-house. But how many of those will there realistically be? How many holdouts could possibly remain as the cloud labs get more and more trustworthy, all while the business of biotech becomes (probably) <a href="https://en.wikipedia.org/wiki/Eroom%27s_law">more and more economically challenging</a>, so that justifying your own lab expenses becomes ever more difficult? </p><p>But how things shake out in the short term may be different. Because while Transcriptic doesn&#8217;t exist today, Emerald Cloud Labs does! And yet, they aren&#8217;t necessarily a juggernaut. As of today, we exist in a sort of purgatorial, midway state, where neither the capability nor the trust to rely entirely on cloud labs yet exists. But it is coming. You can see it on the horizon. And so the interesting question to ask is: who stands to benefit the most from the wave in the coming years?</p><p><strong>And here is where the hardware camp&#8217;s bet becomes a lot more convincing in retrospect.</strong> Yes, Automata and Ginkgo are selling expensive hardware systems to large pharma. But you can see the inklings of at least Ginkgo attempting lab centralization themselves by <a href="https://datapoints.ginkgo.bio/">dogfooding their own machines to sell data to customers</a>. Right now, it is functionally a CRO, with a menu of options. But what comes next? I don&#8217;t <em>personally</em> know how much easier RACs are to set up for high-mix work (read: highly heterogeneous lab experimentation), but the general sense I get from people is that they are easier. 
And if that&#8217;s true, then the Ginkgo play starts to look less like &#8220;<em>we are selling expensive hardware to pharma</em>&#8221; and more like &#8220;<em>we are building the infrastructure that will eventually become the dominant cloud lab, and we&#8217;re getting pharma to pay for the R&amp;D in the meantime</em>.&#8221; Which is, if you squint, actually quite clever. Will they pull it off? I don&#8217;t know! Something similar could be said for Automata as well; an institution that has gathered a decade&#8217;s worth of information on how automation is <em>practically</em> used may be well poised to eventually operate its own cloud lab, having already learned&#8212;on someone else&#8217;s dime&#8212;exactly where the workflows break down and how to fix them.</p><p>How about the other groups? What can the intelligence and translation layer groups do during this interim period?</p><p>There are a lot of possibilities. The simplest one is to get acquired. If the endgame is cloud labs, and cloud labs need both intelligence and translation layers to function, then the most straightforward path for these startups is to build something valuable enough that a cloud-lab-in-waiting (like Ginkgo or Automata themselves) decides to buy them rather than build it themselves. Similarly, these startups could become the picks-and-shovels providers that <em>every</em> cloud lab depends on. </p><p><strong>But you could imagine more ambitious futures here too.</strong> Remember: you can just <em>buy</em> the hardware. Ginkgo&#8217;s RACs, Hamilton&#8217;s liquid handlers&#8212;none of this is proprietary in a way that prevents a sufficiently well-capitalized would-be cloud lab from simply buying or even making it themselves. The hardware is a commodity, or at least it&#8217;s becoming one. What&#8217;s <em>not</em> a commodity is the intelligence to run it and the translation layer to make it accessible. 
So you could tell a story where the hardware companies win the short-term battle&#8212;racking up revenue, raising money, building out their systems&#8212;only to lose the long-term war to translation/intelligence groups who buy their hardware off the shelf and differentiate on software instead.</p><p>Of course, the steelman for the hardware companies is that they could simply use their revenue advantage to build the software themselves. </p><p>We&#8217;ll see what happens in the end. Smarter people than me are in the arena, figuring it out, and I am very curious to see where they arrive.</p><p>This section is long, but we have one last important question to ask: <strong>why did the first generation of cloud labs not do so well?</strong> Was it merely a technological problem? Were they simply too early? This is unlikely according to the automation engineers I talked to; there aren&#8217;t <em>massive</em> differences between the machinery back then and the machinery today. Could blame be placed on the translation layer that these companies had? It doesn&#8217;t seem like it; using Transcriptic to create a protein, as documented in a <a href="https://blog.booleanbiotech.com/genetic_engineering_pipeline_python">2016 blog post by Brian Naughton</a>, doesn&#8217;t seem so terrible.</p><p>What else could be the issue?</p><p>There is one pitch offered by <a href="https://www.linkedin.com/in/shelbynewsad/">Shelby Newsad</a> that I found interesting. 
The problem is not that these companies were too early, but rather that they <a href="https://x.com/shelbynewsad/status/2018402785226834377">were simply too general</a>, and because they were too general, <strong>they could never actually make any single workflow frictionless enough to matter.</strong></p><p>In the comments of that post made by Shelby,<a href="https://x.com/koeng101/status/2018415434257842538"> the same Keoni we referenced earlier explained what it was actually like to use a cloud lab</a> (Transcriptic): you had to buy your own polymerase from New England Biolabs, ship it to their facility, pay for tube conversion, and <em>then</em> implement whatever cloning and sequencing pipeline you wanted to run. By the time you&#8217;d coordinated all of this, you might as well have just done it yourself. The automation was there! The robots were ready! But because Transcriptic had attempted the &#8216;AWS for biotech&#8217; strategy right out of the gate, they offloaded the logistical headaches to the user. There is also a side note on how fixing issues with your experiment was annoying, as Brian Naughton states in his blog post: &#8216;<em>debugging protocols remotely is difficult and can be expensive &#8212; especially differentiating between your bugs and Transcriptic&#8217;s bugs</em>.&#8217;</p><p><strong>Delighting the customer is important! </strong>Compare this to<a href="https://plasmidsaurus.com/"> Plasmidsaurus</a>. They (mostly) do one thing: plasmid DNA sequencing. You mail them a tube, they sequence it, you get results. That&#8217;s it, no coordination needed on your end, the entire logistics stack is their problem. And it has led to them utterly dominating that market, and slowly expanding their way to RNA-seq, metagenomics, and AAV sequencing. 
In fact, if we&#8217;re being especially galaxy-brained: there is a very real possibility that none of the companies we&#8217;ve discussed so far ends up ushering in the cloud labs of the future, and instead, that prize shall be awarded to Plasmidsaurus and other Plasmidsaurus-shaped CROs, expanding one vertical at a time. </p><p>Either way, this reframes the earlier question of which camp will win. Perhaps it&#8217;s not just about translation layers versus hardware versus intelligence. <strong>It&#8217;s about who can solve the logistics problem for a set of high-value workflows, and then use that beachhead to expand.</strong></p><h1><strong>Conclusion</strong></h1><p>This field is incredibly fascinating, and the future of it intersects with a lot of interesting anxieties. China is devouring our preclinical lunch; will lab robotics help? The frontier lab models are getting exponentially better; will lab robotics take advantage of that progress to perform autonomous science? Both of these, and more, are worthy of several thousand more words. However, this essay is already long, so I leave these subjects to another person to cover in depth.</p><p>But there is one final thing I want to discuss. <strong>It is the very real possibility that lab robotics, cloud labs, and everything related to them will not actually fundamentally alter the broader problems that drug discovery faces.</strong></p><p>You may guess where this is going. 
It is time to read a Jack Scannell paper.</p><p>In Jack&#8217;s 2022 Nature Reviews article, &#8216;<em><a href="https://gwern.net/doc/statistics/order/2022-scannell.pdf">Predictive validity in drug discovery: what it is, why it matters and how to improve it</a></em>&#8217;<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-1" href="#footnote-1" target="_self">1</a>, he and his co-authors make a simple argument: the thing that matters most in drug R&amp;D is not how many candidates you can test, but how well your tools for evaluating those candidates correlate with what actually works in humans. They call this &#8216;predictive validity&#8217;, and they operationalize it as the correlation coefficient between the output of whatever decision tool you&#8217;re using&#8212;a cell-based assay, an animal model, a gut feeling&#8212;and clinical utility in actual patients. <strong>The primary takeaway here is their demonstration that a 0.1 absolute change in this correlation&#8212;shifting from, say, 0.5 to 0.6&#8212;can have a bigger impact on the positive predictive value of one&#8217;s R&amp;D pipeline than screening ten times, or even a hundred times, more candidates.</strong> </p><p>They illustrate this with a fun historical example: in the 1930s, Gerhard Domagk screened a few hundred dyes against Streptococcus in live mice and discovered sulfonamide antibiotics. Seven decades later, GSK ran 67 high-throughput screening campaigns, each with up to 500,000 compounds, against isolated bacterial protein targets, and found precisely zero candidates worthy of clinical trials. How could this be? It is, of course, because the mice were a better decision tool than the screens, as they captured the in-vivo biology that actually mattered.</p><p>What is the usual use-case for lab robotics? <strong>It is meant to be a throughput multiplier.</strong> It lets you run more experiments, faster. 
And Scannell is stating that moving along the throughput axis&#8212;running 10x or 100x more experiments through the same assays&#8212;is surprisingly unimpressive compared to even modest improvements in the quality of those assays. And given the failure rates of drugs in our clinical trials, <a href="https://www.technologynetworks.com/drug-discovery/articles/why-97-of-oncology-clinical-trials-fail-to-receive-fda-approval-327724">which hover as high as 97% in oncology</a>, the assays are empirically not particularly good. </p><p>But, to be clear, this is not an anti-automation take. It is a reframing of what automation should be for.</p><p>It feels like the value that the lab-robotics-of-tomorrow will bring to us will almost certainly not be in gently taking over the reins of existing workflows and running them themselves. <strong>It will be in enabling different experiments, </strong><em><strong>better</strong></em><strong> ones, ones with higher predictive validity, at a scale that would be impossible without automation.</strong> And this doesn&#8217;t require any suspension of disbelief about what &#8216;autonomous science&#8217; or something akin to it may one day bring! The arguments are fairly mundane.</p><p>In the same paper, Scannell argues that companies should be pharmacologically calibrating their decision tools, as in, running panels of known drugs, with known clinical outcomes, through their assays to measure whether the assay can actually distinguish hits from misses. Almost nobody does this, because it is expensive, tedious, and produces neither a publication nor a patent. But if per-experiment costs drop far enough, if experiments no longer require expensive human hands to perform, calibration becomes economically rational, and the industry could move from <em>assuming</em> that a given assay is predictive to <em>measuring</em> whether it is. 
Similarly, <a href="https://pmc.ncbi.nlm.nih.gov/articles/PMC4461318/">given the 50% irreproducibility rate in preclinical research</a>, it may be the case that many otherwise &#8216;normal&#8217; assays are yielding useless results, entirely because they are performed manually, by individual researchers with slightly different techniques, in labs with slightly different conditions, who did not have the instruments needed to validate their reagents. <strong>Sufficiently good cloud automation could free these assays from their dependence on individual hands, and allow higher-standard experimentation to be reliably performed at scale.</strong></p><p>In other words: if you follow the trend-lines, if per-experiment costs continue to fall, if the translation layers keep getting better, if the cloud labs keep centralizing and vertically integrating and driving prices down further still, then at some point, perhaps not far from now, <strong>it becomes rational to do the things that everyone already knows they </strong><em><strong>should</strong></em><strong> be doing but can't currently justify</strong>. And this alone, despite its relative banality, may be enough to alter how drug discovery as a discipline is practiced. 
</p><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-1" href="#footnote-anchor-1" class="footnote-number" contenteditable="false" target="_self">1</a><div class="footnote-content"><p>Shout out to <a href="https://www.linkedin.com/in/cristian-ponce5/">Cristian</a> for sending me this paper!</p></div></div>]]></content:encoded></item><item><title><![CDATA[Questions to ask when evaluating neurotech approaches ]]></title><description><![CDATA[5.2k words, 24 minutes reading time]]></description><link>https://www.owlposting.com/p/questions-to-ponder-when-evaluating</link><guid isPermaLink="false">https://www.owlposting.com/p/questions-to-ponder-when-evaluating</guid><dc:creator><![CDATA[Abhishaike Mahajan]]></dc:creator><pubDate>Sun, 25 Jan 2026 16:11:30 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!qlTr!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc7b0ae3b-31c7-479c-87f7-ad0414f3aace_2912x1632.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!qlTr!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc7b0ae3b-31c7-479c-87f7-ad0414f3aace_2912x1632.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!qlTr!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc7b0ae3b-31c7-479c-87f7-ad0414f3aace_2912x1632.png 424w, https://substackcdn.com/image/fetch/$s_!qlTr!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc7b0ae3b-31c7-479c-87f7-ad0414f3aace_2912x1632.png 848w, 
https://substackcdn.com/image/fetch/$s_!qlTr!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc7b0ae3b-31c7-479c-87f7-ad0414f3aace_2912x1632.png 1272w, https://substackcdn.com/image/fetch/$s_!qlTr!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc7b0ae3b-31c7-479c-87f7-ad0414f3aace_2912x1632.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!qlTr!,w_2400,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc7b0ae3b-31c7-479c-87f7-ad0414f3aace_2912x1632.png" width="1200" height="672.5274725274726" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c7b0ae3b-31c7-479c-87f7-ad0414f3aace_2912x1632.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;large&quot;,&quot;height&quot;:816,&quot;width&quot;:1456,&quot;resizeWidth&quot;:1200,&quot;bytes&quot;:8165467,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.owlposting.com/i/162969083?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc7b0ae3b-31c7-479c-87f7-ad0414f3aace_2912x1632.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:&quot;center&quot;,&quot;offset&quot;:false}" class="sizing-large" alt="" srcset="https://substackcdn.com/image/fetch/$s_!qlTr!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc7b0ae3b-31c7-479c-87f7-ad0414f3aace_2912x1632.png 424w, 
https://substackcdn.com/image/fetch/$s_!qlTr!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc7b0ae3b-31c7-479c-87f7-ad0414f3aace_2912x1632.png 848w, https://substackcdn.com/image/fetch/$s_!qlTr!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc7b0ae3b-31c7-479c-87f7-ad0414f3aace_2912x1632.png 1272w, https://substackcdn.com/image/fetch/$s_!qlTr!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc7b0ae3b-31c7-479c-87f7-ad0414f3aace_2912x1632.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p><em>Note: Extraordinarily grateful to <a href="https://www.linkedin.com/in/milancvitkovic/">Milan Cvitkovic</a>, <a href="https://www.linkedin.com/in/sumnernorman/">Sumner Norman</a>, <a href="https://www.linkedin.com/in/ben-woodington/">Ben Woodington</a>, and <a href="https://www.linkedin.com/in/adam-marblestone-87202813/">Adam Marblestone</a> for all the helpful conversations, comments, and critiques on drafts of this essay. </em></p><p><em>Second note: I am co-hosting an NYC Biotech x ML meetup on Feb 11th, <a href="https://luma.com/jl9guyun">here is the link.</a></em></p><ol><li><p><a href="https://www.owlposting.com/i/162969083/introduction">Introduction</a></p></li><li><p><a href="https://www.owlposting.com/i/162969083/questions">Questions</a></p><ol><li><p><a href="https://www.owlposting.com/i/162969083/how-relevant-are-the-state-measurements-to-the-application">How relevant are the state measurements to the application?</a></p></li><li><p><a href="https://www.owlposting.com/i/162969083/what-are-the-costs-and-burdens-for-the-user">What are the costs and burdens for the user?</a></p></li><li><p><a href="https://www.owlposting.com/i/162969083/how-much-is-the-approach-fighting-physics">How much is the approach &#8216;fighting physics&#8217;?</a></p></li><li><p><a href="https://www.owlposting.com/i/162969083/do-they-know-whether-their-advantages-translates-to-clinical-benefit">Do they know whether their advantages translates to clinical benefit?</a></p></li><li><p><a href="https://www.owlposting.com/i/162969083/could-this-be-done-without-touching-the-central-nervous-system">Could this be done without touching the central nervous system?</a></p></li></ol></li><li><p><a href="https://www.owlposting.com/i/162969083/conclusion">Conclusion</a></p></li></ol><h1><strong>Introduction</strong></h1><p>Neurotech is 
complicated. This is because you need to understand at least five fields at once to actually grasp what is/isn&#8217;t possible: electrical engineering, mechanical engineering, biology, neuroscience, and computer science. And, if you&#8217;re really trying to cover all the bases: surgery, ultrasound, and optical physics as well. And I&#8217;ve met relatively few people in my life who can operate at the intersection of three fields, much less eight! As a result, I&#8217;ve stayed away from the entire subject, hoping that I&#8217;d eventually learn what&#8217;s going on via osmosis.</p><p>This has not worked. Each time a new neurotech startup came out, I&#8217;d optimistically chat about it with some friend in the field, and they&#8217;d inevitably wave it off for some bizarre reason that I would never, ever understand. And the more questions I asked, the more confused I would get. And so, at a certain point, I&#8217;d just start politely nodding to their &#8216;<em>Does that make sense?</em>&#8217; questions.</p><p>I have, for months, been wanting to write an article to codify the exact mental steps these people go through when evaluating these companies. After talking to many experts, I have decided that this is a mostly impossible task, but that there are at least a few, small, <em>legible</em> fractions of their decision-making framework that are amenable to being written out. This essay is the end result.</p><p>My hope is that this helps set up the mental scaffolding necessary to triage which approaches are tractable, and which ones are more speculative. Obviously, take all of my writing with a grain of salt; anything that touches the brain is going to be complicated, and while I will try to offer as much nuance as possible, I cannot promise I will offer as much as an Actual Expert can. 
Grab coffee with your local neurotech founder!</p><h1><strong>Questions</strong></h1><h2><strong>How relevant are the state measurements to the application?</strong></h2><p>At least some forms of neurotech, like brain-computer-interfaces, perform some notion of &#8216;<em>brain state reading</em>&#8217; as part of their normal functionality.</p><p>Well, what <strong>exactly</strong> is &#8216;<em>brain state</em>&#8217;?</p><p>Unfortunately for us, &#8216;<em>brain state</em>&#8217; lies in the same definitional scope as &#8216;<em>cell state</em>&#8217;. As in, there isn&#8217;t really a great ground truth for the concept. <strong>But there are things that we hope are related to it!</strong> For cells, those are counts of mRNA, proteins found, chromatin landscape of the genome, and so on. For brains, there are four main possibilities to get at a notion of <em>state</em>:</p><ol><li><p>Measure the spiking activity of single neurons (very invasive)</p></li><li><p>Measure the activity of local field potentials (can be slightly less invasive)</p></li><li><p>Measure hemodynamic (blood flow or oxygenation) changes (can be non-invasive, though higher-resolution when invasive)</p></li><li><p>Measure electromagnetic fields outside the skull (usually non-invasive)</p></li></ol><p>There is an ordering here; at the top, we have measurements that are closest to the actual electrical signaling that (probably) defines moment-to-moment neural computation. As we move down the list, each method becomes progressively more indirect, integrating over larger populations of neurons, longer time windows, and/or more layers of intermediary physiology.</p><p>This is perhaps overcomplicating things, but there&#8217;s also one slightly more exotic approach not mentioned here (and that I won&#8217;t mention again), <a href="https://science.xyz/news/biohybrid-neural-interfaces/">called biohybrid devices</a>. 
In these systems, neurons grown ex-vivo are engrafted to a brain, and <strong>those</strong> neurons are measured directly, so it&#8217;s sort of an aggregate measure like LFP, but it&#8217;s also technically able to measure single spikes. </p><p>But keep in mind: none of these actually capture the full totality of every single neuron firing in a brain, <a href="https://www.frontiersin.org/journals/computational-neuroscience/articles/10.3389/fncom.2013.00137/full">which is a largely physically intractable thing to perform</a>. Which is fine and fair! Understanding <strong>totalities</strong> is a tall bar to meet. But it does mean that whenever we stumble across a new company, we should ask the question: <strong>how relevant is their method of understanding brain state to the [therapeutic area] they actually care about? </strong>Superficial cortical hemodynamics won&#8217;t reveal hippocampal spiking, 2-channel EEG won&#8217;t decode finger trajectories, and so on.</p><p><strong>With this context, let&#8217;s consider <a href="https://www.kernel.com/">Kernel</a>, a neurotech company founded by the infamous <a href="https://en.wikipedia.org/wiki/Bryan_Johnson">Bryan Johnson</a> in the mid-2010s.</strong> Their primary product is called <strong>Kernel Flow</strong>, a headset that uses <em>time-domain functional near-infrared spectroscopy</em> (TD-fNIRS) to measure brain state; it tracks blood oxygenation by measuring how light scatters through the skull. 
In other words, this is a hemodynamics measurement device.</p><p>It is non-invasive, portable, and looks like a bike helmet (which is an improvement compared to many other neurotech headsets!).</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!WFBV!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb18299e1-8748-473e-8552-98e850d5eab8_1080x1080.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!WFBV!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb18299e1-8748-473e-8552-98e850d5eab8_1080x1080.jpeg 424w, https://substackcdn.com/image/fetch/$s_!WFBV!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb18299e1-8748-473e-8552-98e850d5eab8_1080x1080.jpeg 848w, https://substackcdn.com/image/fetch/$s_!WFBV!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb18299e1-8748-473e-8552-98e850d5eab8_1080x1080.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!WFBV!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb18299e1-8748-473e-8552-98e850d5eab8_1080x1080.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!WFBV!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb18299e1-8748-473e-8552-98e850d5eab8_1080x1080.jpeg" width="487" height="487" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b18299e1-8748-473e-8552-98e850d5eab8_1080x1080.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1080,&quot;width&quot;:1080,&quot;resizeWidth&quot;:487,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Kernel Flow - AI for Good&quot;,&quot;title&quot;:&quot;Kernel Flow - AI for Good&quot;,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Kernel Flow - AI for Good" title="Kernel Flow - AI for Good" srcset="https://substackcdn.com/image/fetch/$s_!WFBV!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb18299e1-8748-473e-8552-98e850d5eab8_1080x1080.jpeg 424w, https://substackcdn.com/image/fetch/$s_!WFBV!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb18299e1-8748-473e-8552-98e850d5eab8_1080x1080.jpeg 848w, https://substackcdn.com/image/fetch/$s_!WFBV!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb18299e1-8748-473e-8552-98e850d5eab8_1080x1080.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!WFBV!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb18299e1-8748-473e-8552-98e850d5eab8_1080x1080.jpeg 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>One common thing you&#8217;ll find on most neurotech websites is a &#8216;spec sheet&#8217; of their device. For most places, you&#8217;ll need to formally request it, but Kernel helpfully provides easy access to it <a href="https://www.kernel.com/specs/Flow%202%20Spec%20Sheet.pdf">here</a>.</p><p>In it, they note that the device has an imaging rate of 3.76Hz, which means it&#8217;s taking a full hemodynamic measurement about every <strong>266 milliseconds</strong> across the surface of the brain. This is fast in absolute terms, but slow on the level of (at least some) cognitive processes, which often unfold on the order of tens of milliseconds. 
For example, the neural signatures involved in <a href="https://pmc.ncbi.nlm.nih.gov/articles/PMC6508977/?utm_source=chatgpt.com">recognizing a face</a> or <a href="https://pmc.ncbi.nlm.nih.gov/articles/PMC6674116/?">initiating a movement</a> can happen in less than 100 milliseconds. And to be clear, this is not something that can be altered by increasing the sampling rate; the slowness is inherent to hemodynamic measurements in general, since changes in blood flow trail the underlying neural activity.</p><p><strong>This means that by the time Flow finishes one hemodynamic snapshot, many of the neural events we care about have started and finished.</strong></p><p>The spec sheet also notes that the device comes with 4 EEG electrodes, which have a far higher sampling rate of 1 kHz, or 1,000 measurements per second. At first glance, this seems like it might compensate for the sluggish hemodynamic signal by offering access to fast electrical activity. But in practice, 4 channels are entirely insufficient for learning really <strong>anything</strong> about the brain. Keep in mind that clinical-grade EEG usually operates at the 32-channel-and-above level!</p><p>I found <a href="https://www.sciencedirect.com/science/article/pii/S0165027015003064">one paper that investigated the localization errors of EEG</a>&#8212;as in, can you correctly place where in the brain a spike is occurring&#8212;across a range of channel counts: 256, 128, 64, 32, and 16. Not even 4! Yet, even at the 16-channel level, spatial localization was incredibly bad; in one failure case, it mis-localized a temporal-lobe spike to the frontal lobe. <a href="https://pmc.ncbi.nlm.nih.gov/articles/PMC6458265/">Past that, noise like muscle and eye movement artifacts often dominates the EEG signal at the lowest channel counts.</a></p><p>And, again, this was on 16 channels! One can only imagine how much worse 4 channels is.</p><p>Of course, 4 channels of EEG data clearly offer <strong>something</strong>. 
In the context of the device, they may serve as a coarse sanity check or a minimal signal for synchronizing with the slower hemodynamic measurements. Which maybe is enough to be useful?</p><p>But we may be getting ahead of ourselves by getting lost in these details. It is largely beside the point to judge any given measurement decision in isolation, because, again, what <strong>actually</strong> matters is the <strong>relevancy</strong> of those measurements to whatever the intended use case is. <a href="https://www.biorxiv.org/content/10.1101/2024.03.12.584660v1.full.pdf">Clearly the device&#8217;s measurements are, at least, trustworthy.</a> But what is it meant to be used for?</p><p>Well&#8230;it&#8217;s vague. Kernel&#8217;s public messaging has shifted over the years&#8212;from &#8220;<a href="https://www.linkedin.com/pulse/most-incredible-technology-youve-neverseen-bryan-johnson/">neuroenhancement</a>&#8221; and &#8220;<a href="https://www.thetimes.com/business-money/technology/article/kernel-flow-the-50m-fitbit-for-your-brain-vddmknjh2">mental fitness</a>&#8221; to, most recently, &#8220;<a href="https://www.kernel.com/">brain biomarkers</a>.&#8221; I am not especially well positioned to answer whether this final resting spot is relevant to what Kernel is measuring, but it feels like it is? At least if you look at their <a href="https://www.kernel.com/research">publications</a>, which do show that the device is capable of capturing global brain state changes when under the influence of psychoactive substances, e.g. <a href="https://www.nature.com/articles/s41598-023-38258-8">ketamine</a>. So, even if hemodynamics doesn&#8217;t meet the lofty goal of being able to detect face recognition, that&#8217;s fine! Static-on-the-order-of-minutes biomarkers are fully within their measuring purview. </p><p>Does that make Kernel useful? 
I don&#8217;t know the answer to that, but we&#8217;ll come back to the subject in a second.</p><h2><strong>What are the costs and burdens for the user?</strong></h2><p>In short: a device must earn its place in a patient's life. </p><p>The historical arc of neurotech companies has lain mainly in serving desperate people who have literally no other options: ALS, severe spine damage, locked-in syndrome, and the like. The giants of the field&#8212;<a href="https://synchron.com/">Synchron</a>, <a href="https://blackrockneurotech.com/">Blackrock Neurotech</a>, and <a href="https://neuralink.com/">Neuralink</a>&#8212;have all positioned themselves around these conditions, and so their maximally invasive nature is perfectly fine with their patients. Now, fairly, Synchron apparently doesn&#8217;t have the greatest reputation and Blackrock is somewhat old-fashioned, so Neuralink could be considered the <strong>only</strong> giant, but all three did pop up a lot during my research! </p><p>Blackrock Neurotech are the creators of the <a href="https://en.wikipedia.org/wiki/Microelectrode_array">Utah Array</a>, which remains the gold standard for invasive, in-vivo neural recording. <a href="https://neuralink.com/technology/">Neuralink, the newest and most-hyped, have iterated on the approach, developing ultra-thin probes</a> that can be inserted into the brain to directly record signals. Synchron has the least invasive approach, with its primary device being an endovascular implant called the <em><a href="https://beingpatient.com/stentrode-synchron-bci/">Stentrode</a></em>, allowing neural signals to be read less invasively than a Utah Array or Neuralink (from a blood vessel in the brain rather than in the parenchyma), though at a severe cost of signal quality. 
</p><p>You could find faults with these hyper-invasive neurotech companies on the basis of &#8216;<em>how realistically large is the patient population?</em>&#8217;, but you can&#8217;t deny that amongst the patient population that <strong>does</strong> exist, they&#8217;d certainly benefit!</p><p>So&#8230;if you do spot a neurotech company that is targeting a less-than-desperate patient population, you should ask yourself: why would anyone sign up for this? Why would an insurance company pay for it? And most importantly, why would the FDA ever approve something with such a lopsided risk-reward ratio? This is also why you see a lot of neurotech companies pivot toward &#8220;<em>wellness</em>&#8221; applications when their original clinical thesis doesn&#8217;t pan out. Wellness doesn&#8217;t require FDA approval or insurance reimbursement! But it also doesn&#8217;t require the device to actually work.</p><p><strong>But even if a neurotech company is targeting a less-than-desperate patient population and isn&#8217;t trying to push them towards surgery, it&#8217;s still worth thinking about the burdens it poses!</strong></p><p>Neurotech devices can be onerous in more boring ways too, so much so that they can completely kill any desire for any non-desperate person to use them. One example is a device we&#8217;ve talked about: the Kernel Flow. Someone who I chatted with for this essay mentioned that they had tried it, and had this to say about it: </p><blockquote><p><strong>&#8220;</strong><em><strong>[the headset] weighs like 4.5lbs. That is so. fucking. uncomfortable.&#8221;</strong> </em></p></blockquote><p>Now, it may be the case that the information that the device gives you is of such importance that it is <em>worth </em>putting up with the discomfort. Is the Kernel Flow worth it? I don&#8217;t know, I haven&#8217;t tried it! 
But in case you ever do personally try one of these wellness-focused devices, it is worth pondering how big of a chore it&#8217;d be to deal with. </p><h2><strong>How much is the approach &#8216;fighting physics&#8217;?</strong></h2><p>Speaking of &#8216;<em>building things for less desperate patients</em>&#8217;, two big neurotech names that often come up are <a href="https://nudge.com/">Nudge</a> and <a href="https://forestneurotech.org/">Forest Neurotech</a> (the founder of whom I talked to for this article, <a href="https://www.corememory.com/p/exclusive-openai-and-sam-altman-back-merge-labs-bci">who has since moved to Merge Labs</a>). </p><p>Both of these startups are focusing on brain stimulation for mental health, though Forest&#8217;s ambitions also include TBI and spinal cord injuries. Depression, anxiety, and PTSD can be quite awful, but only the most severely affected patients (single-digit percentages of the total patient population) would likely be willing to receive a brain implant. And both of these companies are fully aware of that, which is why neither of them does brain implants.</p><p>But, even if you aren&#8217;t directly placing wires into the brain, there is still some room to play with how invasive you <em>actually</em> are. I think it&#8217;d be a useful exercise to discuss both Nudge and Forest&#8217;s approaches&#8212;the former non-invasive, the latter invasive (albeit slightly less invasive than a Neuralink, which goes directly into the brain parenchyma)&#8212;because they illustrate an interesting dichotomy I&#8217;ve found amongst neurotech startups: <strong>the degree to which they are attempting to &#8216;fight&#8217; physics.</strong></p><p>At the more invasive end, there&#8217;s Forest Neurotech. 
Forest was founded in October 2023 by two Caltech scientists&#8212;<a href="https://www.linkedin.com/in/sumnernorman/">Sumner Norman</a> and <a href="https://www.linkedin.com/in/tyson-aflalo-277293229/">Tyson Aflalo</a>&#8212;alongside <a href="https://www.linkedin.com/in/willbiederman/">Will Biederman</a> from Verily. They&#8217;re structured as a nonprofit<a href="https://fas.org/publication/focused-research-organizations-a-new-model-for-scientific-research/"> Focused Research Organization</a> and backed by $50 million from Eric and Wendy Schmidt, Ken Griffin, ARI, James Fickel, and the Susan &amp; Riley Bechtel Foundation. Their approach relies on an ultrasound device, built on <a href="https://www.butterflynetwork.com/technology?srsltid=AfmBOooDPMHtRwTq_BAMAdTMUonTJ0s5k4_RAlRApguGtSaXjaZUIAr1">Butterfly Network&#8217;s ultrasound-on-chip technology</a>, that sits inside the skull but outside the brain&#8217;s dura mater, an arrangement also called an &#8216;epidural implant&#8217;. Still invasive, but again, not touching the brain!</p><p>At the less invasive end, there&#8217;s Nudge,<a href="https://x.com/nudge/status/1947673512107524333"> which raised $100M in July 2025</a> and has<a href="https://en.wikipedia.org/wiki/Fred_Ehrsam"> Fred Ehrsam</a>, a co-founder of Coinbase, as part of the founding team. 
They also have an ultrasound device, but theirs is <strong>entirely</strong> non-invasive, and comes with a<a href="https://www.nudge.com/blog/about/"> nice blog post to describe exactly what it is</a>: <em>&#8230;a</em> <em>high channel count, ultrasound phased array, packed into a helmet structure that can be used in an MRI machine.</em></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ltJh!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2badf02a-3087-4650-9b86-201494af7cbe_1420x1376.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ltJh!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2badf02a-3087-4650-9b86-201494af7cbe_1420x1376.jpeg 424w, https://substackcdn.com/image/fetch/$s_!ltJh!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2badf02a-3087-4650-9b86-201494af7cbe_1420x1376.jpeg 848w, https://substackcdn.com/image/fetch/$s_!ltJh!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2badf02a-3087-4650-9b86-201494af7cbe_1420x1376.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!ltJh!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2badf02a-3087-4650-9b86-201494af7cbe_1420x1376.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!ltJh!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2badf02a-3087-4650-9b86-201494af7cbe_1420x1376.jpeg" width="479" height="464.15774647887326" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2badf02a-3087-4650-9b86-201494af7cbe_1420x1376.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1376,&quot;width&quot;:1420,&quot;resizeWidth&quot;:479,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!ltJh!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2badf02a-3087-4650-9b86-201494af7cbe_1420x1376.jpeg 424w, https://substackcdn.com/image/fetch/$s_!ltJh!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2badf02a-3087-4650-9b86-201494af7cbe_1420x1376.jpeg 848w, https://substackcdn.com/image/fetch/$s_!ltJh!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2badf02a-3087-4650-9b86-201494af7cbe_1420x1376.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!ltJh!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2badf02a-3087-4650-9b86-201494af7cbe_1420x1376.jpeg 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div>
<p>So, yes, both of these are essentially focused ultrasound devices meant for neural stimulation, though I should add the nuance that Forest&#8217;s device is also capable of imaging. But, despite the surface similarities, one distinct split between the two is that, really, Nudge is attempting to fight physics a <em>lot</em> more than Forest. </p><p>Why? Because they must deal with the skull.</p><p>Nudge&#8217;s device works by sending out multiple ultrasound waves from an array of transducers that are timed so precisely that they constructively interfere at a single millimeter-scale point deep in the brain, stimulating a specific neuron population, usually millions of them. <strong>It is not dissimilar to the basic principle behind noise-cancelling headphones, but in reverse</strong>: instead of waves cancelling each other out, they add up. 
The hope is that all the peaks of the waves arrive at the same spot at the same moment&#8212;constructive interference&#8212;and you get a region of high acoustic pressure that can change brain activity. As a side note: you&#8217;d think this works by <strong>stimulating</strong> neurons! But apparently it can work via either <a href="https://pmc.ncbi.nlm.nih.gov/articles/PMC8789820/">stimulation</a> or <a href="https://www.nature.com/articles/s41598-022-05226-7">inhibition</a>, depending on how the ultrasound is set up.</p><p>How is the Nudge approach fighting physics?</p><p><strong>First, there&#8217;s absorption.</strong> The skull soaks up a substantial chunk of the emitted ultrasound energy and converts it into heat.<a href="https://pmc.ncbi.nlm.nih.gov/articles/PMC8891811/"> One study found that the skull causes 4.7 to 7 times more attenuation than the scalp and brain tissue combined.</a></p><p><strong>Second, aberration.</strong><a href="https://thejns.org/view/journals/j-neurosurg/132/5/article-p1392.xml"> Because the skull varies in thickness, density, and internal structure across its surface</a>, different parts of your ultrasound wavefront travel at different speeds, so, by the time the waves reach the brain, they&#8217;re no longer in phase. 
If the whole point of focused ultrasound is getting all your waves to constructively interfere at a single point, the skull messes that up, and the intended focal spot gets smeared or shifted, or might not form properly at all.</p><p><strong>And, finally, the skull varies enormously between individuals.</strong> The &#8220;skull density ratio&#8221;&#8212;a metric that captures how much trabecular (spongy) bone versus cortical (dense) bone you have&#8212;<a href="https://pubs.aip.org/asa/jasa/article-abstract/157/4/2336/3342085/Effects-of-skull-properties-on-continuous-wave?redirectedFrom=fulltext">differs from person to person</a>, and it dramatically affects how well ultrasound gets through.</p><p><strong>Now, to be clear, Nudge is aware of all of these things, and they&#8217;ve structured their device to fight all of these problems</strong>. For example,<a href="https://www.nudge.com/blog/about/"> Nudge talks a fair bit about how their device is MRI-compatible</a>. This is great! If you want to correct for aberrations (and for everyone&#8217;s brain being a different shape), you need to know what you&#8217;re correcting <em>for</em>, which means you need a detailed 3D model of that specific patient&#8217;s skull, which means you need an MRI (or, better, a CT scan). You image the skull, you build a patient-specific acoustic model, you compute the corrections needed to counteract the distortions, and then you program those corrections into your transducer array. Problem solved!</p><p>Well, maybe. Fighting physics is a difficult problem, and we&#8217;ll see what they come up with. 
While there is already an <a href="https://insightec.com/healthcare-professionals/">FDA-approved focused ultrasound device</a>, similar to Nudge&#8217;s, that has been used in thousands of surgeries and can target the brain with millimeter-scale accuracy (albeit for <em>ablating</em> brain tissue, not stimulating it, but the physics are the same!), it is an open question whether Nudge can dramatically improve on the precision and convenience needed to make it useful for mental health applications.</p><p><strong>On the other hand, Forest, by bypassing the skull, is almost certain to hit the brain regions they most want, potentially reaching accuracies at the micron scale.</strong> Remember that these differences cube: a 1.5-millimeter-wide voxel holds (1500^3)/(150^3) = 1,000 times more neurons than a 150-micron-wide voxel, assuming a uniform neuron density. So it&#8217;s safe to say that the Forest device is, theoretically, 2-3 orders of magnitude more precise in the volumes it interacts with than Nudge is. <strong>Now, Forest still isn&#8217;t exactly an easy bet</strong>, given that they now have to power something near an organ that really, really doesn&#8217;t like to get hot, figure out implant biocompatibility, and solve a bunch of other problems that come alongside invasive neurotech devices. But they at least do not have to fight the skull, and are thus assured a high degree of precision.</p><p>There is, of course, a reward for Nudge&#8217;s trouble. Nudge, if they succeed, <strong>also</strong> gets access to a much larger potential patient population, since no surgery is needed. This is opposed to Forest, who must limit themselves to a smaller, more desperate demographic.</p><p>As with anything in biology, there is an immense amount of nuance I am missing in this explanation. 
People actually in the neurotech field are likely at least a little annoyed with the above explanation, because it does leave out something important in this Nudge versus Forest, non-invasive versus invasive, physics-fighting versus physics-embracing debate: <strong>how much does it all matter anyway?</strong></p><h1><strong>Do they know whether their advantages translate to clinical benefit?</strong></h1><p>The brain-computer interface field is in a strange epistemic position where devices are being built to modulate brain regions whose exact anatomical boundaries aren&#8217;t agreed on (<a href="https://www.johnsallen.com/wp-content/uploads/2016/06/5-Normal-Neuroanatomical-Variation-AJPA-2002.pdf">and may even diverge between individuals!</a>), using mechanisms that aren&#8217;t fully understood, for conditions whose neural circuits are still being studied.</p><p>Because of this, despite all the problems I&#8217;ve listed out with going through the skull, Nudge will almost certainly have <em>some</em> successful clinical readouts. Why? 
It has nothing to do with the team at Nudge being particularly clever; rather, it is because<strong> there is existing proof that non-invasive ultrasound setups somehow work for some clinically relevant objectives.</strong></p><p>Nudge is fun to refer to because they have a lot of online attention on them, but there are other players in the ultrasound stimulation space too, ones who are more public with their clinical results.<a href="https://spire.us/"> SPIRE Therapeutics</a> is one such company and they, or at least people associated with the company (<a href="https://scholar.google.com/citations?user=N-mu98AAAAAJ&amp;hl=en">Thomas S Riis</a>), have papers demonstrating<a href="https://pubmed.ncbi.nlm.nih.gov/38335553/"> tremor</a> alleviation (n=3),<a href="https://pmc.ncbi.nlm.nih.gov/articles/PMC11562753/"> chronic pain</a> reduction (n=20), and, most relevant to this whole discussion,<a href="https://www.biologicalpsychiatryjournal.com/article/S0006-3223(24)01662-7/abstract"> depressive symptom</a> improvement (n=22 + randomized + double-blind!), all using their noninvasive ultrasound device.</p><p><strong>How is this possible? How do these successful results square with the skull problems from earlier?</strong></p><p>Clearly, <em>something</em> is getting through the skull, and it seems to be having <em>some</em> clinically significant effect. Because of this, it could very well be possible that the relative broadness of Nudge&#8217;s and SPIRE&#8217;s (and others like them) stimulation is, in fact, perfectly fine, and being incredibly precise is simply not worth the effort. All that said, it is hard to give Forest a fair trial here, since they are basically the only ones going the invasive route for ultrasound, and their clinical trials (which use noninvasive devices) only started circa early 2025. 
Maybe their results will be spectacular, and I&#8217;d recommend watching <a href="https://www.corememory.com/p/the-history-and-future-of-brain-implants-ultrasound-sumner-norman">Sumner&#8217;s (the prior Forest CEO) appearance on Ashlee Vance&#8217;s podcast</a> to learn more about early results there.</p><p>But this debate between invasive and non-invasive really belongs in the previous section, because the point I am trying to make here is a bit broader than these two companies. <strong>What I&#8217;m really gesturing at is that being really good at [X popular neurotech metric] doesn&#8217;t, on its own, equal something better!</strong> This is as true for precision as it is for everything else.</p><p>Staying on the example of precision, consider the absolute dumbest possible way you could approach brain stimulation: simply wash the entire brain with electricity and hope for the best.</p><p>This is, more or less, what<a href="https://en.wikipedia.org/wiki/Electroconvulsive_therapy"> electroconvulsive therapy (ECT)</a> does. Electrodes are placed on your scalp, a generalized seizure is induced, and you repeat this a few times a week. You are, in the most literal sense, overwhelming the entire brain with synchronized electrical activity. And yet despite the insane lack of specificity, <a href="https://x.com/therealRYC/status/2004627547515224370">ECT remains the single most effective treatment we have for severe</a>, treatment-resistant depression. 
Response rates hover around 50-70% in patients for whom nothing else has worked, with some rather insane outcomes. One review paper states: <a href="https://mentalhealth.bmj.com/content/28/1/e302083">&#8220;</a><em><a href="https://mentalhealth.bmj.com/content/28/1/e302083">For the primary outcome of all-cause mortality, ECT was associated with a 30% reduction in overall mortality</a></em>.&#8221; For some presentations, like depression with psychotic features, catatonia, or acute suicidality, it is essentially first-line.</p><p><strong>This should be deeply humbling for anyone looking into the neuromodulation space.</strong> There are companies raising hundreds of millions of dollars to hit specific brain targets with millimeter, even micron, precision, and meanwhile, the most effective neurostimulation-for-depression approach we&#8217;ve ever discovered involves no targeting whatsoever. Now, of course, there are<a href="https://pmc.ncbi.nlm.nih.gov/articles/PMC7191622/"> genuine downsides to the ECT approach</a> (cognitive side effects, the need for anesthesia, the inconvenience of repeated hospital visits, and the fact that it obviously doesn&#8217;t work for every neuropsychiatric disorder) that make it worth pursuing alternatives! But it does suggest that the relationship between targeting precision and clinical outcome is much more complex than you&#8217;d otherwise assume.</p><p>Consider the opposite failure mode. Early trials of<a href="https://www.mayoclinic.org/tests-procedures/deep-brain-stimulation/about/pac-20384562"> deep brain stimulation</a>&#8212;the most spatiotemporally precise neurostimulation method currently available&#8212;for depression are instructive here. Researchers identified what they believed was &#8220;the depression circuit,&#8221; implanted electrodes in that exact area, delivered stimulation, and then watched as several major trials burned tens of millions of dollars on null results. 
Most infamously, the<a href="https://pubmed.ncbi.nlm.nih.gov/28988904/"> BROADEN trial, targeting the subcallosal cingulate</a>, and the<a href="https://pubmed.ncbi.nlm.nih.gov/25726497/"> RECLAIM trial, targeting the ventral capsule/ventral striatum,</a> both failed their primary endpoints. </p><p>Yet, <a href="https://www.ninds.nih.gov/about-ninds/what-we-do/impact/ninds-contributions-approved-therapies/deep-brain-stimulation-dbs-treatment-parkinsons-disease-and-other-movement-disorders">DBS is FDA-approved for Parkinson&#8217;s treatment </a>and is frequently<a href="https://pmc.ncbi.nlm.nih.gov/articles/PMC8474989/"> used to treat OCD</a>. Each indication is a world unto itself in how amenable it is to &#8216;precision&#8217; being a useful metric. </p><p><strong>But again, this point extends beyond precision.</strong></p><p>As a second example, consider the butcher number, a metric first coined by the Caltech neuroscientist<a href="https://scholar.google.com/citations?user=QKhjs2YAAAAJ&amp;hl=en"> Markus Meister</a>, which captures the number of neurons destroyed for each neuron recorded. Now, you&#8217;d ideally like to reduce the butcher number, because killing neurons is (probably) bad. And one way you could reliably reduce the butcher number is by simply making your electrodes thinner and more flexible. This is, more or less, at least part of Neuralink&#8217;s thesis:<a href="https://pmc.ncbi.nlm.nih.gov/articles/PMC6914248/"> their polymer threads are 5 to 50 microns wide and only 4 to 6 microns thick</a> (dramatically smaller than the<a href="https://pmc.ncbi.nlm.nih.gov/articles/PMC9954796/"> Utah array&#8217;s 400-micron-diameter electrodes</a>!) 
and thus almost certainly have a low butcher number.</p><p>Here&#8217;s the Neuralink implant:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!zbN-!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1e34bd5b-67c7-4104-b33c-54bb5bbafe2c_1200x800.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!zbN-!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1e34bd5b-67c7-4104-b33c-54bb5bbafe2c_1200x800.jpeg 424w, https://substackcdn.com/image/fetch/$s_!zbN-!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1e34bd5b-67c7-4104-b33c-54bb5bbafe2c_1200x800.jpeg 848w, https://substackcdn.com/image/fetch/$s_!zbN-!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1e34bd5b-67c7-4104-b33c-54bb5bbafe2c_1200x800.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!zbN-!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1e34bd5b-67c7-4104-b33c-54bb5bbafe2c_1200x800.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!zbN-!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1e34bd5b-67c7-4104-b33c-54bb5bbafe2c_1200x800.jpeg" width="470" height="313.3333333333333" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/1e34bd5b-67c7-4104-b33c-54bb5bbafe2c_1200x800.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:800,&quot;width&quot;:1200,&quot;resizeWidth&quot;:470,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Building Safe Implantable Devices | Updates | Neuralink&quot;,&quot;title&quot;:&quot;Building Safe Implantable Devices | Updates | Neuralink&quot;,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Building Safe Implantable Devices | Updates | Neuralink" title="Building Safe Implantable Devices | Updates | Neuralink" srcset="https://substackcdn.com/image/fetch/$s_!zbN-!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1e34bd5b-67c7-4104-b33c-54bb5bbafe2c_1200x800.jpeg 424w, https://substackcdn.com/image/fetch/$s_!zbN-!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1e34bd5b-67c7-4104-b33c-54bb5bbafe2c_1200x800.jpeg 848w, https://substackcdn.com/image/fetch/$s_!zbN-!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1e34bd5b-67c7-4104-b33c-54bb5bbafe2c_1200x800.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!zbN-!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1e34bd5b-67c7-4104-b33c-54bb5bbafe2c_1200x800.jpeg 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption"></figcaption></figure></div>
<p>And here&#8217;s the Utah array:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!TPax!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F76a5e57d-698e-46c6-b066-957faa0c28eb_770x500.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!TPax!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F76a5e57d-698e-46c6-b066-957faa0c28eb_770x500.jpeg 424w, 
https://substackcdn.com/image/fetch/$s_!TPax!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F76a5e57d-698e-46c6-b066-957faa0c28eb_770x500.jpeg 848w, https://substackcdn.com/image/fetch/$s_!TPax!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F76a5e57d-698e-46c6-b066-957faa0c28eb_770x500.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!TPax!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F76a5e57d-698e-46c6-b066-957faa0c28eb_770x500.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!TPax!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F76a5e57d-698e-46c6-b066-957faa0c28eb_770x500.jpeg" width="447" height="290.2597402597403" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/76a5e57d-698e-46c6-b066-957faa0c28eb_770x500.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:500,&quot;width&quot;:770,&quot;resizeWidth&quot;:447,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;How the Utah Array is advancing BCI science&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="How the Utah Array is advancing BCI science" title="How the Utah Array is advancing BCI science" srcset="https://substackcdn.com/image/fetch/$s_!TPax!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F76a5e57d-698e-46c6-b066-957faa0c28eb_770x500.jpeg 424w, 
https://substackcdn.com/image/fetch/$s_!TPax!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F76a5e57d-698e-46c6-b066-957faa0c28eb_770x500.jpeg 848w, https://substackcdn.com/image/fetch/$s_!TPax!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F76a5e57d-698e-46c6-b066-957faa0c28eb_770x500.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!TPax!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F76a5e57d-698e-46c6-b066-957faa0c28eb_770x500.jpeg 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div>
<p>But does having a lower butcher number actually translate to better clinical outcomes? <strong>As far as I can tell, nobody knows!</strong> It&#8217;s largely unstudied! It&#8217;s conceivable that yes, lowering this number is useful, but surely there is a point where the priority of the problem dramatically drops compared to the litany of other small terrors that plague most neurotech startups.</p><p>The point here is not that the butcher number is useless. The point also isn&#8217;t that precision is useless. The point is that the relationship between any given engineering metric and clinical success (in your indication) is rarely as straightforward as anyone hopes, and<strong> it&#8217;s worth considering whether that relationship has actually been </strong><em><strong>established</strong></em><strong> before believing that success on the metric is at all useful.</strong></p><h2><strong>Could this be done without touching the central nervous system?</strong></h2><p>Finally: something that came up repeatedly among the neurotech folks I talked to <strong>was that people consistently underestimate how extraordinarily adaptable the peripheral nervous system is</strong>. For example, a company that claims to, say, automatically interpret commands to a digital system via EEG should probably make absolutely certain that attaching an <a href="https://en.wikipedia.org/wiki/Electromyography">electromyography</a> device to a person&#8217;s forearm (and training them to use it) wouldn&#8217;t wind up accomplishing the exact same thing.</p><p>In fact, there was a company that did exactly this:<a href="https://dsvw8jmuo45j8.cloudfront.net/"> CTRL-labs</a>, a New York City-based startup. 
They came up over and over again in my conversations as a prime example of someone solving something very useful, in a way that completely avoided the horrifically challenging parts of touching the brain. Their device was a simple wristband that reads neuromuscular signals from the wrist (via electromyography, or EMG) to control external devices.<a href="https://x.com/SussilloDavid/status/1762960425392513059"> Here&#8217;s a great video of it in action.</a></p><p>Now, if CTRL-labs was so great, what happened to their technology? They were acquired by Meta in 2019, joining Facebook Reality Labs. And if you look at the ex-CEO&#8217;s Twitter (who is now a VP at Meta), you can see that he<a href="https://x.com/rowancheung/status/1968476034518630607"> recently retweeted a September 2025 podcast with Mark Zuckerberg</a>, in which Mark says that their next generation of glasses will include an EMG band capable of allowing you to type, hands free, purely by moving your facial muscles. </p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!2hwz!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff40cbea5-68e0-4c22-af74-e762939c485f_1082x1142.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!2hwz!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff40cbea5-68e0-4c22-af74-e762939c485f_1082x1142.jpeg 424w, https://substackcdn.com/image/fetch/$s_!2hwz!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff40cbea5-68e0-4c22-af74-e762939c485f_1082x1142.jpeg 848w, 
https://substackcdn.com/image/fetch/$s_!2hwz!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff40cbea5-68e0-4c22-af74-e762939c485f_1082x1142.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!2hwz!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff40cbea5-68e0-4c22-af74-e762939c485f_1082x1142.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!2hwz!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff40cbea5-68e0-4c22-af74-e762939c485f_1082x1142.jpeg" width="367" height="387.35120147874306" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f40cbea5-68e0-4c22-af74-e762939c485f_1082x1142.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1142,&quot;width&quot;:1082,&quot;resizeWidth&quot;:367,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!2hwz!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff40cbea5-68e0-4c22-af74-e762939c485f_1082x1142.jpeg 424w, https://substackcdn.com/image/fetch/$s_!2hwz!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff40cbea5-68e0-4c22-af74-e762939c485f_1082x1142.jpeg 848w, 
https://substackcdn.com/image/fetch/$s_!2hwz!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff40cbea5-68e0-4c22-af74-e762939c485f_1082x1142.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!2hwz!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff40cbea5-68e0-4c22-af74-e762939c485f_1082x1142.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Not too far of a stretch to imagine that this is based on CTRL-labs work! 
And, by the time I finally finished this essay, <a href="https://www.meta.com/emerging-tech/emg-wearable-technology/?srsltid=AfmBOoopXuRQA2n-PgnbdIIQJfNzOq-8u4JNw0HB3zvABeQqUKFHtOBk">the device had its own dedicated Meta page!</a></p><p>What about something that exists today?</p><p>Another startup that multiple people were exuberant over was one called<a href="https://www.augmental.tech/"> Augmental</a>. Their device is called the &#8216;MouthPad^&#8217;, and a blurb from the site best describes it:</p><blockquote><p><em>The MouthPad^ is smart mouthwear that allows you to control your phone, computer, and tablet hands-free. Perched on the roof of your mouth, the device converts subtle head and tongue gestures into seamless cursor control and clicks. It&#8217;s virtually invisible to the world &#8212; but always available to you.</em></p></blockquote><p>And here&#8217;s a wild video of a 19-year-old quadriplegic using this device to interact with a computer and even <strong>code</strong>.</p><div id="youtube2-d9U8BaNx3ZM" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;d9U8BaNx3ZM&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/d9U8BaNx3ZM?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><p>Isn&#8217;t this insane? I remember being shocked by the<a href="https://www.nbcnews.com/tech/tech-news/neuralink-livestream-shows-paralyzed-person-playing-chess-laptop-rcna144374"> Neuralink demo videos showing paralyzed patients controlling cursors on screens</a>. But this is someone doing essentially the same thing! 
All by exploiting both the tongue, which happens to have an extremely high density of nerve endings and remarkably fine motor control, and our brain, which can display remarkable adaptivity to novel input/output channels.</p><p>Now, fairly enough, a device like Augmental cannot do a lot of things. For someone with complete locked-in syndrome, there really may be no alternative to inserting a wire into the brain. And in the limit case of applications that genuinely require reading (or modifying!) the <em>content</em> of thought, the periphery again won&#8217;t cut it. But for a surprising range of use cases, the peripheral route seems to offer a dramatically better risk-reward tradeoff, and it feels consistently under-appreciated when people are mentally pricing how revolutionary a new neurotech startup is.</p><h1><strong>Conclusion</strong></h1><p>This piece has been in production for the last five months and, as such, lots of discarded bits of it can be found on the cutting room floor. There are lots of other things, not mentioned in this essay, that I think are <strong>also</strong> worth really pondering, but I couldn&#8217;t come up with a big, universal statement about what the takeaway is, or the point is pretty specific to a small subset of devices. I&#8217;ve attached three such things in the footnotes.<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-1" href="#footnote-1" target="_self">1</a></p><p>Before ending, I&#8217;d like to repeat the sentiment I mentioned at the start: this field is complicated. A lot of the readers of this blog come from the more cell-biology or drug-discovery side of the life-sciences field, and may naturally assume that they can safely use that mental framework to grasp the neurotech field. I once shared this optimism, but I no longer do. 
After finishing this essay, I now believe that the relevant constraints in this domain come from such an overwhelming number of directions that it bears little resemblance to most other questions in biology, and more closely resembles the assessment of a small nation&#8217;s chances of surviving a war. The personality required to perform such a feat matches up with the archetype of individual I&#8217;ve found to work in this field, all of whom display a startling degree of scientific omniscience that, in any other field, would be considered extraordinary, but here is equivalent to competence. It would be impossible to recreate these people&#8217;s minds in anything that isn&#8217;t a seven-hundred-page text written in ten-point font, but I hope this essay serves as a rough first approximation.</p><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-1" href="#footnote-anchor-1" class="footnote-number" contenteditable="false" target="_self">1</a><div class="footnote-content"><p><strong>Think about how they are powering the device.</strong><a href="https://www.ncbi.nlm.nih.gov/books/NBK3932/"> Brains really, really don&#8217;t like heat</a>. The FDA limit is that an implant in or touching the brain can warm by at most 1&#176;C above the surrounding tissue. So, if a device is promising to do a lot of edge compute and is even slightly invasive, it is worth being worried about this.</p><p><strong>Think about whether they are closed-loop or open-loop.</strong> An open-loop technology intervenes on the brain without taking brain state into account, like ECT or Prozac. A closed-loop device reads neural activity and adjusts its intervention in real time. Many companies gesture toward closed-loop as a future goal without explaining how they&#8217;ll get there. 
You may think that this should lead one to being especially optimistic about devices that can easily handle <strong>both</strong> reading and writing at the same time, because the pathway to closed-loop is technically much cleaner. But again, how <em>much</em> does &#8216;continuous closed loop&#8217; matter, as opposed to a write-only device that is rarely calibrated via an MRI? Nobody knows!</p><p><strong>Think about how they plan to deal with the specter of China&#8217;s stranglehold on the parts they need, and their rapidly advancing neurotech industry.</strong> This is a surprisingly big problem, and while there is almost certainly plenty of material here for its own section, I ended up not feeling super confident about the takeaway message here. Free article idea for those reading!</p><p>And there&#8217;s almost certainly a lot more that I&#8217;m not even thinking about, because I&#8217;m just not aware of it. </p><p></p></div></div>]]></content:encoded></item><item><title><![CDATA[How do you use a virtual cell to do something actually useful?]]></title><description><![CDATA[5.5k words, 25 minute reading time]]></description><link>https://www.owlposting.com/p/how-do-you-use-a-virtual-cell-to</link><guid isPermaLink="false">https://www.owlposting.com/p/how-do-you-use-a-virtual-cell-to</guid><dc:creator><![CDATA[Abhishaike Mahajan]]></dc:creator><pubDate>Sat, 20 Sep 2025 12:58:14 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!6lFR!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F127246d3-98a9-4374-b98a-e23f9088aa26_2912x1632.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" 
href="https://substackcdn.com/image/fetch/$s_!6lFR!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F127246d3-98a9-4374-b98a-e23f9088aa26_2912x1632.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!6lFR!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F127246d3-98a9-4374-b98a-e23f9088aa26_2912x1632.png 424w, https://substackcdn.com/image/fetch/$s_!6lFR!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F127246d3-98a9-4374-b98a-e23f9088aa26_2912x1632.png 848w, https://substackcdn.com/image/fetch/$s_!6lFR!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F127246d3-98a9-4374-b98a-e23f9088aa26_2912x1632.png 1272w, https://substackcdn.com/image/fetch/$s_!6lFR!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F127246d3-98a9-4374-b98a-e23f9088aa26_2912x1632.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!6lFR!,w_2400,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F127246d3-98a9-4374-b98a-e23f9088aa26_2912x1632.png" width="1200" height="672.5274725274726" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/127246d3-98a9-4374-b98a-e23f9088aa26_2912x1632.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;large&quot;,&quot;height&quot;:816,&quot;width&quot;:1456,&quot;resizeWidth&quot;:1200,&quot;bytes&quot;:7085002,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.owlposting.com/i/173697659?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F127246d3-98a9-4374-b98a-e23f9088aa26_2912x1632.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:&quot;center&quot;,&quot;offset&quot;:false}" class="sizing-large" alt="" srcset="https://substackcdn.com/image/fetch/$s_!6lFR!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F127246d3-98a9-4374-b98a-e23f9088aa26_2912x1632.png 424w, https://substackcdn.com/image/fetch/$s_!6lFR!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F127246d3-98a9-4374-b98a-e23f9088aa26_2912x1632.png 848w, https://substackcdn.com/image/fetch/$s_!6lFR!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F127246d3-98a9-4374-b98a-e23f9088aa26_2912x1632.png 1272w, https://substackcdn.com/image/fetch/$s_!6lFR!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F127246d3-98a9-4374-b98a-e23f9088aa26_2912x1632.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container 
restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p><em>Note: I haven&#8217;t posted here in a bit, but I have been writing! Over the last month, <a href="https://www.noetik.blog/">on my company&#8217;s blog</a> (you should follow if you&#8217;re interested in cancer!), I have been putting together a series of case studies on how one can use &#8216;virtual cells&#8217; (specifically, the ones that we&#8217;ve built) for actual, grounded, real, genuine, and overwhelmingly important problems in cancer drug development. This was fun! 
But, more importantly, it seemed to be helpful for people who have no idea what the point of that research field is.</em> </p><p><em>Here, I present those three articles, compiled into one, for your reading pleasure.</em></p><p><em>I won&#8217;t always cross-post articles, but I am doing it here purely because it&#8217;s a set of essays I&#8217;ve wanted to write for months <strong>and</strong> I think everything in it is really cool. If you think this is all a bit much to read, I&#8217;d recommend reading just the &#8216;<strong>virtual perturbations</strong>&#8217; section; that&#8217;s my favorite. If you&#8217;d like to chat further about this work, feel free to reach out to me! </em></p><p><em>Finally: if you&#8217;ve already seen these, apologies for the inbox intrusion. Starting next week, Owl Posting will return to weekly essays. </em></p><div><hr></div><ol><li><p><a href="https://www.owlposting.com/i/173697659/introduction">Introduction</a></p></li><li><p><a href="https://www.owlposting.com/i/173697659/identifying-anti-pd-responders">Identifying anti-PD-1 responders</a></p></li><li><p><a href="https://www.owlposting.com/i/173697659/refining-clinical-trial-eligibility-to-the-right-subgroups">Refining clinical trial eligibility to the right subgroups</a></p></li><li><p><a href="https://www.owlposting.com/i/173697659/virtual-perturbations-that-shift-t-cell-effector-state-in-humans">Virtual perturbations that shift T cell effector state in humans</a></p></li></ol><h1>Introduction</h1><p>A lot of people have been very interested in &#8216;virtual cells&#8217; lately. 
An exact definition is difficult to find, but one offered by a<a href="https://www.cell.com/cell/fulltext/S0092-8674(24)01332-1"> recent Cell perspective paper</a> is the following:</p><blockquote><p><em>Our view of [a virtual cell] is a learned simulator of cells and cellular systems under varying conditions and changing contexts, such as differentiation states, perturbations, disease states, stochastic fluctuations, and environmental conditions. In this context, a virtual cell should integrate broad knowledge across cell biology. [Virtual cells] must work across biological scales, over time, and across data modalities and should help reveal the programming language of cellular systems and provide an interface to use it for engineering purposes.</em></p></blockquote><p>It&#8217;s an exciting idea! A computational simulation of a cell should be, theoretically, exceedingly useful for all sorts of clinical and preclinical research, by virtue of being able to eschew expensive wet-lab efforts in favor of cheaper (and potentially more reliable) GPU time. So it is no surprise that a great deal of research is already being actively done in this area. 
Elliot Hershberg, a venture capitalist at <a href="https://www.amplifypartners.com/">Amplify Partners</a>, recently<a href="https://centuryofbio.com/p/virtual-cell"> compiled a small summary of ongoing work here</a>:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!IfMR!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F85de8f40-e1f3-4f01-a997-ed3078c97d67_1216x1454.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!IfMR!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F85de8f40-e1f3-4f01-a997-ed3078c97d67_1216x1454.png 424w, https://substackcdn.com/image/fetch/$s_!IfMR!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F85de8f40-e1f3-4f01-a997-ed3078c97d67_1216x1454.png 848w, https://substackcdn.com/image/fetch/$s_!IfMR!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F85de8f40-e1f3-4f01-a997-ed3078c97d67_1216x1454.png 1272w, https://substackcdn.com/image/fetch/$s_!IfMR!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F85de8f40-e1f3-4f01-a997-ed3078c97d67_1216x1454.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!IfMR!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F85de8f40-e1f3-4f01-a997-ed3078c97d67_1216x1454.png" width="424" height="506.9868421052632" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/85de8f40-e1f3-4f01-a997-ed3078c97d67_1216x1454.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1454,&quot;width&quot;:1216,&quot;resizeWidth&quot;:424,&quot;bytes&quot;:848975,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:&quot;&quot;,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://noetikai.substack.com/i/168304206?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F85de8f40-e1f3-4f01-a997-ed3078c97d67_1216x1454.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!IfMR!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F85de8f40-e1f3-4f01-a997-ed3078c97d67_1216x1454.png 424w, https://substackcdn.com/image/fetch/$s_!IfMR!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F85de8f40-e1f3-4f01-a997-ed3078c97d67_1216x1454.png 848w, https://substackcdn.com/image/fetch/$s_!IfMR!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F85de8f40-e1f3-4f01-a997-ed3078c97d67_1216x1454.png 1272w, https://substackcdn.com/image/fetch/$s_!IfMR!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F85de8f40-e1f3-4f01-a997-ed3078c97d67_1216x1454.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg 
role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>But as with every promised revolution in the life sciences, the revolution will hesitantly admit<strong> </strong>some nuances upon questioning.</p><p>Of highest concern is the fact that nearly all virtual cell model efforts being worked on are not virtual cells of human biology, but rather <strong>cancer cell lines</strong>, which&#8212;while convenient, well-characterized, and infinitely malleable&#8212;are far from the true physiological complexity of healthy or diseased human tissue. Due to this, figuring out how their insights extend into assisting with the drug development process is usually another hard problem in and of itself. 
But, to be clear, this doesn&#8217;t mean they aren&#8217;t useful. Research on cancer cell lines is common at the preclinical stage, which is what nearly all virtual cell models are currently geared towards assisting.</p><p>This partially answers the question of why, despite how exciting &#8216;virtual cells&#8217; <strong>seem</strong>, there are very few clear-cut examples of how such methods will ultimately be used. <strong>That vagueness is partly built into the reality of early-stage biology, so it&#8217;ll be years before the ultimate impact of this line of research is felt.</strong></p><p>But one area where virtual cells could have a concrete value-add in the short term is their deployment at the <strong>clinical stage of drug development</strong>. After all, this is where the real bottlenecks lie: trials are slow, expensive, and fraught with uncertainty, and even small improvements here can ripple into huge downstream gains. Of course, while the opportunity here is massive, the downside of touching this area is that it is hard to do. Very, very hard. As a result, there is almost no virtual cell effort meant to operate at the clinical stages of drug development, even though the translation problem there is, theoretically, &#8216;<em>easy</em>&#8217;.</p><p>Other than us. <a href="https://www.noetik.ai/">Noetik</a> is building virtual cells with the explicit goal of assisting with clinical-stage problems: identifying responders to drugs and refining patient inclusion criteria for trials. At the same time, we believe that the tools we create in this process will also have powerful applications in pivotal, high-risk areas of preclinical research, such as target selection, while remaining grounded in human-level data. All three will be discussed in this essay series.</p><p>How do we do this? 
<strong>Our view is simple: the shortest path to usefulness is not maximal simulation on unrealistic biology, but grounded observations into realistic biology</strong>. We built that foundation first. Every datapoint that trains our virtual cell models comes from human tumor resections: 77M cells across ~2,500 patients across a dozen+ cancers, with paired spatial transcriptomics, spatial proteomics, exomes, and H&amp;E&#8217;s from each one collected in our lab. <strong>In total, this is easily one of the largest datasets of its kind.</strong> And not a single cell line. We strongly believe that this means the path from in-silico workflows to something clearly translatable is far more direct: human to human, rather than detouring through unrealistic animal or cell models.</p><p>That difference matters! In cancer, translation is the bottleneck. <strong>Drugs fail, not because they don&#8217;t work in preclinical settings, but because they don&#8217;t work in real human patients</strong>.</p><p>Using this human-derived tumor data, one of the virtual cell models we&#8217;ve created is <strong>&#8216;OCTO-VC&#8217;</strong>. This model is entirely trained on <a href="https://nanostring.com/products/cosmx-spatial-molecular-imager/single-cell-imaging-overview/">1000-plex spatial transcriptomes</a>, and its core task is deliberately prosaic: given the transcriptome of a few neighboring cells, reconstruct the &#8220;center cell&#8221; transcriptome&#8212;over every cell, in every tumor, for every patient. 
We released a <a href="https://www.noetik.ai/octo-vc">(very long) post late last year discussing it in depth</a> for those who are curious about the machine-learning details, <a href="https://celleporter.noetik.ai/">alongside an online demo</a>.</p><p>But what wasn&#8217;t discussed in that earlier post is how one can use models like this for clinically meaningful, non-trivial problems.</p><p>In this essay series, we hope to do exactly that, by showing three case studies in which OCTO-VC was directly useful for our therapeutics team. </p><h2><strong>Identifying anti-PD-1 responders</strong></h2><p><strong>Therapeutic Context</strong>: </p><p>Among the most common (and effective) therapies in cancer are anti-PD-1 drugs. The underlying biology is straightforward: many tumor cells express <strong>PD-L1</strong> on their surface, which binds <strong>PD-1</strong> receptors on T cells to dampen T-cell activity. Anti-PD-1 (or anti-PD-L1) antibodies block this inhibitory interaction, allowing T cells to attack the tumor. But not all cancers rely on this pathway. Some tumors have little to no PD-L1 expression, meaning that drugs operating on that mechanism would, in principle, have limited effect. This has led to a common clinical rule of thumb: <strong>patients are considered potential candidates for anti-PD-1/PD-L1 therapy if &#8805;1% of their tumor cells express PD-L1 (that is, are PD-L1+).</strong></p><p>But this still isn&#8217;t perfect. 
Even with this inclusion metric,<a href="https://pmc.ncbi.nlm.nih.gov/articles/PMC7192862/"> roughly half of patients still do not respond to this therapy, even if they are PD-L1+, and it is unclear why.</a></p><p><strong>Question: </strong></p><p>Can OCTO-VC improve how well we can identify responders to these anti-PD-1 drugs?</p><p><strong>What we found: </strong></p><p>Seeing how well OCTO-VC can help us here is quite straightforward: create a high-dimensional embedding of each of our tumor cores that we have responder data for, and see if the embeddings of responders differ from those of non-responders.<strong> And, most importantly, is the clustering better than a good baseline, namely the usual patient inclusion criteria?</strong></p><p>You may be instinctively surprised by the fact that OCTO-VC&#8217;s value here doesn&#8217;t come from the usual virtual cell trick of simulating perturbations, but instead from the far simpler act of representation. But this is, in fact, the most reliable way to use models like this; it allows our underlying, extremely rich data to &#8216;speak for itself&#8217; without needing human intervention.</p><p>Using a small cohort of patients (15 responders and 24 non-responders, with both groups meeting the &#8220;ideal candidate&#8221; criterion above, a PD-L1 tumor proportion score &#8805;1%), we generated an embedding for each of their tumors using OCTO-VC. The graph below shows the embeddings for all our core samples, reduced to two dimensions via PCA. 
The ones we have PD-1 responder data for are colored in either green or magenta.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!DSFr!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F78372134-a7a0-4cd9-ae49-a1c7f8bb68a9_1190x690.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!DSFr!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F78372134-a7a0-4cd9-ae49-a1c7f8bb68a9_1190x690.png 424w, https://substackcdn.com/image/fetch/$s_!DSFr!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F78372134-a7a0-4cd9-ae49-a1c7f8bb68a9_1190x690.png 848w, https://substackcdn.com/image/fetch/$s_!DSFr!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F78372134-a7a0-4cd9-ae49-a1c7f8bb68a9_1190x690.png 1272w, https://substackcdn.com/image/fetch/$s_!DSFr!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F78372134-a7a0-4cd9-ae49-a1c7f8bb68a9_1190x690.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!DSFr!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F78372134-a7a0-4cd9-ae49-a1c7f8bb68a9_1190x690.png" width="615" height="356.5966386554622" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/78372134-a7a0-4cd9-ae49-a1c7f8bb68a9_1190x690.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:690,&quot;width&quot;:1190,&quot;resizeWidth&quot;:615,&quot;bytes&quot;:656202,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://noetikai.substack.com/i/168304206?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F78372134-a7a0-4cd9-ae49-a1c7f8bb68a9_1190x690.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!DSFr!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F78372134-a7a0-4cd9-ae49-a1c7f8bb68a9_1190x690.png 424w, https://substackcdn.com/image/fetch/$s_!DSFr!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F78372134-a7a0-4cd9-ae49-a1c7f8bb68a9_1190x690.png 848w, https://substackcdn.com/image/fetch/$s_!DSFr!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F78372134-a7a0-4cd9-ae49-a1c7f8bb68a9_1190x690.png 1272w, https://substackcdn.com/image/fetch/$s_!DSFr!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F78372134-a7a0-4cd9-ae49-a1c7f8bb68a9_1190x690.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" 
width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>The responders seem to mostly be in the lower right quadrant, so there&#8217;s meaningful separation in the entirely unsupervised embedding. And, training a basic model on the PCA reduction allows us to quantify the signal, showing that predictions match up well with the response cluster, and that it is above chance. 
Here is the associated confusion matrix of the trained model: </p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!7YX4!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3270cba8-1ac3-4eac-8101-37ed8e2e48b7_760x612.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!7YX4!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3270cba8-1ac3-4eac-8101-37ed8e2e48b7_760x612.png 424w, https://substackcdn.com/image/fetch/$s_!7YX4!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3270cba8-1ac3-4eac-8101-37ed8e2e48b7_760x612.png 848w, https://substackcdn.com/image/fetch/$s_!7YX4!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3270cba8-1ac3-4eac-8101-37ed8e2e48b7_760x612.png 1272w, https://substackcdn.com/image/fetch/$s_!7YX4!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3270cba8-1ac3-4eac-8101-37ed8e2e48b7_760x612.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!7YX4!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3270cba8-1ac3-4eac-8101-37ed8e2e48b7_760x612.png" width="393" height="316.4684210526316" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3270cba8-1ac3-4eac-8101-37ed8e2e48b7_760x612.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:612,&quot;width&quot;:760,&quot;resizeWidth&quot;:393,&quot;bytes&quot;:51725,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://noetikai.substack.com/i/168304206?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3270cba8-1ac3-4eac-8101-37ed8e2e48b7_760x612.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!7YX4!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3270cba8-1ac3-4eac-8101-37ed8e2e48b7_760x612.png 424w, https://substackcdn.com/image/fetch/$s_!7YX4!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3270cba8-1ac3-4eac-8101-37ed8e2e48b7_760x612.png 848w, https://substackcdn.com/image/fetch/$s_!7YX4!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3270cba8-1ac3-4eac-8101-37ed8e2e48b7_760x612.png 1272w, https://substackcdn.com/image/fetch/$s_!7YX4!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3270cba8-1ac3-4eac-8101-37ed8e2e48b7_760x612.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Remember, we&#8217;re working off a pre-selected patient population here. If &#8220;<em>1% of tumor cells expressing PD-L1</em>&#8221; were a <strong>really</strong> good biomarker with no real room left to improve upon, <strong>we wouldn&#8217;t be able to subdivide the likely-to-respond patient population any further.</strong> The fact that we&#8217;re able to easily spot the &#8220;response cluster&#8221; in the embedding space is encouraging to us, and suggests that OCTO-VC is capturing response-relevant biology that the 1% rule misses.</p><p><strong>Future Directions:</strong> </p><p>The cancer field has gone through many definitions of the &#8216;<em>most</em>&#8217; important factor to care about in a tumor. At first, it was about histology. 
Lung cancer could be separated into small-cell, squamous, adenocarcinoma, and so on. Then came the genetic era, when <em>EGFR</em> mutations or <em>ALK</em> fusions could, by themselves, dictate treatment. Now we are in the protein marker era, with PD-L1 expression being the most commonly deployed stratifier for checkpoint blockade.</p><p>None of these were wrong, but each captured only a small fragment of the whole.</p><p>Tumors are not uniform entities, but rather shifting ecosystems of cells, pathways, and immune structures. Understanding this complexity, to a large extent, may be beyond human intuition or comprehension. <strong>We at Noetik strongly believe that machine intelligence is the only way to grasp the tumor microenvironment in full.</strong></p><p>The building of OCTO-VC, and the fact that anti-PD-1 responders are so clearly separated in its embedding space, suggest to us that this conviction is directionally accurate. Also, since the underlying data is all sourced from human tumors, we can easily pin down what other biological features the predicted anti-PD-1 responders correlate with, both to reassure ourselves that they make sense and to check that they can be converted to usable assays. And indeed, both hold: <strong>CD8 infiltration, high interferon gamma levels, and antigen presentation markers (to name a few) align with responder status.</strong></p><p>But the real significance here is not in what we can do for anti-PD-1 therapies&#8212;many people have worked on this exact subject before&#8212;but rather, <strong>in how easily our methodology can be extended to any arbitrary cancer drug</strong>. In other words, if OCTO-VC can isolate a subset of checkpoint responders from within an already enriched PD-L1+ population, then it <strong>should</strong> also be able to refine other trial cohorts. 
Our first <a href="https://investor.agenusbio.com/news/news-details/2025/Agenus-and-Noetik-Enter-Collaboration-to-Develop-AI-Enabled-Predictive-Biomarkers-for-BOTBAL-Using-Foundation-Models-of-Virtual-Cell-Biology/default.aspx">partnership is with Agenus</a>, a public biotechnology company, to see if our model can accurately distinguish between responders and non-responders<a href="https://www.nature.com/articles/s41591-024-03083-7"> from a recent clinical trial that Agenus ran</a>. We&#8217;re looking forward to reporting our results here!</p><p>Of course, one won&#8217;t always have response data. We think OCTO-VC can be useful in those situations as well, which is what we&#8217;ll discuss next: how our model can be used to expand eligibility in clinical trial design even without access to true response/non-response data.</p><h2><strong>Refining clinical trial eligibility to the right subgroups</strong></h2><p><strong>Therapeutic Context</strong>:</p><p>In the last section, we talked about how tumor complexity is beyond any human to genuinely grasp, and how we arrived at an understanding of anti-PD-1 response through model embeddings. But when explicit response labels aren&#8217;t available, the challenge shifts. We cannot ask whether responders and non-responders separate. Instead, we must reason by proxy, <strong>using machine-learned patterns that recur across tumors and closely correlate with the suspected mechanisms of action (MoA) of the drug being studied.</strong></p><p>Some background information: as of today, most cancer clinical trials rely on disturbingly coarse, overly reductionistic patient inclusion criteria: % of PD-L1 expression across a tumor biopsy (like the previous case study), whether the histology says a tumor is &#8220;triple-negative,&#8221; or whether sequencing shows the presence of a particular mutation. 
But even when the cancer field explores the value of <a href="https://europepmc.org/article/MED/31942075?">more complex markers</a> &#8212; which oncologists clearly recognize as important! &#8212; the published signals are nearly always fragmentary, limited to a single local motif, and rarely capture the full neighborhood or architectural context of a tumor microenvironment.</p><p>Why is this? Why aren&#8217;t markers more complex? Much of it comes down to the fact that <strong>the logistics of designing and validating even mildly complex assays are essentially intractable</strong>. Every hypothesis requires years of prospective planning, the right tissue samples, and the ability to multiplex the correct set of markers from the start. If an important signal is missed, the entire study has to be restarted. This is to say nothing of ML-enabled biomarkers, operating across dozens or hundreds of axes at once, which are virtually never explored.</p><p>As a consequence, trial sponsors are forced into the simplest, most reductive criteria, not because they believe those best reflect the biology, but because they are the only practical levers available within trial timelines.</p><p><strong>Question:</strong></p><p>One of the things we&#8217;ve been most excited about is using OCTO-VC to take previously impractical hypotheses for drug response prediction and test them at scale.</p><p>The question here requires some extra context and, because we&#8217;re actively exploring it, some obfuscation.</p><p>Last year, the FDA halted a late-stage cancer clinical trial run by a large biopharma, not because efficacy wasn&#8217;t observed, but because, midway through the trial, <strong>efficacy was observed only in &#8216;Subgroup Z&#8217;</strong>. This forced the biopharma to submit a protocol amendment restricting follow-up trials to only the Subgroup Z cohort. 
This is quite a blow to them, since that cohort is a fair bit smaller!</p><p>But as is typical in cancer trials, patients wind up in Subgroup Z due to an extremely coarse biomarker. Theoretically, the drug in this trial acts on a well-trodden target, so there should be a much better way to separate out the ideal patient population. Unfortunately, as we mentioned, doing any sort of large-scale biomarker study would normally require an enormous multi-year biomarker program&#8212;prospectively designing assays, collecting new tissue samples, and validating them across multiple sites. That&#8217;s the standard, slow way.</p><p>With OCTO-VC, we can invert the order of operations. Instead of starting with a hypothesis, locking in the markers, and then waiting years for data to trickle back,<strong> we start with the existing atlas of human tumors and ideate on new ways to separate responders from non-responders.</strong></p><p>So, our question is:<strong> can OCTO-VC come up with better stratification criteria for selecting responders?</strong></p><p><strong>What we found:</strong></p><p>First, the embedding passed a basic sanity check. The 
Subgroup Z cohort is quite distinct in our embedding space; in the graph below, it is the small, bright yellow-green segment on the left.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!wPVM!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd2033423-b533-4886-9b0c-037f92554f83_648x588.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!wPVM!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd2033423-b533-4886-9b0c-037f92554f83_648x588.png 424w, https://substackcdn.com/image/fetch/$s_!wPVM!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd2033423-b533-4886-9b0c-037f92554f83_648x588.png 848w, https://substackcdn.com/image/fetch/$s_!wPVM!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd2033423-b533-4886-9b0c-037f92554f83_648x588.png 1272w, https://substackcdn.com/image/fetch/$s_!wPVM!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd2033423-b533-4886-9b0c-037f92554f83_648x588.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!wPVM!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd2033423-b533-4886-9b0c-037f92554f83_648x588.png" width="384" height="348.44444444444446" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d2033423-b533-4886-9b0c-037f92554f83_648x588.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:588,&quot;width&quot;:648,&quot;resizeWidth&quot;:384,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!wPVM!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd2033423-b533-4886-9b0c-037f92554f83_648x588.png 424w, https://substackcdn.com/image/fetch/$s_!wPVM!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd2033423-b533-4886-9b0c-037f92554f83_648x588.png 848w, https://substackcdn.com/image/fetch/$s_!wPVM!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd2033423-b533-4886-9b0c-037f92554f83_648x588.png 1272w, https://substackcdn.com/image/fetch/$s_!wPVM!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd2033423-b533-4886-9b0c-037f92554f83_648x588.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>And we <strong>knew</strong> that that yellow-green segment was filled with responders to the drug. So an easy question to ask OCTO-VC is: <strong>what other, more complex marker overlaps with that segment and has a mechanistic rationale for the overlap?</strong> After some iterative searching, our therapeutics team found a strong signal: a particular &#8216;<em>tumor microenvironment concept</em>&#8217; that seems highly enriched in Subgroup Z, <strong>but also extends outside of it. 
</strong>While we won&#8217;t expand on what the concept is, we believe that it is unlikely to be noise given how biologically relevant it is to the therapy in question.</p><p>Here, that concept is shown in the same embedding plot through color; high meaning &#8216;highly enriched for that concept&#8217;:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!y-DN!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9edfe355-42cb-44d7-9a05-d87720bdde8b_772x660.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!y-DN!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9edfe355-42cb-44d7-9a05-d87720bdde8b_772x660.png 424w, https://substackcdn.com/image/fetch/$s_!y-DN!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9edfe355-42cb-44d7-9a05-d87720bdde8b_772x660.png 848w, https://substackcdn.com/image/fetch/$s_!y-DN!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9edfe355-42cb-44d7-9a05-d87720bdde8b_772x660.png 1272w, https://substackcdn.com/image/fetch/$s_!y-DN!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9edfe355-42cb-44d7-9a05-d87720bdde8b_772x660.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!y-DN!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9edfe355-42cb-44d7-9a05-d87720bdde8b_772x660.png" width="410" height="350.5181347150259" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9edfe355-42cb-44d7-9a05-d87720bdde8b_772x660.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:660,&quot;width&quot;:772,&quot;resizeWidth&quot;:410,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!y-DN!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9edfe355-42cb-44d7-9a05-d87720bdde8b_772x660.png 424w, https://substackcdn.com/image/fetch/$s_!y-DN!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9edfe355-42cb-44d7-9a05-d87720bdde8b_772x660.png 848w, https://substackcdn.com/image/fetch/$s_!y-DN!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9edfe355-42cb-44d7-9a05-d87720bdde8b_772x660.png 1272w, https://substackcdn.com/image/fetch/$s_!y-DN!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9edfe355-42cb-44d7-9a05-d87720bdde8b_772x660.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Circled is the true-response cohort, which is high in that concept. But notice that, slicing through the region outside of Subgroup Z, there is <strong>another large pocket of people high in that concept.</strong></p><p>In other words, we believe this biopharma has set overly conservative inclusion criteria. By doing so, they not only leave billions of dollars in potential revenue untapped but, more importantly, will leave an immense number of patients without access to a therapy that has a clear mechanistic rationale for meaningfully improving, or even saving, their lives.</p><p><strong>Future Directions:</strong></p><p>One striking aspect of OCTO-VC&#8217;s embedding space&#8212;something that continues to surprise even us&#8212;is how clearly it aligns with therapeutic problems, despite having had absolutely no access to perturbational or labeled data. After all, OCTO-VC could separate higher-order cancer definitions (e.g., Subgroup Z vs. 
not-Subgroup-Z) directly from human tissue, and, with some human judgement, was able to surface MoA-relevant subtypes within them, ones that would be far too costly to ever pull out in the real world. And this phenomenon of &#8216;<em>clinically meaningful organization</em>&#8217; seems to recur across the embedding space!</p><p>As an example, basic Leiden clustering of an OCTO-VC embedding space (the same one discussed in this section, but different from the previous section&#8217;s PD-1 embedding space) reveals tissue-level characteristics that align with therapeutic MoAs. Annotations here are provided by humans:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!7hew!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1556ae69-e205-4b3f-9ee6-28ed3cd9631a_1600x914.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!7hew!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1556ae69-e205-4b3f-9ee6-28ed3cd9631a_1600x914.png 424w, https://substackcdn.com/image/fetch/$s_!7hew!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1556ae69-e205-4b3f-9ee6-28ed3cd9631a_1600x914.png 848w, https://substackcdn.com/image/fetch/$s_!7hew!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1556ae69-e205-4b3f-9ee6-28ed3cd9631a_1600x914.png 1272w, https://substackcdn.com/image/fetch/$s_!7hew!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1556ae69-e205-4b3f-9ee6-28ed3cd9631a_1600x914.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!7hew!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1556ae69-e205-4b3f-9ee6-28ed3cd9631a_1600x914.png" width="1456" height="832" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/1556ae69-e205-4b3f-9ee6-28ed3cd9631a_1600x914.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:832,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!7hew!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1556ae69-e205-4b3f-9ee6-28ed3cd9631a_1600x914.png 424w, https://substackcdn.com/image/fetch/$s_!7hew!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1556ae69-e205-4b3f-9ee6-28ed3cd9631a_1600x914.png 848w, https://substackcdn.com/image/fetch/$s_!7hew!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1556ae69-e205-4b3f-9ee6-28ed3cd9631a_1600x914.png 1272w, https://substackcdn.com/image/fetch/$s_!7hew!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1556ae69-e205-4b3f-9ee6-28ed3cd9631a_1600x914.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft 
pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>How is this possible? How can the model, without explicit supervision, uncover patterns that map so directly onto biological mechanisms and therapeutic relevance?</p><p>One argument is that cancer is a particularly special disease, extremely well-suited to self-supervision tasks. Unlike many other therapeutic areas, oncology has historically been driven by mechanism-based stratification; cancer drugs are often developed and approved not for a broad, undifferentiated population, but for genetically or phenotypically defined subgroups. As a result, <strong>the very axes that determine drug response are the same ones that structure our human, tissue-level data. 
</strong>And machine learning is very, very good at decomposing complex, high-dimensional data into those underlying axes.</p><p>Of course, turning this work, and others like it, from an analysis into an actual regulatory argument is often another challenge in and of itself for many virtual cell efforts. A model can suggest that patients with a certain microenvironmental signature are likely to respond, but to satisfy regulators, that suggestion has to be translated into a practical assay. But this is the core of what makes &#8220;virtual cells&#8221; particularly useful if they are derived from human data, and not cancer cell lines: <strong>this translation is straightforward.</strong></p><p>After all, the signatures that OCTO-VC surfaces always have a direct connection to real human tumors. The signatures are often intricate, something that would require years of effort and millions of dollars to define through traditional approaches, but still <strong>can</strong> be boiled down to a set of measurable markers, morphologies, or local interactions if needed. 
As an example, the tumor microenvironment concept we discussed above is very amenable to being turned into an assay.</p><p>We strongly believe that this ability to create complex definitions of responder cohorts, given only hours of GPU time, can not only expand patient cohorts (as we&#8217;ve discussed here), but also <strong>rescue otherwise unpromising drugs and open entirely new therapeutic opportunities that were previously invisible under traditional stratification methods.</strong></p><p>Finally, in the next section, we&#8217;ll discuss how, even though OCTO-VC is most useful for clinical-stage problems like patient selection, the same human-tissue-grounded programs also help <strong>prioritize targets</strong>, without changing the core principle: grounding every insight in real human tissue.</p><h1><strong>Virtual perturbations that shift T cell effector state in humans</strong></h1><p><strong>Therapeutic Context</strong>:</p><p>Two particularly common lung cancer mutations you&#8217;ll often see people discussing are <em><a href="https://pubmed.ncbi.nlm.nih.gov/29773717/">KRAS</a></em><a href="https://pubmed.ncbi.nlm.nih.gov/29773717/"> and</a><em><a href="https://pubmed.ncbi.nlm.nih.gov/29773717/"> STK11</a></em><a href="https://pubmed.ncbi.nlm.nih.gov/29773717/">.</a> <em>KRAS</em> is one of the most frequent oncogenic drivers (i.e., a mutation that causes the cancer in the first place), whereas <em>STK11</em> is a tumor suppressor gene whose inactivation disrupts cellular metabolism and immune signaling. And, while <em>KRAS</em>-mutant tumors are quite common, <em>STK11</em> shows up alone less frequently, more often appearing alongside <em>KRAS</em>.</p><p>Tumors with this genetic combination are often referred to, unsurprisingly, as &#8216;<em>KRAS STK11</em>&#8217;. 
And, when the two mutations do appear together, the combination produces a particularly aggressive biology: <a href="https://pmc.ncbi.nlm.nih.gov/articles/PMC10304809/">tumors that are metabolically rewired, immunologically &#8220;cold,&#8221; and broadly resistant to both standard chemotherapies and immune checkpoint blockade</a>. As expected, the clinical data consistently show the impact of this on patients: <a href="https://pubmed.ncbi.nlm.nih.gov/38428265/">significantly shorter lifespans</a>.</p><p>As of today, there are no approved therapies that directly address the <em>KRAS STK11</em> genotype. Patients are typically treated with the same immunotherapy regimens offered to the broader non-small cell lung cancer population: immune checkpoint blockades. While this often <a href="https://aacrjournals.org/clincancerres/article-abstract/27/8/2209/671912/Treatment-Outcomes-and-Clinical-Characteristics-of">works fine in </a><em><a href="https://aacrjournals.org/clincancerres/article-abstract/27/8/2209/671912/Treatment-Outcomes-and-Clinical-Characteristics-of">KRAS</a></em><a href="https://aacrjournals.org/clincancerres/article-abstract/27/8/2209/671912/Treatment-Outcomes-and-Clinical-Characteristics-of"> patients</a>, the <a href="https://pubmed.ncbi.nlm.nih.gov/39545922/">efficacy of this class of drugs is far worse for the </a><em><a href="https://pubmed.ncbi.nlm.nih.gov/39545922/">KRAS STK11 </a></em><a href="https://pubmed.ncbi.nlm.nih.gov/39545922/">patients</a>. 
And, given that the latter group isn&#8217;t particularly rare, millions of patients are likely underserved.</p><p><strong>Question:</strong></p><p>Which targets, if drugged, would help cancer patients with <em>KRAS STK11</em> mutations?</p><p><strong>What we found:</strong></p><p>Well, perhaps we should first ask a simpler question: what exactly is the fundamental difference between <em>KRAS</em> and <em>KRAS STK11</em> patients in the cell types most relevant to immunotherapy? <em>KRAS</em> patients, after all, respond well to immunotherapy, so they could be considered a model population for understanding what &#8220;good&#8221; looks like in terms of immune biology. Afterwards, we can move on to assessing which targets are most relevant to shifting <em>KRAS STK11</em> tumors toward <strong>that</strong> particular phenotype.</p><p>For both of these, we leaned heavily on OCTO-VC&#8217;s ability to simulate cellular states.</p><p>First, to assess differences between the two genotypes, we set up a &#8216;<em>virtual CD8&#8314; T cell simulation</em>&#8217;. Here, we asked OCTO-VC to predict the &#8220;expected&#8221;, or virtual, CD8&#8314; T cell in the genetic and microenvironmental context of each patient's tumor. And what we found is that one of the strongest differences in gene expression between <em>KRAS </em>and <em>KRAS STK11</em> patients was in a class of genes called granzymes, specifically <em>GZMA</em> and <em>GZMK</em>, which are known to be a <a href="https://pmc.ncbi.nlm.nih.gov/articles/PMC8836949/">practical readout of &#8216;</a><em><a href="https://pmc.ncbi.nlm.nih.gov/articles/PMC8836949/">CD8&#8314; T-cell effector function</a></em><a href="https://pmc.ncbi.nlm.nih.gov/articles/PMC8836949/">&#8217;</a>, the capacity for a T cell to kill cancer via cytotoxic mechanisms.</p><p>In the plot below, Gene A is <em>GZMA</em> and Gene B is <em>GZMK</em>. 
We&#8217;ll discuss in the next section why we believe these virtual cell predictions are a much better way to assess patient-level differences compared to the raw transcript values, but for now, we&#8217;ll move on.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!cQFD!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e1f61f5-ef14-4da3-ab36-6bc74676f1c9_662x710.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!cQFD!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e1f61f5-ef14-4da3-ab36-6bc74676f1c9_662x710.png 424w, https://substackcdn.com/image/fetch/$s_!cQFD!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e1f61f5-ef14-4da3-ab36-6bc74676f1c9_662x710.png 848w, https://substackcdn.com/image/fetch/$s_!cQFD!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e1f61f5-ef14-4da3-ab36-6bc74676f1c9_662x710.png 1272w, https://substackcdn.com/image/fetch/$s_!cQFD!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e1f61f5-ef14-4da3-ab36-6bc74676f1c9_662x710.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!cQFD!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e1f61f5-ef14-4da3-ab36-6bc74676f1c9_662x710.png" width="378" height="405.40785498489424" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8e1f61f5-ef14-4da3-ab36-6bc74676f1c9_662x710.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:710,&quot;width&quot;:662,&quot;resizeWidth&quot;:378,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!cQFD!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e1f61f5-ef14-4da3-ab36-6bc74676f1c9_662x710.png 424w, https://substackcdn.com/image/fetch/$s_!cQFD!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e1f61f5-ef14-4da3-ab36-6bc74676f1c9_662x710.png 848w, https://substackcdn.com/image/fetch/$s_!cQFD!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e1f61f5-ef14-4da3-ab36-6bc74676f1c9_662x710.png 1272w, https://substackcdn.com/image/fetch/$s_!cQFD!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e1f61f5-ef14-4da3-ab36-6bc74676f1c9_662x710.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Step one is complete: we&#8217;ve identified a therapeutically relevant difference between the two genotypes. Importantly, the marker passes some sanity checks too. Granzyme expression has shown strong <a href="https://pubmed.ncbi.nlm.nih.gov/27757299/">associations with response to PD-1/PD-L1 therapy</a> in human tissue, suggesting that it is clinically meaningful for immunotherapy. So, one particular axis for improving the prospects of <em>KRAS STK11</em> patients could be to simply find some way to increase granzyme levels.</p><p><strong>But understanding the best ways to do this has been far from straightforward</strong>. <a href="https://pmc.ncbi.nlm.nih.gov/articles/PMC7610876/">Cytokine stimulation</a> and <a href="https://www.nature.com/articles/s43018-024-00870-6?">blocking checkpoint molecules like TIGIT</a> have both been shown in preclinical animal models to boost granzyme expression. 
Yet the current translational record is mixed: <a href="https://www.roche.com/media/releases/med-cor-2024-11-26">interventions that should theoretically raise granzyme levels often fail to yield durable tumor clearance</a> in human clinical trials.</p><p>What&#8217;s going on here? Are granzymes the wrong lever to pull?</p><p>Perhaps, but there&#8217;s some reason to believe that some of the previous attempts to increase granzymes (in humans) did not, in fact, increase granzymes. After all, <a href="https://www.nature.com/articles/s41586-024-07121-9.pdf?">the molecular impact of at least one of those attempts seems to rely on entirely different mechanisms of action</a>, ones that, empirically, ended up having no real patient benefit. The fundamental problem here may not be that granzymes aren&#8217;t worth modulating in humans, but rather that<strong> the targets that modulate them depend on the species.</strong> In other words, if you study mice only, you may well arrive at the wrong target.</p><p>For one, <a href="https://pmc.ncbi.nlm.nih.gov/articles/PMC2064598/">the structures of granzymes substantially differ between humans and mice</a>. More broadly, immunity is a very, very species-specific topic. Consider inflammation, a close relative of our subject, and what a <a href="https://www.pnas.org/doi/10.1073/pnas.1222878110">2013 PNAS paper has to say about the role of mouse studies here</a> (bolding added by me):</p><blockquote><p><em>Murine models have been extensively used in recent decades to identify and test drug candidates for subsequent human trials. However, few of these human trials have shown success. The success rate is even worse for those trials in the field of inflammation, a condition present in many human diseases. 
To date, there have been nearly 150 clinical trials testing candidate agents intended to block the inflammatory response in critically ill patients, <strong>and every one of these trials failed.</strong></em></p></blockquote><p>All this to say: if we want to modulate granzymes, and come to useful conclusions about how to do so, we should work directly with human data. One way to do this (perhaps the only way to do it!) is to rely on OCTO-VC&#8217;s ability to perform virtual perturbations in real human data.</p><p>With the same computational framework as before (asking the model to predict virtual CD8&#8314; T cells in a specific tumor microenvironment), we added one more step: knocking out a single gene across the tumor. <strong>From there, we ask how the virtual CD8&#8314; T cell&#8217;s transcriptome would shift in response, comparing it to the baseline expected transcriptome of that cell type.</strong> We can do this systematically across thousands of genes to run a virtual screen. The knockout serves as a proxy for a drug, and the predicted impact on the virtual CD8&#8314; T cell serves as a proxy for patient response. Of course, this impact is not at all guaranteed to be causal, merely strongly correlated and conditional on the spatial environment, but it can lead to useful hints.</p><p>We ran exactly this screen across our <em>KRAS STK11</em> patient cohort, searching for targets whose knockout consistently increased expression of one of the granzymes, <em>GZMA</em>, in CD8&#8314; T cells in real tumors. 
The virtual screen produced a clear signal: the top-scoring hit (Gene 20) was an adhesion protein, which we&#8217;ll call Target A.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!GGlm!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F81a2b91b-015c-4658-9018-ce5fb218364d_1150x800.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!GGlm!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F81a2b91b-015c-4658-9018-ce5fb218364d_1150x800.png 424w, https://substackcdn.com/image/fetch/$s_!GGlm!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F81a2b91b-015c-4658-9018-ce5fb218364d_1150x800.png 848w, https://substackcdn.com/image/fetch/$s_!GGlm!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F81a2b91b-015c-4658-9018-ce5fb218364d_1150x800.png 1272w, https://substackcdn.com/image/fetch/$s_!GGlm!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F81a2b91b-015c-4658-9018-ce5fb218364d_1150x800.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!GGlm!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F81a2b91b-015c-4658-9018-ce5fb218364d_1150x800.png" width="505" height="351.30434782608694" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/81a2b91b-015c-4658-9018-ce5fb218364d_1150x800.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:800,&quot;width&quot;:1150,&quot;resizeWidth&quot;:505,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!GGlm!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F81a2b91b-015c-4658-9018-ce5fb218364d_1150x800.png 424w, https://substackcdn.com/image/fetch/$s_!GGlm!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F81a2b91b-015c-4658-9018-ce5fb218364d_1150x800.png 848w, https://substackcdn.com/image/fetch/$s_!GGlm!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F81a2b91b-015c-4658-9018-ce5fb218364d_1150x800.png 1272w, https://substackcdn.com/image/fetch/$s_!GGlm!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F81a2b91b-015c-4658-9018-ce5fb218364d_1150x800.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Target A is particularly intriguing because a study published only a few years ago showed that inhibiting this target (in human tumors co-cultured with T cells) <strong>leads to increased T cell expression of a granzyme</strong>. One nuance is that the paper studied <em>GZMB</em>, not <em>GZMA</em>, but expression of the two can be quite correlated. But most compelling of all, beyond the in vitro results, is that there are two human cancer trials that have tested drugs meant to inhibit Target A!</p><p>How have these trials gone? It&#8217;s a mixed bag: patients responded decently in one trial, but not in the other. But both of them use the exact same inclusion criterion: elevated levels of Target A. 
<strong>We strongly believe that this may have hurt both of the trial readouts.</strong></p><p>Remember, inhibiting Target A in <em>KRAS</em> tumors is unlikely to yield immense benefits, since we suspect the primary mechanism of action of Target A inhibition is increasing granzyme activity, <strong>and those tumors already harbor abundant granzyme activity</strong>. In contrast, <em>KRAS STK11</em> tumors, which have depressed granzyme levels, stand to gain the most from Target A inhibition. So, by enrolling patients purely on the basis of &#8216;high Target A expression&#8217;, the trials were very likely enriched, by accident, for <em>KRAS</em> patients (by virtue of <em><a href="https://pubmed.ncbi.nlm.nih.gov/33777798/">KRAS</a></em><a href="https://pubmed.ncbi.nlm.nih.gov/33777798/"> being found in 30% of all cancers,</a> while <em><a href="https://www.lung.org/lung-health-diseases/lung-disease-lookup/lung-cancer/symptoms-diagnosis/biomarker-testing/stk11">KRAS STK11 </a></em><a href="https://www.lung.org/lung-health-diseases/lung-disease-lookup/lung-cancer/symptoms-diagnosis/biomarker-testing/stk11">is found in 10% of all cancers</a>), inadvertently selecting the patient demographic least likely to respond to the drug.</p><p><strong>Both of the trials, in other words, potentially stacked the deck against themselves.</strong> The correct strategy would have been to include <em>KRAS STK11</em> status in the inclusion criteria, thereby focusing on patients with the greatest mechanistic rationale for benefit. But the trials did not do this and, as a result, the final efficacy readouts of the drug may be worse than they could&#8217;ve been.</p><p><strong>Future Directions:</strong></p><p>In one fell swoop, our virtual cell model uncovered not only a therapeutically relevant target, but also inclusion criteria for which patients it is most relevant to. 
Though there are already ongoing trials for this particular target, we strongly believe that the correct inclusion criteria for it are not being used.</p><p>Is there a principled way this could&#8217;ve been done without OCTO-VC?</p><p>For finding the granzyme difference between the two genotypes, theoretically yes, but practically no.</p><p>For finding Target A, neither practically nor theoretically.</p><p>One, on the granzyme difference: though granzymes are known to be markers of CD8&#8314; T cell effector function, their modulation in genetic subcontexts like <em>KRAS STK11</em> has not, as far as we can tell, been systematically mapped. <strong>But even if you had collected the same spatial transcriptomics dataset we had, discovering this relationship without OCTO-VC would&#8217;ve been challenging.</strong> Why? Because raw transcript values are, generally speaking, untrustworthy. To assess the transcriptional differences between two cohorts based on raw transcript counts, you would need CD8&#8314; T cells to actually be present in sufficient numbers within the tumor microenvironment <strong>and</strong> you would need those cells to be captured at sufficiently high resolution.</p><p>This is rarely the case! In most of our samples, even correctly tagging a cell as being a CD8&#8314; T cell is difficult, to say nothing of its transcripts, which are often sparse, heterogeneous, and noisy, making it difficult to detect consistent patterns. Virtual cells, produced by OCTO-VC, get around this bottleneck by reconstructing what a CD8&#8314; T cell state <strong>would</strong> look like in that genetic and microenvironmental context, conditioned on the spatial transcriptomic environments the model has observed across millions of cells.</p><p>And two, on finding Target A: even if you could extract a clean signal from the raw data, discovering targets that modulate the granzyme phenotype further would be largely intractable. 
The typical way people would study this further is via animal studies, and, as we&#8217;ve mentioned, there is a massive gulf between what mouse immune systems tell you and what human immune systems tell you. The only way to reliably explore the area is by screening targets in a human, <em>in vivo</em> context, which requires virtual cell models like OCTO-VC to do it at any reasonable scale.</p><p>And though Target A was discovered without OCTO-VC, its discovery relied on cell culture data. The results of this coincidentally translated to humans, but, given how often cancer drugs fail, it&#8217;s a <strong>very</strong> expensive coin flip to make and not something we consider particularly principled.</p><p>These results are, to put it lightly, exciting. The history of cancer drug development has shown us time and time again that <em>translation is the bottleneck.</em> The problem has always been that what works in a mouse, or in a dish, rarely works in a patient. That&#8217;s it. <strong>Fixing this is how we make a dent in the millions of lives that are lost to cancer every year</strong>. And we fix it not by being able to predict the results of a functional assay, or cell-line experiment, or mouse experiment. We fix it by trying to predict what happens when <em>a human being with a real tumor gets treated</em>. That&#8217;s the only question that matters. Everything else is a proxy, a bad proxy, <a href="https://pmc.ncbi.nlm.nih.gov/articles/PMC5852245/">one that has led to 90%+ of all cancer drugs failing during clinical trials.</a></p><p>We are not the first ones to claim that predictions like that are possible, but we believe that we are among the first to show concrete evidence of <strong>it actually being done</strong>. And remember, the results we have today are the worst ones we&#8217;ll ever have. 
Each day, the practical utility of the model that fueled these results gets better and better, both as its underlying dataset grows richer and as our understanding of how to best deploy it is refined.</p><p>The trajectory feels obvious to us; in time, models like OCTO-VC will become routine parts of how oncology as a field functions. In such a world, patients don&#8217;t waste precious time on ineffective treatments, entirely new targets that once seemed unworkable become viable options, and trial populations are enriched for the responders who stand to benefit the most. We have strong conviction that not only is this world possible, but that it is already beginning to emerge.</p><p>If any of this seems interesting, please reach out to <a href="mailto:info@noetik.ai">info@noetik.ai</a> or me directly to chat further.</p><p>And, if you&#8217;re curious to understand more ML-specific technical details about how the virtual CD8&#8314; T cells actually work, <a href="https://www.noetik.ai/octo-vc">we have an older post that discusses exactly that.</a></p><p>Thank you for reading!</p><div><hr></div><p>If you&#8217;d like to cite this, please use the following:</p><pre><code><em>@online{Mahajan2025VirtualCell,
  author       = {Abhishaike Mahajan},
  title        = {How do you use a virtual cell to do something actually useful?},
  year         = {2025},
  month        = sep # "." # 20,
  url          = {https://www.owlposting.com/p/how-do-you-use-a-virtual-cell-to},
}</em></code></pre>]]></content:encoded></item><item><title><![CDATA[Endometriosis is an incredibly interesting disease]]></title><description><![CDATA[5k words, 23 minutes reading time]]></description><link>https://www.owlposting.com/p/endometriosis-is-an-incredibly-interesting</link><guid isPermaLink="false">https://www.owlposting.com/p/endometriosis-is-an-incredibly-interesting</guid><dc:creator><![CDATA[Abhishaike Mahajan]]></dc:creator><pubDate>Fri, 13 Jun 2025 21:31:31 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!saiu!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffc418668-9998-48c8-865d-c9f01aa84f6b_2912x1632.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><em>Some notes: I will be in SF next week, and <a href="https://lu.ma/bvqh80or">am co-hosting this event</a> with the wonderful <a href="https://www.convoke.bio/">convoke.bio</a> from 6:30pm-8:30pm on June 24th. Location TBD, but it will be in SF! You should come! Also, I very bravely chatted with <a href="https://www.youtube.com/watch?v=S2AQ1mFhT24">Endpoints News a few days back about AI in the life-sciences</a>. Finally, extremely grateful to <a href="https://x.com/shilpap_">Shilpa Pothapragada</a> for both inspiring + reviewing this essay. 
</em></p><p><em>Edit on June 20th 2025: Lots of interesting perspectives/corrections have been given in response to this article, so I decided to <a href="https://www.owlposting.com/p/comments-on-endometriosis-is-an-incredibly">wrap up the most interesting 8 of them here.</a></em></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!saiu!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffc418668-9998-48c8-865d-c9f01aa84f6b_2912x1632.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!saiu!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffc418668-9998-48c8-865d-c9f01aa84f6b_2912x1632.png 424w, https://substackcdn.com/image/fetch/$s_!saiu!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffc418668-9998-48c8-865d-c9f01aa84f6b_2912x1632.png 848w, https://substackcdn.com/image/fetch/$s_!saiu!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffc418668-9998-48c8-865d-c9f01aa84f6b_2912x1632.png 1272w, https://substackcdn.com/image/fetch/$s_!saiu!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffc418668-9998-48c8-865d-c9f01aa84f6b_2912x1632.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!saiu!,w_2400,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffc418668-9998-48c8-865d-c9f01aa84f6b_2912x1632.png" width="1200" height="672.5274725274726" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/fc418668-9998-48c8-865d-c9f01aa84f6b_2912x1632.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;large&quot;,&quot;height&quot;:816,&quot;width&quot;:1456,&quot;resizeWidth&quot;:1200,&quot;bytes&quot;:9449838,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.owlposting.com/i/161616300?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffc418668-9998-48c8-865d-c9f01aa84f6b_2912x1632.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:&quot;center&quot;,&quot;offset&quot;:false}" class="sizing-large" alt="" srcset="https://substackcdn.com/image/fetch/$s_!saiu!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffc418668-9998-48c8-865d-c9f01aa84f6b_2912x1632.png 424w, https://substackcdn.com/image/fetch/$s_!saiu!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffc418668-9998-48c8-865d-c9f01aa84f6b_2912x1632.png 848w, https://substackcdn.com/image/fetch/$s_!saiu!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffc418668-9998-48c8-865d-c9f01aa84f6b_2912x1632.png 1272w, https://substackcdn.com/image/fetch/$s_!saiu!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffc418668-9998-48c8-865d-c9f01aa84f6b_2912x1632.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container 
restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><ol><li><p><a href="https://www.owlposting.com/i/161616300/introduction">Introduction</a></p></li><li><p><a href="https://www.owlposting.com/i/161616300/why-is-endometriosis-interesting">Why is endometriosis interesting?</a></p><ol><li><p><a href="https://www.owlposting.com/i/161616300/the-primary-hypothesis-of-why-it-exists-is-not-complete">The primary hypothesis of why it exists is not complete</a></p></li><li><p><a href="https://www.owlposting.com/i/161616300/it-is-nearly-equivalent-to-cancer">It is nearly equivalent to cancer</a></p></li><li><p><a href="https://www.owlposting.com/i/161616300/there-is-no-real-cure">There is no (real) cure to it</a></p></li><li><p><a href="https://www.owlposting.com/i/161616300/there-are-few-diseases-on-earth-as-widespread-and-underfunded-as-it-is">There are 
few diseases on Earth as widespread and underfunded as it is</a></p></li></ol></li><li><p><a href="https://www.owlposting.com/i/161616300/conclusion">Conclusion</a></p></li></ol><h1><strong>Introduction</strong></h1><p>There are several diseases that are canonically recognized as &#8216;<em>interesting</em>&#8217;, even by laymen, whether for their mechanism of action, their impact on the patient, or something else entirely. It&#8217;s hard to tell <em>exactly</em> what makes a medical condition interesting; it&#8217;s a you-know-it-when-you-see-it sort of thing.</p><p>One such example is measles. Measles is an unremarkable disease based solely on its clinical progression: fever, malaise, coughing, and a relatively low death rate of roughly 0.2%. What <strong>is</strong> astonishing about the disease is its capacity to infect cells of the adaptive immune system (memory B&#8209; and T-cells). This means that if you do end up surviving measles, <strong>you are left with an immune system not dissimilar to that of a newborn infant, </strong>entirely naive to polio, diphtheria, pertussis, and every other infection you had protection against, whether via vaccines or natural infection. It can take up to 3 years for your &#8216;immune memory&#8217; to return, prior to which you are entirely immunocompromised.</p><p>There&#8217;s a wide range of such diseases, each its own unique horror. 
Others include rabies (<a href="https://pubmed.ncbi.nlm.nih.gov/21601048/">trans-synaptic transmission</a>), Ebola (<a href="https://mednorthwest.com/ebola-virus-when-bad-things-happen-to-good-blood-vessels/">causes your blood vessels to become porous</a>), tetanus (<a href="https://web.archive.org/web/20150212225108/http://www.cdc.gov/tetanus/about/symptoms-complications.html">causes muscle contractions so strong that they can break bones</a>), and so on.</p><p>Very few people would instinctively pigeonhole endometriosis as something similarly <em>physiologically</em> interesting, or at least I wouldn&#8217;t have. But via a mutual friend, I recently had a chat with<a href="https://x.com/shilpap_"> Shilpa Pothapragada</a>, a Schmidt Fellow studying at the Wyss Institute at Harvard. She studies better ways to diagnose endometriosis, and, as a result of that fascinating conversation, I now consider the disease one of the strangest conditions I&#8217;ve ever heard of. </p><p>Honestly, prior to my discussion with Shilpa, I didn&#8217;t even know what endometriosis <strong>was, </strong>only that it is painful to have and affects women. To judge whether I was simply deeply ignorant, or the disease genuinely didn&#8217;t have much mindshare, I took an informal poll amongst a dozen friends outside of the life sciences. Even amongst cisgender women (!), knowledge of endometriosis was <strong>astonishingly</strong> sparse; most people could only say something like &#8216;<em>that&#8217;s a uterus condition, right?</em>&#8217;, and a sum total of zero people actually knew what the disease entailed. </p><p>So I decided to write this essay in an attempt to fix that knowledge gap amongst the small population of people who follow me. 
</p><h1>Why is endometriosis interesting?</h1><p>But, before we get to my points: what actually <strong>is</strong> the clinical definition of endometriosis?</p><p>Plainly put, it is when tissue that resembles the uterine lining, or endometrial-like tissue, grows outside the uterus. The tissue can implant itself in nearby tissues, like the ovaries and fallopian tubes, or even more distal organs like the bladder and bowel. Due to the continuous influx of hormonal growth factors (mainly estrogen), these misplaced endometrial-like cells respond cyclically, just as the normal uterine lining does. They thicken, break down, and bleed with each menstrual cycle, but unlike the uterine lining, <strong>they have nowhere to go.</strong></p><p>Instead of exiting through menstruation, this trapped tissue and blood accumulates, causing severe pain, inflammation, fibrosis (scar formation), and adhesion between organs. Over time, these repeated cycles of inflammation and fibrosis may lead to permanent structural changes within the abdomen and pelvis, contributing to chronic pelvic pain and infertility.</p><p>To segue into the first interesting aspect of the disease, how did the tissue <strong>get</strong> there in the first place? What caused it to be trapped? Well, it&#8217;s a curious question, because&#8230;</p><h2><strong>The primary hypothesis of why it exists is not complete</strong></h2><p><a href="https://my.clevelandclinic.org/health/diseases/24432-retrograde-menstruation">Retrograde menstruation</a> is perhaps the most culturally dominant theory as to why endometriosis occurs at all, first proposed by gynecologist John Sampson nearly 100 years ago. The theory is straightforward: during menstruation, some sloughed-off endometrial cells flow backward through the fallopian tubes into the pelvic cavity instead of outward through the cervix. 
Once there, these cells implant themselves, continue growing, and become the endometrial-like tissue characteristic of endometriosis.</p><p>It&#8217;s a clean and simple idea, one that is repeated by gynecologists constantly. I don&#8217;t have a great mental model for what this looks like, so I found the following picture helpful:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!-9cb!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3ba5b75f-c2ae-405f-8e59-0dcf31682cab_1456x971.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!-9cb!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3ba5b75f-c2ae-405f-8e59-0dcf31682cab_1456x971.png 424w, https://substackcdn.com/image/fetch/$s_!-9cb!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3ba5b75f-c2ae-405f-8e59-0dcf31682cab_1456x971.png 848w, https://substackcdn.com/image/fetch/$s_!-9cb!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3ba5b75f-c2ae-405f-8e59-0dcf31682cab_1456x971.png 1272w, https://substackcdn.com/image/fetch/$s_!-9cb!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3ba5b75f-c2ae-405f-8e59-0dcf31682cab_1456x971.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!-9cb!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3ba5b75f-c2ae-405f-8e59-0dcf31682cab_1456x971.png" width="1456" height="971" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3ba5b75f-c2ae-405f-8e59-0dcf31682cab_1456x971.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:971,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Retrograde Menstruation: Symptoms, Causes, Diagnosis&quot;,&quot;title&quot;:&quot;Retrograde Menstruation: Symptoms, Causes, Diagnosis&quot;,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Retrograde Menstruation: Symptoms, Causes, Diagnosis" title="Retrograde Menstruation: Symptoms, Causes, Diagnosis" srcset="https://substackcdn.com/image/fetch/$s_!-9cb!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3ba5b75f-c2ae-405f-8e59-0dcf31682cab_1456x971.png 424w, https://substackcdn.com/image/fetch/$s_!-9cb!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3ba5b75f-c2ae-405f-8e59-0dcf31682cab_1456x971.png 848w, https://substackcdn.com/image/fetch/$s_!-9cb!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3ba5b75f-c2ae-405f-8e59-0dcf31682cab_1456x971.png 1272w, https://substackcdn.com/image/fetch/$s_!-9cb!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3ba5b75f-c2ae-405f-8e59-0dcf31682cab_1456x971.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container 
restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption"><a href="https://www.verywellhealth.com/retrograde-menstruation-overview-4685604">From here.</a></figcaption></figure></div><p>If you look this theory up on endometriosis forums, you&#8217;ll find that most patients consider it to be flatly false. But there is some decent evidence for it being at least <strong>partially</strong> explanatory. 
One strong piece of evidence for it is that women with obstructive M&#252;llerian anomalies, or uterine deformations that can lead to more retrograde menstruation,<a href="https://pmc.ncbi.nlm.nih.gov/articles/PMC10968216/#:~:text=One%20risk%20factor%20for%20endometriosis,25%20years%20old%20and%20over."> have a higher risk of endometriosis</a> compared to women with no abnormalities.</p><p>But, despite how often the theory is repeated amongst doctors, it cannot explain <strong>all</strong> endometriosis cases. Why not?</p><p>For one,<a href="https://academic.oup.com/humupd/article-abstract/8/1/84/624513"> retrograde menstruation occurs in 75-90% of women</a>, most of whom never go on to develop endometriosis. Keep in mind that<a href="https://radiopaedia.org/articles/mullerian-duct-anomalies"> obstructive M&#252;llerian anomalies occur in 1-5% of the population</a>, and<a href="https://www.who.int/news-room/fact-sheets/detail/endometriosis"> endometriosis affects about 10%</a> of reproductive-age women. And endometriosis itself is almost<a href="https://pmc.ncbi.nlm.nih.gov/articles/PMC8517707/"> certainly underdiagnosed</a>! We&#8217;ll discuss the insanity of this 10% number later, but, suffice it to say, while retrograde menstruation may play <strong>some</strong> role in the development of the disease, it cannot cover the full scope of cases. 
To be fair, M&#252;llerian anomalies are<a href="https://www.sciencedirect.com/science/article/abs/pii/S1553465024013724#:~:text=Patients%20with%20mullerian%20anomalies%20are,well%2Destablished%20societal%20classification%20systems."> </a><strong><a href="https://www.sciencedirect.com/science/article/abs/pii/S1553465024013724#:~:text=Patients%20with%20mullerian%20anomalies%20are,well%2Destablished%20societal%20classification%20systems.">also</a></strong><a href="https://www.sciencedirect.com/science/article/abs/pii/S1553465024013724#:~:text=Patients%20with%20mullerian%20anomalies%20are,well%2Destablished%20societal%20classification%20systems."> underdiagnosed</a>, but it feels unlikely to make up the gap.</p><p>Two, endometriosis comes in multiple forms, some of which stay localized to the pelvic region, yes, but endometriosis can occur in <strong>absurdly</strong> distal regions as well. Where else? <strong>Literally everywhere.</strong> Type &#8216;[an organ system] + endometriosis&#8217; into Google and you&#8217;ll find at least one case report of it happening there. 
The<a href="https://pmc.ncbi.nlm.nih.gov/articles/PMC6397811/"> gastrointestinal tract</a>, the<a href="https://pmc.ncbi.nlm.nih.gov/articles/PMC10336989/"> lungs</a>, and, insanely enough,<a href="https://pmc.ncbi.nlm.nih.gov/articles/PMC9738496/"> the brain as well.</a> How could backwards flow of uterine blood explain that?</p><p>And three, perhaps most damning of all, <strong>endometriosis has been found in people who have never menstruated at all</strong>, such as <a href="https://www.sciencedirect.com/science/article/pii/S0015028204030158">premenarchal girls between the ages of 8.5 and 13</a>, <a href="https://pmc.ncbi.nlm.nih.gov/articles/PMC4293785/">women who genetically lack a uterus</a>, and even <strong>cisgender men</strong>.<a href="https://pmc.ncbi.nlm.nih.gov/articles/PMC5833878/"> This last group is particularly rare, with only 16 cases reported in the literature as of 2018</a>, but it conclusively <strong>exists. </strong>One interesting note: all 16 of the male endometriosis patients likely had increased estrogen levels, due to either liver cirrhosis (which leads to a decreased ability to break down estrogen), high-dose estrogen therapy for prostate cancer, or obesity. You may instinctively wonder: does endometriosis also occur in transgender women on hormone replacement therapy? Unfortunately, I was unable to find any evidence of this, but I chalk that up to the small size of this demographic combined with the rarity of the condition.</p><p>So what <strong>does</strong> cause endometriosis? 
Well, we&#8217;ll need a theory that deals with multiple issues at once:</p><ol><li><p>Accounts for endometriosis occurring at sites far from the pelvis.</p><ol><li><p>Rules out the retrograde menstruation hypothesis (at least on its own).</p></li></ol></li><li><p>Accounts for endometriosis being a<a href="https://www.nature.com/articles/s41588-023-01323-z"> </a><strong><a href="https://www.nature.com/articles/s41588-023-01323-z">very</a></strong><a href="https://www.nature.com/articles/s41588-023-01323-z"> heritable disease.</a></p><ol><li><p>Rules out purely environmental explanations (e.g., greater air pollution or microplastics).</p></li></ol></li><li><p>Accounts for endometriosis occurring, albeit very rarely, in non-menstruating individuals.</p><ol><li><p>Rules out any theory that requires the endometrial lining to exist at all.</p></li></ol></li></ol><p>It&#8217;s a tough set of criteria. And a lot of theories have sprung up to meet it.</p><p>There&#8217;s the<a href="https://pmc.ncbi.nlm.nih.gov/articles/PMC10001466/"> </a><strong><a href="https://pmc.ncbi.nlm.nih.gov/articles/PMC10001466/">embryonic rest theory</a></strong>, which blames the condition on pockets of endometrial cells that never migrated properly during embryogenesis and instead remained dormant in various tissues, only to be activated later by hormones like estrogen. This explains the disease occurring in cisgender men (as everyone starts off with progenitor cells capable of differentiating into endometrial cells) and in cisgender women who theoretically should lack a shedding endometrial layer. 
But, unfortunately, it fails to account for why <a href="https://pmc.ncbi.nlm.nih.gov/articles/PMC4303753/">90% of all clinically visible endometriosis lesions still cluster in the pelvic region and on the ovaries</a> rather than turning up at random sites, and why<a href="https://academic.oup.com/humupd/article/24/3/290/4859612?"> onset tracks so tightly with the start of menstruation and why the disease mysteriously improves during pregnancy.</a></p><p>A cousin to this is the<a href="https://pmc.ncbi.nlm.nih.gov/articles/PMC10001466/"> </a><strong><a href="https://pmc.ncbi.nlm.nih.gov/articles/PMC10001466/">coelomic metaplasia theory</a></strong>, which asserts that the coelomic epithelium (the layer of cells that lines the surfaces of all abdominal organs) retains the plasticity to transform into endometrial-like tissue under specific stimuli, such as hormonal signals, inflammation, or genetic predisposition. But this too suffers from similar issues: if this transformation can happen anywhere there is coelomic epithelium, why is it almost always clustered in the same anatomical zones, and how can it appear in places without such epithelium?</p><p>And there are so many more theories beyond this. Still others blame it on<a href="https://pmc.ncbi.nlm.nih.gov/articles/PMC9265783/"> immune dysfunction</a>, purely on<a href="https://pmc.ncbi.nlm.nih.gov/articles/PMC5555376/"> somatic mutations</a>, and even on<a href="https://pmc.ncbi.nlm.nih.gov/articles/PMC5902457/"> bacterial contamination</a>.</p><p>So&#8230;which one is true? Unfortunately, as is often the case with biology, the answer may very well be &#8216;<em>a combination of all of them</em>&#8217;. As you go from papers from the 2010s to the 2020s, there is growing hesitance in ascribing a single cause to the condition. <strong>Instead, it is likely that endometriosis itself results from a general process of heterogeneous events. </strong>I&#8217;ll give the general take I see people converging on. 
There isn&#8217;t a single great paper covering the following few paragraphs, but rest assured that the take isn&#8217;t my own invention; it is a synthesis of multiple review papers I&#8217;ve gone through.</p><p><strong>First, there must be a seed: a founding cell with latent endometrial potential.</strong> This can be embryonic stem cells that never completed M&#252;llerian migration or circulating multipotent stem cells (which accounts for cases in cis men and non-menstruating women), but, more often than not, it will likely be<a href="https://pmc.ncbi.nlm.nih.gov/articles/PMC9856091/"> endometrial stem cells found in menstrual blood</a> (which accounts for cis women).</p><p><strong>Next, the seed must reach the correct &#8216;soil&#8217; for growth to occur.</strong> Retrograde menstruation would deposit endometrial stem cells in the ideal place: the pelvic peritoneum, which is bathed in the estrogen-laden menstrual fluid that pushed the seed there in the first place. <strong>That&#8217;s right, we&#8217;re back to the retrograde menstruation theory!</strong> Unlike in that theory, though, embryonic stem cells or circulating multipotent stem cells<strong> also</strong> have the opportunity to give rise to endometriosis; they just develop into lesions only under unusual conditions, such as<a href="https://pmc.ncbi.nlm.nih.gov/articles/PMC7542014"> hormonal therapy or chronic inflammation (e.g., cesarean scars).</a> This explains a <strong>lot</strong> of things! 
Namely: the rarity and distribution of extra-pelvic endometriosis, why increased rates of retrograde menstruation lead to a higher risk of endometriosis, and <strong>why</strong> high-estrogen conditions usually accompany endometriosis in people who theoretically shouldn&#8217;t develop the disease.</p><p><strong>Finally, the seed must survive, adapting its local environment to suit its needs.</strong> There is evidence that<a href="https://pmc.ncbi.nlm.nih.gov/articles/PMC4499658/"> endometriosis lesions can secrete immunomodulatory factors that help them evade immune clearance</a>,<a href="https://pmc.ncbi.nlm.nih.gov/articles/PMC3677669/"> release angiogenesis factors that promote blood supply to them</a>, and<a href="https://pmc.ncbi.nlm.nih.gov/articles/PMC10138736/"> tamp down their responsiveness to progesterone so as to prevent natural hormonal suppression of growth</a>. How do they do all of this? Simple: the acquisition of<a href="https://portlandpress.com/clinsci/article-abstract/126/2/123/69231/Genetic-epigenetic-and-stem-cell-alterations-in?redirectedFrom=fulltext"> somatic mutations and epigenetic changes that reprogram the lesion&#8217;s cellular behavior.</a> This explains why people have observed so many genetic anomalies in endometriosis lesions, and also why simple retrograde menstruation alone isn&#8217;t enough to cause endometriosis.</p><p>And yet, we still haven&#8217;t explained everything about the origins of endometriosis. What causes circulating/latent stem cells to transform into endometrial-like cells? Why does <a href="https://obgyn.onlinelibrary.wiley.com/doi/full/10.1111/aogs.14491">spontaneous regression of endometriosis sometimes occur</a>? Why do some endometriosis lesions remain stable for years? Why don&#8217;t all genetically or hormonally predisposed people develop it?</p><p>All unclear! 
There is still much work left to do to account for all of this.</p><p>And, you know, upon reading about the above pathogenesis of endometriosis, one may immediately remark on how similar it feels to another condition&#8230;</p><h2><strong>It is nearly equivalent to cancer</strong></h2><p>Seeds? Somatic mutations? Spreading? Spontaneous start and stop?</p><p>That&#8230;sounds an awful lot like cancer, doesn&#8217;t it? And not like the typical, innocuous gynecological disease that one may initially assume endometriosis to be.</p><p>If you aren&#8217;t yet convinced, let&#8217;s return to the seed metaphor we were using in the last section. For a seed to survive, it must manipulate its immediate environment, happening upon somatic mutations that allow it to do so. Very cancer-y sounding! <strong>And, curiously enough, many of the mutations found in cells from endometriosis lesions are identical to those found in cancerous tumors.</strong></p><p><a href="https://www.cell.com/cell-reports/fulltext/S2211-1247(18)31127-6#">A Cell Reports paper from 2018</a>, which compared somatic mutations between normal endometrial tissue and endometriosis tissue, had this to say:</p><blockquote><p><em>While we were preparing to submit this manuscript, Anglesio et al. reported that 21% of the lesions in patients with deep-infiltrating endometriosis harbored somatic mutations in ARID1A, PIK3CA, KRAS, and PPP2R1A. Our results corroborated their findings in a larger cohort of subjects with a more common type of endometriosis&#8230;.</em></p></blockquote><p><strong>For context, all of the named genes are well-known cancer driver genes.</strong> </p><p>Of course, <strong>clonality</strong> should be considered when assessing results like this, as in, what fraction of the assessed cell population had the mutation? If it&#8217;s a low proportion, it is likely background noise. If it&#8217;s high, it may be what is keeping the cell population afloat. 
And indeed, endometriosis tissue, on average, had much higher fractions of<a href="https://en.wikipedia.org/wiki/KRAS"> KRAS</a> and<a href="https://en.wikipedia.org/wiki/P110%CE%B1"> PIK3CA</a> mutations than normal endometrial tissue. </p><p>We could push this even further and ask the question: does a higher mutational burden in these genes make endometriosis even <strong>more</strong> aggressive,<a href="https://news.vumc.org/2023/06/05/study-discovers-that-tumor-mutation-burden-predicts-survival-outcome/"> just as it does for cancer</a>? One paper studied this, though only through the lens of KRAS, and the answer was pretty clear: <strong>yes</strong>.</p><blockquote><p><em>KRAS mutation presence was higher in subjects with deep infiltrating endometriosis or endometrioma lesions only (57.9%; 11/19) and subjects with mixed subtypes (60.6%; 40/66), compared with those with superficial endometriosis only (35.1%; 13/37) (p = 0.04). KRAS mutation was present in 27.6% (8/29) of Stage I cases, in comparison to 65.0% (13/20) of Stage II, 63.0% (17/27) of Stage III, and 58.1% (25/43) of Stage IV cases (p = 0.02). KRAS mutation was also associated with greater surgical difficulty (ureterolysis) (relative risk [RR] = 1.47, 95% CI: 1.02&#8211;2.11) and non&#8208;Caucasian ethnicity (RR = 0.64, 95% CI: 0.47&#8211;0.89). Pain severities did not differ based on KRAS mutation status, at either baseline or follow&#8208;up. Re&#8208;operation rates were low overall, occurring in 17.2% with KRAS mutation compared with 10.3% without (RR = 1.66, 95% CI: 0.66&#8211;4.21).</em></p><p><em>In conclusion, KRAS mutations were associated with greater anatomic severity of endometriosis, resulting in increased surgical difficulty. Somatic cancer&#8208;driver mutations may inform a future molecular classification of endometriosis.</em></p></blockquote><p>Given all of this, it is worth wondering in what manner endometriosis is genuinely distinct from cancer. 
Why don&#8217;t we simply consider the two one and the same? Is there some obvious dividing line between endometriosis and a tumor that I have simply left out?</p><p><strong>No. There really is a striking level of similarity between the two.</strong></p><p>Of course, I am not the first person to draw this connection between endometriosis and cancer, far from it. One paper from 2017 titled &#8216;<a href="https://pmc.ncbi.nlm.nih.gov/articles/PMC5667960/">Endometriosis: benign, malignant, or something in between?</a>&#8217; had this to say:</p><blockquote><p><em>Should endometriosis be considered a &#8216;benign&#8217; neoplasm, which harbors oncogenic driver mutations, along with the capacity for invasion and potentially for distant metastasis? Although exhibiting classic hallmarks of cancer, it is not lethal, is morphologically normal, and does not form an expansile tumor mass. The recent findings invite us to revisit our notions of what constitutes cancer, and should re-ignite interest in the biology of endometriosis, an entity which could aptly be described as &#8220;a riddle, wrapped in a mystery, inside an enigma.&#8221;</em></p></blockquote><p>I&#8217;m not the first one to point out how strange this disease is!</p><p>But perhaps even comparing this to cancer understates the true horror of late-stage endometriosis. In the absolute worst cases, lesions can form deep fibrotic adhesions that tether organs to each other: the bladder fused to the uterus, the bowel glued to the pelvic wall, the ovaries fixed in unnatural positions.<a href="https://pubmed.ncbi.nlm.nih.gov/32283226/"> One commentary paper</a> states this:</p><blockquote><p><em>It doesn&#8217;t take more than a short inspection into the peritoneal space of a patient with widespread superficial or deep infiltrating endometriosis to understand that this is not the appearance of a usual benign disease. 
It&#8217;s not uncommon that a surgeon walks out of the operating theatre after a long and exhausting endometriosis case, saying: &#8220;This is worse than metastatic cancer&#8221;!</em></p></blockquote><p>But at least with late-stage cancers, there are <strong>some</strong> miracles that can be accomplished. After all,<a href="https://pmc.ncbi.nlm.nih.gov/articles/PMC6928196/"> the birth of cancer immunotherapy</a> came from the observation that, in rare cases, patients with late-stage tumors could see a complete and miraculous remission, as if entirely by magic, all through their immune system. And if a patient&#8217;s immune system <em>could</em> do this once, even rarely, then perhaps it could be trained, or unshackled, to do it more reliably. So we got checkpoint inhibitors, CAR-T therapy, and so on.</p><p>For patients who have endometriosis, is there anything remotely analogous?</p><h2><strong>There is no (real) cure</strong></h2><p>Unfortunately, no.</p><p>Currently, there are two routes for endometriosis treatment: noninvasive chemical treatments and invasive surgical treatments.</p><p>In the former case,<a href="https://www.mdpi.com/1424-8247/18/4/588"> the primary strategy revolves around hormonal therapy.</a> The logic is simple: starve the lesions of the chemical cues they need to grow and cycle. Oral contraceptives are used to flatten the hormonal fluctuations of the menstrual cycle, progestins (progesterone mimics) to induce an atrophic state in the endometrial-like tissue, and, in short courses, GnRH agonists to induce a reversible state of complete estrogen suppression. There are other treatment paths too, but these are the most commonly used ones.</p><p>In the latter case, usually for endometriosis that is either resistant to hormonal therapy or has progressed to the point of causing anatomical distortion, organ dysfunction, or intolerable pain,<a href="https://pmc.ncbi.nlm.nih.gov/articles/PMC3881735/"> direct surgical intervention is used</a>. 
The goal is twofold: remove or destroy visible lesions, and restore normal anatomy to pelvic structures that have become fused together through overgrowth of endometrial-like tissue.</p><p><strong>Neither of these does anything to actually cure the disease.</strong> I want to be fair and give that statement the nuance it needs, because it is a strong one to make, but I also want to be clear. What they both offer is <strong>management</strong> of endometriosis. But they do not represent a cure in any functional capacity.</p><p>In the case of hormonal treatments, the endometrial-like tissue doesn&#8217;t <strong>starve</strong>, not really. There are cases where hormonal treatments do genuinely reduce the size of a malignant entity, such as in estrogen-dependent breast cancer. But this is not the case for endometriosis lesions.<a href="https://obgyn.onlinelibrary.wiley.com/doi/full/10.1111/aogs.14887"> One review paper found that while hormonal therapy helps slow progression in </a><strong><a href="https://obgyn.onlinelibrary.wiley.com/doi/full/10.1111/aogs.14887">some</a></strong><a href="https://obgyn.onlinelibrary.wiley.com/doi/full/10.1111/aogs.14887"> patients</a>, there is minimal evidence of change in the size of endometriosis lesions over months of continued therapy. Moreover,<a href="https://link.springer.com/article/10.1007/s42000-025-00636-4?"> oral contraceptives do not seem to stop the expression of angiogenesis factors within endometriosis lesions</a>, and may in fact <em>somehow</em> accelerate it. 
And if a patient ever stops the hormone therapy, relapse is the norm;<a href="https://pmc.ncbi.nlm.nih.gov/articles/PMC10201290/?"> one study found that a majority of patients saw symptom recurrence within 5 years</a> after finishing a year-long cycle of GnRH-agonist therapy.</p><p>The case for surgery isn&#8217;t much better.<a href="https://www.imrpress.com/journal/CEOG/46/5/10.12891/ceog4949.2019/htm"> While surgery certainly, on average, helps with endometriosis-associated pain</a>, it isn&#8217;t curative. The numbers obviously depend on a lot of different factors, but<a href="https://pmc.ncbi.nlm.nih.gov/articles/PMC3881735/?utm_source=chatgpt.com"> 5-year post-operative recurrence rates are between 20-45%, and the 8-year rate is squarely in the 40% range</a>. Which maybe doesn&#8217;t sound <strong>terrible</strong>, but consider that endometriosis surgeries aren&#8217;t risk-free at all,<a href="https://obgyn.onlinelibrary.wiley.com/doi/10.1111/j.1471-0528.2010.02774.x"> with roughly 1% of patients developing a major post-operative complication</a> (e.g. bladder injury, bowel injury, vaginal dehiscence, etc.). The mildly good news is that it <a href="https://link.springer.com/article/10.1007/s00404-023-07193-4/tables/5">doesn&#8217;t seem like the rate of these complications goes up based on whether you&#8217;ve had the surgery in the past.</a> </p><p>There are options emerging outside of hormonal therapy and surgical operations, but they are largely still in their infancy. There&#8217;s <a href="https://www.pnas.org/doi/10.1073/pnas.1916144116">dichloroacetate</a>, which, interestingly, is also a promising drug for <a href="https://www.nature.com/articles/6604554">cancer</a> for the same reason it works for endometriosis. Both cancer and endometrial tissue seem to display the same unique form of cell metabolism (the <a href="https://en.wikipedia.org/wiki/Warburg_effect_(oncology)">Warburg Effect</a>), which dichloroacetate disrupts. 
There&#8217;s also cabergoline, a drug meant for Parkinson&#8217;s disease that also coincidentally hinders angiogenesis, and has been shown in at least <a href="https://pmc.ncbi.nlm.nih.gov/articles/PMC8655411/">one randomized trial to reduce pelvic pain caused by endometriosis.</a> There are other <a href="https://pmc.ncbi.nlm.nih.gov/articles/PMC8508913/">burgeoning non-hormonal chemical treatments being developed</a>, but, again, none of them seem to be in active use.</p><p>This all said, having no curative procedures at all, and only management ones, isn&#8217;t the <strong>worst</strong> thing in the world. After all, that&#8217;s the status quo for HIV! And that disease went from a death sentence to something that just requires a simple pill per day to keep it at bay, with no other functional impacts. It can&#8217;t be cured, but that doesn&#8217;t matter. The patient's life is basically the same either way.</p><p>Is that the case for endometriosis? If you follow the currently recommended set of hormone therapies and surgeries (if needed), can a patient return to their pre-disease state?<a href="https://pmc.ncbi.nlm.nih.gov/articles/PMC10905511/"> One of the most comprehensive review papers </a>I found examines that exact question, combing through 139 past studies to come up with an answer. 
<strong>And, generally speaking, the answer is no.</strong> The (heavily simplified) results are here:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!wyAq!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F23fd81df-cf80-428b-9f8a-add310c98220_1456x380.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!wyAq!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F23fd81df-cf80-428b-9f8a-add310c98220_1456x380.jpeg 424w, https://substackcdn.com/image/fetch/$s_!wyAq!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F23fd81df-cf80-428b-9f8a-add310c98220_1456x380.jpeg 848w, https://substackcdn.com/image/fetch/$s_!wyAq!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F23fd81df-cf80-428b-9f8a-add310c98220_1456x380.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!wyAq!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F23fd81df-cf80-428b-9f8a-add310c98220_1456x380.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!wyAq!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F23fd81df-cf80-428b-9f8a-add310c98220_1456x380.jpeg" width="1456" height="380" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/23fd81df-cf80-428b-9f8a-add310c98220_1456x380.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:380,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!wyAq!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F23fd81df-cf80-428b-9f8a-add310c98220_1456x380.jpeg 424w, https://substackcdn.com/image/fetch/$s_!wyAq!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F23fd81df-cf80-428b-9f8a-add310c98220_1456x380.jpeg 848w, https://substackcdn.com/image/fetch/$s_!wyAq!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F23fd81df-cf80-428b-9f8a-add310c98220_1456x380.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!wyAq!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F23fd81df-cf80-428b-9f8a-add310c98220_1456x380.jpeg 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>There are certainly improvements with the currently approved treatment plans, but the situation is a fair bit worse here than it is for HIV. There is a long, long way to go for endometriosis to become a &#8216;background&#8217; chronic condition, rather than one that continues to cause a lowering of quality-of-life even when treated.</p><p>Well, this all said, endometriosis isn&#8217;t alone here. Lots of diseases also have enormous, chronic impacts on quality of life and have little in the way of dependable treatment. 
Cancer of course, but also Alzheimer's, Crohn&#8217;s disease, ALS, and so on.</p><p>But within this observation lies perhaps the most curious part of endometriosis&#8230;</p><h2><strong>There are few diseases on Earth as widespread and underfunded as it is</strong></h2><p>One potential way to assess how overlooked <strong>and</strong> widespread a disease is involves comparing the amount of NIH funding it receives to its &#8216;DALYs&#8217;, or Disability-Adjusted Life Years. Or, Dollars:DALYs. </p><p>The former is an indication of how much institutional focus is being placed on the disease; the more money, the more attention. There are obviously funding sources outside of the NIH, but the NIH remains the single largest public funder of biomedical research in the world (at least for now&#8230;), and its budget choices set the tone for scientific priorities across academia and industry.</p><p>The latter is an indicator of how severe <strong>and </strong>widespread the disease is, with a simple calculation: <strong>DALY = YLL + YLD, </strong>where <strong>YLL</strong> (<em>Years of Life Lost</em>) is years lost due to premature death and <strong>YLD</strong> (<em>Years Lived with Disability</em>) is years lived while disabled. Keep in mind that this is years <strong>across</strong> some given population, not in one person, which also gives an indication of how widespread the condition is. Of course,<a href="https://pmc.ncbi.nlm.nih.gov/articles/PMC11525889/"> DALYs are a limited metric</a> given how subjective &#8216;<em>disability</em>&#8217; is, but it&#8217;s a helpful starting point.</p><p>Thus, if Dollars:DALYs is high, that implies that funding is either proportionate to suffering or perhaps over-allocated. If it is low, the funding is almost certainly not enough. 
This is obviously a fuzzy metric, since research doesn&#8217;t necessarily go faster just because you throw more money at it, but it is helpful as a data point.</p><p>To keep things consistent, let&#8217;s work in units of millions of dollars and DALYs per 100k people, since that is how they both are typically reported. Let&#8217;s also focus on chronic conditions, since that is where DALYs are most relevant. Finally, we can pull<a href="https://report.nih.gov/funding/categorical-spending#/"> NIH funding numbers</a> directly from the NIH, and take the DALY counts from review papers. </p><p>On the highest end, Alzheimer's received $3538M in funding in 2023, and caused<a href="https://pubmed.ncbi.nlm.nih.gov/39837288/"> 451</a> DALYs per 100k people worldwide. So, 3538:451, or 7.8.</p><p>Then Crohn&#8217;s Disease, which has the ratio 92:<a href="https://www.sciencedirect.com/science/article/abs/pii/S1568997224001460">20.97</a> (4.4). </p><p>Lower still is diabetes, 1187:<a href="https://pmc.ncbi.nlm.nih.gov/articles/PMC8915203/">801.5</a> (1.5). </p><p>Close to it is epilepsy, 245:<a href="https://jhpn.biomedcentral.com/articles/10.1186/s41043-025-00783-9#:~:text=Despite%20this%20rise%2C%20the%20age,Supplementary%20Table%202%2C%20Supplementary%20Fig.">177.84</a> (1.4).</p><p><strong>Finally, near the bottom of the list is endometriosis, 29:<a href="https://pmc.ncbi.nlm.nih.gov/articles/PMC11754495/?utm_source=chatgpt.com">56.61</a>, or 0.5.</strong></p><p>Quite bad. That said, endometriosis isn&#8217;t alone in being deeply underfunded. Another condition with an even worse ratio is chronic obstructive pulmonary disease (COPD), which lies at 148:<a href="https://pmc.ncbi.nlm.nih.gov/articles/PMC11697803/#:~:text=The%20GBD%202019%20estimated%20a,countries%20%5B3%2C%2021%5D.">926.1</a>, or 0.16. Why COPD? Mostly<a href="https://www.thelancet.com/journals/lanres/article/PIIS2213-2600(21)00316-7/fulltext"> because of its stigma as a self-inflicted disease</a>. 
But there is some reason to believe that the true ratio for endometriosis is even worse than it appears at face value.</p><p>Why? Diagnosis lag.</p><p><a href="https://pmc.ncbi.nlm.nih.gov/articles/PMC11625652/#:~:text=Diagnostic%20delay%20in%20endometriosis%20is,over%2015%20years%20%5B5%5D.">Endometriosis takes, on average, 7&#8211;10 years to be diagnosed after symptom onset.</a> This is not due to the subtlety of the condition; patients often present with debilitating pain, irregular bleeding, gastrointestinal symptoms, and infertility. But diagnosing endometriosis is difficult,<a href="https://pmc.ncbi.nlm.nih.gov/articles/PMC7100540/"> the gold standard for it being an invasive and expensive procedure that requires general anesthesia: laparoscopic surgery.</a> While noninvasive imaging (like ultrasound or MRI) can sometimes detect large lesions, many forms of endometriosis, particularly superficial or deep-infiltrating types,<a href="https://pmc.ncbi.nlm.nih.gov/articles/PMC7100540/"> are not easily visible this way.</a> As a result, a patient must often endure years of symptoms before someone is willing to escalate their care to diagnostic surgery. <strong>Because of this, it seems very likely that many endometriosis cases simply never enter official registries, meaning the total burden is probably massively undercounted in global DALY calculations.</strong></p><p>Notably, this is unlike most other diseases, even historically underfunded ones, which typically have clear diagnostic criteria that can be confirmed through inexpensive blood tests, imaging, or clinical presentation alone. For example, COPD can be easily diagnosed via spirometry: just blow into a tube.</p><p>Given this, how should we update our Dollars:DALYs ratio for endometriosis? 
One way would be to ask the question: how many cases of endometriosis are currently undiagnosed?<a href="https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0154227"> A 2014 study answers this question</a>, albeit limited to a specific region in Italy, by actively searching for endometriosis in a sample of 2,000 premenopausal women who had visited a GP for non-gynecological reasons. Of these, 28 had already been diagnosed with endometriosis. Using a symptom-based questionnaire and surgical follow-up, the authors discovered 37 more cases amongst the 2,000 women. <strong>In other words, roughly 60% of endometriosis cases (37 of 65, or about 57%) would not be discovered if there were no active search for them.</strong></p><p>In the absolute worst-case situation, this means the reported DALYs for endometriosis capture only about 40% of the true burden. Starting with a base of 56.61 DALYs per 100k people and dividing by 0.4, this leads us to 141.52. </p><p><strong>Thus our ratio becomes 29:141.52, which is a dismal 0.2, close to that of COPD.</strong></p><p>Now, again, that is a worst-case calculation, where the &#8216;density&#8217; of DALYs in the &#8216;undiscovered endometriosis&#8217; patient population is identical to that of patients with the diagnosis. This may not be the case, but it&#8217;s hard to tell from the literature alone. One study notes that<a href="https://pubmed.ncbi.nlm.nih.gov/22990516/"> diagnosis is slowest when symptoms are vague or non-disabling</a>, potentially implying that this undiscovered set of endometriosis cases would yield fewer DALYs. On the other hand, one could imagine the DALYs being <em>higher</em> for some fraction of the undiagnosed population if the lack of hormonal or surgical management leads to more severe complications down the line.</p><h1><strong>Conclusion</strong></h1><p>Endometriosis is a remarkable disease. 
It is a disease whose origins, despite centuries of study, have eluded understanding, one that bears an uncanny resemblance to cancer, and one that lacks any cure or fully effective means of management. Yet, it stands almost entirely alone in terms of how little funding the condition receives relative to the absolute number of lives it irrevocably alters for the worse: 10% of reproductive-age women (or 190 million) worldwide, with only $29M earmarked for them.</p><p>Understandably, characterizing any disease as &#8216;<em>interesting</em>&#8217; runs the risk of seeming flippant, especially given how intense its impact on patients&#8217; lives can be: chronic pain, infertility, and life-altering disability. This is not my intention! Here, I use &#8216;interesting&#8217; as a way to convey a sense of <strong>unexpectedness. </strong>Many aspects of endometriosis are deeply unexpected. And, perhaps more practically and actionably for readers, it is unexpected in ways that are surely fertile ground for more research.</p><p>Of course, it&#8217;s certainly a hard disease to tackle. But so are cancer, Alzheimer's, and HIV, all of which have inspired generations of scientists to feverishly work towards understanding. This is partially due to how high the expected impact of such research would likely be, but it&#8217;d be rewriting history to not also mention how deeply interesting those conditions were to the people studying them! Talk to any oncology researcher about pancreatic cancer, and they will mention the awfully high death rate, but will also light up when<a href="https://www.sciencedirect.com/science/article/abs/pii/S0304383517300149"> discussing the strangely dense stromal microenvironment that seems to shield it from treatment.</a> </p><p>The fascination is inseparable from the fight. And it feels like very few people have tried to cover the fascinating part of the disease, only the fight. 
</p><p>I hope, in this essay, I have convinced you that endometriosis is an interesting condition. <strong>And, if you are in a position in your life to do so, it may be worth your energy to work on it</strong>. In a 2009 essay, Uri Alon, a well-known systems biologist,<a href="https://www.cell.com/molecular-cell/pdf/S1097-2765(09)00641-8.pdf"> discusses what makes for a scientific problem worth studying</a>. He comes up with the following graph:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!eJpc!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb2c4a560-2f3a-4927-b6e0-34bdc5f9e20f_1388x486.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!eJpc!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb2c4a560-2f3a-4927-b6e0-34bdc5f9e20f_1388x486.jpeg 424w, https://substackcdn.com/image/fetch/$s_!eJpc!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb2c4a560-2f3a-4927-b6e0-34bdc5f9e20f_1388x486.jpeg 848w, https://substackcdn.com/image/fetch/$s_!eJpc!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb2c4a560-2f3a-4927-b6e0-34bdc5f9e20f_1388x486.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!eJpc!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb2c4a560-2f3a-4927-b6e0-34bdc5f9e20f_1388x486.jpeg 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!eJpc!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb2c4a560-2f3a-4927-b6e0-34bdc5f9e20f_1388x486.jpeg" width="1388" height="486" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b2c4a560-2f3a-4927-b6e0-34bdc5f9e20f_1388x486.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:486,&quot;width&quot;:1388,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!eJpc!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb2c4a560-2f3a-4927-b6e0-34bdc5f9e20f_1388x486.jpeg 424w, https://substackcdn.com/image/fetch/$s_!eJpc!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb2c4a560-2f3a-4927-b6e0-34bdc5f9e20f_1388x486.jpeg 848w, https://substackcdn.com/image/fetch/$s_!eJpc!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb2c4a560-2f3a-4927-b6e0-34bdc5f9e20f_1388x486.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!eJpc!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb2c4a560-2f3a-4927-b6e0-34bdc5f9e20f_1388x486.jpeg 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>It is obviously impossible for me to state for certain that research into endometriosis will result in a large gain of knowledge if more attention, resources, and money are poured into it. But, given the relatively immature state of things in this field, it&#8217;s difficult to imagine otherwise. 
Not a bad bet to make!</p>]]></content:encoded></item><item><title><![CDATA[A primer on machine learning in cryo-electron microscopy (cryo-EM) ]]></title><description><![CDATA[7.9k words, 36 minutes reading time]]></description><link>https://www.owlposting.com/p/a-primer-on-ml-in-cryo-electron-microscopy</link><guid isPermaLink="false">https://www.owlposting.com/p/a-primer-on-ml-in-cryo-electron-microscopy</guid><dc:creator><![CDATA[Abhishaike Mahajan]]></dc:creator><pubDate>Sat, 21 Dec 2024 16:34:02 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!iztB!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F14b9ae18-ac3b-47bb-9f23-7df93b25d18d_2912x1632.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!iztB!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F14b9ae18-ac3b-47bb-9f23-7df93b25d18d_2912x1632.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!iztB!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F14b9ae18-ac3b-47bb-9f23-7df93b25d18d_2912x1632.png 424w, https://substackcdn.com/image/fetch/$s_!iztB!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F14b9ae18-ac3b-47bb-9f23-7df93b25d18d_2912x1632.png 848w, https://substackcdn.com/image/fetch/$s_!iztB!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F14b9ae18-ac3b-47bb-9f23-7df93b25d18d_2912x1632.png 1272w, 
https://substackcdn.com/image/fetch/$s_!iztB!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F14b9ae18-ac3b-47bb-9f23-7df93b25d18d_2912x1632.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!iztB!,w_2400,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F14b9ae18-ac3b-47bb-9f23-7df93b25d18d_2912x1632.png" width="1200" height="672.5274725274726" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/14b9ae18-ac3b-47bb-9f23-7df93b25d18d_2912x1632.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;large&quot;,&quot;height&quot;:816,&quot;width&quot;:1456,&quot;resizeWidth&quot;:1200,&quot;bytes&quot;:7008657,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.owlposting.com/i/149989677?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F14b9ae18-ac3b-47bb-9f23-7df93b25d18d_2912x1632.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:&quot;center&quot;,&quot;offset&quot;:false}" class="sizing-large" alt="" srcset="https://substackcdn.com/image/fetch/$s_!iztB!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F14b9ae18-ac3b-47bb-9f23-7df93b25d18d_2912x1632.png 424w, https://substackcdn.com/image/fetch/$s_!iztB!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F14b9ae18-ac3b-47bb-9f23-7df93b25d18d_2912x1632.png 848w, 
https://substackcdn.com/image/fetch/$s_!iztB!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F14b9ae18-ac3b-47bb-9f23-7df93b25d18d_2912x1632.png 1272w, https://substackcdn.com/image/fetch/$s_!iztB!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F14b9ae18-ac3b-47bb-9f23-7df93b25d18d_2912x1632.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p><em>Note: thank you to<a 
href="https://molbiosci.rutgers.edu/faculty-research/faculty/faculty-detail/84-k-l/1093-kaelber-jason"> Jason Kaelber</a>, a professor at Rutgers University and director of their cryo-EM facility, for commenting on drafts of this essay! Also thank you to <a href="https://www.linkedin.com/in/surge-biswas-a8b61270/">Surge Biswas</a>, the founder of <a href="https://www.nabla.bio/">Nabla Bio</a>, for answering questions I had about cryo-EM.</em></p><ol><li><p><a href="https://www.owlposting.com/i/149989677/introduction">Introduction</a></p></li><li><p><a href="https://www.owlposting.com/i/149989677/why-do-cryo-em">Why do cryo-EM?</a></p><ol><li><p><a href="https://www.owlposting.com/i/149989677/the-alternatives">The alternatives</a></p><ol><li><p><a href="https://www.owlposting.com/i/149989677/x-ray-crystallography">X-ray crystallography</a></p></li><li><p><a href="https://www.owlposting.com/i/149989677/nuclear-magnetic-resonance-nmr">Nuclear magnetic resonance (NMR)</a></p></li></ol></li><li><p><a href="https://www.owlposting.com/i/149989677/why-cryo-em-is-better">Why cryo-EM is better</a></p></li></ol></li><li><p><a href="https://www.owlposting.com/i/149989677/the-cryo-em-workflow">The cryo-EM workflow</a></p><ol><li><p><a href="https://www.owlposting.com/i/149989677/sample-preparation">Sample preparation</a></p></li><li><p><a href="https://www.owlposting.com/i/149989677/imaging">Imaging</a></p></li><li><p><a href="https://www.owlposting.com/i/149989677/reconstruction">Reconstruction</a></p></li></ol></li><li><p><a href="https://www.owlposting.com/i/149989677/some-machine-learning-problems-in-the-area">Some machine learning problems in the area</a></p><ol><li><p><a href="https://www.owlposting.com/i/149989677/conformational-heterogeneity">Conformational heterogeneity </a></p></li><li><p><a href="https://www.owlposting.com/i/149989677/ab-initio-reconstruction">Ab initio reconstruction</a></p></li><li><p><a 
href="https://www.owlposting.com/i/149989677/compositional-heterogeneity">Compositional heterogeneity</a></p></li></ol></li><li><p><a href="https://www.owlposting.com/i/149989677/whats-left">What&#8217;s left?</a></p></li></ol><h1>Introduction</h1><p>Cryo-electron microscopy (cryo-EM) has been growing in popularity over the past few years. Used as a way to perform macromolecular structure determination for decades, cryo-EM really hit its stride around 2010, when it crossed the resolution thresholds needed to determine protein structures. The technique was so deeply powerful, so able to answer biological questions for which no alternative tool existed, that its creators were awarded the 2017 Nobel Prize in Chemistry. </p><p>But I wasn&#8217;t really aware of that when I first stumbled across cryo-EM. </p><p>My initial thought was that it was a cool-sounding name, and the output of the process made for similarly cool images. </p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!lAcR!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd7d103f8-c2e2-433e-92c7-e7cc63f08691_1200x1200.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!lAcR!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd7d103f8-c2e2-433e-92c7-e7cc63f08691_1200x1200.png 424w, https://substackcdn.com/image/fetch/$s_!lAcR!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd7d103f8-c2e2-433e-92c7-e7cc63f08691_1200x1200.png 848w, 
https://substackcdn.com/image/fetch/$s_!lAcR!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd7d103f8-c2e2-433e-92c7-e7cc63f08691_1200x1200.png 1272w, https://substackcdn.com/image/fetch/$s_!lAcR!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd7d103f8-c2e2-433e-92c7-e7cc63f08691_1200x1200.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!lAcR!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd7d103f8-c2e2-433e-92c7-e7cc63f08691_1200x1200.png" width="448" height="448" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d7d103f8-c2e2-433e-92c7-e7cc63f08691_1200x1200.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1200,&quot;width&quot;:1200,&quot;resizeWidth&quot;:448,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Cryo-EM reveals &#8220;crown-like&#8221; structure of protein responsible for ...&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Cryo-EM reveals &#8220;crown-like&#8221; structure of protein responsible for ..." title="Cryo-EM reveals &#8220;crown-like&#8221; structure of protein responsible for ..." 
srcset="https://substackcdn.com/image/fetch/$s_!lAcR!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd7d103f8-c2e2-433e-92c7-e7cc63f08691_1200x1200.png 424w, https://substackcdn.com/image/fetch/$s_!lAcR!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd7d103f8-c2e2-433e-92c7-e7cc63f08691_1200x1200.png 848w, https://substackcdn.com/image/fetch/$s_!lAcR!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd7d103f8-c2e2-433e-92c7-e7cc63f08691_1200x1200.png 1272w, https://substackcdn.com/image/fetch/$s_!lAcR!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd7d103f8-c2e2-433e-92c7-e7cc63f08691_1200x1200.png 1456w" sizes="100vw"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" 
stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Weird looking, isn&#8217;t it? </p><p>I first came across cryo-EM as a concept via <a href="https://www.cs.princeton.edu/~zhonge/#:~:text=Ellen%20D.%20Zhong.%20Email:%20zhonge%20[at]%20princeton.edu.%20Office:">Ellen Zhong</a> (a machine learning professor at Princeton) in 2022. Because she co-wrote <a href="https://www.biorxiv.org/content/10.1101/2020.07.08.193946v1">what has become one of my favorite papers of all time</a>, I was also interested in what else she had worked on. But very much unlike my favorite paper, which had to do with viral language models, <strong>almost all of her work had to do with applying ML to cryo-EM. </strong></p><p>This was weird! Cryo-EM wasn&#8217;t something I ever saw much. While, admittedly, I was entirely ignorant of the field until 2022, it still felt like it wasn&#8217;t a very popular topic. Most people seem to work in small molecule behavior prediction or antibody modeling or something you&#8217;d see dozens of papers about at a NeurIPS workshop. </p><p>Cryo-EM feels almost like&#8230;pure physics or chemistry, something that distinctly wasn&#8217;t an ML problem. As such, I mentally tossed it away as something beyond my mental paygrade. But I kept seeing more and more cryo-EM news. 
</p><p>More cryo-EM papers from Zhong&#8217;s lab.</p><p><a href="https://gandeeva.com/gandeeva-raises-40m-in-series-a-funding/#:~:text=Gandeeva%20Therapeutics,%20a%20precision%20biotechnology%20company">Gandeeva Therapeutics raising $40M in 2022 to do drug discovery work using ML-assisted cryo-EM.</a> </p><p><a href="https://generatebiomedicines.com/news/generatebiomedicines-unveils-state-of-the-art-cryoem-laboratory-to-accelerate-generative-ai-drug-discovery-and-development#:~:text=Generate:Biomedicines%20Unveils%20State-of-the-Art%20CryoEM">Generate:Biomedicines, a very well-known biology-ML Flagship startup, creating a 70,000-square-foot cryo-EM lab in 2023.</a> </p><p>There was something going on here, something important. </p><p>Yet, there are shockingly few resources on how to learn about this field, starting from the ground up. I&#8217;ve written technical introductions to <a href="https://www.owlposting.com/p/a-primer-on-molecular-dynamics">molecular dynamics</a>, <a href="https://www.owlposting.com/p/a-primer-on-why-computational-predictive">toxicology</a>, and <a href="https://www.owlposting.com/p/a-primer-on-ai-in-antibody-engineering">antibody engineering</a> before. All of those felt like I was rehashing a collection of a dozen-or-so review papers, just phrased in a way I found more appealing. </p><p>But here&#8230;there&#8217;s almost nothing, <a href="https://dspace.mit.edu/handle/1721.1/144512">outside of maybe Zhong&#8217;s PhD thesis</a>. I hope to add to that body of work. </p><p>This essay will first explain the alternatives to cryo-EM, why cryo-EM exists at all, and why so many people seem to be interested in it. Then we&#8217;ll move into how cryo-EM works, including sample prep, imaging, and reconstruction. Then we&#8217;ll finally be ready to approach how people are throwing ML at the problem to both solve fundamental issues with cryo-EM and, most interestingly, extend it beyond what people originally thought it was capable of. 
</p><p>Lots to go through. Let&#8217;s start!</p><h1>Why do cryo-EM?</h1><p>Cryo-EM is a method to understand the three-dimensional structure of extremely small structures: proteins, small molecules, and so on. <strong>It shares this categorization with (primarily) two others of note: X-ray crystallography and nuclear magnetic resonance (NMR) spectroscopy.</strong> It&#8217;s worth going over them first before we discuss the advantages of cryo-EM. </p><p>But since this post is only meant to deeply discuss cryo-EM, and I&#8217;d like to avoid turning this essay into a textbook, I won&#8217;t cover the other two in depth. I&#8217;ll give only a very quick overview of how they work, their advantages, and their disadvantages. </p><h2>The alternatives</h2><h3>X-ray crystallography</h3><p>X-ray crystallography is one of the most established methods of protein characterization, having existed in prototype forms since the early 1900s. The technique involves purifying and crystallizing the target molecule, arranging it into a highly ordered, repeating lattice structure. When X-rays are directed at the crystal, they are scattered by the electrons in the atoms, producing a diffraction pattern. This pattern is then mathematically transformed into an electron density map, from which the atomic model of the molecule can be built. </p><p>There are lots of benefits to the method. It can achieve extremely high resolution (sometimes below 1 &#197;) and has well-established protocols and analysis methods backed by decades of research. <strong><a href="https://www.rcsb.org/stats/summary">As such, this method has been responsible for solving the vast majority of protein structures in the Protein Data Bank.</a></strong></p><p>Unfortunately, the need for crystallization is a huge problem. For one, large protein complexes are nearly impossible to crystallize at all, and large complexes are one of the most understudied parts of our biology. 
Two, crystallizing a protein implies you&#8217;ll be fixing it in place. In turn, this means that any measurement of the resulting structure will capture a single, static image of the crystallized molecule, missing out on any alternative conformations. </p><h3>Nuclear Magnetic Resonance (NMR)</h3><p>NMR spectroscopy is a departure from crystallization, instead relying on purified molecules placed in a solution, typically water. The technique exploits the fact that certain atomic nuclei behave like tiny magnets. When these atoms are placed in a powerful magnetic field, they can absorb and release specific frequencies of radio waves. By precisely controlling the magnetic field and sending carefully timed pulses of radio waves, researchers can measure how atoms within a molecule interact with each other. From this, researchers can gather information about the distances and angles between atoms, allowing them to calculate a set of possible structures consistent with the observed interactions. </p><p>Because you're eschewing crystallization, NMR allows one to study protein motion. That is, you can observe dynamic changes in protein structure and protein-protein interactions, and even study partially unfolded states. NMR can also provide information about protein dynamics on various timescales, from microseconds to seconds. Of course though, <a href="https://corinwagen.github.io/public/blog/20220719_timescales.html">the utility of these coarser timescales is suspect.</a></p><p>NMR has one particularly strong limitation. The technique is generally limited to relatively small proteins (typically under 50 kDa), as larger proteins produce increasingly complex spectra that become difficult to interpret. <a href="https://www.rcsb.org/stats/summary">This turns out to be a major enough problem that NMR is the least used structure characterization method in the PDB. 
</a></p><h2>Why cryo-EM is better</h2><p>First and most importantly, <strong>cryo-EM doesn't require crystallization of the proteins being studied.</strong> Instead of forcing proteins into a crystal lattice, researchers flash-freeze them in a thin layer of vitreous ice (something we&#8217;ll discuss more later). The benefit of this is that we can study massive protein complexes, membrane proteins, and other structures that have historically been nearly impossible to crystallize. <strong>The size advantage of cryo-EM works in the opposite direction of NMR &#8212; larger structures are often easier to work with in cryo-EM than smaller ones.</strong> While NMR struggles with anything over 50 kDa, cryo-EM excels at massive molecular machines like ribosomes (~2,500 kDa) or virus particles (often &gt;1,000 kDa). At those sizes, even X-ray crystallography struggles. Again though, cryo-EM has a problem with smaller structures, which we&#8217;ll expand on more later. </p><p>Another major advantage is that cryo-EM can capture proteins in multiple conformational states simultaneously. <strong>You may intuitively guess that flash-freezing proteins would present the same &#8216;static-only-structures&#8217; problem as crystallization, but this actually turns out to not be true in practice.</strong> Why exactly this is the case requires some more explanation, so we&#8217;ll get to that later. </p><p>Finally, resolution was historically cryo-EM's weak point &#8212; for many years, it couldn't match the atomic detail provided by X-ray crystallography. The primary bottleneck was that detecting electrons passed through flash-frozen protein was difficult, but better detection setups &#8212; circa 2020 &#8212; changed that. Nowadays, modern cryo-EM can regularly achieve resolutions better than 3 &#197;, and in some cases even approach 1 &#197; resolution. Close to X-ray crystallography levels! </p><p>Of course, there isn&#8217;t a free lunch here. 
Cryo-EM struggles in one particular area: ease of performing it. </p><p>For one, electron microscopes themselves can run into the seven figures to acquire, and that&#8217;s before considering the specialized equipment (liquid nitrogen or liquid helium) needed to run them. Secondly, dealing with cryo-EM data is monstrously challenging. Data artifacts will naturally arise from the inevitably noisy freezing process, extracting conformations from electron micrographs is difficult at best, and atomic resolution can be inconsistent across a structure. </p><p><strong>Likely amplifying the prior issues, the final problem here is that cryo-EM is a reasonably new characterization method.</strong> Of course, &#8216;new&#8217; is relative. Cryo-EM had its first characterized structure in 1991, whereas X-ray crystallography had it in 1958. Though one would expect this 33-year head start to have been washed out in the ~30 years since, it is likely that the relative inaccessibility of cryo-EM has made research on it difficult. 
</p><p>This all said though, as the chart below shows, cryo-EM is picking up speed with each passing year!</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!G7CN!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F381d0e83-8d05-470a-9a26-de21cbf3326e_1990x1214.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!G7CN!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F381d0e83-8d05-470a-9a26-de21cbf3326e_1990x1214.png 424w, https://substackcdn.com/image/fetch/$s_!G7CN!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F381d0e83-8d05-470a-9a26-de21cbf3326e_1990x1214.png 848w, https://substackcdn.com/image/fetch/$s_!G7CN!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F381d0e83-8d05-470a-9a26-de21cbf3326e_1990x1214.png 1272w, https://substackcdn.com/image/fetch/$s_!G7CN!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F381d0e83-8d05-470a-9a26-de21cbf3326e_1990x1214.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!G7CN!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F381d0e83-8d05-470a-9a26-de21cbf3326e_1990x1214.png" width="1456" height="888" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/381d0e83-8d05-470a-9a26-de21cbf3326e_1990x1214.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:888,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:201997,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!G7CN!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F381d0e83-8d05-470a-9a26-de21cbf3326e_1990x1214.png 424w, https://substackcdn.com/image/fetch/$s_!G7CN!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F381d0e83-8d05-470a-9a26-de21cbf3326e_1990x1214.png 848w, https://substackcdn.com/image/fetch/$s_!G7CN!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F381d0e83-8d05-470a-9a26-de21cbf3326e_1990x1214.png 1272w, https://substackcdn.com/image/fetch/$s_!G7CN!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F381d0e83-8d05-470a-9a26-de21cbf3326e_1990x1214.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Of course, there&#8217;s still a ways to go when comparing it to X-ray crystallography structures: </p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!TDtj!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc637a2c1-e0af-433d-bc99-b7ece6412de6_1932x1194.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!TDtj!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc637a2c1-e0af-433d-bc99-b7ece6412de6_1932x1194.png 424w, https://substackcdn.com/image/fetch/$s_!TDtj!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc637a2c1-e0af-433d-bc99-b7ece6412de6_1932x1194.png 848w, 
https://substackcdn.com/image/fetch/$s_!TDtj!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc637a2c1-e0af-433d-bc99-b7ece6412de6_1932x1194.png 1272w, https://substackcdn.com/image/fetch/$s_!TDtj!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc637a2c1-e0af-433d-bc99-b7ece6412de6_1932x1194.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!TDtj!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc637a2c1-e0af-433d-bc99-b7ece6412de6_1932x1194.png" width="1456" height="900" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c637a2c1-e0af-433d-bc99-b7ece6412de6_1932x1194.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:900,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1093021,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!TDtj!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc637a2c1-e0af-433d-bc99-b7ece6412de6_1932x1194.png 424w, https://substackcdn.com/image/fetch/$s_!TDtj!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc637a2c1-e0af-433d-bc99-b7ece6412de6_1932x1194.png 848w, 
https://substackcdn.com/image/fetch/$s_!TDtj!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc637a2c1-e0af-433d-bc99-b7ece6412de6_1932x1194.png 1272w, https://substackcdn.com/image/fetch/$s_!TDtj!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc637a2c1-e0af-433d-bc99-b7ece6412de6_1932x1194.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>With this surface-level overview, we should be ready to start poking at how cryo-EM works at all. 
We&#8217;ll first explore sample preparation (preparing our protein for input to the cryo-EM grid), then how the imaging process works (via the electron microscope), and then discuss how typical protein structure reconstruction works. </p><h1>The cryo-EM workflow</h1><h2>Sample preparation</h2><p>Let&#8217;s assume you have a purified solution of proteins, suspended in some aqueous solution. This is nontrivial to do, especially with large protein complexes that cryo-EM is known to excel at, but it&#8217;s a complexity we&#8217;ll ignore for now.  </p><p>What&#8217;s next? </p><p>First, we squeeze out the protein solution onto a grid. What does the grid look like? Here&#8217;s a picture:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!sf3Z!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F40d49f1f-4de0-47fe-8c09-a3e2abe7d22b_1200x778.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!sf3Z!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F40d49f1f-4de0-47fe-8c09-a3e2abe7d22b_1200x778.png 424w, https://substackcdn.com/image/fetch/$s_!sf3Z!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F40d49f1f-4de0-47fe-8c09-a3e2abe7d22b_1200x778.png 848w, https://substackcdn.com/image/fetch/$s_!sf3Z!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F40d49f1f-4de0-47fe-8c09-a3e2abe7d22b_1200x778.png 1272w, 
https://substackcdn.com/image/fetch/$s_!sf3Z!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F40d49f1f-4de0-47fe-8c09-a3e2abe7d22b_1200x778.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!sf3Z!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F40d49f1f-4de0-47fe-8c09-a3e2abe7d22b_1200x778.png" width="530" height="343.6166666666667" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/40d49f1f-4de0-47fe-8c09-a3e2abe7d22b_1200x778.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:778,&quot;width&quot;:1200,&quot;resizeWidth&quot;:530,&quot;bytes&quot;:452856,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!sf3Z!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F40d49f1f-4de0-47fe-8c09-a3e2abe7d22b_1200x778.png 424w, https://substackcdn.com/image/fetch/$s_!sf3Z!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F40d49f1f-4de0-47fe-8c09-a3e2abe7d22b_1200x778.png 848w, https://substackcdn.com/image/fetch/$s_!sf3Z!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F40d49f1f-4de0-47fe-8c09-a3e2abe7d22b_1200x778.png 1272w, 
https://substackcdn.com/image/fetch/$s_!sf3Z!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F40d49f1f-4de0-47fe-8c09-a3e2abe7d22b_1200x778.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">From <a href="https://myscope.training/CRYO_Vitrification">here</a>. Focus on the first two images, ignore the right-most one for now.</figcaption></figure></div><p>Basically, it&#8217;s a small metal mesh (usually copper) that's been covered with a thin film of carbon. 
The carbon film &#8212; similar to the copper mesh it&#8217;s on top of &#8212; isn't a solid sheet; it's full of holes, typically arranged in a regular pattern. The whole copper grid is only about 3 mm in diameter, and each hole in the carbon film is just a few hundred nanometers across. So you end up with each hole of the copper grid itself containing its own grid of holes in the carbon film. </p><p>There&#8217;s lots of commercial variety here; <a href="https://www.jenabioscience.com/about-us/news-blog/3342-cryo-em-grids-available">everything from the hole sizes to the thickness of the carbon sheets can be altered</a>. There&#8217;s a great deal of nuance here with respect to <strong>why</strong> you&#8217;d prefer some parameters over others, but we&#8217;ll ignore it for now. If you&#8217;re interested, this topic is a whole field in and of itself called &#8216;<a href="https://pmc.ncbi.nlm.nih.gov/articles/PMC7900225/">grid optimization</a>&#8217;. </p><p><strong>One more question: why a grid&#8230;of grids?</strong> It&#8217;s hard to answer right now, so we&#8217;ll come back to that later. Let&#8217;s move on. </p><p>When we apply our protein solution to this grid, it creates thin films across these holes, like a soap bubble spanning a bubble wand. So, can we start imaging right away? Not yet, there&#8217;s one problem: biological molecules like proteins are mostly made of light elements &#8212; carbon, nitrogen, oxygen, hydrogen. <strong>These elements don't interact very strongly with electrons.</strong> When you shoot an electron beam through a &#8216;naked&#8217; protein, almost all the electrons go straight through without being deflected. In other words, proteins are nearly invisible to electron beams. </p><p>One curious side note before we move on: the electron argument makes sense for why &#8216;naked&#8217; proteins won&#8217;t work, but I assumed there was a second reason too. 
The electron microscope we'll eventually use operates in a near-vacuum &#8212; it has to, or the electrons would just bounce off air molecules instead of hitting our sample. And I <strong>assumed</strong> vacuums are extraordinarily unfriendly to biological samples, so they must be&#8230;protected in some way. But I stumbled across a paper with an insane title (<a href="https://pmc.ncbi.nlm.nih.gov/articles/PMC5837003/#:~:text=Many%20proteins%2C%20and%20many%20protein,et%20al.%2C%202009).">The fate of proteins in outer space</a>) that disproved this: </p><blockquote><p><em>Many proteins, and many protein-protein complexes, retain their structural integrity in vacuo, at least for a sufficiently long time, for many of their essential structural features to be retained and be capable of study in intimate detail.</em></p></blockquote><p>Either way, imaging naked proteins is infeasible. What can we do? </p><p>One way could be to cover the proteins with heavy elements and image that! This is what <strong>used</strong> to be done via chemical staining of the protein film. Here, you first deposit a fine layer of heavy metal salts on the surface of your grid to allow electron interaction and then throw electrons at that. And, practically speaking, so-called <a href="https://cryoem101.org/chapter-1/#part5">negative staining</a> is still often done in the earliest stages of a cryo-EM project to assess feasibility. Unfortunately, the application of the aforementioned salts can both limit resolution (as the deposition of the salts prevents fine-grained imaging) and cause artifacts in the final structure. </p><p>Is there any other way we can make the protein visible? </p><p>As the name &#8216;cryo-EM&#8217; may suggest, the answer is to <strong>freeze the proteins</strong>. You&#8217;d be forgiven for thinking that this is deeply unintuitive. 
If our whole problem with imaging naked proteins is that they don&#8217;t interact with electrons well, what do we gain from surrounding the protein in a frozen matrix of water, which is made of even lighter elements than proteins, and <strong>then</strong> shooting electrons at it?</p><p>The answer is a bit of a bait-and-switch; while proteins are indeed nearly invisible to electrons, ice is even <strong>more</strong> invisible to electrons. You can think of ice here as offering a nice &#8216;background&#8217; state against which we can more clearly make out the protein structure. A more scientific way to put this wouldn&#8217;t be in terms of &#8216;invisible&#8217; or &#8216;not invisible&#8217;, but rather that the electron is phase shifted by its passage through the object: <a href="https://cryoemprinciples.yale.edu/sites/default/files/files/2%20Phase%20contrast.pdf">one consistent level of shift when passing through ice, and another varied set of shifts as it passes through the protein. </a> And we obviously can&#8217;t use liquid water, since, again, our electron microscope needs a vacuum to work, and liquid water would instantly boil. As such, ice. </p><p>This might seem like a small distinction (using ice as a background rather than using heavy metals as a foreground), but it turns out to be <strong>really </strong>important. When we use heavy metals in negative staining, we're really creating a mold around the protein, like pressing a coin into clay. While this gives us good contrast, it also means we're not really seeing the protein itself, just its outline. With ice, we're actually seeing the protein itself, held in place, with the fuzziness of its view brought into more clarity by a consistent background (the ice). </p><p>To note though, normal ice here won&#8217;t work. When you turn water into ice, it naturally wants to form crystals. 
This process, called nucleation, starts when a few water molecules happen to arrange themselves in an ice-like pattern. Once that happens, other water molecules are recruited to join the pattern. </p><p>This crystallization is a problem for two reasons. </p><p>First, when water molecules arrange themselves into ice crystals, they literally take up more space than they did as liquid water. This expansion can tear proteins apart or change their conformation (notably, this particular effect of ice nucleation is why stuff like human cryonics is really hard; shoutout to <a href="https://www.cradle.xyz/">Cradle</a>, a startup that is trying to solve this problem!). This is obviously antithetical to our lofty imaging goals. </p><p>Second, while ice without crystals gives a weak, even background, <strong>ice crystals scatter electrons strongly</strong>, which will ruin the otherwise nice contrast we had with our protein. </p><p>Is it possible to have ice without the ice crystals? </p><p>Yup! <strong>It turns out that ice crystals will not form if water is cooled to &#8722;135 &#176;C at a rate of around 100,000 degrees Celsius per second.</strong> If water is frozen that fast, ice crystals will literally not have time to form, and you get a glass-like solid called vitreous ice instead. To achieve this rate of freezing, people typically dunk the copper grid into liquid ethane. I&#8217;ve attached some interesting notes about this in the footnotes, but we&#8217;ll move on.<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-1" href="#footnote-1" target="_self">1</a> </p><p>How is this all done? There&#8217;s a dedicated machine for it, called the <a href="https://www.thermofisher.com/us/en/home/electron-microscopy/products/sample-preparation-equipment-em/vitrobot/instruments/vitrobot-mark-iv.html">Vitrobot</a>! 
There&#8217;s a<a href="https://cryoem101.org/chapter-2/#part6"> video on the process here.</a></p><p>Finally, with all that, let&#8217;s assume we&#8217;ve gotten our copper grid frozen, our proteins now stuck in a layer of vitreous ice across our copper grid. We&#8217;ve succeeded! Keep in mind though that we&#8217;ve blown past a <strong>lot</strong> of complexity here. <a href="https://www.nature.com/articles/s41592-021-01130-6">Sample preparation in cryo-EM is a huge field</a>, and what I&#8217;ve written here doesn&#8217;t do it justice at all. But hopefully it gives you a good mental model. </p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!sf3Z!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F40d49f1f-4de0-47fe-8c09-a3e2abe7d22b_1200x778.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!sf3Z!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F40d49f1f-4de0-47fe-8c09-a3e2abe7d22b_1200x778.png 424w, https://substackcdn.com/image/fetch/$s_!sf3Z!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F40d49f1f-4de0-47fe-8c09-a3e2abe7d22b_1200x778.png 848w, https://substackcdn.com/image/fetch/$s_!sf3Z!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F40d49f1f-4de0-47fe-8c09-a3e2abe7d22b_1200x778.png 1272w, https://substackcdn.com/image/fetch/$s_!sf3Z!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F40d49f1f-4de0-47fe-8c09-a3e2abe7d22b_1200x778.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!sf3Z!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F40d49f1f-4de0-47fe-8c09-a3e2abe7d22b_1200x778.png" width="448" height="290.4533333333333" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/40d49f1f-4de0-47fe-8c09-a3e2abe7d22b_1200x778.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:778,&quot;width&quot;:1200,&quot;resizeWidth&quot;:448,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!sf3Z!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F40d49f1f-4de0-47fe-8c09-a3e2abe7d22b_1200x778.png 424w, https://substackcdn.com/image/fetch/$s_!sf3Z!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F40d49f1f-4de0-47fe-8c09-a3e2abe7d22b_1200x778.png 848w, https://substackcdn.com/image/fetch/$s_!sf3Z!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F40d49f1f-4de0-47fe-8c09-a3e2abe7d22b_1200x778.png 1272w, https://substackcdn.com/image/fetch/$s_!sf3Z!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F40d49f1f-4de0-47fe-8c09-a3e2abe7d22b_1200x778.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset 
pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Same picture as before; now just look at the right-most image. </figcaption></figure></div><p>Now, we just remove the grid from our vat of liquid ethane/nitrogen, store it in a bucket of liquid nitrogen (to keep it cool), and move it on over to the electron microscope. It&#8217;s time to start imaging! </p><h2>Imaging</h2><p>The workhorse here will be a <a href="https://en.wikipedia.org/wiki/Transmission_electron_microscopy">transmission electron microscope</a>, or TEM. It may be helpful to have the whole process plainly explained at the start, and then we&#8217;ll walk through the steps. 
</p><ol><li><p>Load up the sample into the TEM.</p></li><li><p>The TEM will shoot a beam of electrons at our sample.</p></li><li><p>These electrons travel through our frozen protein sample.</p></li><li><p>Most electrons pass straight through the ice.</p></li><li><p>Some get deflected by the protein.</p></li><li><p>An electron detector at the bottom records where all the electrons ended up.</p></li></ol><p>This is actually pretty simple. It&#8217;s not too dissimilar to brightfield microscopy, where visible light is shone from beneath a sample, and we look from above to see what structures show up. In this case, the detector is the &#8216;eye&#8217;. </p><p>First up, imaging, which will occur on a hole-by-hole level. Before we start explaining imaging, it may be good to note that how imaging is done is the answer to the previous question of why we have a grid of grids at all: <strong>each hole is a &#8216;shot&#8217; at having a potentially good image</strong>. If some holes have ice that's too thick or thin, or if the protein distribution or orientation isn't ideal in some areas, we can simply move on to other holes. This grid pattern essentially gives us thousands of independent attempts at getting good images from a single sample preparation. </p><p>Now that we know the purpose of the grid, we can ask an even more fundamental question: why is there a carbon film on top of the copper grid? Let&#8217;s leave that one to the footnotes.<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-2" href="#footnote-2" target="_self">2</a></p><p>Moving on, taking advantage of this grid setup, cryo-EM imaging typically occurs in three steps: grid-view, square-view, and hole-view. The hope is to quickly discard unpromising sections of the grid and focus in on the higher-quality sections. In order:</p><ol><li><p>In grid-view, we take a low magnification "atlas" of the entire grid. This will immediately show us if we've got usable ice. 
If the ice is too thick, we'll barely be able to see through the grid squares at all. Usually, we&#8217;ll see a gradient of ice thickness across the grid, so we&#8217;ll have options for finding the perfect thickness for imaging.</p></li><li><p>In square-view, we examine individual copper grid squares at medium magnification. This is where we can spot all sorts of potential problems &#8212; crystalline ice (bad!), contamination, or even more subtle variations in ice thickness.</p></li><li><p>Finally, hole-view, where we zoom in to actually look at individual holes in the carbon film. These are also often referred to as <strong>micrographs</strong>. This is where we're hoping to see a nice, even distribution of our proteins &#8212; not too crowded, not too sparse. If we're not seeing what we want at this level, we go back to sample prep. </p></li></ol><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!NSYw!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F752909b6-80e0-4d06-9663-45ec473a9581_632x194.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!NSYw!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F752909b6-80e0-4d06-9663-45ec473a9581_632x194.png 424w, https://substackcdn.com/image/fetch/$s_!NSYw!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F752909b6-80e0-4d06-9663-45ec473a9581_632x194.png 848w, https://substackcdn.com/image/fetch/$s_!NSYw!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F752909b6-80e0-4d06-9663-45ec473a9581_632x194.png 1272w, 
https://substackcdn.com/image/fetch/$s_!NSYw!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F752909b6-80e0-4d06-9663-45ec473a9581_632x194.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!NSYw!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F752909b6-80e0-4d06-9663-45ec473a9581_632x194.png" width="632" height="194" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/752909b6-80e0-4d06-9663-45ec473a9581_632x194.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:194,&quot;width&quot;:632,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!NSYw!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F752909b6-80e0-4d06-9663-45ec473a9581_632x194.png 424w, https://substackcdn.com/image/fetch/$s_!NSYw!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F752909b6-80e0-4d06-9663-45ec473a9581_632x194.png 848w, https://substackcdn.com/image/fetch/$s_!NSYw!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F752909b6-80e0-4d06-9663-45ec473a9581_632x194.png 1272w, 
https://substackcdn.com/image/fetch/$s_!NSYw!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F752909b6-80e0-4d06-9663-45ec473a9581_632x194.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a><figcaption class="image-caption">From <a href="https://claude.ai/chat/cfc49fad-8fc0-4759-9b4a-5d8a77737f59">here</a>. This shows grid, square, and hole views, in order. </figcaption></figure></div><p>Notably, the electron energy stays roughly constant at 300 keV throughout this whole process; only the magnification changes. One thing to ponder is this 300 keV number: how safe is that for imaging? Are we expecting it to damage our protein? Generally, yes, <a href="https://www.nature.com/articles/s42003-021-01919-3">to the point where it can dramatically alter the structure of our proteins.</a> There&#8217;s a broader collection of work here on tuning the electron energy, but we&#8217;ll ignore that. </p><p>This whole process is automated in modern microscopes, which can systematically work through these levels, collecting thousands of images over days of operation. </p><p>What does an ideal hole-view look like? 
Like this: </p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!D1zA!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F64a4e510-dcca-49a8-97b5-12523ab8eab8_919x953.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!D1zA!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F64a4e510-dcca-49a8-97b5-12523ab8eab8_919x953.png 424w, https://substackcdn.com/image/fetch/$s_!D1zA!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F64a4e510-dcca-49a8-97b5-12523ab8eab8_919x953.png 848w, https://substackcdn.com/image/fetch/$s_!D1zA!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F64a4e510-dcca-49a8-97b5-12523ab8eab8_919x953.png 1272w, https://substackcdn.com/image/fetch/$s_!D1zA!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F64a4e510-dcca-49a8-97b5-12523ab8eab8_919x953.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!D1zA!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F64a4e510-dcca-49a8-97b5-12523ab8eab8_919x953.png" width="382" height="396.132752992383" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/64a4e510-dcca-49a8-97b5-12523ab8eab8_919x953.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:953,&quot;width&quot;:919,&quot;resizeWidth&quot;:382,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Image 1 of 
13&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Image 1 of 13" title="Image 1 of 13" srcset="https://substackcdn.com/image/fetch/$s_!D1zA!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F64a4e510-dcca-49a8-97b5-12523ab8eab8_919x953.png 424w, https://substackcdn.com/image/fetch/$s_!D1zA!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F64a4e510-dcca-49a8-97b5-12523ab8eab8_919x953.png 848w, https://substackcdn.com/image/fetch/$s_!D1zA!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F64a4e510-dcca-49a8-97b5-12523ab8eab8_919x953.png 1272w, https://substackcdn.com/image/fetch/$s_!D1zA!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F64a4e510-dcca-49a8-97b5-12523ab8eab8_919x953.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 
11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">From <a href="https://cryoem101.org/chapter-3/">here</a></figcaption></figure></div><p>Each of the little dark-ish blobs is an aldolase particle, an enzyme involved in energy production. What does <strong>that</strong> look like in ribbon-form? <a href="https://www.ebi.ac.uk/pdbe/entry/pdb/8D44">Here&#8217;s the derived ribbon structure from that blob:</a></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!INIP!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7387e232-0651-462d-898b-7a49d2b58b0a_790x800.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!INIP!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7387e232-0651-462d-898b-7a49d2b58b0a_790x800.png 424w, https://substackcdn.com/image/fetch/$s_!INIP!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7387e232-0651-462d-898b-7a49d2b58b0a_790x800.png 848w, 
https://substackcdn.com/image/fetch/$s_!INIP!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7387e232-0651-462d-898b-7a49d2b58b0a_790x800.png 1272w, https://substackcdn.com/image/fetch/$s_!INIP!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7387e232-0651-462d-898b-7a49d2b58b0a_790x800.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!INIP!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7387e232-0651-462d-898b-7a49d2b58b0a_790x800.png" width="312" height="315.9493670886076" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7387e232-0651-462d-898b-7a49d2b58b0a_790x800.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:800,&quot;width&quot;:790,&quot;resizeWidth&quot;:312,&quot;bytes&quot;:689439,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!INIP!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7387e232-0651-462d-898b-7a49d2b58b0a_790x800.png 424w, https://substackcdn.com/image/fetch/$s_!INIP!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7387e232-0651-462d-898b-7a49d2b58b0a_790x800.png 848w, 
https://substackcdn.com/image/fetch/$s_!INIP!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7387e232-0651-462d-898b-7a49d2b58b0a_790x800.png 1272w, https://substackcdn.com/image/fetch/$s_!INIP!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7387e232-0651-462d-898b-7a49d2b58b0a_790x800.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Hmm. There&#8217;s a bit of a difference. </p><p>So we've got our blobs. 
Thousands of them, each a 2D snapshot of our protein frozen in ice. And somehow, we need to turn those into a 3D structure. How?</p><h1>Reconstruction</h1><p>Consider this:</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!5H4n!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf0df9a9-a0fe-45d2-815f-77051cbae4aa_1574x686.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!5H4n!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf0df9a9-a0fe-45d2-815f-77051cbae4aa_1574x686.png 424w, https://substackcdn.com/image/fetch/$s_!5H4n!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf0df9a9-a0fe-45d2-815f-77051cbae4aa_1574x686.png 848w, https://substackcdn.com/image/fetch/$s_!5H4n!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf0df9a9-a0fe-45d2-815f-77051cbae4aa_1574x686.png 1272w, https://substackcdn.com/image/fetch/$s_!5H4n!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf0df9a9-a0fe-45d2-815f-77051cbae4aa_1574x686.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!5H4n!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf0df9a9-a0fe-45d2-815f-77051cbae4aa_1574x686.png" width="526" height="229.40247252747253" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/df0df9a9-a0fe-45d2-815f-77051cbae4aa_1574x686.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:635,&quot;width&quot;:1456,&quot;resizeWidth&quot;:526,&quot;bytes&quot;:775261,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!5H4n!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf0df9a9-a0fe-45d2-815f-77051cbae4aa_1574x686.png 424w, https://substackcdn.com/image/fetch/$s_!5H4n!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf0df9a9-a0fe-45d2-815f-77051cbae4aa_1574x686.png 848w, https://substackcdn.com/image/fetch/$s_!5H4n!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf0df9a9-a0fe-45d2-815f-77051cbae4aa_1574x686.png 1272w, https://substackcdn.com/image/fetch/$s_!5H4n!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf0df9a9-a0fe-45d2-815f-77051cbae4aa_1574x686.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a><figcaption class="image-caption">From <a href="https://www.bnl.gov/nysds19/files/talks/session6/lin_nysds19.pdf">here</a></figcaption></figure></div><p>When we covered our grid with a purified solution of proteins, we don't naively expect them to order themselves in any particular direction. Their orientation will, usually, be random! 
And it turns out that this randomness is actually a feature, not a bug. Why? Because if every protein landed in exactly the same orientation, we'd only ever see it from one angle. Instead, we get thousands of different views of the same structure. Like so:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!iPO_!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2ea3c4aa-4d11-430e-a55c-6ec2ecd61c6c_528x574.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!iPO_!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2ea3c4aa-4d11-430e-a55c-6ec2ecd61c6c_528x574.png 424w, https://substackcdn.com/image/fetch/$s_!iPO_!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2ea3c4aa-4d11-430e-a55c-6ec2ecd61c6c_528x574.png 848w, https://substackcdn.com/image/fetch/$s_!iPO_!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2ea3c4aa-4d11-430e-a55c-6ec2ecd61c6c_528x574.png 1272w, https://substackcdn.com/image/fetch/$s_!iPO_!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2ea3c4aa-4d11-430e-a55c-6ec2ecd61c6c_528x574.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!iPO_!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2ea3c4aa-4d11-430e-a55c-6ec2ecd61c6c_528x574.png" width="324" height="352.22727272727275" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2ea3c4aa-4d11-430e-a55c-6ec2ecd61c6c_528x574.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:574,&quot;width&quot;:528,&quot;resizeWidth&quot;:324,&quot;bytes&quot;:231462,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!iPO_!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2ea3c4aa-4d11-430e-a55c-6ec2ecd61c6c_528x574.png 424w, https://substackcdn.com/image/fetch/$s_!iPO_!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2ea3c4aa-4d11-430e-a55c-6ec2ecd61c6c_528x574.png 848w, https://substackcdn.com/image/fetch/$s_!iPO_!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2ea3c4aa-4d11-430e-a55c-6ec2ecd61c6c_528x574.png 1272w, https://substackcdn.com/image/fetch/$s_!iPO_!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2ea3c4aa-4d11-430e-a55c-6ec2ecd61c6c_528x574.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">From <a href="https://myscope.training/CRYO_3D_reconstruction">here</a>.</figcaption></figure></div><p>Now, you may say, &#8216;<em>while the particles may not be perfectly ordered, it isn&#8217;t at all obvious that the orientations will be uniformly random.</em>&#8217;. This is correct! The tendency for particles to arrange themselves in a specific pattern is referred to as <a href="https://www.nanoimagingservices.com/about/blog/solving-the-challenges-of-preferred-orientation-in-cryo-em-sample-preparation">&#8216;orientation bias&#8217;, or &#8216;preferred orientation&#8217;</a>, and is actually one of the biggest problems involved in actually running cryo-EM. 
For the purposes of this essay, we&#8217;ll pretend that this isn&#8217;t an issue, since most of the fixes have to do with either the sample prep or the imaging process, <a href="https://t.co/mpdxzbfNpQ">such as this</a>, and I am even less equipped to comment on that than on the current topic of this essay. </p><p>Thus, our job can be phrased as the following: given thousands of these (assumed random) 2D views, each one capturing a different angle of a 3D structure, <strong>reconstruct</strong> the 3D structure. </p><p>How do you do this? Like this: </p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!eZxy!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0290573b-fea9-4476-8d22-4e0ce68c8f43_784x482.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!eZxy!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0290573b-fea9-4476-8d22-4e0ce68c8f43_784x482.png 424w, https://substackcdn.com/image/fetch/$s_!eZxy!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0290573b-fea9-4476-8d22-4e0ce68c8f43_784x482.png 848w, https://substackcdn.com/image/fetch/$s_!eZxy!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0290573b-fea9-4476-8d22-4e0ce68c8f43_784x482.png 1272w, https://substackcdn.com/image/fetch/$s_!eZxy!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0290573b-fea9-4476-8d22-4e0ce68c8f43_784x482.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!eZxy!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0290573b-fea9-4476-8d22-4e0ce68c8f43_784x482.png" width="454" height="279.1173469387755" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0290573b-fea9-4476-8d22-4e0ce68c8f43_784x482.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:482,&quot;width&quot;:784,&quot;resizeWidth&quot;:454,&quot;bytes&quot;:74244,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!eZxy!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0290573b-fea9-4476-8d22-4e0ce68c8f43_784x482.png 424w, https://substackcdn.com/image/fetch/$s_!eZxy!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0290573b-fea9-4476-8d22-4e0ce68c8f43_784x482.png 848w, https://substackcdn.com/image/fetch/$s_!eZxy!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0290573b-fea9-4476-8d22-4e0ce68c8f43_784x482.png 1272w, https://substackcdn.com/image/fetch/$s_!eZxy!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0290573b-fea9-4476-8d22-4e0ce68c8f43_784x482.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft 
pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">From <a href="https://www.aimspress.com/article/doi/10.3934/biophy.2022002">here</a></figcaption></figure></div><p>Let&#8217;s ignore the top column. <a href="https://guide.cryosparc.com/processing-data/all-job-types-in-cryosparc/ctf-estimation">CTF estimation</a> is an interesting subject that I&#8217;d recommend reading up on, but it&#8217;s&#8230;a bit disconnected from everything else we&#8217;ve been discussing and requires more background information. </p><p>The second step is something called <strong>particle picking</strong> (which basically also encapsulates <strong>particle extraction</strong>). Before we get to actual 3D reconstruction, we need to visually isolate the <strong>good</strong> proteins. 
Remember, the signal-to-noise ratio in these micrographs is absolutely awful; many of the proteins here will be degraded, ice thickness will vary, the micrographs may have contaminants, and so on. We need to pluck out the segments of every micrograph that are promising and focus in on those. </p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Drmk!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc0299864-19a8-404b-9938-9ab0fdf0bdc3_894x900.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Drmk!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc0299864-19a8-404b-9938-9ab0fdf0bdc3_894x900.png 424w, https://substackcdn.com/image/fetch/$s_!Drmk!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc0299864-19a8-404b-9938-9ab0fdf0bdc3_894x900.png 848w, https://substackcdn.com/image/fetch/$s_!Drmk!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc0299864-19a8-404b-9938-9ab0fdf0bdc3_894x900.png 1272w, https://substackcdn.com/image/fetch/$s_!Drmk!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc0299864-19a8-404b-9938-9ab0fdf0bdc3_894x900.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Drmk!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc0299864-19a8-404b-9938-9ab0fdf0bdc3_894x900.png" width="374" height="376.51006711409394" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c0299864-19a8-404b-9938-9ab0fdf0bdc3_894x900.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:900,&quot;width&quot;:894,&quot;resizeWidth&quot;:374,&quot;bytes&quot;:1013167,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Drmk!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc0299864-19a8-404b-9938-9ab0fdf0bdc3_894x900.png 424w, https://substackcdn.com/image/fetch/$s_!Drmk!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc0299864-19a8-404b-9938-9ab0fdf0bdc3_894x900.png 848w, https://substackcdn.com/image/fetch/$s_!Drmk!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc0299864-19a8-404b-9938-9ab0fdf0bdc3_894x900.png 1272w, https://substackcdn.com/image/fetch/$s_!Drmk!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc0299864-19a8-404b-9938-9ab0fdf0bdc3_894x900.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">From <a href="https://www.bnl.gov/nysds19/files/talks/session6/lin_nysds19.pdf">here</a></figcaption></figure></div><p>In practice, particle picking relies on either <a href="https://www.sciencedirect.com/science/article/pii/S2352711024000074">template matching</a> (user-predefined particles serve as templates, and particles in the micrographs are found through image matching), or on training a bounding-box detection model on a manually curated set of good particles. <a href="https://www.nature.com/articles/s41597-023-02280-2">Either way, it seems like it&#8217;s a deeply annoying process to do.</a> It seems like the process is close to being fully automated, <a href="https://academic.oup.com/bioinformatics/article/40/3/btae109/7614090">but, circa 2024, there are still papers being published on New and Improved methods for doing it</a>, so I imagine there&#8217;s still a ways to go. 
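</p><p>To make the template-matching idea concrete, here is a minimal NumPy sketch. It is a toy under loud assumptions, not how any real picker works: the function names (<code>correlation_map</code>, <code>pick_particles</code>) are hypothetical, there is a single template, and nothing here accounts for in-plane rotation, CTF effects, or varying ice thickness. It slides a template over the micrograph using FFT-based cross-correlation and greedily keeps the highest-scoring locations.</p><pre><code>
```python
import numpy as np


def correlation_map(micrograph, template):
    """Circular cross-correlation of a zero-mean template with a micrograph."""
    t = template - template.mean()
    padded = np.zeros(micrograph.shape, dtype=float)
    padded[: t.shape[0], : t.shape[1]] = t
    # Correlation theorem: corr = IFFT(FFT(image) * conj(FFT(template))).
    spectrum = np.fft.fft2(micrograph) * np.conj(np.fft.fft2(padded))
    return np.fft.ifft2(spectrum).real


def pick_particles(micrograph, template, n_picks):
    """Return the top-left corners of the n_picks best template matches."""
    score = correlation_map(micrograph, template)
    box = template.shape[0]
    picks = []
    for _ in range(n_picks):
        r, c = np.unravel_index(np.argmax(score), score.shape)
        picks.append((int(r), int(c)))
        # Blank out the neighbourhood so the same particle is not picked twice.
        score[max(r - box, 0) : r + box, max(c - box, 0) : c + box] = -np.inf
    return picks
```
</code></pre><p>Each pick is the top-left corner of a box the size of the template; cropping those boxes out of the micrograph would be the &#8216;particle extraction&#8217; half of the step. 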
</p><p>Anyway, here&#8217;s what a final set of particles may look like, highlighted in a red box. </p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!6y-f!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc205f2e0-8634-4cbe-be51-c3ff2d128f77_520x524.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!6y-f!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc205f2e0-8634-4cbe-be51-c3ff2d128f77_520x524.png 424w, https://substackcdn.com/image/fetch/$s_!6y-f!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc205f2e0-8634-4cbe-be51-c3ff2d128f77_520x524.png 848w, https://substackcdn.com/image/fetch/$s_!6y-f!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc205f2e0-8634-4cbe-be51-c3ff2d128f77_520x524.png 1272w, https://substackcdn.com/image/fetch/$s_!6y-f!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc205f2e0-8634-4cbe-be51-c3ff2d128f77_520x524.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!6y-f!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc205f2e0-8634-4cbe-be51-c3ff2d128f77_520x524.png" width="368" height="370.83076923076925" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c205f2e0-8634-4cbe-be51-c3ff2d128f77_520x524.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:524,&quot;width&quot;:520,&quot;resizeWidth&quot;:368,&quot;bytes&quot;:630678,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!6y-f!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc205f2e0-8634-4cbe-be51-c3ff2d128f77_520x524.png 424w, https://substackcdn.com/image/fetch/$s_!6y-f!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc205f2e0-8634-4cbe-be51-c3ff2d128f77_520x524.png 848w, https://substackcdn.com/image/fetch/$s_!6y-f!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc205f2e0-8634-4cbe-be51-c3ff2d128f77_520x524.png 1272w, https://substackcdn.com/image/fetch/$s_!6y-f!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc205f2e0-8634-4cbe-be51-c3ff2d128f77_520x524.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">From <a href="https://www.bnl.gov/nysds19/files/talks/session6/lin_nysds19.pdf">here</a></figcaption></figure></div><p>From here, we can move onto 2D classification. We&#8217;d ideally like to cluster our particles such that we have, e.g., 20 particles that are taken from the same view, another 20 that are taken from another view, and so on. The hope is that each of these groups is maximally different from the others on a pixel-by-pixel level, such that each group, if merged, can offer a &#8216;class average&#8217;. 
For example, in the image below, each row-column image shows one class average.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!AzJ0!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8f30c68f-1c34-45a6-ab8e-78c0252040c0_1426x1032.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!AzJ0!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8f30c68f-1c34-45a6-ab8e-78c0252040c0_1426x1032.png 424w, https://substackcdn.com/image/fetch/$s_!AzJ0!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8f30c68f-1c34-45a6-ab8e-78c0252040c0_1426x1032.png 848w, https://substackcdn.com/image/fetch/$s_!AzJ0!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8f30c68f-1c34-45a6-ab8e-78c0252040c0_1426x1032.png 1272w, https://substackcdn.com/image/fetch/$s_!AzJ0!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8f30c68f-1c34-45a6-ab8e-78c0252040c0_1426x1032.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!AzJ0!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8f30c68f-1c34-45a6-ab8e-78c0252040c0_1426x1032.png" width="494" height="357.50911640953717" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8f30c68f-1c34-45a6-ab8e-78c0252040c0_1426x1032.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1032,&quot;width&quot;:1426,&quot;resizeWidth&quot;:494,&quot;bytes&quot;:1031762,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!AzJ0!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8f30c68f-1c34-45a6-ab8e-78c0252040c0_1426x1032.png 424w, https://substackcdn.com/image/fetch/$s_!AzJ0!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8f30c68f-1c34-45a6-ab8e-78c0252040c0_1426x1032.png 848w, https://substackcdn.com/image/fetch/$s_!AzJ0!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8f30c68f-1c34-45a6-ab8e-78c0252040c0_1426x1032.png 1272w, https://substackcdn.com/image/fetch/$s_!AzJ0!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8f30c68f-1c34-45a6-ab8e-78c0252040c0_1426x1032.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">From <a href="https://cryo-em-course.caltech.edu/documents/8702/part_6_-_single_particle_analysis_3bTNWgw.pdf">here</a></figcaption></figure></div><p>Why do this classification step at all? For the same reason we did atlas imaging and particle picking; we want to bump up the signal-to-noise ratio as much as possible. After this, we&#8217;ll end up with a set of class average images, a limited set of 2D views into what our 3D protein actually looks like. </p><p>Now what? We have three things left: defining the initial volume, 3D classification, and 3D refinement. The following paragraphs will be concerned with all three of these topics at once, so I&#8217;ll stop calling out individual items.</p><p>Also, it&#8217;s at this point in a lot of cryo-EM lectures that you&#8217;ll need to start applying Fourier transforms to your class averages to fumble your way towards reconstruction of the 3D shape. 
Truthfully, relatively little of that makes immediate sense to me, so let&#8217;s start off with staying in the pure image world, and then slowly motivate why turning to the frequency space makes sense. <a href="https://www.youtube.com/watch?v=gDgFbAqdM_c&amp;list=PL8_xPU5epJdctoHdQjpfHmd_z9WvGxK8-&amp;ab_channel=caltech">If you&#8217;d like to learn this specific area more deeply, you should check out Dr. Grant Jensen&#8217;s videos. </a></p><p>The first challenge is that we need some starting point &#8212; some initial guess at what our 3D structure might look like. This is the "<strong>initial volume</strong>" problem. There are a few ways to approach this, but, in the most <strong>ideal</strong> case, we have a 3D structure that already roughly looks like our protein. Obviously, this leads to a kind of chicken-and-egg problem. In practice, you can also rely on something less well structured: partial structures, low-resolution structures, and so on. But a pretty good structural prior is, unfortunately, necessary to do cryo-EM. In the machine learning section of the essay, we&#8217;ll discuss ways you can get around this requirement. </p><p>Once we define this initial volume, we&#8217;ll perform what are called &#8216;<strong>reprojections</strong>&#8217; of this volume. Across every angle of the initial volume, shown in purple, we&#8217;ll simulate the 2D projection of that initial volume. 
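</p><p>As a rough illustration of what a reprojection is, here is a small NumPy sketch. Everything in it is an assumption made for illustration: the function names (<code>rotate_about_axis0</code>, <code>reproject</code>) are hypothetical, the volume is a cubic array, rotation happens around a single axis with crude nearest-neighbour sampling, and a &#8216;projection&#8217; is just a sum of densities along one axis. Real software samples orientations over the whole sphere and models the microscope far more carefully.</p><pre><code>
```python
import numpy as np


def rotate_about_axis0(volume, angle_deg):
    """Rotate a cubic volume about axis 0, using nearest-neighbour sampling."""
    n = volume.shape[0]
    center = (n - 1) / 2.0
    theta = np.deg2rad(angle_deg)
    y, x = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    # For every output voxel, look up where it came from (inverse rotation).
    src_y = np.cos(theta) * (y - center) + np.sin(theta) * (x - center) + center
    src_x = -np.sin(theta) * (y - center) + np.cos(theta) * (x - center) + center
    yi = np.rint(src_y).astype(int)
    xi = np.rint(src_x).astype(int)
    yc, xc = np.clip(yi, 0, n - 1), np.clip(xi, 0, n - 1)
    # Voxels whose source falls outside the box are set to zero.
    inside = np.equal(yi, yc) * np.equal(xi, xc)
    return volume[:, yc, xc] * inside


def reproject(volume, angles_deg):
    """Simulate one 2D projection per viewing angle by summing along axis 2."""
    return np.stack([rotate_about_axis0(volume, a).sum(axis=2) for a in angles_deg])
```
</code></pre><p>Calling <code>reproject(volume, range(0, 180, 10))</code> would give a stack of eighteen simulated 2D views of the same volume, one per tilt angle. 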
</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!9647!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd4144407-287c-4628-bc2b-03aee6a35dff_1384x948.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!9647!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd4144407-287c-4628-bc2b-03aee6a35dff_1384x948.png 424w, https://substackcdn.com/image/fetch/$s_!9647!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd4144407-287c-4628-bc2b-03aee6a35dff_1384x948.png 848w, https://substackcdn.com/image/fetch/$s_!9647!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd4144407-287c-4628-bc2b-03aee6a35dff_1384x948.png 1272w, https://substackcdn.com/image/fetch/$s_!9647!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd4144407-287c-4628-bc2b-03aee6a35dff_1384x948.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!9647!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd4144407-287c-4628-bc2b-03aee6a35dff_1384x948.png" width="544" height="372.6242774566474" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d4144407-287c-4628-bc2b-03aee6a35dff_1384x948.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:948,&quot;width&quot;:1384,&quot;resizeWidth&quot;:544,&quot;bytes&quot;:676134,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!9647!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd4144407-287c-4628-bc2b-03aee6a35dff_1384x948.png 424w, https://substackcdn.com/image/fetch/$s_!9647!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd4144407-287c-4628-bc2b-03aee6a35dff_1384x948.png 848w, https://substackcdn.com/image/fetch/$s_!9647!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd4144407-287c-4628-bc2b-03aee6a35dff_1384x948.png 1272w, https://substackcdn.com/image/fetch/$s_!9647!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd4144407-287c-4628-bc2b-03aee6a35dff_1384x948.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Now what? 
This:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!d684!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F29a332a8-b0df-40aa-a945-86950d8430c8_1608x1044.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!d684!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F29a332a8-b0df-40aa-a945-86950d8430c8_1608x1044.png 424w, https://substackcdn.com/image/fetch/$s_!d684!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F29a332a8-b0df-40aa-a945-86950d8430c8_1608x1044.png 848w, https://substackcdn.com/image/fetch/$s_!d684!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F29a332a8-b0df-40aa-a945-86950d8430c8_1608x1044.png 1272w, https://substackcdn.com/image/fetch/$s_!d684!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F29a332a8-b0df-40aa-a945-86950d8430c8_1608x1044.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!d684!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F29a332a8-b0df-40aa-a945-86950d8430c8_1608x1044.png" width="616" height="399.8076923076923" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/29a332a8-b0df-40aa-a945-86950d8430c8_1608x1044.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:945,&quot;width&quot;:1456,&quot;resizeWidth&quot;:616,&quot;bytes&quot;:1592809,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!d684!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F29a332a8-b0df-40aa-a945-86950d8430c8_1608x1044.png 424w, https://substackcdn.com/image/fetch/$s_!d684!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F29a332a8-b0df-40aa-a945-86950d8430c8_1608x1044.png 848w, https://substackcdn.com/image/fetch/$s_!d684!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F29a332a8-b0df-40aa-a945-86950d8430c8_1608x1044.png 1272w, https://substackcdn.com/image/fetch/$s_!d684!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F29a332a8-b0df-40aa-a945-86950d8430c8_1608x1044.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>On the left, we've got our "Set of images" (labeled 1-7): these are the 2D class averages from our sample. They're noisy and blurry, but you can see they're all showing the same structure from different angles.</p><p>In the middle, we have "Model projections" (labeled a-e). These are simulated 2D views generated from our initial 3D model (that blue donut-shaped thing at the top). Obviously, these are artificially clean, but they serve as good &#8216;ground truths&#8217; for what a 2D image from a given angle should look like. </p><p>From here, we simply take every class average and compare it to every one of the model reprojections. For the closest matching reprojection, we average the reprojection with the class average! And if multiple class averages are close to the same reprojection, we simply take the average of <strong>those</strong> class averages alongside the reprojection! Slowly, we replace each reprojection via this averaging process. 
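</p><p>If it helps, here is one round of that match-and-average step as a minimal Python sketch. Everything in it is an assumption made for illustration: images are tiny flattened lists, "closeness" is plain normalized cross-correlation, and the names <code>ncc</code> and <code>match_and_average</code> are made up. Real software also searches over in-plane rotations and shifts, which is skipped here.</p>

```python
import math

def ncc(a, b):
    """Normalized cross-correlation between two equally sized flat images."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    da = [x - mean_a for x in a]
    db = [x - mean_b for x in b]
    num = sum(x * y for x, y in zip(da, db))
    den = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    return num / den if den else 0.0

def match_and_average(class_averages, reprojections):
    """One round: assign each class average to its best-matching reprojection,
    then replace every reprojection with the mean of itself and its matches."""
    groups = {i: [] for i in range(len(reprojections))}
    for avg in class_averages:
        best = max(range(len(reprojections)),
                   key=lambda i: ncc(avg, reprojections[i]))
        groups[best].append(avg)
    updated = []
    for i, proj in enumerate(reprojections):
        members = groups[i] + [proj]
        updated.append([sum(px) / len(members) for px in zip(*members)])
    return updated, groups

# Toy data: two flattened 2x2 "reprojections" and three noisy class averages.
reprojections = [[1.0, 0.0, 0.0, 1.0], [0.0, 1.0, 1.0, 0.0]]
class_averages = [[0.9, 0.1, 0.2, 0.8], [0.8, 0.0, 0.1, 1.0], [0.1, 0.9, 0.7, 0.2]]
updated, groups = match_and_average(class_averages, reprojections)
print(len(groups[0]), len(groups[1]))
```

<p>In this toy run, the first two class averages land on the first reprojection and the third lands on the second, so each reprojection gets nudged toward the data that resembles it.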
</p><p><strong>Then, because we know our initial model that created the reprojections, it is mathematically simple to alter the 3D initial model to reflect the modifications of the reprojections.</strong> After the first round, we expect a slightly better 3D structure. And then you simply repeat! </p><p>Each round gets us closer to the true structure, and you simply continue until you&#8217;re satisfied with the final structure! <strong>There&#8217;s also a version of this that uses expectation maximization, which is what is used in practice, but we&#8217;ll ignore that for now.</strong></p><p>Notably, this is why cryo-EM structures look blobby and semi-artifactual. <strong>We&#8217;re basically stretching and shrinking a pre-existing blob!</strong> Reusing the same picture from the start: </p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!lAcR!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd7d103f8-c2e2-433e-92c7-e7cc63f08691_1200x1200.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!lAcR!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd7d103f8-c2e2-433e-92c7-e7cc63f08691_1200x1200.png 424w, https://substackcdn.com/image/fetch/$s_!lAcR!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd7d103f8-c2e2-433e-92c7-e7cc63f08691_1200x1200.png 848w, https://substackcdn.com/image/fetch/$s_!lAcR!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd7d103f8-c2e2-433e-92c7-e7cc63f08691_1200x1200.png 1272w, 
https://substackcdn.com/image/fetch/$s_!lAcR!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd7d103f8-c2e2-433e-92c7-e7cc63f08691_1200x1200.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!lAcR!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd7d103f8-c2e2-433e-92c7-e7cc63f08691_1200x1200.png" width="448" height="448" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d7d103f8-c2e2-433e-92c7-e7cc63f08691_1200x1200.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1200,&quot;width&quot;:1200,&quot;resizeWidth&quot;:448,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Cryo-EM reveals &#8220;crown-like&#8221; structure of protein responsible for ...&quot;,&quot;title&quot;:&quot;Cryo-EM reveals &#8220;crown-like&#8221; structure of protein responsible for ...&quot;,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Cryo-EM reveals &#8220;crown-like&#8221; structure of protein responsible for ..." title="Cryo-EM reveals &#8220;crown-like&#8221; structure of protein responsible for ..." 
srcset="https://substackcdn.com/image/fetch/$s_!lAcR!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd7d103f8-c2e2-433e-92c7-e7cc63f08691_1200x1200.png 424w, https://substackcdn.com/image/fetch/$s_!lAcR!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd7d103f8-c2e2-433e-92c7-e7cc63f08691_1200x1200.png 848w, https://substackcdn.com/image/fetch/$s_!lAcR!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd7d103f8-c2e2-433e-92c7-e7cc63f08691_1200x1200.png 1272w, https://substackcdn.com/image/fetch/$s_!lAcR!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd7d103f8-c2e2-433e-92c7-e7cc63f08691_1200x1200.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" 
stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Now, importantly, we&#8217;ve thus far been operating only in the 2D image domain. But in practice, cryo-EM 3D refinement occurs in the frequency domain via the Fourier transform. Why is that? </p><p>In real space, if we wanted to figure out how similar two protein images are, we'd need to try aligning them in various ways (shifting them around, rotating them) and comparing pixel by pixel. This is computationally expensive and can be sensitive to noise. When we transform our noisy images into frequency space, we're essentially decomposing each image into a sum of simple wave patterns. Shifts of the protein in real space become simple phase changes in frequency space, and a 2D projection of the protein corresponds to a flat slice through the center of its 3D Fourier transform. This means that instead of re-projecting the whole model for every candidate alignment, we can pull out a slice and directly compare frequency patterns. Moreover, the Fourier transform naturally separates different scales of information: low frequencies capture the overall shape of the protein, while high frequencies represent fine details. This separation allows us to work hierarchically, first aligning the basic shapes before trying to match the detailed features.</p><p>TLDR: it&#8217;s faster and simpler. There&#8217;s no reason you couldn&#8217;t operate in real space alone, though; it would just be slower to compute!</p><p>And that&#8217;s about the entire cryo-EM process! Again, obviously I&#8217;m skipping a ton of nuance and details here, but this should all give you a decent mental model for how the whole system works. Time to move on to what machine learning problems exist here. 
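</p><p>As a small aside before moving on: one standard fact that makes frequency space so convenient here (not mentioned by name above) is the Fourier slice theorem, which says that the 2D Fourier transform of a projection equals the central slice of the 3D Fourier transform of the object. The sketch below checks this numerically on a tiny random volume. The volume, its size, and the function names are all assumptions for illustration, and the transform is a deliberately naive DFT rather than anything a real package would use.</p>

```python
import cmath
import random

N = 4  # tiny cube so the naive DFT below stays fast
random.seed(0)
# Hypothetical 3D density, indexed as volume[z][y][x].
volume = [[[random.random() for _ in range(N)] for _ in range(N)] for _ in range(N)]

# Real-space projection along z: sum the density down the viewing direction.
projection = [[sum(volume[z][y][x] for z in range(N)) for x in range(N)]
              for y in range(N)]

def dft2(img):
    """Naive 2D discrete Fourier transform of an N x N image."""
    return [[sum(img[y][x] * cmath.exp(-2j * cmath.pi * (ky * y + kx * x) / N)
                 for y in range(N) for x in range(N))
             for kx in range(N)] for ky in range(N)]

def central_slice(vol):
    """The kz = 0 plane of the naive 3D DFT of vol."""
    return [[sum(vol[z][y][x] * cmath.exp(-2j * cmath.pi * (ky * y + kx * x) / N)
                 for z in range(N) for y in range(N) for x in range(N))
             for kx in range(N)] for ky in range(N)]

ft_of_projection = dft2(projection)
slice_of_ft = central_slice(volume)
max_error = max(abs(ft_of_projection[ky][kx] - slice_of_ft[ky][kx])
                for ky in range(N) for kx in range(N))
print(max_error)  # only floating-point residue remains
```

<p>The two quantities agree up to rounding, which is why comparing an image against a model can be done by reading a slice out of the model's transform.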
</p><h1>Some machine learning problems in the area</h1><p>Each of these will be focused on a piece of <a href="https://www.cs.princeton.edu/~zhonge/">Ellen Zhong&#8217;s</a> work, <strong>primarily in the realm of image reconstruction</strong>. While there are many researchers in this space, relatively few have touched as many aspects of the ML problems here as she has, so we rely on her work alone for convenience. </p><p>As is usually the case in my articles, I cannot do justice to every research problem in this area. The focus on reconstruction is partially because that&#8217;s the whole <strong>point</strong> of cryo-EM, so any work on improving it is pretty high impact. But it&#8217;s also because reconstruction is a pretty intuitively understandable topic. </p><h2>Conformational heterogeneity </h2><p>The traditional reconstruction methods we discussed earlier assume that all of our particle images can be averaged into a single meaningful structure. This assumption fails when proteins exhibit significant <a href="https://pmc.ncbi.nlm.nih.gov/articles/PMC3724800/">conformational heterogeneity</a>. And, unfortunately for us, many interesting proteins do indeed demonstrate conformational heterogeneity. </p><p>In this case, our 2D projections aren't just different views of the same 3D object: <strong>they're different views of slightly different 3D objects.</strong> Each image potentially represents a unique conformational state (albeit likely not massively structurally different). The dimensionality of this problem becomes far larger; naive reconstruction like we were previously doing would simply capture the most common conformation but ignore all the others. 
</p><p>One of Zhong&#8217;s most famous papers (and what became her thesis), published in 2021 and titled <em><a href="https://www.cs.princeton.edu/~zhonge/assets/pdf/2021_cryodrgn_nature_methods.pdf">CryoDRGN: reconstruction of heterogeneous cryo-EM structures using neural networks</a></em>, pokes at this problem. </p><p>CryoDRGN is a variational autoencoder that, rather than trying to sort particles into discrete classes or approximate them as linear combinations of base structures, <strong>instead learns a continuous generative model of 3D structure from the 2D particles</strong>. The model consists of two main components that work together: an encoder network that maps 2D particle images into a latent space, and a decoder network that generates 3D density maps from points in that latent space. </p><p>Training begins with a dataset of particle images and their estimated viewing angles from consensus reconstruction. The core insight here is that we can learn both the conformational states and 3D structures simultaneously by asking: <strong>"What distribution of 3D structures could have produced these 2D images?"</strong></p><p>During each training iteration:</p><ol><li><p>The encoder network examines a particle image and predicts where in conformational space (latent space) that particle likely exists.</p></li><li><p>The decoder network takes that latent space point, <strong>alongside its viewing angle</strong>, and generates a corresponding 3D structure.</p><ol><li><p><strong>Wait&#8230;how do we get the viewing angle of a particle? Isn&#8217;t that unknown?</strong> In practice, the authors go through the usual reconstruction process we discussed above and obtain an averaged structure that ignores conformational heterogeneity. From there, they use this averaged structure to estimate the likely viewing angle of each particle. 
This is actually one of CryoDRGN's main limitations (and one they point out in the paper): it can only work if the conformational heterogeneity isn't so severe that it prevents getting decent angle estimates from homogeneous reconstruction. </p></li></ol></li><li><p><strong>This predicted 3D structure is projected to 2D (using a fixed, non-machine-learned equation) given the pose information.</strong></p></li><li><p>The difference between this projection and the actual particle image drives the loss function.</p></li></ol><p>For inference on the training dataset, the encoder network can map <strong>all</strong> particle images into the latent space, which gives us a distribution showing what conformational states were present in our sample. From there, we can simply use dimensionality reduction techniques to look at all the conformations that exist in our dataset. <strong>There will likely be blobs of major conformations, smaller blobs of rarer conformations, and intermediate states between all of them.</strong> </p><p>Reconstruction with no assumptions on the level of structural heterogeneity! Well&#8230;at least if the heterogeneity isn&#8217;t <strong>too</strong> strong, per my comment on the viewing angles. 
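</p><p>The four training steps above can be sketched as a single forward pass. This is a toy under loud assumptions, not the cryoDRGN code: each "network" is one random linear map instead of a trained model, the decoder emits a 4x4x4 voxel grid, poses are restricted to three axis-aligned views and are applied only in the projection step, and everything happens in real space. All names are hypothetical.</p>

```python
import math
import random

random.seed(0)
D = 4        # image side length (toy size)
Z_DIM = 2    # number of latent dimensions

def linear(n_out, n_in):
    """Random weight matrix standing in for a trained layer (hypothetical)."""
    return [[random.gauss(0, 0.1) for _ in range(n_in)] for _ in range(n_out)]

def matvec(w, v):
    return [sum(wi * vi for wi, vi in zip(row, v)) for row in w]

W_mu, W_logvar = linear(Z_DIM, D * D), linear(Z_DIM, D * D)  # encoder
W_dec = linear(D * D * D, Z_DIM)                             # decoder

def encode(image):
    """Step 1: image to the mean and log-variance of its latent position."""
    flat = [p for row in image for p in row]
    return matvec(W_mu, flat), matvec(W_logvar, flat)

def sample(mu, logvar):
    """Reparameterization trick: z = mu + sigma * eps."""
    return [m + math.exp(0.5 * lv) * random.gauss(0, 1) for m, lv in zip(mu, logvar)]

def decode(z):
    """Step 2: latent point to a D x D x D density volume, indexed [a][b][c]."""
    flat = matvec(W_dec, z)
    return [[[flat[(a * D + b) * D + c] for c in range(D)]
             for b in range(D)] for a in range(D)]

def project(volume, axis):
    """Step 3: fixed, non-learned projection that sums density along one axis.
    Real poses are arbitrary rotations; axis-aligned views are a simplification."""
    r = range(D)
    if axis == 0:
        return [[sum(volume[a][b][c] for a in r) for c in r] for b in r]
    if axis == 1:
        return [[sum(volume[a][b][c] for b in r) for c in r] for a in r]
    return [[sum(volume[a][b][c] for c in r) for b in r] for a in r]

def loss(image, axis):
    """Step 4: squared reconstruction error plus the usual VAE KL penalty."""
    mu, logvar = encode(image)
    predicted = project(decode(sample(mu, logvar)), axis)
    recon = sum((p - q) ** 2
                for pr, ir in zip(predicted, image) for p, q in zip(pr, ir))
    kl = -0.5 * sum(1 + lv - m * m - math.exp(lv) for m, lv in zip(mu, logvar))
    return recon + kl

particle = [[random.random() for _ in range(D)] for _ in range(D)]
print(loss(particle, axis=0))
```

<p>In actual training, this loss would be backpropagated through both networks, while the projection step itself has nothing to learn.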
</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!J-7k!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff398bc19-52d0-4f85-8fac-c93a901f0e4a_889x676.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!J-7k!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff398bc19-52d0-4f85-8fac-c93a901f0e4a_889x676.png 424w, https://substackcdn.com/image/fetch/$s_!J-7k!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff398bc19-52d0-4f85-8fac-c93a901f0e4a_889x676.png 848w, https://substackcdn.com/image/fetch/$s_!J-7k!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff398bc19-52d0-4f85-8fac-c93a901f0e4a_889x676.png 1272w, https://substackcdn.com/image/fetch/$s_!J-7k!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff398bc19-52d0-4f85-8fac-c93a901f0e4a_889x676.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!J-7k!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff398bc19-52d0-4f85-8fac-c93a901f0e4a_889x676.png" width="889" height="676" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f398bc19-52d0-4f85-8fac-c93a901f0e4a_889x676.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:676,&quot;width&quot;:889,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:330973,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!J-7k!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff398bc19-52d0-4f85-8fac-c93a901f0e4a_889x676.png 424w, https://substackcdn.com/image/fetch/$s_!J-7k!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff398bc19-52d0-4f85-8fac-c93a901f0e4a_889x676.png 848w, https://substackcdn.com/image/fetch/$s_!J-7k!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff398bc19-52d0-4f85-8fac-c93a901f0e4a_889x676.png 1272w, https://substackcdn.com/image/fetch/$s_!J-7k!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff398bc19-52d0-4f85-8fac-c93a901f0e4a_889x676.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>One important note that may already be obvious to you (but wasn&#8217;t to me upon first reading): CryoDRGN isn&#8217;t really a model with weights you can re-use. It has to be retrained for each new protein! The model hasn&#8217;t learnt some generalizable understanding of 2D particles &#8594; 3D maps. </p><h2>Ab initio reconstruction</h2><p>Because of this demand for the angle of every particle, cryoDRGN still has to deal with the pesky initial volume problem. What if we dropped that requirement entirely? This is often referred to as &#8216;ab initio reconstruction&#8217;, with ab initio meaning &#8216;from first principles&#8217;. This would be quite nice; it&#8217;d mean that basically no prior information about a set of particles would be necessary to reconstruct it. </p><p>Unfortunately, cryoDRGN can&#8217;t do that. </p><p>But cryoDRGN2 can! 
Enter another one of Zhong&#8217;s papers, published in 2021: <em><a href="https://openaccess.thecvf.com/content/ICCV2021/papers/Zhong_CryoDRGN2_Ab_Initio_Neural_Reconstruction_of_3D_Protein_Structures_From_ICCV_2021_paper.pdf">CryoDRGN2: Ab initio neural reconstruction of 3D protein structures from real cryo-EM images</a>.</em></p><p>With cryoDRGN, we sidestep the search for each particle&#8217;s pose by leaning on the averaged single-conformation structure, and use that to generate a pose to feed into the model. <strong>With cryoDRGN2, we purposefully eschew this pre-made structure.</strong> But&#8230;we do need to start somewhere. And that somewhere is whatever random 3D structure is produced by our untrained neural network with randomly initialized weights. Perhaps something that looks like this: </p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!1Bpx!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4d1eab1d-91c0-49e1-b6a6-9f16bce86bba_378x356.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!1Bpx!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4d1eab1d-91c0-49e1-b6a6-9f16bce86bba_378x356.png 424w, https://substackcdn.com/image/fetch/$s_!1Bpx!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4d1eab1d-91c0-49e1-b6a6-9f16bce86bba_378x356.png 848w, https://substackcdn.com/image/fetch/$s_!1Bpx!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4d1eab1d-91c0-49e1-b6a6-9f16bce86bba_378x356.png 1272w, 
https://substackcdn.com/image/fetch/$s_!1Bpx!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4d1eab1d-91c0-49e1-b6a6-9f16bce86bba_378x356.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!1Bpx!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4d1eab1d-91c0-49e1-b6a6-9f16bce86bba_378x356.png" width="378" height="356" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4d1eab1d-91c0-49e1-b6a6-9f16bce86bba_378x356.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:356,&quot;width&quot;:378,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:75473,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!1Bpx!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4d1eab1d-91c0-49e1-b6a6-9f16bce86bba_378x356.png 424w, https://substackcdn.com/image/fetch/$s_!1Bpx!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4d1eab1d-91c0-49e1-b6a6-9f16bce86bba_378x356.png 848w, https://substackcdn.com/image/fetch/$s_!1Bpx!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4d1eab1d-91c0-49e1-b6a6-9f16bce86bba_378x356.png 1272w, 
https://substackcdn.com/image/fetch/$s_!1Bpx!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4d1eab1d-91c0-49e1-b6a6-9f16bce86bba_378x356.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Garbage! But we&#8217;ll refine it over time. </p><p>From here on out, we go through each of our 2D protein particles and try to answer the following question: how does this particle align with our (terrible) initial model? </p><p>How do we do this? 
First, some context: as you&#8217;ll recall, each 2D protein particle could have been taken from any 3D orientation (rotation) <strong>and</strong> any 2D translation. That's a 5-dimensional search space: 3 dimensions for rotation (the SO(3) group) and 2 dimensions for translation (x,y shifts). SO(3) seems obvious, but why (x,y)? Because protein particles aren&#8217;t perfectly centered in our bounding boxes; they could be slightly off center. </p><p>So, in the brute force case, we&#8217;ll simply need to check every single possible re-projection of this initial model across every angle (SO(3)) and every translation (x,y). If we sample rotations at 15&#176; resolution, that gives us 4,608 possible rotations, and if we use a 14x14 translation grid, that's another 196 positions.</p><p><strong>That means for each of our thousands of particle images, we need to check 4,608 &#215; 196 = 903,168 different possible 2D projections</strong>! Bit intractable. The authors use some tricks to reduce the computational load of this (though the process is still expensive). Specifically, they work in Fourier space and use &#8216;frequency marching&#8217;, which starts by matching low-frequency features and gradually incorporates higher-frequency details. Understanding these optimizations isn&#8217;t super useful in my opinion, but it&#8217;s worth checking out section 3.2 of the paper if you&#8217;d like to know more. </p><p>Once we have poses (rotations + translations) for each particle image, we can update our neural network model (which is a simple MLP). This model is in charge of generating the 3D structure. We push 2D particles + pose angles through the network and the network outputs a 3D structure, which hopefully should become more and more refined over time. 
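</p><p>To make the size of the brute-force pose search described above concrete, here is a small Python sketch. The numbers 4,608 and 196 come from the text; splitting 4,608 into 192 viewing directions times 24 in-plane angles is my assumption about how such a grid is typically built, and the dataset size and the function names <code>shift</code> and <code>best_shift</code> are made up. The second half brute-forces only the translation part, on a 5x5 toy image, by scoring every candidate shift.</p>

```python
# Pose-grid sizes quoted in the text. The 192 x 24 factorization is an
# assumption about how the rotation grid is built, not something stated there.
view_directions = 192
in_plane_angles = 360 // 15                   # 24 in-plane angles of 15 degrees
rotations = view_directions * in_plane_angles
translations = 14 * 14
poses_per_particle = rotations * translations
n_particles = 100_000                         # hypothetical dataset size
print(rotations, translations, poses_per_particle)  # 4608 196 903168
print(poses_per_particle * n_particles)             # projections to score per full search

def shift(image, dx, dy):
    """Circularly shift a square image by (dx, dy) pixels."""
    n = len(image)
    return [[image[(y - dy) % n][(x - dx) % n] for x in range(n)] for y in range(n)]

def best_shift(particle, reference, max_shift):
    """Score every integer shift in the grid and keep the lowest squared error."""
    candidates = [(dx, dy)
                  for dx in range(-max_shift, max_shift + 1)
                  for dy in range(-max_shift, max_shift + 1)]
    def error(pose):
        moved = shift(particle, *pose)
        return sum((a - b) ** 2
                   for ra, rb in zip(moved, reference) for a, b in zip(ra, rb))
    return min(candidates, key=error)

# A 5x5 reference with a single bright pixel, and an off-center copy of it.
reference = [[0] * 5 for _ in range(5)]
reference[1][1] = 1
particle = shift(reference, 1, 2)
print(best_shift(particle, reference, 2))  # (-1, -2): the shift that undoes the offset
```

<p>Even this toy translation search scores 25 candidates for one image, so it is easy to see why the full rotation-plus-translation grid needs the shortcuts mentioned above.</p><p>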
At a certain point, we may say &#8216;<em>hey, our predicted 3D model is (probably) a fair bit better than the random noise we started off with, we should update the poses</em>&#8217;, which again kicks off the pose search process from before, updating all of the poses to hopefully be more accurate. Again, this search process is expensive, so we only run it intermittently. </p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!IFMK!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F481ba240-e8b1-40c9-b3d5-5afa1c57a110_2088x518.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!IFMK!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F481ba240-e8b1-40c9-b3d5-5afa1c57a110_2088x518.png 424w, https://substackcdn.com/image/fetch/$s_!IFMK!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F481ba240-e8b1-40c9-b3d5-5afa1c57a110_2088x518.png 848w, https://substackcdn.com/image/fetch/$s_!IFMK!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F481ba240-e8b1-40c9-b3d5-5afa1c57a110_2088x518.png 1272w, https://substackcdn.com/image/fetch/$s_!IFMK!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F481ba240-e8b1-40c9-b3d5-5afa1c57a110_2088x518.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!IFMK!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F481ba240-e8b1-40c9-b3d5-5afa1c57a110_2088x518.png" width="728" height="180.5" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/481ba240-e8b1-40c9-b3d5-5afa1c57a110_2088x518.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:361,&quot;width&quot;:1456,&quot;resizeWidth&quot;:728,&quot;bytes&quot;:511641,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!IFMK!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F481ba240-e8b1-40c9-b3d5-5afa1c57a110_2088x518.png 424w, https://substackcdn.com/image/fetch/$s_!IFMK!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F481ba240-e8b1-40c9-b3d5-5afa1c57a110_2088x518.png 848w, https://substackcdn.com/image/fetch/$s_!IFMK!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F481ba240-e8b1-40c9-b3d5-5afa1c57a110_2088x518.png 1272w, https://substackcdn.com/image/fetch/$s_!IFMK!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F481ba240-e8b1-40c9-b3d5-5afa1c57a110_2088x518.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p>There&#8217;s also a &#8216;model reset&#8217; bit. What&#8217;s that? 
<strong>There&#8217;s an interesting pathology in cryoDRGN2 where the earliest pose information are useful for very coarse information, but practically useless for finer grained information.</strong> But, by the time the 3D structure improves, and the poses get better, such that we <strong>should</strong> be able to learn higher resolution information, the model has essentially learned to entirely ignore the fine-grained information theoretically contained within our particle image. To fix this, the authors simply reset the model weights and start training from scratch, while still relying on the improved 3D model. </p><p>Finally, what do we do during inference? At the end of the training pipeline, we have two things: a refined global structure used to get pose info, and a model capable of generating a possible 3D structure given a new particle. We <strong>could</strong> rely on this global model, but it implicitly assumes there is only one conformation in our dataset. What if we desired heterogeneity, similar to what cryoDRGN gave us?</p><p>We could simply follow a practice similar to cryoDRGN. But, remember, cryoDRGN2 is NOT a VAE! So we don&#8217;t have access to the underlying distribution outright, but there&#8217;s a simple fix. We simply take the embeddings of any given particle + pose info and use that to sift through the possible conformations. Simple! 
Ab initio <strong>and</strong> heterogeneous reconstruction!</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!_w8t!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fabe3db94-02f9-4e97-a576-2e35f819277d_1152x1088.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!_w8t!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fabe3db94-02f9-4e97-a576-2e35f819277d_1152x1088.png 424w, https://substackcdn.com/image/fetch/$s_!_w8t!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fabe3db94-02f9-4e97-a576-2e35f819277d_1152x1088.png 848w, https://substackcdn.com/image/fetch/$s_!_w8t!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fabe3db94-02f9-4e97-a576-2e35f819277d_1152x1088.png 1272w, https://substackcdn.com/image/fetch/$s_!_w8t!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fabe3db94-02f9-4e97-a576-2e35f819277d_1152x1088.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!_w8t!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fabe3db94-02f9-4e97-a576-2e35f819277d_1152x1088.png" width="464" height="438.22222222222223" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/abe3db94-02f9-4e97-a576-2e35f819277d_1152x1088.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1088,&quot;width&quot;:1152,&quot;resizeWidth&quot;:464,&quot;bytes&quot;:576963,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!_w8t!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fabe3db94-02f9-4e97-a576-2e35f819277d_1152x1088.png 424w, https://substackcdn.com/image/fetch/$s_!_w8t!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fabe3db94-02f9-4e97-a576-2e35f819277d_1152x1088.png 848w, https://substackcdn.com/image/fetch/$s_!_w8t!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fabe3db94-02f9-4e97-a576-2e35f819277d_1152x1088.png 1272w, https://substackcdn.com/image/fetch/$s_!_w8t!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fabe3db94-02f9-4e97-a576-2e35f819277d_1152x1088.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div>
<p>One note: if you&#8217;re feeling confused about why a randomly initialized model works <strong>at all</strong> to grab the (clearly useful) initial poses&#8230;I&#8217;m in a similar boat. That&#8217;s the one part here that doesn&#8217;t make sense to me. Hopefully someone more knowledgeable than me reads this post and offers a good explanation! I&#8217;ll update it if that happens.</p><h2>Compositional heterogeneity</h2><p>This section wasn&#8217;t actually supposed to be a part of this essay, but a paper announcement from, yet again, Zhong&#8217;s lab during NeurIPS 2024 last week forced my hand. In a good way! This final section really rounds everything out quite well. </p><p>So, we&#8217;ve discussed conformational heterogeneity and ab initio reconstruction. The two biggest challenges to making cryo-EM more accessible (from a computational lens) are, at least on paper, solved. Surely there isn&#8217;t anything else! 
But there&#8217;s actually one way we could make cryo-EM even <strong>better</strong>: <strong>allow multiple proteins to be imaged at once, all on the same grid.</strong> </p><p><strong>Compositional heterogeneity.</strong> </p><p>You&#8217;d be forgiven for thinking this wasn&#8217;t even possible. This essay has been strongly focused on purified, singular proteins, and has never even implied that mixing different proteins together was on the table. </p><p>That&#8217;s because, for most people, compositional heterogeneity is an unfortunate accident. </p><p>If you&#8217;re trying to image a virus with an antibody attached to it, your grids will inevitably have some particles with only the virus or only the antibody. A similar phenomenon may happen when imaging large multi-chain proteins, some of which may be missing a subunit. There are methods to deal with this, but most people seem to treat it as a (thing I want) vs (things I don&#8217;t want) problem: binary classification. </p><p>But what if, instead, you purposefully put several fully independent proteins on your grid, and wanted to characterize the structure of each one? And, not only that, but you also wanted to keep the nice conformational heterogeneity awareness <strong>and</strong> ab initio reconstruction from cryoDRGN and cryoDRGN2? 
</p><p>Enter <a href="https://hydra.cs.princeton.edu/">Hydra</a>, from the paper &#8216;<a href="https://arxiv.org/abs/2412.09420">Mixture of Neural Fields for Heterogeneous Reconstruction in Cryo-EM</a>&#8217;, published in 2024, where the authors demonstrated exactly that for <strong>three structurally different proteins.</strong>  </p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!vfwW!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf96caee-5684-4e1d-881e-b44e43bd31a2_1851x702.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!vfwW!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf96caee-5684-4e1d-881e-b44e43bd31a2_1851x702.png 424w, https://substackcdn.com/image/fetch/$s_!vfwW!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf96caee-5684-4e1d-881e-b44e43bd31a2_1851x702.png 848w, https://substackcdn.com/image/fetch/$s_!vfwW!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf96caee-5684-4e1d-881e-b44e43bd31a2_1851x702.png 1272w, https://substackcdn.com/image/fetch/$s_!vfwW!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf96caee-5684-4e1d-881e-b44e43bd31a2_1851x702.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!vfwW!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf96caee-5684-4e1d-881e-b44e43bd31a2_1851x702.png" width="1851" height="702" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/df96caee-5684-4e1d-881e-b44e43bd31a2_1851x702.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:702,&quot;width&quot;:1851,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:988510,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!vfwW!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf96caee-5684-4e1d-881e-b44e43bd31a2_1851x702.png 424w, https://substackcdn.com/image/fetch/$s_!vfwW!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf96caee-5684-4e1d-881e-b44e43bd31a2_1851x702.png 848w, https://substackcdn.com/image/fetch/$s_!vfwW!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf96caee-5684-4e1d-881e-b44e43bd31a2_1851x702.png 1272w, https://substackcdn.com/image/fetch/$s_!vfwW!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf96caee-5684-4e1d-881e-b44e43bd31a2_1851x702.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div>
<p>How does it work? Really&#8230;it&#8217;s quite similar to cryoDRGN2. The 5D pose search process is a fair bit more efficient, but a lot of the general concepts are the same. You&#8217;re still trying to derive pose information from images via the pose search, after which you feed that and the image into a model that reconstructs a 3D structure, reproject that structure to 2D via the pose information, and compare the true 2D image to the predicted 2D projection. So, we can capture conformational heterogeneity and have ab initio reconstruction right out of the box.</p><p>How do we deal with compositional heterogeneity? 
Pretty simple: instead of a single model handling 3D reconstruction, there are now K models (each referred to as a <a href="https://arxiv.org/pdf/2003.08934">neural field</a> in the paper, but, really, it&#8217;s just a formal name for the same kind of model as before, one that takes in 2D particle images + poses and outputs a 3D structure), where K is the number of structures that exist in your cryo-EM dataset. Each model specializes in learning one type of protein structure. How do we decide which model gets which protein? We don&#8217;t! When a 2D particle image comes in, all K models try to explain it and we learn which model explains it best, while also figuring out the pose and conformational state. All of the K models are connected in the loss function, so they&#8217;ll naturally learn to specialize in a specific protein class (admittedly, I&#8217;m skipping over a bit of confusing math here regarding <strong>how</strong> they are connected...take a look at section 3.3 of the paper for more details). <strong>At inference time, we simply go with the prediction of the model that seems to explain a given particle the best.</strong></p><p>How do we pick K if we don&#8217;t actually know how many proteins are in our solution? It&#8217;s not heavily discussed in the paper, but the authors do imply that an oversized K works fine, and an undersized K looks clearly off. In a three-protein problem, a K of 1 failed, a K of 3 worked, and a K of 5 ended up with 2 models that never explained any particle with high likelihood. So&#8230;just mess with K until you get something reasonable. 
Who knows how well this scales; hopefully someone discusses that in a future paper.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ph7l!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1d8d2195-370c-4996-ab9e-6ab5563710ed_858x402.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ph7l!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1d8d2195-370c-4996-ab9e-6ab5563710ed_858x402.png 424w, https://substackcdn.com/image/fetch/$s_!ph7l!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1d8d2195-370c-4996-ab9e-6ab5563710ed_858x402.png 848w, https://substackcdn.com/image/fetch/$s_!ph7l!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1d8d2195-370c-4996-ab9e-6ab5563710ed_858x402.png 1272w, https://substackcdn.com/image/fetch/$s_!ph7l!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1d8d2195-370c-4996-ab9e-6ab5563710ed_858x402.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!ph7l!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1d8d2195-370c-4996-ab9e-6ab5563710ed_858x402.png" width="858" height="402" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/1d8d2195-370c-4996-ab9e-6ab5563710ed_858x402.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:402,&quot;width&quot;:858,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:239638,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!ph7l!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1d8d2195-370c-4996-ab9e-6ab5563710ed_858x402.png 424w, https://substackcdn.com/image/fetch/$s_!ph7l!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1d8d2195-370c-4996-ab9e-6ab5563710ed_858x402.png 848w, https://substackcdn.com/image/fetch/$s_!ph7l!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1d8d2195-370c-4996-ab9e-6ab5563710ed_858x402.png 1272w, https://substackcdn.com/image/fetch/$s_!ph7l!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1d8d2195-370c-4996-ab9e-6ab5563710ed_858x402.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div>
<p>Again, compositional heterogeneity isn&#8217;t really ever exploited! The typical cryo-EM workflow has evolved to actively avoid dealing with it. It&#8217;s super interesting that ML has, in this area, somewhat outpaced what the wet lab techniques are actually capable of doing. And to be clear, it intuitively feels like physically preparing a compositionally heterogeneous sample is <strong>hard</strong>; you&#8217;d have to deal with potential aggregation, protein-protein interactions, and uneven distribution of proteins in the vitreous ice. The future will tell us how hard scaling here is!</p><h1>What&#8217;s left?</h1><p>In this piece, I&#8217;ve primarily walked through how machine learning is changing cryo-EM reconstruction. <strong>But there&#8217;s still so much that I didn&#8217;t explore</strong>. 
Just last month, there was a paper applying <a href="https://www.nature.com/articles/s41592-024-02505-1">ML to the preferred orientation problem</a>. There&#8217;s also <a href="https://www.tandfonline.com/doi/full/10.1080/07366205.2024.2355771#d1e212">ongoing work</a> trying to derive <a href="https://www.owlposting.com/p/a-primer-on-molecular-dynamics">molecular dynamics</a> information from cryo-EM maps. I also stumbled across <a href="https://elifesciences.org/reviewed-preprints/103486">this paper that is trying to use ML to reduce the necessary concentrations of purified proteins</a> in cryo-EM by making the particle picking process even better. It goes on from there: in basically every step of the sample preparation process I discussed above, there is <strong>someone</strong> throwing machine learning at the problem. The jury is still out on how valuable that is, but still!</p><p>And, even amongst the methods I&#8217;ve written about here, there is still work left to do! Cryo-EM reconstruction is still very much in the early days with respect to these newer tools, partially due to the relative hesitance of structural biologists to modify pre-existing workflows, and partially because methods like cryoDRGN2 still have <strong>plenty</strong> of kinks left to be worked out. </p><p>Do we expect structural models like AlphaFold to be able to replace the role of cryo-EM anytime soon? As with pretty much all existing wet lab techniques, it is unlikely. The determination of ultra-large protein complexes (which cryo-EM excels at) via purely computational methods is still in a hazy place <a href="https://www.nature.com/articles/s41592-024-02174-0">circa 2024</a>, though of course it may improve with time. </p><p><strong>And, ultimately, structural data is important for these models to work at all.</strong> The PDB grew over decades, painstakingly curated by thousands of scientists. 
While many view the <a href="https://www.rcsb.org/">Protein Data Bank</a> as largely exhausted in utility (a form of fossil fuel that allowed for the creation of AlphaFold2), there may still be room for it to grow massively overnight. <strong>Compositional heterogeneity in cryo-EM feels like a brand new way of thinking about structure determination, something that may allow us to simultaneously characterize tens, hundreds, perhaps thousands of proteins all in one experiment</strong>. Is there a world in which we could double the sum total of structural data in a single year? Perhaps. And if there is, it feels deeply likely that cryo-EM, and the machine learning driving its future, will play a role in that. </p><p>That&#8217;s it, thank you for reading!</p><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-1" href="#footnote-anchor-1" class="footnote-number" contenteditable="false" target="_self">1</a><div class="footnote-content"><p>When cryo-EM first came onto the scene, it <strong>had</strong> to use liquid ethane for cooling, which is kinda a pain to handle. Liquid nitrogen, which is preferable to work with, is not capable of cooling samples fast enough. Why not? People originally thought it was because of the <a href="https://en.wikipedia.org/wiki/Leidenfrost_effect">Leidenfrost effect</a>! The nitrogen would produce an insulating layer of nitrogen gas around samples dropped into it, which slows down the cooling, giving water molecules enough time to form the aforementioned crystals. </p><p>But a <a href="https://journals.iucr.org/m/issues/2021/06/00/eh5013/index.html">study circa 2021</a> challenges this: the authors assert that the issue with liquid nitrogen has little to do with the Leidenfrost effect, and that the real problem is a thick layer of cold gas that sits atop liquid nitrogen and pre-cools samples before they hit the liquid! With liquid ethane, this gas layer is simply thinner. 
Once the gas layers were removed from both, the two cryogens performed roughly the same. </p><p>So&#8230;why do people still use liquid ethane? The best answer I could find is that people doing experimental structure determination have an incredibly hard job, and will not easily switch over to new methodologies unless doing so is quite easy. And the existing toolset works well, so why switch? </p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-2" href="#footnote-anchor-2" class="footnote-number" contenteditable="false" target="_self">2</a><div class="footnote-content"><p>The common answer is that copper is meant for structural support of the overlaid carbon film, which is quite flimsy. But then there&#8217;s an even more obvious question: why can&#8217;t we just abandon the carbon film entirely and go with copper grids with even smaller holes? Well&#8230;I&#8217;m not sure. If someone knows the answer to this, reach out to me! I&#8217;ll update the post with the answer if I find it. </p><p>Edit: Answered! Turns out this grid-of-grid system with the exact same material exists already, such as <a href="https://www.quantifoil.com/products/ultraufoil">UltrAuFoil</a> and <a href="https://www.quantifoil.com/products/hexaufoil">HexAuFoil</a>, though it&#8217;s gold instead of copper. 
</p><p></p></div></div>]]></content:encoded></item><item><title><![CDATA[A primer on the current state of longevity research ]]></title><description><![CDATA[4k words, 19 minutes reading time]]></description><link>https://www.owlposting.com/p/some-questions-and-answers-i-had</link><guid isPermaLink="false">https://www.owlposting.com/p/some-questions-and-answers-i-had</guid><dc:creator><![CDATA[Abhishaike Mahajan]]></dc:creator><pubDate>Sat, 27 Jul 2024 01:04:26 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!y_4U!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0b67a80a-ae24-488e-a5ef-f3f293fab02e_2912x1632.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!y_4U!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0b67a80a-ae24-488e-a5ef-f3f293fab02e_2912x1632.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!y_4U!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0b67a80a-ae24-488e-a5ef-f3f293fab02e_2912x1632.png 424w, https://substackcdn.com/image/fetch/$s_!y_4U!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0b67a80a-ae24-488e-a5ef-f3f293fab02e_2912x1632.png 848w, https://substackcdn.com/image/fetch/$s_!y_4U!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0b67a80a-ae24-488e-a5ef-f3f293fab02e_2912x1632.png 1272w, 
https://substackcdn.com/image/fetch/$s_!y_4U!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0b67a80a-ae24-488e-a5ef-f3f293fab02e_2912x1632.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!y_4U!,w_2400,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0b67a80a-ae24-488e-a5ef-f3f293fab02e_2912x1632.png" width="1200" height="672.5274725274726" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0b67a80a-ae24-488e-a5ef-f3f293fab02e_2912x1632.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;large&quot;,&quot;height&quot;:816,&quot;width&quot;:1456,&quot;resizeWidth&quot;:1200,&quot;bytes&quot;:8609522,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.owlposting.com/i/146150062?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0b67a80a-ae24-488e-a5ef-f3f293fab02e_2912x1632.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:&quot;center&quot;,&quot;offset&quot;:false}" class="sizing-large" alt="" srcset="https://substackcdn.com/image/fetch/$s_!y_4U!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0b67a80a-ae24-488e-a5ef-f3f293fab02e_2912x1632.png 424w, https://substackcdn.com/image/fetch/$s_!y_4U!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0b67a80a-ae24-488e-a5ef-f3f293fab02e_2912x1632.png 848w, 
https://substackcdn.com/image/fetch/$s_!y_4U!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0b67a80a-ae24-488e-a5ef-f3f293fab02e_2912x1632.png 1272w, https://substackcdn.com/image/fetch/$s_!y_4U!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0b67a80a-ae24-488e-a5ef-f3f293fab02e_2912x1632.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p><em>Note: This post is co-authored with <a href="https://www.linkedin.com/in/li-stacy/">Stacy Li</a>, a PhD student at 
Berkeley studying aging biology! Highly appreciate all her help in writing, editing, and fact-checking my understanding!</em></p><p>Audio version of this essay:</p><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;584c4085-e49f-4adf-865c-be10d7a3e3ff&quot;,&quot;caption&quot;:&quot;I go through some questions I&#8217;ve had about longevity research, with plenty of background info, and give the answers I&#8217;ve found.&quot;,&quot;cta&quot;:null,&quot;showBylines&quot;:true,&quot;size&quot;:&quot;sm&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;Some questions (and answers) I had about longevity research (Audio)&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:223596199,&quot;name&quot;:&quot;Abhishaike Mahajan&quot;,&quot;bio&quot;:&quot;biology posting&quot;,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8bb7749b-731a-42ac-96d4-e823c76fd218_400x400.jpeg&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2024-07-27T01:07:27.893Z&quot;,&quot;cover_image&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f0404112-a6f9-495d-bb8b-f053f2fdfc4f_1024x1024.png&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://www.owlposting.com/p/some-questions-and-answers-i-had-e8b&quot;,&quot;section_name&quot;:&quot;Podcast&quot;,&quot;video_upload_id&quot;:null,&quot;id&quot;:147053083,&quot;type&quot;:&quot;podcast&quot;,&quot;reaction_count&quot;:0,&quot;comment_count&quot;:0,&quot;publication_id&quot;:null,&quot;publication_name&quot;:&quot;Owl 
Posting&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2caaecbe-ec6f-4c50-9596-c60ebade9ad3_400x400.png&quot;,&quot;belowTheFold&quot;:false,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><ol><li><p><a href="https://www.owlposting.com/i/146150062/introduction">Introduction</a></p></li><li><p><a href="https://www.owlposting.com/i/146150062/did-the-therapeutic-focus-on-sirtuins-amount-to-much">Did the therapeutic focus on sirtuins amount to much?</a></p><ol><li><p>Background</p></li><li><p>Answer</p></li></ol></li><li><p><a href="https://www.owlposting.com/i/146150062/have-the-longevity-focused-research-institutionsgrant-programsstartups-led-to-anything-significant">Have the longevity-focused research institutions/grant programs/startups led to anything significant?</a></p><ol><li><p>Background</p></li><li><p>Answer</p></li></ol></li><li><p><a href="https://www.owlposting.com/i/146150062/has-cellular-reprogramming-yielded-anything-useful">Has cellular reprogramming yielded anything useful?</a></p><ol><li><p>Background</p></li><li><p>Answer</p></li></ol></li><li><p><a href="https://www.owlposting.com/i/146150062/whats-the-state-of-biological-clocks">What&#8217;s the state of biological clocks?</a></p><ol><li><p>Background</p></li><li><p>Answer</p></li></ol></li></ol>
<h1>Introduction</h1><p>The last time I read deeply about aging research was around 2021. My general impression then was that aging research was getting more and more funding (good!). Unfortunately, none of the money had led to actionable or useful insights (bad). </p><p>Over time, you get slightly burnt out by all the negative news. </p><p>After getting a job in biotech, I kept a hazy eye on the subject but mostly tuned out of it. But, especially today, I am curious: how has the aging field progressed in the last few years? Since 2021, what has changed?</p><p>In this post, I&#8217;ll share a list of immediate questions about the state of affairs in aging research, and the answers I&#8217;ve found for them. For each question, I&#8217;ll offer some basic background knowledge required to understand the question. Feel free to skip that part if you already understand the question! </p><h1><strong>Did the therapeutic focus on sirtuins amount to much?</strong></h1><h2><strong>Background</strong></h2><p>Sirtuins are a family of signaling proteins, commonly referred to by their corresponding gene names: SIRT1, SIRT2, all the way up to SIRT7. Their primary role is deacetylation, which is just the removal of a chemical marker (an acetyl group) from proteins. 
It was noticed in the 1980s that some sirtuin classes were especially involved in three key activities: modifying histones, which are proteins that tune the accessibility of DNA in the nucleus; transcriptional modification, which determines how DNA is interpreted by the body; and DNA repair, which speaks for itself. And anything involved in modifying and maintaining DNA is something worth paying attention to!</p><p>Studies in the 2000s showed that the activity of specific sirtuin classes strongly correlated with age; the young had more sirtuin activity, and the old had less. This seemed to be causative in aging: <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC317077/">overexpressing certain sirtuin genes led to lifespan increase</a> and <a href="https://www.nature.com/articles/35001622#:~:text=Sir2%20is%20a%20limiting%20component%20that%20promotes%20longevity,much%20longer%20life%20span%20than%20wild%20type%206.">downregulation of them led to lifespan decrease</a>. The results were a bit mixed, and they came from yeast cells &#8212; always a red flag &#8212; but there was some promise in viewing sirtuins as an aging target.</p><p>It turns out that editing humans to safely overexpress sirtuin genes is somewhat hard to do (as is expressing any gene in humans). But there was an easy way around that: focus on molecules that sirtuins require to do their job. A class of therapeutics grew from this: <a href="https://en.wikipedia.org/wiki/Sirtuin-activating_compound">sirtuin-activating compounds</a>.</p><p>How do you activate sirtuins?</p><p>Well, sirtuins are dependent on NAD+, or nicotinamide adenine dinucleotide, to perform their primary function. Increasing cellular NAD+ levels could therefore be a way to indirectly push for more sirtuin activity. Practically speaking, the bioavailability of NAD+ itself is poor, so supplementation with NAD+ precursors, such as nicotinamide mononucleotide (NMN) and nicotinamide riboside (NR), was used instead. 
There are plenty of other compounds in this category too: resveratrol, fisetin, and quercetin are all names you may hear mentioned.</p><p>How has this fared?</p><h2><strong>Answer</strong></h2><p><em>TLDR: The whole sirtuin theory was losing steam by the time I started reading about it a few years ago. It&#8217;s only gotten worse. Nothing particularly useful has come from sirtuin-focused therapies, and likely nothing ever will.</em></p><p>A <a href="https://www.cell.com/cell-metabolism/fulltext/S1550-4131%2818%2930112-8?%7B%24trackingTag%7D=&amp;amp%3Belsca2=email&amp;amp%3Belsca3=1550-4131_20180306_27_3_&amp;amp%3Belsca4=Cell+Press">Cell Metabolism paper from 2018</a> found that NAD+ precursor supplementation didn&#8217;t improve mouse longevity. To be fair, they did show that supplementation improves some aspects of health-span, specifically improved glucose metabolism and reduced oxidative stress to the liver in aged mice, so it is still potentially useful. But nothing revolutionary. <strong>Still,</strong> <strong>human clinical trials for sirtuin-activating compounds were just beginning around 2021, so there was some nascent hope that something interesting would come from them</strong>.</p><p>But, as is usually the case, yeast cells aren&#8217;t a great model for drugs, and the death of the sirtuin theory has only accelerated as it has been tested in more complex organisms.</p><p><a href="https://www.sciencedirect.com/science/article/pii/S2161831323013595">A 2023 review of all ongoing NAD+ focused clinical trials found underwhelming results.</a> While NAD+ supplementation may have promise for diseased populations and for helping healthspan (very, very mildly), it doesn&#8217;t seem to be the wonder longevity drug that people initially thought it was.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" 
href="https://substackcdn.com/image/fetch/$s_!J2HU!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcd60e1b5-3865-4bb3-ad8b-26ecc3e249b3_543x610.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!J2HU!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcd60e1b5-3865-4bb3-ad8b-26ecc3e249b3_543x610.jpeg 424w, https://substackcdn.com/image/fetch/$s_!J2HU!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcd60e1b5-3865-4bb3-ad8b-26ecc3e249b3_543x610.jpeg 848w, https://substackcdn.com/image/fetch/$s_!J2HU!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcd60e1b5-3865-4bb3-ad8b-26ecc3e249b3_543x610.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!J2HU!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcd60e1b5-3865-4bb3-ad8b-26ecc3e249b3_543x610.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!J2HU!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcd60e1b5-3865-4bb3-ad8b-26ecc3e249b3_543x610.jpeg" width="543" height="610" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/cd60e1b5-3865-4bb3-ad8b-26ecc3e249b3_543x610.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:610,&quot;width&quot;:543,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!J2HU!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcd60e1b5-3865-4bb3-ad8b-26ecc3e249b3_543x610.jpeg 424w, https://substackcdn.com/image/fetch/$s_!J2HU!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcd60e1b5-3865-4bb3-ad8b-26ecc3e249b3_543x610.jpeg 848w, https://substackcdn.com/image/fetch/$s_!J2HU!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcd60e1b5-3865-4bb3-ad8b-26ecc3e249b3_543x610.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!J2HU!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcd60e1b5-3865-4bb3-ad8b-26ecc3e249b3_543x610.jpeg 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div>
<p><a href="https://academic.oup.com/lifemeta/article/1/2/122/6711379">More importantly, per a review paper from 2022, the whole basis of sirtuins as longevity genes is likely fundamentally flawed.</a> This doesn&#8217;t mean NAD+ supplementation itself is a bad thing, as NAD+ is central to metabolism and supplementation has minor benefits, just that the focus on its effect on sirtuins has led the field astray. <a href="https://www.fightaging.org/archives/2011/09/sirtuins-are-increasingly-looking-like-a-dead-end/">A prescient blog post from 2011 sketches out the bear argument much more deeply</a>.</p><h1><strong>Have the longevity-focused research institutions/grant programs/startups led to anything significant?</strong></h1><h2><strong>Background</strong></h2><p>Longevity research was, for a very long time, <a href="https://www.youtube.com/watch?v=qzQBrEnbCT0&amp;t=1s">viewed as near pseudoscience by</a> much of the academic community. 
It was characterized as lacking rigor, being driven by cranks, and generally not being worth most researchers&#8217; time. Cynthia Kenyon&#8217;s pioneering work in the 1990s at UCSF, which showed that single-gene mutations could dramatically extend lifespan in <em>C. elegans</em>, changed this. Her work was not only valuable scientifically, but also reputationally amongst academics, elevating longevity research to something more respectable.</p><p>But it remained underfunded at the federal level, partially because aging still wasn&#8217;t considered a disease, so NIH grants for it were slim. Researchers focused on aging were forced to apply to tangentially related grants on, for example, Alzheimer&#8217;s, and <a href="https://www.nia.nih.gov/research/blog/2017/11/yes-researchers-basic-biology-aging-can-be-funded-alzheimers-money">often worried that they&#8217;d be rejected due to a lack of background in Alzheimer&#8217;s specifically</a>.</p><p>As of 2024, that has completely changed. Today, <a href="https://www.nia.nih.gov/research/blog/2023/09/looking-ahead-new-cleared-concepts-aging-research?utm_source=nia-twitter&amp;utm_medium=social&amp;utm_campaign=news-20231117">far more NIH grants are dedicated specifically to subfields within aging</a>. While the number of these grants still pales in comparison to diabetes or cancer-related grants, there is far more federal recognition of longevity as a useful scientific topic worthy of study.</p><p>There&#8217;s also an immense amount of non-federal money flowing in, much of it popping up in the last 3-4 years. <a href="https://age1.com/">Age1</a> is a VC firm that focuses purely on longevity startups. <a href="https://impetusgrants.org/">Impetus Grants</a> gives equity-free money to scientists who are doing longevity projects. 
Altos Labs, Retro Bio, and NewLimit are all billionaire-backed, for-profit longevity startups.</p><h2><strong>Answer</strong></h2><p><em>TLDR: No, but it&#8217;s a bad question.</em></p><p>After writing the background and pondering what the answer would be, I&#8217;ve realized the question is a bit unfair.</p><p>For one, it&#8217;s still very early. Good biology research in general can take years, and good aging research can take even longer. Seeing any semblance of an ROI within three years of increased longevity funding is a pipe dream; we should expect it on the order of ten years.</p><p>But, more importantly, it&#8217;s hard to tie the funding of specific institutions back to clinically relevant outcomes, because you never know what the counterfactual would be. Maybe the next big longevity discovery doesn&#8217;t come from an aging-focused institution, but rather from a more basic metabolic disorders lab that used a diabetes grant to fund itself. Does that mean the longevity dollars were wasted? I don&#8217;t think so.</p><p>Scientific discovery works in strange ways: longevity research could end up impacting fields far removed from itself, leading to returns in unexpected ways. As long as the research questions being investigated using this influx of money are interesting, I think the money is well spent. And I think those research questions are <strong>very</strong> interesting.</p><p>Unlike the longevity institutions of the past, most of these newer ones have far grander ambitions than interventions like calorie restriction, supplements, and sauna usage. Instead, they focus on areas that, if realized, would yield fundamental step changes in human lifespan: fields like cryogenics and cellular reprogramming. 
If there is anybody in the world I&#8217;d trust to be given billions of dollars, it&#8217;d be smart and ambitious people with that research plan.</p><h1><strong>Has cellular reprogramming yielded anything useful?</strong></h1><p><em>Note: this section genuinely would not be possible if not for <a href="https://www.adanguyenx.com/blog/partial-reprogramming">Ada Nguyen&#8217;s amazing deep-dive on partial cellular reprogramming</a>. Highly recommend reading her article if deeply interested in this topic.</em> </p><h2><strong>Background</strong></h2><p>In 2006, Shinya Yamanaka discovered that upon introducing 4 transcription factors (proteins that regulate the DNA&#8594;protein process) to skin cells, the cells would slowly convert themselves to stem cells. These cells were referred to as <strong>induced pluripotent stem cells</strong> (iPSCs); induced because the conversion was &#8216;forced&#8217; to happen by the transcription factors, and pluripotent because they could re-differentiate into any other cell type (heart cell, liver cell, etc). Yamanaka won the 2012 Nobel Prize for this discovery, later deemed &#8216;cellular reprogramming&#8217;.</p><p>iPSCs were somewhat of a revolution in the stem cell field generally because they meant we were able to mass produce stem cells from ordinary cells. While the 4 discovered transcription factors &#8212; <em>Oct4</em>, <em>Sox2</em>, <em>Klf4</em>, and <em>c-Myc</em>, also often called <em>OSKM</em> or <em>Yamanaka Factors</em> &#8212; weren&#8217;t universal across cell types, the concept was. <strong>Nearly every cell had a genetic switch for being turned into iPSCs.</strong></p><p>But the relevance of iPSCs to the discussion of longevity has little to do with the &#8216;stem cell&#8217; part of it. The process by which a differentiated cell turns into a stem cell is gradual, taking weeks. 
Over this period, as cellular identity is slowly being stripped away, the cell is also <strong>rejuvenating on a biochemical level</strong>.</p><p><a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9845736/">Mitochondrial morphology is improved. Epigenetic noise is stripped away. Telomeres are lengthened.</a> And, upon re-differentiation of the iPSC to a cell type, the improvements are retained! <strong>One could make an argument that cellular reprogramming leads to age reversal. </strong>Is this <strong>really</strong> age reversal though? Well, we&#8217;ll get into that in a second. On the surface though, it does seem like there is something relevant to longevity going on here.</p><p>Cellular reprogramming was initially more of a curiosity than something clinically translatable, since in-vivo iPSC conversion would be massively disruptive to a complex organism. You <strong>could</strong> do it ex-vivo: remove cells from a human, allow them to undergo the iPSC conversion + redifferentiation process, and transplant them back in. Unfortunately, that&#8217;s infeasible for most tissue types. However, the 2016 discovery of <a href="https://www.cell.com/cell/fulltext/s0092-8674(16)31664-6">partial cellular reprogramming</a> by Ocampo et al. reignited interest in the therapeutic potential of this approach.</p><p>Typical cellular reprogramming requires cells to be constantly exposed to OSKM over weeks, as the reprogramming process also takes weeks. <strong>But if you instead halt exposure to reprogramming factors after just a few days, cells can retain their original identity while still holding onto the rejuvenation benefits.</strong> Such a methodology is termed &#8216;partial cellular reprogramming&#8217;. 
This means that in-vivo cellular reprogramming is a very real possibility and, indeed, <a href="https://www.nature.com/articles/s43587-022-00183-2">has been done</a>.</p><p>Quick note: <a href="https://www.nature.com/articles/s42003-024-06328-w">in practice, in-vivo partial cellular reprogramming is done cyclically!</a> The transcription factors are expressed for a few days, turned off (via doxycycline-inducible promoters) for a few days, and repeated. This allows cells to slowly &#8216;accrue&#8217; the positive impacts of reprogramming, while also allowing time to recover from the cellular stress of reprogramming. <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10861195/">Singular, short doses of transcription factors also have some positive impact</a>, but generally less so than the cyclic approach.</p><p>The stage was set for a brand new therapeutic platform based on partial cellular reprogramming and, accordingly, money flowed in. Altos Labs <a href="https://www.drugdiscoverytrends.com/biotech-altos-labs-emerges-with-3b-in-funding-to-focus-on-cellular-rejuvenation-programming/">launched in 2022 with $3 billion</a> and backing from Jeff Bezos. Retro Biosciences <a href="https://www.technologyreview.com/2023/03/08/1069523/sam-altman-investment-180-million-retro-biosciences-longevity-death/">launched in 2023 with $180 million</a> and backing from Sam Altman. NewLimit <a href="https://blog.newlimit.com/p/newlimit-series-a">launched in 2023 with $40 million</a> and backing from Brian Armstrong. All of them rely on cellular reprogramming being a reliable, robust, and efficacious therapeutic for extending human lifespan.</p><p>Of course, as is always the case in biology, there are caveats to the whole approach.</p><ol><li><p><strong>Reprogramming is difficult to do safely.</strong> This is related to the delivery problem (covered below), but more specific. 
Even if you can correctly deliver a reprogramming therapy to all the cells in your body, there are tons of medically concerning &#8216;gotchas!&#8217;. While all cells can be reprogrammed, <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9845736/">there are degrees of &#8216;plasticity&#8217;.</a> For example, hepatocytes can be reprogrammed rapidly, while cardiomyocytes take more time. If you go too far with the reprogramming, you get tumors; too little, and there&#8217;s no benefit. This complicates any attempt to reprogram, as it means each cell type requires a hyper-controlled dosage of reprogramming factors &#8212; something that modern medicine isn&#8217;t currently capable of. Even localized injection of these transcription factors doesn&#8217;t solve this, as most organs and tissues are heterogeneous (composed of different cell types).</p></li><li><p><strong>While in-vivo partial cellular reprogramming improves some known biomarkers of age, it doesn&#8217;t benefit everything.</strong> As an example, epigenetic noise is stripped away during reprogramming, but telomere length and DNA mutations stay largely the same. This is especially obvious when you look at the in-vivo results of partial cellular reprogramming. While some papers do report lifespan benefits, it isn&#8217;t immortality; it is more on the order of a 10-20% bump in average lifespan along with a bevy of health-span benefits (increased grip strength, improved body composition, etc). Impressive, but there are facets of aging that are unaffected by cellular reprogramming.</p></li><li><p><strong>In-vivo delivery of reprogramming therapies is challenging. </strong>As with every otherwise promising genetic therapy, getting them to where they need to go is an unsolved problem. Plugging the company I work at, <a href="https://www.dynotx.com/">Dyno Therapeutics</a>, as someone trying to solve the issue, but it&#8217;s still early days. 
Nobody yet has a tool that can deliver genetic material into every desired cell in your body; we&#8217;re years away from that. The only reliable way to do this in animal models is by editing germ-line cells, but we obviously can&#8217;t do that in humans.</p></li></ol><h2><strong>Answer</strong></h2><p><em>TLDR: Cellular reprogramming has shown promise in animal studies. However, no therapies have reached clinical trials yet. One is close though!</em></p><p>There are several promising animal results. One of the strongest results here is an <a href="https://www.biorxiv.org/content/10.1101/2023.01.04.522507v2">early 2023 study by Macip et al.</a> demonstrating that cyclic expression of transcription factors in wild-type mice could extend their remaining lifespan by 109% when started at a very old age (124 weeks). This translates to 8.86 weeks of life remaining in the control group versus 18.5 weeks for the treatment group. Even more interestingly, they achieved this with AAV-delivered gene therapy! It is as close to &#8216;<em>how humans would receive the therapy</em>&#8217; as one could get. One of the major caveats about animal studies is that they rarely translate easily to humans, but still! Impressive!</p><p>There are a fair number of more concrete concerns with the study, <a href="https://what-the-cell.com/2023/03/06/back-to-the-benjamin-button-mice-making-old-mice-lived-longer/">such as the ones detailed here</a>, but the usual concerns with longevity papers aren&#8217;t an issue here. The therapy was AAV-delivered, it was done with genetically unmodified animals, and there was a clear + strong signal of lifespan increase instead of a more nebulous &#8216;biological clock&#8217; decrease. The main issue is just that the sample size wasn&#8217;t large enough: only 8 control mice and 12 treated mice. Overall, it&#8217;s an excellent and well-supported result.</p><p>Ultimately, for-profit companies are the ones who are going to bring this stuff to the clinic. 
As of yet, there is nothing within reprogramming that is being actively used in clinical trials. However, one startup is close!</p><p>One may expect <a href="https://rejuvenatebio.com/">Rejuvenate Bio</a>, which published the earlier mouse reprogramming paper, to be this one startup. However, judging from news reports, they are focusing more on gene therapy for arrhythmogenic cardiomyopathy, a congenital (i.e., not age-related) condition. Longevity still seems to be on their radar, but I&#8217;m not seeing much further progress in that direction outside of the above paper.</p><p>So, if not Rejuvenate Bio, who else?</p><p><a href="https://www.turn.bio/">Turn Biotechnologies</a>. As far as I can tell, <a href="https://www.prnewswire.com/apac/news-releases/fda-meeting-feedback-puts-turn-biotechnologies-on-track-to-be-first-longevity-company-taking-cell-rejuvenation-therapy-to-clinic-301965868.html">they are the only one amongst all reprogramming startups</a> that has something close to the clinic: an injectable drug for dermatology. Interestingly, they also seem to be the only ones with <a href="https://static1.squarespace.com/static/5c18d21eb98a783ec469f783/t/64d587e03621587b1f05749e/1691715569526/TurnBio-ISID-2023-FINAL.pdf">positive results in human cells</a>! It&#8217;s entirely ex-vivo, but still, a fair bit closer to translation than anybody else. Even more interestingly, <a href="https://www.prnewswire.com/news-releases/turn-biotechnologies-launches-eturna-delivery-platform-designed-specifically-to-enable-nucleic-acid-therapeutics-301622784.html">they claim to have solved the delivery problem of transcription factors via &#8216;nanostructure carriers&#8217;.</a> I&#8217;m not finding many details on this approach, so we&#8217;ll see how well it fares. 
While results in ex-vivo settings are promising, in-vivo settings are where we&#8217;d want a promising drug to have results.</p><p>There are other interesting startups in the cellular reprogramming space, but there are relatively few details on their internal progress. There are really only two that have given some very mild insight into what&#8217;s internally going on: NewLimit and Altos Labs.</p><p><a href="https://blog.newlimit.com/">NewLimit gives monthly updates</a> on how their partial cellular reprogramming work is going. They are investing in a high-throughput discovery platform for finding partial reprogramming transcription factors, both from a wet-lab angle and an ML angle. Nothing revolutionary yet; there&#8217;s still a lot of set-up work going on internally.</p><p><a href="https://www.altoslabs.com/">Altos Labs</a> is notoriously private, but I did see a <a href="https://twitter.com/ydeigin/status/1808988520196743207">video a while back about their work on cellular reprogramming.</a> They report a 25% bump in total lifespan alongside some qualitative health-span improvements using a therapeutic based on Yamanaka factors (quoted from the video) across 1000 mice. How impressive is this? It very much depends on their experimental conditions. For example, it depends on what mice they are using. A lifespan statistic like this can be gamed by using mice with certain genetic disorders, such as progeria or metabolic conditions, leading to results that don&#8217;t necessarily transfer to healthy mice. Looking forward to the full paper on this!</p><h1><strong>What&#8217;s the state of biological clocks?</strong></h1><h2><strong>Background</strong></h2><p>Biological clocks are a class of methods that use biomarker data to predict an individual&#8217;s <em>biological age </em>(BA). 
These models are typically trained on samples with known <em>chronological age </em>(CA); consequently, the difference between an individual&#8217;s predicted BA and known CA suggests either acceleration or deceleration of biological aging. In other words, if your biological age is lower than your chronological age, you&#8217;re doing great.</p><p>Why do we care about biological clocks? For the same reason we care about HbA1c for type 2 diabetes (T2D) patients: <strong>we need some sort of clinically relevant and quantitative endpoint if we want to build longevity therapeutics</strong>. Why not just use total lifespan? For one, it takes a <strong>really</strong> long time to assess, but more importantly, it&#8217;s a very coarse-grained view of the potential impact of longevity therapeutics. If two drugs both lead to +5 years in total lifespan, but one operates by preventing tumors and the other by reducing sarcopenia, that&#8217;s information you&#8217;d like to know! It&#8217;s also information that we need in order to understand the real underpinnings of biological aging; thus, we have a need for more granular biological clocks.</p><p>There are some more easy-to-measure biomarkers on the table: <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6778477/">grip strength</a>, <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7681064/">graying hair</a>, and <a href="https://www.nature.com/articles/s41419-018-0765-9">wrinkled skin</a> all show strong correlation with age, even within mice and primates! But even these are still coarse; the biological action of a given therapeutic may still be obscured by a phenotype-specific endpoint. Ideally, we&#8217;d like something more molecular.</p><p><strong>The most commonly discussed biological clocks are </strong><em><strong>epigenetic</strong></em><strong> clocks</strong>. The chemical markers that cover your DNA &#8212; or epigenetics &#8212; affect how it&#8217;s converted to RNA, and thus, to proteins. 
These markers are massively dynamic throughout one&#8217;s life, being affected by everything from <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6275017/">diet</a> to <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9915026/#:~:text=In%20principle%2C%20there%20are%20two,targets%20(i.e.%2C%20receptors%20as%20their">medications</a> to even <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5835037/">lack of sleep</a>. One of these markers in particular, methyl groups, is highly relevant in estimating biological aging. <strong>A 2013 paper by Steve Horvath discovered that methylation sites &#8212; often referred to as CpG sites &#8212; are connected to age in a relatively simple way.</strong> As a creature ages, the ratio of unmethylated CpG to methylated CpG sites goes up, reflecting a general loss of methylation in aging. This is also known as <a href="https://pubmed.ncbi.nlm.nih.gov/24138928/">Horvath&#8217;s Clock</a>.</p><p>There are several other kinds of clocks, but they have had a relatively smaller footprint, scientifically speaking. Let&#8217;s rattle through them.</p><p><strong>Telomere length,</strong> the single hallmark of aging that has leaked into popular consciousness, is another way to measure aging. Telomeres are repetitive &#8216;ends&#8217; on chromosomes, likely meant to serve as a buffer against the inevitably noisy process of cell division. <a href="https://pubmed.ncbi.nlm.nih.gov/33552142/">The longer they are, the lower your biological age, or at least that&#8217;s what one would hope.</a></p><p><strong>Transcriptomics</strong> is another angle. The transcripts, or RNA, in your cells are dynamic to a similar degree as epigenetic markers. Likely even more so! 
And though it&#8217;s not as simple as a ratio of markers to age, like Horvath&#8217;s Clock, or longer-is-better, like telomeres, a mapping from transcripts to biological age may still be possible.<a href="https://onlinelibrary.wiley.com/doi/full/10.1111/acel.13320"> And indeed, early results on worms show it is!</a></p><p><strong>Proteomics </strong>examines proteins, the end product of transcripts and another potential marker of age. It&#8217;s the same story as with transcripts; proteins are important, dynamic, and potentially show a (complex) relationship with age.<a href="https://www.nature.com/articles/s41591-019-0673-2.epdf?sharing_token=i53wRtPOTh_C1l9nWzkuP9RgN0jAjWel9jnR3ZoTv0MI3Y1S6inq9eI1Y5GonWP6JFlmolgaRSsCpoEwhbJBYO-sewesiNlzKNRgKH4h_EiJdFpr7iIqtjh9LX8JbZoOIu7rD8fHSrgKjK1mH8DtWmTo_8HYrh8JoWki0iM6JOpGxg39Vq6lJ8G2qVPheshA3hWlA4BOa7YAHDSckKjeY3wALBsBcKb9-vTJ035nydg%3D&amp;tracking_referrer=www.statnews.com"> Again, early results show that they do</a>!</p><h2><strong>Answer</strong></h2><p><em>TLDR: Not great, not terrible. Lots of work left to do. Everyday consumers probably shouldn&#8217;t get their biological age tested.</em></p><p>Let&#8217;s go through all the methods from above, in the same order.</p><p><strong>Epigenetic clocks seem to be useful, but their interpretation is a bit complicated.</strong> While we&#8217;ve been discussing biological clocks, we&#8217;ve unconsciously accepted the axiom that as long as a clock correlates with chronological age, it also correlates with biological age, or, in other words, age-related physiological malfunction. But this isn&#8217;t necessarily true! Tree rings correlate with tree age, but don&#8217;t seem to match up as closely with the tree actually being worse off. Clocks based on epigenetics aren&#8217;t dissimilar to this.<a href="https://genomebiology.biomedcentral.com/articles/10.1186/s13059-019-1824-y"> Clearly, they do correlate with age-related cellular dysfunction</a>. 
<strong>But, depending on how the clock is constructed, they can also be driven by entirely time-based stochastic processes unrelated to cellular dysfunction</strong>.<a href="https://www.nature.com/articles/s43587-024-00634-y"> Three 2024 papers published in Nature Aging all confirm this</a>: the methylation status of 66 to 75 percent of methylation sites is driven by random processes. More importantly, this trend holds across different independent experiments and explains a large share of the predictive power of pre-existing clocks. To be clear, these methylation changes likely aren&#8217;t truly random, but quasi-random: they occur within specific genomic loci and tend to change in a direction that depends on the system&#8217;s initial methylation state. Our current inability to predict or explain these changes makes them appear random, but underlying deterministic mechanisms may exist that we haven't yet uncovered.<a href="https://www.nature.com/articles/s43587-022-00220-0?fromPaywallRec=false"> Similarly, a 2022 paper confirms that epigenetic age is correlated with some, but not all, known cellular hallmarks of aging</a>. All this to say that epigenetic clocks are certainly useful, but their role in giving a clear and simple endpoint for age-reversal therapeutics is fuzzier.</p><p><strong>Moving on, telomere length</strong> <strong>doesn&#8217;t actually seem to be predictive of much, so clocks based on it are suspect</strong>.<em> </em>A<a href="https://www.nature.com/articles/s41556-022-00842-x"> 2022 review</a> reports that the model of telomere shortening as a primary cause of aging has been largely superseded by a model of telomere <em>dysfunction</em>, where telomere damage matters more than length. What is telomere dysfunction, if not shortened lengths? Well, dysfunction is correlated with length, but it has more to do with telomere damage in general, especially chronic damage. 
This sort of damage, whether that&#8217;s due to oxidative stress or something else, can cause persistent inflammatory responses. In turn, this leads to aging-esque phenotypes. Notably, this means that even telomeres that are long &#8212; relatively speaking &#8212; can still be &#8216;bad&#8217; telomeres if they have undergone this chronic damage process. All of this implies that relying on telomeres as a biological clock is iffy, since the &#8220;history&#8221; of a telomere matters and a snapshot won&#8217;t tell the full story.</p><p><strong>Clocks based on transcripts and proteins</strong> <strong>have lots of promise and I&#8217;m excited to see where they go. </strong>There are papers that strongly connect<a href="https://www.nature.com/articles/s43587-022-00317-6"> transcripts</a> and<a href="https://www.nature.com/articles/s43587-022-00317-6"> protein expression levels</a> to cellular markers of age, organ dysfunction, and mortality risk. There&#8217;s a huge number of papers in this space, with relatively few detractors, so the whole direction does feel quite promising. On a more qualitative note, there&#8217;s also a neatness to the mechanism behind why transcript and protein clocks work at all. What is that mechanism?<a href="https://www.cell.com/trends/genetics/fulltext/S0168-9525(24)00027-1#:~:text=Transcript%20expression%20decreases%20with%20gene,%2C%20an%20age%2Dassociated%20disease."> A 2024 review paper covers it in great detail</a>, theorizing that it is the &#8220;<em>relative increase of the expression of short genes and a relative decrease of the expression of long genes</em>&#8221;. 
This length-dependence phenomenon repeats across multiple species and is connected to DNA damage &#8212; a known marker of age &#8212; as longer genes have more DNA available to mutate, and <a href="https://www.nature.com/articles/nature14319">there is some early evidence suggesting a connection to methylation as well</a>. Both the practical and theoretical arguments behind these clocks are strong, but the area is still quite nascent, and future failure modes are always possible. Looking forward to future work!</p>]]></content:encoded></item><item><title><![CDATA[A primer on GFP and esmGFP]]></title><description><![CDATA[2.6k words, 12 minutes reading time]]></description><link>https://www.owlposting.com/p/a-primer-on-gfp-and-esmgfp</link><guid isPermaLink="false">https://www.owlposting.com/p/a-primer-on-gfp-and-esmgfp</guid><dc:creator><![CDATA[Abhishaike Mahajan]]></dc:creator><pubDate>Thu, 27 Jun 2024 15:19:14 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!PgTF!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdc58840d-96ac-498b-b439-bb0ec33cde4a_2912x1632.png" length="0" 
type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!PgTF!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdc58840d-96ac-498b-b439-bb0ec33cde4a_2912x1632.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!PgTF!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdc58840d-96ac-498b-b439-bb0ec33cde4a_2912x1632.png 424w, https://substackcdn.com/image/fetch/$s_!PgTF!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdc58840d-96ac-498b-b439-bb0ec33cde4a_2912x1632.png 848w, https://substackcdn.com/image/fetch/$s_!PgTF!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdc58840d-96ac-498b-b439-bb0ec33cde4a_2912x1632.png 1272w, https://substackcdn.com/image/fetch/$s_!PgTF!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdc58840d-96ac-498b-b439-bb0ec33cde4a_2912x1632.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!PgTF!,w_2400,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdc58840d-96ac-498b-b439-bb0ec33cde4a_2912x1632.png" width="1200" height="672.5274725274726" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/dc58840d-96ac-498b-b439-bb0ec33cde4a_2912x1632.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;large&quot;,&quot;height&quot;:816,&quot;width&quot;:1456,&quot;resizeWidth&quot;:1200,&quot;bytes&quot;:8237355,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.owlposting.com/i/145994544?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdc58840d-96ac-498b-b439-bb0ec33cde4a_2912x1632.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:&quot;center&quot;,&quot;offset&quot;:false}" class="sizing-large" alt="" srcset="https://substackcdn.com/image/fetch/$s_!PgTF!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdc58840d-96ac-498b-b439-bb0ec33cde4a_2912x1632.png 424w, https://substackcdn.com/image/fetch/$s_!PgTF!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdc58840d-96ac-498b-b439-bb0ec33cde4a_2912x1632.png 848w, https://substackcdn.com/image/fetch/$s_!PgTF!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdc58840d-96ac-498b-b439-bb0ec33cde4a_2912x1632.png 1272w, https://substackcdn.com/image/fetch/$s_!PgTF!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdc58840d-96ac-498b-b439-bb0ec33cde4a_2912x1632.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container 
restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p><em>This post is co-written with Tobias Schraink, a former colleague at Dyno Therapeutics.  He&#8217;s exceptionally talented at computational biology, building up data pipelines, and answering questions about biology from curious people (me). Incidentally, he&#8217;s job hunting! 
DM him on <a href="https://twitter.com/TobiasSchraink">Twitter</a> or <a href="https://www.linkedin.com/in/tschraink/">LinkedIn</a> if his profile seems interesting!</em></p><ol><li><p><a href="https://www.abhishaike.com/i/145994544/introduction">Introduction </a></p></li><li><p><a href="https://www.abhishaike.com/i/145994544/what-is-gfp">What is GFP?</a></p></li><li><p><a href="https://www.abhishaike.com/i/145994544/what-is-esmgfp">What is esmGFP?</a></p></li><li><p><a href="https://www.abhishaike.com/i/145994544/conclusion">Conclusion</a></p></li></ol><h1>Introduction</h1><p><a href="https://evolutionaryscale-public.s3.us-east-2.amazonaws.com/research/esm3.pdf">ESM3 dropped a few days back</a>, along with the announcement of its parent company, <a href="https://www.evolutionaryscale.ai/">EvolutionaryScale</a>. Besides the typical metrics, they included one other curious benchmark for success: <strong>creating an evolutionarily distinct version of GFP, or green fluorescent protein</strong>. This new protein, roughly 100 mutations away from wild-type GFP, was dubbed &#8216;esmGFP&#8217;.</p><p>This isn&#8217;t an alien benchmark. <a href="https://www.profluent.bio/">Profluent</a> also did something similar, <a href="https://www.biorxiv.org/content/10.1101/2024.04.22.590591v1">flexing their recent model by using it to create a Cas9-like protein</a>, which was <strong>400 mutations away</strong> from the wild-type version. As with Profluent&#8217;s redesign, the true accomplishment of ESM3 has relatively little to do with the GFP redesign; it&#8217;s just a means to demonstrate the power of the model. 
</p><p>Prior to reading the paper, I had only vaguely heard of GFP, via its use as a protein reporter and <a href="https://news.stanford.edu/stories/2017/02/glowing-mice-suggest-new-gene-therapy-technique">in glowing mice</a> (which, as it turns out, isn&#8217;t GFP,<a href="https://www.nature.com/articles/3300951"> but some other less immune-stimulating fluorescent protein</a>). After some reading, I thought a quick rundown on GFPs and this new esmGFP might be interesting! </p><p>That said, I'm keeping this much shorter than my <a href="https://www.abhishaike.com/s/primers">usual primer posts</a>. It&#8217;s GFP &#8212; there's only so much to say!</p><h1>What is GFP? </h1><p>It&#8217;s a 238 amino acid, structured protein that lights up green (hence the name) when you shine specific types of light on it. First discovered in jellyfish, it has, over the course of roughly 60 years, become one of the most widely used tools in molecular biology. </p><p>The best way to give an overview of it is via an FAQ, so let&#8217;s do that. </p><p><strong>How does it shine green?</strong> GFP is shaped like a barrel that loops in on itself. Within the barrel, specifically on the aforementioned loop, there are three amino acids (Ser-Tyr-Gly) that, together, form a &#8216;chromophore&#8217;, which just refers to a part of a molecule that absorbs particular wavelengths of light. Upon blue, violet, or UV light hitting the chromophore region, electrons in it become excited and then return to their baseline state. The &#8216;return to relaxation&#8217; happens to release green light. </p><p>Here&#8217;s the structure of it; the chromophore (which, remember, is <strong>part</strong> of GFP, not separate!) lies in the middle of the barrel. 
Funnily, the best structure I could find also came from an ML GFP mutation paper!</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!1FtH!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5a73e2fb-a868-4933-a97f-2e8e0ce97f1c_1270x458.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!1FtH!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5a73e2fb-a868-4933-a97f-2e8e0ce97f1c_1270x458.png 424w, https://substackcdn.com/image/fetch/$s_!1FtH!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5a73e2fb-a868-4933-a97f-2e8e0ce97f1c_1270x458.png 848w, https://substackcdn.com/image/fetch/$s_!1FtH!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5a73e2fb-a868-4933-a97f-2e8e0ce97f1c_1270x458.png 1272w, https://substackcdn.com/image/fetch/$s_!1FtH!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5a73e2fb-a868-4933-a97f-2e8e0ce97f1c_1270x458.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!1FtH!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5a73e2fb-a868-4933-a97f-2e8e0ce97f1c_1270x458.png" width="1270" height="458" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5a73e2fb-a868-4933-a97f-2e8e0ce97f1c_1270x458.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:458,&quot;width&quot;:1270,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:579298,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!1FtH!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5a73e2fb-a868-4933-a97f-2e8e0ce97f1c_1270x458.png 424w, https://substackcdn.com/image/fetch/$s_!1FtH!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5a73e2fb-a868-4933-a97f-2e8e0ce97f1c_1270x458.png 848w, https://substackcdn.com/image/fetch/$s_!1FtH!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5a73e2fb-a868-4933-a97f-2e8e0ce97f1c_1270x458.png 1272w, https://substackcdn.com/image/fetch/$s_!1FtH!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5a73e2fb-a868-4933-a97f-2e8e0ce97f1c_1270x458.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">From <a href="https://www.biorxiv.org/content/biorxiv/early/2018/06/02/337154.full.pdf">here</a></figcaption></figure></div><p><strong>Why green? And why can only blue/violet/UV light be used to make it light up? </strong>The difference in energy between the excited state and the ground state of the GFP chromophore electrons corresponds to green light wavelengths (around 510 nm). As for the latter question, blue/violet/UV light similarly corresponds to wavelengths that are high-energy enough for the GFP chromophore to <strong>be</strong> excited. You <strong>can</strong> shine a red light onto a GFP; it just won&#8217;t do anything, because red light doesn&#8217;t carry enough energy to excite the electrons, so there is no &#8216;relaxation&#8217; phase. 
<strong>To note, there are other non-GFP proteins that can emit alternative colors, such as RFPs, or Red Fluorescent Proteins!</strong> These are largely structurally identical to GFPs, but have sequence mutations in the chromophore and barrel regions that change the energy gap between the chromophore&#8217;s excited and ground states. </p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!KM1c!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F251ed0cd-ca1f-466e-b801-3096b4376aac_1500x1000.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!KM1c!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F251ed0cd-ca1f-466e-b801-3096b4376aac_1500x1000.png 424w, https://substackcdn.com/image/fetch/$s_!KM1c!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F251ed0cd-ca1f-466e-b801-3096b4376aac_1500x1000.png 848w, https://substackcdn.com/image/fetch/$s_!KM1c!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F251ed0cd-ca1f-466e-b801-3096b4376aac_1500x1000.png 1272w, https://substackcdn.com/image/fetch/$s_!KM1c!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F251ed0cd-ca1f-466e-b801-3096b4376aac_1500x1000.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!KM1c!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F251ed0cd-ca1f-466e-b801-3096b4376aac_1500x1000.png" width="579" height="386.13255494505495" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/251ed0cd-ca1f-466e-b801-3096b4376aac_1500x1000.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:971,&quot;width&quot;:1456,&quot;resizeWidth&quot;:579,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Visible Light Spectrum Overview and Chart&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Visible Light Spectrum Overview and Chart" title="Visible Light Spectrum Overview and Chart" srcset="https://substackcdn.com/image/fetch/$s_!KM1c!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F251ed0cd-ca1f-466e-b801-3096b4376aac_1500x1000.png 424w, https://substackcdn.com/image/fetch/$s_!KM1c!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F251ed0cd-ca1f-466e-b801-3096b4376aac_1500x1000.png 848w, https://substackcdn.com/image/fetch/$s_!KM1c!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F251ed0cd-ca1f-466e-b801-3096b4376aac_1500x1000.png 1272w, https://substackcdn.com/image/fetch/$s_!KM1c!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F251ed0cd-ca1f-466e-b801-3096b4376aac_1500x1000.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" 
stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p><strong>What makes GFP unique? </strong>Lots of things in the world are chromophores. But GFP has become a near-staple in lab research for three main reasons. We&#8217;ll get into <strong>why</strong> these are such useful characteristics in the next question.  </p><ol><li><p><strong>Bright. </strong>GFP lights up <em><strong>very</strong></em> green when it's excited, and only requires external light to do so. Conversely, many other chromophores have relatively weak emission and are, thus, hard to monitor by an external entity (usually a human with a microscope). Many others also require external assistance to form their chromophore, such as added proteins, whereas GFP mostly works out of the box (given access to oxygen). </p></li><li><p><strong>Stable. 
</strong>GFP doesn&#8217;t really unfold; its barrel-shaped structure is quite stable in a variety of environments. More importantly, the chromophore within a GFP is not easily &#8216;quenched&#8217; by outside interference (say, other proteins). From a structural perspective, this is because the hydrophobic barrel shape of GFP protects the chromophore from having to interact with much of the environment. This is ideal from a utility perspective; GFP needn&#8217;t be &#8216;babied&#8217; to ensure that it&#8217;ll still be capable of lighting up when surrounded by other molecules. As with all claims about the stability of proteins, <a href="https://blog.addgene.org/avoiding-the-dark-side-of-fluorescent-protein-fusions-with-mox-fps">this isn&#8217;t always true</a>. It&#8217;s a decent-sized protein with a complex structure; sometimes things will go wrong! But even this was largely solved through protein engineering <a href="https://www.generalbiosystems.com/uploadfiles/file/201504/Engineering%20and%20characterization%20of%20a%20superfolder%20green%20fluorescent%20protein.pdf">using only two simple mutations</a>, a testament to how stable base GFP already is. </p></li><li><p><strong>Innocuous.</strong> GFP isn&#8217;t reactive with much. As such, it doesn&#8217;t alter the usual behaviors of cells, proteins, and the like by existing alongside them. The same caveats as in the &#8216;Stable&#8217; section above apply; there are exceptions, but they are relatively minor and solved with a small set of mutations.  </p></li></ol><p><strong>What do people use GFP for? </strong>Because of its brightness, stability, and innocuous nature, GFP has become extremely popular as a protein tag for detection. One application involves splicing the sequence for GFP at the end or the start of a protein-coding gene, such that when the gene is expressed, GFP is attached to the resulting protein as well! 
This gives a researcher an easy way to track the location of the protein as it&#8217;s shuttled around the cell, purely by shining a light onto a cell and seeing the green spot (GFP-tagged protein) move around. And, because GFP is stable and non-reactive, it doesn&#8217;t (usually) affect the normal function of the protein either. There are other use cases as well, such as detecting when a certain section of a genome is being transcribed or when an engineered virus has transduced a cell.</p><p>All in all, it&#8217;s a really simple protein with a wide array of use cases.</p><p>ESM3 redesigned it. How? Why? We&#8217;ll get into that!</p><h1>What is esmGFP?</h1><p>esmGFP is largely equivalent to GFP, just roughly 100 mutations away from it. </p><p><strong>This alone doesn&#8217;t make esmGFP &#8220;better&#8221; than GFP.</strong> Proteins many mutations away from wild-type are interesting in some cases, especially for viruses, but definitely not for GFP. Moreover, unlike <a href="https://www.biorxiv.org/content/10.1101/2024.04.22.590591v1">Profluent&#8217;s Cas9 redesign</a>, which potentially allows them to escape patents covering its use in therapeutics, a GFP redesign has no such payoff: GFP is basically perfect as is. <a href="https://blog.addgene.org/when-gfp-lets-you-down">Some things could certainly be improved about GFP,</a> such as its dependence on certain acidity conditions, inability to be tacked onto certain proteins, and its requirement for oxygen to function correctly. But, for the most part, it works fine. Even if we did want to improve GFP&#8217;s performance, not many mutations are necessary to do that, e.g. <a href="https://www.frontiersin.org/journals/microbiology/articles/10.3389/fmicb.2019.01200/full">only 10 mutations are necessary to improve fluorescence intensity by 3.3x</a>.</p><p>The only reason one would redesign GFP is that it is <strong>hard </strong>to do so. </p><p>And that&#8217;s exactly why ESM3 tackled the problem! 
From the paper:</p><div class="pullquote"><p>&#8230;Rational design and machine learning-assisted high-throughput screening have yielded GFP sequences with improved properties&#8212;such as higher brightness or stability, or differently colored variants&#8212;that incorporated small numbers of mutations (typically 5 to 15, out of the total 238 amino acid coding sequence) from the originating sequence. </p><p>Generating a new GFP would require materialization of the complex biochemistry and physics that underlie its fluorescence&#8230;Light emission is highly sensitive to the local electronic environment of the chromophore. For these reasons, obtaining a new functional GFP would require precise configuration of both the active site and the surrounding long range tertiary interactions throughout the beta barrel.</p></div><p>This sets up the problem quite well. While the Cas9 protein does have a fair bit of monomeric competition in the &#8216;<em>what else can cleave genetic material?</em>&#8217; dimension &#8212; all of them with fair structural differences &#8212; GFP is a bit more standard. Typical efforts to mutate it for different properties make quite minor changes, only in the realm of 5-15 amino acids. Generating a structurally equivalent but sequence-wise unique GFP is extremely novel and, if we&#8217;re to take the paper at face value, very challenging. </p><p><strong>How did they do this? </strong>We&#8217;ll avoid a lot of the minutiae of the ESM3 architecture setup here, and just view it as a conditional generative model, where the conditions can be structure, sequence, and/or function. </p><p>ESM3 conditions the design process on the sequence and structure of a provided set of 6 amino acids (Thr62, Thr65, Tyr66, Gly67, Arg96, Glu222) <strong>+</strong> the structure (but not sequence) of residues 58-71, <strong>all</strong> of which correspond to chromophore or chromophore-adjacent positions in the normal GFP. 
From the paper:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!QNYy!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa01ae7bd-8c83-4ad1-8771-050971967b8e_562x422.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!QNYy!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa01ae7bd-8c83-4ad1-8771-050971967b8e_562x422.png 424w, https://substackcdn.com/image/fetch/$s_!QNYy!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa01ae7bd-8c83-4ad1-8771-050971967b8e_562x422.png 848w, https://substackcdn.com/image/fetch/$s_!QNYy!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa01ae7bd-8c83-4ad1-8771-050971967b8e_562x422.png 1272w, https://substackcdn.com/image/fetch/$s_!QNYy!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa01ae7bd-8c83-4ad1-8771-050971967b8e_562x422.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!QNYy!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa01ae7bd-8c83-4ad1-8771-050971967b8e_562x422.png" width="562" height="422" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a01ae7bd-8c83-4ad1-8771-050971967b8e_562x422.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:422,&quot;width&quot;:562,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:207746,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!QNYy!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa01ae7bd-8c83-4ad1-8771-050971967b8e_562x422.png 424w, https://substackcdn.com/image/fetch/$s_!QNYy!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa01ae7bd-8c83-4ad1-8771-050971967b8e_562x422.png 848w, https://substackcdn.com/image/fetch/$s_!QNYy!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa01ae7bd-8c83-4ad1-8771-050971967b8e_562x422.png 1272w, https://substackcdn.com/image/fetch/$s_!QNYy!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa01ae7bd-8c83-4ad1-8771-050971967b8e_562x422.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>This may be viewed as cheating, since you&#8217;ve already provided the &#8216;light emitting&#8217; part of the structure, but this region also seems to be pretty tolerant of mutations anyway. <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6193456/#SD2">I stumbled across a paper that analyzes the fitness landscape of the chromophore region</a> (very similar to the above 6 amino acids) and finds that the overwhelming majority of mutations still allow fluorescence (though they negatively affect overall performance). Conversely, the well-structured barrel section of GFP seems &#8216;harder&#8217; to redesign, as it has a hard-to-tease-apart impact on the structure of the chromophore, but I&#8217;m also not finding many papers on the subject and am not a protein designer so &#175;\_(&#12484;)_/&#175;. </p><p><strong>What was the design process?</strong> A bit involved, relying on two rounds of generation. 
The first round created tens of thousands of designs in-silico, which were filtered down to 88 and all expressed in-vitro. One of these was picked as a GFP-like seed <strong>(referred to as B8)</strong>, which was 96 mutations away from WT. This seed then went through another iterative in-silico design process, and the results were filtered down and expressed again. One of these final designs, referred to as C10, was designated <strong>esmGFP</strong>, a further 15 mutations away from B8.  </p><p>Of particular interest are the filters that were used during the design process. <strong>Specifically, three filters relied on having access to crystallized GFP structures (1QY3 and 1EMA)!</strong> This isn&#8217;t a bad thing exactly, but it does perhaps overstate how useful the model would be for proteins that lack a crystal structure. The GFP design process may have been more convincing had it relied on AF2/ESMFold/ESM3 predicted structures instead &#8212; though it is likely that those have memorized GFP entirely. This is an extremely minor point overall, especially given that these filters are specific to the region of the chromophore we provided to the model. </p><p>Either way, here are the crystal-structure-reliant filters. 
</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!RyWz!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2f9a9e5a-4c31-4c0b-a986-d7e48c986219_692x586.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!RyWz!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2f9a9e5a-4c31-4c0b-a986-d7e48c986219_692x586.png 424w, https://substackcdn.com/image/fetch/$s_!RyWz!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2f9a9e5a-4c31-4c0b-a986-d7e48c986219_692x586.png 848w, https://substackcdn.com/image/fetch/$s_!RyWz!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2f9a9e5a-4c31-4c0b-a986-d7e48c986219_692x586.png 1272w, https://substackcdn.com/image/fetch/$s_!RyWz!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2f9a9e5a-4c31-4c0b-a986-d7e48c986219_692x586.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!RyWz!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2f9a9e5a-4c31-4c0b-a986-d7e48c986219_692x586.png" width="546" height="462.36416184971097" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2f9a9e5a-4c31-4c0b-a986-d7e48c986219_692x586.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:586,&quot;width&quot;:692,&quot;resizeWidth&quot;:546,&quot;bytes&quot;:334604,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!RyWz!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2f9a9e5a-4c31-4c0b-a986-d7e48c986219_692x586.png 424w, https://substackcdn.com/image/fetch/$s_!RyWz!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2f9a9e5a-4c31-4c0b-a986-d7e48c986219_692x586.png 848w, https://substackcdn.com/image/fetch/$s_!RyWz!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2f9a9e5a-4c31-4c0b-a986-d7e48c986219_692x586.png 1272w, https://substackcdn.com/image/fetch/$s_!RyWz!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2f9a9e5a-4c31-4c0b-a986-d7e48c986219_692x586.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p><strong>How well did esmGFP perform? </strong>On par with GFP on a few metrics, but distinctly different in one category. </p><p>Let&#8217;s start with how it equaled GFP in performance. The chart below shows that the excitation (Ex)<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-1" href="#footnote-1" target="_self">1</a> spectrum of EGFP (a form of GFP commonly used in the lab) matches that of esmGFP: <strong>blue/violet, higher energy</strong>. 
The same could be said of the emission spectra<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-2" href="#footnote-2" target="_self">2</a>, <strong>which peaks at green.</strong> Everything is as expected!</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!DGao!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F387f8c4c-b1c3-44e2-848a-5b77691b635a_1102x420.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!DGao!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F387f8c4c-b1c3-44e2-848a-5b77691b635a_1102x420.png 424w, https://substackcdn.com/image/fetch/$s_!DGao!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F387f8c4c-b1c3-44e2-848a-5b77691b635a_1102x420.png 848w, https://substackcdn.com/image/fetch/$s_!DGao!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F387f8c4c-b1c3-44e2-848a-5b77691b635a_1102x420.png 1272w, https://substackcdn.com/image/fetch/$s_!DGao!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F387f8c4c-b1c3-44e2-848a-5b77691b635a_1102x420.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!DGao!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F387f8c4c-b1c3-44e2-848a-5b77691b635a_1102x420.png" width="1102" height="420" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/387f8c4c-b1c3-44e2-848a-5b77691b635a_1102x420.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:420,&quot;width&quot;:1102,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:294284,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!DGao!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F387f8c4c-b1c3-44e2-848a-5b77691b635a_1102x420.png 424w, https://substackcdn.com/image/fetch/$s_!DGao!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F387f8c4c-b1c3-44e2-848a-5b77691b635a_1102x420.png 848w, https://substackcdn.com/image/fetch/$s_!DGao!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F387f8c4c-b1c3-44e2-848a-5b77691b635a_1102x420.png 1272w, https://substackcdn.com/image/fetch/$s_!DGao!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F387f8c4c-b1c3-44e2-848a-5b77691b635a_1102x420.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>This next chart is a bit more interesting. The fluorescence of esmGFP, B8, and &#8216;knockouts&#8217; (esmGFP with known loss-of-function mutations on the chromophore) was compared to that of a few control GFPs that are regularly used in the lab: avGFP, cgreGFP, and ppluGFP. As expected, the knockouts barely lit up, the controls did fine, and esmGFP was better (brighter) than B8. </p><p><strong>But, more curiously, both B8 and esmGFP took longer than a day to fully mature to their &#8216;peak&#8217; fluorescence, up to a week! 
</strong>This phenomenon wasn&#8217;t something I was even previously aware of; I had assumed GFPs are born with a fully functioning chromophore, but it turns out <a href="https://book.bionumbers.org/what-is-the-maturation-time-for-fluorescent-proteins/">the chromophore only forms after folding, in a maturation reaction that requires molecular oxygen.</a></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!47wD!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2b78c86c-26a6-4c8c-9fa7-9231bccc8e64_968x738.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!47wD!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2b78c86c-26a6-4c8c-9fa7-9231bccc8e64_968x738.png 424w, https://substackcdn.com/image/fetch/$s_!47wD!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2b78c86c-26a6-4c8c-9fa7-9231bccc8e64_968x738.png 848w, https://substackcdn.com/image/fetch/$s_!47wD!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2b78c86c-26a6-4c8c-9fa7-9231bccc8e64_968x738.png 1272w, https://substackcdn.com/image/fetch/$s_!47wD!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2b78c86c-26a6-4c8c-9fa7-9231bccc8e64_968x738.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!47wD!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2b78c86c-26a6-4c8c-9fa7-9231bccc8e64_968x738.png" width="599" height="456.67561983471074" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2b78c86c-26a6-4c8c-9fa7-9231bccc8e64_968x738.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:738,&quot;width&quot;:968,&quot;resizeWidth&quot;:599,&quot;bytes&quot;:513880,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!47wD!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2b78c86c-26a6-4c8c-9fa7-9231bccc8e64_968x738.png 424w, https://substackcdn.com/image/fetch/$s_!47wD!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2b78c86c-26a6-4c8c-9fa7-9231bccc8e64_968x738.png 848w, https://substackcdn.com/image/fetch/$s_!47wD!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2b78c86c-26a6-4c8c-9fa7-9231bccc8e64_968x738.png 1272w, https://substackcdn.com/image/fetch/$s_!47wD!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2b78c86c-26a6-4c8c-9fa7-9231bccc8e64_968x738.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>But the amount of time esmGFP takes to mature is strange even amongst GFPs; most of the ones in active use have maturation times measured in tens of minutes, not days! 
</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!IFRM!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ad03b4f-53a7-4157-8430-b5dfaf3f4485_626x706.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!IFRM!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ad03b4f-53a7-4157-8430-b5dfaf3f4485_626x706.jpeg 424w, https://substackcdn.com/image/fetch/$s_!IFRM!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ad03b4f-53a7-4157-8430-b5dfaf3f4485_626x706.jpeg 848w, https://substackcdn.com/image/fetch/$s_!IFRM!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ad03b4f-53a7-4157-8430-b5dfaf3f4485_626x706.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!IFRM!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ad03b4f-53a7-4157-8430-b5dfaf3f4485_626x706.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!IFRM!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ad03b4f-53a7-4157-8430-b5dfaf3f4485_626x706.jpeg" width="436" height="491.7188498402556" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5ad03b4f-53a7-4157-8430-b5dfaf3f4485_626x706.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:706,&quot;width&quot;:626,&quot;resizeWidth&quot;:436,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;What is the 
maturation time for fluorescent proteins?&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="What is the maturation time for fluorescent proteins?" title="What is the maturation time for fluorescent proteins?" srcset="https://substackcdn.com/image/fetch/$s_!IFRM!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ad03b4f-53a7-4157-8430-b5dfaf3f4485_626x706.jpeg 424w, https://substackcdn.com/image/fetch/$s_!IFRM!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ad03b4f-53a7-4157-8430-b5dfaf3f4485_626x706.jpeg 848w, https://substackcdn.com/image/fetch/$s_!IFRM!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ad03b4f-53a7-4157-8430-b5dfaf3f4485_626x706.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!IFRM!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ad03b4f-53a7-4157-8430-b5dfaf3f4485_626x706.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 
12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">From <a href="https://book.bionumbers.org/what-is-the-maturation-time-for-fluorescent-proteins/">here</a></figcaption></figure></div><p>This isn&#8217;t a huge knock against ESM3; there is immense accomplishment in being able to design a protein with such specific functionality, massively far away in sequence space from wild-type, all with comparable fitness outside of one dimension (maturation time). Still though&#8230;what caused such a long maturation time? Would love to hear a structural biologist&#8217;s take on this!</p><p><strong>Anything else interesting?</strong> Two things! </p><p>One, the paper reports that esmGFP has only 58% sequence identity to the nearest known fluorescent protein. Assuming a constant rate of mutation amongst GFPs, this is where the &#8216;<em>500 million years of evolution</em>&#8217; part of the ESM3 paper title comes from: natural fluorescent proteins that far apart are separated by roughly that much evolutionary time. </p><p>But what about divergence to <strong>any</strong> protein?</p><p>Brian Naughton, who famously (in a very specific niche corner of Twitter) <a href="https://twitter.com/btnaughton/status/1783566980047671745">found that the Profluent Cas9 redesign could be 98% recapitulated by three Streptococcus sequences</a>, performed the same analysis here with esmGFP! 
The overlap isn&#8217;t as severe, but there is still some! Here is the original post:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!n-Ap!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0e2693a1-e351-4a16-8cf4-e2035ddd5b09_1124x814.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!n-Ap!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0e2693a1-e351-4a16-8cf4-e2035ddd5b09_1124x814.png 424w, https://substackcdn.com/image/fetch/$s_!n-Ap!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0e2693a1-e351-4a16-8cf4-e2035ddd5b09_1124x814.png 848w, https://substackcdn.com/image/fetch/$s_!n-Ap!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0e2693a1-e351-4a16-8cf4-e2035ddd5b09_1124x814.png 1272w, https://substackcdn.com/image/fetch/$s_!n-Ap!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0e2693a1-e351-4a16-8cf4-e2035ddd5b09_1124x814.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!n-Ap!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0e2693a1-e351-4a16-8cf4-e2035ddd5b09_1124x814.png" width="625" height="452.62455516014234" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0e2693a1-e351-4a16-8cf4-e2035ddd5b09_1124x814.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:814,&quot;width&quot;:1124,&quot;resizeWidth&quot;:625,&quot;bytes&quot;:559273,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!n-Ap!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0e2693a1-e351-4a16-8cf4-e2035ddd5b09_1124x814.png 424w, https://substackcdn.com/image/fetch/$s_!n-Ap!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0e2693a1-e351-4a16-8cf4-e2035ddd5b09_1124x814.png 848w, https://substackcdn.com/image/fetch/$s_!n-Ap!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0e2693a1-e351-4a16-8cf4-e2035ddd5b09_1124x814.png 1272w, https://substackcdn.com/image/fetch/$s_!n-Ap!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0e2693a1-e351-4a16-8cf4-e2035ddd5b09_1124x814.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">From <a href="https://x.com/btnaughton/status/1805732057827655905">here</a></figcaption></figure></div><p>Two, esmGFP is generated with ESM3 7B, not ESM3 98B! This is surprising, given that 98B had generally superior performance on every benchmark throughout the paper. I suspect the reason for this is more logistical than scientific. Wet lab experiments take time, a 98B model likely takes forever to train, and with <a href="https://www.nature.com/articles/s41586-024-07487-w">Alphafold3 released just a month ago</a>, Evolutionary Scale likely just wanted to get something out. Excited to see follow-up work on what 98B is capable of in a design setting! </p><h1>Conclusion</h1><p>esmGFP isn't groundbreaking on its own, but that's not really the point. It's more of a proof of concept for the model behind it. 
What's actually impressive is the all-in-one approach that ESM3 took &#8212; <strong>combining structure, sequence, and function in a model with 98 billion parameters</strong>. And Evolutionary Scale did this with only about 15 employees, which is insane! The clinical and research impact remains to be seen, but there's definite potential here for massively speeding up existing workflows.</p><p>I&#8217;m very curious where Evolutionary Scale goes with this. It seems like they are eschewing the therapeutic route, going with a more <em>model-as-a-service </em>setup. This is interesting and, for the most part, unprecedented in biology! The <a href="https://www.schrodinger.com/">Schr&#246;dinger</a> model may be a close parallel to what happens: a hefty license fee for use within a department. But perhaps a different story will be told. While Schr&#246;dinger had plenty of open-source competitors to their tools, ESM3 98B stands largely alone alongside AlphaFold3, potentially allowing Evolutionary Scale to price their tools in entirely different ways. <strong>If useful enough, potentially even a royalty on drugs that were created with ESM3 in the loop?</strong> This seems like the only way the company could afford to continue training models of this size and deploy them &#8212; Schr&#246;dinger-level economics probably won&#8217;t cut it. But the time spans before drug royalties kick in are long &#8212; multiple years &#8212; and it is unlikely that ESM3 is currently important enough to justify such a high price. But ESM4? Who knows? I can definitely see tools like this becoming such an integral part of faster drug development that biotech companies would be willing to cut into their own share of the profits to have access to them.  
</p><p>Either way, very curious as to what the typical heroes of proteomics ML open source &#8212; <a href="https://www.ipd.uw.edu/david-baker/">Baker</a>, <a href="https://www.aqlab.io/">AlQuraishi</a>, and <a href="https://biology.mit.edu/profile/sergey-ovchinnikov/">Ovchinnikov</a> &#8212; do next, given the move by both Evolutionary Scale and Isomorphic to be largely closed-source. </p><p>Finally, if there do happen to be any Evolutionary Scale engineers or scientists reading, I&#8217;m plugging my article here if ESM4 has yet to start training and ideation is still on the table:</p><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;f96a0fad-5498-4953-9e70-e7ecde8fcf46&quot;,&quot;caption&quot;:&quot;&quot;,&quot;cta&quot;:null,&quot;showBylines&quot;:true,&quot;size&quot;:&quot;sm&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;An argument for integrating molecular dynamics data into proteomic ML&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:223596199,&quot;name&quot;:&quot;Abhishaike Mahajan&quot;,&quot;bio&quot;:&quot;biology posting&quot;,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8bb7749b-731a-42ac-96d4-e823c76fd218_400x400.jpeg&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2024-06-02T21:04:09.767Z&quot;,&quot;cover_image&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/00291dc2-2e12-4244-adb2-e23d87bea9f7_850x489.png&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://www.abhishaike.com/p/an-argument-for-integrating-molecular&quot;,&quot;section_name&quot;:&quot;Arguments &quot;,&quot;video_upload_id&quot;:null,&quot;id&quot;:144690555,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:13,&quot;comment_count&quot;:5,&quot;publication_id&quot;:null,&quot;publication_name&quot;:&quot;Owl 
Posting&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2caaecbe-ec6f-4c50-9596-c60ebade9ad3_400x400.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><p>MD is the next frontier of interesting training tokens!</p><div><hr></div><p><em>Thank you for reading this post! Every two weeks, I&#8217;ll be posting something about biology, ML, or the intersection of the two. If this interests you, please subscribe!</em></p><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-1" href="#footnote-anchor-1" class="footnote-number" contenteditable="false" target="_self">1</a><div class="footnote-content"><p>Referring to the wavelength of light the chromophore is absorbing.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-2" href="#footnote-anchor-2" class="footnote-number" contenteditable="false" target="_self">2</a><div class="footnote-content"><p>Referring to the wavelength of light the chromophore is emitting.</p><p></p></div></div>]]></content:encoded></item><item><title><![CDATA[A primer on why microbiome research is hard]]></title><description><![CDATA[6.3k words, 29 minutes reading time]]></description><link>https://www.owlposting.com/p/a-primer-on-why-microbiome-research</link><guid isPermaLink="false">https://www.owlposting.com/p/a-primer-on-why-microbiome-research</guid><dc:creator><![CDATA[Abhishaike Mahajan]]></dc:creator><pubDate>Mon, 17 Jun 2024 12:23:08 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!Nl2-!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F36aea7bb-ccbc-4535-9b16-e98b5fabcb45_2912x1632.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div 
class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Nl2-!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F36aea7bb-ccbc-4535-9b16-e98b5fabcb45_2912x1632.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Nl2-!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F36aea7bb-ccbc-4535-9b16-e98b5fabcb45_2912x1632.png 424w, https://substackcdn.com/image/fetch/$s_!Nl2-!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F36aea7bb-ccbc-4535-9b16-e98b5fabcb45_2912x1632.png 848w, https://substackcdn.com/image/fetch/$s_!Nl2-!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F36aea7bb-ccbc-4535-9b16-e98b5fabcb45_2912x1632.png 1272w, https://substackcdn.com/image/fetch/$s_!Nl2-!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F36aea7bb-ccbc-4535-9b16-e98b5fabcb45_2912x1632.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Nl2-!,w_2400,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F36aea7bb-ccbc-4535-9b16-e98b5fabcb45_2912x1632.png" width="1200" height="672.5274725274726" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/36aea7bb-ccbc-4535-9b16-e98b5fabcb45_2912x1632.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;large&quot;,&quot;height&quot;:816,&quot;width&quot;:1456,&quot;resizeWidth&quot;:1200,&quot;bytes&quot;:8130234,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.owlposting.com/i/144562568?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F36aea7bb-ccbc-4535-9b16-e98b5fabcb45_2912x1632.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:&quot;center&quot;,&quot;offset&quot;:false}" class="sizing-large" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Nl2-!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F36aea7bb-ccbc-4535-9b16-e98b5fabcb45_2912x1632.png 424w, https://substackcdn.com/image/fetch/$s_!Nl2-!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F36aea7bb-ccbc-4535-9b16-e98b5fabcb45_2912x1632.png 848w, https://substackcdn.com/image/fetch/$s_!Nl2-!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F36aea7bb-ccbc-4535-9b16-e98b5fabcb45_2912x1632.png 1272w, https://substackcdn.com/image/fetch/$s_!Nl2-!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F36aea7bb-ccbc-4535-9b16-e98b5fabcb45_2912x1632.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><ol><li><p><a href="https://www.abhishaike.com/i/144562568/introduction">Introduction </a></p></li><li><p><a href="https://www.abhishaike.com/i/144562568/why-is-it-hard">Why is it hard? 
</a></p><ol><li><p><a href="https://www.abhishaike.com/i/144562568/difficulty-of-characterization">Difficulty of characterization </a></p></li><li><p><a href="https://www.abhishaike.com/i/144562568/causal-links-are-hard-to-establish">Causal links are hard to establish</a></p></li><li><p><a href="https://www.abhishaike.com/i/144562568/unknown-microbiota">Unknown microbiota</a></p></li><li><p><a href="https://www.abhishaike.com/i/144562568/therapies-are-complicated">Therapies are complicated</a></p></li></ol></li><li><p><a href="https://www.abhishaike.com/i/144562568/the-initial-promises">The initial promises</a></p><ol><li><p><a href="https://www.abhishaike.com/i/144562568/metabolic-conditions">Metabolic conditions</a></p></li><li><p><a href="https://www.abhishaike.com/i/144562568/tumor-microbiomes">Tumor microbiomes </a></p></li><li><p><a href="https://www.abhishaike.com/i/144562568/gut-brain-axis">Gut-brain axis</a></p></li></ol></li><li><p><a href="https://www.abhishaike.com/i/144562568/conclusion-tldr">Conclusion + TLDR</a></p></li></ol><h1>Introduction </h1><p>Microbiomes are, as far as cutting-edge science goes, old news. That the microbiome exists and is involved in digestion has been known for over half a century. But, with the discovery of <em>H. pylori</em>&#8217;s role in causing peptic ulcers and the subsequently awarded 2005 Nobel Prize in Physiology or Medicine for it, the microbiome began to be viewed differently: as something that is intertwined with the neurological, immunological, and metabolic health of its host. And perhaps something that represents a new therapeutic target entirely. Yet, over the course of the last 20 years, the microbiome field has little to show for it, despite billions in funding, thousands of published papers, and dozens of spun-up companies.</p><p>As far as I can tell, no scientific deceit was committed. No scandalous, Theranos-esque, or Alzheimer&#8217;s-esque fraud caused this lack of progress. 
Microbiome research is literally just <strong>hard</strong>. Why? This essay will attempt to explain that. </p><p>This all isn&#8217;t to say no utility in microbiome research has been found. There has been one extraordinary success in the field: <a href="https://www.health.harvard.edu/blog/stool-transplants-are-now-standard-of-care-for-recurrent-c-difficile-infections-2019050916576">fecal transplants for recurrent </a><em><a href="https://www.health.harvard.edu/blog/stool-transplants-are-now-standard-of-care-for-recurrent-c-difficile-infections-2019050916576">Clostridioides difficile (C. Diff)</a></em><a href="https://www.health.harvard.edu/blog/stool-transplants-are-now-standard-of-care-for-recurrent-c-difficile-infections-2019050916576"> infections</a>. Whereas this microbiome-aware treatment yields a near-perfect cure rate, <a href="https://academic.oup.com/cid/article/40/11/1586/444708">competing approaches had a dismal 20% cure rate</a>. Even more interestingly, knowledge of microbiomes has led us towards large-scale ecological engineering by <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4027352/">leveraging the sexual differentiation influence that the Wolbachia microbe has on mosquitoes.</a> But these successes are largely isolated. </p><p>Studying the microbiome is incredibly challenging. You&#8217;re trying to observe a living, breathing ecosystem that will react to the most minute of perturbations, has an astonishing level of diversity, and can entirely change from day to day. And you&#8217;re trying to connect the total collective actions of all this to some functional impact on the human it lives in. And, ideally, you&#8217;d also like the ability to directly modify aspects of this ecosystem to change this functional impact! All hard things to do. <strong>We&#8217;ll discuss all these more in depth in this essay. </strong></p><p>But as with all difficult things, some very clever people did manage to make some headway. 
We mentioned the treatment of <em>C. Diff </em>infections, which was one of the major achievements in the field and spurred further investigation into the microbiome&#8217;s clinical potential. <strong>We&#8217;ll also discuss three of the highest-potential areas in microbiome analysis, and where they ended up. </strong> </p><p>Lots of material to cover! Let&#8217;s start. <strong>One quick note: this essay will primarily focus on gut microbiomes, as that is where the field is most developed.</strong> Many of the points I&#8217;ll make will apply to studies that concern themselves with microbiomes in general, but the discussion will primarily center on the gut.</p><h1>Why is it hard? </h1><h2>Difficulty of characterization </h2><p>One of the more fundamental challenges we face here is that characterizing the microbiome <strong>at all</strong> is hard! </p><p>We start off with a sample collected from the biome of interest. For the mouth, it'd be a swab of your cheek; for the gut, it'd be a stool sample; and so on. But how do we go about characterizing which bacterial, viral, fungal, and other species are in this sample? </p><p>The earliest scalable method of doing so was called &#8216;16S sequencing&#8217;, which reads a set of genes commonly referred to as &#8216;16S rRNA genes&#8217;, or just 16S, which, when transcribed, <a href="https://en.wikipedia.org/wiki/Prokaryotic_small_ribosomal_subunit">yield a component of the small ribosomal subunit</a>. <strong>The set of genes encompassing 16S had been understood for decades as &#8216;taxonomic identifier&#8217; genes; something that alone hinted at the phylogeny, or evolutionary history, of a particular species.</strong> In turn, sequencing these genes in a microbe colony allowed researchers to identify the species contained in that particular colony! Unfortunately, as with any attempt to simplify biology, this turned out to have a variety of hidden flaws. 
<a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6677816/">16S genes apparently could undergo horizontal gene transfer</a> (bacterial genes being transferred to neighboring bacteria), meaning microbes in your sample could very often <strong>be contaminated by their neighbors. </strong>This in turn meant that some species, <strong>even when not present in a sample</strong>, <a href="https://www.sciencedirect.com/science/article/abs/pii/S0168160517302842">could still be detected by bioinformatic pipelines using 16S sequencing as input</a>. Beyond the error rate of the test itself, 16S sequencing was more accurate for some taxonomies of microbes than others; some could be well characterized by it and others couldn&#8217;t, which leads to massive selection bias in downstream analysis. 16S sequencing was one of the earliest tests offered at consumer microbiome companies (<a href="https://en.wikipedia.org/wiki/UBiome">including the infamous uBiome</a>).</p><p>But, for the most part, 16S sequencing was replaced with two methods, both of which are the current dominant forms of microbiome characterization: <strong>shotgun metagenomic sequencing</strong> and <strong>metatranscriptomic sequencing</strong>. As the names imply, the former is concerned with sequencing all DNA snippets in your sample, and the latter with all RNA snippets. These methods, enabled by the Next-Generation-Sequencing (NGS) revolution, allow us to characterize, in bulk, the exact evolutionary history (shotgun metagenomics) and ongoing behavior (metatranscriptomics) of our sample. Unlike 16S, these methods were generally far better at characterizing microbes across domains of life.</p><p>Unfortunately, these methods still have their own issues.</p><p>Let&#8217;s start with shotgun metagenomic sequencing &#8212; called &#8216;shotgun&#8217; because it is untargeted sequencing of all DNA found in a sample. 
While going this route over purely sequencing 16S allows us to view the genetic landscape of our sample fairly well, practical interpretation of the results is difficult. The diversity of the microbiome is enormous, and very little of it has been catalogued. <strong>As such, most DNA sequences derived from a metagenomic sequencing run will, in all likelihood, be unable to be perfectly matched to any catalogued species.</strong> This isn&#8217;t a problem in and of itself; phylogenetic relationships can be derived from matching up unknown sequences with reference sequences in order to understand how unknown sequences &#8216;fit&#8217; into the broader picture. <strong><a href="https://microbiomejournal.biomedcentral.com/articles/10.1186/s40168-021-01059-0">But this invariably opens the door to bias.</a></strong> Specifically, the process used to perform this &#8216;fitting&#8217; can dramatically alter what microbes are reported as present in the final result, with the results becoming far worse with poorly studied taxonomies of life, such as rare microbes. As such, multiple parallel metagenomic studies of the exact same sample may arrive at dramatically different conclusions about the species present within that sample,<a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6716367/"> unless they rely on the exact same algorithms and reference databases.</a></p><p>Metatranscriptomics shares basically all of the same issues as metagenomics, since it also relies on references to characterized sequences. There is one more major problem in that <a href="https://www.frontiersin.org/journals/genetics/articles/10.3389/fgene.2019.00904/full">most of the harvested RNA is ribosomal RNA (~90%)</a>, which is largely useless for understanding the transcriptomic profile of microbes. 
So our problem with detecting low-population microbes becomes even <strong>more</strong> of an issue here!</p><p>One quick note: we may try to mitigate some of these issues by first culturing the microbes before sequencing them, encouraging low-population microbes to increase in number. This turns out to be an ill-fated approach in practice.<a href="https://www.nature.com/articles/nature17645"> Many microbes have hyper-specific living conditions</a> and will refuse to grow &#8212; or outright die &#8212; outside of their natural habitat (which happens to be you). People are working on fixing this, but it seems quite a ways from actually being reliable. </p><p><strong>And we haven&#8217;t even mentioned that the long tail of everything before sequencing readout is also a source of bias!</strong> <a href="https://journals.asm.org/doi/10.1128/spectrum.00090-22">Everything from genetic material extraction to sequencing machine used</a> to even the <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8510183/">storage conditions</a> can yield systematic differences in the characterization of a microbiome sample.<a href="https://www.nationalgeographic.com/science/article/contaminomics-why-some-microbiome-studies-may-be-wrong"> Moreover, contamination of microbe samples is also a huge problem</a>, <strong>which can come from as innocuous a place as the kit used for genetic material extraction</strong>. </p><p>A more skeptical reader may, at this point, claim that I&#8217;m overstating things here a bit. After all, we do DNA and RNA sequencing for human cells all the time, and seem to be happy with our results there. Aren&#8217;t many of these problems alleviated by simply following the same protocols for human samples? For the most part, no, microbes are genuinely <strong>much </strong>harder. Reference alignments are less of a big deal for human cells, since we&#8217;ve largely characterized the full human genome, so we can trust readouts more. 
This ability to rely on references means that DNA/RNA contamination is less of an issue, since the reference sequences will save us if we do accidentally sequence non-human material. <a href="https://www.stjude.org/media-resources/news-releases/2023-medicine-science-news/humans-vs-bacteria-differences-in-ribosome-decoding-revealed.html">Human cells also produce proteins more slowly than bacteria</a>, meaning less ribosomal RNA, meaning less readout noise. Finally, the biomass of any given species of a microbe in a sample is astonishingly low &#8212; given the immense diversity of the microbiome &#8212; whereas human tissue is far more homogeneous other than cell-type differentiation, leading to higher fidelity readouts.</p><p>All of this wraps up to a pretty worrying message: <strong>many microbiome studies may be fundamentally incorrect in their takeaway message due to nearly unavoidable problems in sample storage, sample preparation, sequencing, and sequence alignment</strong>. </p><p>But let&#8217;s not get ahead of ourselves. Let&#8217;s say we&#8217;re better than the literature and take extreme pains to ensure we&#8217;re doing things <strong>right</strong>. We take a sample of some microbiome (say, our gut), store it well, avoid contamination, sequence it correctly, align it correctly, and get to a fully accurate view of the genomic and transcriptomic landscape of our microbiome. </p><p>What do we do with the readout? Well, aren&#8217;t there good microbes and bad microbes? Shouldn&#8217;t our first check be to see if we have good ones, and none of the bad ones? Maybe check if the transcriptome matches up with that of people with healthy gut microbiomes? </p><p>All decent ideas! But there is a minor issue to be aware of. </p><h2>Causal links are hard to establish</h2><p>There are two sides of the microbiome literature. </p><p>One paints a relatively rosy picture of the connection between certain species of a microbiome and diseases. 
So-called &#8216;dysbiosis&#8217;, or irregularities/abnormalities in one&#8217;s microbiome, became an increasingly popular lens through which to view the microbiome. Associative studies showing connections between <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7146472/">atherosclerosis</a>, <a href="https://www.nature.com/articles/s41579-020-0433-9">metabolic disorders</a>, and the like, and certain species of gut microbes have only further cemented this. There are dozens of papers in this realm, many of them gathering thousands of citations, all with effusively positive conclusions on the future of the field in clinical practice.</p><p>The other side points to two deeply unfortunate facts<strong>. One, there is no such thing as a &#8216;normal&#8217; microbiome.</strong> <a href="https://www.frontiersin.org/journals/genetics/articles/10.3389/fgene.2022.917926/full">The microbiomes of otherwise healthy people can strongly differ</a> and <a href="https://www.cell.com/cell-host-microbe/fulltext/S1931-3128(14)00346-1">even change over time</a>; while there are some species whose overgrowth is clearly bad, there are very few hard rules.<strong> And two, it is very much unclear how much a microbiome is &#8216;chasing&#8217; a certain phenotype (e.g. autism, T2D, etc) versus actually causing it.</strong> <a href="https://www.nature.com/articles/s41586-020-2881-9">After all, alterations in microbes don&#8217;t only correlate with disease, but also with ethnicity, age, medication intake, vegetable intake, whether the person is vegan or not, and </a><strong><a href="https://www.nature.com/articles/s41586-020-2881-9">even whether they have a dog</a></strong><a href="https://www.nature.com/articles/s41586-020-2881-9">.</a> And most studies in this space are purely associative with regards to disease, primarily comparing the microbiomes of diseased patients with healthy ones. 
</p><p>Given these two facts, how can you possibly assess whether the microbiome is causing anything at all? Well, application of a therapeutic agent to the microbiome may lead us somewhere. Enter human fecal microbiome transplants, or FMT. </p><p><strong>FMT represents the strongest piece of evidence with regards to the causal impact of the microbiome on physiological state.</strong> This treatment is exactly what it sounds like: transferring the fecal matter of one (healthy) human into another (diseased) human, usually either via pills or colonoscopy, allowing the microbes contained within to colonize a new gut. This way, we needn&#8217;t grapple with questions on good/bad microbes or what they are actually doing! Unfortunately, this is also rarely done outside of well-established use-cases &#8212; of which there is basically only one: <em>C. Diff </em>infections. <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8455422/">This is partially due to the fuzzy regulations regarding FMT</a>, but also the<a href="https://www.frontiersin.org/articles/10.3389/fmedt.2022.961569/full"> immense complexity in administering the therapy at all.</a> So while there are some studies using human FMT to study causality, which we&#8217;ll go over more deeply in the last few sections, they make up a relatively small fraction of the literature. </p><p>There is another way to assess causality: <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7614635/">Mendelian randomization</a>, or MR. In recent years, MR has leaked its way into the microbiome literature, allowing researchers to make strong claims about the <strong>causal</strong> impact of the microbiome on disease, while still relying on non-FMT, observational studies in humans. The results of these MR studies, unfortunately, remain hazy in my opinion, even though there are plenty of papers claiming to establish causality. 
<a href="https://www.nature.com/articles/s41598-023-31115-8.pdf">For example, while a human MR study between microbiome and longevity found that some microbes were causative of longer lifespans (with pretty small correlations),</a> <strong>the results didn&#8217;t transfer well from European to Chinese populations.</strong>  How many other failure modes are like this in the microbiome literature? Relatively few studies attempt to stress-test their own conclusions, so it&#8217;s hard to ascertain. Also, even when these studies are done, and singular species of interest are found to be protective/harmful, one should recall the earlier discussion on the difficulty of microbiome characterization! <a href="https://www.sciencedirect.com/science/article/abs/pii/S1075996422000324">For example, the gut microbe </a><em><a href="https://www.sciencedirect.com/science/article/abs/pii/S1075996422000324">Bacteroides dorei </a></em><a href="https://www.sciencedirect.com/science/article/abs/pii/S1075996422000324">has become increasingly recognized for its role in heart conditions and T1D.</a> But this microbe <em>also</em> has a <a href="https://www.sciencedirect.com/science/article/abs/pii/S1075996422000324">close taxonomic relationship with </a><em><a href="https://www.sciencedirect.com/science/article/abs/pii/S1075996422000324">Bacteroides vulgatus</a>, </em>a far more innocuous gut microbe, making it hard to identify at all! While replication certainly helps assuage our concern about this issue, one-off studies should be looked at with a critical eye. </p><p>On a more anecdotal level, I know from personal experience that accurate causal inference is <strong>hard</strong>. The theory goes incredibly deep, and there are tons of failure modes. While MR is certainly a step up from purely associative studies, I&#8217;m immediately skeptical of the whole direction. 
<a href="https://www.nature.com/articles/nature25973#:~:text=Taken%20together%2C%20our%20results%20demonstrate,after%20accounting%20for%20host%20genetics.">Given that the environment is the dominating factor &#8212; not host genes &#8212; in determining the microbiome</a>, using genetic instrumental variables just feels doomed.  </p><p>Well, okay. Assessing causality with human FMT is hard to do from a regulatory perspective, and MR is potentially leading us astray. How about relying on animal FMT to assess causality? It&#8217;s a decent idea, but still problematic. </p><p><a href="https://link.springer.com/article/10.1007/s00018-017-2693-8">The physical structure, digestion rate, pH, and glycan profiles of a stomach dramatically differ between humans and animals.</a> This has functional impacts! <a href="https://pubmed.ncbi.nlm.nih.gov/34118462/">Entire species of microbes that can survive in humans cannot in mice, and vice-versa</a>. While there certainly is overlap at the genus level, the differences mount as we critically interrogate individual strains. <a href="https://chrismasterjohnphd.substack.com/p/the-greatest-error-in-microbiome">There is also an interesting essay I found about how mice naturally engage in coprophagy, or eating their own feces, and their microbiome has likely adapted for this!</a> While this is only a theory, and not backed up by a paper, it is interesting to consider that even behavioral differences between animals and humans could lead to downstream impacts on their microbiome.  <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9775123/">Using animals more closely related to humans, such as pigs or primates, will likely alleviate this problem</a>, but is challenging to scale in any meaningful way. </p><p>Okay, learning what&#8217;s actually in the microbiome is hard, and learning what the microbiome is actually doing is hard. 
What else is the issue?</p><h2>Unknown microbiota</h2><p><a href="https://bmcbiol.biomedcentral.com/articles/10.1186/s12915-019-0667-z">Despite decades of study, a substantial fraction of the microbiome remains </a><strong><a href="https://bmcbiol.biomedcentral.com/articles/10.1186/s12915-019-0667-z">unable to be sequenced in any meaningful way.</a> </strong></p><p>Low-abundance taxonomies of microbes are where our minds may jump when we consider this, as high-throughput, sequencing-based approaches cannot easily discover them (given that they are often not included in reference databases). While directly cultivating such microbes is a possible solution, this is also not a great fix, as we discussed earlier, since some microbes are resistant to being grown outside of their natural environment. As the field&#8217;s collection of reference genomes grows, the problem of low-abundance microbes diminishes, <a href="https://bmcbiol.biomedcentral.com/articles/10.1186/s12915-019-0667-z">but, as it stands, the current range of &#8216;un-mappable genetic sequences&#8217; is around 10% (optimistically).</a> While this is an improvement over the previous numbers, which were closer to 30% to 50% a decade ago, we&#8217;re still quite far from having a complete picture of the microbiome. </p><p><strong>But this section actually has little to do with low-abundance microbes! </strong>Instead, there is a much deeper problem. While we&#8217;ve been politely using the word &#8216;microbes&#8217; throughout this essay, realistically, what <em>most</em> people are studying is the bacterial component of a microbiome. 
<strong>The fungal and viral components, or the &#8216;<a href="https://academic.oup.com/femsre/article/41/4/479/3738183?login=false">mycobiome</a>&#8217; and &#8216;<a href="https://www.thelancet.com/journals/ebiom/article/PIIS2352-3964(22)00294-8/fulltext">virome</a>&#8217;, of the whole system are largely left alone.</strong></p><p>That 10% of un-mappable sequences discussed above is an immense understatement of the true unknown set of species being missed, because fungal and viral genomes are quite small! This ties into why characterizing these domains of life is difficult: the smaller the genome, the more challenging it is to accurately tie it back to a characterized species/genus/taxonomy. And the results from bacterial microbiome papers don&#8217;t seem to cleanly translate to understanding the behaviors of gut fungal/viral colonies either; <strong>entirely different behaviors are observed among them</strong>. For example, <a href="https://academic.oup.com/femsre/article/41/4/479/3738183?login=false">there seems to be a lack of &#8216;core&#8217; species in the mycobiome</a> as time passes, very much unlike the bacterial microbiome, which does display a high degree of stability post-childhood. Moreover, there is a high degree of interaction between these domains, <a href="https://www.thelancet.com/journals/ebiom/article/PIIS2352-3964(22)00294-8/fulltext">with bacteriophages being able to dramatically alter the behaviors and compositions of nearby bacteria</a> (and likely fungi as well!). </p><p><strong>Realistically though, how important is understanding these territories of life?</strong> Most microbiome papers only attempt to sequence the bacterial components, so much of the historical literature cannot be relied upon to gauge the individual impact of the virome or mycobiome. 
<a href="https://onlinelibrary.wiley.com/doi/pdfdirect/10.1002/advs.202301097">However, there is evidence to suggest at least the virome is a crucially important aspect of fecal microbiome transplants (FMT):</a></p><div class="pullquote"><p>One angle of research supporting the role of the virome is the association between efficacy and overlapped viral communities. Studies[17,121,127] of recurrent CDI indicated that recipients who benefited from FMT had an increased relative abundance of Microviridae and a decreased abundance of Caudovirales. These results suggested that the Microviridae family and Caudovirales order potentially played a pivotal role in the efficacy of FMT. Moreover, Broecker et al.[128] observed that stable bacteriophages remained present for at least 4.5 years following FMT and were better correlated with successful FMT than bacterial communities. The negative correlation between recovery of the virome and CDI recurrence can further certify the vital role of the virome in FMT. For example, one study[121] reported that CDI patients who only had restored bacterial communities suffered from disease recurrence. In subjects with metabolic syndrome who did not respond to FMT, the differences between the viral communities shared by the recipient and the respective donor were greater.[124]</p></div><p>While the mycobiome is even more understudied than the virome, <a href="https://www.nature.com/articles/s41579-018-0116-y">as sequencing it is far more complicated</a>, there is<a href="https://genomemedicine.biomedcentral.com/articles/10.1186/gm467"> evidence suggesting a similar level of interplay between it and bacteria</a>, implying that efficacy of FMT is also partially tied to it. 
</p><p>Given the lack of insight into the bacterial, viral, and fungal elements of the microbiome, <strong>performing FMT</strong> <strong>is akin to giving patients medications which contain (a small amount of) unknown active ingredients.</strong> Outside of the raw danger of performing FMT on patients, <a href="https://www.cidrap.umn.edu/fecal-transplant/fda-warns-about-infections-linked-fecal-microbiota-transplants">which has been an issue before</a>, this also means that interpreting the results of microbiome transplants is incredibly hazy, as the <strong><a href="https://gut.bmj.com/content/69/3/502">same underlying therapeutic could be composed of radically different elements based on the donor</a>. </strong>To be fair, there are efforts to fix this. <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8039740/">There is ongoing work in next-generation FMTs</a>, such as ones that rely on specifically-selected-for strains or metabolic byproducts of a microbiome, instead of the raw contents of the microbiome itself, increasing both reliability and safety. But, as always, it is very much early days.</p><h2>Therapies are complicated</h2><p>As we&#8217;ve stated previously, the thrust of a lot of microbiome research is not simply to understand the microbiome, but also to alter it in a clinically meaningful direction. How is this done in practice? </p><p>FMT is the typical method for doing this. However, when we previously discussed FMT in the context of establishing causality, we naively accepted that FMT <strong>does</strong> in fact change the microbiome and does so in ways that are predictable and sustainable. 
</p><p>Is that actually the case?</p><p><strong>Now, FMT does alter the microbiome, that much has been confirmed.</strong> However, the extent and durability of these changes can vary substantially depending on factors such as the donor&#8217;s microbiome, <a href="https://www.nature.com/articles/s41591-022-01913-0">the pre-existing state of the recipient's microbiome, </a><strong><a href="https://www.nature.com/articles/s41591-022-01913-0">and</a></strong><a href="https://www.nature.com/articles/s41591-022-01913-0"> the donor-recipient microbiome compatibility</a>. Moreover, even the way FMT is administered &#8212; whether that&#8217;s the antibiotic regimen used to prepare patients for FMT or how many times FMT is repeated &#8212; can modify its impact on the patient. <strong>And all these variables-to-take-into-account don&#8217;t have correct answers on how to deal with them; they are simply </strong><em><strong>known</strong></em><strong> to be a factor!</strong> I&#8217;m not giving many exact details here, and that&#8217;s because it&#8217;s complicated; the whole subject is just filled from top to bottom with nuances. There is a great Cell review paper on the whole topic <a href="https://www.cell.com/cell-host-microbe/pdf/S1931-3128(23)00125-7.pdf">here</a>, which I highly recommend reading if you are interested in this specific topic. The main takeaway here is that even if a researcher (<a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7359198/">or even laymen doing DIY FMT</a>) performs FMT and finds positive/negative results, <strong>the minute details of exactly </strong><em><strong>how</strong></em><strong> they did it matter a lot.</strong> This is a big deal! While FMT may very well be transformative for a lot of conditions, we still don&#8217;t have a great grasp on how to administer it &#8216;correctly&#8217;. </p><p>Let&#8217;s take a big, big step back from something as complex as FMT for a second. How about just focusing on diet? 
Is that sufficient to alter a microbiome?</p><p>Well, diet does influence the microbiome, that&#8217;s been known for over a decade now. <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8321864/">Additions of things like fiber, polyphenols, probiotics, and so on do cause gut microbiome shifts in adult humans.</a> However, this is missing the full picture: <strong><a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7266695/">dietary changes do not seem to reliably alter the microbiome permanently</a></strong>. In other words, while sudden changes to one&#8217;s diet do alter colony composition (<a href="https://www.nature.com/articles/nature12820">as fast as within 24 hours</a>), the microbiome quickly &#8216;snaps back&#8217; to its usual state <strong>even if the diet is maintained for over a year! </strong>This implies that modifying microbial compositions cannot be done through passive changes of their energy sources (diet), but requires forced recolonization efforts, as is done with FMT. Given the complexity we&#8217;ve seen with using FMT at all, this is an unfortunate result.</p><p>To sum it up, we don&#8217;t fully understand the therapeutic target (microbiome), the therapy (microbial modification), or the delivery of the therapy (FMT). </p><p>Microbiomes are hard!</p><h1>The initial promises </h1><p>Despite the difficulty of working with microbiomes, research on them has continued for decades, with researchers desperately trying to find a clear clinical target. But after several decades of work, the primary established utility of microbiome transplants is still only for <em>C. 
Diff</em> infections, along with IBD-esque disorders, <a href="https://www.thelancet.com/journals/langas/article/PIIS2468-1253(23)00441-7/abstract#:~:text=your%20institutional-,Summary,small%2C%20phase%202%20clinical%20trials.">though the success rate there is dramatically lower than it is for </a><em><a href="https://www.thelancet.com/journals/langas/article/PIIS2468-1253(23)00441-7/abstract#:~:text=your%20institutional-,Summary,small%2C%20phase%202%20clinical%20trials.">C. Diff </a></em><a href="https://www.thelancet.com/journals/langas/article/PIIS2468-1253(23)00441-7/abstract#:~:text=your%20institutional-,Summary,small%2C%20phase%202%20clinical%20trials.">specific conditions</a>. </p><p><strong>As of June 2024, there are only two FDA-approved microbiome therapeutics. And zero for non-C. Diff conditions. </strong></p><p>What happened to everything else that the microbiome field has tried to tackle? Let&#8217;s go over three. There are a lot more than just these, but these are ones that will often pop up in the literature. </p><h2>Metabolic conditions </h2><p>There was strong theoretical evidence, pretty early on, that there was interplay between metabolic conditions (T2D, obesity, cardiac conditions, etc.) and the host&#8217;s microbiome. After all, <a href="https://gut.bmj.com/content/70/6/1174">if the microbiome is heavily involved in metabolic signals</a>, it isn&#8217;t a big stretch to imagine that directly altering the microbiome <strong>should</strong> yield improvements for metabolic conditions. The confirmation that there <a href="https://onlinelibrary.wiley.com/doi/full/10.1111/joim.12508">were microbiota differences between healthy patients and patients with metabolic diseases</a> lent some further credence to the theory. 
</p><p>Unfortunately, the theory hasn&#8217;t borne out well, the results are mixed-to-negative for human applications.</p><p><a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6835402/">One paper that covered 3 human FMT studies had this to say:</a></p><div class="pullquote"><p>Studies reported mixed results with regards to improvement in dysglycemia metabolic parameters. Vrieze, et al. [40] and Kootte, et al. [41] reported that peripheral insulin sensitivity (rate of glucose disappearance, RD) increased at 6 weeks in patients receiving donor FMT versus patients receiving the placebo control. Hepatic insulin sensitivity (endogenous glucose production, EGP) was further assessed in two studies but no statically differences were found. Kootte, et al. [41] observed a lower HbA1c level in patients who received donor FMT at 6 weeks than in patients receiving the placebo control. However, this study indicated the patients who received donor FMT did not show difference in HbA1c or insulin sensitivity (RD) after 18 weeks [41]. This finding suggests that the observed short-term benefit of FMT on dysglycemia was not maintained long-term. </p><p>&#8230;</p><p>Included studies demonstrated no differences between patients receiving donor FMT and patients receiving placebo with regards to cholesterol profile, including the levels of total cholesterol, HDL-C, LDL-C and TG. Vrieze, et al. [40] and Kootte, et al. 
[41] also reported no significant differences on BMI between patients receiving donor FMT and patients receiving placebo followed at 6 weeks.</p></div><p><strong>So, while there are improvements in parameters of interest in the short term (&lt;6 weeks), the effect disappeared over a longer time horizon.</strong> Another meta-study looked at nine human FMT studies and found similar results for BMI, cholesterol biomarkers, and a few others:  </p><div class="pullquote"><p>This meta-analysis investigated studies using FMT for the treatment of obesity and metabolic syndrome and basically concluded that the treatment was effective in the short term. At 2 to 6 weeks after the intervention, mean HbA1c and mean fasting glucose were lower in the FMT group than in the placebo group, although this was a small mean difference. However, mean insulin levels were significantly lower in the FMT group, suggesting a significant improvement in insulin sensitivity.</p><p>&#8230;</p><p>We found no difference between the FMT and control groups through analysis of the mean differences in clinically significant parameters (<a href="https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0288718#pone-0288718-g004">Fig 4</a>) [past 6 weeks], except for a slight decrease in HbA1c at 12 weeks in the FMT group compared to the placebo group in the study by Yu et al. [<a href="https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0288718#pone.0288718.ref035">35</a>].</p></div><p>There certainly is <strong>something</strong> here! Improvements in baseline parameters in the short term still is something. <strong>But it does seem to be demonstrably difficult to permanently alter the metabolic aspects of the microbiome,</strong> <strong>even if you use FMT</strong>. This problem shows up a fair bit in the microbiome literature; while initial results may seem good, many FMT&#8217;s don&#8217;t seem to &#8216;take&#8217; over the long term. 
It is still, as of yet, unclear why this may be the case, but I suspect it is strongly related to what we discussed in the &#8216;Unknown microbiota&#8217; section. </p><h2>Tumor microbiomes </h2><p>Warning, we&#8217;re doing a minor deviation from gut microbiomes here!</p><p>It is well established that certain types of cancers can be triggered by certain microbiota interactions, <a href="https://journals.plos.org/plospathogens/article?id=10.1371/journal.ppat.1006573">such as </a><em><a href="https://journals.plos.org/plospathogens/article?id=10.1371/journal.ppat.1006573">Helicobacter pylori</a></em><a href="https://journals.plos.org/plospathogens/article?id=10.1371/journal.ppat.1006573"> for stomach cancer</a>, allowing a clinician to have a relatively easy diagnostic proxy of risk of cancer from microbiome analysis alone. However, it was generally thought that this, and a few others, were isolated cases. For almost every other type of cancer, a much more invasive biopsy was required to confirm cancer. This was up until a 2020 Nature paper from <em>Knight et al</em>. titled &#8216;<a href="https://www.nature.com/articles/s41586-020-2095-1">Microbiome analyses of blood and tissues suggest cancer diagnostic approach</a>&#8217; that found, as the title implies, that there was a distinctive microbial signature found <strong>within</strong> <strong>tumors. </strong>Insanely, this microbial signature was so distinct that it could even distinguish <strong>between different types of</strong> <strong>cancer with a near 100% accuracy rate. </strong></p><p><strong>If it was so distinct, why hadn&#8217;t we stumbled across it before?</strong> Well, microbe populations in general are extremely low biomass and thus hard to <strong>discover </strong>through sequencing alone, much less characterize. There are of course a few counterexamples to this: namely the gut and skin. 
But a microbiome anywhere else is hard to stumble across, you need to be intentionally looking for non-human DNA (microbes). This paper was simply the first that looked for it, and also took great pains to ensure the underlying sequence data was accurate &#8212; reminding readers that they removed the vast majority of microbial reads they found (&gt;92%) to ensure they only used high confidence readings. <strong>Why is this paper useful?</strong> Well, the DNA of these tumor microbiomes actually leaked into the bloodstream, allowing for cancer detection by looking for circulating microbial DNA! This would potentially be a whole new type of diagnostic test; allowing patients to avoid a painful, expensive, and invasive biopsy. This already exists and has a name, so-called &#8216;<a href="https://en.wikipedia.org/wiki/Liquid_biopsy">liquid biopsies</a>&#8217;. While other companies have been operating in this space for years, most of them are looking for shed-off tumor DNA circulating in the bloodstream as indication of cancer. This paper implied that we could instead rely on the distinctive microbes found alongside certain cancers. <strong>In other words,</strong> <strong>don&#8217;t focus on the tumor, focus on the microbiome that grows within the tumor.</strong> It was hard to tell what advantages this offered over pure tumor DNA, but it was still an interesting proposition with lots of room to explore. </p><p>It was a very clever paper with clear clinical implications and, obviously, immediately became famous. As of today, it&#8217;s racked up 800+ citations, and the dataset that it used was used for dozens of follow-up papers, <a href="https://www.cell.com/cell/fulltext/S0092-8674(22)01127-8">with one work even discovering a tumor mycobiome</a>! 
The paper authors eventually founded a company on this whole microbiome-sequence-for-liquid-biopsy approach, naming it <a href="https://micronoma.com/">Micronoma</a>, which continues to exist today.</p><p><strong>Around three years later, the validity of these results was called into question.</strong> </p><p>This skepticism came in the form of a pre-print paper titled &#8216;<a href="https://pubmed.ncbi.nlm.nih.gov/37555750/">Caution regarding the specificities of pan-cancer microbial structure</a>&#8217; from <em>Gihawi et al.</em>, <strong>claiming that the analyses within the original paper were so deeply flawed that the findings are entirely incorrect.</strong> The vast majority of the &#8216;microbial DNA&#8217; that the paper had identified was, according to the authors, human in origin or the result of environmental contamination. No fraud had been committed, merely incorrect analyses of the sequencing data (probably). </p><p>The authors of the original paper (<em>Knight et al.</em>) quickly responded to it <a href="https://www.biorxiv.org/content/10.1101/2023.02.10.528049v1">with their own paper</a>, which had a funnily grandiose line in its abstract:</p><div class="pullquote"><p>Therefore, despite numerous, high-impact, peer-reviewed research papers that either validated our conclusions or extended them using data we released we carefully considered criticism raised by Gihawi <em>et al</em>. about potential mishandling of contaminants, batch effects, and machine learning approaches&#8230;</p></div><p>The authors behind <em>Gihawi et al.</em> considered this response, and published another paper, this time peer-reviewed, with an even stronger title: &#8216;<a href="https://journals.asm.org/doi/full/10.1128/mbio.01607-23">Major data analysis errors invalidate cancer microbiome findings</a>&#8217;. 
</p><p>A rebuttal to this was, once again, mounted by <em>Knight et al.</em> within the same day, which <a href="https://github.com/gregpoore/tcga_rebuttal/tree/master">primarily lives on a Github repository</a>.</p><p>The rebuttal research trail here ends. The only response from the skeptics I could find was in this response by the PI to a request for commentary of the Github: </p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!iLB9!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feadceb68-1856-4a42-b13b-2c06a64292e6_1228x710.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!iLB9!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feadceb68-1856-4a42-b13b-2c06a64292e6_1228x710.png 424w, https://substackcdn.com/image/fetch/$s_!iLB9!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feadceb68-1856-4a42-b13b-2c06a64292e6_1228x710.png 848w, https://substackcdn.com/image/fetch/$s_!iLB9!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feadceb68-1856-4a42-b13b-2c06a64292e6_1228x710.png 1272w, https://substackcdn.com/image/fetch/$s_!iLB9!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feadceb68-1856-4a42-b13b-2c06a64292e6_1228x710.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!iLB9!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feadceb68-1856-4a42-b13b-2c06a64292e6_1228x710.png" width="1228" height="710" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/eadceb68-1856-4a42-b13b-2c06a64292e6_1228x710.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:710,&quot;width&quot;:1228,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:472235,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!iLB9!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feadceb68-1856-4a42-b13b-2c06a64292e6_1228x710.png 424w, https://substackcdn.com/image/fetch/$s_!iLB9!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feadceb68-1856-4a42-b13b-2c06a64292e6_1228x710.png 848w, https://substackcdn.com/image/fetch/$s_!iLB9!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feadceb68-1856-4a42-b13b-2c06a64292e6_1228x710.png 1272w, https://substackcdn.com/image/fetch/$s_!iLB9!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feadceb68-1856-4a42-b13b-2c06a64292e6_1228x710.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">From <a href="https://x.com/gilbertjacka/status/1686408777007030272">here</a></figcaption></figure></div><p>Salzberg did release a 2024 paper indirectly targeted at this whole concept, in which he states that cancer sequencing datasets (which the original cancer microbiome paper used) <strong><a href="https://www.biorxiv.org/content/10.1101/2024.05.24.595788v1.full.pdf">have no evidence of containing microbial colonies if you do the correct analysis</a></strong><a href="https://www.biorxiv.org/content/10.1101/2024.05.24.595788v1.full.pdf">.</a> Similarly, the tumor microbiome authors <strong>also</strong> released a general paper in 2024, claiming that there is, in fact, <strong><a href="https://www.nature.com/articles/s41388-024-02974-w">such a thing as distinct microbial colonies if you do the correct analysis</a>.</strong> </p><p>Uh&#8230;so who is right?</p><p>I have no clue. Honestly, the fight becomes stranger the more you dig into it. 
Some points that each side raises feel entirely irrelevant, such as hyper-focusing on the correctness of the 2020 paper over whether there <strong>actually</strong> is a distinctive tumor microbiome. More confusingly, each side claims that minutiae of the opponent&#8217;s rebuttals are actually misunderstandings of the original point, making it extremely hard to parse who is in the right. Even Derek Lowe <a href="https://www.science.org/content/blog-post/arguing-about-cancer-microbiome">says the whole discussion doesn&#8217;t seem at all settled</a> circa mid-2023. </p><p>Maybe I&#8217;ll write a full blog post someday on a deeper dive of the cancer microbiome. But I guess this illustrates a consequence of the immense complexity of microbiomes: 3 rounds of research rebuttals over what should be a theoretically straightforward question (do tumors have a distinctive microbiome?) yielded no conclusive answers. </p><p>If I were forced to make a call, who is right? The whole topic is, frankly, far above my pay grade. I&#8217;ve read quite a few Twitter arguments over this between people who are actively publishing in the metagenomic field and who were still unable to conclude/agree on anything substantive. My vote goes to the skeptics, but only because my prior is on microbiome research being fishy in general. </p><p>As is often the case with controversial biotech startups, the market is ultimately quite good at deciding on their validity. Time will tell whether the company formed around this, <a href="https://micronoma.com/">Micronoma</a>, comes up with anything useful in the end. </p><h2>Gut-brain axis </h2><p>Let&#8217;s end on something a bit more positive!</p><p>The discovery of the biochemical interplay between gut microbiota and the central nervous system was a minor revolution in the field, at least from the outside-looking-in. 
This had a pretty obvious founding basis: <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5414803/">with the amount of mood-altering hormones they secrete (more than 90% of all serotonin in your body!), gut microbiota are basically equivalent to an endocrine organ.</a> <strong>This had a similarly obvious therapeutic angle: modify the microbiome, modify neurological function.</strong> As with metabolic conditions, <a href="https://onlinelibrary.wiley.com/doi/abs/10.1111/nmo.12378">correlational findings that individuals with depression display unique microbiota further lent credence to this theory.</a> Similar discoveries were made for other neurological disorders, such as <a href="https://www.nature.com/articles/s41398-023-02325-5">anxiety</a> and even <a href="https://onlinelibrary.wiley.com/doi/abs/10.1111/nmo.12378">schizophrenia</a>. </p><p>How has this borne out in terms of actual clinical results? </p><p>Surprisingly positively! </p><p><a href="https://www.cambridge.org/core/journals/international-psychogeriatrics/article/fecal-microbiota-transplantation-in-an-elderly-patient-with-mental-depression/F60C1C353A935643E2328A4858928DEC">A case study of an elderly patient with anxiety, depression, and loss-of-appetite showed massive improvement after FMT, and she continued to display the same improvement 12-months-post FMT.</a> But, of course, n-of-1 studies are always a little suspect. 
</p><p><a href="https://pubmed.ncbi.nlm.nih.gov/29684865/">A non-controlled retrospective study on the psychiatric impact of FMT &#8212; using patients treated with FMT for GI issues &#8212; found general improvement of anxiety and depressive symptoms.</a> The study only lasted for 4 weeks though.</p><p><a href="https://www.sciencedirect.com/science/article/pii/S2452231719300077#t0010">A controlled study of 42 patients with chronic fatigue syndrome (CFS) found improvement in CFS symptoms when using FMT versus oral probiotics over several months.</a> Of note, symptom improvement was assessed qualitatively by physicians <strong>and was not blinded.</strong> </p><p>Generally, the case for modifying the gut microbiome for neurological function is somewhat strong. Unlike with metabolic conditions or cancer microbiomes, the evidence seems quite consistent across papers, implies a degree of causation, and, <a href="https://journals.sagepub.com/doi/10.1177/07067437221150508">as a double-blinded randomized controlled study states</a>, the treatment is quite safe. </p><p>But why hasn&#8217;t it leaked into actual therapeutics? As of today, there is no officially approved drug that uses FMT for treatment of neurological conditions. I could hand-wave about how we need larger, multi-center, randomized controlled trials, which are expensive, and companies may not see enough value to invest, but I believe there&#8217;s a less scientific/clinical reason why FMT development for neurological conditions has stalled: <strong><a href="https://pubmed.ncbi.nlm.nih.gov/29064350/">you technically cannot patent FMT-based therapeutics</a>, since they are &#8216;living organisms&#8217;, just like cell lines. This massively reduces the incentive for companies to invest here! </strong>This is a somewhat funny, if a little dark, example of how a therapy could be indefinitely stalled via a law that seems completely unrelated to it. To be clear, this isn&#8217;t stopping microbiome therapies at large! 
<a href="https://www.vedantabio.com/">Vedanta Biosciences</a>, which focuses on microbial therapies, did announce in late 2023 that they were successfully granted patents for their single-species microbe therapies, and there are two other FDA-approved microbiome drugs that are, presumably, patented as well. So, these patent issues aren&#8217;t an absolute blocker, but do perhaps increase the time-to-market. </p><h1>Conclusion + TLDR</h1><p>This essay covered a lot. Let&#8217;s do a quick TLDR: </p><ol><li><p>Characterizing the microbiome, both in terms of species present and their functional impact, remains extremely challenging due to technical limitations, unknown unknowns, and the sheer complexity of the system.</p></li><li><p>Establishing causality between the microbiome and disease is difficult. Most studies are purely associative, and even more rigorous approaches like MR and animal studies have significant limitations.</p></li><li><p>FMT, the main therapeutic approach, is still poorly understood in terms of durability, reproducibility, and long-term effects. We're essentially giving patients a complex, largely uncharacterized cocktail and hoping for the best.</p></li><li><p>Despite early hype, most clinical applications have failed to pan out so far, with the exception of C. diff infections. Results for metabolic conditions are mixed, and while microbial treatments targeting the gut-brain axis may have promise, it is likely that patent issues in the microbiome space more broadly will hurt their chances of reaching the market. </p></li><li><p>Even supposedly groundbreaking findings, like the discovery of tumor microbiomes, remain mired in controversy and may not hold up to further scrutiny.</p></li></ol><p>What&#8217;s next? </p><p>I&#8217;m not sure. I generally feel extremely skeptical of this field for now. Everything here feels so deeply messy, hard to understand, and uncontrollable through the usual Western medicine approaches of throwing a therapeutic at it. 
Clearly, the microbiome is important! But modifying it is challenging and fraught with issues, and, where it does somewhat work, regulatory issues keep it from moving quickly. </p><p>It&#8217;s possible that this may change soon! But, unlike in most of my other pieces, I am quite skeptical that anything major here will happen soon, barring perhaps gut-brain-axis FMT therapeutics establishing efficacy in longitudinal double-blind trials and getting past the patent hurdle. The field has been working away for decades now with few actionable results besides curing <em>C. diff</em>, and it doesn&#8217;t feel like anything in the last few years has really changed that trajectory. It is similarly unlikely that microbiome research really <strong>stops</strong> in any capacity; there is a fair bit of money still pushing it forwards. I suspect the next decade will involve shying away from the complexity of typical FMTs and instead focusing on so-called &#8216;<a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10724401/">next-generation FMT</a>&#8217;. These methods, which isolate specific bacterial or viral components of a feces sample rather than using all of it, will potentially allow microbiome therapeutics to reach previously impossible predictability, standardization, and efficacy. Perhaps this alone will address many of the issues we&#8217;ve raised here! Time will tell. </p><div><hr></div><p><em>Thank you for reading this post! Every two weeks, I&#8217;ll be posting something about biology, ML, or the intersection of the two. If this interests you, please subscribe! 
If you enjoyed this post, here are some other ones you may like as well: </em></p><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;6640bad3-8552-46da-afab-c86c6bda0d92&quot;,&quot;caption&quot;:&quot;Introduction Some background The hard stuff The relevance of toxicity datasets to the clinical problem Methodological problems in toxicity datasets Intraspecies toxicity variability Toxicity synergism Conclusions Introduction There are now (claimed) foundation models for&quot;,&quot;cta&quot;:null,&quot;showBylines&quot;:true,&quot;size&quot;:&quot;sm&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;A primer on why computational predictive toxicology is hard&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:223596199,&quot;name&quot;:&quot;Abhishaike Mahajan&quot;,&quot;bio&quot;:&quot;biology posting&quot;,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8bb7749b-731a-42ac-96d4-e823c76fd218_400x400.jpeg&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2024-05-05T13:32:16.914Z&quot;,&quot;cover_image&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5a4454bc-e4d7-4a9b-9272-dcf74ebf6636_850x634.png&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://www.abhishaike.com/p/a-primer-on-why-computational-predictive&quot;,&quot;section_name&quot;:null,&quot;video_upload_id&quot;:null,&quot;id&quot;:144021611,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:8,&quot;comment_count&quot;:3,&quot;publication_id&quot;:null,&quot;publication_name&quot;:&quot;Owl 
Posting&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2caaecbe-ec6f-4c50-9596-c60ebade9ad3_400x400.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;1222f357-c46e-4e66-b50c-ad3f1b7dc9af&quot;,&quot;caption&quot;:&quot;Introduction How does MD work in practice? System Definition Force Fields The basic definition The details Energy minimization + equilibration Production simulation Interesting miscellaneous things Bypassing small timescales Quantum effects Free energy calculations&quot;,&quot;cta&quot;:null,&quot;showBylines&quot;:true,&quot;size&quot;:&quot;sm&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;A primer on molecular dynamics&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:223596199,&quot;name&quot;:&quot;Abhishaike Mahajan&quot;,&quot;bio&quot;:&quot;biology posting&quot;,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8bb7749b-731a-42ac-96d4-e823c76fd218_400x400.jpeg&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2024-05-20T03:22:47.489Z&quot;,&quot;cover_image&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F69f5b5b7-cccd-40fb-8a45-0ea46ae78c0d_3350x966.png&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://www.abhishaike.com/p/a-primer-on-molecular-dynamics&quot;,&quot;section_name&quot;:null,&quot;video_upload_id&quot;:null,&quot;id&quot;:144423480,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:15,&quot;comment_count&quot;:0,&quot;publication_id&quot;:null,&quot;publication_name&quot;:&quot;Owl 
Posting&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2caaecbe-ec6f-4c50-9596-c60ebade9ad3_400x400.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;bba04389-25ac-4af9-a75d-f600d8ac70d8&quot;,&quot;caption&quot;:&quot;Introduction Clinical Medicine The pursuit of noninvasive glucose - hunting the deceitful turkey Hidden stratification causes clinically meaningful failures in machine learning for medical imaging The amoral nonsense of Orchid&#8217;s embryo selection Why conventional wisdom on health care is wrong (a primer)&quot;,&quot;cta&quot;:null,&quot;showBylines&quot;:true,&quot;size&quot;:&quot;sm&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;A primer on my favorite pessimistic scientific articles&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:223596199,&quot;name&quot;:&quot;Abhishaike Mahajan&quot;,&quot;bio&quot;:&quot;biology posting&quot;,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8bb7749b-731a-42ac-96d4-e823c76fd218_400x400.jpeg&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2024-05-11T14:05:44.833Z&quot;,&quot;cover_image&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/67e2677b-2539-4a09-9481-eb6bb2e3036a_1400x700.png&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://www.abhishaike.com/p/a-primer-on-my-favorite-pessimistic&quot;,&quot;section_name&quot;:null,&quot;video_upload_id&quot;:null,&quot;id&quot;:144442937,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:13,&quot;comment_count&quot;:0,&quot;publication_id&quot;:null,&quot;publication_name&quot;:&quot;Owl 
Posting&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2caaecbe-ec6f-4c50-9596-c60ebade9ad3_400x400.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div>]]></content:encoded></item><item><title><![CDATA[A primer on molecular dynamics]]></title><description><![CDATA[7.8k words, 36 minutes reading time]]></description><link>https://www.owlposting.com/p/a-primer-on-molecular-dynamics</link><guid isPermaLink="false">https://www.owlposting.com/p/a-primer-on-molecular-dynamics</guid><dc:creator><![CDATA[Abhishaike Mahajan]]></dc:creator><pubDate>Mon, 20 May 2024 03:22:47 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!DliF!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F75e0e584-09d7-4907-bd6e-4554320257e0_2912x1632.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!DliF!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F75e0e584-09d7-4907-bd6e-4554320257e0_2912x1632.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!DliF!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F75e0e584-09d7-4907-bd6e-4554320257e0_2912x1632.png 424w, https://substackcdn.com/image/fetch/$s_!DliF!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F75e0e584-09d7-4907-bd6e-4554320257e0_2912x1632.png 848w, 
https://substackcdn.com/image/fetch/$s_!DliF!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F75e0e584-09d7-4907-bd6e-4554320257e0_2912x1632.png 1272w, https://substackcdn.com/image/fetch/$s_!DliF!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F75e0e584-09d7-4907-bd6e-4554320257e0_2912x1632.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!DliF!,w_2400,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F75e0e584-09d7-4907-bd6e-4554320257e0_2912x1632.png" width="1200" height="672.5274725274726" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/75e0e584-09d7-4907-bd6e-4554320257e0_2912x1632.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;large&quot;,&quot;height&quot;:816,&quot;width&quot;:1456,&quot;resizeWidth&quot;:1200,&quot;bytes&quot;:8801135,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.owlposting.com/i/144423480?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F75e0e584-09d7-4907-bd6e-4554320257e0_2912x1632.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:&quot;center&quot;,&quot;offset&quot;:false}" class="sizing-large" alt="" srcset="https://substackcdn.com/image/fetch/$s_!DliF!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F75e0e584-09d7-4907-bd6e-4554320257e0_2912x1632.png 424w, 
https://substackcdn.com/image/fetch/$s_!DliF!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F75e0e584-09d7-4907-bd6e-4554320257e0_2912x1632.png 848w, https://substackcdn.com/image/fetch/$s_!DliF!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F75e0e584-09d7-4907-bd6e-4554320257e0_2912x1632.png 1272w, https://substackcdn.com/image/fetch/$s_!DliF!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F75e0e584-09d7-4907-bd6e-4554320257e0_2912x1632.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><ol><li><p><a href="https://www.abhishaike.com/i/144423480/introduction">Introduction</a></p></li><li><p><a href="https://www.abhishaike.com/i/144423480/how-does-md-work-in-practice">How does MD work in practice?</a></p><ol><li><p><a href="https://www.abhishaike.com/i/144423480/system-definition">System Definition</a></p></li><li><p><a href="https://www.abhishaike.com/i/144423480/force-fields">Force Fields </a></p><ol><li><p><a href="https://www.abhishaike.com/i/144423480/the-basic-definition">The basic definition</a></p></li><li><p><a href="https://www.abhishaike.com/i/144423480/the-details">The details</a></p></li></ol></li><li><p><a href="https://www.abhishaike.com/i/144423480/energy-minimization-equilibration">Energy minimization + equilibration </a></p></li><li><p><a href="https://www.abhishaike.com/i/144423480/production-simulation">Production simulation</a></p></li></ol></li><li><p><a href="https://www.abhishaike.com/i/144423480/interesting-miscellaneous-things">Interesting miscellaneous things</a></p><ol><li><p><a href="https://www.abhishaike.com/i/144423480/bypassing-small-timescales">Bypassing small timescales</a></p></li><li><p><a href="https://www.abhishaike.com/i/144423480/quantum-effects">Quantum effects </a></p></li><li><p><a href="https://www.abhishaike.com/i/144423480/free-energy-calculations">Free energy calculations</a></p></li></ol></li><li><p><a href="https://www.abhishaike.com/i/144423480/case-studies">Case studies </a></p><ol><li><p><a href="https://www.abhishaike.com/i/144423480/discovery-of-lirafugratinib-rly-a-highly-selective-irreversible-small-molecule-inhibitor-of-fgfr">Discovery of lirafugratinib (RLY-4008), a highly selective irreversible small-molecule inhibitor of FGFR2</a></p></li><li><p><a 
href="https://www.abhishaike.com/i/144423480/characterizing-receptor-flexibility-to-predict-mutations-that-lead-to-human-adaptation-of-influenza-hemagglutinin">Characterizing Receptor Flexibility to Predict Mutations That Lead to Human Adaptation of Influenza Hemagglutinin</a></p></li></ol></li><li><p><a href="https://www.abhishaike.com/i/144423480/conclusion">Conclusion</a></p></li></ol><h1>Introduction </h1><p>I&#8217;ve recently become very interested in molecular dynamics, or MD. </p><p>Why would you want to do MD at all? MD allows you to watch molecules dance in full atomic detail. While experimental techniques give you static molecular mugshots, MD lets you observe the dynamic behavior of molecules over time. And understanding molecular motion is key, because <strong>everything</strong> in biology is vibrating molecules underneath the surface! Beyond just watching, you can use MD to predict molecular behavior. Want to know how tightly a drug will bind its target? Run an MD simulation and calculate the binding free energy. Trying to design a new enzyme? Simulate different designs to see which folds and functions best. Obviously, there are limitations, as we&#8217;ll get into, but the possibilities the field presents are fascinating. </p><p>Unfortunately, MD also happens to be incredibly hard to learn on your own. Unlike with my other posts, where I can go from nearly zero knowledge to blog-post-capable in a week, MD has taken me the better part of a month and I&#8217;m <em>still</em> unsure of a lot of things. I&#8217;m much more used to the general fuzziness of ML; the fact that extremely overparameterized statistical models can actively learn means that their exact theoretical underpinnings are, at best, interesting, but rarely useful outside of a few cases. So, hand-waving is practically fine! This isn&#8217;t the case in MD; all of it feels important to know and understand. 
You&#8217;re dealing with physics after all, and if you hand-wave away certain aspects of the physics, your simplified model will be that much less capable of capturing certain phenomena. </p><p>In this post, I&#8217;ll try to equip you with enough knowledge to run basic simulations on your own, while also having enough theoretical backing to understand what&#8217;s actually going on under the surface. Obviously, we&#8217;ll be skipping a lot of things, but there should be enough here to vaguely understand some MD papers! To test this out, we&#8217;ll also go over two MD papers at the end of the post. </p><h1>How does MD work in practice?</h1><p>Consider a protein that we dreamed up. Completely unfolded, just a straight line of amino acids linked together, Glycine-Lysine-Alanine, and so on. What happens if we drop this protein in a bucket of water and wait a few hundred nanoseconds? It&#8217;ll slowly turn from a straight line into a knotted mess of tangles and curves. Or, in other words, it will <strong>fold</strong>. </p><p>Most people will be familiar with the concept of protein folding; the AlphaFold2 news cycle hammered that into most of us. But it's worth considering what it means for a protein to 'want' to fold at all. In many ways, a protein 'wants' to fold in the same way a dropped apple 'wants' to fall to the ground: physics simply pulls it in that direction. In the case of the apple, gravity is making the biggest call. In the case of the protein, the forces at play operate at the molecular level. Here are some:</p><ul><li><p><strong>Bond forces</strong>: Chemical attachments between neighboring atoms that can be stretched, twisted, or turned to different angles.</p></li><li><p><strong>Electrostatic interactions</strong>: Attraction or repulsion between charged atoms. 
</p></li><li><p><strong>Van der Waals forces</strong>: Weak electromagnetic attractions between all atoms in close proximity.</p></li><li><p><strong>Solvent interactions</strong>: Parts of a protein may be hydrophobic or hydrophilic, and the protein will tend to hide its hydrophobic elements away from water and expose its hydrophilic elements to water. </p></li></ul><p>There are a few more of course, but these are the major ones. These forces will push and pull on every atom contained in the protein, ultimately forcing it to a certain configuration. As mentioned, this final configuration is the folded state of the protein, but we could interpret this final state in a more physics-grounded way: <strong>low thermodynamic free energy.</strong> It&#8217;s not worth pondering too much <em>how</em> to quantitatively assess this, but rest assured there are ways; we&#8217;ll discuss them a bit more later. Low thermodynamic free energy of a folded structure means it&#8217;s a stable structure (hard to move from), and high means it&#8217;s an unstable structure (easy to move from). </p><p>To note, the above pushing-pulling on our protein isn&#8217;t just the case for protein folding; it applies to all problems! Docking with small molecules, chemical reactions, everything moves towards the direction of low thermodynamic free energy. </p><p>Let&#8217;s return to our protein though. How do we figure out the final folded state of our protein with MD? There&#8217;s some nuance here in that some proteins will have multiple states with low thermodynamic free energy, so the equilibrium state for a protein could be flip-flopping between multiple states. Let&#8217;s ignore that for now and assume there is a singular final state for this specific protein. </p><h2>System Definition</h2><p>First, we dunk the protein in a big cube of water. </p><p>Computationally that is. 
This isn&#8217;t an oversimplification; you literally &#8216;define&#8217; a 5x5x5 (or however large) nanometer cube filled with H2O molecules, or the <strong>solvent</strong>, around your protein. Why water? Most proteins exist in aqueous environments, so this is just an attempt to match the &#8216;natural&#8217; environment of a protein. This is important because the environment often defines what final states a protein can reach: there are many possible end states of a protein, but not all of them are reachable in all environments. Lots of simulations also add in various salts and ions in the solution to bring the overall charge of the cube of water + protein to neutral, as that&#8217;s also (mostly) biologically realistic. To note, water isn&#8217;t the only possible solvent; one can really use anything, like ethanol. It just depends on whether the MD software package you&#8217;re using supports that. </p><p>Two last things. One, you may often hear the phrase &#8216;<strong>periodic boundary condition</strong>&#8217; when it comes to the definition of this box of water. This just means that any molecule that leaves through one side of the box re-enters from the opposite side. This isn&#8217;t always done but is common enough that I thought I should mention it. Two, the solvent can be represented <strong>implicitly</strong> or <strong>explicitly</strong>: if implicit, the solvent is represented as a continuous medium; if explicit, it is represented as individual molecules. This will change how the potential energy (discussed later) is calculated, but it&#8217;s a minor add-on that I won&#8217;t discuss too heavily during the rest of this post. For clarity's sake, the equations presented in the next sections assume an implicit solvent. </p><p>This box filled with salts and ions + water + maybe some other things + our protein is often referred to as the &#8216;<strong>system</strong>&#8217;. 
Let&#8217;s not get <em>too</em> ahead of ourselves with pretending we&#8217;re being actually realistic though; we&#8217;re missing the millions of other proteomic, ionic, and other interactions that our protein has while it actually floats around in vivo. But assuming a box of water with some salts isn&#8217;t a bad starting point! </p><h2>Force Fields</h2><h3>The basic definition</h3><p>After we have our system, we need to get around to defining the laws of physics of that system. As in, given the charge, size, and so on for every atom in our system, what does real-world physics say should happen to the position of those atoms from one instant to the next? This is often deemed the &#8216;<strong>force field</strong>&#8217; of the system, and there are lots of different options for that. <em>That&#8217;s strange</em>, you may ask, <em>isn&#8217;t there only one set of laws of physics</em>? Well, yes. But we don&#8217;t really know how to accurately capture those laws of physics in a computer in a way that&#8217;s even slightly tractable to solve, so a bunch of very smart people have all built very different ways of approximating those laws. The broad strokes of the forces themselves are captured; all well-known force fields have <em>some</em> conception of the major forces (discussed below). They differ in the nuances; some very minor things are ignored in some (e.g. flexibility in the water molecules), some hyperparameters are shifted away from theoretically correct values to empirically derived ones, and so on. A few names of force fields you may come across are CHARMM, AMBER, and GROMOS. </p><p><em>How do we choose which force field to use for our problem?</em> Some are explicitly tuned for some problems over others (e.g. GLYCAM, which is tuned for modeling carbohydrates), some are generally useful for any atomic problem, some are meant to be less accurate but fast, some are more accurate but slow, and so on. 
It&#8217;s a very empirical exercise to find which one will work best, and whether it needs to be tuned further. </p><p><em>How accurate are we being here with these force fields?</em> Surely more than the system, right? Most force fields used in practice do include an awful lot of effects and model things quite well but are missing one thing: <strong>quantum effects</strong>. This includes stuff like electron tunneling, electron delocalization, and electron correlation; all of these are entirely ignored in the calculations. How does this practically change the results of the simulations? It depends <em>entirely</em> on the problem. For our problem of finding the fold of a protein, quantum effects (as far as I can tell) play a relatively minor role. The main cases in which they do make a difference are when the simulation involves the breaking/forming of chemical bonds or transition metals, as those are the two primary cases (amongst a long tail of others) where quantum effects become extremely important. We&#8217;ll ignore these for now, and discuss them a bit more in a future section. </p><p>A final note on understanding force fields: a helpful way to look at our proteins is as a bunch of atoms, all connected by <strong>springs</strong>. The exact composition, tension, push/pulling force, and flexibility of these springs can be thought of as the force field, which defines all of these values in extreme detail. This analogy eventually falls apart, as we&#8217;ll see in the next section, but as far as mental models for MD go, springs aren&#8217;t a bad one!</p><h3>The details</h3><p>A lot of MD &#8216;basics&#8217; tutorials are really fuzzy on what the force field is <strong>actually</strong> defining, instead relying on analogies or hand-wavy explanations. And, on the other hand, actual MD papers feel extremely complicated. Is there a middle ground here? I&#8217;m not sure, but I&#8217;ll try to get there. </p><p>What <em>actually</em> is a force field? 
Mathematically, it is a way to establish what the total <strong>potential energy</strong> of a system is. This is the value that we ultimately want to decrease as much as possible during our MD simulation. <br><br>How do we do this? Well, we first need an equation for potential energy. Why? If we have an equation that equals that, we can take the gradient of that equation with respect to atomic position. Why? If we take the gradient of an equation with respect to atomic position, we learn how to modify atomic position to maximize potential energy. But we don&#8217;t want to maximize potential energy, we want to minimize it. <strong>So we instead want to take the negative gradient of the potential energy equation</strong>. And that&#8217;s it!</p><p>Here is the potential energy equation used by many MD force fields. First, the equations for finding potential energy: </p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\begin{equation}\nE_{\\text{total}} = E(\\mathbf{r}_1, \\mathbf{r}_2, \\ldots, \\mathbf{r}_N)\n\\end{equation}&quot;,&quot;id&quot;:&quot;CYXFALEDCC&quot;}" data-component-name="LatexBlockToDOM"></div><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\begin{equation}\nE_{\\text{total}} = \\sum_{\\text{bonds}} k_b (r - r_0)^2 + \\sum_{\\text{angles}} k_{\\theta} (\\theta - \\theta_0)^2 + \\sum_{\\text{dihedrals}} V_n [1 + \\cos(n\\phi - \\gamma)] + \\sum_{i=1}^{N-1} \\sum_{j=i+1}^{N} \\left[ \\frac{A_{ij}}{R_{ij}^{12}} - \\frac{B_{ij}}{R_{ij}^{6}} + \\frac{q_i q_j}{\\epsilon R_{ij}} \\right]\n\\end{equation}\n&quot;,&quot;id&quot;:&quot;WNGBDBSNOH&quot;}" data-component-name="LatexBlockToDOM"></div><p>Or, represented in picture form:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" 
href="https://substackcdn.com/image/fetch/$s_!b9ZZ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F69f5b5b7-cccd-40fb-8a45-0ea46ae78c0d_3350x966.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!b9ZZ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F69f5b5b7-cccd-40fb-8a45-0ea46ae78c0d_3350x966.png 424w, https://substackcdn.com/image/fetch/$s_!b9ZZ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F69f5b5b7-cccd-40fb-8a45-0ea46ae78c0d_3350x966.png 848w, https://substackcdn.com/image/fetch/$s_!b9ZZ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F69f5b5b7-cccd-40fb-8a45-0ea46ae78c0d_3350x966.png 1272w, https://substackcdn.com/image/fetch/$s_!b9ZZ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F69f5b5b7-cccd-40fb-8a45-0ea46ae78c0d_3350x966.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!b9ZZ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F69f5b5b7-cccd-40fb-8a45-0ea46ae78c0d_3350x966.png" width="1456" height="420" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/69f5b5b7-cccd-40fb-8a45-0ea46ae78c0d_3350x966.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:420,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Molecules | Free Full-Text | An Overview of Molecular Modeling for Drug 
...&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Molecules | Free Full-Text | An Overview of Molecular Modeling for Drug ..." title="Molecules | Free Full-Text | An Overview of Molecular Modeling for Drug ..." srcset="https://substackcdn.com/image/fetch/$s_!b9ZZ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F69f5b5b7-cccd-40fb-8a45-0ea46ae78c0d_3350x966.png 424w, https://substackcdn.com/image/fetch/$s_!b9ZZ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F69f5b5b7-cccd-40fb-8a45-0ea46ae78c0d_3350x966.png 848w, https://substackcdn.com/image/fetch/$s_!b9ZZ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F69f5b5b7-cccd-40fb-8a45-0ea46ae78c0d_3350x966.png 1272w, https://substackcdn.com/image/fetch/$s_!b9ZZ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F69f5b5b7-cccd-40fb-8a45-0ea46ae78c0d_3350x966.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 
6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">From <a href="https://www.mdpi.com/1420-3049/24/9/1693">here</a></figcaption></figure></div><p>But, remember, we actually want the derivative of each term here, not potential energy itself! The negative of this derivative is what we refer to as <strong>force</strong>. The following equation for the derivative is a bit ugly and doesn&#8217;t make much intuitive sense, but it&#8217;s provided here for clarity's sake. <strong>We won&#8217;t be referring back to it!</strong> </p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\frac{dE_{\\text{total}}}{dr} = \\sum_{\\text{bonds}} 2k_b (r - r_0) + \\sum_{\\text{angles}} 2k_{\\theta} (\\theta - \\theta_0) \\frac{d\\theta}{dr} - \\sum_{\\text{dihedrals}} V_n n \\sin(n\\phi - \\gamma) \\frac{d\\phi}{dr} + \\sum_{i=1}^{N-1} \\sum_{j=i+1}^{N} \\left[ -\\frac{12 A_{ij}}{R_{ij}^{13}} + \\frac{6 B_{ij}}{R_{ij}^{7}} - \\frac{q_i q_j}{\\epsilon R_{ij}^2} \\right]\n&quot;,&quot;id&quot;:&quot;BEHRTJXYBQ&quot;}" data-component-name="LatexBlockToDOM"></div><p>If you&#8217;ve committed to reading this section, relax! The potential energy equation is genuinely quite simple, and we&#8217;ll get through this. 
Let&#8217;s break it apart into pieces and explain the concept behind each one. </p><div><hr></div><p>First up is quantifying the energy created by the bonds between atoms. </p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\begin{equation}\n\\sum_{\\text{bonds}} k_b (r - r_0)^2 \n\\end{equation}&quot;,&quot;id&quot;:&quot;AUKURZBXGR&quot;}" data-component-name="LatexBlockToDOM"></div><p>We&#8217;re going to iterate through every chemical bond in our system. For each bond, we&#8217;ll take the difference between that bond length and &#119903;_0, which refers to the bond length at which energy is minimized. &#119903;_0&#8203; is static across bonds of the same type, like double bonds or bonds between carbon-carbons. So, in essence, we&#8217;re checking how much a bond has deviated from its &#8216;ideal&#8217;, low-energy form. Then we square that, because we&#8217;re treating bonds as springs, and <a href="https://en.wikipedia.org/wiki/Hooke%27s_law">we do that for springs</a> (other reasons too obviously, but kind of unimportant). Then, we multiply it all by &#119896;_&#119887;&#8203;, which is a static value of how much that particular bond &#8216;resists&#8217; deformation. Higher &#119896;_&#119887; means stiffer, so more potential energy is created by deviations. As with &#119903;_0, &#119896;_&#119887; is static across all bonds of the same type.</p><div><hr></div><p>Next is quantifying energy created by the angles between atoms. </p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\sum_{\\text{angles}} k_{\\theta} (\\theta - \\theta_0)^2 &quot;,&quot;id&quot;:&quot;DFQXUVMVSG&quot;}" data-component-name="LatexBlockToDOM"></div><p>Basically the exact same as the above equation, just slightly modified. We&#8217;re going to iterate through every angle formed by three connected atoms in our system. 
For each angle, we&#8217;ll take the difference between the current angle &#120579; and &#120579;_0, which is the angle at which the energy is minimized. &#120579;_0&#8203; is fixed for angles of the same type, like the typical 109.5 degrees for sp3-hybridized carbon atoms. Again, we&#8217;re checking how much an angle has deviated from its &#8216;ideal&#8217;, low-energy form. Then we square that deviation, just like we do with bonds, because angles also behave like springs in this model. After that, we multiply it by &#119896;_&#120579;&#8203;, a static value that indicates how much the angle &#8216;resists&#8217; deformation. A higher &#119896;_&#120579;&#8203; means the angle is stiffer and thus more potential energy is created by deviations. Like &#120579;_0, &#119896;_&#120579;&#8203; is the same for all angles of the same type.</p><div><hr></div><p>Next up is the energy created by the torsion of the bonds between atoms:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\sum_{\\text{dihedrals}} V_n [1 + \\cos(n\\phi - \\gamma)] &quot;,&quot;id&quot;:&quot;DXKJXGUASV&quot;}" data-component-name="LatexBlockToDOM"></div><p>This one requires some definitions, mainly what &#8216;dihedral&#8217; angles are. Imagine you have four contiguous atoms connected in a chain: A-B-C-D. The dihedral angle is the angle between the plane formed by atoms A-B-C and the plane formed by atoms B-C-D. Knowing the dihedral angle allows you to understand how &#8216;twisted&#8217; the configuration of atom bonds is. </p><p>We&#8217;re going to iterate through every dihedral angle &#120601; in our system. For each &#120601;<em>, </em>we&#8217;ll plug it into a cosine function that&#8217;ll tell us how much it deviates from its ideal, low-energy position. 
Within the cosine function, we&#8217;ll multiply &#120601; by n, the multiplicity, which sets how many energy minima occur over one full 360-degree rotation of the dihedral, and then subtract the phase offset &#120574;, which sets where those minima sit. As with all other static values, n and &#120574; are constant across known sets of 4 atoms. To ensure all values are non-negative, we add 1 to the cosine result. </p><p>Finally, we multiply by &#119881;_&#119899;&#8203;, a constant that represents the energy cost of deviating from the ideal angle. A higher &#119881;_&#119899; means a higher energy penalty for deviations. This constant is again set for all dihedrals of the same type.</p><div><hr></div><p>Finally, we have a term to define energy created by not-bond-related forces. </p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\sum_{i=1}^{N-1} \\sum_{j=i+1}^{N} \\left[ \\frac{A_{ij}}{R_{ij}^{12}} - \\frac{B_{ij}}{R_{ij}^{6}} + \\frac{q_i q_j}{\\epsilon R_{ij}} \\right]\n&quot;,&quot;id&quot;:&quot;EYYKSTWERI&quot;}" data-component-name="LatexBlockToDOM"></div><p>Because this isn&#8217;t bond-related, it&#8217;ll be calculated between every single pair of atoms in our system, hence the double sigma. On that note, not-bond-related is a bit general, isn&#8217;t it? This equation actually covers two kinds of interaction: the Lennard-Jones potential (van der Waals forces) makes up the first two terms, and electrostatic interactions make up the third. Let&#8217;s go through them.</p><p>The first term is the repulsive energy between atoms &#119894; and &#119895; (van der Waals repulsion). Here, &#119877;_&#119894;&#119895;&#8203; is the distance between the atoms, and &#119860;_&#119894;&#119895;&#8203; is a constant that controls the strength of the repulsive force. 
The distance is raised to the 12th power, meaning the repulsive energy increases very rapidly as the atoms get closer together, preventing them from overlapping.</p><p>The second term is the attractive energy between atoms &#119894; and &#119895; (van der Waals attraction). Again, &#119877;_&#119894;&#119895;&#8203; is the distance between the atoms, and &#119861;_&#119894;&#119895;&#8203; is a constant that controls the strength of the attractive force. The distance is raised to the 6th power, so the attractive energy decreases as the atoms move apart, but not as sharply as the repulsive energy.</p><p>The third term is the electrostatic energy between atoms &#119894; and &#119895;. &#119902;_&#119894;&#8203; and &#119902;_&#119895; are the charges on the atoms, and &#120598; is the dielectric constant, which reduces the effective strength of the electrostatic interaction based on the surrounding medium. Remember, in our case, this is water! The distance &#119877;_&#119894;&#119895;&#8203; again appears in the denominator, but with no exponent, so the electrostatic energy falls off as 1/&#119877;_&#119894;&#119895;, far more slowly than either van der Waals term as the atoms move apart.</p><div><hr></div><p>That wasn&#8217;t too bad! All we need are bond stretches, bond angles, bond dihedral angles, van der Waals forces, and electrostatic potentials to make up most of the force field math, easy! </p><p>Are we missing anything? </p><p>There are a few more forces, specifically explicit solvent effects (if we are representing the solvent explicitly), improper torsions, and the exact impact of temperature/pressure on the system. These are relatively minor compared to the terms we have discussed &#8212; but still important! 
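</p><p>If it helps to see the pieces side by side, here is a minimal sketch of each term above for a single bond, angle, dihedral, or atom pair. The function names and the toy parameter values are my own assumptions for illustration, not real force field parameters, and an actual MD package evaluates these over every bond and atom pair with many optimizations on top.</p>
<pre>
```python
import math


def bond_energy(r, r0, k_b):
    """Harmonic bond term: k_b * (r - r0)^2."""
    return k_b * (r - r0) ** 2


def angle_energy(theta, theta0, k_theta):
    """Harmonic angle term: k_theta * (theta - theta0)^2 (angles in radians)."""
    return k_theta * (theta - theta0) ** 2


def dihedral_energy(phi, n, gamma, v_n):
    """Torsion term: V_n * [1 + cos(n * phi - gamma)] (angles in radians)."""
    return v_n * (1.0 + math.cos(n * phi - gamma))


def nonbonded_energy(r_ij, a_ij, b_ij, q_i, q_j, epsilon):
    """Lennard-Jones (A/R^12 - B/R^6) plus Coulomb (q_i * q_j / (epsilon * R))."""
    lennard_jones = a_ij / r_ij ** 12 - b_ij / r_ij ** 6
    coulomb = (q_i * q_j) / (epsilon * r_ij)
    return lennard_jones + coulomb


# Toy inputs in arbitrary units, chosen only so the arithmetic is easy to follow.
e_bond = bond_energy(r=1.6, r0=1.5, k_b=300.0)
e_dihedral = dihedral_energy(phi=0.0, n=3, gamma=0.0, v_n=1.5)
e_pair = nonbonded_energy(r_ij=1.0, a_ij=1.0, b_ij=1.0, q_i=1.0, q_j=-1.0, epsilon=1.0)
total_energy = e_bond + e_dihedral + e_pair
```
</pre>
<p>The total potential energy of a system is just the sum of these functions over every bond, angle, dihedral, and atom pair. </p><p>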
If you&#8217;d like to get a much more in-depth read into what&#8217;s going on in the background of force fields, I&#8217;d recommend checking out the <a href="https://ambermd.org/doc12/Amber23.pdf#page=276">AMBER docs</a> and <a href="http://docs.openmm.org/7.0.0/userguide/theory.html">OpenMM docs</a>; the MD software package documentation ecosystem contains <strong>much</strong> more useful information compared to any individual paper. </p><h2>Energy minimization + equilibration</h2><p>Finally, we can move on from force fields.</p><p>There are two steps we do before simulation: <strong>energy minimization</strong> and <strong>equilibration</strong>. </p><p>After dropping our protein into a bowl of water and setting up the force field, we may find two problems when attempting simulation. One, there may be accidental overlaps of molecules in our system, leading to massive repulsive force values that&#8217;ll break our system. Two, attempting to immediately perform simulation will result in extremely high force values, as pressure + temperature go from zero to their set values (let&#8217;s say 1 atm + 300 Kelvin) over the span of a picosecond. This, understandably, means the simulation loses a bit of realism. </p><p>For the overlap problem, we perform energy minimization. Here, we repeatedly calculate the potential energy of the system and use gradient descent algorithms to adjust <strong>only</strong> the positions of the atoms, not the velocities. The end goal is a configuration that is a local minimum in the potential energy landscape. </p><p>For the temperature problem, we perform equilibration. During this phase, the same selected force field is used, but the pressure and temperature of the system start from a very low value and are slowly brought to the desired values over a longer time span (10s to 100s of picoseconds). This allows the system to &#8216;settle&#8217;, both the water atoms and the protein. 
Exactly <em>how</em> the pressure and temperature are raised + maintained are sets of equations that are often referred to as, respectively, <strong>thermostats</strong> and <strong>barostats</strong>. </p><p>There are a few questions we may have over this.</p><p><em>Why do we refer to this separately?</em> <em>Isn&#8217;t this technically simulation since we&#8217;re using our chosen force field in our system?</em> Yes &#8212; with pressure + temperature variation if we&#8217;re being pedantic &#8212; but, because researchers don&#8217;t typically consider this as part of the MD trajectory when doing post-run analysis, it&#8217;s also not considered part of the actual simulation. We&#8217;ll defer explanation of simulation details to the next section. </p><p><em>How do we decide how fast to raise the temperature/pressure?</em> <a href="https://ambermd.org/tutorials/advanced/tutorial8/loop7.php">The whole subject is somewhat of an art, as the AMBER documentation notes, but basic things work fine:</a></p><blockquote><p>Equilibration protocols are still largely a matter of personal preference.&nbsp; Some protocols call for very elaborate procedures involving gradually increasing temperature in a step-wise fashion while other more aggressive approach simply use a linear temperature gradient and heat the system up to the desired temperature.&nbsp;</p></blockquote><p><em>Neither temperature nor pressure was included in the force field equations, where do they come in? </em>Because, technically speaking, neither temperature nor pressure modifications change the potential energy of the system! They <em>do</em> indirectly modify it, but only via altering the positions + velocities of particles in the system. We won&#8217;t discuss the temperature/pressure math here too much, because it doesn&#8217;t feel super valuable. For more details, I once again recommend checking out the AMBER or OpenMM docs! </p><p><em>Now</em> we&#8217;re finally ready to start our simulation. 
</p><h2>Production simulation</h2><p>Let&#8217;s start at the zeroth second of our simulation, T=0. </p><p>At this moment, the positions and velocities of every atom in our system are known (either set to static values or derived as a result of the energy minimization step), and we&#8217;d like to know how they change over some established time-step. Typically, MD simulations at the scale of protein folding operate at the femtosecond level, or 10^-15 seconds. Let&#8217;s operate with a similar mindset: our simulation will go from 0 to 1 to 2 femtoseconds, and so on. Why can&#8217;t we go faster? <a href="https://web.stanford.edu/class/archive/cs/cs279/cs279.1222/lectures/lecture4_annot.pdf">Numerical instability, this lecture covers it a bit more</a>.</p><p>What happens at T=1?</p><p>First, let&#8217;s calculate the force of our system at T=0 needed to push the i&#8217;th particle in the direction of the lowest potential energy. This is the negative derivative of the potential energy of the system, as discussed above. We&#8217;ll do this for every particle in our system:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\begin{equation}\n\\vec{F}_i = -\\nabla_{\\vec{r}_i} E(\\mathbf{r}_1, \\mathbf{r}_2, \\ldots, \\mathbf{r}_N)\n\\end{equation}\n&quot;,&quot;id&quot;:&quot;ECNMCMCXSF&quot;}" data-component-name="LatexBlockToDOM"></div><p>From here, we need to find some way to tie this force into what the new velocity + position will be for each particle at T=1. We can defer to Newton&#8217;s second law for this! 
Because we know force and the mass of every particle (provided by the force field), we can use the law to find the acceleration for each particle.</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\vec{F}_i = m_i  \\vec{a}_i&quot;,&quot;id&quot;:&quot;XXTAIBXSVI&quot;}" data-component-name="LatexBlockToDOM"></div><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;m_i \\vec{a}_i = -\\nabla_{\\vec{r}_i} E(\\mathbf{r}_1, \\mathbf{r}_2, \\ldots, \\mathbf{r}_N)\n&quot;,&quot;id&quot;:&quot;OQJLNIRUIN&quot;}" data-component-name="LatexBlockToDOM"></div><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\vec{a}_i = -\\frac{1}{m_i} \\nabla_{\\vec{r}_i} E(\\mathbf{r}_1, \\mathbf{r}_2, \\ldots, \\mathbf{r}_N)\n&quot;,&quot;id&quot;:&quot;VVLYQPVDQY&quot;}" data-component-name="LatexBlockToDOM"></div><p>But we aren&#8217;t actually specifically interested in the acceleration of each particle, but rather the velocity and position changes over a time interval! For that, we need to rely on methods that can integrate over Newtons laws as, if you recall, acceleration is the derivative of velocity and second derivative of position. </p><p>Because my differential equations background is pretty horrible, we&#8217;ll do something <strong>very</strong> basic and apply Euler&#8217;s method here to grab out the velocity + position update. 
</p><p>We can approximate a velocity update with this:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\vec{v}_i(t + \\Delta t) \\approx \\vec{v}_i(t) + \\vec{a}_i(t) \\Delta t\n&quot;,&quot;id&quot;:&quot;RENEEKLUHQ&quot;}" data-component-name="LatexBlockToDOM"></div><p>Reasonably intuitive; we just need to substitute the acceleration value here with the re-arranged Newton&#8217;s second law.</p><p>The position update is even easier.</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\vec{r}_i(t + \\Delta t) \\approx \\vec{r}_i(t) + \\vec{v}_i(t) \\Delta t\n&quot;,&quot;id&quot;:&quot;RFCLCNVNXD&quot;}" data-component-name="LatexBlockToDOM"></div><p>Keep in mind though, I&#8217;m using Euler&#8217;s method here because it&#8217;s simple! In practice, more sophisticated methods are used, such as <a href="https://en.wikipedia.org/wiki/Verlet_integration">Verlet Integration</a> or <a href="https://en.wikipedia.org/wiki/Langevin_equation">Langevin Integration</a>, because Euler&#8217;s method is inaccurate and unstable at any reasonably sized time-step. </p><p>And that&#8217;s it! We now know the updated velocity and position at the next time step, we just need to do this a few million times and our simulation is complete! Over very many hours of compute time, we&#8217;ll end up with a massive stack of data known as a <strong>trajectory</strong>, which will encode the position and velocities for every particle in our system from timestep-to-timestep, showing the molecular dance that occurs. The video below is an accurate representation of how our folded protein's trajectory may look (start at 0:15). Lots of vibrating around, slowly poking its way towards a stable structure, and then fully stabilizing &#8212; all over 6 billion femtoseconds (6 microseconds). 
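</p><p>For the curious, the Euler update above can be sketched in a few lines. This is a toy, one-dimensional version for a single particle on a spring, under my own assumptions: the names <code>euler_step</code> and <code>spring_force</code> are made up for illustration, the units are arbitrary, and a real MD engine would use a better integrator over millions of atoms.</p>
<pre>
```python
def spring_force(r, k=1.0, r0=0.0):
    """Force from the harmonic term k * (r - r0)^2, i.e. its negative derivative."""
    return -2.0 * k * (r - r0)


def euler_step(position, velocity, force_fn, mass, dt):
    """One Euler update: a = F / m, then v(t + dt) and r(t + dt) from values at t."""
    acceleration = force_fn(position) / mass
    new_velocity = velocity + acceleration * dt
    new_position = position + velocity * dt
    return new_position, new_velocity


# Start displaced from the minimum and at rest, then take three small steps.
position, velocity = 1.0, 0.0
trajectory = [(position, velocity)]
for _ in range(3):
    position, velocity = euler_step(position, velocity, spring_force, mass=1.0, dt=0.1)
    trajectory.append((position, velocity))
```
</pre>
<p>Here <code>trajectory</code> plays the same role as the real thing: a list of (position, velocity) snapshots, one per time-step. 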
</p><div class="native-video-embed" data-component-name="VideoPlaceholder" data-attrs="{&quot;mediaUploadId&quot;:&quot;c3979982-8f4f-4acd-b963-b7df56622ac0&quot;,&quot;duration&quot;:null}"></div><p>Gorgeous! With that, we&#8217;ve completed a very basic simulation workflow. There&#8217;s a lot of post-simulation analysis to be done, but as that is usually custom to the problem one is solving, we&#8217;ll leave example analyses to the case study section below. </p><h2>Interesting miscellaneous things </h2><p>Here is a collection of things that didn&#8217;t come up in the above section but may still pop up in MD papers + are cool. </p><h3>Bypassing small timescales</h3><p>Because MD time-steps are limited to femtoseconds &#8212; due to increasing instability and inaccuracy upon integrating Newton&#8217;s second law if pushed beyond that &#8212; a huge tail of biological phenomena are extremely difficult to model reasonably well. Our initial problem, protein folding, would be a hard sell for us to <em>actually</em> do &#8212; many protein folding events typically take milliseconds to <strong>seconds</strong>. Ligand binding and unbinding events often have microsecond to millisecond lifetimes. Allosteric transitions, ion channel gating, and major conformational changes associated with protein function can also take microseconds or longer. </p><p>This means that in a typical MD simulation, we're only directly observing a tiny fraction of the molecule&#8217;s full dynamic behavior. We might see fast, small-scale motions like sidechain rotations, loop fluctuations, and transient hydrogen bonds, but we likely won't capture large-scale slow motions or rare conformations. And, unfortunately, biology is filled with large, slow things and rare events. </p><p>There are ways around this!</p><p>One could simply scale things up. 
Anton, a supercomputer built by <a href="https://www.deshawresearch.com/">DESRES</a>, was custom-built from the ground up with optimized hardware to tackle exactly this through pure computational power. The extreme modeling speeds they were able to achieve yielded a humorous paper title that I could only describe as a flex: <strong><a href="https://dl.acm.org/doi/abs/10.1145/3458817.3487397">Anton 3: twenty microseconds of molecular dynamics simulation before lunch</a></strong><a href="https://dl.acm.org/doi/abs/10.1145/3458817.3487397">.</a> Of course, while academics are able to access fragments of the full Anton system with permission, using it regularly isn&#8217;t feasible for any non-employee of DESRES. </p><p>But there&#8217;s a subtle point here. The issue with small timescales has nothing to do with <strong>time</strong> specifically, but rather that molecules are traversing an energy landscape, may end up roaming around in local minima of potential energy states, and only eventually find their way out after lengthy stretches of microseconds. Stretching out the time of the simulation is one way to solve the problem, as Anton was built to do. But another option is to <strong>change the simulation itself</strong>, such that it&#8217;s able to quickly traverse this energy landscape without getting stuck &#8212; allowing us to get that much more information per femtosecond of simulation. </p><p>An increasingly important tactic used here is <strong>enhanced sampling</strong>, which is far more in reach of the average scientist. This is an extremely broad category of methods that seek to massively increase the diversity of molecular dynamics trajectories by changing the &#8216;rules&#8217; of the simulation away from physical reality. There are <strong>tons</strong> of examples. 
Here are a few: adding <a href="https://en.wikipedia.org/wiki/Metadynamics">potential energy costs to visiting previous states</a> in order to push the system toward unexplored states (metadynamics), <a href="https://pubmed.ncbi.nlm.nih.gov/29744830/">running multiple simulations at different temperatures in parallel and allowing swapping of temperatures to encourage conformational space exploration</a> (replica exchange molecular dynamics), and <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3438784/">altering the potential energy equations to reduce energy barriers between conformational states</a> (accelerated molecular dynamics). An entire blog post could be written on this subject alone; the diversity of ideas here is immense. <a href="https://arxiv.org/pdf/2202.04164">If interested, a very useful &#8212; though clearly meant for experts &#8212; review paper on the subject can be found here.</a></p><h3>Quantum effects</h3><p>As we mentioned earlier, most classical force fields used in MD simulations don't account for quantum mechanical effects like electron delocalization or tunneling. For many biological systems, like the protein folding example we walked through, this is a reasonable approximation. However, there are certain cases where quantum effects become critical to model accurately, such as when simulating chemical reactions or systems involving transition metals. </p><p>We&#8217;re being vague here. What do we actually <strong>mean </strong>when we say &#8216;quantum effects&#8217;? What we really mean is we go from pretending electrons don&#8217;t exist (beyond the charge they provide) to modeling them directly. Here are a few of the major energy terms in a quantum system:</p><ol><li><p><strong>Electronic kinetic energy. </strong>Mostly self-explanatory: the energy that each electron in the system has due to its motion. </p></li><li><p><strong>Electron-nucleus electrostatic attractions</strong>. 
This is the attractive electrostatic interaction between electrons and nuclei.</p></li><li><p><strong>Electron-electron electrostatic interactions.</strong> This is the repulsive electrostatic interaction between electrons.</p></li><li><p><strong>Exchange energy.</strong> This is a quantum mechanical effect that arises from the Pauli exclusion principle, which states that no two electrons can have the same quantum state. The exchange energy lowers the energy of the system by keeping electrons with the same spin apart.</p></li></ol><p>There are a few more as well. Unlike how we treated force fields and simulation, I am going to stay far away from the math on this one &#8212; I have talked about quantum stuff enough times with physicist friends to understand that attempting short-form explanation of it is rarely worth it. What I <em>will</em> touch on is some terminology. </p><p>Attempting to model quantum effects forces us to deviate from Newton&#8217;s second law when deriving acceleration updates (and thus velocity and position updates as well). Remember, electrons behave as waves rather than as simple point particles, so Newton&#8217;s second law doesn&#8217;t apply in this case! Because of this, we must rely on the<a href="https://en.wikipedia.org/wiki/Schr%C3%B6dinger_equation"> </a><strong><a href="https://en.wikipedia.org/wiki/Schr%C3%B6dinger_equation">Schr&#246;dinger equation</a></strong>, which could be seen as the quantum counterpart of Newton&#8217;s second law to allow incorporation of waves. However, the Schr&#246;dinger equation is computationally intractable to solve, so, <a href="https://pubs.acs.org/doi/10.1021/acs.accounts.1c00514">in practice</a>, simplifications like the <a href="https://en.wikipedia.org/wiki/Born%E2%80%93Oppenheimer_approximation">Born&#8211;Oppenheimer (BO) approximation</a> are used to make it solvable. 
We could also directly relax the complexity of the quantum &#8216;rules&#8217;, as <a href="https://en.wikipedia.org/wiki/Density_functional_theory">Density Functional Theory (DFT)</a> or <a href="https://en.wikipedia.org/wiki/Hartree%E2%80%93Fock_method">Hartree&#8211;Fock methods</a> do. These are often combined, e.g., BO is typically used in DFT and Hartree-Fock. </p><p>But even with the approximations, modeling quantum mechanics is still incredibly hard.</p><p>One approach to making quantum effects tractable is to use a method called <strong>quantum mechanics/molecular mechanics (QM/MM)</strong>. In QM/MM, a small region of the system that requires quantum mechanical treatment (like the active site of an enzyme) is modeled using quantum mechanics, while the rest of the system is treated classically with the usual force field we discussed earlier. This does add some extra headaches in <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9105939/">working out how two segments of a molecule having two different laws of physics will interact with one another</a>, but it does seem to work!</p><p>We could also do something a little more clever &#8212; there are some quantum effects that could be reframed in a classical mechanics manner, allowing us to simply tack them on alongside the usual potential energy equations. Electron charge distribution &#8212; the ability for electrons to be shared between atoms or to be off-center from their parent atom &#8212; is one of those things. This is a phenomenon that is only noticeable with a quantum lens but <em>could</em> be modeled directly using Newtonian laws by treating electrons as particles. 
<a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6520134/">This was the hope behind polarizable force fields</a>, which ignore all quantum effects other than electron charge distribution, <a href="https://pubs.acs.org/doi/10.1021/jp910674d">such as the AMOEBA force field.</a> This sort of hack <a href="https://cs.stanford.edu/people/ihaque/papers/amoeba.pdf">does seem to yield better performance compared to fixed charge models and compares well to typical pure QM modeling methods</a>, but it does have an added computational cost, as you are dramatically upping the number of particles in the system. </p><h3>Free energy calculations</h3><p>For what it&#8217;s worth, this is my favorite section of this post. </p><p>We could imagine our protein being able to fold itself into a vast landscape of different conformations, each requiring different amounts of energy. Amongst these conformations are a variety that are able to nicely pack all of the hydrophobic residues into the core of the protein, fold itself such that residues are nicely tied up with hydrogen bonds, minimize the steric clashes/stretches/torques of its bonds, and so on &#8212; these states are energetically favorable. </p><p>But is energetically favorable all a protein &#8216;cares&#8217; about? Potential energy, as determined by the interactions within the protein (like bond stretching, van der Waals interactions, etc.), does indeed favor the folded state. If we were only considering potential energy, the protein would always fold into the state with the lowest potential energy. But it doesn&#8217;t.</p><p>What also enters into the picture is <strong>entropy</strong>. It&#8217;s much easier to think of entropy in a statistical manner rather than via its actual definition. Conformations that achieve these nice &#8216;neat&#8217; properties with low potential energies are relatively rare compared to the broad space of all possible conformations. 
Let's say a protein stumbles into a low-potential energy state. There are very few conformations in this state that maintain this low potential energy - maybe a slight rotation of an amino acid residue or a minor shift in a bond angle would kick it right out of this favorable zone. On the other hand, if a protein is in a high-potential energy state, it can largely do whatever it &#8216;wants&#8217; and it will continue to retain high potential energy. <strong>The former case has low entropy, the latter has high entropy.</strong> </p><p><strong>Systems tend to favor states that are energetically favorable (low potential energy) and states that are statistically favorable (high entropy). </strong>This is a really beautiful idea: our protein is caught between two opposing forces, and must balance them. And an important one too, <strong>because it determines how statistically likely a possible end-state is</strong>. This is extremely important for understanding the interactions between drugs and proteins!</p><p>The interplay between these two values is often referred to as the &#8216;free energy&#8217; of the system and is encapsulated by the Gibbs free energy equation:</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\begin{equation}\nG =  H - T S\n\\end{equation}&quot;,&quot;id&quot;:&quot;XLRHDWVLPH&quot;}" data-component-name="LatexBlockToDOM"></div><ul><li><p>&#119866; is the free energy, which we&#8217;d like to keep as low as possible. </p></li><li><p>&#119867; is the enthalpy, which is largely equivalent to potential energy. </p></li><li><p>&#119878; is the entropy.</p></li><li><p>&#119879; is the temperature in Kelvin. As the temperature rises, the entropy of the system dominates in importance, since atoms that move faster also desire to retain that much more entropy. </p></li></ul><p><strong>Interesting and somewhat unrelated note:</strong> free energy itself could be viewed as &#8216;weighted&#8217; state probability. 
The Boltzmann distribution provides a way to connect the free energy of a state to the probability of the system being in that state. It states that the probability of a system being in a particular state (&#119875;&#119894;) is proportional to the exponential of the negative free energy of that state (&#119866;&#119894;) divided by the product of the Boltzmann constant (&#119896;) and the temperature (&#119879;). This quantity is then divided by the sum of the same exponential term over all possible states, which serves as a normalization factor to ensure that the probabilities of all states sum to 1.</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;P_i = \\frac{e^{-G_i/kT}}{\\sum_{j} e^{-G_j/kT}}\n&quot;,&quot;id&quot;:&quot;BDRDPJPZAY&quot;}" data-component-name="LatexBlockToDOM"></div><p>But, for complex systems like proteins, naively applying the Gibbs energy equation or the Boltzmann distribution (as given) is <strong>computationally intractable</strong>. There are just too many possible conformations to enumerate and evaluate.</p><p>How do we get around this? </p><p>Well, let&#8217;s ground the problem. Let&#8217;s say we&#8217;re developing a drug and want to assess how stable the binding of the drug is to a protein of interest &#8212; or, in other words, the free energy of the complex. The potential energy here is easy to evaluate, but the entropy is, again, challenging. One way we could simplify the problem is realizing that we perhaps don&#8217;t <em>need</em> to care about the absolute free energy, <strong>but rather the difference in free energy between the ligand bound to the protein, and the ligand unbound to the protein.</strong> To note, we absolutely do lose some information by phrasing the problem this way! If we focus on deriving the <em>change</em> in free energy, we cannot understand the global ranking of this ligand-protein complex in reference to <em>every</em> ligand-protein pair to ever exist. 
<strong>But we don&#8217;t care about that!</strong> Instead, what we're usually interested in is the difference in free energy between two states, such as the folded and unfolded states of a protein, or the bound and unbound states of a drug-protein complex.</p><p>This makes things <em>so</em> much easier. The realization that differences in free energy are what we actually want led to the development of <strong>alchemical free energy calculations </strong>&#8212; called &#8216;alchemical&#8217; because they involve unphysical changes to the simulation. In this class of methods, we needn&#8217;t enumerate every possible state, but rather just sample the states &#8216;between&#8217; two given states. One common method for doing this is <strong>free energy perturbation</strong>.</p><p><a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4420631/">Here&#8217;s how it&#8217;s done.</a></p><p>Let&#8217;s consider state A as the protein by itself and state B as the protein with the docked ligand. In these methods, we need to define a pathway to gradually transform the system from state A (protein without ligand) to state B (protein with ligand bound). This is done by introducing a coupling parameter, often denoted as &#955;, which takes values from 0 to 1. When &#955;=0, the system is in state A, and when &#955;=1, the system is in state B. At intermediate values of &#955;, the system is in a hybrid state between A and B. In practice, this transformation is often done by gradually turning on the interactions between the ligand and the protein (for the binding direction, A to B) or gradually turning them off (for the unbinding direction, B to A). 
At each intermediate &#955; value, we run molecular dynamics simulations to sample the configurations of the system at that particular point along the transformation path.</p><p>For each configuration sampled at a given &#955; value, we calculate the potential energy of the system in both the current state and the neighboring state (i.e., at &#955; and &#955;+&#916;&#955;). The free energy difference between these states can then be calculated using the <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4420631/">Zwanzig relationship</a>.</p><div class="latex-rendered" data-attrs="{&quot;persistentExpression&quot;:&quot;\\Delta G = -kT \\ln \\langle \\exp(-\\Delta U / kT) \\rangle\n&quot;,&quot;id&quot;:&quot;VWWZROEOYN&quot;}" data-component-name="LatexBlockToDOM"></div><p>Here, &#916;G is the free energy difference between adjacent states, k is the Boltzmann constant, T is the temperature, &#916;U is the difference in potential energy between the two states evaluated on the same configuration, and the angle brackets denote an ensemble average over the configurations sampled at the current &#955; value. By summing these free energy differences over the entire transformation path, we can obtain the total free energy difference between states A and B.</p><p>Importantly, these alchemical methods implicitly capture the entropic contributions to the free energy difference through the sampling of configurations at each intermediate state. 
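</p><p>To make the windowed sum concrete, here&#8217;s a minimal Python sketch of free energy perturbation on a toy problem. Everything in it is an assumption made for illustration rather than anything taken from the linked papers: the &#8216;system&#8217; is a single particle on a 1D spring, states A and B differ only in a made-up spring constant (<code>k_a</code> and <code>k_b</code>), and instead of running MD at each &#955; it draws configurations straight from the known Gaussian Boltzmann distribution. The function names are hypothetical too. The upside of such a simple system is that the exact answer is known, (kT/2)&#183;ln(k_b/k_a), so the summed Zwanzig estimates can be checked against it.</p>

```python
import math
import random

KT = 0.593  # k_B * T in kcal/mol near 298 K


def spring_constant(lam, k_a=1.0, k_b=4.0):
    """Hypothetical coupling: blend the spring constant linearly from A to B."""
    return (1.0 - lam) * k_a + lam * k_b


def potential(x, lam):
    """Toy potential energy U_lambda(x) = 0.5 * k(lambda) * x^2."""
    return 0.5 * spring_constant(lam) * x * x


def sample_configurations(lam, n_samples, rng):
    """Stand-in for MD: draw x from the exact Boltzmann distribution at lambda."""
    sigma = math.sqrt(KT / spring_constant(lam))
    return [rng.gauss(0.0, sigma) for _ in range(n_samples)]


def fep_free_energy(n_windows=20, n_samples=20000, seed=0):
    """Sum Zwanzig estimates over adjacent lambda windows from 0 to 1."""
    rng = random.Random(seed)
    lambdas = [i / n_windows for i in range(n_windows + 1)]
    total = 0.0
    for lam, lam_next in zip(lambdas[:-1], lambdas[1:]):
        xs = sample_configurations(lam, n_samples, rng)
        # Boltzmann factor exp(-dU / kT) for each configuration sampled at lam
        factors = [
            math.exp(-(potential(x, lam_next) - potential(x, lam)) / KT)
            for x in xs
        ]
        # Zwanzig: dG = -kT * ln(ensemble average of the Boltzmann factors)
        total += -KT * math.log(sum(factors) / n_samples)
    return total


def analytic_free_energy(k_a=1.0, k_b=4.0):
    """Exact result for two 1D harmonic wells: (kT / 2) * ln(k_b / k_a)."""
    return 0.5 * KT * math.log(k_b / k_a)


if __name__ == "__main__":
    print("FEP estimate (kcal/mol):", round(fep_free_energy(), 3))
    print("Exact value  (kcal/mol):", round(analytic_free_energy(), 3))
```

<p>Running it should print two numbers that roughly agree; the exact value is about 0.41 kcal/mol for these made-up constants. In a real protein-ligand calculation the sampling step would be an MD run per window, which is where nearly all of the cost lives.</p><p>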
The Boltzmann factor exp(-&#916;U/kT) weights the contribution of each configuration based on its potential energy difference, effectively accounting for the relative probabilities of different configurations, which is related to entropy.</p><p><a href="https://www.researchgate.net/publication/280557809_Free-energy_calculations_in_structure-based_drug_design">There are </a><strong><a href="https://www.researchgate.net/publication/280557809_Free-energy_calculations_in_structure-based_drug_design">tons</a></strong><a href="https://www.researchgate.net/publication/280557809_Free-energy_calculations_in_structure-based_drug_design"> of alchemical free energy methods.</a> I&#8217;ve missed an enormous amount of nuance in this section, but it should equip you with enough terminology to dig into the details. How do you split up states? Where do methods like this go wrong? How do you ensure entropic contributions by the state split-up are actually captured? Questions for the reader to ponder&#8230;</p><h1>Case studies</h1><p>There is no better way to realize you understand a subject than to read a full-fledged paper about it and to actually <em>get it</em>, when previously it would&#8217;ve all been gibberish. Let&#8217;s go over one. </p><h2><a href="https://www.pnas.org/doi/full/10.1073/pnas.2317756121">Discovery of lirafugratinib (RLY-4008), a highly selective irreversible small-molecule inhibitor of FGFR2</a></h2><p>Released by <a href="https://relaytx.com/">Relay Therapeutics</a> just a few months ago, this paper is one of the few cases I&#8217;ve seen where it is described, in extreme detail, how MD helped lead to the development of a drug. </p><p>Here&#8217;s some context. Existing FGFR inhibitors are pan-FGFR, hitting all isoforms (FGFR1-4) and causing dose-limiting toxicities as a result. Hyperphosphatemia from FGFR1 inhibition and diarrhea from FGFR4 inhibition often lead to dose reductions or treatment interruptions, capping the efficacy of these drugs. 
A selective FGFR2 inhibitor could potentially avoid these issues and allow more effective treatment. However, the high structural similarity between FGFR isoforms has thwarted conventional structure-based drug design approaches to find selectivity handles.</p><p>To tackle this, the authors turned to long-timescale molecular dynamics (MD) simulations. They ran simulations up to 25 &#956;s to thoroughly sample the conformational landscapes of FGFR1 and FGFR2, hunting for differences in protein dynamics that could be exploited for selective targeting.</p><p>The starting structures for the simulations were based on X-ray crystal structures of the FGFR1 and FGFR2 domains (PDB IDs 4RWI and 1GJO, respectively). The domains were placed in a cubic simulation box with periodic boundary conditions, again with water and ions as a solvent. The protein was modeled using the Amber99SB*-ILDN force field, and the small molecule ligands were modeled using the general Amber force field. The equilibration process was more typical, using an off the shelf thermostat (Nos&#233;&#8211;Hoover thermostat) and barostat (Martyna-Tobias-Klein barostat) to modify, respectively, the temperature and pressure. </p><p>Fascinatingly, the MD simulations revealed a key difference in the behavior of a region called the P-loop between FGFR1 and FGFR2: </p><div class="pullquote"><p>In the simulations of FGFR1, the P-loop quickly contracted from the extended conformation and became disordered. In the simulations of FGFR2, however, a somewhat extended conformation persisted, and the P-loop was far less flexible than that of FGFR1. This result suggested that the P-loop might be a suitable region for selective targeting of FGFR2.</p></div><p>The visualizations of these trajectories are quite beautiful in how much information they convey. 
Here is the flexible FGFR1 P-loop:</p><div class="native-video-embed" data-component-name="VideoPlaceholder" data-attrs="{&quot;mediaUploadId&quot;:&quot;0bdbea8e-a0da-46ec-af20-4890e3dbfb1c&quot;,&quot;duration&quot;:null}"></div><p>And the less flexible FGFR2 P-loop:</p><div class="native-video-embed" data-component-name="VideoPlaceholder" data-attrs="{&quot;mediaUploadId&quot;:&quot;a544814f-1493-468c-bd1a-56ce5f5eec9d&quot;,&quot;duration&quot;:null}"></div><p>They also use distance deviation plots between the two P-loops to show how different they are:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!DiGP!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7ec718a5-2861-4278-938e-c2f3ebfd3bb9_676x309.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!DiGP!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7ec718a5-2861-4278-938e-c2f3ebfd3bb9_676x309.png 424w, https://substackcdn.com/image/fetch/$s_!DiGP!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7ec718a5-2861-4278-938e-c2f3ebfd3bb9_676x309.png 848w, https://substackcdn.com/image/fetch/$s_!DiGP!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7ec718a5-2861-4278-938e-c2f3ebfd3bb9_676x309.png 1272w, https://substackcdn.com/image/fetch/$s_!DiGP!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7ec718a5-2861-4278-938e-c2f3ebfd3bb9_676x309.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!DiGP!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7ec718a5-2861-4278-938e-c2f3ebfd3bb9_676x309.png" width="676" height="309" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7ec718a5-2861-4278-938e-c2f3ebfd3bb9_676x309.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:309,&quot;width&quot;:676,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:95525,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!DiGP!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7ec718a5-2861-4278-938e-c2f3ebfd3bb9_676x309.png 424w, https://substackcdn.com/image/fetch/$s_!DiGP!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7ec718a5-2861-4278-938e-c2f3ebfd3bb9_676x309.png 848w, https://substackcdn.com/image/fetch/$s_!DiGP!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7ec718a5-2861-4278-938e-c2f3ebfd3bb9_676x309.png 1272w, https://substackcdn.com/image/fetch/$s_!DiGP!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7ec718a5-2861-4278-938e-c2f3ebfd3bb9_676x309.png 1456w" sizes="100vw" loading="lazy"></picture>
</div></a></figure></div><p>There&#8217;s something quite special here, MD revealed something &#8212; a conformational difference &#8212; that would&#8217;ve required much more expensive methods (like cryo-EM) to find! The authors continued down this rabbit-hole, hypothesizing how targeting the P-loop affected FGFR1 and FGFR2:</p><div class="pullquote"><p>Our initial design efforts aimed to covalently engage FGFR2 Cys491, a residue that lies at the tip of the extended P-loop and is also targeted by the covalent pan-FGFRi futibatinib (<a href="https://www.pnas.org/doi/full/10.1073/pnas.2317756121#core-r10">10</a>, <a href="https://www.pnas.org/doi/full/10.1073/pnas.2317756121#core-r20">20</a>, <a href="https://www.pnas.org/doi/full/10.1073/pnas.2317756121#core-r21">21</a>). 
Simulations of FGFR1 binding one of our early selective compounds suggested that its selectivity arose because the compound stabilized the FGFR1 P-loop in an extended conformation with such a low degree of flexibility that covalent engagement of Cys488 (homologous to FGFR2 Cys491) was discouraged. </p></div><p>In other words, targeting the P-loop may be a way to bind to (and so inhibit) FGFR2 but not FGFR1! </p><p>The paper goes on to use MD to examine how candidate molecules dock to the P-loops of FGFR1 and FGFR2. This is a cool idea! But it&#8217;s also a bit hard to tell how useful MD was here: eyeballing the results doesn&#8217;t show huge differences in how ligands interacted with the P-loop, and the authors don&#8217;t attempt to calculate free energies anywhere. It feels likely that the key contributions of MD here were in identifying the P-loop as a selectivity handle and in providing post-hoc rationalization of the binding modes, rather than in directly guiding the docking itself. </p><p>Still though, a drug was made!</p><div class="pullquote"><p>Using this approach as part of an iterative process of optimization, our efforts culminated in the identification of lirafugratinib (RLY-4008), a highly selective, orally available small-molecule FGFR2 inhibitor to enter clinical development.</p></div><h2><a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9367001/">Characterizing Receptor Flexibility to Predict Mutations That Lead to Human Adaptation of Influenza Hemagglutinin</a></h2><p>Much more of an exploratory paper than an absolutely clinically useful one, but a great demonstration of the diversity of questions that MD can help answer. </p><p>Here&#8217;s the context: influenza pandemics occur when a (typically) avian influenza strain acquires mutations that allow it to infect and transmit between humans. 
A key step in this process is when the viral surface protein hemagglutinin (HA) switches its binding preference from avian to human sialic acid (SA) receptors via a mutation. However, predicting which specific mutations will enable this switch is challenging! One reason for this difficulty is that human SA receptors are quite flexible and can adopt many different conformations when bound to HA. This diversity makes it hard to determine from crystal structures alone which HA mutations will facilitate human receptor binding. An excellent use case for MD!</p><p>The simulations focused on a combination of SAs and HAs. </p><p>For SAs (the cell surface receptors), they used 3-SLN, representing the avian sialic acid receptor, and 6-SLN and 506-SLN, representing the human sialic acid receptors. </p><p>For HAs (the viral protein), DK76, IN05, and SH13 were used. Alongside these, several mutated forms of these HAs were also used: DK76<sup>Q226L,G228S,A227S</sup>, DK76<sup>Q226L,G228S,P186N</sup>, DK76<sup>Q226L,G228S,P186N,A227S</sup>, DK76<sup>Q226L,G228S</sup>, DK76<sup>E190D,G225D</sup>, IN05<sup>Q226L,G228S,S227A</sup>, IN05<sup>Q226L,G228S</sup>. Should these mutations mean anything to you? Some of these are just mutations that are known to &#8216;switch&#8217; affinity of the viral protein between the human SA and the avian SA, and others are hypothetical gain-of-function mutations. </p><p>These were placed in cubic water boxes with periodic boundary conditions. The proteins were modeled using the Amber99SB-ILDN force field, while the GLYCAM06 force field was used for the sialic acid sugars &#8212; an example of multiple specialized force fields working alongside one another. For each HA and SA combination, they ran simulations for up to 25 &#956;s. 
The resulting set of trajectories (as in, every frame) was clustered together, yielding 7 clusters of identifiable conformations amongst all SA-HA pair frames (alongside an 8th &#8216;other&#8217; category). </p><p>The rest of the analysis is&#8230;confusing. Probably not to someone who studies MD! But given what we&#8217;ve learned so far, hard to grasp. They come up with a way to measure <a href="https://pubs.acs.org/doi/suppl/10.1021/acs.jctc.1c01044/suppl_file/ct1c01044_si_001.pdf">binding affinity without relying on free energy differences</a> &#8212; likely assisted by how long-running the simulation is, as they are able to directly observe <strong>many</strong> binding-unbinding events &#8212; and assert that their MD-derived numbers match up quite well with experimentally determined binding affinity values. They confirm this by suggesting a mutation to an HA to increase affinity and seeing that MD and experimental validation both show an increase in binding affinity. </p><p>But I took the liberty of plotting their Kd values (binding affinity) derived from MD (KD_MD) against the experimentally determined ones (KD_MST) across multiple HA-SA pairs and&#8230;I&#8217;m not really seeing a strong correlation? There are a few cases in which it clearly works, but overall it feels quite random, outside of one outlier case. <a href="https://gist.github.com/Abhishaike/a04c81f5400329dde291980c6fbc9592">I&#8217;ve added a GitHub gist here of the (very basic) plotting work</a>. It feels like the one-off de novo mutation showing increased binding affinity in both MD and experimental techniques was either spurious or something that is real, but only works in some situations. 
I could very much be wrong about this though, please let me know if there&#8217;s something I&#8217;m missing!</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!lwLT!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff8fcdb45-57c7-4783-8e44-550a444c54d1_2008x1101.bin" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!lwLT!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff8fcdb45-57c7-4783-8e44-550a444c54d1_2008x1101.bin 424w, https://substackcdn.com/image/fetch/$s_!lwLT!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff8fcdb45-57c7-4783-8e44-550a444c54d1_2008x1101.bin 848w, https://substackcdn.com/image/fetch/$s_!lwLT!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff8fcdb45-57c7-4783-8e44-550a444c54d1_2008x1101.bin 1272w, https://substackcdn.com/image/fetch/$s_!lwLT!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff8fcdb45-57c7-4783-8e44-550a444c54d1_2008x1101.bin 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!lwLT!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff8fcdb45-57c7-4783-8e44-550a444c54d1_2008x1101.bin" width="1456" height="798" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f8fcdb45-57c7-4783-8e44-550a444c54d1_2008x1101.bin&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:798,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Output image&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Output image" title="Output image" srcset="https://substackcdn.com/image/fetch/$s_!lwLT!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff8fcdb45-57c7-4783-8e44-550a444c54d1_2008x1101.bin 424w, https://substackcdn.com/image/fetch/$s_!lwLT!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff8fcdb45-57c7-4783-8e44-550a444c54d1_2008x1101.bin 848w, https://substackcdn.com/image/fetch/$s_!lwLT!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff8fcdb45-57c7-4783-8e44-550a444c54d1_2008x1101.bin 1272w, https://substackcdn.com/image/fetch/$s_!lwLT!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff8fcdb45-57c7-4783-8e44-550a444c54d1_2008x1101.bin 1456w" sizes="100vw" loading="lazy"></picture>
</div></a></figure></div><p>Overall, a very interesting hypothesis, and it&#8217;s cool how much conformational diversity was observed in both experimentally determined structures AND in MD! But the binding affinity part of the paper felt a bit off; I&#8217;m unsure whether I&#8217;m misunderstanding something or the results in this particular area are quite weak. </p><h1>Conclusion</h1><p>We covered the key components of an MD workflow: defining the system, choosing a force field, energy minimization, and running the production simulations. We also looked at a few miscellaneous items, like quantum effects, enhanced sampling, and free energy calculations. Finally, we looked at a few papers, and saw cases of the jargon we&#8217;ve been learning this whole time being used to build <strong>useful</strong> atomic simulations. </p><p>But seven thousand words later, I've only scratched the surface here! 
There are endless rabbit holes to dive down: coarse-grained simulations, more information on enhanced sampling, modifying the pH in simulations, and so on. But hopefully this post has given you a solid foundation to build on.</p><p>MD isn&#8217;t perfect and will likely remain imperfect for years to come. The timescales are still severely limited compared to biological reality. Quantum effects are largely ignored, and even using approximate QM methods is hard. Force fields are approximations and there's no clear "best" choice. Setting up and running simulations requires a ton of expert knowledge. </p><p>But the future is interesting! </p><p>Things like <a href="https://arxiv.org/abs/2308.11155">neural force fields</a>, <a href="https://github.com/bjing2016/alphaflow">trajectory sampling using AlphaFold2-esque models</a>, and <a href="https://pubs.acs.org/doi/10.1021/acs.jpclett.3c01723">trajectory interpolation using neural nets</a> are all on the horizon and could lead to a revolution in the way we work with MD. Hopefully I get a chance to write about some of these directions someday! For now, it&#8217;s early days with these methods, and nothing has popped out as immediately, groundbreakingly useful. But, as with everything in biology-ML, things could shift overnight. 
</p>]]></content:encoded></item><item><title><![CDATA[A primer on why computational predictive toxicology is hard]]></title><description><![CDATA[3.4k words, 16 minutes reading time]]></description><link>https://www.owlposting.com/p/a-primer-on-why-computational-predictive</link><guid isPermaLink="false">https://www.owlposting.com/p/a-primer-on-why-computational-predictive</guid><dc:creator><![CDATA[Abhishaike Mahajan]]></dc:creator><pubDate>Sun, 05 May 2024 13:32:16 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!Rs5S!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff7cce9f4-416e-4b7c-a83f-5a9cc96471d4_2912x1632.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Rs5S!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff7cce9f4-416e-4b7c-a83f-5a9cc96471d4_2912x1632.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Rs5S!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff7cce9f4-416e-4b7c-a83f-5a9cc96471d4_2912x1632.png 424w, https://substackcdn.com/image/fetch/$s_!Rs5S!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff7cce9f4-416e-4b7c-a83f-5a9cc96471d4_2912x1632.png 848w, https://substackcdn.com/image/fetch/$s_!Rs5S!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff7cce9f4-416e-4b7c-a83f-5a9cc96471d4_2912x1632.png 1272w, 
https://substackcdn.com/image/fetch/$s_!Rs5S!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff7cce9f4-416e-4b7c-a83f-5a9cc96471d4_2912x1632.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Rs5S!,w_2400,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff7cce9f4-416e-4b7c-a83f-5a9cc96471d4_2912x1632.png" width="1200" height="672.5274725274726" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f7cce9f4-416e-4b7c-a83f-5a9cc96471d4_2912x1632.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;large&quot;,&quot;height&quot;:816,&quot;width&quot;:1456,&quot;resizeWidth&quot;:1200,&quot;bytes&quot;:8240831,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.owlposting.com/i/144021611?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff7cce9f4-416e-4b7c-a83f-5a9cc96471d4_2912x1632.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:&quot;center&quot;,&quot;offset&quot;:false}" class="sizing-large" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Rs5S!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff7cce9f4-416e-4b7c-a83f-5a9cc96471d4_2912x1632.png 424w, https://substackcdn.com/image/fetch/$s_!Rs5S!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff7cce9f4-416e-4b7c-a83f-5a9cc96471d4_2912x1632.png 848w, 
https://substackcdn.com/image/fetch/$s_!Rs5S!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff7cce9f4-416e-4b7c-a83f-5a9cc96471d4_2912x1632.png 1272w, https://substackcdn.com/image/fetch/$s_!Rs5S!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff7cce9f4-416e-4b7c-a83f-5a9cc96471d4_2912x1632.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><ol><li><p><a href="https://www.abhishaike.com/i/144021611/introduction">Introduction</a></p></li><li><p><a 
href="https://www.abhishaike.com/i/144021611/some-background">Some background</a></p></li><li><p><a href="https://www.abhishaike.com/i/144021611/the-hard-stuff">The hard stuff</a></p><ol><li><p><a href="https://www.abhishaike.com/i/144021611/the-relevance-of-toxicity-datasets-to-the-clinical-problem">The relevance of toxicity datasets to the clinical problem</a></p></li><li><p><a href="https://www.abhishaike.com/i/144021611/methodological-problems-in-toxicity-datasets">Methodological problems in toxicity datasets</a></p></li><li><p><a href="https://www.abhishaike.com/i/144021611/intraspecies-toxicity-variability">Intraspecies toxicity variability </a></p></li><li><p><a href="https://www.abhishaike.com/i/144021611/toxicity-synergism">Toxicity synergism  </a></p></li></ol></li><li><p><a href="https://www.abhishaike.com/i/144021611/conclusion">Conclusions</a></p></li></ol><h1>Introduction </h1><p>There are now (claimed) foundation models for <a href="https://www.science.org/doi/10.1126/science.ade2574">protein sequences</a>, <a href="https://arcinstitute.org/tools/evo">DNA sequences</a>, <a href="https://www.biorxiv.org/content/10.1101/2023.09.20.558508v1">RNA sequences</a>, <a href="https://www.nature.com/articles/s41467-024-46440-3">molecules</a>, <a href="https://www.nature.com/articles/s41592-024-02201-0">scRNA-seq</a>, <a href="https://www.biorxiv.org/content/10.1101/2023.09.24.559168v1">chromatin accessibility</a>, <a href="https://www.nature.com/articles/s41591-024-02857-3">pathology slides</a>, <a href="https://arxiv.org/ftp/arxiv/papers/2402/2402.01034.pdf">medical images</a>, <a href="https://www.nature.com/articles/s41746-022-00742-2">electronic health records</a>, and <a href="https://arxiv.org/abs/2403.00868">clinical free-text</a>. 
It&#8217;s a dizzying rate of progress.</p><p>But there are a few problems in biology that, interestingly enough, have evaded a similar level of ML progress, despite seemingly having all the necessary conditions to achieve it.</p><p>Toxicology is one of those problems. </p><p>This isn&#8217;t a new insight; <a href="https://www.science.org/content/blog-post/ai-and-hard-stuff">it was called out in one of Derek Lowe&#8217;s posts</a>, where he said: <em>&#8216;There are no existing AI/ML systems that mitigate clinical failure risks due to target choice or toxicology.&#8217; </em><a href="https://www.science.org/content/blog-post/another-new-ai-biopharma-company">He also repeats it in a more recent post</a>: <em>&#8216;&#8230;the most badly needed improvements in drug discovery are in the exact areas that are most resistant to AI and machine learning techniques. By which I mean target selection and predictive toxicology.&#8217; </em><a href="https://practicalcheminformatics.blogspot.com/2023/08/we-need-better-benchmarks-for-machine.html">Pat Walters also goes into the subject in much more depth</a>, emphasizing how difficult the whole field is.</p><p>As someone who isn&#8217;t familiar at all with the area of predictive toxicology, that immediately felt strange. Why so little progress? It can&#8217;t be that hard, right? Unlike drug development, where you&#8217;re trying to precisely hit some key molecular mechanism, assessing toxicity almost feels&#8230;brutish in nature. Something that&#8217;s as clear as day, easy to spot by eye, and easier still to spot with a computer trained to look for it. </p><p>Of course, there will be some stragglers that leak through this filtering, but they should be minimal. Obviously a hard problem in its own right, but why isn&#8217;t it close to being solved? </p><p>What&#8217;s up with this field? 
</p><h1>Some background</h1><p>One may naturally assume that there is a well-established definition of toxicity, a standard blanket definition to delineate between things that are and aren&#8217;t toxic. While there are terms such as <strong>LD<sub>50</sub></strong>, <strong>LC<sub>50</sub></strong>, <strong>EC<sub>50</sub></strong>, and <strong>IC<sub>50</sub></strong>, used to describe the degree to which something is toxic, they are an immense oversimplification. </p><p>When we say a substance is "toxic," there are usually a lot of follow-up questions. Is it toxic at any dose? Only above a certain threshold? Is it toxic for everyone, or just for certain susceptible individuals (as we&#8217;ll discuss later)? The relationship between dose and toxicity is not always linear, and can vary depending on the route of exposure, the duration of exposure, and individual susceptibility factors. A dose that causes no adverse effects when consumed orally might be highly toxic if inhaled or injected. And a dose that is well-tolerated with acute exposure might cause serious harm over longer periods of chronic exposure. </p><p>The very definition of an "adverse effect" resulting from toxicity is not always clear-cut either. Some drug side effects, like mild nausea or headache, might be considered acceptable trade-offs for therapeutic benefit. But others, like liver failure or birth defects, would be considered unacceptable at any dose. This is particularly true when it comes to environmental chemicals, where the effects may be subtler and the exposure levels more variable. Is a chemical that causes a small decrease in IQ scores toxic? What about one that slightly increases the risk of cancer over a lifetime (20+ years)? </p><p>And this is one of the major problems with applying predictive toxicology at all &#8212; defining what is and isn&#8217;t toxic is hard! 
One may assume the FDA has clear stances on all of these, but even they approach it from a &#8216;vibe-based&#8217; perspective. They simply collate the data from in-vitro studies, animal studies, and human clinical trials, and arrive at an approval/no-approval conclusion that is, very often, at odds with some portion of the medical community. </p><p>Of course, we needn&#8217;t get extremely precise about what is or isn&#8217;t toxic to start off with &#8212; some things are painfully obviously toxic, whereas other things aren&#8217;t. One common method of handling toxicity earlier in the drug discovery process is to avoid creating &#8216;<a href="https://en.wikipedia.org/wiki/Toxicophore">toxicophores</a>&#8217;, structural motifs in chemical designs that are known to cause downstream issues, such as <a href="https://pubmed.ncbi.nlm.nih.gov/24532466/">nitroaromatic compounds</a> (an extreme case), during the design process. The existence of easily recognizable toxicophores spurred interest in establishing mappings between facets of a chemical structure and the physiological impact it had on organisms, leading to a field of study called &#8216;Quantitative Structure-Activity Relationship&#8217;, or <a href="https://en.wikipedia.org/wiki/Quantitative_structure%E2%80%93activity_relationship">QSAR</a>. 
</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!szK-!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5a4454bc-e4d7-4a9b-9272-dcf74ebf6636_850x634.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!szK-!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5a4454bc-e4d7-4a9b-9272-dcf74ebf6636_850x634.png 424w, https://substackcdn.com/image/fetch/$s_!szK-!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5a4454bc-e4d7-4a9b-9272-dcf74ebf6636_850x634.png 848w, https://substackcdn.com/image/fetch/$s_!szK-!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5a4454bc-e4d7-4a9b-9272-dcf74ebf6636_850x634.png 1272w, https://substackcdn.com/image/fetch/$s_!szK-!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5a4454bc-e4d7-4a9b-9272-dcf74ebf6636_850x634.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!szK-!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5a4454bc-e4d7-4a9b-9272-dcf74ebf6636_850x634.png" width="570" height="425.15294117647056" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5a4454bc-e4d7-4a9b-9272-dcf74ebf6636_850x634.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:634,&quot;width&quot;:850,&quot;resizeWidth&quot;:570,&quot;bytes&quot;:51953,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!szK-!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5a4454bc-e4d7-4a9b-9272-dcf74ebf6636_850x634.png 424w, https://substackcdn.com/image/fetch/$s_!szK-!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5a4454bc-e4d7-4a9b-9272-dcf74ebf6636_850x634.png 848w, https://substackcdn.com/image/fetch/$s_!szK-!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5a4454bc-e4d7-4a9b-9272-dcf74ebf6636_850x634.png 1272w, https://substackcdn.com/image/fetch/$s_!szK-!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5a4454bc-e4d7-4a9b-9272-dcf74ebf6636_850x634.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Nitroaromatic compounds. From <a href="https://www.researchgate.net/figure/Structure-of-common-nitroaromatic-compounds_fig1_335009441">here</a></figcaption></figure></div><p>Early forms of QSAR used hand-crafted features derived from a chemical structure, <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6371683/">such as atom count, chemical bonds, and so on</a>, as inputs to statistical models that learned their correlations with toxicity readouts (amongst other things). In time, the count of these chemical fingerprint features slowly grew, attempting to encompass every nuanced characteristic of a drug &#8212; eventually including measurements of how the chemical interacts with the world, such as its solubility in water or binding to certain enzymes. 
As with every other field, the explosion of deep learning led to a pivot &#8212; <a href="https://www.bing.com/search?q=predictive+toxicity+foundation+model&amp;cvid=72321c7310b34b3ea6d5703579aab213&amp;gs_lcrp=EgZjaHJvbWUyBggAEEUYOTIGCAEQABhAMgYIAhAAGEAyBggDEAAYQNIBCDQ2MjJqMGo0qAIAsAIA&amp;FORM=ANAB01&amp;PC=NMTS">instead of working with derived features understandable to a chemist, neural networks were given the raw molecule as input</a>, represented in either 2D or 3D space, building their own conception of what is/isn&#8217;t important for the problem of toxicity. </p><p>But still, no massive progress. <a href="https://www.science.org/doi/10.1126/sciadv.adk6669">A recent (March 2024) Science Advances paper applied transformers to the problem</a>, walking away triumphant over more basic QSAR models, but with no AlphaFold-level jump in capabilities. </p><p>What&#8217;s missing?</p><h1>The hard stuff</h1><h2>The relevance of toxicity datasets to the clinical problem</h2><p>There&#8217;s a more fundamental problem here: the datasets we use to train predictive toxicology models are potentially too simplified for us to benefit from, even if models using them have perfect accuracy. </p><p><a href="https://paperswithcode.com/dataset/tox21-1">Tox21</a> and <a href="https://www.epa.gov/comptox-tools/exploring-toxcast-data">ToxCast</a> (both included in the <a href="https://moleculenet.org/">MoleculeNet</a> benchmark collection) are very widely used datasets for predictive toxicology. They both contain cellular assay readouts related to things like how drugs changed nuclear receptor activity, stress response pathways, and various cytotoxicity markers. </p><p>But the biological relevance of many of these individual in-vitro assays to true organism toxicity is on shaky ground. One could say that any toxicity seen in-vitro will likely be seen in-vivo as well, but it&#8217;s unclear how true this is either. 
Cell lines may have unrealistically sensitive reactions to certain compounds, compounds may be toxic in petri dishes but lose a fair bit of bioavailability upon ingestion, and the concentrations of drugs delivered via the bloodstream may be dramatically lower than the ones given to cell lines. In-vitro is always a good start, but in-vivo translation must occur at some point!</p><p>The <a href="https://paperswithcode.com/dataset/clintox">ClinTox dataset in MoleculeNet</a> does attempt to touch on a more complex notion of toxicity via a label denoting whether an in-vivo clinical trial using a given drug found that it was toxic. But clinical toxicity here is boiled down to a 1/0, with no notion of whether the drug displayed hepatotoxic, cardiotoxic, neurotoxic, or other properties. Another similar dataset is <a href="https://academic.oup.com/nar/article/51/D1/D1432/6833256">TOXRIC</a>, which annotates a wide range of molecules with in-vivo, in-vitro, and qualitative toxicity measurements, specifying whether drugs display acute toxicity, carcinogenic properties, respiratory toxicity, and 12 other categories. But, while this goes far in providing denser label information for each molecule, the underlying physiological impact of the toxicity is still missed! </p><p>But why is the underlying &#8216;toxicity phenotype&#8217; important?</p><p>To answer this, I&#8217;d like to refer to the <a href="https://stanfordmlgroup.github.io/competitions/chexpert/">Stanford-released CheXpert dataset</a>, a collection of roughly 224,000 chest X-rays with diagnostic annotations released back in 2019. It was one of the largest chest X-ray datasets released at the time, but the clinical utility of any model built off it was questionable! 
There were a lot of issues with the dataset, one of the more interesting ones being that the human-performance accuracy rate was artificially low, <a href="https://medium.com/@BalintBotz/a-few-thoughts-about-chexnet-and-the-way-human-performance-should-and-should-not-be-measured-68031dca7bf">since the X-rays had been down-sampled from their original resolution enough that some conditions became nearly impossible to detect</a>.</p><p>But the problem much more relevant to the toxicity discussion was the so-called hidden stratification problem; <strong><a href="https://laurenoakdenrayner.com/2019/10/14/improving-medical-ai-safety-by-addressing-hidden-stratification/">chest x-rays with a certain diagnosis label could be further subdivided into subtly different conditions with significantly different clinical outcomes</a>. </strong>The last part is important, because otherwise the existence of a subclass underneath the labeled class isn&#8217;t actually useful for a model to be aware of. This exact situation may have a parallel in the toxicology dataset world; there is a whole world of hidden classes underneath the basic toxicity labels attached to each chemical, and lacking them may lead you in a meaningfully wrong direction! Some forms of toxicity, despite being in the same &#8216;class&#8217; of toxic, may have significantly different underlying phenotypes! </p><p>For example, a drug that causes ocular toxicity via immune system overreaction is far easier to deal with than a drug that is just straight-up toxic to ocular cells &#8212; one simply requires pairing it with immunosuppressants, the other requires rethinking the drug entirely. 
</p><p>One could imagine a world in which we have access to <em>so much</em> toxicity data that this problem ceases to matter &#8212; the model will figure it out. But, as it stands, ClinTox is composed of only 1,478 molecules, Tox21 + ToxCast of ~15,000 molecules, and <a href="https://academic.oup.com/nar/article/51/D1/D1432/6833256">TOXRIC</a> of 100k+ molecules (in total, many of which lack the full set of labels) &#8212; a sizable number, but a far cry from NLP-scale token counts. Perhaps pushing dataset sizes up even more alleviates this problem, but it feels more likely that alternate directions should be explored. </p><p>How could we fix this? <strong>Instead of relying on our own fuzzy definitions of toxicity, we could perhaps instead defer it to a model capable of understanding phenotypes of toxicity more nuanced than ours could ever be.</strong> Microscopy foundation models, like <a href="https://www.rxrx.ai/phenom#:~:text=We%20call%20this%20model%20Phenom-Beta.%20It%20flexibly%20processes,create%20a%20meaningful%20representation%20of%20the%20input%20image.">Phenom-Beta</a> by Recursion Pharmaceuticals, feel like a step in the right direction &#8212; perhaps the next generation of toxicology datasets are images of cell lines, or histology slides from a patient, subjected to a certain chemical, and such foundation models are used to understand them. <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9399120/">After all, we do see morphological cell changes after application of toxic drugs!</a> Maybe there&#8217;s even a time element, with a new image taken 2, 8, 24, and so on hours after the application of the drug. 
Of course, the bear case here is that Recursion <strong>hasn&#8217;t</strong> billed their platform&#8217;s utility for toxicity prediction, so perhaps this isn&#8217;t the right direction&#8230;</p><h2>Methodological problems in toxicity datasets </h2><p>Beyond the current set of toxicity datasets not being entirely connected to the problem of clinical toxicity, the datasets themselves have quality issues! <a href="https://practicalcheminformatics.blogspot.com/2023/08/we-need-better-benchmarks-for-machine.html">This is a bit of a cop-out, but I&#8217;d honestly recommend reading Pat Walters&#8217;s post about this; it goes into much more detail than I ever could. </a>But here&#8217;s the general TLDR for the problems with the datasets that many predictive toxicology papers rely on: </p><ul><li><p>Invalid chemical structures that can't be parsed by common cheminformatics tools</p></li><li><p>Inconsistent stereochemistry and chemical representations</p></li><li><p>Combining data from different sources without standardization</p></li><li><p>Poorly defined training/test splits</p></li><li><p>Data curation errors like duplicate structures with conflicting labels</p></li><li><p>Assays with high rates of artifactual activities</p></li></ul><p>+ some other points also addressed in that post! Again, excellent read, highly recommend.</p><h2>Intraspecies toxicity variability </h2><p>While most drugs are designed to hit specific molecular targets, there's still a huge potential for person-to-person differences in how they're absorbed, distributed, metabolized and excreted (<a href="https://en.wikipedia.org/wiki/ADME">ADME properties</a>). 
This pharmacokinetic variability can lead to big differences in the actual tissue-level exposure to a drug for a given dose.</p><p><a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7692531/">Genetic polymorphisms in drug metabolizing enzymes are the prime example of this phenomenon.</a> For example, Cytochrome P450 2D6 enzymes are responsible for the metabolism of a huge number of drugs. The enzyme is encoded by the CYP2D6 gene, variations of which can lead to immense differences in drug clearance and bioavailability. </p><p>For example, people with certain CYP2D6 polymorphisms are "poor metabolizers" of drugs like codeine and can end up with much higher levels of the unmetabolized drug compared to the average person. There are also "ultra-rapid metabolizers", who clear drugs so quickly that they may not get a therapeutic effect at normal doses. And this doesn&#8217;t cleanly translate to <em>&#8220;poor metabolizers should receive lower dosages of drugs&#8221;</em> either, because the chemical in question matters! If the chemical is a prodrug, where metabolism produces a more active compound rather than a weaker one (codeine, which CYP2D6 converts into morphine, is one such case), the clinical impact of these polymorphisms will switch sides. </p><p>And the rate of CYP2D6 variation isn&#8217;t particularly low either; <a href="https://www.ncbi.nlm.nih.gov/books/NBK574601/">one study pegged the rate of ultra-rapid metabolizers at 1-11% and poor metabolizers at 1-5% of the population, depending on race</a>. Finally, CYP2D6 isn&#8217;t even the only gene whose alleles can cause drug metabolism variation; there are many more &#8212; <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7692531/">generally known as &#8216;pharmacogenes&#8217;. 
</a></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!gLPV!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F651515c2-296a-4a9f-a775-ac64a8eb733b_762x516.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!gLPV!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F651515c2-296a-4a9f-a775-ac64a8eb733b_762x516.jpeg 424w, https://substackcdn.com/image/fetch/$s_!gLPV!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F651515c2-296a-4a9f-a775-ac64a8eb733b_762x516.jpeg 848w, https://substackcdn.com/image/fetch/$s_!gLPV!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F651515c2-296a-4a9f-a775-ac64a8eb733b_762x516.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!gLPV!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F651515c2-296a-4a9f-a775-ac64a8eb733b_762x516.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!gLPV!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F651515c2-296a-4a9f-a775-ac64a8eb733b_762x516.jpeg" width="632" height="427.9685039370079" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/651515c2-296a-4a9f-a775-ac64a8eb733b_762x516.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:516,&quot;width&quot;:762,&quot;resizeWidth&quot;:632,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;An external 
file that holds a picture, illustration, etc.\nObject name is genes-11-01295-g001.jpg&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="An external file that holds a picture, illustration, etc.
Object name is genes-11-01295-g001.jpg" title="An external file that holds a picture, illustration, etc.
Object name is genes-11-01295-g001.jpg" srcset="https://substackcdn.com/image/fetch/$s_!gLPV!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F651515c2-296a-4a9f-a775-ac64a8eb733b_762x516.jpeg 424w, https://substackcdn.com/image/fetch/$s_!gLPV!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F651515c2-296a-4a9f-a775-ac64a8eb733b_762x516.jpeg 848w, https://substackcdn.com/image/fetch/$s_!gLPV!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F651515c2-296a-4a9f-a775-ac64a8eb733b_762x516.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!gLPV!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F651515c2-296a-4a9f-a775-ac64a8eb733b_762x516.jpeg 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Known pharmacogenes. From <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7692531/">here</a></figcaption></figure></div><p>What does this mean for ML? The very existence of pharmacogenes means that any molecular-toxicity dataset that lacks sequence readouts of known pharmacogenes (and there may be unknown ones!) for the individuals the data is derived from is, ultimately, limited in how generalizable it can be when applied to drugs for different individuals. Again, perhaps this problem eventually fixes itself with enough chemical data, but the case here is fishier. <strong>Even an all-powerful toxicology foundation model would be unable to pick up the underlying rules behind why drug toxicity variation exists if provided only toxic/not-toxic labels; it would simply model drug toxicity as a fundamentally noisy phenomenon.</strong></p><p>How do we fix this? Full sequence readouts for every organism included in a toxicology dataset would obviously be prohibitively expensive. But there is a potential way out: real world evidence, or RWE. Those who have worked in RWE will understandably immediately recoil at this &#8212; it&#8217;s a field that is notorious for vastly overpromising and underdelivering, and several blog posts could be written about how RWE datasets are rarely trustworthy and how companies leading the way in RWE have generally failed to capitalize on it. 
To be clear, I agree, but it&#8217;s still an interesting thought experiment!</p><p>RWE, often represented via insurance claims or electronic health records, was a <strong>big</strong> deal post-2015, or roughly when healthcare companies/national governments began to realize the potential value of the claims datasets they had. The core idea here was that, as a result of billing practices, we had accidentally created a low-fidelity dataset of an individual&#8217;s interaction with the healthcare system over their lifetime. We know their familial history, their chronic conditions, and so on; it&#8217;s all recorded <em>somewhere</em>. <strong>And perhaps, within it, is a similarly fuzzy representation of a patient&#8217;s set of pharmacogenes &#8212; indirectly represented within the joint distribution of the patient&#8217;s race, their conditions, their allergies, and everything else.</strong> If this sort of clinical data could be easily combined with toxicity datasets from phase 1/2/3 clinical trials, it may allow us to more deeply understand individual drug response heterogeneity, possibly helping us close this otherwise irreducible toxicology prediction error. </p><p>One last note: while pharmacogenes likely account for the majority of the variation in drug efficacy/toxicity, there is likely one more player: your microbiome. Very little has been published on the topic, <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6712421/">but there are documented cases of gut flora affecting how a drug is metabolized!</a> One major case is described here:</p><div class="pullquote"><p>The dramatic impact of microbial metabolism on the toxicity of metabolites derived from drugs was clearly manifested in the death of fifteen patients, who were orally administered with sorivudine (SRV, 1-b-d-arabinofuranosyl-(E)-5-(2-bromovinyl) uracil) within forty days. This effect was attributed to the enterobacteria-mediated SRV hydrolysis, thus leading to the formation of 5-(2-bromovinyl) uracil. 
This transformation is mainly carried out by E. coli and Bacteroides spp. (B. vulgatus, B. thetaiotaomicron, B. fragilis, B. uniformis and B. eggerthii) and increases toxicity of the anticancer chemotherapy with 5-fluorouracil pro-drugs.</p></div><h2>Toxicity synergism </h2><p>Our final challenging problem is drug-drug interactions, also known as DDIs. Drugs, especially amongst their largest consumers, do not exist in a vacuum; <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7294476/">a fair bit of the US is on multiple drugs at the same time</a>. And these drugs <em>do</em> interact in the bloodstream, potentially causing fatal events. An example of this phenomenon is warfarin and aspirin &#8212; both extremely common drugs! If they are taken together, they will compete for binding to blood plasma proteins; <a href="https://www.ahajournals.org/doi/full/10.1161/CIRCULATIONAHA.112.000491">the warfarin that cannot bind to plasma proteins will remain in the blood, eventually causing acute bleeding in patients</a>. </p><p>The rate of polypharmacy, which is taking five or more medications at a time, is between 10% and 50% depending on the age group. And to be clear, the warfarin-aspirin problem as described above isn&#8217;t exactly an edge case; one study found that amongst a patient population defined as having polypharmacy, <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8028718/">the rate of at least one severe adverse effect from DDIs was as high as 77%.</a> </p><p><strong>The complexity of predicting toxicity in these cases (maybe!) ramps up dramatically; it is likely that a fair number of such patients will have a drug regimen that&#8217;s largely unique to them alone.</strong> And the impact of pharmacogenes still exists, potentially even amplified!</p><p>The state of the art is a bit fuzzy here. 
<a href="https://www.nature.com/articles/s41598-024-54409-x#Tab3">There has been headway in predicting DDIs</a>, but the datasets here are usually quite small in terms of number of molecules, on the order of a few hundred, often with many potential interactions missing <a href="https://onlinelibrary.wiley.com/doi/full/10.1002/qub2.32">(and subsequently being, maybe falsely, labeled as a negative example)</a>. And, given how common DDIs are, it feels unlikely that drug design currently has a good solution for them beyond a simple &#8216;does it interact with the same hypothesized mechanism&#8217; check. It&#8217;s challenging to know the progress here; production-grade datasets are, in my opinion, quite a long way off. This is true of many interaction-based problems in the life sciences and it&#8217;s especially true with toxicity-related datapoints. </p><p>It&#8217;s challenging to know how to fix this. But it may end up being a non-issue. Interactions between molecules in our body aren&#8217;t exactly orthogonal to the interactions between molecules and the body; everything is still atoms at the end of the day after all. Perhaps as we amass more singular molecular datapoints, we&#8217;ll accidentally get better at predicting DDIs. A similar phenomenon was seen with AlphaFold2 in a mild sense; despite never having been trained on multimeric proteins, its monomer training regimen was enough that it still performed well in the multimer case &#8212; though, of course, still worse than a version of AlphaFold2 trained on multimers. </p><p><strong>But there&#8217;s an even more interesting possibility here: ultra-precise, high-throughput in-vivo screening</strong>. <a href="https://www.gordian.bio/science/mosaic-screening/">Gordian Biotechnologies&#8217; Mosaic Screening</a> platform feels immensely interesting in this regard. 
Their platform allows one to use barcoded viruses to deliver drugs to extremely specific cells in-vivo, allowing you to test an incredibly high number of drugs in-vivo at the same time. With the current aim of the platform, it seems like these deliveries are meant to go to separate cells, ensuring that each drug can be understood independently of the others. But one could imagine the platform being repurposed; perhaps multiple drugs could be delivered to the same set of cells, with thousands of different combinations, allowing us to create a large and high-fidelity drug interaction dataset extremely quickly. This said, the platform doesn&#8217;t currently bill itself as a way to better understand DDI&#8217;s; it is more focused on the target discovery problem by speeding up in-vivo testing.</p><h1>Conclusion</h1><p>I really only scratched the surface of toxicology here; there&#8217;s <em>so</em> much material. I am once again astonished by the immense amount of work on drug design written by medicinal chemists and biologists, and by how little we still understand. I want to emphasize that toxicity is a <em>really</em> big deal. Each drug failing a clinical trial accounts for billions of wasted dollars and many thousands of work hours lost, and the rate of failure due to toxicity is frighteningly high. <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6978558/">One study has this to say about it:</a></p><div class="pullquote"><p>Overall, approximately 89% of novel drugs fail human clinical trials, with approximately one-half of those failures due to unanticipated human toxicity</p></div><p>Even more concerningly, the danger of toxicity can remain even after approval, implying even a clinical trial isn&#8217;t the end-all-be-all for toxicity concerns. 
The same study continues: </p><div class="pullquote"><p>Of 578 discontinued and withdrawn drugs in Europe and the United States, almost one-half were withdrawn or discontinued in post-approval actions due toxicity. Van Meer et&nbsp;al. found that of 93 post-marketing serious adverse outcomes, only 19% were identified in preclinical animal studies. In the first decade of the 21st century, approximately one-third of FDA-approved drugs were subsequently cited for safety or toxicity issues. or a combination of both, including human cardiovascular toxicity and brain damage, after remaining on the market for a median of 4.2 years </p></div><p>Despite all the problems we discussed here, I still believe the future is bright! There are so many scale-related things going on in biology right now, and it does feel like we&#8217;re on the precipice of something really interesting here. </p><p>Finally, shout out to <a href="https://twitter.com/SimonDBarnett">Simon</a> for the discussion we had over this topic + introducing me to Pat Walters&#8217; wonderful blog!</p><p></p>]]></content:encoded></item><item><title><![CDATA[A primer on the next generation of antibodies]]></title><description><![CDATA[5.7k words, 27 minutes reading time]]></description><link>https://www.owlposting.com/p/a-primer-to-the-next-generation-of-antibodies</link><guid isPermaLink="false">https://www.owlposting.com/p/a-primer-to-the-next-generation-of-antibodies</guid><dc:creator><![CDATA[Abhishaike Mahajan]]></dc:creator><pubDate>Wed, 24 Apr 2024 00:14:25 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!3wi3!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70fd0867-d97a-4969-afc4-5272bdf8e378_2912x1632.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" 
href="https://substackcdn.com/image/fetch/$s_!3wi3!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70fd0867-d97a-4969-afc4-5272bdf8e378_2912x1632.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!3wi3!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70fd0867-d97a-4969-afc4-5272bdf8e378_2912x1632.png 424w, https://substackcdn.com/image/fetch/$s_!3wi3!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70fd0867-d97a-4969-afc4-5272bdf8e378_2912x1632.png 848w, https://substackcdn.com/image/fetch/$s_!3wi3!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70fd0867-d97a-4969-afc4-5272bdf8e378_2912x1632.png 1272w, https://substackcdn.com/image/fetch/$s_!3wi3!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70fd0867-d97a-4969-afc4-5272bdf8e378_2912x1632.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!3wi3!,w_2400,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70fd0867-d97a-4969-afc4-5272bdf8e378_2912x1632.png" width="1200" height="672.5274725274726" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/70fd0867-d97a-4969-afc4-5272bdf8e378_2912x1632.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;large&quot;,&quot;height&quot;:816,&quot;width&quot;:1456,&quot;resizeWidth&quot;:1200,&quot;bytes&quot;:9164755,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.owlposting.com/i/143615613?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70fd0867-d97a-4969-afc4-5272bdf8e378_2912x1632.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:&quot;center&quot;,&quot;offset&quot;:false}" class="sizing-large" alt="" srcset="https://substackcdn.com/image/fetch/$s_!3wi3!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70fd0867-d97a-4969-afc4-5272bdf8e378_2912x1632.png 424w, https://substackcdn.com/image/fetch/$s_!3wi3!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70fd0867-d97a-4969-afc4-5272bdf8e378_2912x1632.png 848w, https://substackcdn.com/image/fetch/$s_!3wi3!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70fd0867-d97a-4969-afc4-5272bdf8e378_2912x1632.png 1272w, https://substackcdn.com/image/fetch/$s_!3wi3!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70fd0867-d97a-4969-afc4-5272bdf8e378_2912x1632.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container 
restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><ol><li><p><a href="https://www.abhishaike.com/i/143615613/introduction">Introduction</a></p></li><li><p><a href="https://www.abhishaike.com/i/143615613/whats-wrong-with-antibodies">What&#8217;s wrong with antibodies?</a></p><ol><li><p><a href="https://www.abhishaike.com/i/143615613/production-demands">Production</a></p></li><li><p><a href="https://www.abhishaike.com/i/143615613/storage">Storage</a> </p></li><li><p><a href="https://www.abhishaike.com/i/143615613/efficacy">Efficacy</a></p></li></ol></li><li><p><a href="http://What does a better antibody look like?">What does a better antibody look like?</a></p><ol><li><p><a href="https://www.abhishaike.com/i/143615613/single-chain-variable-fragment-scfv">Single-chain variable fragment (scFv)</a></p><ol><li><p><a 
href="https://www.abhishaike.com/i/143615613/advantages-scfv">Advantages</a> </p></li><li><p><a href="https://www.abhishaike.com/i/143615613/disadvantages-scfv">Disadvantages</a> </p></li></ol></li><li><p><a href="https://www.abhishaike.com/i/143615613/nanobody-vhh">Nanobody (VHH)</a></p><ol><li><p><a href="https://www.abhishaike.com/i/143615613/advantages-vhh">Advantages</a> </p></li><li><p><a href="https://www.abhishaike.com/i/143615613/disadvantages-vhh">Disadvantages</a> </p></li></ol></li><li><p><a href="https://www.abhishaike.com/i/143615613/antibody-mimetics">Antibody mimetics </a></p><ol><li><p><a href="https://www.abhishaike.com/i/143615613/advantages-mimetics">Advantages</a> </p></li><li><p><a href="https://www.abhishaike.com/i/143615613/disadvantages-mimetics">Disadvantages </a></p></li></ol></li></ol></li><li><p><a href="https://www.abhishaike.com/i/143615613/conclusion">Conclusion</a></p></li></ol><h1>Introduction</h1><p><a href="https://www.abhishaike.com/p/a-primer-on-ai-in-antibody-engineering">If you want a primer on antibodies, I recommend reading my last post! </a>This one will contain some jargon that the other post will explain.</p><p>It&#8217;s important to remember that antibodies aren&#8217;t inherently special; proteins are just strings of amino acids, and the shape of a protein is all that (mostly) matters. One can imagine a world in which we ditch full antibodies entirely and instead work on protein modalities that improve upon them, reducing their downsides and improving on what they are already good at.</p><p>The medical world focused on antibodies for an obvious reason: they clearly work well for the adaptive immune system of every jawed vertebrate out there, which is a pretty strong endorsement of their clinical utility. 
But the pressures under which antibodies evolved are completely different from the pressures of our medical system, which is far less tolerant of extreme complexity, more interested in scalable production, and equally interested in both the short-term and long-term quality of life of a patient. Moreover, our understanding of biology is rapidly advancing to the point where we can look beyond the tools that evolution has provided. </p><p>But over the next decade, where will we expand? In this post we&#8217;ll go over what is wrong with full-length antibodies and three potential alternatives to them:</p><ul><li><p><strong>scFv&#8217;s.</strong> An older entry in the antibody engineering field, with 9 drugs released under this class of antibody, but still relatively new in terms of the antibody world. </p></li><li><p><strong>Nanobodies.</strong> The most exciting current development in the antibody field, with only one released drug in this category and many more potential ones. </p></li><li><p><strong>Antibody mimetics.</strong> Where I believe the future is heading. </p></li></ul><p>One quick note before we move on. People more familiar with antibodies may wonder why I&#8217;m not discussing Fab&#8217;s, or chimeric antibodies, or bispecifics, or trispecific antibodies, or any one of the many other varieties of antibodies out there outside of the above three. This is because scFv&#8217;s, nanobodies, and antibody mimetics are very much in a clinical gray area; they are heavily studied from an academic perspective, but their medical impact is still poorly understood. All others largely fall into the bucket of so-old-that-they-aren&#8217;t-really-next-generation or so-new-that-it&#8217;s-challenging-to-assess-how-valuable-they-will-be.</p><h1>What&#8217;s wrong with antibodies?</h1><p>Motivating the question here fully is important: why fix something that isn&#8217;t broken? Well, there <em>are</em> a few things that are broken about antibodies. 
</p><h2>Production demands</h2><p>Antibodies have an extraordinarily difficult production process. Here&#8217;s a breakdown (warning, long). </p><p>You first need to find the genetic sequence that encodes each antibody chain (heavy and light, so 2 unique sequences). You then take these sequences and insert them into an expression vector, which is a circular piece of DNA. The expression vector is then washed over a cloned mammalian cell line, most commonly Chinese Hamster Ovary (CHO)<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-1" href="#footnote-1" target="_self">1</a> cells. The vector is able to enter the CHO&#8217;s and directly integrate into their genome. But, while these cells are often happy to produce the protein of any gene that wanders into their genome, they may still vary in their <em>ability </em>to produce that protein. This can be for a lot of reasons. Maybe the vector ended up in a section of the genome that is rarely read from, so nearly zero antibody will be produced. Maybe some CHO&#8217;s randomly mutate, a phenomenon called <strong>clonal variation</strong>, which reduces their ability to produce antibodies. </p><p>In any case, you&#8217;ll need a way to select high-performing cells. There isn&#8217;t really a high-throughput way to do this; you literally just grow the CHO&#8217;s in small cultures and check the antibody levels in each culture every now and then using techniques like <a href="https://en.wikipedia.org/wiki/ELISA">ELISA</a>. Let&#8217;s say you stumble across cells that produce antibody well AND consistently, implying their output will be predictable over the long term. 
Now you just need to clone these high-producing cells, which can take weeks of careful preparation, and then you&#8217;re finally ready to scale up!</p><p>At this point, you can take your vast supply of high-producing CHO&#8217;s and dunk them in a <a href="https://en.wikipedia.org/wiki/Bioreactor">bioreactor</a>, a massive steel tank with controlled oxygen, temperature, and pH levels, and let the CHO&#8217;s produce antibodies into the growth medium they&#8217;re immersed in. But, because mammalian cells are immensely fragile, antibody production may stall at any time. Maybe the pH of the bioreactor is off, or microbial/toxic contamination occurs, or maybe the CHO&#8217;s simply mutate, or it stalls for some other unknown reason! If there are any deviations, adjustments need to be made to the bioreactor conditions to get the cells back on track. Mammalian cells are notoriously fickle with their preferences and die easily, so this process may take a while. If everything goes right, the growth medium surrounding the cells will slowly become heavily enriched in free-floating antibodies, eventually reaching a desirable concentration. Now it&#8217;s time to harvest! </p><p>This medium, while enriched in antibodies, also contains a complex mixture of other proteins, nutrients, and CHO-produced debris that needs to be removed. 
This is where the purification process comes in, usually relying on a technique called &#8216;<a href="https://en.wikipedia.org/wiki/Affinity_chromatography">affinity chromatography</a>&#8217;, which allows us to isolate the antibody by using something that binds to it. <a href="https://www.separations.eu.tosohbioscience.com/products/process-media/protein-a-affinity#:~:text=Protein%20A%20chromatography%20is%20a,to%20achieve%20adequate%20product%20purity.">In practice, a protein called &#8216;Protein A&#8217; is used for this, which binds to the constant heavy-chain Fc region of antibodies.</a> This is usually insufficient for meeting the FDA&#8217;s standard for antibody purity, which is 95%, and subsequent purification steps are required, such as ion exchange chromatography (IEX) or hydrophobic interaction chromatography (HIC). But let&#8217;s assume we only need the first step and move on with our liquid filled with pure antibodies. </p><p>We&#8217;re nearly done. Now, we just need to filter the purified antibody solution to remove any remaining particulate matter or aggregates. This is typically done using a series of filters with decreasing pore sizes, down to 0.2 microns, which is small enough to remove most bacteria and other small contaminants. The filtered antibody solution is then concentrated to the desired level, usually using <a href="https://en.wikipedia.org/wiki/Diafiltration">diafiltration</a>. We mix the resulting hyper-concentrated antibody solution with a buffer to control pH, salts to control tonicity, and stabilizers like sugars or surfactants to prevent the antibody from degrading or aggregating during storage. The sterile antibody solution is then filled into the final containers, which are often vials or syringes. </p><p>Throughout the entire manufacturing process, from the initial cell culture to the final packaging, strict quality control measures are in place. 
Samples are taken at various stages and tested for purity, potency, identity, and safety. Any deviations from the specified parameters can result in the rejection of the entire batch, which means you have to start over from scratch. Moreover, each step of the process post-bioreactor has inefficiencies; chromatography, filtration, and diafiltration all slightly reduce the yield of the final product.</p><p>Given all this, it&#8217;s no wonder antibodies are extraordinarily expensive drugs; even the generic (biosimilar) version of the widely-used antibody drug Humira <a href="https://health.wusf.usf.edu/health-news-florida/2023-09-19/save-billions-or-stick-with-humira-drug-brokers-steer-americans-to-the-costly-choice">can still cost ~$1k a month</a> at the lowest end. To compare this to typical small-molecule drugs, generic versions of Keppra, an anti-epilepsy drug, <a href="https://www.healthline.com/health/cost-epilepsy-medications#prices">can cost less than 10 dollars per month.</a> Antibody production is uniquely challenging and costly in a way that very little else in drug manufacturing is. </p><p>Let&#8217;s ask some questions. </p><p><em>Why can&#8217;t we simply synthesize antibodies, much like how we synthesize typical drugs, and avoid this whole bioreactor thing?</em> Antibodies are among the largest and most complex molecules used as therapeutics. They are composed of four protein chains linked together by disulfide bonds; each chain is intricately folded into specific domains, which together form the characteristic Y-shape of an antibody. And synthesizing such a large, precisely folded protein from scratch is simply beyond our current capabilities. Modern chemical protein synthesis is typically limited to peptides of fewer than 100 amino acids, while each antibody chain is 200-500 amino acids long. Even if we could synthesize the individual chains, getting them to assemble and fold correctly into a functional antibody would be nearly impossible. 
Cells, on the other hand, have evolved to do exactly this.</p><p><em>If mammalian cells are so fragile and finicky, why can&#8217;t we find a better cell line to produce antibodies?</em> There is in fact an expression host that is much simpler to work with: yeast. <a href="https://www.sciencedirect.com/science/article/abs/pii/S1389172315001346">Yeast is challenging to kill, replicates easily, grows fast, and is amenable to genetic manipulation.</a> So why don&#8217;t we use it? <a href="https://pubmed.ncbi.nlm.nih.gov/25912450/">There&#8217;s a really wonderful review paper that discusses all this.</a> In short, antibodies require specific post-translational modifications, particularly glycosylation at a specific residue (residue 297 of each heavy chain), to function ideally in the human body. While yeast cells do have their own glycosylation machinery, it differs substantially from that of mammalian cells. Yeast tends to add high-mannose type glycans (as in, sugar molecules that contain a lot of <a href="https://en.wikipedia.org/wiki/Mannose">mannose</a>) to the proteins it produces; these glycans are not typically found on human proteins and can potentially make the antibody more immunogenic, which is obviously undesirable. Looking beyond yeast runs into similar issues: many types of bacteria lack any glycosylation system at all or struggle with the size of the antibody. </p><p>This all said, there is progress here; research is ongoing into &#8216;<a href="https://pubmed.ncbi.nlm.nih.gov/24632452/">glycoengineering</a>&#8217; yeast to produce antibodies with human-like glycosylation, engineering aglycosylated antibody variants that work well, and even adding glycosylation systems to bacteria. But mammalian cells are still very much considered the gold standard (for now). </p><h2>Storage</h2><p>Let&#8217;s say we have our set of purified and packaged antibodies ready to go in a few thousand vials. Now we&#8217;d like to ship these life-saving drugs to clinics around the globe. 
What other problems do we have to contend with? </p><p>Stability is the biggest one. Proteins in <em>general</em> are inherently unstable. These are long chains of amino acids that are folded into complex three-dimensional structures, and these structures are (usually) extremely difficult to maintain outside of their native environments. The forces that hold a folded protein together (hydrogen bonds, van der Waals interactions, hydrophobic interactions) are relatively weak. The immense size of an antibody only adds to this fragility; a larger size means more exposure to the environment, more potential points of failure among the residues, and a more complex structure to maintain. Small molecules, in contrast, are nothing like this. They rely on covalent bonds to stick together (which are much stronger than the forces holding a folded antibody together), don&#8217;t rely on any semblance of folded structure to function (so there isn&#8217;t any parallel to misfolding), and are orders of magnitude smaller (and, in the chemistry world, smaller usually means more stable). </p><p><em>Well, okay, why can&#8217;t we just freeze it? </em>Cold storage is where most people&#8217;s minds would go when the question of stability comes up, and it&#8217;s a good idea; cold temperatures reduce atomic vibration and reaction rates and thus increase protein stability. It works well for antibodies, but having to store your drugs under refrigeration at 2-8&#176;C (or frozen, with cryoprotectants to prevent ice crystal formation of course!) does increase the cost of your antibody drug. Moreover, cold storage does not completely solve the stability problem! Even at low temperatures, antibodies will still continue to undergo the usual protein degradation processes (e.g. oxidation). The rate of these reactions is slowed down by cold, but not stopped entirely. This means that antibodies have a limited shelf life even under refrigeration, typically &lt;1 year. 
In contrast, many small molecule drugs can remain stable for much longer periods, often several years, even at room temperature.</p><p>The final issue here is <strong>aggregation</strong>. Antibodies, particularly when exposed to stresses like temperature fluctuations or agitation, can clump together to form large, inactive aggregates. This process is often irreversible, meaning that once an antibody has aggregated, it cannot be returned to its original, active form. Aggregation can occur at basically any point during the production and storage of antibodies, and it&#8217;s a massive cause of loss of product and reduced shelf life. Why do they aggregate? Once again, the size of the protein can be indirectly implicated as a problem; there are just so many fragile forces at work inside an antibody. It&#8217;s a wonder we can transport these things <em>at all. </em><a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6698864/">Here&#8217;s one paper&#8217;s explanation of antibody aggregation: </a></p><blockquote><p><br><em>&#8230;One bottleneck limiting mAbs therapeutics&#8217; development is aggregation [12,13]. mAbs with 12 sub-domains, large hydrodynamic radii and surface areas, non-symmetrical hydrophobicity and charge distributions are prone to aggregation [14,15]. The immunoglobulin Greek-key &#946; sandwich folding of mAbs is susceptible to edge-edge association [16]. Besides, complementarity determining regions (CDRs) of mAb responsible for antigen binding can also contribute to aggregation due to the frequent occurrences of hydrophobic and electrostatic residues [17,18]. Furthermore, the extensive hydrophobic patches on the surfaces of mAbs, especially on Fc could mediate aggregation [19,20]. These aggregation propensities are amplified by the natural bivalency of mAb. Importantly, the aggregation of mAb could be increased when administered by subcutaneous (SC) delivery in a high mAb concentration of &gt;100 mg/mL [21]. 
At such high concentrations, mAbs are more susceptible to aggregation&#8230;</em></p></blockquote><p>How is it possible that handling antibodies is so punishing when we&#8217;re filled with these things? It&#8217;s important to remember that these problems are much less of a concern for natural antibodies floating around in your bloodstream, because their concentration is far lower than that of antibody therapeutics, <a href="https://www.sciencedirect.com/science/article/abs/pii/B9780123864833000045">which can be 100x more concentrated in terms of milligrams of antibody per milliliter</a> than antibodies in vivo. </p><h2>Efficacy</h2><p>Let&#8217;s say you&#8217;ve produced your antibodies, stored them, and safely delivered them to the clinic that desperately needs them. Everything is fine now, right?</p><p>There&#8217;s one last, small thing. As mentioned, antibodies are reasonably large structures with a large molar mass. <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3332369/">The large molar mass means that they are often incapable of diffusing throughout dense tissues</a>, such as solid tumors, and their size means they cannot easily access tissues that have restricted entryways, such as the <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8235515/">central nervous system</a>. So, there are some conditions for which antibody therapy is simply not useful.</p><p><strong>But overall, this is the smallest issue that antibodies face. When antibody therapy works, it </strong><em><strong>works.</strong> </em></p><h1>What does a better antibody look like?</h1><p>Any alternative to antibodies must be able to tackle the challenges laid out in the prior sections. 
In short, it must display the following characteristics:</p><ol><li><p><strong>Be easier to manufacture than antibodies</strong></p></li><li><p><strong>Be easier to store than antibodies</strong></p></li><li><p><strong>Be more efficacious than antibodies</strong></p></li></ol><p>Let&#8217;s go through all three of the major alternatives to antibodies and assess how well they tackle these items. </p><h2><strong>Single-chain variable fragment (scFv)</strong></h2><p>One approach to improving upon antibodies is to cut out the Fc region <em>and</em> the constant section of the Fab region, since, really, the variable regions are the ones doing most of the antigen binding. Doing this results in a &#8216;single-chain variable fragment&#8217;, or scFv, which consists of only the variable regions of the heavy (VH) and light (VL) chains, connected by a short linker peptide, typically around 10 to 25 amino acids long. This forms a structure that is roughly 1/6 the size of a full antibody while (mostly) preserving the antigen-binding affinity of the parent antibody, as we retain all six of the CDR loops of the variable regions. 
</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!gOp5!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb4d26437-9591-4de7-83d7-0dbdd63b8332_728x489.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!gOp5!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb4d26437-9591-4de7-83d7-0dbdd63b8332_728x489.png 424w, https://substackcdn.com/image/fetch/$s_!gOp5!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb4d26437-9591-4de7-83d7-0dbdd63b8332_728x489.png 848w, https://substackcdn.com/image/fetch/$s_!gOp5!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb4d26437-9591-4de7-83d7-0dbdd63b8332_728x489.png 1272w, https://substackcdn.com/image/fetch/$s_!gOp5!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb4d26437-9591-4de7-83d7-0dbdd63b8332_728x489.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!gOp5!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb4d26437-9591-4de7-83d7-0dbdd63b8332_728x489.png" width="598" height="401.67857142857144" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b4d26437-9591-4de7-83d7-0dbdd63b8332_728x489.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:489,&quot;width&quot;:728,&quot;resizeWidth&quot;:598,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;scFv Fragment Antibody 
- Creative Biolabs&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="scFv Fragment Antibody - Creative Biolabs" title="scFv Fragment Antibody - Creative Biolabs" srcset="https://substackcdn.com/image/fetch/$s_!gOp5!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb4d26437-9591-4de7-83d7-0dbdd63b8332_728x489.png 424w, https://substackcdn.com/image/fetch/$s_!gOp5!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb4d26437-9591-4de7-83d7-0dbdd63b8332_728x489.png 848w, https://substackcdn.com/image/fetch/$s_!gOp5!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb4d26437-9591-4de7-83d7-0dbdd63b8332_728x489.png 1272w, https://substackcdn.com/image/fetch/$s_!gOp5!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb4d26437-9591-4de7-83d7-0dbdd63b8332_728x489.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 
17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">From <a href="https://www.creativebiolabs.net/scfv-fragment-antibodies_25.htm">here</a></figcaption></figure></div><h3>Advantages (scFv)</h3><p><strong>The primary advantages of scFv&#8217;s are claimed to be on two fronts: efficacy and ease of creation.</strong> </p><p>Because of their far smaller size (and molar mass) compared to full antibodies, scFv&#8217;s can penetrate through solid tissues far more easily. <a href="https://aacrjournals.org/cancerres/article/52/12/3402/497837/Rapid-Tumor-Penetration-of-a-Single-Chain-Fv-and">In an interesting study comparing the ability of typical IgG antibodies and scFv&#8217;s</a> (which they call sFv) to rapidly penetrate solid tumors, scFv&#8217;s clearly came out on top. 
In their words:</p><blockquote><p><em>These studies revealed that most of the intact IgG delivered to the tumor was concentrated in the region of or immediately adjacent to vessels, while the sFv was more evenly distributed throughout the tumor mass&#8230;.The sFv demonstrated maximum tumor penetration at 0.5 h postinjection, while the intact IgG reached an equivalent degree of penetration at 48 to 96 h postinjection.</em></p></blockquote><p><a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9455005/#:~:text=In%20addition%2C%20scFvs%20cause%20few%20or%20no%20hypersensitivity,since%20their%20average%20life%20span%20is%200.5%E2%80%932%20h.">Prior studies have also shown that scFv&#8217;s are much more rapidly cleared by the body</a> compared to typical antibodies, potentially massively reducing the side effects of any scFv drug. Moreover, lacking the Fc region means that scFv drugs get to avoid &#8216;<a href="https://pubmed.ncbi.nlm.nih.gov/35902887/">antibody-dependent cellular cytotoxicity</a>&#8217;, potentially also massively reducing the cytotoxic effects that usual antibodies can sometimes have. <a href="https://www.nature.com/articles/s41598-020-80746-8">This is all while retaining the usual binding capacity of typical IgG&#8217;s to their designed-against antigen</a>. </p><p><strong>But, I&#8217;m going to be honest, the efficacy claim-to-fame of scFv&#8217;s is a little bit suspect.</strong> Most papers trot out the same line of scFv&#8217;s having much better pharmacokinetic profiles compared to IgG antibodies, but&#8230;I&#8217;m finding basically zero controlled studies on the subject. There are lots of scFv-only papers studying scFv phenomena, but nobody ever pairs an scFv up with an IgG antibody to study the exact differences. The above quote is the only one I could find, and even that isn&#8217;t necessarily about efficacy, just a proxy for it! 
It&#8217;s genuinely strange; I have to imagine that these comparison studies exist for clinical trial purposes, but basically all scFv papers I&#8217;m finding are scFv-only, like <a href="https://www.nature.com/articles/s41423-021-00691-y">this</a>. Please let me know if I&#8217;m missing something significant here! </p><p>Here&#8217;s something potentially interesting though: there is a single released scFv drug, brolucizumab. Its primary competitor is aflibercept; both are intended for the treatment of age-related macular degeneration. Aflibercept is technically a fusion protein, composed of two binding regions of proteins fused with&#8230;drum roll&#8230;the Fc portion of the human IgG1 immunoglobulin. So, not exactly an antibody, but not the worst comparison in the world to see how well an scFv compares. <a href="https://pubmed.ncbi.nlm.nih.gov/35902887/">And here are the phase 3 results</a> for brolucizumab versus aflibercept. While they do prove &#8216;non-inferiority&#8217; for brolucizumab, it&#8217;d be tough to say that it goes far past aflibercept&#8217;s results.</p><p>At the absolute most, I currently think that scFv&#8217;s are primarily useful in getting into parts of the body that larger molecules (full antibodies) cannot, like the central nervous system.</p><p><strong>The much, much larger advantage with scFv&#8217;s is in production; they do not need mammalian cells, they can be produced by bacterial colonies.</strong> </p><p>This is an absolute gamechanger. <a href="https://link.springer.com/article/10.1007/s00253-019-10145-1">scFv&#8217;s don&#8217;t require complex glycosylation due to lacking an Fc region, are small enough that small microbes can pump them out of themselves, and have a simple enough fold that production levels can remain high even with the simpler cellular machinery in non-mammalian cells.</a> Because of this, most people use E. coli for scFv production. Being able to use something like E. coli to produce one&#8217;s drugs is an immense boon. 
<strong>E. coli is extremely cheap to grow, is not finicky about its environment, has the same level of genetic malleability as CHO&#8217;s (and maybe even more!), and is capable of growing at extreme scales (in 1000+ liter tanks) easily.</strong> One still must go through the same process of transforming E. coli cells, selecting high-producers, cloning them, incubating them in a bioreactor, and purifying bioreactor medium to grab out the scFv&#8217;s. <strong>The primary changes are that your producers are much easier to keep alive and much faster to replicate, which dramatically speeds up the main bottlenecks in the drug creation process.</strong> Of course, there are some mild downsides; E. coli is still, strictly speaking, a worse medium for creating any large-ish complex protein than mammalian cells due to both its size and genetic simplicity. <a href="https://academic.oup.com/jb/article-abstract/166/6/455/5564198">As such, E. coli may struggle with correctly folding the heavy and light chains, necessitating an expensive chemical process to encourage refolding.</a> Overall though, it seems like the upsides heavily outweigh the downsides. </p><h3>Disadvantages (scFv)</h3><p>On the side of storage, scFv&#8217;s don&#8217;t do well. <strong>scFv&#8217;s are much more prone to denaturation from thermal stress</strong>. This implies that aggregation is a bigger deal here as well. Why is this the case? It&#8217;s&#8230;hard to tell; most papers that discuss the stability downsides of scFv&#8217;s go into relatively little detail, but it has something to do with the constant regions of the antibody being extremely important for the chemical stability of the whole antibody. Without them, things go a bit downhill. 
<a href="https://www.frontiersin.org/journals/microbiology/articles/10.3389/fmicb.2020.01927/full">From a review paper:</a></p><blockquote><p>Further elimination of CH1-CL pair in Fab, resulting in fragment variable (Fv), significantly discounts thermodynamic stability (Webber et al., 1995; Jager and Pluckthun, 1999b). This is presumably due to the unnatural exposure of the lower VL and VH regions, flanking CH1 and CL, where hydrophobic interaction used to contribute to the stability as a whole as well as the absence of the contribution of CH1, which controls the assembly of heavy and light chains of the whole IgG molecule (Feige et al., 2009),</p></blockquote><p>There&#8217;s also a few efficacy issues as well, but it&#8217;s less clear how big of a problem they are. One of them is the downside of a scFv advantage we mentioned earlier: fast clearance from the blood stream. <strong>While quicker clearance from bloodstream due to its size can be a good thing in terms of side effects, <a href="https://pubmed.ncbi.nlm.nih.gov/20093855/">it also means that scFv therapeutics potentially lack enough time to have any therapeutic effect.</a></strong> But we also saw earlier that scFv can, at least in the case of cancer, still exert a therapeutic effect despite the fast clearance. 
Again, I&#8217;m finding relatively little information on how much fast clearance really changes therapeutic effects.</p><p><strong>In any case, it doesn&#8217;t matter </strong><em><strong>too</strong></em><strong> much.</strong> Both of these issues largely have fixes if needed, such as <a href="https://pubmed.ncbi.nlm.nih.gov/35941164/">modifying the peptide linker connecting the heavy chain to the light chain to increase thermostability</a> (they modify the peptide linker to be more hydrophilic, maybe allowing the hydrophobic bottom of the variable heavy chains to add stability) and <a href="https://pubmed.ncbi.nlm.nih.gov/19062633/">performing PEGylation on the scFv to increase half-life in the body</a> (which is just attaching a polyethylene glycol molecule to the scFv, a common method in drug development to increase circulation time in the body). The only issue is that any method to fix the downsides of scFv&#8217;s will also wind up increasing their cost!</p><h2>Nanobody (VHH)</h2><p>One thing we&#8217;ve noticed from antibody engineering is that the bulk of antigen binding actually stems from the heavy-chain CDR regions. Given that, we could also attempt to remove the variable light chain from the scFv, creating what&#8217;s known as a '<a href="https://en.wikipedia.org/wiki/Single-domain_antibody">single-domain antibody</a>' or V<sub>H</sub>H fragment, which is typically composed of only ~110 amino acids. There is actually a nanobody drug on the market, <a href="https://en.wikipedia.org/wiki/Caplacizumab">Caplacizumab</a>, first approved by the FDA in 2019. 
To offer a comparison, the first FDA-approved antibody drug was released in 1986.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!1PPN!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe671cc8b-5f92-4a04-a039-01cd2740eb9d_641x534.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!1PPN!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe671cc8b-5f92-4a04-a039-01cd2740eb9d_641x534.png 424w, https://substackcdn.com/image/fetch/$s_!1PPN!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe671cc8b-5f92-4a04-a039-01cd2740eb9d_641x534.png 848w, https://substackcdn.com/image/fetch/$s_!1PPN!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe671cc8b-5f92-4a04-a039-01cd2740eb9d_641x534.png 1272w, https://substackcdn.com/image/fetch/$s_!1PPN!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe671cc8b-5f92-4a04-a039-01cd2740eb9d_641x534.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!1PPN!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe671cc8b-5f92-4a04-a039-01cd2740eb9d_641x534.png" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e671cc8b-5f92-4a04-a039-01cd2740eb9d_641x534.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:null,&quot;width&quot;:null,&quot;resizeWidth&quot;:48,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!1PPN!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe671cc8b-5f92-4a04-a039-01cd2740eb9d_641x534.png 424w, https://substackcdn.com/image/fetch/$s_!1PPN!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe671cc8b-5f92-4a04-a039-01cd2740eb9d_641x534.png 848w, https://substackcdn.com/image/fetch/$s_!1PPN!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe671cc8b-5f92-4a04-a039-01cd2740eb9d_641x534.png 1272w, https://substackcdn.com/image/fetch/$s_!1PPN!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe671cc8b-5f92-4a04-a039-01cd2740eb9d_641x534.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a><figcaption class="image-caption">From <a href="https://www.biochempeg.com/article/375.html">here</a>. 
'Heavy-chain-only antibodies', or HCAb's, are found in nature in camelids and were the inspiration behind VHH's (nanobodies)</figcaption></figure></div><h3>Advantages (VHH)</h3><p>Basically all the same as the advantages from scFv&#8217;s, but there&#8217;s a surprising amount more we gain from shearing off the light chain!</p><p>Let&#8217;s start with the least impacted: efficacy. We notice similar advantages as with scFv&#8217;s: faster tissue diffusion, faster blood clearance, and the ability to reach antigens that typical large antibodies cannot, such as hidden epitopes in viruses or antigens behind the blood-brain barrier. <strong>But, again, it&#8217;s hard to tell the true clinical impact here; most studies assess these characteristics independent of actual clinical benefit.</strong> </p><p><a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10057852/">There is a much larger impact on the production end. </a>Again, all the main advantages from scFv&#8217;s carry over: there&#8217;s no need to use mammalian cells. But there&#8217;s more: not having to deal with an extra chain means E. coli becomes a degree more efficient in producing the drug, the higher stability of the protein (discussed later) means misfolding/aggregation cases are rarer, and the smaller size of the nanobody also means microbial colonies can secrete it more efficiently. </p><p>But the most impacted by far is stability. Nanobodies are <strong>extraordinarily</strong> tough, exhibiting vastly improved thermostability, pH variability resistance, reversible misfolding, and lower aggregation compared to even typical antibodies. <a href="https://biomarkerres.biomedcentral.com/articles/10.1186/s40364-021-00332-6">One review article said the following:</a></p><blockquote><p>Nbs are more resistant to chemical denaturants and protease enzymes [40] and have higher stability under harsh PH or ionic strength [41]. 
This higher conformational stability also stems from the presence of an extra disulfide bond, which lowers the probability of heat-induced aggregation and limits VHHs flexibility [42,43,44,45,46,47]. Because of higher stability, they show high refolding efficiency, which means raising or lowering the sample temperature does not affect Nb conformation, i.e., it de-binds and binds to the target, respectively, without any aggregation or denaturation [48]. This rigidity in structure is a favorite property in the clinic since non-native protein aggregation is a common downside of antibody treatment, raising the immune response in severe cases [49, 50]. </p></blockquote><p>Nanobodies are <strong>so</strong> strong that they don&#8217;t even need refrigeration! This was the subject of extreme interest during the peaks of the COVID pandemic, <a href="https://www.biorxiv.org/content/10.1101/2020.08.08.238469v1.full.pdf">with one study in particular finding that their isolated nanobodies could be freeze-dried and aerosolized with zero loss in potency</a>. One could imagine nanobodies being used for all sorts of diseases in a much cheaper manner because of this, perhaps even being given in inhalers. Surprisingly though, looking this up yields relatively little beyond more SARS-CoV-2 stuff, and basically all work on this stops from 2022 onwards. It&#8217;s hard to find a reason for this; potentially there is some hidden flaw in nanobodies here that I&#8217;m missing&#8230;</p><p>One quick question before we move on: <em>how exactly does going from IgG to a single Fab chain (scFv) reduce stability, but cutting off the light chain of that scFv increase it?</em> <a href="https://www.nature.com/articles/s41598-018-26338-z">There is a wide range of structural reasons why</a>. Some extra bonds are created as a result of dropping the light chain, some loops are extended in a more stable way, some hydrophobic residues are better able to be packed away, etc. 
There&#8217;s no single thing that&#8217;s driving the massive stability, just a bunch of small things adding up.  </p><h3>Disadvantages (VHH)</h3><p>The only real disadvantage of nanobodies is on the efficacy front; the faster bodily clearance can be an issue for some diseases. <a href="https://www.science.org/content/blog-post/nanobodies-get-their-due">Derek Lowe has a nice essay about the single nanobody drug released, caplacizumab, where he writes&#8230;</a></p><blockquote><p>&#8230;but realizing the potential of nanobodies was, as they say, nontrivial. They tend to have shorter half-lives than their full-sized cousins (some of which are spectacularly long-lived after dosing), and their smaller size has an inevitable trade-off in potency. In a head-to-head competition against a monoclonal, they're probably going to lose, unless you've got some specialized edge working for you.</p></blockquote><p>That&#8217;s really it! One might ask then, what&#8217;s stopping people from adopting nanobodies given the extreme advantages and relatively minor downsides? One reason is that this clearance rate problem is so severe (without any obvious solutions) that nanobodies simply aren&#8217;t worth it compared to most existing typical antibodies. But, as another hypothesis, it may also lie in the fact that nanobodies are simply high risk; antibodies are already an expensive therapy to develop, so the friction to shift may simply be insurmountably high.</p><h2>Antibody mimetics </h2><p>There is something beyond anything that even slightly resembles antibodies: antibody mimetics, which refers to any protein that can bind to antigens but lacks any structural similarity to antibodies. <a href="https://pubmed.ncbi.nlm.nih.gov/38059436/">Antibody mimetics represent the 'fourth generation' of antibody engineering</a>, following polyclonal antibodies, monoclonal antibodies, and antibody fragments (such as scFv&#8217;s and nanobodies). 
</p><p>This topic will be a little stranger than the prior two, because antibody mimetics come in all sorts of categories; there are affilins, affimers, DARPins, monobodies (still unrelated to antibodies!), nanoCLAMPs, optimers, and many more. Each of them is built off a known protein scaffold or motif, such as DARPins coming from a 33-residue motif called &#8216;Ankyrin repeats&#8217;, and undergoes typical antibody engineering processes to optimize its binding to desired antigens. Moreover, there isn&#8217;t even clear agreement on what counts as an antibody mimetic; some studies claim some drugs as mimetics, others place them in different categories. As such, we&#8217;ll discuss them from the perspective of &#8216;what are the general trends amongst antibody mimetics&#8217; and not offer too much specificity. </p><p>There is only one released antibody mimetic drug: Ecallantide, which uses a <a href="https://en.wikipedia.org/wiki/Kunitz_domain">Kunitz domain</a> as its scaffold, and was FDA-approved in 2009. One may notice that this is far before the first nanobody drug released in 2018, so how could antibody mimetics be considered the &#8216;next step&#8217;? This is a subjective decision on my end; it feels very much like the full scope of potential that mimetics have has not at all been sufficiently mapped out, whereas it has been a fair bit for nanobodies. 
</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!r1YM!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff3b7e587-2d90-4aa5-8ecd-559776da2a44_910x577.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!r1YM!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff3b7e587-2d90-4aa5-8ecd-559776da2a44_910x577.png 424w, https://substackcdn.com/image/fetch/$s_!r1YM!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff3b7e587-2d90-4aa5-8ecd-559776da2a44_910x577.png 848w, https://substackcdn.com/image/fetch/$s_!r1YM!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff3b7e587-2d90-4aa5-8ecd-559776da2a44_910x577.png 1272w, https://substackcdn.com/image/fetch/$s_!r1YM!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff3b7e587-2d90-4aa5-8ecd-559776da2a44_910x577.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!r1YM!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff3b7e587-2d90-4aa5-8ecd-559776da2a44_910x577.png" width="554" height="351.2725274725275" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f3b7e587-2d90-4aa5-8ecd-559776da2a44_910x577.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:577,&quot;width&quot;:910,&quot;resizeWidth&quot;:554,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Bispecific Antibody 
Mimetics Development - Creative Biolabs&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Bispecific Antibody Mimetics Development - Creative Biolabs" title="Bispecific Antibody Mimetics Development - Creative Biolabs" srcset="https://substackcdn.com/image/fetch/$s_!r1YM!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff3b7e587-2d90-4aa5-8ecd-559776da2a44_910x577.png 424w, https://substackcdn.com/image/fetch/$s_!r1YM!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff3b7e587-2d90-4aa5-8ecd-559776da2a44_910x577.png 848w, https://substackcdn.com/image/fetch/$s_!r1YM!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff3b7e587-2d90-4aa5-8ecd-559776da2a44_910x577.png 1272w, https://substackcdn.com/image/fetch/$s_!r1YM!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff3b7e587-2d90-4aa5-8ecd-559776da2a44_910x577.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 
6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">The structural diversity that mimetics can have! From <a href="https://www.creative-biolabs.com/bsab/bispecific-antibody-mimetics-development.htm">here</a></figcaption></figure></div><h3>Advantages (mimetics)</h3><p><a href="https://pubmed.ncbi.nlm.nih.gov/25264572/">We see, generally, the same advantages as we do for scFv and nanobodies</a>.</p><p>The storage angle is great: most antibody mimetics are based on highly stable protein scaffolds, making them more resistant to harsh conditions such as extreme pH, temperature, and presence of proteases. The efficacy angle is present as well, given that antibody mimetics can be even smaller (up to half the size) than nanobodies, they potentially allow better tissue penetration, allow access to cryptic epitopes and improve delivery to target sites, all while retaining antigen-binding efficacy. But, as previously seen, this is something that is more-so claimed than really heavily tested. Finally, the production angle is great as well; since antibody mimetics are small, single-chain, and often based on natural proteins, they can be easily produced in bacterial or yeast expression systems. 
</p><p>But why switch over to mimetics instead of the real deal? </p><p>Another benefit of mimetics, which scFv&#8217;s and nanobodies lack, is their flexibility (in terms of functionality, not structure!). <a href="https://www.sciencedirect.com/science/article/pii/S2213048922000474#sec0020">Antibody mimetics have a staggering array of shapes and can be purpose-built to do almost anything.</a> Using scaffolds with a low residue count may allow you to perform <a href="https://foundry.lbl.gov/2019/12/17/opening-a-new-chapter-in-antibody-mimetics/">chemical synthesis</a> of the mimetic, cutting costs by another order of magnitude. The stability of mimetics may allow you to push things even further than nanobody-level stability, <a href="https://www.sciencedirect.com/science/article/pii/S2213048922000474#sec0020">potentially allowing for orally-dosed antibodies</a>, able to brave stomach acid pH conditions. The options here are vast, and it&#8217;s really only recently that they have started to be explored. </p><h3>Disadvantages (mimetics)</h3><p>Again, the same clearance issues as we see in any protein that&#8217;s small. </p><p>But there&#8217;s another, more subtle one. Antibodies, even the fragments, are a well-trodden therapeutic modality. Decades of research have gone into understanding the structure, function, and clinical applications of antibodies.</p><p>In contrast, antibody mimetics are a relatively new class of therapeutics, and each mimetic platform (e.g., DARPins, Affibodies, Anticalins) has its own unique characteristics and challenges. While some platforms have made significant strides, what we know about them largely pales in comparison to how much we understand antibodies. Everything from discovery, optimization, manufacturing, regulatory approval, and clinical adoption may end up being a mess; the whole therapeutic modality is still a story that has yet to go beyond the first few chapters. 
This may mean that, despite the potential they have, mimetics will be more costly, harder to produce, and harder to understand for a <strong>long</strong> time to come. </p><h2>Conclusion</h2><p>Unlike my other posts, the &#8216;story&#8217; here is hard to unravel; lots of things are still opaque. If scFv&#8217;s are so hard to keep stable, how have they racked up 9 drugs based on them? Nanobodies seem incredible on the surface, so why have they stalled in clinical development for decades? Were the clearance problems and risk issues really that big of a deal? If mimetics are really the next generation of therapeutic, why does it feel like no one is strongly focusing on them? This feels like the nature of discussing any topic touching on clinical science; negative results are hidden or never written about, the complexity of drug development becomes even more apparent, and unknown unknowns ramp up.</p><p>Here&#8217;s one version of the story: there is no superior modality here. For all the disadvantages IgG antibodies have, they have positively affected millions more lives than any other modality on this list. It may very well be the case that Fc mediation is important for many therapeutics, and if you lack this immune-system crosstalk, the utility of most antibodies disappears. And if we solve the biggest bottleneck of antibodies, their dependence on mammalian cells, maybe this so-called &#8216;next generation of antibodies&#8217; will never truly come to pass. They may be used for certain, hyper-specific diseases, but there will never be a Humira-esque blockbuster of a drug amongst antibody alternatives.</p><p>But there&#8217;s another story we could tell, one that is on shakier ground, but a lot more interesting to think about. <strong>Potentially, antibodies, as a functional category of therapeutic, may very well be on their way out</strong>. scFv&#8217;s and nanobodies may take us a bit further, but even that may disappear. 
Sticking with an evolutionarily-derived protein only gets you so far, and the space of possible biology is staggeringly vast; it feels extremely unlikely that antibody-like structures are the best we can do. The future very much looks like antibody mimetics: custom-built antigen-binding proteins able to be precisely tuned for their exact task, with no attachment to shapes or binding sites, only focusing on efficacy, stability, and ease of production. But over what time horizon will this happen? Given the historical precedent of nanobodies, a drug modality that also had a fair bit of promise yet took nearly 30 years to reach the clinic, it&#8217;s hard to tell. It may be the case that the medical community moves closer to antibody fragments over the years, such as scFv&#8217;s and nanobodies, but keeps a wary eye on mimetics, waiting until there&#8217;s enough evidence to finally push on them. </p><p>But this story may end up playing out differently than historical precedent suggests, because the era of nanobodies did not have one thing we have today: <strong>ML-based</strong> <strong>de-novo protein design that is getting better month-by-month</strong>. The immense de-risking that this provides may rapidly speed up our evolution away from typical antibodies and make drug companies embrace the modality sooner rather than later. In such a world, drugs that work as well as usual antibody drugs may become far cheaper, more effective, and easier to transport to those who need them most. Antibodies and even antibody fragments become part of a bygone era of drug development, only used for very specific diseases and conditions. </p><p>Or maybe not! It feels like forecasting anything accurately in the drug development space is an exercise in futility; all that we can say for certain is that the future is interesting. 
</p><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-1" href="#footnote-anchor-1" class="footnote-number" contenteditable="false" target="_self">1</a><div class="footnote-content"><p>Why CHO&#8217;s? It goes beyond this post, but it&#8217;s a lot of little things. They have efficient methods for secreting out large proteins, attach the correct (read: works well in humans) sugars to produced proteins, and are amenable to genetic manipulation. </p></div></div>]]></content:encoded></item><item><title><![CDATA[A primer on machine learning in antibody engineering]]></title><description><![CDATA[7.4k words, 34 minutes reading time]]></description><link>https://www.owlposting.com/p/a-primer-on-ai-in-antibody-engineering</link><guid isPermaLink="false">https://www.owlposting.com/p/a-primer-on-ai-in-antibody-engineering</guid><dc:creator><![CDATA[Abhishaike Mahajan]]></dc:creator><pubDate>Mon, 15 Apr 2024 01:01:20 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!-fzJ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd0bfaf2b-f41a-412a-89c6-af17b0ed37bc_2912x1632.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!-fzJ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd0bfaf2b-f41a-412a-89c6-af17b0ed37bc_2912x1632.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!-fzJ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd0bfaf2b-f41a-412a-89c6-af17b0ed37bc_2912x1632.png 424w, 
https://substackcdn.com/image/fetch/$s_!-fzJ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd0bfaf2b-f41a-412a-89c6-af17b0ed37bc_2912x1632.png 848w, https://substackcdn.com/image/fetch/$s_!-fzJ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd0bfaf2b-f41a-412a-89c6-af17b0ed37bc_2912x1632.png 1272w, https://substackcdn.com/image/fetch/$s_!-fzJ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd0bfaf2b-f41a-412a-89c6-af17b0ed37bc_2912x1632.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!-fzJ!,w_2400,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd0bfaf2b-f41a-412a-89c6-af17b0ed37bc_2912x1632.png" width="1200" height="672.5274725274726" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d0bfaf2b-f41a-412a-89c6-af17b0ed37bc_2912x1632.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;large&quot;,&quot;height&quot;:816,&quot;width&quot;:1456,&quot;resizeWidth&quot;:1200,&quot;bytes&quot;:8156923,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.owlposting.com/i/143614942?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd0bfaf2b-f41a-412a-89c6-af17b0ed37bc_2912x1632.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:&quot;center&quot;,&quot;offset&quot;:false}" class="sizing-large" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!-fzJ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd0bfaf2b-f41a-412a-89c6-af17b0ed37bc_2912x1632.png 424w, https://substackcdn.com/image/fetch/$s_!-fzJ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd0bfaf2b-f41a-412a-89c6-af17b0ed37bc_2912x1632.png 848w, https://substackcdn.com/image/fetch/$s_!-fzJ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd0bfaf2b-f41a-412a-89c6-af17b0ed37bc_2912x1632.png 1272w, https://substackcdn.com/image/fetch/$s_!-fzJ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd0bfaf2b-f41a-412a-89c6-af17b0ed37bc_2912x1632.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div>
<p><em>Note: per October 30th, 2024, I don&#8217;t think this post was ever actually emailed to anyone, and all traffic to this was via my <a href="https://twitter.com/owl_poster">Twitter</a>. This was actually written on April 14th, 2024. So, if you&#8217;ve already read this, feel free to ignore it! But, if not, and you&#8217;re curious about why so many ML-bio companies seem to be antibody design companies + how it works internally, read on! </em></p><ol><li><p><a href="https://www.abhishaike.com/i/143614942/introduction">Introduction</a></p></li><li><p><a href="https://www.abhishaike.com/i/143614942/antibody-background">Antibody background</a> </p><ol><li><p><a href="https://www.abhishaike.com/i/143614942/what-is-an-antibody">What is an antibody?</a></p></li><li><p><a href="https://www.abhishaike.com/i/143614942/antibody-structure">Antibody structure</a></p></li><li><p><a href="https://www.abhishaike.com/i/143614942/antibody-antigen-binding">Antibody-antigen binding</a></p></li><li><p><a href="https://www.abhishaike.com/i/143614942/why-would-we-want-to-design-antibodies">Why would we want to design antibodies?</a></p></li></ol></li><li><p><a href="https://www.abhishaike.com/i/143614942/antibody-engineering">Antibody Engineering</a></p><ol><li><p><a href="https://www.abhishaike.com/i/143614942/traditional-antibody-engineering">Traditional antibody engineering</a></p><ol><li><p><a href="https://www.abhishaike.com/i/143614942/rational-design">Rational design</a></p></li><li><p><a href="https://www.abhishaike.com/i/143614942/directed-evolution">Directed evolution</a></p></li></ol></li><li><p><a 
href="https://www.abhishaike.com/i/143614942/antibody-engineering-as-an-ml-problem">Antibody engineering as an ML problem</a></p><ol><li><p><a href="https://www.abhishaike.com/i/143614942/datasets">Datasets</a></p></li><li><p><a href="https://www.abhishaike.com/i/143614942/predicting-antibody-folds">Predicting antibody folds</a></p></li><li><p><a href="https://www.abhishaike.com/i/143614942/llm-guided-antibody-mutations">LLM-guided antibody mutations</a></p></li><li><p><a href="https://www.abhishaike.com/i/143614942/de-novo-cdr-design-conditioned-on-an-antigen">De-novo CDR design conditioned on an antigen</a></p></li><li><p><a href="https://www.abhishaike.com/i/143614942/de-novo-nanobody-design-conditioned-on-an-antigen">De-novo nanobody design conditioned on an antigen</a></p></li></ol></li></ol></li><li><p><a href="https://abhishaike.com/2024/04/15/a-primer-on-ai-in-antibody-engineering/#do-we-even-need-antibodies">Conclusion</a></p></li></ol><h2>Introduction</h2><p>I increasingly see that a fair bit of the progress in ML-based protein design methods is occurring in antibody engineering, which is the process of creating antibodies tailor-made to bind to specific things in the body. I've noticed this phenomenon for the past year, and never quite understood what was going on in the space.</p><p>I've recently been taking the time to understand this field better, and wrote this post to cement what I've learned. To note, this is <em>not</em> a machine-learning post in the traditional sense. I won't be going into the exact details of how antibody ML methods work, but rather setting up the contextual knowledge needed to understand these methods at all + going through a few antibody ML papers.</p><h2>Antibody background</h2><h3>What is an antibody?</h3><p>I'm posting this picture right now so you have something to refer back to later. 
Ignore it for now, but it'll probably be useful down the road.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!5O-5!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa634878d-e44f-48db-8b41-48de4afc559a_680x399.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!5O-5!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa634878d-e44f-48db-8b41-48de4afc559a_680x399.png 424w, https://substackcdn.com/image/fetch/$s_!5O-5!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa634878d-e44f-48db-8b41-48de4afc559a_680x399.png 848w, https://substackcdn.com/image/fetch/$s_!5O-5!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa634878d-e44f-48db-8b41-48de4afc559a_680x399.png 1272w, https://substackcdn.com/image/fetch/$s_!5O-5!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa634878d-e44f-48db-8b41-48de4afc559a_680x399.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!5O-5!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa634878d-e44f-48db-8b41-48de4afc559a_680x399.png" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a634878d-e44f-48db-8b41-48de4afc559a_680x399.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:null,&quot;width&quot;:null,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!5O-5!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa634878d-e44f-48db-8b41-48de4afc559a_680x399.png 424w, https://substackcdn.com/image/fetch/$s_!5O-5!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa634878d-e44f-48db-8b41-48de4afc559a_680x399.png 848w, https://substackcdn.com/image/fetch/$s_!5O-5!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa634878d-e44f-48db-8b41-48de4afc559a_680x399.png 1272w, https://substackcdn.com/image/fetch/$s_!5O-5!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa634878d-e44f-48db-8b41-48de4afc559a_680x399.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a><figcaption class="image-caption">From <a href="https://absoluteantibody.com/antibody-resources/antibody-engineering/antibody-fragments/">here</a></figcaption></figure></div><p>The immune system is one of those things that is, unfortunately, impossible to understand unless you know all the pieces involved in it. 
This becomes increasingly clear whenever you look into antibodies; while a surface-level understanding is easy enough to derive from a sentence, fully grasping what's going on requires a bit more work + circling around.</p><p>Let's start with B-cells.</p><p>B-cells belong to the white blood cell family. Each B-cell is meant to be unique in one capacity: what it binds to. The thing it binds to is commonly referred to as an 'antigen'. Antigens can be extremely diverse: anything from the venom of an insect to fragments of the influenza virus to proteins from shellfish. Let's assume for now that an antigen is <em>bad</em>. In practice this isn't the case; 'antigen' is a subjective category, and the human body contains trillions of 'self-antigens'. But, for now, an antigen is a bad thing that shouldn't exist inside you.</p><p>How do B-cells bind to these antigens?</p><p>Well, most B-cells are actually identical to one another except for the part that binds to their unique antigen. That singular unique part, responsible for this binding, is called the B-cell receptor, or BCR. This functionality is meant to be available from the very moment a B-cell is born; it lives to bind to <strong>something</strong>, even if it doesn't know what it is. It may live its entire life never binding to the thing it was born to bind to. This will actually be the case for most B-cells in your body; there is such a staggering number of them that the antigen a given B-cell was created for <strong>may not even exist in the natural world</strong>. How does the body create such a high diversity of these cells? It goes beyond the scope of this post, but look into <a href="https://en.wikipedia.org/wiki/V(D)J_recombination">V(D)J Recombination</a> if curious.</p><p>But let's say the BCR does bind to something in your bloodstream. 
We'll brush over some steps here, but if all goes right, upon the BCR successfully binding to its own unique antigen, the B-cell will enter an 'activated' state. An activated B-cell will differentiate into one of two possible new cell states: a memory B-cell or a plasma cell. For now, we'll ignore the former case; it is largely irrelevant to the discussion of antibodies. If an activated B-cell goes down the latter path and becomes a plasma cell, it becomes a large-scale production factory of BCR's, pumping them out by the billions into your bloodstream. But, at this point, the secreted form of a BCR is no longer a BCR, but an <strong>antibody</strong>. Antibodies are simply the secreted form of a BCR, composed of only the part of a BCR that can bind to an antigen. Beyond producing antibodies, activated B-cells also start replicating at a massively high rate, helping ramp up antibody production even more.</p><p>But let's return to the antibody-producing plasma cell: why is it producing so many antibodies that bind to the antigen that activated it? Well, we'd like to interfere with the activity of the antigen in some way. Antibodies can do just that, regardless of what exactly the antigen is. If the antigen is a bacterium or virus, antibodies bound to its surface can prevent it from successfully entering a cell. If the antigen is some toxin, antibodies can prevent it from interacting with the tissue around it. Regardless of the antigen type, antibodies also serve to alert cells <strong>other</strong> than the activated plasma cell about what the antigen is and also what binds to it. We won't discuss that aspect much, but look into the <a href="https://en.wikipedia.org/wiki/Complement_system">complement system</a> if interested!</p><p>So that's the life-cycle of an antibody. 
Now, knowing the biological context in which it exists, we can ask 'what does an antibody look like and how does it bind to things'?</p><h3>Antibody structure</h3><p>For understanding antibody structure, having a visualization is basically necessary. We can view BCR's as looking largely equivalent to an antibody, sans a few extra proteins at the bottom of the antibody that allow it to transduce information about its current state to the rest of the B-cell. As a matter of terminology, you may see '<em>immunoglobulins</em>', or Ig/IgG, used to refer to antibodies or BCR's; it all really means the same thing. You may also see 'monoclonal antibodies', which just refers to normal antibodies that have been created from a single cloned cell, so you can have many (often billions) of the same one.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!xL6Z!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F79df5ec2-fc39-4e3a-a6cf-6d19d004a68f_1024x528.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!xL6Z!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F79df5ec2-fc39-4e3a-a6cf-6d19d004a68f_1024x528.png 424w, https://substackcdn.com/image/fetch/$s_!xL6Z!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F79df5ec2-fc39-4e3a-a6cf-6d19d004a68f_1024x528.png 848w, https://substackcdn.com/image/fetch/$s_!xL6Z!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F79df5ec2-fc39-4e3a-a6cf-6d19d004a68f_1024x528.png 1272w, 
https://substackcdn.com/image/fetch/$s_!xL6Z!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F79df5ec2-fc39-4e3a-a6cf-6d19d004a68f_1024x528.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!xL6Z!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F79df5ec2-fc39-4e3a-a6cf-6d19d004a68f_1024x528.png" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/79df5ec2-fc39-4e3a-a6cf-6d19d004a68f_1024x528.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:null,&quot;width&quot;:null,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!xL6Z!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F79df5ec2-fc39-4e3a-a6cf-6d19d004a68f_1024x528.png 424w, https://substackcdn.com/image/fetch/$s_!xL6Z!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F79df5ec2-fc39-4e3a-a6cf-6d19d004a68f_1024x528.png 848w, https://substackcdn.com/image/fetch/$s_!xL6Z!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F79df5ec2-fc39-4e3a-a6cf-6d19d004a68f_1024x528.png 1272w, 
https://substackcdn.com/image/fetch/$s_!xL6Z!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F79df5ec2-fc39-4e3a-a6cf-6d19d004a68f_1024x528.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a><figcaption class="image-caption"><a href="https://teachmephysiology.com/immune-system/adaptive-immune-system/antibodies/">https://teachmephysiology.com/immune-system/adaptive-immune-system/antibodies/</a></figcaption></figure></div><p>Antibodies always come in a symmetric 'Y' shape. The top two segments ('\' and '/') are identical to one another and really the only thing we'll be talking about. They define exactly what antigen the antibody binds to, and, consequently, this region is known as the '<strong>Fab</strong>', or fragment antigen-binding, region. We can, for all intents and purposes, ignore whatever is going on in the 'trunk' of the Y shape, or the '<strong>Fc</strong>' region, which stands for fragment-crystallizable region. It is extremely important, but not for antibody engineering, as it stays basically the same; its main job is to allow the antibody to interact with other parts of the immune system.</p><p>On the tips of each of the '\' and '/' segments are '<strong>variable</strong>' regions. This is what is usually modified to cause a change in antigen binding. To note, not <em>all</em> of this region is variable, only specific parts of it are (which we'll discuss later). At the base of the '\' and '/' are 'constant' regions. One word you'll often see in antibody literature is '<strong>isotype</strong>'. The constant regions encode the isotype of the antibody, which changes how it interacts with other cells (including the rest of the immune system). There are 5 known isotypes: IgG, IgM, IgA, IgD, and IgE. 
As the variable regions alone define what antigen the antibody is capable of interacting with, we'll largely ignore isotype in this post.</p><p>The outside of each segment is composed of a 'light chain', and the inside is a 'heavy chain'. Light chains and heavy chains are distinguished by the number of amino acids in each one; light chains have ~220 amino acids, heavy chains have ~450 amino acids. You'll notice that each heavy and light chain has a variable region, meaning that altering either one will change which antigen the antibody can bind to.</p><p>Finally, antibodies are usually composed of four distinct proteins (strings of amino acids). You may look at the above image and think 'aren't there more? maybe 6? or 12?'. There are only four; the heavy chain on each side of the 'Y' continues downwards. As in, the Fc region is composed of the remaining parts of the two heavy chains that extend down from the Fab regions and join together.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!lq-t!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0d963471-d23a-4846-8ac3-bb5f80dcc554_1023x682.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!lq-t!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0d963471-d23a-4846-8ac3-bb5f80dcc554_1023x682.png 424w, https://substackcdn.com/image/fetch/$s_!lq-t!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0d963471-d23a-4846-8ac3-bb5f80dcc554_1023x682.png 848w, 
https://substackcdn.com/image/fetch/$s_!lq-t!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0d963471-d23a-4846-8ac3-bb5f80dcc554_1023x682.png 1272w, https://substackcdn.com/image/fetch/$s_!lq-t!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0d963471-d23a-4846-8ac3-bb5f80dcc554_1023x682.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!lq-t!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0d963471-d23a-4846-8ac3-bb5f80dcc554_1023x682.png" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0d963471-d23a-4846-8ac3-bb5f80dcc554_1023x682.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:null,&quot;width&quot;:null,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!lq-t!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0d963471-d23a-4846-8ac3-bb5f80dcc554_1023x682.png 424w, https://substackcdn.com/image/fetch/$s_!lq-t!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0d963471-d23a-4846-8ac3-bb5f80dcc554_1023x682.png 848w, 
https://substackcdn.com/image/fetch/$s_!lq-t!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0d963471-d23a-4846-8ac3-bb5f80dcc554_1023x682.png 1272w, https://substackcdn.com/image/fetch/$s_!lq-t!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0d963471-d23a-4846-8ac3-bb5f80dcc554_1023x682.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p>And let's preempt some questions about the whole process:</p><p><em>Why is it Y-shaped? </em>We don't fully know, but we have some guesses. For one, having multiple binding regions on a single antibody increases binding strength to any given antigen, or potentially allows it to bind to multiple copies of the same antigen. Two, the separation of the Fab regions and the Fc region is important; they are involved in binding to two completely different entities (antigens versus immune system cells). Finally, the flexibility of the antibody head is important in order to allow it to bind to antigens at a variety of angles. A 'Y' shape may simply be what evolution arrived at as the optimal shape to meet all these objectives.</p><p><em>Do all creatures have similar-looking antibodies? </em>Yes, for the most part. Of course, there are always exceptions in biology: there <strong>are</strong> antibodies that exist in nature that deviate from the norm. There are so-called 'heavy-chain antibodies', or HCAbs, that are found in camelids and sharks, which have the usual Y-shaped structure but no light chains.</p><p><em>Can we modify antibody structure even further?</em> Yes! We've created 'bispecific antibodies', which have two different Fab regions, allowing binding to two completely different antigens. We could also strip out the Fc region and Y shape entirely to create 'scFv' antibodies, which keep only a single heavy-chain variable region + light-chain variable region. 
We've also made antibodies that have only a single heavy-chain variable region, referred to as single-domain antibodies, 'nanobodies', or VHH. However, the bulk of antibody re-engineering with AI does not bother with large structural changes of the antibody, only redesign of the variable region, so we won't discuss this aspect much...outside of one section at the end.</p><p>I'm leaving out a lot. As Derek Lowe <a href="https://www.science.org/content/blog-post/nanobodies-against-coronavirus-something-new">wrote in his nanobody post</a> (which I highly recommend!),<em> 'in any discussion of immunology that runs to less than about 500 pages in 6-point type you'll be leaving out a lot.'</em>, but we've gone through the basics.</p><h3>Antibody-antigen binding</h3><p>Let's focus our attention on the regions of the antibody that actually bind to the antigen. As discussed before, these are part of the Fab region, and the specific segments responsible are called 'complementarity-determining regions', or <strong>CDR</strong>'s. They may also be referred to as 'hypervariable' regions, but this seems to be less common terminology. It may be useful now to abandon the more simplistic antibody structures shown so far and focus on the crystallized structure of the antibody. 
<strong>The set of CDR's on an antibody is largely all that matters for binding to an antigen.</strong></p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!4kiP!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F980e9aa0-54e6-4a68-9c59-f8222692569f_700x408.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!4kiP!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F980e9aa0-54e6-4a68-9c59-f8222692569f_700x408.png 424w, https://substackcdn.com/image/fetch/$s_!4kiP!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F980e9aa0-54e6-4a68-9c59-f8222692569f_700x408.png 848w, https://substackcdn.com/image/fetch/$s_!4kiP!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F980e9aa0-54e6-4a68-9c59-f8222692569f_700x408.png 1272w, https://substackcdn.com/image/fetch/$s_!4kiP!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F980e9aa0-54e6-4a68-9c59-f8222692569f_700x408.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!4kiP!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F980e9aa0-54e6-4a68-9c59-f8222692569f_700x408.png" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/980e9aa0-54e6-4a68-9c59-f8222692569f_700x408.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:null,&quot;width&quot;:null,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!4kiP!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F980e9aa0-54e6-4a68-9c59-f8222692569f_700x408.png 424w, https://substackcdn.com/image/fetch/$s_!4kiP!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F980e9aa0-54e6-4a68-9c59-f8222692569f_700x408.png 848w, https://substackcdn.com/image/fetch/$s_!4kiP!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F980e9aa0-54e6-4a68-9c59-f8222692569f_700x408.png 1272w, https://substackcdn.com/image/fetch/$s_!4kiP!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F980e9aa0-54e6-4a68-9c59-f8222692569f_700x408.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a><figcaption class="image-caption">From <a href="https://en.wikipedia.org/wiki/Complementarity-determining_region#/media/File:Complementarity_determining_regions.PNG">here</a>.</figcaption></figure></div><p>The above picture shows the two Fab regions of an antibody. 
The CDR's of the&nbsp;<a href="https://en.wikipedia.org/wiki/Immunoglobulin_heavy_chain">heavy chain</a>&nbsp;are shown in red, denoted as CDR1, CDR2, and CDR3. The light chain will have three CDR's too. To tell the two sets apart, each CDR is typically denoted with an H or L. So, each Fab segment will have 6 unique CDR's in total: HCDR1, HCDR2, HCDR3, LCDR1, LCDR2, LCDR3. You may also see slightly different notation depending on the text; I have also seen CDR-H1 or H-CDR1. And this will repeat for the other Fab segment, so each antibody will have 12 total CDR's. The combination of the 6 CDR's on a Fab segment makes up the <strong>paratope </strong>of the antibody.</p><p>Finally, how do we refer to the non-CDR parts of the variable region? As in, the region that's connected to the CDR's, but not a part of them? That's the <a href="https://en.wikipedia.org/wiki/Framework_region">framework region</a>, often abbreviated as 'FWR'. These display some variability, but massively less than the CDR's, and aren't often modified in antibody engineering problems.</p><p>Here's a question to test our understanding: how many possible paratopes are there? Given that each CDR is between 8-15 amino acids long, with 6 CDR's in total, and 20 standard amino acids to choose from at each position, that means we range from (20^8)^6 = 20^48 to (20^15)^6 = 20^90 possible paratopes, or roughly 10^62 to 10^117. 
On a more practical level, the theorized upper limit of all paratopes is closer to 10^13; there are several practical limitations to antibodies that prevent us from reaching the true upper limit, but we won't get into that.</p><p>Here's another picture, zoomed in on a single paratope.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!7PzW!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F82e7e936-2102-462f-89f3-aa804baaad97_1024x733.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!7PzW!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F82e7e936-2102-462f-89f3-aa804baaad97_1024x733.png 424w, https://substackcdn.com/image/fetch/$s_!7PzW!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F82e7e936-2102-462f-89f3-aa804baaad97_1024x733.png 848w, https://substackcdn.com/image/fetch/$s_!7PzW!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F82e7e936-2102-462f-89f3-aa804baaad97_1024x733.png 1272w, https://substackcdn.com/image/fetch/$s_!7PzW!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F82e7e936-2102-462f-89f3-aa804baaad97_1024x733.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!7PzW!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F82e7e936-2102-462f-89f3-aa804baaad97_1024x733.png" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/82e7e936-2102-462f-89f3-aa804baaad97_1024x733.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:null,&quot;width&quot;:null,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!7PzW!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F82e7e936-2102-462f-89f3-aa804baaad97_1024x733.png 424w, https://substackcdn.com/image/fetch/$s_!7PzW!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F82e7e936-2102-462f-89f3-aa804baaad97_1024x733.png 848w, https://substackcdn.com/image/fetch/$s_!7PzW!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F82e7e936-2102-462f-89f3-aa804baaad97_1024x733.png 1272w, https://substackcdn.com/image/fetch/$s_!7PzW!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F82e7e936-2102-462f-89f3-aa804baaad97_1024x733.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a><figcaption class="image-caption"><a href="https://www.mdpi.com/2073-4468/8/4/55">From here</a></figcaption></figure></div><p>A quick note about antigens: while antigens can be basically <em>anything</em>, there is still some terminology for them. The <strong>epitope</strong> defines the region of an antigen that an antibody is capable of binding to. 
As one might expect, defining the space of all possible epitopes is complicated, as it's dependent on the antibody used for the given antigen, the environment that the antibody and antigen are in, and even the flexibility of the antigen (see the below picture).</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!G1q0!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe90ab7f9-c960-414b-8990-d0c2a1a04a83_1023x528.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!G1q0!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe90ab7f9-c960-414b-8990-d0c2a1a04a83_1023x528.png 424w, https://substackcdn.com/image/fetch/$s_!G1q0!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe90ab7f9-c960-414b-8990-d0c2a1a04a83_1023x528.png 848w, https://substackcdn.com/image/fetch/$s_!G1q0!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe90ab7f9-c960-414b-8990-d0c2a1a04a83_1023x528.png 1272w, https://substackcdn.com/image/fetch/$s_!G1q0!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe90ab7f9-c960-414b-8990-d0c2a1a04a83_1023x528.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!G1q0!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe90ab7f9-c960-414b-8990-d0c2a1a04a83_1023x528.png" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e90ab7f9-c960-414b-8990-d0c2a1a04a83_1023x528.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:null,&quot;width&quot;:null,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!G1q0!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe90ab7f9-c960-414b-8990-d0c2a1a04a83_1023x528.png 424w, https://substackcdn.com/image/fetch/$s_!G1q0!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe90ab7f9-c960-414b-8990-d0c2a1a04a83_1023x528.png 848w, https://substackcdn.com/image/fetch/$s_!G1q0!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe90ab7f9-c960-414b-8990-d0c2a1a04a83_1023x528.png 1272w, https://substackcdn.com/image/fetch/$s_!G1q0!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe90ab7f9-c960-414b-8990-d0c2a1a04a83_1023x528.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a><figcaption class="image-caption">Picture taken from <a href="https://www.rapidnovor.com/what-is-hdx-ms-epitope-mapping/">here</a>. Some more reading: <a href="https://en.wikipedia.org/wiki/Conformational_epitope">Conformational epitope</a></figcaption></figure></div><p>One immediate question we may have about this whole process is 'what do we mean by binding?'. 
Without getting too into the weeds, binding between antibodies and antigens is typically measured via the <strong>equilibrium dissociation constant</strong>, often referred to simply as the <strong>Kd value</strong>. This represents the concentration of free antibody at which half of the antigen's binding sites are occupied at equilibrium. A lower Kd value indicates a higher affinity, meaning that the antibody binds more tightly to its antigen. Suppose we have an antibody that binds to its target antigen with a Kd value of 1 nM (nanomolar). This means that, at equilibrium, half of the antigen's binding sites will be occupied by the antibody when the free antibody concentration is 1 nM. If the Kd value were instead 5 nM, it would take 5x the concentration of antibody for half of those sites to be occupied. <strong>Antibodies with Kd values in the nanomolar range (e.g., 1-10 nM) are generally considered to have high affinity, while those with Kd values in the micromolar range (e.g., 1-10 &#181;M) have lower affinity.</strong></p><p>And this is about all the background knowledge needed. We have the <strong>paratope </strong>of an antibody, composed of 6 variable <strong>CDR's</strong> (3 heavy and 3 light) that we can modify. Each CDR has between 8-15 amino acids. And all of this is meant to bind to the <strong>epitope </strong>of some antigen, the strength of which can be measured via the <strong>Kd value</strong>.</p><p>Before we further discuss the engineering of the antibody, let's quickly motivate the problem.</p><h3>Why would we want to design antibodies?</h3><p>Being able to custom-create antibodies to target specific antigens allows clinicians, in some sense, to take the immune system's job into their own hands. Rather than relying on the body's natural immune response, which may be inadequate or even nonexistent in certain diseases, engineered antibodies provide a way to directly intervene and guide the immune system to fight the desired target. 
Engineered antibodies function by the same principles as the body's native antibodies, binding selectively to their target antigen and triggering various immune responses such as <a href="https://en.wikipedia.org/wiki/Neutralizing_antibody">neutralization</a>, <a href="https://en.wikipedia.org/wiki/Antibody_opsonization">opsonization</a>, or complement system activation.</p><p>What are some examples of engineered antibodies being uniquely valuable in a clinical context? The most obvious case is infectious diseases.</p><p>The general public became more aware of the utility of monoclonal antibodies during the peak of the SARS-CoV-2 pandemic, especially <a href="https://www.nytimes.com/2020/10/02/health/trump-antibody-treatment.html">after Trump's infamous usage of the Regeneron antibody cocktail</a>. This particular cocktail was a combination of two designed antibodies, casirivimab and imdevimab. Both of these were designed to bind directly to SARS-CoV-2, each one targeting a slightly different region of it (<a href="https://pubmed.ncbi.nlm.nih.gov/26547132/">combinations of antibodies that bind to the same antigen + different epitopes are known to prevent mutational escape</a>).</p><p>But why would you need these tailor-made, engineered antibodies instead of relying on your immune system to go through the antigen -&gt; B-cell recognition -&gt; plasma cell producing antibodies process?</p><p>Here are some reasons.</p><ol><li><p><strong>Time:</strong> Developing a natural antibody response takes time, as the immune system needs to recognize the pathogen, activate B cells, and produce specific antibodies. In severe cases, this delay can lead to serious complications or even death. 
Engineered antibodies, on the other hand, can be administered immediately, providing rapid protection against the pathogen.</p></li><li><p><strong>Affinity: </strong>The natural immune response may not always produce antibodies with high affinity (that is, high binding strength, reflected in a low Kd) for the pathogen. While I have talked at length about how your body has BCR's (and antibodies) for almost every known antigen, it isn't assured that these natural antibodies can <em>tightly</em> bind to an antigen, only that they can bind at all! Engineered antibodies, like casirivimab and imdevimab, are designed to have extremely high binding affinities with the antigen, ensuring that the antibody can do its job that much more effectively.</p></li><li><p><strong>Vulnerability:</strong> Some patients, such as the elderly or immunocompromised individuals, may not be able to mount an effective adaptive immune response at all. Engineered antibodies can, in these cases, serve to largely replace the role that their adaptive immune system would otherwise take on.</p></li></ol><p>Infectious diseases are one of the more obvious places for antibody therapies to be applied. But common antibody therapies also extend into two more categories: cancer immunotherapies and autoimmune conditions.</p><p>In the former case of cancer, antibody development closely matches the infectious disease framework, only instead of viruses, the target can be certain cell receptors that are highly expressed by certain types of cancer. An example case here is Rituximab, which targets the CD20 antigen, which is highly expressed on the surface of B-cell lymphomas. Since the adaptive immune system often fails entirely to recognize the traits of cancerous cells, engineered antibodies are a vital line of therapy; though, of course, they fail entirely for cancers that mutate too fast for there to be a consistent display of recognizable cell receptors. 
An especially interesting direction here is '<a href="https://en.wikipedia.org/wiki/Antibody%E2%80%93drug_conjugate">antibody-drug-conjugates</a>', which are antibodies engineered to bind to cancer-cell-specific receptors, but are also chemically linked to chemotherapy compounds, allowing us to deliver chemotherapy to only cancerous cells while sparing healthy ones.</p><p>The latter case of antibodies for autoimmune conditions is a bit more interesting; instead of defending against a threat, like an internal mutation or virus, the focus is on dampening down the body's own immune system in a hyper-specific way. An example here is the engineered antibody adalimumab (Humira), created to alleviate symptoms of rheumatoid arthritis (RA). The logic here is that RA drives the upregulation of tumor necrosis factor-alpha (TNF-&#945;), a pro-inflammatory cytokine that causes joint inflammation, and adalimumab is engineered to bind to TNF-&#945;. By binding to the cytokine and preventing its action on other cells, the antibody therapy massively reduces the primary symptoms of RA. Without engineered antibodies here, a sufferer of RA only has one other instant fix: broad immune system suppressants, which are obviously undesirable.</p><p>I've shown the bright sides of antibody therapies, but given that infectious diseases, cancer, and autoimmune conditions are still a problem in today's world, they are obviously not a panacea. While there's a bevy of problems we could discuss here, there is one that is especially salient to antibody engineering:<strong> polyreactivity</strong>. Natural antibodies that exist in the human body go through an immensely complex process called <strong><a href="https://en.wikipedia.org/wiki/Central_tolerance">central tolerance</a></strong>, which weeds out any would-be antibodies that react to antigens that are on the surface of our cells, secreted proteins, and so on. 
The reason for this is simple: you don't want an immune response to be mounted against parts of <em>you </em>(otherwise known as autoimmunity!). When we engineer antibodies, which have obviously not gone through this central tolerance process, we may find that they not only bind to the antigen we want, but also to a great deal of other, very important proteins in the human body! In such a case, we'd label the engineered antibody as polyreactive and (hopefully) shelve it, as giving it to a patient could result in an almost immediate (potentially fatal) autoimmune reaction.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!hA_u!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ba71c35-e4bd-4476-9d2c-a7fef466d222_600x314.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!hA_u!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ba71c35-e4bd-4476-9d2c-a7fef466d222_600x314.png 424w, https://substackcdn.com/image/fetch/$s_!hA_u!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ba71c35-e4bd-4476-9d2c-a7fef466d222_600x314.png 848w, https://substackcdn.com/image/fetch/$s_!hA_u!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ba71c35-e4bd-4476-9d2c-a7fef466d222_600x314.png 1272w, https://substackcdn.com/image/fetch/$s_!hA_u!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ba71c35-e4bd-4476-9d2c-a7fef466d222_600x314.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!hA_u!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ba71c35-e4bd-4476-9d2c-a7fef466d222_600x314.png" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5ba71c35-e4bd-4476-9d2c-a7fef466d222_600x314.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:null,&quot;width&quot;:null,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!hA_u!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ba71c35-e4bd-4476-9d2c-a7fef466d222_600x314.png 424w, https://substackcdn.com/image/fetch/$s_!hA_u!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ba71c35-e4bd-4476-9d2c-a7fef466d222_600x314.png 848w, https://substackcdn.com/image/fetch/$s_!hA_u!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ba71c35-e4bd-4476-9d2c-a7fef466d222_600x314.png 1272w, https://substackcdn.com/image/fetch/$s_!hA_u!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ba71c35-e4bd-4476-9d2c-a7fef466d222_600x314.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a><figcaption class="image-caption">From <a href="https://www.cusabio.com/c-21045.html">here</a></figcaption></figure></div><h2>Antibody 
Engineering</h2><p>Now that we understand what antibodies are, the clinical context behind why engineered antibodies are important, and what engineering an antibody actually involves (modifying the paratopes of an antibody to better bind to the epitopes of an antigen), we're ready to start discussing how design actually works in practice. We'll first cover how design of an antibody is traditionally done and its drawbacks, and then go over how ML is being applied here.</p><h3>Traditional antibody engineering</h3><p>Traditional antibody engineering has relied on a combination of 'rational design' approaches and 'directed evolution'.</p><h4>Rational design</h4><p>In rational design, researchers use their knowledge of antibody structure and desired epitope binding to make targeted changes to the paratope. The approaches here are extremely diverse. One particularly common technique <a href="https://www.pnas.org/doi/full/10.1073/pnas.1422401112">relies on crystallized structures of protein-peptides to generate peptides to graft onto CDR's</a>. The core idea here is that if a segment of protein (say, a stretch of 7 amino acids) is often found close to the desired epitope of an antigen (using databases of crystallized proteins), one could simply put those same 7 amino acids in a CDR loop of our antibody to get the same binding effect! In this paper specifically, they find that multiple stretches of interactions could also be tied together, allowing you to arrive at a 7-amino-acid insertion from observing the occurrence of a 3-amino-acid x epitope interaction and a 4-amino-acid x epitope interaction. Surprisingly, this method of 'plucking out interactions' often works; distilling protein-protein interactions down to a few specific amino acid groups is definitely more an art than a science, but experienced protein designers can recognize the situations in which methods like this work. 
There are plenty of other methods used in the rational design space, but they all fall along the lines of 'recognize what evolution has already produced and copy it'.</p><p>Of course, this is often a painstaking process that requires years of experience to do effectively, is still quite error-prone, and is massively low-throughput.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!8EZ3!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F85bd1bac-d344-43a4-9a5c-65bccedc0e60_874x872.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!8EZ3!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F85bd1bac-d344-43a4-9a5c-65bccedc0e60_874x872.png 424w, https://substackcdn.com/image/fetch/$s_!8EZ3!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F85bd1bac-d344-43a4-9a5c-65bccedc0e60_874x872.png 848w, https://substackcdn.com/image/fetch/$s_!8EZ3!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F85bd1bac-d344-43a4-9a5c-65bccedc0e60_874x872.png 1272w, https://substackcdn.com/image/fetch/$s_!8EZ3!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F85bd1bac-d344-43a4-9a5c-65bccedc0e60_874x872.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!8EZ3!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F85bd1bac-d344-43a4-9a5c-65bccedc0e60_874x872.png" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/85bd1bac-d344-43a4-9a5c-65bccedc0e60_874x872.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:null,&quot;width&quot;:null,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!8EZ3!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F85bd1bac-d344-43a4-9a5c-65bccedc0e60_874x872.png 424w, https://substackcdn.com/image/fetch/$s_!8EZ3!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F85bd1bac-d344-43a4-9a5c-65bccedc0e60_874x872.png 848w, https://substackcdn.com/image/fetch/$s_!8EZ3!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F85bd1bac-d344-43a4-9a5c-65bccedc0e60_874x872.png 1272w, https://substackcdn.com/image/fetch/$s_!8EZ3!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F85bd1bac-d344-43a4-9a5c-65bccedc0e60_874x872.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a><figcaption class="image-caption">From <a href="https://www.pnas.org/doi/epdf/10.1073/pnas.1422401112">here</a>, picture demonstrates using crystallized structures in the antibody design process</figcaption></figure></div><p>Rational design for antibodies is often also coupled alongside molecular dynamics (MD) simulations. 
These simulations allow a researcher to understand how their designed antibody interacts with an epitope, how it reacts to temperature/pH changes, and how structurally stable it is, all via simulating the underlying physics of the antibody as it interacts with the world around it. <a href="https://www.nature.com/articles/s41598-023-42698-7">Here is an example paper,</a> where they analyze the results of MD simulations of the Fab region of an antibody designed to bind to the human protein <a href="https://en.wikipedia.org/wiki/GPA33">A33</a>, finding that the antibody is unstable (according to the simulations) at lower pH values.</p><p>However, it's unclear how accurate these simulations actually are and how much value they provide in the antibody development process. For practical purposes, it seems like they simply confirm the results of wet-lab experiments and rarely provide unique information by themselves. As with rational design in general, MD is also extremely slow, with nanoseconds of simulated time for protein fragments often taking hours to days of compute time, and setting up the 'correct' parameters for a simulation is very much an art honed by its practitioners over years.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!UjJG!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3ff0bf8f-b7f7-40be-bc1b-03c61c4c63fc_685x611.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!UjJG!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3ff0bf8f-b7f7-40be-bc1b-03c61c4c63fc_685x611.png 424w, 
https://substackcdn.com/image/fetch/$s_!UjJG!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3ff0bf8f-b7f7-40be-bc1b-03c61c4c63fc_685x611.png 848w, https://substackcdn.com/image/fetch/$s_!UjJG!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3ff0bf8f-b7f7-40be-bc1b-03c61c4c63fc_685x611.png 1272w, https://substackcdn.com/image/fetch/$s_!UjJG!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3ff0bf8f-b7f7-40be-bc1b-03c61c4c63fc_685x611.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!UjJG!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3ff0bf8f-b7f7-40be-bc1b-03c61c4c63fc_685x611.png" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3ff0bf8f-b7f7-40be-bc1b-03c61c4c63fc_685x611.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:null,&quot;width&quot;:null,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!UjJG!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3ff0bf8f-b7f7-40be-bc1b-03c61c4c63fc_685x611.png 424w, 
https://substackcdn.com/image/fetch/$s_!UjJG!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3ff0bf8f-b7f7-40be-bc1b-03c61c4c63fc_685x611.png 848w, https://substackcdn.com/image/fetch/$s_!UjJG!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3ff0bf8f-b7f7-40be-bc1b-03c61c4c63fc_685x611.png 1272w, https://substackcdn.com/image/fetch/$s_!UjJG!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3ff0bf8f-b7f7-40be-bc1b-03c61c4c63fc_685x611.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a><figcaption class="image-caption">From <a href="https://www.nature.com/articles/s41598-023-42698-7">here</a>. Antibodies were simulated under a variety of starting conditions (pH and different starting crystal states) and their flexibility (Rg) was recorded over 100 nanoseconds.</figcaption></figure></div><h4>Directed evolution</h4><p>The directed evolution (DE) case is a fairly large step up in efficiency. Here, researchers start with a 'parent' antibody that already binds to the target antigen, but perhaps not with the desired affinity or specificity. They then create a library of antibody variants by randomly mutating the gene that encodes the antibody, focusing particularly on the regions that make up the paratope. These variant libraries, which can contain billions of different antibody sequences, are then expressed in phage, yeast, or mammalian cells and screened for binding to the target antigen (using techniques such as surface plasmon resonance or phage display). 
The top performers are selected, and the process is repeated, with each round of screening and selection resulting in antibodies with increasingly optimized properties.</p><p>The drawbacks here are a bit more nuanced than in the rational design case.</p><ol><li><p><strong>Bias towards high-affinity binders</strong>. Directed evolution methods tend to select for antibodies with the highest affinity for the target antigen. While high affinity is often desirable, it may not always be optimal for certain applications. For example, in some cases, moderate-affinity antibodies may have better tissue penetration or faster clearance rates, which may be considered more valuable than pure binding strength.</p></li><li><p><strong>Lack of control over epitope specificity</strong>. Random mutagenesis and selection based on binding affinity alone may not always result in antibodies that bind to the desired epitope on the target antigen. This is especially an issue when aiming to develop antibodies that target specific conformations or post-translational modifications of the antigen.</p></li><li><p><strong>Dependence on the starting antibody</strong>. The success of directed evolution largely depends on the quality of the starting antibody. If the initial antibody has poor specificity or binds to an irrelevant epitope, the resulting optimized antibodies may not have the desired properties.</p></li><li><p><strong>Limited sequence diversity</strong>. Although directed evolution can generate large libraries of antibody variants, the sequence diversity is still limited by the starting antibody sequence and the mutation methods employed. This may restrict the exploration of novel antibody sequences that could potentially have better properties.</p></li><li><p><strong>Time and resource intensive.</strong> Directed evolution requires multiple rounds of library generation, screening, and selection, which can be time-consuming and resource-intensive. 
Each round may take several weeks to complete, and the entire process may require several months to a year to develop an optimized antibody.</p></li></ol><p>Some may consider these to be minor problems and, in many ways, they are. At the level of billions of created antibodies, over multiple optimization rounds, at least one often meets the minimum clinical criteria. But the cost that goes into directed evolution can still be significant, especially as the number of rounds expands, and the turnaround time can be problematic for pandemics that require immediate fixes.</p><p>Is it possible for ML to handle the antibody design problem entirely and allow us to design antibodies in under an hour, with zero experimental follow-up, at any desired affinity value, with unlimited diversity, and at specific epitopes? Well...not yet, but we're getting there.</p><h3>Antibody engineering as an ML problem</h3><p>There are several ways that people have proposed applying ML to the antibody engineering problem. I'll first go through an explanation of the major datasets used in the field (what they contain and the value they provide to practitioners). Then we'll go through four well-known models in the antibody x ML field. Each section is meant to deepen our understanding of the problem, not to fully explain the underpinnings of each method! This field moves fast, and this blog post will undoubtedly be out of date within a few months, but the intuition we'll build by going through these papers will continue to be useful.</p><h4>Datasets</h4><p>There are only two main datasets here. We'll refer to them by their acronyms in the following sections.</p><p><strong>OAS</strong>, or <a href="https://opig.stats.ox.ac.uk/webapps/oas/documentation/">Observed Antibody Space</a>, is a comprehensive collection of antibody data derived from next-generation sequencing experiments. 
Note that sequencing is limited to the Fab region, primarily the variable heavy-chain (<strong>VHC</strong>) and variable light-chain (<strong>VLC</strong>) regions. For a significant fraction of the dataset, some constant-region information is captured, allowing isotypes to be assigned as well. The primary value of this data comes from the sequences of the variable regions, which help characterize the space of all 'possible' antibody sequences, especially in the CDR regions, allowing researchers to assess the diversity of generated antibodies compared to natural ones. The dataset also contains data from nontraditional antibodies, such as nanobodies, but that's a minority.</p><ul><li><p>This dataset is divided into two sections: unpaired sequences and paired sequences. <strong>Unpaired sequences have the VHC and VLC sequences separated and cannot be tied back to one another.</strong> This is an unfortunate consequence of the limitations of sequencing technology; capturing sequences of proteins chained to one another (as is the case in antibodies) is challenging to do at a high-throughput scale! Advancements in sequencing technology mean we can capture paired sequences as well these days, with the VHC and VLC sequences from the same antibody tied to one another, but at a much smaller scale. Whereas the OAS has 3B unpaired VHC/VLC sequences, there are only 120k paired VHC/VLC sequences.</p></li></ul><ul><li><p>An interesting drawback of this dataset, and many others like it, is that it's derived from 'naive' B-cell BCRs, not raw antibodies! We've discussed BCRs, but what makes for a naive B-cell? We've also discussed B-cell activation, which is when a B-cell comes across an antigen that binds to its unique BCR and transforms into an 'activated' B-cell, which pumps out billions of antibodies + begins replicating. 
What we <em>haven't</em> mentioned is that this replication process is intentionally error-prone in regions of the CDR, about a <em>million</em> times more so than usual B-cell division. This means the antibodies produced by the 'children' of an activated B-cell are extraordinarily diverse. Most importantly, these child cells often have much higher binding affinity for the target antigen than their 'parent' B-cell<a href="#708a9589-33ab-4f8e-9171-43cf5e647e1b"><sup>1</sup></a>. Unfortunately, activated B-cells are the minority of B-cells in your body, making it challenging to sample them by sequencing. This leads to a sampling problem, where OAS is primarily composed of so-called '<strong>germline</strong>' B-cells that have a ceiling on the amount of expected diversity + binding affinity in BCRs, as germline B-cells typically have low affinity for any given antigen. In the era of large-scale LLMs, this type of dataset bias can lead to problems; <a href="https://www.biorxiv.org/content/10.1101/2024.02.02.578678v1.full.pdf">here's an interesting paper about it</a>. Antibody AI papers will often bring up this problem as well, but it's not a huge concern for the time being.</p></li></ul><p><strong>SAbDab</strong>, or <a href="https://opig.stats.ox.ac.uk/webapps/sabdab-sabpred/sabdab">Structural Antibody Database</a>, curates experimentally determined antibody structures from the <a href="https://www.rcsb.org/">PDB</a>. As with OAS, more exotic forms of antibodies, like nanobodies, are also included here, but they're the minority. Some statistics on dataset size are below; it's quite a bit smaller than the sequence-only dataset. 
This dataset is primarily used for models that rely on structural information, such as antibody-antigen complexes.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!pT04!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc95813c7-38d0-40f9-9f9f-17c296e182d5_1023x761.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!pT04!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc95813c7-38d0-40f9-9f9f-17c296e182d5_1023x761.png 424w, https://substackcdn.com/image/fetch/$s_!pT04!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc95813c7-38d0-40f9-9f9f-17c296e182d5_1023x761.png 848w, https://substackcdn.com/image/fetch/$s_!pT04!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc95813c7-38d0-40f9-9f9f-17c296e182d5_1023x761.png 1272w, https://substackcdn.com/image/fetch/$s_!pT04!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc95813c7-38d0-40f9-9f9f-17c296e182d5_1023x761.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!pT04!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc95813c7-38d0-40f9-9f9f-17c296e182d5_1023x761.png" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c95813c7-38d0-40f9-9f9f-17c296e182d5_1023x761.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:null,&quot;width&quot;:null,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!pT04!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc95813c7-38d0-40f9-9f9f-17c296e182d5_1023x761.png 424w, https://substackcdn.com/image/fetch/$s_!pT04!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc95813c7-38d0-40f9-9f9f-17c296e182d5_1023x761.png 848w, https://substackcdn.com/image/fetch/$s_!pT04!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc95813c7-38d0-40f9-9f9f-17c296e182d5_1023x761.png 1272w, https://substackcdn.com/image/fetch/$s_!pT04!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc95813c7-38d0-40f9-9f9f-17c296e182d5_1023x761.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a><figcaption class="image-caption">From <a href="https://opig.stats.ox.ac.uk/webapps/sabdab-sabpred/sabdab/stats/">here</a></figcaption></figure></div><h4>Predicting antibody folds</h4><p><a href="https://www.nature.com/articles/s41467-023-38063-x">Link here.</a></p><p>The protein folding problem is a familiar one to any biologist and has been, in the monomer case, largely 
solved by <a href="https://www.nature.com/articles/s41586-021-03819-2">AlphaFold2</a>. Being able to easily access the fold of any protein can explain a great deal about how a protein interacts with the world around it. Antibodies are no exception to this: the final fold of the paratope (remember, HCDR1-3 and LCDR1-3!) is the deciding factor in how well it can bind to certain antigens. If we could accurately model antibody folds, the aforementioned molecular dynamics methods might become all the more powerful.</p><p>Interestingly, despite the variability of the CDR regions, the primary challenge of predicting the final antibody fold stems from HCDR3 alone! This is a repeating pattern in many of these papers: despite all CDRs being technically involved in antigen binding, HCDR3 is consistently the most important + challenging to modify. The paper has this to say about it:</p><blockquote><p>Five of the CDR loops tend to adopt canonical folds that can be predicted effectively by sequence similarity. However, the third CDR loop of the heavy chain (CDR H3) has proven a challenge to model due to its increased diversity, both in sequence and length. Further, the position of the H3 loop at the interface between the heavy and light chains makes its conformation dependent on the inter-chain orientation. Given its central role in binding, advances in prediction of H3 loop structures are critical for understanding antibody-antigen interactions and enabling rational design of antibodies.</p></blockquote><p>IgFold was an attempt to see if a model built around an antibody-specific language model and focused on <em>only</em> antibody folding could outperform AlphaFold-Multimer, a version of AlphaFold2 extended to work with multimers generally. 
AlphaFold-Multimer was a step forwards for modeling the folding of multimeric proteins, but it wasn't quite the slam dunk that the original model was for monomers, so there was still room for IgFold to contribute something to the problem.</p><p>It was trained on 4,275 paired antibodies and nanobodies from SAbDab, but also had this (in my opinion, strange) second step of data augmentation where it used AlphaFold2-predicted (not AlphaFold-Multimer, because it hadn't been released yet...but it was released in time for the validation?) structures of 26,971 unpaired and 16,141 paired sequences from OAS.</p><p>The final result was only slightly positive; IgFold was largely identical to AlphaFold-Multimer in accuracy (maybe <em>slightly</em> better). However, whereas AlphaFold-Multimer takes over 10 minutes to run per antibody sequence, IgFold prediction typically took under 25 seconds: roughly a 24x speedup. As far as I can tell though, these sorts of antibody structure prediction models haven't led to anything particularly interesting in terms of antibody engineering.</p><h4>LLM-guided antibody mutations</h4><p><a href="https://www.nature.com/articles/s41587-023-01763-2#Sec12">Link here</a></p><p>The basic idea here is to take FDA-approved antibodies that have already been through an optimization process and re-optimize them via protein language models (pLMs). The way they did this was by computing log-likelihoods of every single-amino-acid substitution of the VHC and VLC. So, for example, swapping a glycine for a cysteine on a VHC, and asking a set of pLMs whether the mutated version has a 'higher likelihood to exist in nature'. These are questions that pLMs are quite good at answering, since they have been trained on a large fraction of all known protein sequences. There were six pLMs used in total (ESM-1b and the five models that make up ESM-1v), and they took the consensus of what all six said. 
They performed two rounds of optimization, the first round mutating a single residue and the second round mutating another one on top of the prior one. Importantly, no extra training of these networks was done; they used the pre-trained weights.</p><p>I originally learned about this paper from a <a href="https://www.science.org/content/blog-post/try-antibody-over-here">Derek Lowe post</a>, and his words best sum up the extraordinary results:</p><blockquote><p>The results are impressive, particularly because they&#8217;re not starting with random antibodies to random proteins. No, they start with seven that are already in clinical use, and thus have already been through stages of optimization for their binding affinities and physical properties. Even with these, the pLM-suggested mutations are an improvement - better thermal stability, lower immunogenicity, and in&nbsp;<em>every single case</em>&nbsp;better potency in neutralization assays. This after scoring fewer [than] 20 suggested variants for each case and after two rounds of traditional laboratory evolution, which is&nbsp;<em>far</em>&nbsp;less work than usual for this kind of thing.</p></blockquote><p>One minor nit about Derek's overview: he interprets the paper's use of pre-optimized antibodies as particularly impressive, <strong>but this is likely a far easier problem than optimizing random antibodies</strong>. It's a reasonably common experience in life-science ML to find that models are quite good at optimizing things that already have desirable properties, but terrible at finding those things in the first place. Similar phenomena pop up elsewhere too, in binder generation, small-molecule generation, and so on. Of course, I cannot confirm this for sure; I just heavily suspect it to be the case.</p><p>A final interesting part of the study is that a majority of beneficial mutations were not in the CDR regions! 
For all I've been discussing the importance of paratope mutation, there may be a significant amount of value in exploiting mutations outside of it. From the paper:</p><blockquote><p>Thirty-six out of all 76 language-model-recommended, single-residue substitutions (and 18 out of 32 substitutions that lead to improved affinity) occur in framework regions, which are generally less mutated during conventional affinity maturation compared to the complementarity-determining regions (CDRs)</p></blockquote><h4>De-novo CDR design conditioned on an antigen</h4><p><a href="https://www.biorxiv.org/content/10.1101/2023.01.08.523187v1.full.pdf">Link here.</a></p><p>Folding an antibody is a step removed from the problem we really care about. What we'd really like is the ability to provide an antigen structure and have the model design an antibody specifically tailored to bind that antigen with high affinity and specificity. A paper in early 2023 poked at this exact problem: the authors designed antibodies that bind to a given antigen, all using a model that has never seen antibodies that bind to that specific antigen! This is the 'de-novo' part; there are many antigens for which we've never observed an antibody that could bind to them, so a model that doesn't require one is exceptionally powerful.</p><p>They redesigned HCDR3 regions (HCDR3 coming up again as important!) of a well-known antibody (trastuzumab) that binds to an antigen protein called HER2. Given only the HER2 amino-acid sequence, they had the model fill in the HCDR3 regions of trastuzumab. What exactly is this 'model'? It is unfortunately not discussed at all in the paper, likely for commercial reasons given that an antibody design company released the paper.</p><p>Notably, this means we are running into the same issue as before with the LLM evolution paper! 
Given that binding outside of HCDR3 is known to be important and that trastuzumab contains mutations outside of HCDR3, we might expect the modeling strategy here to get a 'free ride' on the already good sequence space that the non-HCDR3 regions of trastuzumab inhabit. Moreover, is this actually 'de-novo' if we're starting from a validated antibody? We'll return to this point later on...</p><p>Via their model, they generated 400k possible antibodies and tested their binding to the HER2 antigen in a high-throughput fashion. Of these, 4k showed possible binding, and 421 of them were selected for follow-up validation via pre-filtering by some molecular dynamics magic they explain in the supplementary material. 71 of these had &lt; 10nM Kd values (remember, &lt;10nM Kd is the range of 'high' affinity), and three of them bound more tightly (that is, had lower Kd values) than trastuzumab. They claim high sequence diversity amongst all generated antibodies. This is definitely a win for HCDR3 design in some capacity, though the issues raised above should be kept in mind.</p><p>Things are less rosy when the model is extended to redesign all HCDR regions. While the Kd values of these more extensively redesigned antibodies aren't terrible, they are quite far off from those of an established drug. 
The dream of full antibody design is far off; even redesigning only the HCDR regions (and not the LCDR, or even non-CDR, regions) of an established antibody is still quite challenging!</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!o7CB!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffc217239-cc6b-4473-9158-3588cbf4c4a4_1022x469.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!o7CB!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffc217239-cc6b-4473-9158-3588cbf4c4a4_1022x469.png 424w, https://substackcdn.com/image/fetch/$s_!o7CB!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffc217239-cc6b-4473-9158-3588cbf4c4a4_1022x469.png 848w, https://substackcdn.com/image/fetch/$s_!o7CB!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffc217239-cc6b-4473-9158-3588cbf4c4a4_1022x469.png 1272w, https://substackcdn.com/image/fetch/$s_!o7CB!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffc217239-cc6b-4473-9158-3588cbf4c4a4_1022x469.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!o7CB!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffc217239-cc6b-4473-9158-3588cbf4c4a4_1022x469.png" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/fc217239-cc6b-4473-9158-3588cbf4c4a4_1022x469.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:null,&quot;width&quot;:null,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!o7CB!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffc217239-cc6b-4473-9158-3588cbf4c4a4_1022x469.png 424w, https://substackcdn.com/image/fetch/$s_!o7CB!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffc217239-cc6b-4473-9158-3588cbf4c4a4_1022x469.png 848w, https://substackcdn.com/image/fetch/$s_!o7CB!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffc217239-cc6b-4473-9158-3588cbf4c4a4_1022x469.png 1272w, https://substackcdn.com/image/fetch/$s_!o7CB!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffc217239-cc6b-4473-9158-3588cbf4c4a4_1022x469.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p>There's another reason we're discussing this paper, and that's because of an interesting Twitter (X) thread that another antibody design startup made about this work! 
They were fairly pessimistic about the results and mirrored our HCDR3 + de-novo concerns from earlier, and we're finally in a place to understand their critiques, so I highly recommend reading the thread! It's a good way to test our understanding of the subject: </p><p><a href="https://twitter.com/SurgeBiswas/status/1613232556673224705?s=20&amp;t=msepm203T3AoyEkIuMRaDA">https://twitter.com/SurgeBiswas/status/1613232556673224705?s=20&amp;t=msepm203T3AoyEkIuMRaDA</a> </p><p>So, what's the verdict here? There really isn't one; the role of ML in antibody design is still fuzzy, and what is 'impressive' or worthy of attention is still a little vague. <a href="https://www.biorxiv.org/content/10.1101/2023.12.08.570889v1.full.pdf">The company behind this paper did release some follow-up work that directly addressed some of the concerns raised in the thread</a>, using a method called 'IgDesign' that generates de-novo antibodies conditioned on an antigen. We won't discuss this paper heavily, since the results are, in my opinion, much more nuanced and difficult to explain without an entire post dedicated to it. But I do highly recommend reading the paper! The methodology involves redesign of HCDR1-3 (alongside HCDR3 only, for comparison) across a variety of antigens (<strong>still using a validated antibody for the antigen for the rest of the antibody though!</strong>), and they achieve some interesting results: <em>For 5 out of 8 antigens (antigen 1, 2, 6, 7, and 8), IgDesign generates binders with equal or higher affinities to the reference antibody. For these 5 targets, several designed binders have affinities within one order of magnitude of the reference antibody&#8217;s affinity. 
</em>If you're interested in what the 'state of the art' is when it comes to antibody design, I feel mildly confident in saying that it is this one.</p><h4>De-novo nanobody design conditioned on an antigen</h4><p><a href="https://www.biorxiv.org/content/10.1101/2024.03.14.585103v1.full.pdf">Link here.</a></p><p>Of course the Baker lab, a legend in solving protein design challenges, has an approach to this. And it comes closest to our dream of complete design of an antibody with nothing more than the antigen provided as input. They use a re-trained <a href="https://github.com/RosettaCommons/RFdiffusion">RFDiffusion</a> model as their primary method for doing this. Getting into the weeds of RFDiffusion would be really hard, so, while I do have a <a href="https://abhishaike.com/2024/03/08/rfdiffusion-a-year-retrospective/">post on RFDiffusion that explains some very surface-level details of the model</a>, we can simply view this re-trained RFDiffusion model as a magic box that produces new antibodies given an antigen.</p><p>But it's not exactly an antibody that they are designing; it's a 'nanobody', or VHH. In the eyes of this paper, designing full antibodies isn't particularly important, given that smaller, more compact versions of those antibodies work with basically the same efficacy, and maybe have some advantages as well. We've discussed this form of antibody earlier in this post, but as a reminder, nanobodies are composed of <em>only </em>a single heavy-chain variable domain; so, only 3 CDRs, all of them heavy, and no light chain or constant region.</p><p>Let's walk through exactly what they did. Their model creates structures and sequences for HCDR1-3 regions, but, once again, keeps the user-provided FWR region constant. Moreover, unlike all the prior methods discussed here, they are able to target specific epitopes of an antigen at inference time for the nanobody to bind to, which is a huge step forwards in design. 
As the framework for <em>all</em> designs, they use an established nanobody called '<em>NbBcII10FGLA</em>' that has been confirmed to work in humans; it is basically just a natural nanobody found in camels but <a href="https://en.wikipedia.org/wiki/Humanized_antibody">humanized</a>. Finally, they design nanobodies for a range of antigens: Clostridium difficile toxin B (TcdB), influenza H1 hemagglutinin (HA), respiratory syncytial virus (RSV) sites I and III, SARS-CoV-2 receptor binding domain (Covid RBD) and IL-7R&#593;, so, lots of diversity in antigens.</p><p>So, what are the results? Best for them to state it in their own terms:</p><blockquote><p>9000 designed VHHs were screened against RSV site III and influenza hemagglutinin with yeast surface display, before soluble expression of the top hits in E. coli. Surface Plasmon Resonance (SPR) demonstrated that the highest affinity VHHs to RSV site III and Influenza Hemagglutinin bound their respective targets with 1.4&#956;M and 78nM respectively. C) 9000 VHH designs were tested against SARS-CoV-2 receptor binding domain (RBD), and after soluble expression, SPR confirmed an affinity of 5.5&#956;M to the target. Importantly, binding was to the expected epitope, confirmed by competition with a structurally confirmed de novo binder (AHB2, PDB: 7UHB). D) 95 VHH designs were tested against the C. difficile toxin TcdB. The highest affinity VHH bound with 262nM affinity, and also competed with an unpublished, structurally confirmed de novo binder to the same epitope (right)</p></blockquote><p>So, while none of their designed nanobodies could be reasonably characterized as 'strong' binders to their antigen (nothing below 10nM Kd), some could be described as moderate (&lt;10&#956;M Kd); there were also nanobodies that reached the &lt;1 &#956;M Kd region for HA and TcdB. 
Nanobodies for RBD and TcdB also hit the epitope specified at inference time, which is huge, especially considering that binders to this epitope weren't found in nature! They offer little discussion of the diversity of the sequences produced; I can't find any comparison to OAS here, as is typically done to assess diversity.</p><p>Moderate results overall. This is a huge jump in applying antibody design techniques to a minimized form of antibody (which shows more promise than antibodies generally, even outside of AI), but the binding capabilities of these engineered nanobodies are lacking compared to what we expect from clinical-grade antibodies. However, <strong>this is closest to true de-novo design;</strong> the base nanobody used to scaffold the HCDRs was never explicitly meant to bind to any of the antigens, so a model being a 'freeloader' on non-HCDR changes is unlikely here. I'm hoping the Baker lab releases a follow-up paper soon; given that RFDiffusion-AA has been released, it might dramatically improve accuracy.</p><h2>Conclusion</h2><p>Immunology is complicated, and antibody engineering using ML is still very much in the early days. I originally started this post expecting that there were some real magic bullets hiding away in papers I couldn't understand, but it seems like that isn't quite the case. People are still figuring out exactly what's going on in the field, and the results show that. There are aspects of ML-assisted antibody engineering that aren't even being discussed yet, like polyreactivity and, up until the Baker lab paper just a month ago, even epitope targeting! 
And I'm sure there are more nuanced aspects of antibody development that I'm not even aware of and are crucially important to rational antibody designers, but hasn't even been touched by ML researchers.</p><p>But, as with everything in this field, things can change overnight!</p>]]></content:encoded></item><item><title><![CDATA[A primer on scRNA-seq foundation models]]></title><description><![CDATA[6.4k words, 30 minutes reading time]]></description><link>https://www.owlposting.com/p/the-lore-behind-scrna-seq-foundation-models</link><guid isPermaLink="false">https://www.owlposting.com/p/the-lore-behind-scrna-seq-foundation-models</guid><dc:creator><![CDATA[Abhishaike Mahajan]]></dc:creator><pubDate>Sun, 17 Mar 2024 00:16:38 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!0SOc!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3773cb70-57f7-4eb8-9084-f062620565df_2912x1632.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!0SOc!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3773cb70-57f7-4eb8-9084-f062620565df_2912x1632.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!0SOc!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3773cb70-57f7-4eb8-9084-f062620565df_2912x1632.png 424w, https://substackcdn.com/image/fetch/$s_!0SOc!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3773cb70-57f7-4eb8-9084-f062620565df_2912x1632.png 848w, 
https://substackcdn.com/image/fetch/$s_!0SOc!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3773cb70-57f7-4eb8-9084-f062620565df_2912x1632.png 1272w, https://substackcdn.com/image/fetch/$s_!0SOc!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3773cb70-57f7-4eb8-9084-f062620565df_2912x1632.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!0SOc!,w_2400,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3773cb70-57f7-4eb8-9084-f062620565df_2912x1632.png" width="1200" height="672.5274725274726" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3773cb70-57f7-4eb8-9084-f062620565df_2912x1632.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;large&quot;,&quot;height&quot;:816,&quot;width&quot;:1456,&quot;resizeWidth&quot;:1200,&quot;bytes&quot;:8320607,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.owlposting.com/i/143614943?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3773cb70-57f7-4eb8-9084-f062620565df_2912x1632.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:&quot;center&quot;,&quot;offset&quot;:false}" class="sizing-large" alt="" srcset="https://substackcdn.com/image/fetch/$s_!0SOc!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3773cb70-57f7-4eb8-9084-f062620565df_2912x1632.png 424w, 
https://substackcdn.com/image/fetch/$s_!0SOc!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3773cb70-57f7-4eb8-9084-f062620565df_2912x1632.png 848w, https://substackcdn.com/image/fetch/$s_!0SOc!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3773cb70-57f7-4eb8-9084-f062620565df_2912x1632.png 1272w, https://substackcdn.com/image/fetch/$s_!0SOc!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3773cb70-57f7-4eb8-9084-f062620565df_2912x1632.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><ol><li><p><a href="https://www.abhishaike.com/i/143614943/introduction">Introduction</a></p></li><li><p><a href="https://www.abhishaike.com/i/143614943/scrna-seq">scRNA-seq</a></p><ol><li><p><a href="https://www.abhishaike.com/i/143614943/what-is-scrna-seq-1">What is scRNA-seq?</a></p></li><li><p><a href="https://www.abhishaike.com/i/143614943/what-are-cell-atlases">What are Cell Atlases?</a></p></li><li><p><a href="https://www.abhishaike.com/i/143614943/the-problem-with-scrna-seq">The problem with scRNA-seq</a></p></li></ol></li><li><p><a href="https://www.abhishaike.com/i/143614943/scrna-foundation-models">scRNA Foundation Models</a></p><ol><li><p><a href="https://www.abhishaike.com/i/143614943/history">History</a></p></li><li><p><a href="https://www.abhishaike.com/i/143614943/the-problem-with-scrna-foundation-models">The Problem With scRNA Foundation Models</a></p></li><li><p><a href="https://www.abhishaike.com/i/143614943/universal-cell-embeddings">Universal Cell Embeddings</a></p></li></ol></li><li><p><a href="https://www.abhishaike.com/i/143614943/what-does-the-future-look-like">What does the future look like?</a></p></li></ol><h2>Introduction</h2><p>A few weeks ago, the New York Times published an article titled <a href="https://www.nytimes.com/2024/03/10/science/ai-learning-biology.html">A.I. Is Learning What It Means to Be Alive</a>. I'm not the biggest fan of the title, but I am happy the subject in it is being talked about more! It laid out the story of how so-called 'scRNA-seq Foundation Models' may potentially change how single-cell RNA sequencing (scRNA) data is interpreted, used, and applied. Though the Times article was fantastic in its own right, it asks very surface-level questions about the whole process. 
I'd like to do a much deeper dive into this topic and try to walk through the motivation, ideas, and process of creating these models, along with what they do well and what they still struggle with.</p><p>Warning in advance: single-cell biology is complex to the extreme and pretty much every facet of it is hotly debated. This means this post is partially an interpretation, meant to offer some wider background context on the field, and may strongly deviate from others' perspectives when it comes to specifics. </p><h2>scRNA-seq</h2><h3>What is scRNA-seq?</h3><p>Let's come up with a goal for ourselves: we'd like to learn the cellular identity of every cell in the human body. Why? Well, for knowledge's sake to start off with; we'll come up with a concrete use-case later.</p><p>First question: what exactly is cellular identity?</p><p>Maybe cell types? Let's consider an organ, like the brain. You may recall there are different cell types in the brain: neurons, astrocytes, oligodendrocytes, and so on. But what are the functional differences between these cells, and on what basis have scientists chosen their 'type'? One thing that we often rely on is the proteins they have: the proteins that live within the cells, the proteins that they put on their surface, the proteins that they secrete. Sharing many of the same proteins could serve as a notion of cellular identity. Indeed, proteins <strong>are </strong>involved in an awful lot of processes: how cells communicate with cells around them, how cells take in resources, how cells respond to environmental stimuli, and many more. One could make the argument that proteins aren't actually the true differentiator, and that cell metabolites or epigenetic modifications are potentially more important. Well, we could discuss that in a future post. 
But let's move on with the axiom 'a cell's state is best understood through the proteins it has'.</p><p>An example is syncytiotrophoblasts, which are cells found in placental tissue and are involved in nutrient transfer from maternal blood to fetal capillaries. As such, you may naturally expect that they express proteins specialized in nutrient transfer. And, indeed, they express TFRC (transferrin receptor) at an extremely high rate compared to almost all other cells in the human body. TFRC is mainly involved in iron uptake, and iron is an extremely important building block for crafting a living thing from scratch.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!QAAE!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c424fd1-0dff-4f62-bffa-d8ed519bbde0_1024x877.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!QAAE!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c424fd1-0dff-4f62-bffa-d8ed519bbde0_1024x877.png 424w, https://substackcdn.com/image/fetch/$s_!QAAE!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c424fd1-0dff-4f62-bffa-d8ed519bbde0_1024x877.png 848w, https://substackcdn.com/image/fetch/$s_!QAAE!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c424fd1-0dff-4f62-bffa-d8ed519bbde0_1024x877.png 1272w, https://substackcdn.com/image/fetch/$s_!QAAE!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c424fd1-0dff-4f62-bffa-d8ed519bbde0_1024x877.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!QAAE!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c424fd1-0dff-4f62-bffa-d8ed519bbde0_1024x877.png" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5c424fd1-0dff-4f62-bffa-d8ed519bbde0_1024x877.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:null,&quot;width&quot;:null,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!QAAE!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c424fd1-0dff-4f62-bffa-d8ed519bbde0_1024x877.png 424w, https://substackcdn.com/image/fetch/$s_!QAAE!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c424fd1-0dff-4f62-bffa-d8ed519bbde0_1024x877.png 848w, https://substackcdn.com/image/fetch/$s_!QAAE!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c424fd1-0dff-4f62-bffa-d8ed519bbde0_1024x877.png 1272w, https://substackcdn.com/image/fetch/$s_!QAAE!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c424fd1-0dff-4f62-bffa-d8ed519bbde0_1024x877.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a><figcaption class="image-caption">TFRC in the wild</figcaption></figure></div><p>I've given quite a neat example, but is it always this way? 
Unfortunately, not every protein has such a clear-cut role. Many of them are so-called 'housekeeping proteins' (or, more specifically, the products of housekeeping genes) that are essential for the ongoing maintenance of <strong>any</strong> cell, no matter what it does. The canonical examples are proteins involved in cell division, such as <a href="https://en.wikipedia.org/wiki/Cyclin_H">Cyclin-H</a>, without which your cells would be entirely incapable of reproducing. But still, a significant fraction of all proteins in the human body does have extremely unique functionality well-suited to the cells that produce them, enough that we could potentially arrive at a notion of cell identity purely from observing where certain proteins do and do not appear.</p><p>And, with that, we run into our first challenge: how exactly do we assess the protein content of any given cell? There are a few things we could try. Feel free to skip this section if it gets a bit much! Spoiler alert: none of them will work for us anyway.</p><ol><li><p><strong>Immunocytochemistry:</strong> The basic idea is that we create a special protein (an antibody) that's designed to stick to the protein we're interested in, say the TFRC from before. We also attach a little chemical tag (a fluorophore) to the antibody that lights up when a specific wavelength of light is shone on it. Now, we simply take our cells and run the antibody over them. Using the TFRC example, the antibodies will diffuse through the cell, hunting for TFRC proteins. When an antibody finds a TFRC molecule, it'll tightly bind to it. The final step is to look at the cells under a fluorescence microscope. This type of microscope shines light of a specific wavelength onto the cells, exciting the fluorophores. The fluorophores emit light in response, revealing the location of the TFRC proteins. 
If TFRC is primarily located on the cell surface, for example, we would see a bright outline around each cell.</p></li><li><p><strong>Flow Cytometry: </strong>This technique is similar to immunocytochemistry in that it uses antibodies to detect specific proteins. However, instead of looking at the cells under a microscope, flow cytometry uses a specialized instrument called a flow cytometer to analyze the cells one by one. We start by creating a single-cell suspension, meaning that the cells are no longer stuck together in a tissue but are floating freely in a liquid. We then incubate these cells with antibodies that are specific to the protein we're interested in (like TFRC). These antibodies, as before, are typically connected to a fluorophore and will bind to our suspended cells. The cell suspension is then run through the flow cytometer. Inside this machine, the cells are forced to pass through a narrow channel one at a time. As each cell passes through, a laser beam is shone onto it. If the cell has been labeled with a fluorescent antibody, the laser will cause the dye to emit light. The flow cytometer detects this light and can quantify how much of the protein is present on each cell.</p></li><li><p><strong>Mass spectrometry:</strong> Here, we essentially take a cell, break it open, and then analyze all the proteins within it to get a picture of what proteins were there to begin with. We first chop the cellular proteins up into smaller pieces using enzymes. For example, we may use 'trypsin', which cuts proteins whenever it encounters the amino acids lysine or arginine. The resulting pieces, which we call peptides, are typically around 10-20 amino acids long. We'll then load the mixture of peptides into a liquid chromatography (LC) system, which separates the peptides based on their chemical properties. As the peptides exit the LC system, they are sprayed into the mass spectrometer using a process called electrospray ionization (ESI). 
This process converts the peptides into charged particles (ions). Once inside the mass spectrometer, the peptide ions are further separated based on their mass-to-charge ratio (m/z) in a component called the mass analyzer. There are different types of mass analyzers, but they all use electric and/or magnetic fields to manipulate the path of the ions. In a time-of-flight analyzer, for example, all the ions are given the same kinetic energy by the electric field, so their velocity (and thus the time it takes them to reach the detector) depends on their mass. Lighter ions will reach the detector faster than heavier ones. As the ions reach the detector, their m/z and abundance are recorded. This data is then processed to identify which peptides were present and, by extension, which proteins those peptides originated from. By comparing the identified peptides to databases of known protein sequences, we can determine which proteins were present in the original sample and even quantify their relative abundance.</p></li></ol><p>I've already given it away; despite the power of these methods, none of them will adequately work for us in our lofty goal to learn about the protein landscape of every cell in the human body.</p><p>Immunocytochemistry is a time-consuming process, requires known antibodies that bind to known proteins (which not every protein has!), and has extremely low throughput (so measuring even a thousand cells is an undertaking). Flow cytometry, while higher throughput, is similarly dependent on known antibodies, and does not work for tissues that cannot be easily suspended, such as brain tissue or adipose tissue (both due to their inherent fragility). Finally, mass spectrometry, while not relying on known antibodies, requires an enormous number of cells to capture an accurate readout of protein content (which means we lose out on understanding single cells), may miss rarer proteins, and is by far the most time-consuming method in this list. 
Of course, I'm oversimplifying. All of these methods likely have some alternative protocol which somewhat solves these problems, but even those remain insufficient.</p><p>So, now what? Unfortunately, there isn't really a backup. Learning the proteomic landscapes of cells in a way that is scalable to the millions, allows us to discover rare proteins, and has single-cell resolution is, thus far, an unsolved problem. Let's think backwards; if we cannot have a perfect view of proteins in a cell, perhaps we can settle for measuring the things that cause the proteins to exist at all?</p><p>Of course, as the title of this post may suggest, we're talking about RNA, specifically messenger RNA (mRNA). mRNA is created (more specifically, 'transcribed') directly from a cell's DNA and is transported to the ribosome, where it is read 3 nucleotide bases (one codon) at a time. Each codon specifies a single amino acid, and the amino acids are chained together in order. For example, if an mRNA segment is composed of UUU-GUA-CCA, that is mapped to a protein of amino acids Phe-Val-Pro.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!G8Ch!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F65ed931b-4b64-47a5-9ab4-049b98cfbe0a_671x483.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!G8Ch!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F65ed931b-4b64-47a5-9ab4-049b98cfbe0a_671x483.png 424w, https://substackcdn.com/image/fetch/$s_!G8Ch!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F65ed931b-4b64-47a5-9ab4-049b98cfbe0a_671x483.png 848w, 
https://substackcdn.com/image/fetch/$s_!G8Ch!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F65ed931b-4b64-47a5-9ab4-049b98cfbe0a_671x483.png 1272w, https://substackcdn.com/image/fetch/$s_!G8Ch!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F65ed931b-4b64-47a5-9ab4-049b98cfbe0a_671x483.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!G8Ch!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F65ed931b-4b64-47a5-9ab4-049b98cfbe0a_671x483.png" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/65ed931b-4b64-47a5-9ab4-049b98cfbe0a_671x483.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:null,&quot;width&quot;:null,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!G8Ch!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F65ed931b-4b64-47a5-9ab4-049b98cfbe0a_671x483.png 424w, https://substackcdn.com/image/fetch/$s_!G8Ch!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F65ed931b-4b64-47a5-9ab4-049b98cfbe0a_671x483.png 848w, 
https://substackcdn.com/image/fetch/$s_!G8Ch!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F65ed931b-4b64-47a5-9ab4-049b98cfbe0a_671x483.png 1272w, https://substackcdn.com/image/fetch/$s_!G8Ch!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F65ed931b-4b64-47a5-9ab4-049b98cfbe0a_671x483.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p>If we measure the sum amount of all mRNA that is in this intermediary zone between transcribed and translated at a given point in time, or its 'transcriptional landscape', this indirectly also gives us the proteomic landscape, correct? Well, not exactly.</p><p>Not all mRNAs are translated into proteins at the same rate or efficiency. Some mRNAs are highly stable and can persist in the cell for hours or even days, while others are rapidly degraded. This means that some mRNAs are translated multiple times by ribosomes, yielding many copies of the corresponding protein, while others may only be translated once or not at all. As a result, the relative abundance of mRNAs in a cell doesn't always directly correspond to the relative abundance of their encoded proteins. Similarly, protein half-lives vary dramatically; proteins can persist for days or even weeks. This means that the protein composition of a cell can reflect its past transcriptional states, not just its current one. A protein that was abundant in a cell yesterday could still be present today, even if its associated mRNA has been degraded. Conversely, a newly transcribed mRNA may not yet have been translated into a detectable amount of protein. 
Finally, mRNAs do not map to proteins perfectly. Plenty of proteins undergo 'post-translational modification', which means that the protein you end up with from a given mRNA in a given environment may have a completely different structure than the protein from the same mRNA in a different environment.</p><p>Yet, for all the nuanced problems that the relationship between mRNA and protein expression has, they <strong>do</strong> have a connection. A 2016 review paper from Cell titled '<a href="https://www.cell.com/cell/pdf/S0092-8674(16)30270-7.pdf">On the Dependency of Cellular Protein Levels on mRNA Abundance</a>' has this to say about it:</p><blockquote><p><strong>At Steady State, mRNA Levels Primarily Explain Protein Levels</strong><br><em>It is challenging to rigorously define the term &#8216;&#8216;steady state&#8217;&#8217; for cells, especially if they are undergoing long-term dynamic processes such as continuous proliferation (Hsieh et al., 2012), differentiation (Kristensen et al., 2013), or other types of fate decisions (Lu et al., 2009; Gr&#252;n et al., 2014). However, large ensembles of cells as they are typically used for &#8216;&#8216;omics&#8217;&#8217; experiments can be regarded being at steady state if the average protein and/or mRNA levels remain relatively stable over time (normally above several hours). Numerous studies published during the last 15 years suggest that, under such circumstances, gene-to-gene variation of protein levels is primarily determined by their respective mRNA levels.</em></p></blockquote><p>A fair bit of the evidence comes from a relatively <a href="https://www.nature.com/articles/nature10098">well-known 2011 paper from Schwanh&#228;usser et al.</a> that studies mouse fibroblast cells, concluding that mRNA levels explain about 40% of the variance in protein expression levels. 
From the same review paper:</p><blockquote><p><em>[Schwanh&#228;usser] investigated the relationship between mRNA and protein levels in unperturbed mammalian cells using RNA-seq, pSILAC, and absolute quantification strategies. His study determined that about 40% of the variance of protein levels between different proteins could be explained by mRNA levels (coefficient of determination, i.e., squared Pearson correlation coefficient between mRNA and protein abundances, R<sup>2</sup> = 0.41). A follow-up study re-analyzing the same dataset with a different statistical model concluded that about 56%&#8211;84% of the protein variance could be explained by mRNA variance...</em></p></blockquote><p>Another complicating factor is that <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2413246/">one study finds that it's very tissue dependent, with the explained variance range being 46-68%.</a> But, overall, we're in a good spot. Given that we cannot easily access protein expression en masse, gathering together mRNA transcripts seems like it could be a decent approximation of cell identity. But...can we do that?</p><p>We can! And finally, we can start talking about the title of this subsection: scRNA-seq, or single-cell RNA sequencing. This method solves basically every problem that the protein expression ones had: it is massively scalable, capable of capturing single-cell-level resolution, and can find rare transcripts (and thus proteins).</p><p>Finally, we'll end this section with a simple question: how does it work? scRNA-seq is, in my opinion, the most conceptually complicated of the methods we've discussed so far. Again, the explanation of this method isn't actually important; it's included here for completeness and can be skipped. 
Here's the general workflow, and <a href="https://www.youtube.com/watch?v=6UVOdCc1Q7I">also an excellent YouTube video that goes over the same stuff</a>:</p><ol><li><p>Cell Isolation: The first step is to obtain a single-cell suspension from the tissue or sample of interest. This often involves enzymatic digestion or mechanical dissociation to break down the extracellular matrix and separate the cells from each other. The goal is to have a suspension where each cell is floating freely.</p></li><li><p>Cell Capture: Next, individual cells need to be isolated into separate reaction chambers for further processing. There are several methods for this, including microfluidic devices, microdroplet-based methods, or microwells. Each of these methods aims to separate single cells from one another.</p></li><li><p>Cell Lysis and RNA Capture: Once individual cells are isolated, they are lysed (their cell membranes are disrupted) to release their RNA content. The RNA is then captured, typically using oligo-dT primers that bind to the poly-A tails of mRNA molecules. These primers typically also carry a unique molecular identifier (UMI) and a cell-specific barcode, so every captured transcript ends up labeled with both. The UMI allows for the correction of amplification bias during the next step, while the cell barcode allows the RNA to be traced back to its cell of origin. <strong>This allows us to have single-cell level resolution!</strong></p></li><li><p>Reverse Transcription and Amplification: The captured RNA is then reverse transcribed into cDNA, which is more stable and can be amplified. The cDNA is amplified using PCR, generating many copies of each cDNA molecule. <strong>This is what allows rare transcripts to be detected! We're no longer limited by the base level of mRNA in a cell; it can be scaled up!</strong></p></li><li><p>Library Preparation and Sequencing: The amplified cDNA is then sequenced as normal by Next-Generation-Sequencing (NGS) platforms that work with DNA. 
<strong>This is what allows massive scalability: NGS platforms are able to operate on a scale of millions of genetic fragments at once.</strong> There is a step afterward for correcting errors in this process, such as filtering out mitochondrial transcripts that we typically don't care about, but we can skip that as it's not super relevant here.</p></li></ol><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Gbjv!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9269b152-8909-4d5f-baca-0b1f7483e3e8_1024x768.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Gbjv!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9269b152-8909-4d5f-baca-0b1f7483e3e8_1024x768.png 424w, https://substackcdn.com/image/fetch/$s_!Gbjv!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9269b152-8909-4d5f-baca-0b1f7483e3e8_1024x768.png 848w, https://substackcdn.com/image/fetch/$s_!Gbjv!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9269b152-8909-4d5f-baca-0b1f7483e3e8_1024x768.png 1272w, https://substackcdn.com/image/fetch/$s_!Gbjv!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9269b152-8909-4d5f-baca-0b1f7483e3e8_1024x768.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Gbjv!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9269b152-8909-4d5f-baca-0b1f7483e3e8_1024x768.png" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9269b152-8909-4d5f-baca-0b1f7483e3e8_1024x768.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:null,&quot;width&quot;:null,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!Gbjv!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9269b152-8909-4d5f-baca-0b1f7483e3e8_1024x768.png 424w, https://substackcdn.com/image/fetch/$s_!Gbjv!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9269b152-8909-4d5f-baca-0b1f7483e3e8_1024x768.png 848w, https://substackcdn.com/image/fetch/$s_!Gbjv!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9269b152-8909-4d5f-baca-0b1f7483e3e8_1024x768.png 1272w, https://substackcdn.com/image/fetch/$s_!Gbjv!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9269b152-8909-4d5f-baca-0b1f7483e3e8_1024x768.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p>And what does the end result of scRNA-seq look like? 
Simple, and one that's familiar to anybody who has worked with dataframes before: a two-dimensional count matrix, where each row represents a cell, each column represents a gene, and each entry represents the expression level of a particular gene in a particular cell in terms of the number of UMIs, or transcripts, found for that cell and that gene.&nbsp;So, for a single-cell analysis of 50 human cells (and there are around 20,000 protein-coding genes in the human genome), we'd have a 50x20000 matrix.</p><h3>What are Cell Atlases?</h3><p>So, we now know what scRNA-seq is. What have people done with it? Lots of things, such as analyzing transcriptional changes that occur as we age, understanding cell differentiation in embryos, and even understanding how tumor cells could be better attacked.</p><p>But among the most ambitious tasks that scRNA-seq supports is the creation of 'Cell Atlases'. These are terabyte-sized databases, run by international consortia, with hundreds of institutions and thousands of scientists participating. Contained within them are the transcriptional landscapes of hundreds of thousands of individual cells, selected from an extraordinarily diverse set of tissues (tongue, cerebellum, ovaries, and dozens more), cataloged for public usage. As of this writing, there are comprehensive cell atlases for <a href="https://tabula-sapiens-portal.ds.czbiohub.org/">humans</a>, <a href="https://tabula-muris.ds.czbiohub.org/">mice</a>, <a href="https://www.science.org/doi/10.1126/science.aam8940">nematodes</a>, <a href="https://www.nature.com/articles/s41586-022-04587-3">primates</a>, <a href="https://www.frontiersin.org/articles/10.3389/fcell.2021.743421/full">zebrafish</a>, <a href="https://www.science.org/doi/10.1126/science.abk2432">fruit flies</a>, and many more; many of them overlapping in scope, and some of them even mapping fetal or diseased versions of the associated species. 
Instead of one-off scRNA-seq experiments created for the purposes of single experiments, this was a more concentrated effort, multiple parties divvying up a 'transcriptome map' amongst each other, exploring each one, and combining it all back together.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!WB2d!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F95358b19-a6c5-4c7b-bfc0-3773942076b4_763x1024.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!WB2d!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F95358b19-a6c5-4c7b-bfc0-3773942076b4_763x1024.jpeg 424w, https://substackcdn.com/image/fetch/$s_!WB2d!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F95358b19-a6c5-4c7b-bfc0-3773942076b4_763x1024.jpeg 848w, https://substackcdn.com/image/fetch/$s_!WB2d!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F95358b19-a6c5-4c7b-bfc0-3773942076b4_763x1024.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!WB2d!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F95358b19-a6c5-4c7b-bfc0-3773942076b4_763x1024.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!WB2d!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F95358b19-a6c5-4c7b-bfc0-3773942076b4_763x1024.jpeg" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/95358b19-a6c5-4c7b-bfc0-3773942076b4_763x1024.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:null,&quot;width&quot;:null,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!WB2d!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F95358b19-a6c5-4c7b-bfc0-3773942076b4_763x1024.jpeg 424w, https://substackcdn.com/image/fetch/$s_!WB2d!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F95358b19-a6c5-4c7b-bfc0-3773942076b4_763x1024.jpeg 848w, https://substackcdn.com/image/fetch/$s_!WB2d!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F95358b19-a6c5-4c7b-bfc0-3773942076b4_763x1024.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!WB2d!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F95358b19-a6c5-4c7b-bfc0-3773942076b4_763x1024.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p>Coincidentally, the human cell atlases exactly solve the goal we stated at the start of this blogpost: learning the cellular identity of every cell in the human body. Well, we haven't got at &#8216;every&#8217; cell, but we&#8217;ve definitely completed the 80 part of the 80/20 rule and then some. 
But, while we've pursued our goal purely for knowledge's sake, these cell atlases have much loftier visions. The core idea behind them, and the reason they were publicly released, is the hope that scientists around the globe could use them to find better protein targets for tissues of interest, understand how diseased cell states differ from healthy ones, predict medication response, and uncover entirely new proteins, genes, and cell types. These atlases are, in many ways, this generation's Human Genome Project: an extraordinarily expensive and challenging biological data collection mission, with grand dreams for how it could be used to advance human health in the future.</p><h3>The problem with scRNA-seq</h3><p>Have the expectations for these atlases paid off? Let's ask an even more basic question: has scRNA-seq paid off? It's...hard to tell.</p><p>On one hand, the <a href="https://scholar.google.com/scholar?as_ylo=2020&amp;hl=en&amp;as_sdt=5,33&amp;sciodt=0,33&amp;cites=13609435083214941495&amp;scipsc=">citations of Tabula Sapiens</a>, the largest known human cell atlas, certainly do champion the utility of the atlas in helping further biology research. Multiple papers reference it to confirm suspicions they had, or as a secondary source of truth for their own scRNA experiments. And, to be fair, it was only released in 2022, and good biology research does take time.</p><p>On the other hand...has it been instrumental to anything? Has there been anything useful we've found purely because of these atlases? I'm absolutely positive there are people who'd define the results we've gotten from scRNA studies as useful, but I'm personally skeptical. <br><br>In a mildly interesting Reddit thread about the whole topic, the OP says:</p><blockquote><p>Every day, there is another bloody paper on single cell this, single cell that. All big papers, all big results, all big data, and all came up with the same conclusion: there's a lot of heterogeneity between cells, that cells are different, that at different locations, cells express different genes, that at different stages, cells express different genes, that a clump of extracted tissue contain different cell types and not a homogenous group "like we thought". Like no shit!! And then they dump all of those data in some repository and move on sequencing something else.</p></blockquote><p>While their frustration is directed more at scRNA-seq, I'd imagine it applies even more so to cell atlases: an immense amount of data, but little follow-up on what exactly any of it means. Others in the thread point out that this is somewhat uncharitable, since this is simply how science works, but still admit a kernel of truth in the OP's frustration. Generating vast amounts of single-cell RNA data is quickly becoming easy; interrogating it in any capacity is the hard part. But why is it hard? There are really only two reasons.</p><ol><li><p><strong>Batch integration</strong>. Combining data from different scRNA-seq datasets is often challenging due to batch effects; as in, each scRNA-seq dataset is subject to its own unique strangeness due to minor differences in how labs go about things, which can create dramatic differences in the numbers you end up with. For example, let's say we collect a dataset of neurons from patients with a rare neurological disease, and would like to know the transcriptional changes these neurons have compared to healthy neurons. 
<strong>But the numerical 'space' in which the healthy transcriptomes live will, unfortunately, always be different from that of the diseased neurons</strong>. There's a bevy of batch integration models for getting around this, but picking and tuning one is much more an art than a science. While cell atlases were partly meant to solve this problem by offering an all-in-one dataset, all subject to the same set of batch effects, realistically the problem still pops up, as it takes months to produce this data and multiple institutions are involved, each with their own processes.</p></li><li><p><strong>Cell annotation</strong>. Typical cell atlases annotate cells based on whether they express certain 'marker genes' (e.g., neuron marker genes). Cells that lack any markers, or that have multiple markers, are annotated according to their transcriptomic 'proximity' to cells with more cleanly defined types. This is related to the data integration problem! Even if you can remove batch effects from your dataset in order to compare it to the atlas,<strong> the basis upon which you compare cells is often their cell type</strong>! And cell type annotation may sometimes fail for even well-known cell types, and be even more unreliable for rare or entirely unknown cell types.</p></li></ol><p>If you completely solve both of these, scRNA-seq becomes far more useful. Keep in mind, the first-order impact of solving these two issues is just speeding up typical analyses: instead of relying on a custom-tuned batch effect correction algorithm, you simply have a method that 'corrects' <strong>everything</strong> and you can go on with your analyses as normal. We may even discover some new rare cell types that would typically go unnoticed by the usual algorithms.</p><p>This is all well and good, but it feels marginal, a mild improvement to a field that hasn't had much impact on human health. 
But the second-order impact is much more exciting: the potential for a computational model of how perturbations affect cell state. If we have a model that has such a generalized understanding of transcriptomes that it is capable of translating them all to the same numerical space, we can naturally assume it understands <strong>much</strong> more about a cell than just how to do this simple mapping. It understands how excessive sugar changes cells using what it's seen from T2 diabetes patients. It understands how mechanical stress changes cells using what it's seen from hypertrophic cardiomyopathy patients. <strong>In short, it has learned a sense of how perturbations affect cell state</strong>. Such a model could move beyond the datasets it was trained on. It could accurately predict how certain drugs will alter cell states before they've ever left a lab, understand the possible evolutionary trajectories of a tumor, or hallucinate whole transcriptomes for rare disease patients that we obviously cannot biopsy in full.</p><p>While the cell atlases themselves haven't been particularly useful, they may end up being useful in a completely unexpected way: as fuel for an engine.</p><h2>scRNA Foundation Models</h2><h3>History</h3><p>Foundation models are, in a general sense, <a href="https://arxiv.org/abs/2108.07258">'any model that is trained on broad data (generally using self-supervision at scale) that can be adapted (e.g., fine-tuned) to a wide range of downstream tasks'</a>. They've become most popular in the space of language, but have extended to other modalities, including pixels (DALL-E), audio waveforms (MusicGen), and even amino-acid residues of proteins (ESM2). And soon, they began to be applied to single-cell transcriptomes.</p><p>We should first motivate why they were created at all. 
As mentioned before, the utility of scRNA is massively dragged down by two things: scRNA datasets cannot be easily integrated together, <strong>and </strong>the analyses behind these datasets often rely on fuzzy labels (cell annotations). scRNA foundation models potentially solve both of those in one fell swoop by attempting to map gene count vectors to the <strong>same embedding space in a zero-shot manner</strong>. If we can pull this off, batch effects (mostly) cease to be a problem and we can entirely move beyond cell type annotations as the basis of cell-to-cell comparisons: if the embeddings represent individual cells well enough, those can be relied upon instead!</p><p>It's a simple idea and, as with almost every other concept, it has of course been extensively tried before, with the concept reaching back to the early 2010s; foundation models certainly did not invent the concept of embedding. But earlier attempts were often limited in scope, relying on simple linear embedding schemes, being specific to the exact perturbation they were trained on, or simply being trained on a relatively small dataset. A very nice blogpost from <a href="https://markovbio.github.io/scaling-laws-of-agency/">markov.bio about scRNA models discusses these shortcomings more in depth</a>. There were a few papers that poked at non-linear data transformations trained on diverse datasets, <a href="https://www.nature.com/articles/s41592-018-0229-2">such as scVI</a>, but the results were still in the hazy territory of being interesting as a paper while on shaky ground for being truly useful.</p><p>But with the <em>Attention Is All You Need</em> revolution and the increasingly fervent belief that huge models and huge amounts of data could lead to extraordinary models, biologists started to set up their own single-cell foundation models. 
As of March 2024, there are quite a few: <a href="https://www.biorxiv.org/content/10.1101/2022.11.20.517285v1">scFormer</a>, <a href="https://www.biorxiv.org/content/10.1101/2023.05.29.542705v1">scFoundation</a>, <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10949956/">GeneFormer</a>, <a href="https://www.nature.com/articles/s42256-022-00534-z">scBERT</a>, <a href="https://www.nature.com/articles/s41592-024-02201-0">scGPT</a>, <a href="https://www.biorxiv.org/content/10.1101/2023.11.28.568918v1">Universal Cell Embeddings</a> (UCE), and several others. <strong>Remember GeneFormer, scBERT, scGPT, and UCE; they'll come up later. </strong>And the fuel behind how they could be created at all is something we've talked about: cell atlases. Each one of these models relies on several such atlases as training data, drawing on the immense diversity of cells that have been catalogued and studied over the last decade in order to, hopefully, glean a finer understanding of cell state than any other model ever could.</p><p>How are they? Has the effort we've put into creating these immense cell atlases paid off?</p><h3>The Problem With scRNA Foundation Models</h3><p>Of course, everything in biology that people have poured blood, sweat, and tears into has some sort of problem. There are two preprints assessing the current state of scRNA foundation models, both of them, funnily enough, released within 5 days of each other in October 2023.</p><p>The first one is <a href="https://www.biorxiv.org/content/10.1101/2023.10.16.561085v1">Assessing the limits of zero-shot foundation models in single-cell biology.</a> It specifically focuses on the 'zero-shot' performance claims of scGPT and GeneFormer. Zero-shot performance is truly what we care about here: if we have to fine-tune a model for every one of our scRNA problems, the primary point of using a foundation model in the first place is lost! 
And, unfortunately, they find that while both scGPT and GeneFormer perform decently in zero-shot settings, they are still no better than baseline methods for embedding separation<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-1" href="#footnote-1" target="_self">1</a>, batch integration and cell annotation<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-2" href="#footnote-2" target="_self">2</a> tasks. The footnotes share some concerns I have with the metrics, but in all of these, the performance of the foundation models was reasonably similar to that of the aforementioned scVI. Much more concerningly, simply selecting Highly Variable Genes (HVGs), which are calculated in an entirely non-parametric way, was reasonably competitive with the computationally expensive transformers (Fig 2 and 5). Does this imply that the heavy pretraining that goes into these foundation models is largely useless?</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!VWEo!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f961648-fcb9-4460-800f-ca9993a8a722_994x421.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!VWEo!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f961648-fcb9-4460-800f-ca9993a8a722_994x421.png 424w, https://substackcdn.com/image/fetch/$s_!VWEo!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f961648-fcb9-4460-800f-ca9993a8a722_994x421.png 848w, 
https://substackcdn.com/image/fetch/$s_!VWEo!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f961648-fcb9-4460-800f-ca9993a8a722_994x421.png 1272w, https://substackcdn.com/image/fetch/$s_!VWEo!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f961648-fcb9-4460-800f-ca9993a8a722_994x421.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!VWEo!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f961648-fcb9-4460-800f-ca9993a8a722_994x421.png" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0f961648-fcb9-4460-800f-ca9993a8a722_994x421.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:null,&quot;width&quot;:null,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!VWEo!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f961648-fcb9-4460-800f-ca9993a8a722_994x421.png 424w, https://substackcdn.com/image/fetch/$s_!VWEo!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f961648-fcb9-4460-800f-ca9993a8a722_994x421.png 848w, 
https://substackcdn.com/image/fetch/$s_!VWEo!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f961648-fcb9-4460-800f-ca9993a8a722_994x421.png 1272w, https://substackcdn.com/image/fetch/$s_!VWEo!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f961648-fcb9-4460-800f-ca9993a8a722_994x421.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!QNso!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F320dfddb-2a07-474c-a39c-364446e3af10_298x411.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!QNso!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F320dfddb-2a07-474c-a39c-364446e3af10_298x411.png 424w, https://substackcdn.com/image/fetch/$s_!QNso!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F320dfddb-2a07-474c-a39c-364446e3af10_298x411.png 848w, https://substackcdn.com/image/fetch/$s_!QNso!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F320dfddb-2a07-474c-a39c-364446e3af10_298x411.png 1272w, https://substackcdn.com/image/fetch/$s_!QNso!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F320dfddb-2a07-474c-a39c-364446e3af10_298x411.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!QNso!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F320dfddb-2a07-474c-a39c-364446e3af10_298x411.png" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/320dfddb-2a07-474c-a39c-364446e3af10_298x411.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:null,&quot;width&quot;:null,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!QNso!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F320dfddb-2a07-474c-a39c-364446e3af10_298x411.png 424w, https://substackcdn.com/image/fetch/$s_!QNso!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F320dfddb-2a07-474c-a39c-364446e3af10_298x411.png 848w, https://substackcdn.com/image/fetch/$s_!QNso!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F320dfddb-2a07-474c-a39c-364446e3af10_298x411.png 1272w, https://substackcdn.com/image/fetch/$s_!QNso!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F320dfddb-2a07-474c-a39c-364446e3af10_298x411.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p>In one of the most unexpected results I've seen in a paper, the answer is yes, at least for scGPT. 
The proof for this is a little involved: the paper compares scGPT's MSE on the masked-language-modeling objective it was pre-trained with against the MSE of a baseline that simply predicts the mean expression value of each gene. Astonishingly, the two MSEs are largely the same across all pre-trained variants of scGPT (scGPT random is not pre-trained, so its MSE is obviously quite high). The conclusion here is that pre-training imparts little more than the ability to parrot the average expression of each transcript.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!TP1H!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe6734354-ff61-4639-9cb0-38b5d0694d07_987x391.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!TP1H!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe6734354-ff61-4639-9cb0-38b5d0694d07_987x391.png 424w, https://substackcdn.com/image/fetch/$s_!TP1H!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe6734354-ff61-4639-9cb0-38b5d0694d07_987x391.png 848w, https://substackcdn.com/image/fetch/$s_!TP1H!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe6734354-ff61-4639-9cb0-38b5d0694d07_987x391.png 1272w, https://substackcdn.com/image/fetch/$s_!TP1H!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe6734354-ff61-4639-9cb0-38b5d0694d07_987x391.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!TP1H!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe6734354-ff61-4639-9cb0-38b5d0694d07_987x391.png" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e6734354-ff61-4639-9cb0-38b5d0694d07_987x391.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:null,&quot;width&quot;:null,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!TP1H!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe6734354-ff61-4639-9cb0-38b5d0694d07_987x391.png 424w, https://substackcdn.com/image/fetch/$s_!TP1H!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe6734354-ff61-4639-9cb0-38b5d0694d07_987x391.png 848w, https://substackcdn.com/image/fetch/$s_!TP1H!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe6734354-ff61-4639-9cb0-38b5d0694d07_987x391.png 1272w, https://substackcdn.com/image/fetch/$s_!TP1H!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe6734354-ff61-4639-9cb0-38b5d0694d07_987x391.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p>So, things are not looking great. 
Five days later, another paper titled <a href="https://www.biorxiv.org/content/10.1101/2023.10.19.563100v1">A Deep Dive into Single-Cell RNA Sequencing Foundation Models</a> was released. It again looked at two models: the same scGPT as before, but also scBERT. The results of the previous paper are largely recapitulated here, in different flavors. In one task, cell annotation, the comparison between zero-shot scGPT mappings and a trained logistic regression model was mixed:</p><blockquote><p>The authors of scGPT evaluated the model&#8217;s cell type annotation capabilities on three different datasets: multiple sclerosis [19], pancreas [20], and myeloid data [21...We found that for the multiple sclerosis data, scGPT outperformed logistic regression; for the myeloid data, logistic regression outperformed scGPT; and for the pancreas data, the two methods performed similarly (Figure 3; Supplementary Table 4).</p></blockquote><p>And, even in few-shot regimes, where scBERT and scGPT were allowed to see a fraction of the data, logistic regression was still competitive with them or outright outperformed them.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!tvSK!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd5510f16-4bba-42ea-93de-f24dd383dda0_1022x512.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!tvSK!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd5510f16-4bba-42ea-93de-f24dd383dda0_1022x512.png 424w, https://substackcdn.com/image/fetch/$s_!tvSK!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd5510f16-4bba-42ea-93de-f24dd383dda0_1022x512.png 848w, 
https://substackcdn.com/image/fetch/$s_!tvSK!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd5510f16-4bba-42ea-93de-f24dd383dda0_1022x512.png 1272w, https://substackcdn.com/image/fetch/$s_!tvSK!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd5510f16-4bba-42ea-93de-f24dd383dda0_1022x512.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!tvSK!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd5510f16-4bba-42ea-93de-f24dd383dda0_1022x512.png" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d5510f16-4bba-42ea-93de-f24dd383dda0_1022x512.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:null,&quot;width&quot;:null,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!tvSK!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd5510f16-4bba-42ea-93de-f24dd383dda0_1022x512.png 424w, https://substackcdn.com/image/fetch/$s_!tvSK!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd5510f16-4bba-42ea-93de-f24dd383dda0_1022x512.png 848w, 
https://substackcdn.com/image/fetch/$s_!tvSK!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd5510f16-4bba-42ea-93de-f24dd383dda0_1022x512.png 1272w, https://substackcdn.com/image/fetch/$s_!tvSK!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd5510f16-4bba-42ea-93de-f24dd383dda0_1022x512.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!QdXT!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2fbd525a-0195-4587-acb7-0b18c7577ad1_1024x782.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!QdXT!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2fbd525a-0195-4587-acb7-0b18c7577ad1_1024x782.png 424w, https://substackcdn.com/image/fetch/$s_!QdXT!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2fbd525a-0195-4587-acb7-0b18c7577ad1_1024x782.png 848w, https://substackcdn.com/image/fetch/$s_!QdXT!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2fbd525a-0195-4587-acb7-0b18c7577ad1_1024x782.png 1272w, https://substackcdn.com/image/fetch/$s_!QdXT!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2fbd525a-0195-4587-acb7-0b18c7577ad1_1024x782.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!QdXT!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2fbd525a-0195-4587-acb7-0b18c7577ad1_1024x782.png" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2fbd525a-0195-4587-acb7-0b18c7577ad1_1024x782.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:null,&quot;width&quot;:null,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!QdXT!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2fbd525a-0195-4587-acb7-0b18c7577ad1_1024x782.png 424w, https://substackcdn.com/image/fetch/$s_!QdXT!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2fbd525a-0195-4587-acb7-0b18c7577ad1_1024x782.png 848w, https://substackcdn.com/image/fetch/$s_!QdXT!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2fbd525a-0195-4587-acb7-0b18c7577ad1_1024x782.png 1272w, https://substackcdn.com/image/fetch/$s_!QdXT!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2fbd525a-0195-4587-acb7-0b18c7577ad1_1024x782.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p>Finally, they also perform a few experiments suggesting that pre-training in scBERT and scGPT is of questionable importance for 
cell annotation tasks, with it being largely unimportant in scBERT, and of variable importance in scGPT. This is massively distinct from NLP, where strong pre-training performance correlates quite well with downstream task performance. However, unlike the prior paper, they offer some hope that the value of scRNA foundation models is not in solving the typical challenges posed by the scRNA community (such as cell annotation), but in more complex ones. Here's a passage from their discussion section which I quite liked:</p><blockquote><p>These results underscore that cell type annotation is not a challenging enough task on which to demonstrate the potential value of single-cell foundation models. The value of foundation models for single-cell RNA sequencing data will instead be showcased through superior performance at more challenging fine-tuning tasks which simple models currently cannot solve, or else through the models&#8217; abilities to capture and elucidate meaningful gene-gene interactions through their learned attention weights. When claiming the latter, it is important to keep in mind that strong downstream performance does not necessarily imply rich representation learning...</p></blockquote><p>I asked you earlier to remember four model names, but I've only covered three so far: scGPT, scBERT, and Geneformer. What about the fourth?</p><h3>Universal Cell Embeddings</h3><p>I'm making a specific section for this paper because I think it'll become a really, really important paper for people to build on.</p><p>Universal Cell Embeddings, or UCE, is the most recent of these methods, released as a pre-print in November 2023, whereas the rest were originally released in 2022 or mid-2023. There are also a few things unique about this model worth discussing:</p><ol><li><p><strong>It is cross-species. 
</strong>Whereas all of the other discussed models are human-specific, this one includes datasets from 8 species in total in its training data: human, mouse, lemur, zebrafish, pig, rhesus macaque, crab-eating macaque, and western clawed frog.</p></li><li><p><strong>It makes clear that it is not meant to ever be fine-tuned.</strong> It goes so far as to say <em>'UCE is able to map any cell, from any tissue, or any species, into one shared universal space, with no additional training'</em>. While no foundation model we've discussed thus far has explicitly relied on fine-tuning, this is the only one that proudly eschews the need for it.</p></li><li><p><strong>It was sponsored by the 800-pound gorilla of cell atlas dataset creators: Chan Zuckerberg BioHub</strong>, of <a href="https://tabula-sapiens-portal.ds.czbiohub.org/">Tabula Sapiens</a>, <a href="https://tabula-muris.ds.czbiohub.org/">Tabula Muris</a>, <a href="https://github.com/czbiohub-sf/tabula-muris-senis">Tabula Muris Senis</a>, <a href="https://tabula-microcebus.ds.czbiohub.org/">Tabula Microcebus</a>, and <a href="https://cellxgene.cziscience.com/">CellXGene</a> fame. If anybody is good at creating scRNA data and open-sourcing it, it's them.</p></li></ol><p>How's UCE's performance? 
Impressive!</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Uin0!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6e6b18f6-5442-4f1a-a5d6-dcf751effd3c_942x342.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Uin0!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6e6b18f6-5442-4f1a-a5d6-dcf751effd3c_942x342.png 424w, https://substackcdn.com/image/fetch/$s_!Uin0!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6e6b18f6-5442-4f1a-a5d6-dcf751effd3c_942x342.png 848w, https://substackcdn.com/image/fetch/$s_!Uin0!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6e6b18f6-5442-4f1a-a5d6-dcf751effd3c_942x342.png 1272w, https://substackcdn.com/image/fetch/$s_!Uin0!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6e6b18f6-5442-4f1a-a5d6-dcf751effd3c_942x342.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Uin0!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6e6b18f6-5442-4f1a-a5d6-dcf751effd3c_942x342.png" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/6e6b18f6-5442-4f1a-a5d6-dcf751effd3c_942x342.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:null,&quot;width&quot;:null,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!Uin0!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6e6b18f6-5442-4f1a-a5d6-dcf751effd3c_942x342.png 424w, https://substackcdn.com/image/fetch/$s_!Uin0!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6e6b18f6-5442-4f1a-a5d6-dcf751effd3c_942x342.png 848w, https://substackcdn.com/image/fetch/$s_!Uin0!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6e6b18f6-5442-4f1a-a5d6-dcf751effd3c_942x342.png 1272w, https://substackcdn.com/image/fetch/$s_!Uin0!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6e6b18f6-5442-4f1a-a5d6-dcf751effd3c_942x342.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p>It outperforms all the other methods to some degree in typical scRNA analyses tasks, including ones that require fine-tuning:</p><blockquote><p>We compared several methods and found that UCE substantially outperforms the next best method Geneformer by 9.0% on overall score, 10.6% on biological conservation score, and 7.4% on batch correction score (Supplementary 
Table 1). To comprehensively assess the value of these zero-shot embeddings, we also compare UCE to fine-tuned methods that are conventionally used for this task. Notably, UCE even performs slightly better than non-zero shot methods that require dataset-specific training: scVI and scArches.</p></blockquote><p>But the most interesting part about UCE is how it leverages the unique zero-shot generalization capabilities of foundation models to work on things that other models massively struggle with, rather than focusing on tasks that most models are already quite good at. It presents three such tasks:</p><p><strong>The first task is UMAP'ing embeddings of cells collected from an entirely new species</strong> <strong>unseen by UCE! </strong>Because UCE is cross-species, this is actually possible! Of course, embedding plots are always going to be a bit controversial, but it's undeniable that there's something going on here; they show reasonable-looking separation of cell types for a primate (suspect because there are other primates in the training dataset) and a chicken (less suspect because there are no other birds in the training dataset). 
<a href="https://www.nature.com/articles/s41592-024-02191-z">They offer further proof of the validity of their blind cross species annotations here</a>, by the same lead author, released in February 2024.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Phr1!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa570b5d4-531a-4288-96b1-db305a49686f_499x354.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Phr1!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa570b5d4-531a-4288-96b1-db305a49686f_499x354.png 424w, https://substackcdn.com/image/fetch/$s_!Phr1!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa570b5d4-531a-4288-96b1-db305a49686f_499x354.png 848w, https://substackcdn.com/image/fetch/$s_!Phr1!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa570b5d4-531a-4288-96b1-db305a49686f_499x354.png 1272w, https://substackcdn.com/image/fetch/$s_!Phr1!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa570b5d4-531a-4288-96b1-db305a49686f_499x354.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Phr1!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa570b5d4-531a-4288-96b1-db305a49686f_499x354.png" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a570b5d4-531a-4288-96b1-db305a49686f_499x354.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:null,&quot;width&quot;:null,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!Phr1!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa570b5d4-531a-4288-96b1-db305a49686f_499x354.png 424w, https://substackcdn.com/image/fetch/$s_!Phr1!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa570b5d4-531a-4288-96b1-db305a49686f_499x354.png 848w, https://substackcdn.com/image/fetch/$s_!Phr1!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa570b5d4-531a-4288-96b1-db305a49686f_499x354.png 1272w, https://substackcdn.com/image/fetch/$s_!Phr1!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa570b5d4-531a-4288-96b1-db305a49686f_499x354.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p><strong>The second task is attempting to annotate rare cell types</strong>. 
They set up an example workflow where they take an scRNA dataset of a newly discovered cell type (called Norn cells), do a zero-shot embedding of it through UCE, and train a simple binary classifier to distinguish Norn cell embeddings from non-Norn cell embeddings, using the original Norn cell paper's dataset for the latter. Then, we simply apply this classifier to our dataset of 36M historically collected cells included in UCE, which is easy because each of the 36M cells already has a UCE embedding. And voilà, we can find Norn cells scattered throughout our old dataset.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!2Xym!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F980cd64d-243e-4256-a2a5-e1d7b6eaef25_544x322.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!2Xym!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F980cd64d-243e-4256-a2a5-e1d7b6eaef25_544x322.png 424w, https://substackcdn.com/image/fetch/$s_!2Xym!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F980cd64d-243e-4256-a2a5-e1d7b6eaef25_544x322.png 848w, https://substackcdn.com/image/fetch/$s_!2Xym!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F980cd64d-243e-4256-a2a5-e1d7b6eaef25_544x322.png 1272w, https://substackcdn.com/image/fetch/$s_!2Xym!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F980cd64d-243e-4256-a2a5-e1d7b6eaef25_544x322.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!2Xym!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F980cd64d-243e-4256-a2a5-e1d7b6eaef25_544x322.png" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/980cd64d-243e-4256-a2a5-e1d7b6eaef25_544x322.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:null,&quot;width&quot;:null,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!2Xym!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F980cd64d-243e-4256-a2a5-e1d7b6eaef25_544x322.png 424w, https://substackcdn.com/image/fetch/$s_!2Xym!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F980cd64d-243e-4256-a2a5-e1d7b6eaef25_544x322.png 848w, https://substackcdn.com/image/fetch/$s_!2Xym!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F980cd64d-243e-4256-a2a5-e1d7b6eaef25_544x322.png 1272w, https://substackcdn.com/image/fetch/$s_!2Xym!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F980cd64d-243e-4256-a2a5-e1d7b6eaef25_544x322.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p><strong>The third and final interesting task is interrogating gene expression differences in disease populations (COPD and IPF) versus 
healthy ones.</strong> What's fascinating here is that this analysis was done using the Norn-annotated cells from earlier, which are unconfirmed, but show the expected differential gene expressions between diseases!</p><blockquote><p>(d) Cells predicted to be Norn cells within a lung disease dataset express known Norn markers, as demonstrated by log fold change (LFC). Differential gene expression in predicted Norn cells, grouped by disease status. There are significant differences in gene expression of important Norn markers and genes involved in the production of erythropoietin (Epo) between cells from IPF, COPD and control patients. Patients with IPF and COPD are known to have elevated levels of blood stream Epo, with COPD patients having greater bloodstream Epo levels than patients with IPF.</p></blockquote><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!k2EI!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7036bee9-f1f2-4ac9-af99-0459b93dd1af_283x505.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!k2EI!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7036bee9-f1f2-4ac9-af99-0459b93dd1af_283x505.png 424w, https://substackcdn.com/image/fetch/$s_!k2EI!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7036bee9-f1f2-4ac9-af99-0459b93dd1af_283x505.png 848w, https://substackcdn.com/image/fetch/$s_!k2EI!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7036bee9-f1f2-4ac9-af99-0459b93dd1af_283x505.png 1272w, 
https://substackcdn.com/image/fetch/$s_!k2EI!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7036bee9-f1f2-4ac9-af99-0459b93dd1af_283x505.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!k2EI!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7036bee9-f1f2-4ac9-af99-0459b93dd1af_283x505.png" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7036bee9-f1f2-4ac9-af99-0459b93dd1af_283x505.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:null,&quot;width&quot;:null,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!k2EI!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7036bee9-f1f2-4ac9-af99-0459b93dd1af_283x505.png 424w, https://substackcdn.com/image/fetch/$s_!k2EI!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7036bee9-f1f2-4ac9-af99-0459b93dd1af_283x505.png 848w, https://substackcdn.com/image/fetch/$s_!k2EI!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7036bee9-f1f2-4ac9-af99-0459b93dd1af_283x505.png 1272w, 
https://substackcdn.com/image/fetch/$s_!k2EI!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7036bee9-f1f2-4ac9-af99-0459b93dd1af_283x505.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p>Can all of these tasks be done without UCE? Of course, and there are plenty of papers that can poke at each of these tasks. But with this ease, at this scale, and all in one model? At that point, UCE is flexing the strengths that foundation models are uniquely suited to. Will UCE truly be the winner here? The AlphaFold-esque model that still reigns after 3 years? I very much doubt it. It may well be the case that scBERT, scGPT, or one of the other earlier models is actually the best architecture choice, but UCE is by far the most ambitious model here and should be given some kudos for thinking beyond typical scRNA workflows.</p><h2>What does the future look like?</h2><p>Next-generation sequencing made it possible for the scale of scRNA-seq data to explode. But, as I've mentioned earlier, measuring mRNA transcript levels is a few steps removed from the puzzle of proteins, and even proteins are removed from the more ambiguous question of cell state. As we get better and better at creating these ultra-large datasets, it's easy to imagine a world in which these foundation models no longer operate on RNA alone, but on every proxy of cell state simultaneously: metabolomics, glycomics, proteomics, epigenomics, and, likely sooner rather than later, spatial transcriptomics. 
These sorts of models are popping up all over the place, like this <a href="https://www.biorxiv.org/content/10.1101/2023.09.24.559168v1">recent paper on a foundation model that uses chromatin accessibility data gathered by ATAC-seq to learn transcription regions</a>. Even <a href="https://www.science.org/doi/10.1126/science.ade2574">UCE relies on the famous ESM2 protein language model paper for some of its pre-training</a>, and it's only a matter of time before these are all slowly merged together to take advantage of every possible token.</p><p>The future will be very interesting!</p><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-1" href="#footnote-anchor-1" class="footnote-number" contenteditable="false" target="_self">1</a><div class="footnote-content"><p>This is a strange metric in my opinion. The paper says, &#8216;<em>One key aspect of evaluating cell embeddings is the degree to which cell types are distinct within the embedding space.</em>&#8217; This implies that you can actually trust cell types, or that some notion of fuzziness in cell types doesn&#8217;t exist! Embedding utility is of course important, but why not use perturbation datasets for this?</p><p></p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-2" href="#footnote-anchor-2" class="footnote-number" contenteditable="false" target="_self">2</a><div class="footnote-content"><p>The way they measured this is extremely qualitative and&#8230;hard to know if it means anything. It&#8217;s literally just eyeballing: <em>&#8216;As commonly done in single-cell transcriptomics, we used UMAP projections to visually inspect embeddings (Fig. 4). By annotating the UMAP by cell type (Fig. 4A) versus experimental technique (Fig. 
4B), we jointly assess if cell embeddings correct for batch effects stemming from techniques while still retaining cell type identity&#8230;..Overall, we observed that while Geneformer and scGPT-human can integrate different experiments conducted with the same experimental technique, they generally fail to correct for batch effects between techniques. As depicted in Fig. 4A, the cell embedding space generated by Geneformer fails to retain information about cell type, and any clustering is primarily driven by batch effects (Fig. 4B). On the other hand, the space created by scGPT offers some separation of cell types (Fig. 4A), but the primary structure in the dimensionality reduction is driven by batch effects (Fig. 4B). In contrast, even the simple baseline of selecting highly variable genes (HVG) qualitatively produces a similar or better result to scGPT, with the Smarter technique now being integrated with InDrop. Finally, we observed that scVI mostly integrates this dataset, forming clusters primarily due to cell type, with most techniques in the same cluster.&#8217;</em> </p></div></div>]]></content:encoded></item></channel></rss>