It may seem like a long shot now, but the way I see lab automation driving value is by screening at a scale that allows models to understand bio better. Data generated at scale from automated experiments can be used to build better models that function as 'predictive assays' themselves, allowing us to make better in-silico predictions about which drugs will actually work in practice. If both generative drug design/discovery and lab automation succeed, then maybe one day we’ll have precision medicine tailored to individual patients at scale.
Usually I glance through the conclusion section of articles because it's just a repetition of what was said. But there were so many gems in this one 💎. Correlation of assay to clinical result is king.
Another viewpoint is that automation for increasing throughput is a game of getting to the highest datapoint/$ (at a certain quality). We can either 1) use robots to pipette faster and for longer, or 2) multiplex the assays themselves to have way more datapoints per well. They are complementary, but I feel that 2) is getting less attention these days.
Thanks for your research, Abhi. An exciting project that belongs in your first camp is PyLabRobot (PLR), https://github.com/PyLabRobot.
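To make the datapoint/$ framing concrete, here is a minimal sketch of the arithmetic. Every number in it (well counts, readouts per well, cost per well) is a made-up illustration and not a figure from the article or from any vendor, and the `Campaign` class is a hypothetical helper. It shows how the two levers enter the same ratio, not which lever wins in practice.

```python
from dataclasses import dataclass


@dataclass
class Campaign:
    """One screening run. All values used below are hypothetical."""
    wells: int               # wells processed in the run
    readouts_per_well: int   # datapoints measured in each well
    cost_per_well: float     # assumed all-in cost per well, in dollars

    def datapoints(self) -> int:
        return self.wells * self.readouts_per_well

    def cost(self) -> float:
        return self.wells * self.cost_per_well

    def datapoints_per_dollar(self) -> float:
        return self.datapoints() / self.cost()


# Lever 1: more wells at a slightly lower per-well cost (faster, longer pipetting).
# Lever 2: same wells, many readouts per well, at a higher per-well cost (multiplexing).
baseline = Campaign(wells=384, readouts_per_well=1, cost_per_well=2.0)
faster_robot = Campaign(wells=3840, readouts_per_well=1, cost_per_well=1.5)
multiplexed = Campaign(wells=384, readouts_per_well=20, cost_per_well=4.0)

for name, c in [("baseline", baseline),
                ("faster robot", faster_robot),
                ("multiplexed", multiplexed)]:
    print(f"{name}: {c.datapoints_per_dollar():.2f} datapoints per dollar")
```

With these invented inputs, throughput only moves the ratio through the per-well cost, while multiplexing multiplies the numerator directly. Whether that holds for a real assay depends on the "at a certain quality" caveat, which this sketch does not model.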
PLR makes an open, universal interface for all robots. It also incorporates equipment like plate readers, arms and thermal cyclers.
I remember coding a routine on an OT-2 for qPCR. It took me about 4 days plus 1-2 days for validation. Recently, I coded a similar routine on a Hamilton Starlet in about 2 minutes.
PLR + AI is an enormous force multiplier.
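For readers unfamiliar with the idea of a universal interface, the general pattern can be sketched as below. This is not PyLabRobot's actual API: the class names, method names, and command strings are all hypothetical and exist only to show the shape of a translation layer, where one protocol function runs unchanged against different vendor backends.

```python
from __future__ import annotations

from abc import ABC, abstractmethod


class LiquidHandlerBackend(ABC):
    """Hypothetical vendor driver interface (not a real PyLabRobot class)."""

    @abstractmethod
    def aspirate(self, well: str, volume_ul: float) -> str: ...

    @abstractmethod
    def dispense(self, well: str, volume_ul: float) -> str: ...


class FakeOT2(LiquidHandlerBackend):
    """Stand-in for one vendor; the command format is invented."""

    def aspirate(self, well: str, volume_ul: float) -> str:
        return f"OT2:ASP {well} {volume_ul}"

    def dispense(self, well: str, volume_ul: float) -> str:
        return f"OT2:DSP {well} {volume_ul}"


class FakeStarlet(LiquidHandlerBackend):
    """Stand-in for another vendor; the command format is invented."""

    def aspirate(self, well: str, volume_ul: float) -> str:
        return f"STAR:aspirate({well},{volume_ul})"

    def dispense(self, well: str, volume_ul: float) -> str:
        return f"STAR:dispense({well},{volume_ul})"


def transfer(backend: LiquidHandlerBackend, source: str, dest: str,
             volume_ul: float) -> list[str]:
    """Protocol logic written once, independent of the robot underneath."""
    return [backend.aspirate(source, volume_ul),
            backend.dispense(dest, volume_ul)]


# The same protocol call produces vendor-specific commands for each backend.
for robot in (FakeOT2(), FakeStarlet()):
    print(transfer(robot, "A1", "B1", 10.0))
```

The design choice being illustrated is that the protocol author only ever touches `transfer`, so porting a routine between robots means swapping the backend object and leaving the protocol alone.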
From my conversations with people, it is an amazing tool. Someday I'd love to have an article entirely focused on them
Great article... I wonder how these principles apply in a GMP/scale-up world. There is a bigger need for automation for some workflows in mass-scale production or even late-stage bioprocessing. In GMP, what approaches do you think (e.g., improving translation layers, investing in intelligent error handling, or fully integrated hardware systems) best address the strict validation and compliance requirements? There will be a unique set of challenges for robotics adoption in these compliant spaces versus exploratory science, but more exciting wins as we progress from "cloud labs" to hopefully one day "cloud CDMOs"
great read! appreciate the progression from boxes to arms to different takes from the latest companies
some things that could make an interesting part 2:
1. how are frontier labs exploring lab automation and life science? both open ai and anthropic are starting to focus on integrating their software more deeply into workflows and proofs of concept
some examples
1. open ai (aug 2025) retro bio - 4b model - https://openai.com/index/accelerating-life-sciences-research-with-retro-biosciences/
2. open ai (dec 2025) gibson assembly - https://openai.com/index/accelerating-biological-research-in-the-wet-lab/
3. critique of that (dec 2025) by niko - https://x.com/NikoMcCarty/status/2001710892548001889
4. open ai x ginkgo on cell free protein synthesis (jan 2026) - https://openai.com/index/gpt-5-lowers-protein-synthesis-cost/
5. anthropic working with different partners - https://www.anthropic.com/news/healthcare-life-sciences
6. anthropic agent skills from lit review to opentrons protocol generator in 1 hr - https://worldwide-studios.org/blog/skills-opentrons
2. the latest iteration of cloud labs seems to be a mix of frontier lab research and ambitious hardware builds for materials and life science
check out
- lila ai - https://www.lila.ai/
- periodic - https://techcrunch.com/2025/09/30/former-openai-and-deepmind-researchers-raise-whopping-300m-seed-to-automate-science/
3. there's some interesting work being done at the national lab scale too
berkeley a-lab, collab with deepmind, for first powder material automation - https://www.nature.com/articles/s41586-023-06734-w
4. and the time from idea to experiment has really been condensed when all of this comes together with small teams
ex. physical MCP hack june 2025 - https://www.linkedin.com/pulse/ai-lab-auto-hack-recap-june-2025-michael-raspuzzi-kdvlc/
ex. ai agent platforms on monomer bio robotic work cell for cell cultivation oct 2025 - https://worldwide-studios.org/blog/ai-science-fall-25
ex. next one in sf is coming up march 2026 - https://luma.com/ii6qxq4r
A friend of mine is building a cloud vivarium to be able to run mouse experiments in the same spirit as something like ECL (https://oldenlabs.com/cloud-lab/)
In addition to being interesting in the same ways that regular cloud labs are, it has the added benefit of standardizing in vivo experiments, which are an infamously finicky enterprise. A classic study showed that the mere presence of a male experimenter induced a stress response in mice equivalent to being forced to swim for 3 minutes. It was due to male chemosignals, and was replicated just by having a male-worn T-shirt near the cage (https://neznansky.github.io/docs/gae/2014-Sorge.pdf)
Or even weirder — ketamine's antidepressant effects only worked when administered by male researchers, not female ones. (https://pubmed.ncbi.nlm.nih.gov/36042309/) Todd Gould's team in Maryland were mostly women, and couldn't replicate ketamine results from other labs. When they placed mice in a fume hood to eliminate scent, ketamine stopped working entirely regardless of experimenter sex. But put a male-worn T-shirt next to the mice in the hood, and the drug worked again.
Building a cloud lab to do the in vivo work doesn't inherently make more business sense than existing CROs, but I do think there's a lot that preclinical tox labs do that would be ideal tasks to automate. I just think we'll need much smarter robots than liquid handlers if we want robots to dissect mice. Logically, the incentive to develop the robotics (and probably adaptive intelligence in navigating real 3-D challenges) will be for human surgery, but in the medium term future (5-20 years out), I wouldn't be surprised to see big gains from efficiency in those processes.
Your point about behavioral readouts being particularly susceptible to reproducibility issues is well taken, but the real problem may be that few behavioral readouts have good predictive validity. For example, I've never liked the forced swim test for antidepressants, and not because it's sensitive to lab-to-lab variation (it just doesn't seem like a good predictor for depression!).
Make the robots do the gruesome, meticulous, and somewhat dangerous work of dissecting the mice!
Slight correction on this excellent article: Ginkgo RACs aren’t that recent of a push. The tech (and Will Serber) was acquired from Zymergen, and the scheduling software was under development for years there. The intelligence layer integration (LLMs) is recent though.
Great point! Added that into the piece
Great writeup! Did you get a sense for whether any of the mentioned companies are thinking about predictive validity?
It is a big topic on our mind (as well as Scannell's other work) at Tetsuwan! Of course, predictive validity is a dimension of a given disease model, but the (incredible) importance of data quality shapes our views around automation and its true role.
Good writeup!
Matches my experience at SLAS (lab automation conference) last year. Personally I found Ginkgo to be clunky and Automata to be sleek and practical. The translation layer (your first camp) is admirable, but an army of Claudes will eat them alive. Camp three is worth following.
When I was at {big pharma, REDACTED}, people were just beginning to build some of these things in bio research, outside of the core HTS + library functions. Throughput is nice, but increasingly capable agents will give you superhuman maneuverability, which is the real prize. But it will take time to vertically integrate it all. Lots of new winners to be minted -- we are still early.
Thanks for reading! I'm personally unsure about the army of Claudes :) Harvey and OpenEvidence should've also theoretically been eaten up by LLMs, but there is evidently some value in a stable interface that promises built-in guardrails, so that its LLM harness both follows your existing workflow and doesn't mess up. Whether this value is enough to build a business on, and whether even Harvey/OpenEvidence eventually fall to the Claudes, is another question, but I'm optimistic!
Those are good points! I will admit to being bearish on harnesses -- I think the stiff regulatory envs of law/medicine require brakes on unfettered Claude diffusion but the prize is still too valuable for them to resist forever. Research can be a veritable wild west in comparison
One of the best framings of this landscape I've come across. The three-camp taxonomy is genuinely useful, and I'll be stealing it.
The ending is where I want to push, though. Your beachhead argument — pick one high-value, standardized workflow, own the entire logistics stack so the customer has zero coordination burden, dominate that niche, then expand — feels like the most underappreciated insight in the piece. But you stop at research workflows. IVF fits the beachhead profile even more cleanly, and almost none of these companies are looking there.
The workflow is highly standardized — retrieval, ICSI, culture, vitrification, biopsy, transfer — run near-identically across thousands of clinics globally. The skilled-labor bottleneck is severe: embryologists take 3-5 years to train, there's a global shortage, and the credential doesn't travel well. And unlike most research protocols, the volume threshold problem is solved not at the individual clinic level but across clinics — which is exactly what a hub-and-spoke automation model is designed to exploit.
The Scannell point lands here too, maybe harder than in drug discovery. The primary decision tool in IVF — morphological embryo grading — has genuinely poor predictive validity for live birth. Embryologists visually score embryos on subjective criteria, inter-observer agreement is mediocre, and the correlation with actual clinical outcomes is weak. This is the same problem Scannell is describing, playing out with a human embryo instead of a compound. More throughput through the same grading system doesn't fix it. What could fix it is automating physical handling well enough to run the kind of calibration Scannell advocates — standardized imaging, consistent culture conditions, outcome data at scale — and actually start measuring whether your decision tools work.
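One small piece of "measuring whether your decision tools work" is quantifying inter-observer agreement. A common statistic for that is Cohen's kappa, sketched below in plain Python. The grades are invented for illustration and are not clinical data, and the three-level scale is an assumption. Kappa also only captures agreement between graders; it says nothing about correlation with live birth, which would need outcome data.

```python
from collections import Counter


def cohens_kappa(rater_a: list, rater_b: list) -> float:
    """Chance-corrected agreement between two raters on the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty rating lists of equal length")
    n = len(rater_a)

    # Fraction of items on which the two raters gave the same label.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Agreement expected by chance, from each rater's label frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)

    if expected == 1.0:
        raise ValueError("kappa is undefined when both raters use one identical label")
    return (observed - expected) / (1 - expected)


# Hypothetical grades from two embryologists scoring the same ten embryos.
grader_1 = ["good", "good", "fair", "poor", "good", "fair", "fair", "poor", "good", "fair"]
grader_2 = ["good", "fair", "fair", "poor", "good", "good", "fair", "fair", "good", "poor"]

print(round(cohens_kappa(grader_1, grader_2), 3))  # 0.375
```

In this toy example the graders agree on 6 of 10 embryos, but once chance agreement (0.36) is removed, kappa is only 0.375, which is the kind of gap between raw and corrected agreement that standardized imaging and handling would aim to close.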
The access angle compounds this. A single IVF cycle in the US runs $15,000–$25,000, most of which is clinical labor and infrastructure, not biology. This is a procedure that should be far cheaper and isn't.
I'm working on this atm — I consult for Conceivable Life Sciences (https://www.conceivable.life/), and I have just begun writing about it for them [conceivablelifesciences.substack.com]. Still early days; would genuinely value your read on it.
I suspect IVF automation is where the first true beachhead win comes. Would love to know if you've looked at this side at all.
Hey — I came across your writing and really liked how you think.
I’m exploring something similar from a different angle — writing about human behavior through a system design lens (like debugging internal patterns).
Just started publishing on Substack. If you ever get a moment to read, I’d genuinely value your perspective.
Also happy to support your work — feels like there’s an interesting overlap here.
Really fantastic piece that slices this field in multiple ways!! I am always skeptical of the “hardware is commodity, software/intelligence is where the value is” assumption that has come back to roost in so many industries. Co-designed hardware/software/AI can generate new and bigger data streams that in turn drive intelligence.
The cloud lab thesis that making your own reagents saves a lot of money because your costs are substantially lower suggests an alternative, simpler business model: automate producing reagents at scale, then undercut the market.