There is surprisingly little written on the scaling laws displayed by current protein language models (pLMs); there is still no compute-optimal paper in the protein world. Doing anything even mildly similar to the Chinchilla paper would take me forever, so I'll do something much more basic: assembling the token:parameter numbers of several well-known pLMs in one place.
To simplify things, I'm going to use this paper's calculation of the average sequence length of UniRef50, which is 325, so I'll count each sequence as 325 tokens. This will inflate the token numbers for BFD30, since those are largely short metagenomic proteins, but that's fine.
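As a rough sketch of the arithmetic (the sequence count and parameter count below are placeholders for illustration, not the actual figures for any specific model), the token:parameter ratio works out like this:

```python
# Rough sketch of the token:parameter arithmetic.
# The dataset/model numbers below are placeholders, not real figures.

AVG_SEQ_LEN = 325  # assumed average UniRef50 sequence length (tokens per sequence)

def token_count(num_sequences: int, avg_len: int = AVG_SEQ_LEN) -> int:
    """Approximate training tokens as (number of sequences) x (average length)."""
    return num_sequences * avg_len

def tokens_per_parameter(num_sequences: int, num_parameters: int) -> float:
    """Token:parameter ratio, the quantity Chinchilla-style analyses care about."""
    return token_count(num_sequences) / num_parameters

# Hypothetical example: a 650M-parameter model seeing 50M sequences for one epoch.
ratio = tokens_per_parameter(num_sequences=50_000_000, num_parameters=650_000_000)
print(f"~{token_count(50_000_000) / 1e9:.1f}B tokens, {ratio:.1f} tokens per parameter")
```

Multiply by the number of epochs if a model sees its dataset more than once; for the table below I'll keep the bookkeeping this simple.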