What ELN should I use?
You’re on an R&D team that records lab notes in Google Docs. That was fine at first, but now that the group has grown, finding a detail from a few months ago can feel overwhelming. The current approach is reaching a breaking point, but when should you change, and what should you switch to?
Almost every biotech company I know has been in this position at some point. In the last ten years of being a software engineer in life science, I have been asked for ELN advice countless times. The problem is that “ELN” has come to mean different things to different people and there is no one answer that is perfect for everyone.
What is an ELN?
“I was hoping you would just tell me what ELN to use and I could go back to research”
– Almost everyone
An ELN is an electronic lab notebook. Even lab notes taken in a Google Doc count as an ELN, one with many advantages over the paper notebook I kept in a locked drawer during my first research project:
Version Controlled
Timestamped changes can help you track how an SOP changed over time as well as prove ownership in an IP dispute.
Shareable
Documents can be easily shared with teammates, making research more transparent and collaborative.
Searchable
The ability to search for information in past records can help you find information about an experiment you performed three years ago or see if a teammate previously explored a particular topic.
When you are doing exploratory research where every experiment is unique, the blank-canvas Google Doc will let you move quickly. But if you are looking for a new ELN, you have likely felt the limitations of a blank sheet.
When you perform the same experiment 5 or 500 times, a lack of structure leads to inconsistency that makes comparing results difficult or impossible. Structure was never something a paper notebook could provide, but we have come to expect it from a modern ELN.
What is the right amount of structure?
If you try to capture data from a 200-sample experiment in a Google Doc, it creates chaos; if you try to capture a one-off experiment in a database, it creates unnecessary work. So what is the right amount of structure? Here is how I think about it:
1 - 5 samples / assays
Do something completely unstructured, but not untracked.
This phase is like finding a new hiking trail: you might not know where you are going the first time, but you still need to draw a map to retrace your path in the future.
Examples: Google Doc, Benchling Entry, Notion page
5 - 50
This should be structured, but there should also be a way to capture unstructured notes.
This phase is like going on a hike: you should be following a route, but there may be unexpected events that require you to go off trail.
Examples: Google Sheets, Benchling Entry + Registry Tables, Notion Table
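As a minimal sketch of "structured, plus room for unstructured notes" (all field and record names here are hypothetical, not from any particular ELN): each record has a fixed set of typed fields that stay comparable across runs, plus one free-text notes field for anything that went off trail. This is roughly what a registry table in Benchling or a Notion table gives you.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class AssayRecord:
    """One row per sample: fixed, comparable fields plus free-form notes."""
    sample_id: str
    assay: str
    run_date: date
    concentration_ng_ul: float  # structured: every record must supply this
    notes: str = ""             # unstructured: anything unexpected goes here

records = [
    AssayRecord("S-001", "ELISA", date(2024, 3, 1), 12.4),
    AssayRecord("S-002", "ELISA", date(2024, 3, 1), 9.8,
                notes="Pipette clog mid-run; re-drew sample."),
]

# Because the structured fields are consistent, comparison is trivial:
mean_conc = sum(r.concentration_ng_ul for r in records) / len(records)
print(f"Mean concentration: {mean_conc:.1f} ng/uL")
```

The point is the split: if the concentration had been buried in prose ("looked like about 12 ng/uL?"), computing that mean would require re-reading every entry.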
50 - 500
This should be structured, and structured carefully. Someone with data engineering experience and someone with detailed domain knowledge should both contribute to the design.
This phase is like driving on a road. Most of the decisions will be made when you pave the road, and driving down it should be very consistent. This is often when lab automation equipment is purchased.
Examples: Benchling, Notion, Airtable
500 is the upper limit of what many traditional ELNs support. If you do not know the limit of a particular ELN, you can ask the customer service team for benchmark data or run a benchmarking experiment yourself.
500 +
This is very well structured. At this point you want to have versioned SOPs (or even automation!) and versioned data schemas to track structure changes over time.
This phase is the highway. It should be similar to the road, but with fewer potholes and more lanes.
Examples: Airtable, Software databases (e.g., Postgres)
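To make "versioned data schemas" concrete, here is a minimal sketch of ordered schema migrations, using Python's built-in sqlite3 as a stand-in for a production database like Postgres (the table, columns, and migration contents are hypothetical). Each migration bumps a stored version number, so you always know exactly which structure a given database is using and how it changed over time.

```python
import sqlite3

# Migrations, applied in order. Appending a new entry is the only way the
# schema changes, so the list itself is the history of the structure.
MIGRATIONS = [
    # v1: original design
    "CREATE TABLE samples (sample_id TEXT PRIMARY KEY, assay TEXT)",
    # v2: later we realized we need to record which SOP version was followed
    "ALTER TABLE samples ADD COLUMN sop_version TEXT",
]

def migrate(conn: sqlite3.Connection) -> int:
    """Apply any pending migrations; return the resulting schema version."""
    current = conn.execute("PRAGMA user_version").fetchone()[0]
    for version, sql in enumerate(MIGRATIONS, start=1):
        if version > current:
            conn.execute(sql)
            conn.execute(f"PRAGMA user_version = {version}")
    return conn.execute("PRAGMA user_version").fetchone()[0]

conn = sqlite3.connect(":memory:")
print("Schema version:", migrate(conn))
```

Running `migrate` on a fresh database applies both steps; running it again is a no-op. Real tools (Alembic, Flyway, Benchling's schema management) do the same bookkeeping with more safety rails, but the idea is identical: structure changes are tracked, ordered, and replayable.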
Most research groups have different projects that require different levels of structure, and that’s okay. It’s nice if you have one tool for everything, but it is much more important to have the right tool for each thing. Many companies have two data systems, one for low-throughput experiments and one for high-throughput experiments. The good news is you can focus on the 1-500 problem now and solve the 500+ problem later.
What does biology have to do with this?
There are two categories of ELN / data tools:
Domain Agnostic
Examples: Notion, Airtable
Biology Specific
Examples: Benchling, SciSpot
Generally speaking, domain-agnostic tools are cheaper, but you will need to do more work to configure them for your needs. There are also biology-specific features that domain-agnostic tools do not provide out of the box, including:
Customer support services with biology experience
E.g. pre-configured schemas for in vivo experiments, advice on how to set up a reagent tracking database
Analysis and visualization tools for biological data
E.g. a plasmid viewer, BLAST tool, small molecule database
Life science regulatory compliance
E.g. HIPAA, GxP
Whether to use a domain-agnostic or biology-specific tool depends on your requirements and how much time you are willing to invest in setup. In general, I don’t recommend a domain-agnostic tool unless someone on your team has data engineering or software experience.
I chose an ELN. Am I done now?
Unfortunately, no.
Your research is unique – that is why you are doing it. Even a biology specific tool like Benchling will need to be customized for your research. As your research evolves over time, you will need to keep your ELN up to date. Scientists will need to learn to use it, adopt it into their workflow, and enter data consistently.
Maintaining your ELN is likely to cost more than the ELN itself
Biotech companies that ask applications engineers or data engineers for initial implementation help tend to have a better long-term ELN experience. Companies of more than 50 people tend to hire an informatics specialist or data manager to be responsible for the ELN and other information systems.
Don’t forget to factor usability and maintainability into the price. A poorly implemented ELN that slows down your research and takes multiple employees to maintain will cost your company far more than any SaaS.
(Self Promotion) What about computational biology?
Computational research needs an ELN too – and that’s why we built Mantle.
Traditional ELN systems are designed for research done at the bench. This is great for recording sample preparation for a Western blot, but not for capturing the analysis of single cell transcriptomics FASTQ files.
Mantle connects “big biological data” (FASTQs, PDB files, micrographs, etc.), analysis notebooks (Jupyter, RStudio), and bioinformatics pipelines (Nextflow). With automated workflows, you can run a DNA sequencer, go grab coffee, and receive reproducible results in your inbox. We integrate with the wet lab ELNs above to keep the entire lab connected.
How do you manage data?
What ELN do you use? We’d love to hear what has worked for you and what lessons you’ve learned along the way. Thank you!