
How AI changed my career in bioinformatics

The rise of agentic AI tools caused Lei Zhu to rethink his role in bioinformatics. Credit: Lei Zhu

When I began my graduate studies, the first thing I needed to do was choose a research direction. The laboratory I joined focused on two main areas: functional assays and bioinformatics. This was more than a decade ago, and the typical workflow involved bioinformatics researchers analysing large data sets to identify genes associated with specific phenotypes or diseases, which would then be handed over to the functional-assay team for validation.

At the time, bioinformatics was a new and promising field, so I chose this path without hesitation. But I did not have a programming background, so it was tough to get started. I began studying programming languages — first Perl, then R and Python.

Looking back, I’m happy with my choice. It was an exciting time, and the rapid growth of high-throughput technologies and new techniques — such as transcriptomics and genomics, and later single-cell biology — gave me plenty of data to work with. Solving biological problems through the code I wrote gave me a sense of self-worth.

Then, artificial intelligence (AI) tools, including ChatGPT, Manus and Grok, emerged. Their ability to spit out functional code threatened to make me redundant, but I wasn’t concerned at first because AI-generated code often contains errors that only appear during testing and require manual debugging. New ‘agentic’ modes of operation, however, were potential game-changers. These allow tools such as Manus to first generate code and then run it directly in the cloud, creating a seamless loop: from me asking questions, to the tool writing and executing code, to me receiving results. That was when I started to worry: in this age of AI, am I still necessary?

Today’s AI tools can efficiently write code to perform biological analyses. I need only upload my data and provide a simple prompt, such as, ‘Assume you are a bioinformatics expert. Could you create ten visuals to represent different data based on your understanding of the data set above? Display the plots one by one, with a brief introduction.’ The AI provides the answers I need, sometimes exceeding my expectations. So, what is my role in this process?

I found out during a study of lung cancer. We had hundreds of tumour tissue gene-expression profiles, and I asked the AI to set up the analysis. It worked quickly, and even produced a tidy report. The preliminary results looked great — almost too good. The AI identified a statistically significant difference in gene-expression levels before and after a specific time point. But as I dug deeper, I saw that, halfway through the study, the lab had changed how the data were collected. The model had picked up on that difference — not one due to biology. What had looked like a breakthrough was actually just an artefact. Once I adjusted for that change, the difference became less dramatic but reflected real biology.
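For readers who want to try the same kind of check, here is a minimal sketch, assuming a pandas data frame with hypothetical columns 'expression', 'group' (before or after the time point) and 'batch' (which collection protocol produced the sample). It fits the comparison with and without the batch covariate using statsmodels:

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical layout: one row per sample, with columns 'expression',
# 'group' (before/after the time point) and 'batch' (collection protocol).
df = pd.read_csv("samples.csv")

naive = smf.ols("expression ~ C(group)", data=df).fit()
adjusted = smf.ols("expression ~ C(group) + C(batch)", data=df).fit()

# If the group effect shrinks sharply once batch enters the model, the
# 'finding' was probably tracking the protocol change, not biology.
group_terms = [t for t in naive.params.index if t.startswith("C(group)")]
print("naive:   ", naive.pvalues[group_terms].to_dict())
print("adjusted:", adjusted.pvalues[group_terms].to_dict())
```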

I realized that my role had shifted from scripting to supervising. What matters now is stating the question clearly, spotting problems that the computer cannot see and taking responsibility for the answer.

Top tips for AI supervisors

People tell me that I could make the AI smarter by ‘putting more context into the prompt’, but my AI always seems to play dumb. No matter how detailed my request, it finds ways to misunderstand. Over the past few years, I’ve developed some methods to double-check its work.

Create a validation set. Keep a small data set that you understand well — a subset of previously published or manually validated data, for instance — as a positive control. Before applying a new AI-generated pipeline to your data, test it on this set. If the AI produces unexpected or inconsistent results, you’ll know immediately that either the prompt or the algorithm needs refinement.
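A minimal sketch of that check, with a hypothetical file name, illustrative gene symbols and a deliberately trivial t-test stand-in for the AI-generated pipeline:

```python
import pandas as pd
from scipy import stats

def run_pipeline(df: pd.DataFrame) -> set:
    # Trivial stand-in for the AI-generated pipeline: flag genes whose
    # mean expression differs between the two groups at p < 0.05.
    tumour = df[df["group"] == "tumour"]
    normal = df[df["group"] == "normal"]
    genes = [c for c in df.columns if c != "group"]
    return {g for g in genes
            if stats.ttest_ind(tumour[g], normal[g]).pvalue < 0.05}

# Hypothetical positive control: a small, previously validated table of
# samples by genes, with a 'group' column and known expected hits.
control = pd.read_csv("validated_subset.csv")
expected = {"TP53", "EGFR", "KRAS"}  # illustrative gene symbols

hits = run_pipeline(control)
missing = expected - hits
print(f"recovered {len(expected) - len(missing)}/{len(expected)} known genes")
if missing:
    print("missing:", missing, "(refine the prompt or pipeline first)")
```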

Shuffle the data. AI models can easily overfit data or be swayed by technical artefacts, as happened during the lung-cancer study. To test whether a finding is biologically meaningful, shuffle sample labels, perturb values slightly or otherwise introduce synthetic noise. If the ‘significant’ pattern persists, it’s probably an artefact, rather than a true signal.
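One way to run that shuffle, sketched with hypothetical column names ('group', and 'GENE_X' standing in for the gene of interest):

```python
import numpy as np
import pandas as pd
from scipy import stats

# Hypothetical layout: one row per sample, a 'group' label and one
# numeric column per gene.
df = pd.read_csv("samples.csv")
labels = df["group"].to_numpy()
values = df["GENE_X"].to_numpy()

def pvalue(lab: np.ndarray) -> float:
    return stats.ttest_ind(values[lab == "before"],
                           values[lab == "after"]).pvalue

rng = np.random.default_rng(0)
observed = pvalue(labels)
shuffled = [pvalue(rng.permutation(labels)) for _ in range(1000)]

# If shuffled labels reproduce a p-value this extreme often, the
# 'significant' pattern does not depend on the true grouping.
frac = np.mean([p <= observed for p in shuffled])
print(f"shuffles at least as extreme as observed: {frac:.1%}")
```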

Subset the analysis. If a data set is big enough, I will ask the AI to perform the same analysis on random subsets of it. Consistency across subsets increases confidence: if the results vary wildly from one subset to the next, the finding might not hold up.
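A sketch of that subset check, again with a deliberately simple stand-in for the real analysis:

```python
import numpy as np
import pandas as pd

def analyse(subset: pd.DataFrame) -> set:
    # Trivial stand-in for the AI-generated analysis: report the 20
    # most variable genes (illustrative only).
    return set(subset.var(numeric_only=True).nlargest(20).index)

# Hypothetical layout: one row per sample, one numeric column per gene.
df = pd.read_csv("samples.csv")
rng = np.random.default_rng(0)

# Run the same analysis on five random halves of the data set.
halves = [df.iloc[rng.choice(len(df), size=len(df) // 2, replace=False)]
          for _ in range(5)]
results = [analyse(h) for h in halves]

stable = set.intersection(*results)
flagged = set.union(*results)
print(f"{len(stable)} of {len(flagged)} flagged genes appear in every half")
```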
