A new database for researchers to share the genomes of dangerous viruses promises to solve many of the problems that hamper existing alternatives. But first, researchers must be convinced to use it.
Pathoplexus — a portmanteau of pathogen and plexus — was launched last month, and the team of scientists behind the database hopes that it will encourage more researchers to share genetic sequences of known and emerging viruses of public-health importance.
Sharing sequences as quickly as possible is important for identifying new viruses and tracking changes that could make them more dangerous to humans, as well as for designing vaccines, says Edward Holmes, a virologist at the University of Sydney in Australia.
Pathoplexus currently focuses on four viruses that are not specifically included in other databases: Crimean–Congo Hemorrhagic Fever Virus, Ebola Sudan, Ebola Zaire and West Nile Virus. Other pathogens will be added later, the team says.
Existing hurdles
Among the largest existing repositories is GenBank in the United States, which offers unrestricted access to its genomic data. But public access means that anyone can theoretically use the data to publish scientific papers, without acknowledging the data owners. This has discouraged scientists, particularly those from lower-income countries, from sharing their data quickly, such as during a public-health emergency. An alternative repository, GISAID, requires users to register, agree to acknowledge the data owners and make their best efforts to collaborate with the owners. The database was designed to ensure the rights of data submitters.
GISAID was hugely popular during the COVID-19 pandemic, and it contains close to 17 million sequences of SARS-CoV-2, the virus behind COVID-19. But researchers have raised concerns around transparency in its governance, how it mediates disputes over credit and how it sanctions those it believes to have violated its conditions for use.
“GISAID has led to a lot of frustration in the past few years,” but the scientific community have also learnt lessons on how to do things better, says Spyros Lytras, an evolutionary virologist at the University of Tokyo. “Starting from scratch is what we need as a community, and Pathoplexus might be the solution.”
A representative for GISAID said, in an email, that the trust it has with the scientific community is strong, and that more than 70,000 researchers use the site. The roles of its governing bodies and funding sources are displayed on its website, and its terms of use haven’t changed since it was founded in 2008, the representative said.
Building trust
Pathoplexus offers some protections for users. For instance, researchers can set restrictions on how their data are used, such as not allowing them to be included as a key focus of scientific publications for up to a year without their explicit permission. This should give data owners enough time to submit a manuscript on their findings.
Users must also credit the data owners in their publications. “We aim to build a community where researchers feel confident that their contributions will be respected and properly credited,” says Jamie Southgate, a member of Pathoplexus and the head of operations at the global coalition Public Health Alliance for Genomic Epidemiology, based in Cape Town, South Africa.
Pathoplexus doesn’t block individuals who breach the terms of use from accessing the site, which GISAID has done in rare cases. Instead, if published data breach the terms, the team will approach the journals to ensure that the data are used in accordance with the way in which they were shared, says Emma Hodcroft, a co-founder of Pathoplexus and a molecular epidemiologist at the Swiss Tropical and Public Health Institute in Basel, Switzerland. “We have tried to be incredibly explicit” about the terms, she says.
“It’s a good, clever solution,” says Senjuti Saha, a molecular microbiologist at the Child Health Research Foundation in Dhaka, who agrees with the approach of reaching out to publishers. “That’s the way it should be.” She thinks that Pathoplexus’s transparency will breed trust among the scientific community.
But it’s too early to say whether the repository will solve the current data-sharing problems, says Saha. “It is an excellent and fantastic first step.”
Users might also stick to sharing sequences on local databases. For instance, in China, researchers are probably more likely to publish sequences for emerging viruses on Chinese databases, says Shi Mang, an evolutionary biologist at Sun Yat-sen University in Shenzhen, China, who is also on Pathoplexus’s scientific advisory board. But for established viruses, they are likely to use repositories with well-maintained collections, which Pathoplexus offers.
Improved experience
Pathoplexus’s creators have tried to improve the user experience, for example by making uploading as easy as possible. Pathoplexus also checks for errors in the sequence data and accompanying information and assists with organizing viruses into subtypes. “This is actually what attracted me to this database,” says Shi. Incorrect sequences in current repositories can cause lots of trouble for researchers, he says.
So far, Pathoplexus has used GenBank data for the four viruses to populate the site. Thousands of people have visited the site, and 50 have created accounts to submit data, but none have submitted sequences, says Hodcroft. “We did not expect high volumes of data for the pathogens that we’ve launched with.”
Researchers who work on other viruses will have to wait until the database expands to include them. And to expand, the team needs to secure long-term funding. The site is currently run by volunteers and relies on donated computing time, which runs out in about six months. Hodcroft says her priority right now is to appeal to donors. “I’m cautiously hopeful.”