COVID-origins report sparks debate over major genome hub GISAID

Posted by Otto Knotzer on March 31, 2023 - 11:49pm

GISAID revoked researchers’ access following the report, sparking discussion about findings based on data found in online repositories.

The use of data in public repositories is under debate after a temporary suspension of 19 authors. Credit: Jackie Niam/iStock/Getty

On 20 March, an international team of researchers posted a headline-grabbing online report [1] describing genomic data found in swabs taken from the market in Wuhan, China, that has been linked to the origins of the COVID-19 pandemic. The data were gathered by a separate team in China and hosted on a major online database, the Global Initiative on Sharing Avian Influenza Data (GISAID). Hours later, GISAID temporarily revoked the authors’ access.

GISAID said that the authors of the report, posted to the online platform Zenodo, had not done enough to protect the interests of the team in China — the ‘data generators’ — and had therefore violated its rules.

The authors of the Zenodo report — an international team of 19 scientists, many of whom have published papers on the origins of SARS-CoV-2 — deny this, saying that they had attempted to collaborate with the team in China, and noting that their new report acknowledges that team.

“I feel that we’re being punished while abiding by both the letter and the spirit of their rules,” says Joel Wertheim, a molecular epidemiologist at the University of California, San Diego, and a co-author of the Zenodo report, referring to GISAID’s rules. “It’s disheartening.”

GISAID reinstated the Zenodo authors’ access within 24 hours. A spokesperson told Nature that the data hub is now reviewing the situation and has granted authors access while the investigation continues.

The events have sparked a debate in the research community about how and when to disseminate findings based on data found in online repositories — especially when they relate to the highly charged question of the pandemic’s origins.

GISAID, a non-profit public–private partnership, is a major disseminator of genomic information relating to viruses. “This is a crucial resource for our work,” says Spyros Lytras, an evolutionary virologist at the University of Glasgow, UK, who was not involved in any of the analyses.

With more than 15 million submissions from almost every country in the world, it is the largest database of SARS-CoV-2 genomes “by quite a margin”, says Karthik Gangavarapu, a computational biologist at the University of California, Los Angeles, and one of the co-authors briefly locked out of the site. “If you want to analyse COVID-19 from a global perspective, then you do want to use data from GISAID,” says Gangavarapu.

Raccoon dogs

The Zenodo report — which has not been peer reviewed — points to raccoon dogs (Nyctereutes procyonoides) and other animal species as possible sources of the spillover of SARS-CoV-2 into humans.

The same data analysed in the report had formed the basis of a preprint [2] posted on Research Square in February 2022 by researchers at the Chinese Center for Disease Control and Prevention (China CDC) in Beijing, who generated the data. In March 2023, co-authors of the Zenodo report noticed that the data were publicly available on GISAID and downloaded them, but then the data disappeared from public view following a request from the data generators.

GISAID’s rules require researchers to “make best efforts to collaborate” with data generators and involve the generators in their analyses.

In a public statement on 21 March, GISAID said that it had reached out to the China CDC and established that the authors of the Zenodo report had not fulfilled this requirement and had communicated “only their intent to publish the data generator’s data”.

GISAID also said that the authors of the Zenodo report had “scooped” the China CDC authors by posting their analysis on Zenodo before the China CDC group’s work was published in a peer-reviewed journal. A spokesperson for GISAID told Nature that the China CDC had submitted a complaint, stating that the Chinese researchers had asked the Zenodo report authors to “hold off” with their publication because they “had a manuscript pending, and were in the middle of updating some of the data they had submitted at the request of the reviewers”. (The China CDC preprint is currently marked as ‘under review’ at Nature Portfolio; Nature’s news team is independent of its publisher, Springer Nature, which publishes Nature Portfolio journals.)

The researchers at the China CDC contacted by Nature have not responded to requests for comment on the situation.

Members of the international team deny these accusations. In their report and an e-mail sent to GISAID on behalf of all the authors by Michael Worobey, an evolutionary biologist at the University of Arizona, Tucson, they say they made multiple attempts to collaborate with the data generators that were declined. The group acknowledges the China CDC team in the analysis.

Wertheim says that the fact that a preprint has already been posted online protects the China CDC researchers from being scooped. And in their Zenodo report, the authors explicitly state that they respect the China CDC researchers’ right to be the first to publish a peer-reviewed paper on their data and do not intend to submit their report for review, which would compete with that process.

The Zenodo authors also point out in their report that more than three years have passed since the collection of the data, which are critical for the understanding of the pandemic’s origins.

Sparks discussion

The debate over the Zenodo report has sparked discussion in the wider research community. By posting a report in advance of the data owners’ work being published in a journal, the Zenodo researchers could deter data generators from sharing information for fear of being scooped, says Gustavo Palacios, a virologist at the Icahn School of Medicine at Mount Sinai, New York City. This could impair the response to future pandemics, he says.

But the fact that a preprint describing the data has already been posted protects the data providers, say other researchers. The pandemic has set a precedent of using preprints to get credit for data and analysis, says Sergei Pond, a computational evolutionary biologist at Temple University in Philadelphia, Pennsylvania. “As soon as you put up a preprint, that stakes your claim that these are your data,” he says.

Senjuti Saha, a molecular microbiologist at the Child Health Research Foundation in Dhaka, questions the need for urgency on the part of the Zenodo authors: she says people’s lives don’t depend on the data at this time and therefore there was no need to rush out an analysis.

The data’s relevance to the pandemic’s origins “is not a good enough reason to undermine a fantastically beneficial open-data enterprise that is entirely dependent on both the goodwill of data producers and the good behaviour of data users”, says Darren Martin, a computational biologist at the University of Cape Town, South Africa.

But rather than an attempt to scoop the China CDC work, others see the Zenodo report as the sort of widely accepted and beneficial scientific discussion that should ensue from a preprint. “Other people can start to analyse the data and can write about their interpretations of the data. And they can do that citing the original researchers, giving them the credit for generating those data,” says Theo Sanderson, a geneticist at the Francis Crick Institute in London, who highlights that his views don’t represent those of his institution.

Lack of clarity

Several researchers also say that there is a lack of clarity in GISAID’s data-access rules and how they are enforced, and, given GISAID’s importance for scientific research, clarity is crucial.

The rules don’t explicitly say anything about scooping, don’t clearly define what constitutes “best efforts to collaborate”, don’t specify a time limit after which the requirement to request collaboration ends, and don’t say whether a group may publish using GISAID data if the data producers decline or do not respond, say researchers Nature spoke to.

Greater clarity from GISAID as to the circumstances in which access to GISAID data could be revoked would prevent researchers from unwittingly violating the rules, says Gangavarapu.

GISAID says it expects users to apply “common sense” and seek clarification directly when they are unsure about the rules. It says it does not explicitly refer to ‘scooping’ in its terms of use due to the phrase’s broad interpretation. And if parties disagree on collaboration, this could result in separate publications “but there always needs to be priority for the data generator to tell their story first”.

GISAID describes the recent situation as an “extremely rare event” and says that suspending an account is a measure of last resort. “The vast majority of GISAID’s tens of thousands of users demonstrate sound scientific etiquette which has contributed to the initiative’s overwhelming success over the course of the last 15 years,” says GISAID.

After the Zenodo report was posted, GISAID made the data at the heart of the analyses available to its users once more. It also introduced more restrictions for access to those data, with a pop-up explicitly stating that any analyses based on the data will not be publicly shared. This pop-up has now been removed.