
Atlas of biomedical literature could help track down fabricated studies | Science



Wish there were a quicker way to catch fraudulent papers? Or to find out which types of studies are most likely to be published by women authors? Good luck trying to spot trends in the scientific literature, a morass of millions of papers that keeps growing at an unrelenting pace.

Now, there’s hope, thanks to artificial intelligence (AI). A new, publicly available atlas of biomedical papers, reported on the preprint server bioRxiv, maps the relationships between nearly 21 million articles, providing a “bird’s-eye view” of the literature. If kept up to date, it could help scientific sleuths identify patterns and trends that are otherwise difficult for humans to trace.

The atlas “gives a compelling picture of the entire structure of biomedicine,” says Kevin Boyack, an information scientist who works on similar visualization methods at SciTech Strategies, a research consulting company. “It should prove quite useful in looking at high-level trends.”

Previous tools for visualizing the biomedical literature have tended to map out publications according to the citations they share. Or they have clustered together articles that contain similar scientific terms. These tools are useful for studying trends in narrow areas of research or finding related articles in the literature. But, “One of our goals was to study more broad, societally interesting questions,” says Dmitry Kobak, a data scientist at the University of Tübingen and co-author of the new paper.

To create the atlas, Kobak’s team downloaded the abstracts of nearly 21 million English-language articles from the PubMed search engine. The team then used an AI large language model known as PubMedBERT to sort the abstracts by similarity. The model looked for scientific terms within each abstract and interpreted their meaning according to the surrounding text. (For example, PubMedBERT will infer whether the word “replicate” refers to copied DNA or a repeated experiment.) Based on this analysis, it grouped similar publications together into so-called “neighborhoods.”
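The grouping step can be illustrated with a minimal sketch. It assumes each abstract has already been turned into a numeric vector (an embedding) by a language model; the three-dimensional vectors, paper names, and function names below are made up for illustration and are not the team's code or real PubMedBERT output. Cosine similarity is used here as one common way to compare embeddings.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def nearest_neighbors(embeddings, k):
    """For every paper, return the ids of its k most similar papers."""
    neighbors = {}
    for paper_id, vector in embeddings.items():
        scored = [
            (cosine_similarity(vector, other_vector), other_id)
            for other_id, other_vector in embeddings.items()
            if other_id != paper_id
        ]
        # Highest similarity first; ties broken by id for stable output.
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        neighbors[paper_id] = [other_id for _, other_id in scored[:k]]
    return neighbors

# Hypothetical toy embeddings standing in for model output.
toy_embeddings = {
    "dna_replication_1": [0.9, 0.1, 0.0],
    "dna_replication_2": [0.8, 0.2, 0.1],
    "trial_replication_1": [0.1, 0.9, 0.2],
    "trial_replication_2": [0.0, 0.8, 0.3],
}

neighbors = nearest_neighbors(toy_embeddings, k=1)
print(neighbors)
```

In this toy example the two "DNA replication" papers end up as each other's nearest neighbor, as do the two "repeated experiment" papers. Comparing every pair, as done here, would not scale to 21 million abstracts; a real pipeline would need an approximate nearest-neighbor method.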

By plotting out this information, the team created a navigable, 2D atlas of all 21 million papers. Publications are scattered throughout the circular map, with papers from the same field tending to group together in large, color-coded bundles, looking like colonies of bacteria in a petri dish. Zoom in, and smaller neighborhoods of related papers on narrower topics become visible.

With the help of a few extra tools, the researchers then used the atlas to visually explore broad trends across the literature. In one analysis, they used an algorithm to infer authors’ gender from their first names. Across all of the biomedical literature surveyed, they found that 42.4% of first authors but just 29.1% of last authors were women, consistent with other work that has found fewer women are promoted to supervisory roles in science. But this gender gap varied considerably across different areas of the atlas: Within the field of health care, for example, the team found a bundle of papers on surgery that was written largely by male authors, whereas another bundle on patient care was dominated by female authors. This suggests the atlas could help pinpoint specific research areas in which women in science are most underrepresented.
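The arithmetic behind such a comparison is simple once each paper carries a region label and an inferred gender for its first and last author. The sketch below is only an illustration under those assumptions: the records, field names, and region labels are invented, and papers whose author gender could not be inferred (marked `None`) are left out of the denominator.

```python
from collections import defaultdict

def female_share_by_region(papers, position):
    """Share of female authors at `position` ("first" or "last") per region."""
    counts = defaultdict(lambda: [0, 0])  # region -> [female, total inferred]
    for paper in papers:
        gender = paper[position]
        if gender is None:  # gender could not be inferred; skip this paper
            continue
        counts[paper["region"]][1] += 1
        if gender == "female":
            counts[paper["region"]][0] += 1
    return {region: female / total for region, (female, total) in counts.items()}

# Hypothetical toy records, not data from the atlas.
toy_papers = [
    {"region": "surgery", "first": "male", "last": "male"},
    {"region": "surgery", "first": "male", "last": "male"},
    {"region": "surgery", "first": "female", "last": "male"},
    {"region": "surgery", "first": "male", "last": None},
    {"region": "patient care", "first": "female", "last": "female"},
    {"region": "patient care", "first": "female", "last": "male"},
    {"region": "patient care", "first": "female", "last": "female"},
    {"region": "patient care", "first": "male", "last": "female"},
]

first_author_share = female_share_by_region(toy_papers, "first")
last_author_share = female_share_by_region(toy_papers, "last")
print(first_author_share)
print(last_author_share)
```

Name-based gender inference is itself approximate, so shares computed this way are estimates rather than exact counts.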

Although retracted papers can be found peppered across the atlas, many coalesce on dense islands owing to their similarities. (González-Márquez et al., bioRxiv 2023.04.10.536208)

The atlas may also be useful for zeroing in on fraudulent studies. In another analysis, Kobak’s team highlighted almost 12,000 papers that had been flagged as retracted on the PubMed database. These papers were dotted throughout the atlas, but many of them grouped together into dense “islands.” One such island contained multiple retracted papers all focusing on the cancer-fighting functions of understudied microRNAs—a popular topic of fraudulent articles produced by paper mills, which churn out forged scientific literature.

Kobak argues that closely inspecting these regions could help identify other suspicious papers. Indeed, when the researchers examined 25 of the other, nonretracted papers that were part of this island, they found telltale signs that they may have also been produced by paper mills: For instance, the titles of many of these papers followed the exact same template, and all but one contained authors affiliated with Chinese hospitals, which are known targets for paper mills.
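One way to picture this kind of inspection is to ask, for each paper that has not been retracted, how many of its nearest neighbors have been. The sketch below is a hypothetical illustration, not the researchers' procedure: the neighbor lists, paper ids, and the 0.5 threshold are all assumptions chosen for the example.

```python
def retracted_neighbor_fraction(neighbors, retracted):
    """Fraction of each paper's neighbors that are flagged as retracted."""
    return {
        paper_id: sum(n in retracted for n in neighbor_ids) / len(neighbor_ids)
        for paper_id, neighbor_ids in neighbors.items()
        if neighbor_ids  # skip papers with no neighbors listed
    }

def candidates_for_review(neighbors, retracted, threshold=0.5):
    """Nonretracted papers whose retracted-neighbor fraction meets the threshold."""
    fractions = retracted_neighbor_fraction(neighbors, retracted)
    return sorted(
        paper_id
        for paper_id, fraction in fractions.items()
        if paper_id not in retracted and fraction >= threshold
    )

# Hypothetical neighbor lists: one dense "island" plus two unrelated papers.
toy_neighbors = {
    "mirna_a": ["mirna_b", "mirna_c", "mirna_d"],
    "mirna_b": ["mirna_a", "mirna_c", "mirna_d"],
    "mirna_c": ["mirna_a", "mirna_b", "mirna_d"],
    "mirna_d": ["mirna_a", "mirna_b", "mirna_c"],
    "cardio_x": ["cardio_y", "mirna_d"],
    "cardio_y": ["cardio_x", "mirna_d"],
}
toy_retracted = {"mirna_a", "mirna_b", "mirna_c"}

to_review = candidates_for_review(toy_neighbors, toy_retracted, threshold=0.5)
print(to_review)
```

Here only `mirna_d` is surfaced, because all of its neighbors are retracted. A paper surfaced this way is a candidate for manual checking, not a verdict: sitting close to retracted work says nothing on its own about whether a study is genuine.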

These areas of the atlas may well warrant further scrutiny, agrees publication integrity researcher Jennifer Byrne at the University of Sydney. But, she cautions, “Clusters of similar papers would need further screening to avoid wrongly flagging genuine papers.”

So far, the atlas covers only biomedical literature up to 2021 plus a very small number of papers from 2022. To keep pace with current trends, Kobak and his team plan to update the tool with articles from the past 2 years, and they hope to create similar visualizations of other literature databases, too.


