Effective Literature Database Searching


The indexing is one of the main outcomes of the Horizon 2020-funded Biodiversity Community Integrated Knowledge Library (BiCIKL) project.

The BiCIKL project, which is dedicated to making biodiversity knowledge FAIR (Findable, Accessible, Interoperable, and Reusable) and bi-directionally linked, has made significant progress in bringing the SIB Literature Services (SIBiLS) database closer to becoming a portal with the working title “Biodiversity PMC”.

Three long-standing collaborators are behind the effort: the Swiss-based Text Mining team led by Patrick Ruch at SIB (responsible for developing SIBiLS), the text- and data-mining association Plazi, and the scientific publisher Pensoft. Together, they have started feeding the SIBiLS database with full-text content from over 500,000 taxonomic treatments extracted by Plazi and tens of thousands of full-text articles from 40 renowned biodiversity journals published by Pensoft.

This means that users of SIBiLS, whether human or AI, now have access to advanced text- and data-mining tools, including AI-powered factoid question-answering capabilities, to query all this full-text indexed content and discover, for example, species characteristics and biotic interactions.

To index and directly feed the content from its 40+ academic outlets to SIBiLS, Pensoft relies on advanced full-text journal publication workflows based on TaxPub JATS XML and powered by the ARPHA publishing platform. Meanwhile, Plazi uses its GoldenGate text- and data-mining software to harvest taxon treatments from over 80 journals. The treatments are stored at TreatmentBank and the Biodiversity Literature Repository, and are then further reused by GBIF, OpenBiodiv, and now by SIBiLS.

The partners see the indexing as a pilot and believe it could soon be expanded to include other journals that rely on modern publishing workflows, as well as converted legacy publications.

In fact, since its launch in 2020, the queryable database SIBiLS has been retrieving relevant full-text papers directly from the NIH’s PubMed Central, including Pensoft’s ZooKeys, PhytoKeys, MycoKeys, Biodiversity Data Journal, and Comparative Cytogenetics.

However, there were still gaps to be bridged before SIBiLS could truly be called “the Biodiversity PMC”, and these were mainly related to the quantity and breadth of content. While the five Pensoft journals mentioned above had long been indexed by SIBiLS through harvesting PMC, they were quite an exception: a few years ago, a reorganization at PMC shifted the focus of the database to almost exclusively biomedical content, leaving biodiversity journals out of its scope.

Meanwhile, Plazi has been feeding SIBiLS with a growing number of taxonomic treatments and visual data as it exponentially increased the number of publishers and journals it mined data from. Even so, various biodiversity data (e.g. genetic, molecular, ecological) published in parts of the article narratives that were not taxon treatments could not make it to the portal.

“We are aware of the benefits and practical uses that PMC offers to its users, so we cannot miss the opportunity to incorporate this well-proven approach to navigate the data deluge in biodiversity science. Undoubtedly, it is an extremely ambitious and demanding task. Yet, I believe that, at the BiCIKL consortium, we have made it clear that we have the necessary expertise, know-how, and aspiration to take on the challenge,”

said Prof. Lyubomir Penev, founder/CEO at Pensoft and project coordinator of BiCIKL.

“For far too long, scientific knowledge about biodiversity has been imprisoned in a continuously growing corpus of scientific outputs, which – most of the time – are published in unstructured formats, such as PDF, or as paywalled content, and often locked by both! This means that they are – at best – difficult to access and comprehend by computer algorithms. In the meantime, we need all that knowledge, in order to accelerate our understanding of the dynamics of the global biodiversity crisis and to efficiently assess the impact of climate change. This is why the need for advanced workflows and tools to annotate, mine, query, and discover new facts from the available literature is more than obvious,”

added Dr. Donat Agosti, President at Plazi.

“In the course of the BiCIKL project, at SIBiLS, we started indexing a larger set of biodiversity-related contents in the broad sense, including environmental sciences and ecology, to build a new literature database, or what we now call ‘Biodiversity PMC’. Now, with the help of Plazi and Pensoft, we provide a unique entry point to half a million taxonomic treatments, which were not included in the original PubMed Central. Next on the list is to expand our network of literature sources and continue this exponential growth of queryable biodiversity knowledge to turn Biodiversity PMC into the ‘One Health’ library. We promise to keep you posted,”

stated Dr. Patrick Ruch, Group Leader at SIB and Head of Research at HES-SO, HEG Geneva, Switzerland.

***

Follow the BiCIKL Project on Twitter and Facebook. Join the conversation on Twitter at #BiCIKL_H2020.

***

About the SIB Swiss Institute of Bioinformatics:

SIB is an internationally recognized non-profit organization dedicated to biological and biomedical data science. SIB’s data scientists are passionate about creating knowledge and solving complex questions in many fields, from biodiversity and evolution to
