
The indexing is one of the main outcomes of the Horizon 2020-funded project Biodiversity Community Integrated Knowledge Library (BiCIKL).
The BiCIKL project, which is dedicated to making biodiversity knowledge FAIR (Findable, Accessible, Interoperable, and Reusable) and bi-directionally linked, has made significant progress in bringing the SIB Literature Services (SIBiLS) database closer to becoming a portal with the working title “Biodiversity PMC”.
In a joint effort, the Swiss-based Text Mining team led by Patrick Ruch at SIB (responsible for developing SIBiLS), the text- and data-mining association Plazi, and the scientific publisher Pensoft, all long-standing collaborators, have started feeding the SIBiLS database with full-text content from over 500,000 taxonomic treatments extracted by Plazi and tens of thousands of full-text articles from 40 renowned biodiversity journals published by Pensoft.
This means that users of SIBiLS, whether human or AI, now have access to advanced text- and data-mining tools, including AI-powered factoid question-answering capabilities, to query all this full-text indexed content and discover, for example, species characteristics and biotic interactions.
To index and directly feed the content from its 40+ academic outlets to SIBiLS, Pensoft relies on its full-text TaxPub JATS XML publication workflows, powered by the ARPHA publishing platform. Meanwhile, Plazi uses its GoldenGate text- and data-mining software to harvest taxon treatments from over 80 journals. The treatments are stored at TreatmentBank and the Biodiversity Literature Repository, and are then reused by GBIF, OpenBiodiv, and now by SIBiLS.
Seen as a pilot, the indexing – the partners believe – could soon be expanded to include other journals that rely on modern publishing or converted legacy publications.
In fact, since its launch in 2020, the queryable database SIBiLS has been retrieving relevant full-text papers directly from the NIH’s PubMed Central, including Pensoft’s ZooKeys, PhytoKeys, MycoKeys, Biodiversity Data Journal, and Comparative Cytogenetics.
However, there were still gaps to be bridged before SIBiLS could truly be called “the Biodiversity PMC”, and these were mainly related to the quantity and breadth of content. While the above-mentioned five journals by Pensoft had long been indexed by SIBiLS through harvesting PMC, they were quite an exception since, a few years ago, a reorganization at PMC shifted the focus of the database to almost exclusively biomedical content, leaving biodiversity journals out of its scope.
Meanwhile, although Plazi has been feeding SIBiLS with a growing number of taxonomic treatments and visual data as it exponentially increased the number of publishers and journals it mined, various biodiversity data (e.g. genetic, molecular, ecological) published in parts of the article narratives other than taxon treatments could not make it to the portal.
“We are aware of the benefits and practical uses that PMC offers to its users, so we cannot miss the opportunity to incorporate this well-proven approach to navigate the data deluge in biodiversity science. Undoubtedly, it is an extremely ambitious and demanding task. Yet, I believe that, at the BiCIKL consortium, we have made it clear that we have the necessary expertise, know-how, and aspiration to take on the challenge,”
said Prof. Lyubomir Penev, founder/CEO at Pensoft and project coordinator of BiCIKL.
“For far too long, scientific knowledge about biodiversity has been imprisoned in a continuously growing corpus of scientific outputs, which – most of the time – are published in unstructured formats, such as PDF, or as paywalled content, and often locked by both! This means that they are – at best – difficult to access and comprehend by computer algorithms. In the meantime, we need all that knowledge, in order to accelerate our understanding of the dynamics of the global biodiversity crisis and to efficiently assess the impact of climate change. This is why the need for advanced workflows and tools to annotate, mine, query, and discover new facts from the available literature is more than obvious,”
added Dr. Donat Agosti, President at Plazi.
“In the course of the BiCIKL project, at SIBiLS, we started indexing a larger set of biodiversity-related contents in the broad sense, including environmental sciences and ecology, to build a new literature database, or what we now call ‘Biodiversity PMC’. Now, with the help of Plazi and Pensoft, we provide a unique entry point to half a million taxonomic treatments, which were not included in the original PubMed Central. Next on the list is to expand our network of literature sources and continue this exponential growth of queryable biodiversity knowledge to turn Biodiversity PMC into the ‘One Health’ library. We promise to keep you posted,”
stated Dr. Patrick Ruch, Group Leader at SIB and Head of Research at HES-SO, HEG Geneva, Switzerland.
***
Follow the BiCIKL Project on Twitter and Facebook. Join the conversation on Twitter at #BiCIKL_H2020.
***
About the SIB Swiss Institute of Bioinformatics:
SIB is an internationally recognized non-profit organization dedicated to biological and biomedical data science. SIB’s data scientists are passionate about creating knowledge and solving complex questions in many fields, from biodiversity and evolution to