Digital Libraries
- [1] arXiv:2407.02533 [pdf, html, other]
Title: Determining Research Priorities Using Machine Learning
Comments: 11 pages, 10 figures
Subjects: Digital Libraries (cs.DL); Instrumentation and Methods for Astrophysics (astro-ph.IM)
We summarize our exploratory investigation into whether Machine Learning (ML) techniques applied to publicly available professional text can substantially augment strategic planning for astronomy. We find that an approach based on Latent Dirichlet Allocation (LDA) using content drawn from astronomy journal papers can be used to infer high-priority research areas. While the LDA models are challenging to interpret, we find that they may be strongly associated with meaningful keywords and scientific papers which allow for human interpretation of the topic models.
Significant correlation is found between the results of applying these models to the previous decade of astronomical research ("1998-2010" corpus) and the contents of the science frontier panel report which contains high-priority research areas identified by the 2010 National Academies' Astronomy and Astrophysics Decadal Survey ("DS2010" corpus). Significant correlations also exist between model results of the 1998-2010 corpus and the submitted whitepapers to the Decadal Survey ("whitepapers" corpus). Importantly, we derive predictive metrics based on these results which can provide leading indicators of which content modeled by the topic models will become highly cited in the future. Using these identified metrics and the associations between papers and topic models it is possible to identify important papers for planners to consider.
A preliminary version of our work was presented by Thronson et al. 2021 and Thomas et al. 2022.
- [2] arXiv:2407.03192 [pdf, other]
Title: CiteAssist: A System for Automated Preprint Citation and BibTeX Generation
Comments: Published at SDProc @ ACL 2024
Subjects: Digital Libraries (cs.DL); Computation and Language (cs.CL)
We present CiteAssist, a system to automate the generation of BibTeX entries for preprints, streamlining the process of bibliographic annotation. Our system extracts metadata, such as author names, titles, publication dates, and keywords, to create standardized annotations within the document. CiteAssist automatically attaches the BibTeX citation to the end of a PDF and links it on the first page of the document so other researchers gain immediate access to the correct citation of the article. This method promotes platform flexibility by ensuring that annotations remain accessible regardless of the repository used to publish or access the preprint. The annotations remain available even if the preprint is viewed externally to CiteAssist. Additionally, the system adds relevant related papers based on extracted keywords to the preprint, providing researchers with additional publications besides those in related work for further reading. Researchers can enhance their preprint organization and reference management workflows through a free and publicly available web interface.
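As a rough illustration of the metadata-to-BibTeX step described in this abstract, the following minimal Python sketch renders a `@misc` entry from a dictionary of fields. It is not CiteAssist's implementation: the function name `make_bibtex`, the citation key `citeassist2024`, and the choice of the `@misc` entry type are assumptions made for this example, and only fields that appear in this listing (title, year, arXiv identifier, primary subject class) are filled in.

```python
def make_bibtex(key: str, fields: dict[str, str]) -> str:
    """Render a @misc BibTeX entry from a mapping of field names to values.

    Hypothetical helper for illustration only; not part of CiteAssist.
    """
    lines = [f"@misc{{{key},"]
    for name, value in fields.items():
        # Wrap each value in braces so BibTeX preserves its capitalization.
        lines.append(f"  {name} = {{{value}}},")
    lines.append("}")
    return "\n".join(lines)


# Field values are taken from the listing entry above; the key is made up.
entry = make_bibtex(
    "citeassist2024",
    {
        "title": "CiteAssist: A System for Automated Preprint Citation and BibTeX Generation",
        "year": "2024",
        "eprint": "2407.03192",
        "archivePrefix": "arXiv",
        "primaryClass": "cs.DL",
    },
)
print(entry)
```

A real system would also need to escape special characters in the extracted metadata and add the author list, which this listing does not provide.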
New submissions for Thursday, 4 July 2024 (showing 2 of 2 entries)
- [3] arXiv:2402.04854 (replaced) [pdf, html, other]
Title: Hierarchical Tree-structured Knowledge Graph For Academic Insight Survey
Comments: This paper has been accepted by 'The 18TH International Conference on INnovations in Intelligent SysTems and Applications (INISTA 2024)'
Subjects: Digital Libraries (cs.DL); Computation and Language (cs.CL); Machine Learning (cs.LG)
Research surveys have always posed a challenge for beginner researchers who lack research training. These researchers struggle to understand the directions within their research topic and to discover new research findings within a short time. One way to provide intuitive assistance to beginner researchers is by offering relevant knowledge graphs (KG) and recommending related academic papers. However, existing navigation knowledge graphs primarily rely on keywords in the research field and often fail to present the logical hierarchy among multiple related papers clearly. Moreover, most recommendation systems for academic papers simply rely on high text similarity, which can leave researchers confused as to why a particular article is being recommended. They may fail to grasp the important information they hope to obtain about the insight connection between "Issue resolved" and "Issue finding". To address these issues, this study aims to support research insight surveys for beginner researchers by establishing a hierarchical tree-structured knowledge graph that reflects the inheritance insight of research topics and the relevance insight among the academic papers.
- [4] arXiv:2406.07016 (replaced) [pdf, html, other]
Title: Delving into ChatGPT usage in academic writing through excess vocabulary
Comments: v2: Updating dataset, figures and numbers to include all PubMed abstracts until end of June 2024
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computers and Society (cs.CY); Digital Libraries (cs.DL); Social and Information Networks (cs.SI)
Recent large language models (LLMs) can generate and revise text with human-level performance, and have been widely commercialized in systems like ChatGPT. These models come with clear limitations: they can produce inaccurate information, reinforce existing biases, and be easily misused. Yet, many scientists have been using them to assist their scholarly writing. How widespread is LLM usage in the academic literature currently? To answer this question, we use an unbiased, large-scale approach, free from any assumptions on academic LLM usage. We study vocabulary changes in 14 million PubMed abstracts from 2010-2024, and show how the appearance of LLMs led to an abrupt increase in the frequency of certain style words. Our analysis based on excess word usage suggests that at least 10% of 2024 abstracts were processed with LLMs. This lower bound differed across disciplines, countries, and journals, and was as high as 30% for some PubMed sub-corpora. We show that the appearance of LLM-based writing assistants has had an unprecedented impact on the scientific literature, surpassing the effect of major world events such as the Covid pandemic.
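The idea of excess word usage in this abstract can be sketched in a few lines of Python. This is a simplified illustration and not the authors' method: here the expected frequency is just the mean over a set of baseline years, whereas the abstract does not specify how the paper forms its expectation. The function names, the example word "delve", and the toy corpus are all assumptions invented for this sketch and carry no empirical meaning.

```python
import re


def doc_frequency(abstracts, word):
    """Fraction of abstracts containing `word` at least once (case-insensitive)."""
    pattern = re.compile(rf"\b{re.escape(word)}\b", re.IGNORECASE)
    hits = sum(1 for text in abstracts if pattern.search(text))
    return hits / len(abstracts)


def excess_frequency(corpus_by_year, word, baseline_years, target_year):
    """Observed frequency in the target year minus the mean baseline frequency.

    Assumption: a flat mean over baseline years stands in for the expected
    frequency; a real analysis would likely model the pre-existing trend.
    """
    baseline = [doc_frequency(corpus_by_year[y], word) for y in baseline_years]
    expected = sum(baseline) / len(baseline)
    observed = doc_frequency(corpus_by_year[target_year], word)
    return observed - expected


# Toy corpus, invented purely to exercise the functions above.
toy_corpus = {
    2021: ["We study protein folding.", "We delve into gene regulation."],
    2022: ["We measure cell growth.", "A survey of imaging methods."],
    2024: ["We delve into tumor biology.", "We delve into imaging data."],
}

gap = excess_frequency(toy_corpus, "delve", [2021, 2022], 2024)
print(f"excess frequency gap: {gap:.2f}")
```

A positive gap means the word appears in a larger share of target-year abstracts than the baseline would suggest; aggregating such gaps across many words is one way a lower bound like the one quoted above could be approached.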