Computer Science > Computation and Language
[Submitted on 3 Jan 2024 (this version), latest version 25 Apr 2024 (v3)]
Title:A First Look at Information Highlighting in Stack Overflow Answers
View PDF HTML (experimental)Abstract:Context: Navigating the knowledge of Stack Overflow (SO) remains challenging. To make the posts vivid to users, SO allows users to write and edit posts with Markdown or HTML so that users can leverage various formatting styles (e.g., bold, italic, and code) to highlight the important information. Nonetheless, there have been limited studies on the highlighted information. Objective: We carried out the first large-scale exploratory study on the information highlighted in SO answers in our recent study. To extend our previous study, we develop approaches to automatically recommend highlighted content with formatting styles using neural network architectures initially designed for the Named Entity Recognition task. Method: In this paper, we studied 31,169,429 answers of Stack Overflow. For training recommendation models, we choose CNN and BERT models for each type of formatting (i.e., Bold, Italic, Code, and Heading) using the information highlighting dataset we collected from SO answers. Results: Our models based on CNN architecture achieve precision ranging from 0.71 to 0.82. The trained model for automatic code content highlighting achieves a recall of 0.73 and an F1 score of 0.71, outperforming the trained models for other formatting styles. The BERT models have even lower recalls and F1 scores than the CNN models. Our analysis of failure cases indicates that the majority of the failure cases are missing identification (i.e., the model misses the content that is supposed to be highlighted) due to the models tend to learn the frequently highlighted words while struggling to learn less frequent words. Conclusion: Our findings suggest that it is possible to develop recommendation models for highlighting information for answers with different formatting styles on Stack Overflow.
Submission history
From: Shahla Shaan Ahmed [view email][v1] Wed, 3 Jan 2024 00:13:52 UTC (280 KB)
[v2] Thu, 1 Feb 2024 01:32:48 UTC (280 KB)
[v3] Thu, 25 Apr 2024 22:18:27 UTC (1,001 KB)
Current browse context:
cs.CL
References & Citations
Bibliographic and Citation Tools
Bibliographic Explorer (What is the Explorer?)
Litmaps (What is Litmaps?)
scite Smart Citations (What are Smart Citations?)
Code, Data and Media Associated with this Article
CatalyzeX Code Finder for Papers (What is CatalyzeX?)
DagsHub (What is DagsHub?)
Gotit.pub (What is GotitPub?)
Papers with Code (What is Papers with Code?)
ScienceCast (What is ScienceCast?)
Demos
Recommenders and Search Tools
Influence Flower (What are Influence Flowers?)
Connected Papers (What is Connected Papers?)
CORE Recommender (What is CORE?)
arXivLabs: experimental projects with community collaborators
arXivLabs is a framework that allows collaborators to develop and share new arXiv features directly on our website.
Both individuals and organizations that work with arXivLabs have embraced and accepted our values of openness, community, excellence, and user data privacy. arXiv is committed to these values and only works with partners that adhere to them.
Have an idea for a project that will add value for arXiv's community? Learn more about arXivLabs.