TrustDataFilter: Leveraging Trusted Knowledge Base Data for More Effective Filtering of Unknown Information

Zhang, Jinghong; Cui, Yidong; Wang, Weiling; Cheng, Xianyou

Computer Science > Information Retrieval

arXiv:2502.15714 (cs)

[Submitted on 25 Jan 2025]


Authors: Jinghong Zhang, Yidong Cui, Weiling Wang, Xianyou Cheng


Abstract: With the advancement of technology and changes in the market, demand for domain-specific knowledge bases has been increasing, whether to improve model performance or to promote enterprise innovation and competitiveness. Domain-specific knowledge bases are typically built from web crawlers or existing industry databases, which leads to problems with the accuracy and consistency of the data. To address these challenges, we take into account a characteristic of domain data, namely that its internal knowledge is interconnected, and propose the Self-Natural Language Inference Data Filtering (self-nli-TDF) framework. The framework compares trusted, already-filtered knowledge with the data to be filtered and infers the natural language inference (NLI) relationship between them, thereby improving filtering performance. It uses plug-and-play large language models for trustworthiness assessment and employs the RoBERTa-MNLI model from the NLI domain for reasoning. We constructed three datasets in the domains of biology, radiation, and science, and conducted experiments using RoBERTa, GPT-3.5, and a locally deployed Qwen2 model. The experimental results show that the framework improves filtering quality, producing more consistent and reliable filtering results.
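To make the filtering idea in the abstract concrete, the following is a minimal Python sketch under stated assumptions, not the authors' implementation. Each trusted knowledge-base statement serves as the NLI premise and each unverified item as the hypothesis. The `NLIScorer` and `TrustScorer` interfaces, the shared 0.5 `threshold`, the max-aggregation over trusted facts, and the reject/accept/fall-back decision rule are all hypothetical; the abstract does not say how the NLI and LLM signals are combined. In practice the NLI scorer might wrap a RoBERTa-MNLI checkpoint and the trust scorer might prompt an LLM; here both are replaced by toy stand-ins (`toy_nli`, `toy_trust`) so the example runs without any model.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence

# Hypothetical interfaces (not from the paper):
# an NLI scorer returns MNLI-style label probabilities for (premise, hypothesis),
# e.g. a wrapper around a RoBERTa-MNLI checkpoint; a trust scorer returns a
# 0..1 trustworthiness score, e.g. obtained by prompting an LLM.
NLIScorer = Callable[[str, str], Dict[str, float]]
TrustScorer = Callable[[str], float]


@dataclass
class FilterDecision:
    candidate: str
    support: float   # max entailment probability over all trusted facts
    conflict: float  # max contradiction probability over all trusted facts
    keep: bool
    reason: str


def filter_candidates(
    trusted: Sequence[str],
    candidates: Sequence[str],
    nli: NLIScorer,
    trust: TrustScorer,
    threshold: float = 0.5,
) -> List[FilterDecision]:
    decisions = []
    for cand in candidates:
        support = conflict = 0.0
        for fact in trusted:
            # Trusted knowledge is the premise; the unverified item is the hypothesis.
            probs = nli(fact, cand)
            support = max(support, probs["entailment"])
            conflict = max(conflict, probs["contradiction"])
        # Assumed decision rule: contradiction by trusted data rejects, entailment
        # accepts, and anything left neutral falls back to the trust scorer.
        if conflict >= threshold:
            keep, reason = False, "contradicted"
        elif support >= threshold:
            keep, reason = True, "entailed"
        else:
            keep, reason = trust(cand) >= threshold, "trust_score"
        decisions.append(FilterDecision(cand, support, conflict, keep, reason))
    return decisions


def toy_nli(premise: str, hypothesis: str) -> Dict[str, float]:
    # Hand-labelled stand-in for a real NLI model, for illustration only.
    table = {
        ("DNA is double-stranded.", "DNA has two strands."): "entailment",
        ("DNA is double-stranded.", "DNA is single-stranded."): "contradiction",
    }
    probs = {"entailment": 0.0, "neutral": 0.0, "contradiction": 0.0}
    probs[table.get((premise, hypothesis), "neutral")] = 1.0
    return probs


def toy_trust(text: str) -> float:
    # Toy stand-in for an LLM trustworthiness judgement.
    return 0.9 if "protein" in text else 0.1


if __name__ == "__main__":
    trusted = ["DNA is double-stranded."]
    candidates = [
        "DNA has two strands.",
        "DNA is single-stranded.",
        "Ribosomes synthesize proteins.",
        "Cells are made of cheese.",
    ]
    for d in filter_candidates(trusted, candidates, toy_nli, toy_trust):
        print(f"{d.keep!s:5} {d.reason:12} {d.candidate}")
```

Running the script prints one line per candidate: the two DNA statements are accepted and rejected by the NLI check alone, while the two neutral statements are decided by the toy trust score. Taking the maximum over trusted facts means a single strongly contradicting fact is enough to reject an item; averaging or top-k aggregation would be equally plausible variants.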

Comments:	12 pages, 8 figures, submitted to IEEE Transactions on Knowledge and Data Engineering
Subjects:	Information Retrieval (cs.IR); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Cite as:	arXiv:2502.15714 [cs.IR]
	(or arXiv:2502.15714v1 [cs.IR] for this version)
	https://doi.org/10.48550/arXiv.2502.15714

Submission history

From: Yidong Cui
[v1] Sat, 25 Jan 2025 04:18:35 UTC (1,434 KB)
