LLM-CI: Assessing Contextual Integrity Norms in Language Models

Shvartzshnaider, Yan; Duddu, Vasisht; Lacalamita, John

Computer Science > Machine Learning

arXiv:2409.03735 (cs)

[Submitted on 5 Sep 2024]

Abstract: Large language models (LLMs), while memorizing parts of their training data scraped from the Internet, may also inadvertently encode societal preferences and norms. As these models are integrated into sociotechnical systems, it is crucial that the norms they encode align with societal expectations. These norms could vary across models, hyperparameters, optimization techniques, and datasets. Assessing them is especially challenging due to prompt sensitivity: small variations in prompts yield different responses, rendering existing assessment methodologies unreliable. There is a need for a comprehensive framework covering various models, optimization techniques, and datasets, along with a reliable methodology to assess encoded norms.
We present LLM-CI, the first open-sourced framework to assess privacy norms encoded in LLMs. LLM-CI uses a Contextual Integrity-based factorial vignette methodology to assess the encoded norms across different contexts and LLMs. We propose a multi-prompt assessment methodology that addresses prompt sensitivity by assessing norms only from prompts that yield consistent responses across multiple variants. Using LLM-CI and our proposed methodology, we comprehensively evaluate LLMs using IoT and COPPA vignette datasets from prior work, examining the impact of model properties (e.g., hyperparameters, capacity) and optimization strategies (e.g., alignment, quantization).
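
The two ideas in the abstract, factorial vignettes built from Contextual Integrity parameters and a multi-prompt consistency filter, can be illustrated with a minimal Python sketch. This is not the LLM-CI implementation: the parameter values, the prompt templates, the function names, the `min_agreement` threshold, and the `toy_model` stand-in for an LLM are all assumptions made for illustration.

```python
from collections import Counter
from itertools import product
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical CI parameter values, chosen only for illustration.
SENDERS = ["a fitness tracker", "a smart speaker"]
RECIPIENTS = ["its manufacturer", "an advertiser"]
ATTRIBUTES = ["the owner's heart rate", "the owner's location"]
PRINCIPLES = ["if the owner has given consent", "if the data is sold"]

# Hypothetical paraphrases of one question (the prompt variants).
TEMPLATES = [
    "How acceptable is it for {s} to share {a} with {r} {p}?",
    "Rate the acceptability of {s} sending {a} to {r} {p}.",
    "Is it acceptable that {s} shares {a} with {r} {p}?",
]

Vignette = Tuple[str, str, str, str]


def build_vignettes() -> List[Vignette]:
    """Full factorial design: every combination of the CI parameters."""
    return list(product(SENDERS, RECIPIENTS, ATTRIBUTES, PRINCIPLES))


def prompt_variants(vignette: Vignette) -> List[str]:
    """Render one vignette through every prompt template."""
    s, r, a, p = vignette
    return [t.format(s=s, r=r, a=a, p=p) for t in TEMPLATES]


def consistent_label(responses: List[str], min_agreement: float = 1.0) -> Optional[str]:
    """Return the majority response if enough variants agree, else None."""
    label, count = Counter(responses).most_common(1)[0]
    return label if count / len(responses) >= min_agreement else None


def assess(model: Callable[[str], str], min_agreement: float = 1.0) -> Dict[Vignette, str]:
    """Keep only vignettes whose prompt variants yield consistent responses."""
    norms: Dict[Vignette, str] = {}
    for vignette in build_vignettes():
        responses = [model(prompt) for prompt in prompt_variants(vignette)]
        label = consistent_label(responses, min_agreement)
        if label is not None:
            norms[vignette] = label
    return norms


def toy_model(prompt: str) -> str:
    """Stand-in for an LLM: stable on consent, wording-sensitive otherwise."""
    if "consent" in prompt:
        return "acceptable"
    return "unacceptable" if prompt.startswith("How") else "neutral"


if __name__ == "__main__":
    kept = assess(toy_model)
    print(f"{len(kept)} of {len(build_vignettes())} vignettes gave consistent responses")
```

In this sketch a vignette contributes to the assessed norms only when its prompt variants agree; lowering the assumed `min_agreement` threshold trades strictness for coverage.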

Comments:	20 pages, 8 Figures, 4 Tables
Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR); Computers and Society (cs.CY)
Cite as:	arXiv:2409.03735 [cs.LG]
	(or arXiv:2409.03735v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2409.03735

Submission history

From: Yan Shvartzshnaider
[v1] Thu, 5 Sep 2024 17:50:31 UTC (3,305 KB)
