Determinantal Point Process Attention Over Grid Cell Code Supports Out of Distribution Generalization

Mondal, Shanka Subhra; Frankland, Steven; Webb, Taylor; Cohen, Jonathan D.


arXiv:2305.18417v2 (cs)

[Submitted on 28 May 2023 (v1), revised 18 Jan 2024 (this version, v2), latest version 23 Jan 2024 (v3)]

Abstract: Deep neural networks have made tremendous gains in emulating human-like intelligence, and are increasingly used to understand how the brain may solve the complex computational problems on which such intelligence relies. However, they still fall short of, and therefore fail to provide insight into how the brain supports, the strong forms of generalization of which humans are capable. One such case is out-of-distribution (OOD) generalization: successful performance on test examples that lie outside the distribution of the training set. Here, we identify properties of processing in the brain that may contribute to this ability. We describe a two-part algorithm that draws on specific features of neural computation to achieve OOD generalization, and provide a proof of concept by evaluating performance on two challenging cognitive tasks. First, we draw on the fact that the mammalian brain represents metric spaces using a grid cell code (e.g., in the entorhinal cortex): abstract representations of relational structure, organized in recurring motifs that cover the representational space. Second, we propose an attentional mechanism that operates over the grid cell code using a Determinantal Point Process (DPP), which we call DPP attention (DPP-A), a transformation that ensures maximum sparseness in the coverage of that space. We show that a loss function that combines standard task-optimized error with DPP-A can exploit the recurring motifs in the grid cell code, and can be integrated with common architectures to achieve strong OOD generalization performance on analogy and arithmetic tasks. This provides both an interpretation of how the grid cell code in the mammalian brain may contribute to generalization performance and a potential means of improving such capabilities in artificial neural networks.
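The two ingredients named in the abstract, a periodic grid-like code and a determinant-based selection over it, can be illustrated with a toy sketch. This is not the authors' implementation: the one-hot phase code in `grid_code`, the choice of periods, the use of a feature covariance matrix as the DPP kernel, and the greedy search in `greedy_dpp_select` are all assumptions made here for illustration. The sketch only shows the general idea that maximizing a log-determinant favors a subset of features that are high-variance and mutually decorrelated.

```python
import numpy as np

def grid_code(xs, periods=(3, 5, 7)):
    """Toy 1-D periodic code (hypothetical): one one-hot phase block per period."""
    xs = np.asarray(xs)
    blocks = [np.eye(p)[xs % p] for p in periods]
    return np.concatenate(blocks, axis=1)  # shape: (len(xs), sum(periods))

def greedy_dpp_select(features, k, eps=1e-6):
    """Greedily pick k feature columns that maximize the log-determinant
    of the corresponding covariance submatrix (a common DPP MAP heuristic)."""
    dim = features.shape[1]
    # Small ridge term keeps every principal submatrix positive definite.
    cov = np.cov(features, rowvar=False) + eps * np.eye(dim)
    selected, best_val = [], -np.inf
    for _ in range(k):
        best, best_val = None, -np.inf
        for j in range(dim):
            if j in selected:
                continue
            idx = selected + [j]
            sign, logdet = np.linalg.slogdet(cov[np.ix_(idx, idx)])
            val = logdet if sign > 0 else -np.inf
            if val > best_val:
                best, best_val = j, val
        selected.append(best)
    return selected, best_val

# Encode the integers 0..104 (one full cycle of periods 3, 5, 7).
code = grid_code(np.arange(105))
selected, logdet = greedy_dpp_select(code, k=4)
print("selected feature indices:", selected)
print("log det of selected covariance block:", logdet)
```

In a trainable model, a term of this kind would typically be combined with the task loss rather than solved greedily; how the paper does so is described in its full text, not in this sketch.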

Comments:	29 pages (including Appendix), 21 figures
Subjects:	Machine Learning (cs.LG); Neurons and Cognition (q-bio.NC)
Cite as:	arXiv:2305.18417 [cs.LG]
	(or arXiv:2305.18417v2 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2305.18417

Submission history

From: Shanka Subhra Mondal
[v1] Sun, 28 May 2023 19:07:55 UTC (13,439 KB)
[v2] Thu, 18 Jan 2024 15:50:01 UTC (16,516 KB)
[v3] Tue, 23 Jan 2024 10:50:06 UTC (16,516 KB)
