Information Theory
- [1] arXiv:2406.18985 [pdf, html, other]
Title: Exploiting Structured Sparsity in Near Field: From the Perspective of Decomposition
Comments: This article has been accepted for publication in IEEE Commag
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
Structured sparsity can be leveraged in traditional far-field channels, greatly facilitating efficient sparse channel recovery by reducing the complexity and overhead to the order of the number of scatterers. However, given the fundamental shift from planar-wave-based far-field modeling to spherical-wave-based near-field modeling, whether these benefits persist in the near-field regime remains an open issue. To answer this question, this article delves into structured sparsity in the near-field realm, examining its peculiarities and challenges. In particular, we present the key features of near-field structured sparsity in contrast to the far-field counterpart, drawing from both physical and mathematical perspectives. Having exposed the theoretical bottlenecks, we bypass them by decoupling the geometric parameters of the scatterers, an approach termed the triple parametric decomposition (TPD) framework. It is demonstrated that our novel TPD framework can achieve robust recovery of near-field sparse channels by exploiting the potential structured sparsity and avoiding the curse of complexity and overhead.
- [2] arXiv:2406.19002 [pdf, other]
Title: Coded Cooperative Networks for Semi-Decentralized Federated Learning
Subjects: Information Theory (cs.IT)
To enhance straggler resilience in federated learning (FL) systems, a semi-decentralized approach has been recently proposed, enabling collaboration between clients. Unlike the existing semi-decentralized schemes, which adaptively adjust the collaboration weight according to the network topology, this letter proposes a deterministic coded network that leverages wireless diversity for semi-decentralized FL without requiring prior information about the entire network. Furthermore, the theoretical analyses of the outage and the convergence rate of the proposed scheme are provided. Finally, the superiority of our proposed method over benchmark methods is demonstrated through comprehensive simulations.
- [3] arXiv:2406.19026 [pdf, html, other]
Title: Completely decomposable rank-metric codes
Subjects: Information Theory (cs.IT)
In this paper, we investigate completely decomposable rank-metric codes, i.e. rank-metric codes that are the direct sum of 1-dimensional maximum rank distance codes. We study the weight distribution of such codes, characterizing codewords with certain rank weights. Additionally, we obtain classification results for codes with the largest number of minimum weight codewords within the class of completely decomposable codes.
- [4] arXiv:2406.19084 [pdf, html, other]
Title: Spatial Multiplexing in Near-Field Line-of-Sight MIMO Communications: Paraxial and Non-Paraxial Deployments
Comments: This work has been accepted in IEEE Transactions on Green Communications and Networking
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
Sixth generation (6G) wireless networks are envisioned to include aspects of energy footprint reduction (sustainability), besides those of network capacity and connectivity, at the design stage. This paradigm change requires radically new physical layer technologies. Notably, the integration of large-aperture arrays and the transmission over high frequency bands, such as the sub-terahertz spectrum, are two promising options. In many communication scenarios of practical interest, the use of large antenna arrays in the sub-terahertz frequency range often results in short-range transmission distances that are characterized by line-of-sight channels, in which pairs of transmitters and receivers are located in the (radiating) near field of one another. These features make the traditional designs, based on the far-field approximation, for multiple-input multiple-output (MIMO) systems sub-optimal in terms of spatial multiplexing gains. To overcome these limitations, new designs for MIMO systems are required, which account for the spherical wavefront that characterizes the electromagnetic waves in the near field, in order to ensure the highest spatial multiplexing gain without increasing the power expenditure. In this paper, we introduce an analytical framework for optimizing the deployment of antenna arrays in line-of-sight channels, which can be applied to paraxial and non-paraxial network deployments. In the paraxial setting, we devise a simpler analytical framework, which, compared to those available in the literature, provides explicit information about the impact of key design parameters. In the non-paraxial setting, we introduce a novel analytical framework that allows us to identify a set of sufficient conditions to be fulfilled for achieving the highest spatial multiplexing gain. The proposed designs are validated with numerical simulations.
- [5] arXiv:2406.19248 [pdf, html, other]
Title: Staggered Quantizers for Perfect Perceptual Quality: A Connection between Quantizers with Common Randomness and Without
Comments: 6 pages, 4 figures; to appear in the First "Learn to compression" Workshop @ ISIT 2024 as a spotlight paper
Subjects: Information Theory (cs.IT)
The rate-distortion-perception (RDP) framework has attracted significant recent attention due to its application in neural compression. It is important to understand the underlying mechanism connecting procedures with common randomness and those without. Different from previous efforts, we study this problem from a quantizer design perspective. By analyzing an idealized setting, we provide an interpretation of the advantage of dithered quantization in the RDP setting, which further allows us to make a conceptual connection between randomized (dithered) quantizers and quantizers without common randomness. This new understanding leads to a new procedure for RDP coding based on staggered quantizers.
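As background for the abstract above, here is a minimal sketch of subtractive dithered quantization, the textbook construction in which encoder and decoder share a random dither (common randomness). It is not the staggered-quantizer procedure of the paper. The function names, the step size `delta`, and the Gaussian test source are illustrative assumptions.

```python
import random

def dithered_quantize(x, delta, dither):
    """Encoder: add the shared dither, then round to the nearest multiple of delta."""
    return delta * round((x + dither) / delta)

def dithered_reconstruct(q, dither):
    """Decoder: subtract the same dither, known through common randomness."""
    return q - dither

def demo(num_samples=10000, delta=0.5, seed=0):
    """Return the reconstruction errors for a Gaussian source (assumed for illustration)."""
    rng = random.Random(seed)
    errors = []
    for _ in range(num_samples):
        x = rng.gauss(0.0, 1.0)
        # Dither is uniform over one quantization step.
        u = rng.uniform(-delta / 2, delta / 2)
        x_hat = dithered_reconstruct(dithered_quantize(x, delta, u), u)
        errors.append(x_hat - x)
    return errors

errors = demo()
mean_err = sum(errors) / len(errors)
var_err = sum((e - mean_err) ** 2 for e in errors) / len(errors)
print(f"max |error| = {max(abs(e) for e in errors):.4f}")
print(f"error variance = {var_err:.5f} (delta^2/12 = {0.5 ** 2 / 12:.5f})")
```

With a dither that is uniform over one step, the reconstruction error is bounded by `delta / 2` and its variance is close to `delta**2 / 12` regardless of the source, which is the property that makes dithered quantizers convenient to analyze.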
- [6] arXiv:2406.19334 [pdf, html, other]
Title: Multi-RIS-Empowered Multiple Access: A Distributed Sum-Rate Maximization Approach
Comments: Submitted to an IEEE Journal
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
The plethora of wirelessly connected devices, whose deployment density is expected to increase substantially in the upcoming sixth generation (6G) of wireless networks, will naturally necessitate substantial advances in multiple access schemes. Reconfigurable Intelligent Surfaces (RISs) constitute a candidate 6G technology capable of offering dynamic over-the-air signal propagation programmability, which can be optimized for efficient non-orthogonal access of a multitude of devices. In this paper, we study the downlink of a wideband communication system comprising multiple multi-antenna Base Stations (BSs), each wishing to serve an associated single-antenna user with the assistance of a Beyond Diagonal (BD) and frequency-selective RIS. Under the assumption that each BS performs Orthogonal Frequency Division Multiplexing (OFDM) transmissions and exclusively controls a distinct RIS, we focus on the sum-rate maximization problem and present a distributed joint design of the linear precoders at the BSs as well as the tunable capacitances and the switch selection matrices at the multiple BD RISs. The formulated non-convex design optimization problem is solved via successive concave approximation, which requires only minimal cooperation among the BSs. Our extensive simulation results showcase the performance superiority of the proposed cooperative scheme over non-cooperative benchmarks, indicating the performance gains with BD RISs via the presented optimized frequency-selective operation for various scenarios.
New submissions for Friday, 28 June 2024 (showing 6 of 6 entries)
- [7] arXiv:2406.18598 (cross-list from eess.SP) [pdf, html, other]
Title: CubeSat-Enabled Free-Space Optics: Joint Data Communication and Fine Beam Tracking
Comments: 13 pages, 7 figures
Subjects: Signal Processing (eess.SP); Information Theory (cs.IT)
The integration of CubeSats with Free Space Optical (FSO) links accelerates a major advancement in high-throughput, low-Earth orbit communication systems. However, CubeSats face challenges such as size, weight, and power (SWaP) limitations, as well as vibrations that cause fluctuations in the angle-of-arrival (AoA) of the optical beam at the receiver. These practical challenges make establishing CubeSat-assisted FSO links complicated. To mitigate AoA fluctuations, we expand the receiver's field of view and track the location of the focused beam spot using an array of avalanche photodiodes at the receiver. Initially, we model the optical channel between the transmitter and the detector array. Furthermore, to reduce the computational load of maximum likelihood sequence detection, which is infeasible for CubeSats due to SWaP constraints, we propose a sub-optimal blind sequence data detection approach that relies on the generalized likelihood ratio test (GLRT) criterion. We also utilize combining methods such as equal gain combining (EGC) and maximal ratio combining (MRC) for data detection, benchmarking their performance against the GLRT-based method. Numerical results demonstrate that the proposed low-complexity GLRT-based method outperforms the combining methods, achieving performance close to that of the ideal receiver.
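To make the two combining baselines named in the abstract concrete, the sketch below compares the textbook output signal-to-noise ratios of equal gain combining (EGC) and maximal ratio combining (MRC). It assumes co-phased branches with known non-negative amplitudes and equal-variance additive Gaussian noise on every branch. This is a simplification and not the avalanche-photodiode channel model of the paper. The function names and example gains are hypothetical.

```python
def mrc_snr(gains, noise_var=1.0):
    """MRC weights each branch by its gain; output SNR is the sum of branch SNRs."""
    return sum(g * g for g in gains) / noise_var

def egc_snr(gains, noise_var=1.0):
    """EGC adds all branches with unit weight; noise from all n branches accumulates."""
    n = len(gains)
    return sum(gains) ** 2 / (n * noise_var)

# Hypothetical beam spot covering mainly one detector of a four-element array.
gains = [1.0, 0.5, 0.1, 0.0]
print(f"MRC SNR = {mrc_snr(gains):.3f}")
print(f"EGC SNR = {egc_snr(gains):.3f}")
```

By the Cauchy-Schwarz inequality the MRC value is never below the EGC value under these assumptions, and the gap widens when the signal is concentrated on a few array elements, since EGC still collects noise from the unilluminated ones.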
- [8] arXiv:2406.18651 (cross-list from quant-ph) [pdf, html, other]
Title: Contraction of Private Quantum Channels and Private Quantum Hypothesis Testing
Comments: 36 pages; See independent work titled "Sample Complexity of Locally Differentially Private Quantum Hypothesis Testing" by Hao-Chung Cheng, Christoph Hirche, and Cambyse Rouzé
Subjects: Quantum Physics (quant-ph); Cryptography and Security (cs.CR); Information Theory (cs.IT); Machine Learning (cs.LG); Machine Learning (stat.ML)
A quantum generalized divergence by definition satisfies the data-processing inequality; as such, the relative decrease in such a divergence under the action of a quantum channel is at most one. This relative decrease is formally known as the contraction coefficient of the channel and the divergence. Interestingly, there exist combinations of channels and divergences for which the contraction coefficient is strictly less than one. Furthermore, understanding the contraction coefficient is fundamental for the study of statistical tasks under privacy constraints. To this end, here we establish upper bounds on contraction coefficients for the hockey-stick divergence under privacy constraints, where privacy is quantified with respect to the quantum local differential privacy (QLDP) framework, and we fully characterize the contraction coefficient for the trace distance under privacy constraints. With the machinery developed, we also determine an upper bound on the contraction of both the Bures distance and quantum relative entropy relative to the normalized trace distance, under QLDP constraints. Next, we apply our findings to establish bounds on the sample complexity of quantum hypothesis testing under privacy constraints. Furthermore, we study various scenarios in which the sample complexity bounds are tight, while providing order-optimal quantum channels that achieve those bounds. Lastly, we show how private quantum channels provide fairness and Holevo information stability in quantum learning settings.
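The notion of a contraction coefficient can be illustrated with a classical analogue, sketched below under stated assumptions. It computes the classical hockey-stick divergence, sum over i of max(p_i - gamma * q_i, 0), which for gamma = 1 equals the total variation distance, before and after a binary randomized-response channel that satisfies eps-local differential privacy. This does not reproduce the quantum results of the paper. The distributions `p` and `q`, the value of `eps`, and all function names are illustrative.

```python
import math

def hockey_stick(p, q, gamma=1.0):
    """Classical hockey-stick divergence; gamma = 1 gives total variation distance."""
    return sum(max(pi - gamma * qi, 0.0) for pi, qi in zip(p, q))

def apply_channel(p, channel):
    """Push p through a channel, where channel[i][j] = P(output j | input i)."""
    n_out = len(channel[0])
    return [sum(p[i] * channel[i][j] for i in range(len(p))) for j in range(n_out)]

def randomized_response(eps):
    """Binary randomized response: report the true bit with probability e^eps / (1 + e^eps)."""
    keep = math.exp(eps) / (1.0 + math.exp(eps))
    return [[keep, 1.0 - keep], [1.0 - keep, keep]]

p, q = [0.9, 0.1], [0.2, 0.8]
eps = 1.0
ch = randomized_response(eps)
before = hockey_stick(p, q)
after = hockey_stick(apply_channel(p, ch), apply_channel(q, ch))
ratio = after / before
print(f"TV before = {before:.4f}, after = {after:.4f}, ratio = {ratio:.4f}")
print(f"(e^eps - 1) / (e^eps + 1) = {(math.exp(eps) - 1) / (math.exp(eps) + 1):.4f}")
```

For this binary channel the ratio equals (e^eps - 1) / (e^eps + 1), which is strictly less than one for any finite `eps`: the private channel strictly shrinks the distance between the two hypotheses, and that shrinkage is what drives up the sample complexity of private hypothesis testing.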
- [9] arXiv:2406.18655 (cross-list from quant-ph) [pdf, html, other]
Title: Localized statistics decoding: A parallel decoding algorithm for quantum low-density parity-check codes
Comments: 21 pages, 10 figures
Subjects: Quantum Physics (quant-ph); Information Theory (cs.IT)
Quantum low-density parity-check codes are a promising candidate for fault-tolerant quantum computing with considerably reduced overhead compared to the surface code. However, the lack of a practical decoding algorithm remains a barrier to their implementation. In this work, we introduce localized statistics decoding, a reliability-guided inversion decoder that is highly parallelizable and applicable to arbitrary quantum low-density parity-check codes. Our approach employs a parallel matrix factorization strategy, which we call on-the-fly elimination, to identify, validate, and solve local decoding regions on the decoding graph. Through numerical simulations, we show that localized statistics decoding matches the performance of state-of-the-art decoders while reducing the runtime complexity for operation in the sub-threshold regime. Importantly, our decoder is more amenable to implementation on specialized hardware, positioning it as a promising candidate for decoding real-time syndromes from experiments.
- [10] arXiv:2406.18658 (cross-list from quant-ph) [pdf, html, other]
Title: Sample Complexity of Locally Differentially Private Quantum Hypothesis Testing
Comments: 24 pages. Short version accepted at ISIT 2024. This work is independent and concurrent to "Contraction of Private Quantum Channels and Private Quantum Hypothesis Testing" by Theshani Nuradha and Mark M. Wilde
Subjects: Quantum Physics (quant-ph); Information Theory (cs.IT)
Quantum state discrimination is an important problem in many information processing tasks. In this work we are concerned with finding its best possible sample complexity when the states are preprocessed by a quantum channel that is required to be locally differentially private. To that end we provide achievability and converse bounds for different settings. These include symmetric state discrimination in various regimes and the asymmetric case. Along the way, we also prove new sample complexity bounds for the general unconstrained setting. Important tools in this endeavor are new entropy inequalities that we believe to be of independent interest.
- [11] arXiv:2406.19060 (cross-list from quant-ph) [pdf, html, other]
Title: Semi-definite optimization of the measured relative entropies of quantum states and channels
Comments: 33 pages
Subjects: Quantum Physics (quant-ph); Information Theory (cs.IT); Mathematical Physics (math-ph); Optimization and Control (math.OC)
The measured relative entropies of quantum states and channels find operational significance in quantum information theory as achievable error rates in hypothesis testing tasks. They are of interest in the near term, as they correspond to hybrid quantum-classical strategies with technological requirements far less challenging to implement than required by the most general strategies allowed by quantum mechanics. In this paper, we prove that these measured relative entropies can be calculated efficiently by means of semi-definite programming, by making use of variational formulas for the measured relative entropies of states and semi-definite representations of the weighted geometric mean and the operator connection of the logarithm. Not only do the semi-definite programs output the optimal values of the measured relative entropies of states and channels, but they also provide numerical characterizations of optimal strategies for achieving them, which is of significant practical interest for designing hypothesis testing protocols.
- [12] arXiv:2406.19061 (cross-list from math.ST) [pdf, other]
Title: Entrywise dynamics and universality of general first order methods
Subjects: Statistics Theory (math.ST); Information Theory (cs.IT)
General first order methods (GFOMs), including various gradient descent and AMP algorithms, constitute a broad class of iterative algorithms in modern statistical learning problems. Some GFOMs also serve as constructive proof devices, iteratively characterizing the empirical distributions of statistical estimators in the large system limits for any fixed number of iterations.
This paper develops a non-asymptotic, entrywise characterization for a general class of GFOMs. Our characterizations capture the precise entrywise behavior of the GFOMs, and hold universally across a broad class of heterogeneous random matrix models. As a corollary, we provide the first non-asymptotic description of the empirical distributions of the GFOMs beyond Gaussian ensembles.
We demonstrate the utility of these general results in two applications. In the first application, we prove entrywise universality for regularized least squares estimators in the linear model, by controlling the entrywise error relative to a suitably constructed GFOM. This algorithmic proof method also leads to systematically improved averaged universality results for regularized regression estimators in the linear model, and resolves the universality conjecture for (regularized) MLEs in logistic regression. In the second application, we obtain entrywise Gaussian approximations for a class of gradient descent algorithms. Our approach provides non-asymptotic state evolution for the bias and variance of the algorithm along the iteration path, applicable for non-convex loss functions.
The proof relies on a new recursive leave-k-out method that provides almost delocalization for the GFOMs and their derivatives. Crucially, our method ensures entrywise universality for up to poly-logarithmic many iterations, which facilitates effective $\ell_2/\ell_\infty$ control between certain GFOMs and statistical estimators in applications.
Cross submissions for Friday, 28 June 2024 (showing 6 of 6 entries)
- [13] arXiv:2207.12653 (replaced) [pdf, html, other]
Title: Incremental Measurement of Structural Entropy for Dynamic Graphs
Subjects: Information Theory (cs.IT)
Structural entropy is a metric that measures the amount of information embedded in graph structure data under a strategy of hierarchical abstracting. To measure the structural entropy of a dynamic graph, we need to decode the optimal encoding tree corresponding to the best community partitioning for each snapshot. However, the current methods do not support dynamic encoding tree updating and incremental structural entropy computation. To address this issue, we propose Incre-2dSE, a novel incremental measurement framework that dynamically adjusts the community partitioning and efficiently computes the updated structural entropy for each updated graph. Specifically, Incre-2dSE includes incremental algorithms based on two dynamic adjustment strategies for two-dimensional encoding trees, i.e., the naive adjustment strategy and the node-shifting adjustment strategy, which support theoretical analysis of updated structural entropy and incrementally optimize community partitioning towards a lower structural entropy. We conduct extensive experiments on 3 artificial datasets generated by Hawkes Process and 3 real-world datasets. Experimental results confirm that our incremental algorithms effectively capture the dynamic evolution of the communities, reduce time consumption, and provide great interpretability.
- [14] arXiv:2310.20504 (replaced) [pdf, other]
Title: SumComp: Coding for Digital Over-the-Air Computation via the Ring of Integers
Subjects: Information Theory (cs.IT)
Communication and computation are traditionally treated as separate entities, allowing for individual optimizations. However, many applications focus on local information's functionality rather than the information itself. For such cases, harnessing interference for computation in a multiple access channel through digital over-the-air computation can notably increase the computation, as established by the ChannelComp method. However, the coding scheme originally proposed in ChannelComp may suffer from high computational complexity because it is general and is not optimized for specific modulation categories. Therefore, this study considers a specific category of digital modulations for over-the-air computations, QAM and PAM, for which we introduce a novel coding scheme called SumComp. Furthermore, we derive an MSE analysis for SumComp coding in the computation of the arithmetic mean function and establish an upper bound on the MAE for a set of nomographic functions. Simulation results affirm the superior performance of SumComp coding compared to traditional analog over-the-air computation and the original coding in ChannelComp approaches regarding both MSE and MAE over a noisy multiple access channel. Specifically, SumComp coding shows approximately $10$ dB improvements for computing arithmetic and geometric mean on the normalized MSE for low noise scenarios.
- [15] arXiv:2312.02042 (replaced) [pdf, html, other]
Title: Kirchhoff Meets Johnson: In Pursuit of Unconditionally Secure Communication
Comments: 13 pages, 8 figures, 1 table, Wiley Engineering Reports (to appear)
Journal-ref: Wiley Engineering Reports, 2024
Subjects: Information Theory (cs.IT); Cryptography and Security (cs.CR); Signal Processing (eess.SP)
Noise: an enemy to be dealt with and a major factor limiting communication system performance. However, what if there is gold in that garbage? In conventional engineering, our focus is primarily on eliminating, suppressing, combating, or even ignoring noise and its detrimental impacts. Conversely, could we exploit it similarly to biology, which utilizes noise-alike carrier signals to convey information? In this context, the utilization of noise, or noise-alike signals in general, has been put forward as a means to realize unconditionally secure communication systems in the future. In this tutorial article, we begin by tracing the origins of thermal noise-based communication and highlighting one of its significant applications for ensuring unconditionally secure networks: the Kirchhoff-law-Johnson-noise (KLJN) secure key exchange scheme. We then delve into the inherent challenges tied to secure communication and discuss the imperative need for physics-based key distribution schemes in pursuit of unconditional security. Concurrently, we provide a concise overview of quantum key distribution (QKD) schemes and draw comparisons with their KLJN-based counterparts. Finally, extending beyond wired communication loops, we explore the transmission of noise signals over-the-air and evaluate their potential for stealth and secure wireless communication systems.
- [16] arXiv:2401.02118 (replaced) [pdf, html, other]
Title: Radio Map-Based Spectrum Sharing for Joint Communication and Sensing
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
The sixth-generation (6G) network is expected to provide both communication and sensing (C&S) services. However, spectrum scarcity poses a major challenge to the harmonious coexistence of C&S systems. Without effective cooperation, the interference resulting from spectrum sharing impairs the performance of both systems. This paper addresses C&S interference within a distributed network. Unlike traditional schemes that require pilot-based high-frequency interactions between C&S systems, we introduce a third party, named the radio map, to provide large-scale channel state information (CSI). With large-scale CSI, we optimize the transmit power of C&S systems to maximize the signal-to-interference-plus-noise ratio (SINR) for radar detection, while meeting the ergodic rate requirement of the interfered user. Given the non-convexity of both the objective and the constraint, we employ the techniques of auxiliary-function-based scaling and fractional programming for simplification. Subsequently, we propose an iterative algorithm to solve this problem. Simulation results corroborate our idea that extrinsic information, i.e., positions and surroundings, is effective in decoupling C&S interference.
- [17] arXiv:2404.00628 (replaced) [pdf, html, other]
Title: Fluid Antenna Relay Assisted Communication Systems Through Antenna Location Optimization
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
In this paper, we investigate the problem of resource allocation for a fluid antenna relay (FAR) system with antenna location optimization. In the considered model, each user transmits information to a base station (BS) with the help of the FAR. The antenna location of the FAR is flexible and can be adapted to the dynamic location distribution of the users. We formulate a sum rate maximization problem that jointly optimizes the antenna location and bandwidth allocation while meeting the minimum rate requirements, total bandwidth budget, and feasible antenna region constraints. To solve this problem, we obtain the optimal bandwidth in closed form. Based on the optimal bandwidth, the original problem is reduced to the antenna location optimization problem, and an alternating algorithm is proposed. Simulation results verify the effectiveness of the proposed algorithm and show that the sum rate can be increased by up to 125% compared to the conventional schemes.
- [18] arXiv:2404.13685 (replaced) [pdf, html, other]
Title: Second-Order Identification Capacity of AWGN Channels
Comments: 7 pages, 3 figures, 1 table. This paper has been accepted by IEEE ISIT 2024. In response to the reviewer's feedback, we have incorporated additional references, refined the content of the introduction section, and replaced some figures with PDF versions
Subjects: Information Theory (cs.IT)
In this paper, we establish the second-order randomized identification capacity (RID capacity) of the Additive White Gaussian Noise Channel (AWGNC). On the one hand, we obtain a refined version of Hayashi's theorem to prove the achievability part. On the other hand, we investigate the relationship between identification and channel resolvability and propose a finer quantization method to prove the converse part. Consequently, the second-order RID capacity of the AWGNC has the same form as the second-order transmission capacity. The only difference is that the maximum number of messages in RID scales double exponentially in the blocklength.
- [19] arXiv:2405.07665 (replaced) [pdf, html, other]
Title: Partial information decomposition: redundancy as information bottleneck
Comments: Entropy, 2024
Subjects: Information Theory (cs.IT); Machine Learning (stat.ML)
The partial information decomposition (PID) aims to quantify the amount of redundant information that a set of sources provides about a target. Here, we show that this goal can be formulated as a type of information bottleneck (IB) problem, termed the "redundancy bottleneck" (RB). The RB formalizes a tradeoff between prediction and compression: it extracts information from the sources that best predict the target, without revealing which source provided the information. It can be understood as a generalization of "Blackwell redundancy", which we previously proposed as a principled measure of PID redundancy. The "RB curve" quantifies the prediction--compression tradeoff at multiple scales. This curve can also be quantified for individual sources, allowing subsets of redundant sources to be identified without combinatorial optimization. We provide an efficient iterative algorithm for computing the RB curve.
- [20] arXiv:2111.06343 (replaced) [pdf, html, other]
Title: Reliability Function of Quantum Information Decoupling via the Sandwiched Rényi Divergence
Comments: V3: close to published version. V2: presentation improved with a new title, decoupling via measurement added, results and proofs of V1 unchanged
Journal-ref: Commun. Math. Phys. 405, 160, (2024)
Subjects: Quantum Physics (quant-ph); Information Theory (cs.IT); Mathematical Physics (math-ph)
Quantum information decoupling is a fundamental quantum information processing task, which also serves as a crucial tool in a diversity of topics in quantum physics. In this paper, we characterize the reliability function of catalytic quantum information decoupling, that is, the best exponential rate under which perfect decoupling is asymptotically approached. We obtain the exact formula when the decoupling cost is below a critical value. In the high-cost regime, we provide meaningful upper and lower bounds. This result is then applied to quantum state merging, exploiting its inherent connection to decoupling. In addition, as technical tools, we derive the exact exponents for the smoothing of the conditional min-entropy and max-information, and we prove a novel bound for the convex-split lemma. Our results are given in terms of the sandwiched Rényi divergence, providing it with a new type of operational meaning in characterizing how fast the performance of quantum information tasks approaches perfection.
- [21] arXiv:2209.00555 (replaced) [pdf, html, other]
Title: Strong Converse Exponent for Entanglement-Assisted Communication
Comments: V3: close to published version. V2: minor changes, presentation improved
Journal-ref: IEEE Trans. Inf. Theory 70(7), 5017-5029 (2024)
Subjects: Quantum Physics (quant-ph); Information Theory (cs.IT); Mathematical Physics (math-ph)
We determine the exact strong converse exponent for entanglement-assisted classical communication of a quantum channel. Our main contribution is the derivation of an upper bound for the strong converse exponent which is characterized by the sandwiched Rényi divergence. It turns out that this upper bound coincides with the lower bound of Gupta and Wilde (Commun. Math. Phys. 334:867-887, 2015). Thus, the strong converse exponent follows from the combination of these two bounds. Our result has two implications. Firstly, it implies that the exponential bound for the strong converse property of quantum-feedback-assisted classical communication, derived by Cooney, Mosonyi and Wilde (Commun. Math. Phys. 344:797-829, 2016), is optimal. This answers their open question in the affirmative. Hence, we have determined the exact strong converse exponent for this problem as well. Secondly, due to an observation of Leung and Matthews, it can be easily extended to deal with the transmission of quantum information under the assistance of entanglement or quantum feedback, yielding similar results. The above findings provide, for the first time, a complete operational interpretation to the channel's sandwiched Rényi information of order $\alpha > 1$.
- [22] arXiv:2305.11957 (replaced) [pdf, html, other]
Title: Towards understanding neural collapse in supervised contrastive learning with the information bottleneck method
Subjects: Machine Learning (cs.LG); Information Theory (cs.IT)
Neural collapse describes the geometry of activation in the final layer of a deep neural network when it is trained beyond performance plateaus. Open questions include whether neural collapse leads to better generalization and, if so, why and how training beyond the plateau helps. We model neural collapse as an information bottleneck (IB) problem in order to investigate whether such a compact representation exists and discover its connection to generalization. We demonstrate that neural collapse leads to good generalization specifically when it approaches an optimal IB solution of the classification problem. Recent research has shown that two deep neural networks independently trained with the same contrastive loss objective are linearly identifiable, meaning that the resulting representations are equivalent up to a matrix transformation. We leverage linear identifiability to approximate an analytical solution of the IB problem. This approximation demonstrates that when class means exhibit $K$-simplex Equiangular Tight Frame (ETF) behavior (e.g., $K$=10 for CIFAR10 and $K$=100 for CIFAR100), they coincide with the critical phase transitions of the corresponding IB problem. The performance plateau occurs once the optimal solution for the IB problem includes all of these phase transitions. We also show that the resulting $K$-simplex ETF can be packed into a $K$-dimensional Gaussian distribution using supervised contrastive learning with a ResNet50 backbone. This geometry suggests that the $K$-simplex ETF learned by supervised contrastive learning approximates the optimal features for source coding. Hence, there is a direct correspondence between optimal IB solutions and generalization in contrastive learning.
- [23] arXiv:2308.03658 (replaced) [pdf, html, other]
-
Title: Control-Oriented Deep Space Communications For Unmanned Space Exploration
Subjects: Signal Processing (eess.SP); Information Theory (cs.IT); Systems and Control (eess.SY)
In unmanned space exploration, the cooperation among space robots requires advanced communication techniques. In this paper, we propose a communication optimization scheme for a specific cooperation system named the "mother-daughter system". In this setup, the mother spacecraft orbits the planet, while daughter probes are distributed across the planetary surface. During each control cycle, the mother spacecraft senses the environment, computes control commands and distributes them to daughter probes for actions. They synergistically form sensing-communication-computing-control ($\mathbf{SC^3}$) loops. Given the indivisibility of the $\mathbf{SC^3}$ loop, we optimize the mother-daughter downlink for closed-loop control. The optimization objective is the linear quadratic regulator (LQR) cost, and the optimization parameters are the block length and transmit power. To solve the nonlinear mixed-integer problem, we first identify the optimal block length and then transform the power allocation problem into a tractable convex problem. We further derive the approximate closed-form solutions for the proposed scheme and two communication-oriented schemes: the max-sum rate scheme and the max-min rate scheme. On this basis, we analyze their power allocation principles. In particular, for time-insensitive control tasks, we find that the proposed scheme demonstrates equivalence to the max-min rate scheme. These findings are verified through simulations.
- [24] arXiv:2403.19379 (replaced) [pdf, html, other]
-
Title: Optimal Pilot Design for OTFS in Linear Time-Varying Channels
Comments: 13 pages, 8 figures, submitted to IEEE Transactions on Wireless Communications
Subjects: Signal Processing (eess.SP); Information Theory (cs.IT)
This paper investigates the positioning of the pilot symbols, as well as the power distribution between the pilot and the communication symbols in the OTFS modulation scheme. We analyze the pilot placements that minimize the mean squared error (MSE) in estimating the channel taps. In addition, we optimize the average channel capacity by adjusting the power balance. We show that this leads to a significant increase in average capacity. The results provide valuable guidance for designing the OTFS parameters to achieve maximum capacity. Numerical simulations are performed to validate the findings.
- [25] arXiv:2405.10930 (replaced) [pdf, other]
-
Title: Submodular Information Selection for Hypothesis Testing with Misclassification Penalties
Comments: 21 pages, 4 figures
Subjects: Machine Learning (stat.ML); Computational Complexity (cs.CC); Information Theory (cs.IT); Machine Learning (cs.LG); Optimization and Control (math.OC)
We consider the problem of selecting an optimal subset of information sources for a hypothesis testing/classification task where the goal is to identify the true state of the world from a finite set of hypotheses, based on finite observation samples from the sources. In order to characterize the learning performance, we propose a misclassification penalty framework, which enables non-uniform treatment of different misclassification errors. In a centralized Bayesian learning setting, we study two variants of the subset selection problem: (i) selecting a minimum cost information set to ensure that the maximum penalty of misclassifying the true hypothesis remains bounded and (ii) selecting an optimal information set under a limited budget to minimize the maximum penalty of misclassifying the true hypothesis. Under certain assumptions, we prove that the objective (or constraints) of these combinatorial optimization problems are weakly (or approximately) submodular, and establish high-probability performance guarantees for greedy algorithms. Further, we propose an alternate metric for information set selection which is based on the total penalty of misclassification. We prove that this metric is submodular and establish near-optimal guarantees for the greedy algorithms for both information set selection problems. Finally, we present numerical simulations to validate our theoretical results over several randomly generated instances.
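The greedy selection referred to in the abstract above can be sketched generically. This is a minimal illustration under a cardinality budget, with a toy coverage function standing in for the objective; it is not the paper's misclassification-penalty metric. The names `greedy_select`, `coverage`, and `covered` are hypothetical.

```python
def greedy_select(sources, objective, budget):
    """Pick up to `budget` sources, each time adding the one with the largest marginal gain."""
    selected = []
    remaining = list(sources)
    for _ in range(budget):
        base = objective(selected)
        best, best_gain = None, 0.0
        for s in remaining:
            gain = objective(selected + [s]) - base
            if gain > best_gain:
                best, best_gain = s, gain
        if best is None:  # no source improves the objective any further
            break
        selected.append(best)
        remaining.remove(best)
    return selected

# Toy submodular objective (an assumption for illustration): number of items covered.
coverage = {"a": {1, 2, 3}, "b": {3, 4}, "c": {4, 5, 6, 7}, "d": {1}}

def covered(selection):
    """Count the distinct items covered by the selected sources."""
    items = set()
    for s in selection:
        items |= coverage[s]
    return len(items)

print(greedy_select(["a", "b", "c", "d"], covered, 2))  # ['c', 'a']
```

For monotone submodular objectives, this kind of greedy rule is the setting in which the classical near-optimality guarantees apply.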
- [26] arXiv:2405.12937 (replaced) [pdf, html, other]
-
Title: Asymptotic analysis of sum-rate under SIC
Subjects: Signal Processing (eess.SP); Information Theory (cs.IT)
Limiting the cost of coordination and contention among a large number of nodes calls for grant-free approaches that exploit physical-layer techniques to resolve collisions. Successive Interference Cancellation (SIC) is becoming a key building block of multiple access channel receivers, in an effort to support the massive Internet of Things (IoT). In this paper, we explore the large-scale performance of SIC in a theoretical framework. A general model of a SIC receiver is stated for a shared channel with $n$ transmitters. The asymptotic sum-rate performance is characterized as $n \rightarrow \infty$, for a suitably scaled target Signal to Noise plus Interference Ratio (SNIR). The probability distribution of the number of correctly decoded packets is shown to tend to a deterministic distribution asymptotically for large values of $n$. The asymptotic analysis is carried out for any probability distribution of the wireless channel gain, assuming that the average received power level is the same for all nodes, through power control.
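The SIC decoding principle described in the abstract above can be sketched as follows. This is a simplified illustration that assumes perfect cancellation of each decoded signal and a fixed SNIR threshold; it is not the paper's general receiver model. The function name `sic_decode` and its parameters are hypothetical.

```python
def sic_decode(powers, noise, threshold):
    """Count packets decoded by SIC, strongest first, stopping at the first failure."""
    ordered = sorted(powers, reverse=True)
    interference = sum(ordered)
    decoded = 0
    for p in ordered:
        interference -= p  # what remains is the power of the weaker, not-yet-decoded signals
        snir = p / (noise + interference)
        if snir < threshold:
            break  # this packet fails, so nothing weaker can be decoded either
        decoded += 1  # decoded and (by assumption) perfectly cancelled
    return decoded

# Received powers that halve at each step: every packet sees SNIR exactly 1.
print(sic_decode([8, 4, 2, 1], noise=1, threshold=1))    # 4
print(sic_decode([8, 4, 2, 1], noise=1, threshold=1.5))  # 0
```

The second call shows the cascade effect: once the strongest packet misses the threshold, no later packet is attempted.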
- [27] arXiv:2406.03072 (replaced) [pdf, other]
-
Title: Local to Global: Learning Dynamics and Effect of Initialization for Transformers
Authors: Ashok Vardhan Makkuva, Marco Bondaschi, Chanakya Ekbote, Adway Girish, Alliot Nagle, Hyeji Kim, Michael Gastpar
Subjects: Machine Learning (cs.LG); Information Theory (cs.IT); Machine Learning (stat.ML)
In recent years, transformer-based models have revolutionized deep learning, particularly in sequence modeling. To better understand this phenomenon, there is a growing interest in using Markov input processes to study transformers. However, our current understanding in this regard remains limited, with many fundamental questions about how transformers learn Markov chains still unanswered. In this paper, we address this by focusing on first-order Markov chains and single-layer transformers, providing a comprehensive characterization of the learning dynamics in this context. Specifically, we prove that transformer parameters trained on next-token prediction loss can converge to either global or local minima, contingent on the initialization and the Markovian data properties, and we characterize the precise conditions under which this occurs. To the best of our knowledge, this is the first result of its kind highlighting the role of initialization. We further demonstrate that our theoretical findings are corroborated by empirical evidence. Based on these insights, we provide guidelines for the initialization of transformer parameters and demonstrate their effectiveness. Finally, we outline several open problems in this arena. Code is available at: this https URL.
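The kind of input data studied in the abstract above, a first-order Markov chain, can be sketched as follows. This minimal example assumes a binary alphabet with flip probabilities `p` (from 0 to 1) and `q` (from 1 to 0), and uses a count-based bigram estimator as a simple baseline for next-token prediction. It does not reproduce the paper's transformer or its training setup, and all names are hypothetical.

```python
import random

def sample_binary_markov(p, q, length, seed=0):
    """Sample a binary first-order Markov chain with P(0->1) = p and P(1->0) = q."""
    rng = random.Random(seed)
    x = [rng.randrange(2)]
    for _ in range(length - 1):
        flip_prob = p if x[-1] == 0 else q
        x.append(1 - x[-1] if rng.random() < flip_prob else x[-1])
    return x

def bigram_estimate(x):
    """Estimate the 2x2 transition matrix from consecutive-pair counts."""
    counts = [[0, 0], [0, 0]]
    for a, b in zip(x, x[1:]):
        counts[a][b] += 1
    return [[c / max(sum(row), 1) for c in row] for row in counts]

seq = sample_binary_markov(p=0.2, q=0.3, length=20000)
est = bigram_estimate(seq)
print(est[0][1], est[1][0])  # should be close to p and q respectively
```

A predictor that has learned the chain would match these conditional probabilities, whereas one that ignores the previous token could at best match the marginal symbol frequencies.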