Information estimation with discrete diffusion

ICLR 2026, 14th International Conference on Learning Representations, 23-27 April 2026, Rio de Janeiro, Brazil

Information-theoretic measures, such as Mutual Information (MI), play a crucial role in understanding non-linear relationships between random variables and are widely used across scientific disciplines. Yet, their use on real-world discrete data remains challenging. Existing methods typically rely on embedding discrete data into a continuous space and then applying neural estimators originally designed for continuous distributions. This process requires careful engineering of both the embedding model and the estimator architecture, and it still suffers in high-dimensional settings. In this work, we introduce InfoSEDD, an approach that bridges information-theoretic estimation and generative modeling by using discrete diffusion models to compute Kullback–Leibler divergences. Grounded in the theory of Continuous-Time Markov Chains, InfoSEDD is lightweight and scalable, and integrates seamlessly with pretrained models. We showcase the versatility of our approach through applications to motif discovery in genetic promoter data, semantic-aware model selection in text summarization, and entropy estimation in Ising models. Finally, we construct consistency tests on real-world textual and genomic data. Our experiments demonstrate that InfoSEDD outperforms alternatives that rely on the "embedding trick". Our results position InfoSEDD as a robust and scalable tool for information-theoretic analysis of discrete data.
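To make the target quantity concrete, here is a minimal brute-force sketch (ours, not from the paper) of the mutual information that InfoSEDD estimates, written as the KL divergence between a discrete joint pmf and the product of its marginals. Enumerating the joint is only tractable for tiny spaces; the paper's contribution is estimating this same quantity at scale with a discrete diffusion model. The function name and example values below are illustrative assumptions.

```python
# Illustrative sketch: MI(X;Y) = KL(p(x,y) || p(x) p(y)) for a small
# discrete joint distribution, computed by direct enumeration.
# InfoSEDD targets this quantity when enumeration is infeasible.
import numpy as np

def mutual_information(p_xy: np.ndarray) -> float:
    """MI in nats for a joint pmf table p_xy[i, j] = P(X=i, Y=j)."""
    p_x = p_xy.sum(axis=1, keepdims=True)   # marginal over y, shape (|X|, 1)
    p_y = p_xy.sum(axis=0, keepdims=True)   # marginal over x, shape (1, |Y|)
    prod = p_x @ p_y                         # outer product p(x) p(y)
    mask = p_xy > 0                          # convention: 0 * log 0 = 0
    return float(np.sum(p_xy[mask] * np.log(p_xy[mask] / prod[mask])))

# Two perfectly correlated bits carry log(2) nats of information.
joint = np.array([[0.5, 0.0],
                  [0.0, 0.5]])
print(mutual_information(joint))  # ~0.6931 = log 2
```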


Type:
Conference
City:
Rio de Janeiro
Date:
2026-04-23
Department:
Data Science
Eurecom Ref:
8591
Copyright:
© EURECOM. Personal use of this material is permitted. The definitive version of this paper was published in ICLR 2026, 14th International Conference on Learning Representations, 23-27 April 2026, Rio de Janeiro, Brazil and is available at:

PERMALINK: https://www.eurecom.fr/publication/8591