The information bottleneck (IB) method seeks compressed representations of a variable $X$ that retain the most relevant information about a target variable $Y$. We show that for a wide family of distributions -- namely, when $Y$ is generated from $X$ through a Hamming channel, under mild conditions -- the optimal IB representations require an alphabet strictly larger than that of $X$. This implies that, contrary to what several recent works suggest, the cardinality bound first identified by Witsenhausen and Wyner in 1975 is tight. At the core of our finding is the observation that the IB function in this setting is not strictly concave, as in the deterministic case, even though the joint distribution of $X$ and $Y$ has full support. Finally, we provide a complete characterization of the IB function, as well as of the optimal representations, for the Hamming case.
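For reference, the IB function discussed above is commonly stated in the following standard form (a textbook formulation, not a definition taken from this paper):

```latex
F_{\mathrm{IB}}(r) \;=\; \max_{\substack{p(t \mid x)\,:\; I(X;T) \le r \\ Y - X - T \text{ a Markov chain}}} \; I(T;Y)
```

Here $T$ is the compressed representation, obtained from $X$ alone via the stochastic map $p(t \mid x)$; the cardinality question in the abstract concerns the smallest alphabet of $T$ needed to attain this maximum.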