Across 12 benchmarks, our study brings together 30 public LLMs that originate from diverse organizations. Remarkably, we find that LLMs' intelligence – reflected by average benchmark scores – almost linearly correlates with their ability to compress external text corpora. These results provide concrete evidence supporting the belief that superior compression indicates greater intelligence. Furthermore, our findings suggest that compression efficiency, as an unsupervised metric derived from raw text corpora, serves as a reliable evaluation measure that is linearly associated with the model capabilities.
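As a rough illustration of the kind of metric the quote describes, here is a minimal sketch. It assumes compression efficiency is measured as bits per character under the model's predicted token probabilities, and that "linearly correlates" is checked with a Pearson coefficient. The quoted passage specifies neither, so both are assumptions. The function names `bits_per_character` and `pearson` are hypothetical, and the toy inputs are placeholders, not results from the study.

```python
import math


def bits_per_character(token_logprobs, text):
    """Average code length, in bits per character, of `text` under a model.

    token_logprobs: natural-log probabilities the model assigned to each
    token of `text` (assumed to be supplied by whatever LLM is evaluated).
    Lower values mean better compression.
    """
    total_bits = -sum(token_logprobs) / math.log(2)  # convert nats to bits
    return total_bits / len(text)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (norm_x * norm_y)


# Toy check (placeholder values): four tokens, each predicted with
# probability 0.5, cost 1 bit each; spread over 4 characters.
toy_bpc = bits_per_character([math.log(0.5)] * 4, "abcd")

# Toy check (placeholder values): a perfectly linear inverse relationship.
toy_r = pearson([1.0, 2.0, 3.0], [3.0, 2.0, 1.0])
```

Under these assumptions, one would compute a bits-per-character value per model on a held-out corpus and correlate it with that model's average benchmark score. A strong negative coefficient would correspond to the claimed relationship, since fewer bits means better compression.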
While intelligence *leverages* compression in important ways in representation learning, intelligence and compression are by nature opposite in key aspects, because intelligence is all about *generalization to future data (out of distribution)* while compression is all about *efficiently fitting the distribution of past data*. If you're optimal at the latter, you're terrible at the former. If you were an optimal compression algorithm, the behavior policy you would develop during the first 10 years of your life (maximizing your extrinsic rewards such as candy intake, while forgetting all information that appears useless as per past rewards) would be entirely inadequate to handle the next 10.