PUBLICATION: Finch: Prompt-guided key-value cache compression
Copyright ACL. Personal use of this material is permitted. The definitive version of this paper was published in EMNLP 2024, Conference on Empirical Methods in Natural Language Processing, 12-16 November 2024, Miami, Florida, USA, and also in TACL (Transactions of the Association for Computational Linguistics), Vol. 12, 2024. It is available at: https://doi.org/10.1162/tacl_a_00716