Subsequence-based indices for genome sequence analysis

Buzzega, Giovanni; Conte, Alessio; Guerrini, Veronica; Punzi, Giulia; Rosone, Giovanna; Tattini, Lorenzo
Open Access Series in Informatics (OASICS), Volume 132 (2025), Article N°20

Compact indices are a fundamental tool in string analysis, even more so in bioinformatics, where genomic sequences can reach billions of characters in length. This paper presents some recent results in which Roberto Grossi has been involved, showing how some of these indices do more than efficiently represent data: they bring out salient information within it, which can be exploited for downstream analysis. Specifically, we first review a recently introduced method [Guerrini et al., 2023] that employs the Burrows-Wheeler Transform to build reasonably accurate phylogenetic trees in an assembly-free scenario. We then describe a recent practical tool [Buzzega et al., 2025] for indexing Maximal Common Subsequences between strings, which can enable analysis of genomic sequence similarity. Experimentally, we show that the results produced by one index are consistent with the expectations about the results of the other index.
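
As a rough illustration of the notion of a Maximal Common Subsequence mentioned above, the following minimal sketch checks whether a string is a common subsequence of two strings that cannot be extended by inserting any further character. It is a brute-force check written for clarity, not the indexing tool described in the paper; the function names and the toy strings are hypothetical and chosen only for this example.

```python
def is_subsequence(w: str, s: str) -> bool:
    """Return True if w can be obtained from s by deleting characters."""
    it = iter(s)
    # `ch in it` advances the iterator, so matches are found left to right.
    return all(ch in it for ch in w)


def is_maximal_common_subsequence(w: str, x: str, y: str) -> bool:
    """Brute-force check (illustrative only): w is a common subsequence
    of x and y, and no single-character insertion into w yields another
    common subsequence of x and y."""
    if not (is_subsequence(w, x) and is_subsequence(w, y)):
        return False
    # Only characters occurring in both strings can extend w.
    alphabet = set(x) & set(y)
    for i in range(len(w) + 1):
        for c in alphabet:
            candidate = w[:i] + c + w[i:]
            if is_subsequence(candidate, x) and is_subsequence(candidate, y):
                return False
    return True


# Toy strings (assumed for illustration, not taken from the paper).
x, y = "AGG", "GGA"
print(is_maximal_common_subsequence("GG", x, y))
print(is_maximal_common_subsequence("A", x, y))
print(is_maximal_common_subsequence("G", x, y))
```

On these toy strings, "A" is maximal even though it is shorter than the longest common subsequence "GG", whereas "G" is not maximal because it extends to "GG". This is why maximal common subsequences can differ in length, unlike longest common subsequences.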

Type: Journal
Date: 2025-08-13
Department: Data Science
Eurecom Ref: 8329
Copyright:
© ACM, 2025. This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version was published in Open Access Series in Informatics (OASICS), Volume 132 (2025), Article N°20

PERMALINK : https://www.eurecom.fr/publication/8329