MATILDA: Mining approximate tuple-generating dependencies in large databases

Amat, François; Paris, Pierre-Henri; Papotti, Paolo; Suchanek, Fabian M.
ACM Transactions on Intelligent Systems and Technology (TIST), 2 April 2026

First-Order Tuple-generating dependencies (TGDs) are general constraints on databases that subsume Horn rules, foreign key constraints, and inclusion dependencies. We introduce an approach to automatically mine these dependencies from a given database, subject to a number of constraints. Our method employs tailored pruning techniques and a graph representation to deal with the exponential size of the search space. It can mine both exact and approximate rules by employing support and confidence metrics. Experiments on various benchmark datasets demonstrate that (1) our approach can find all inclusion dependencies and Horn rules that competing approaches find, (2) it scales to databases that are out of reach for competing approaches, and (3) it finds a small portion of more complex patterns (TGDs with multiple atoms in the rule head) that other methods cannot find by design. Overall, MATILDA mines about 50% more joinable patterns than competing systems across 35 benchmark databases while keeping runtimes competitive (sub-second to a few minutes depending on the database size).


DOI
Type:
Journal
Date:
2026-04-02
Department:
Data Science
Eurecom Ref:
8750
Copyright:
© ACM, 2026. This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version was published in ACM Transactions on Intelligent Systems and Technology (TIST), 2 April 2026 https://doi.org/10.1145/3815190
See also:

PERMALINK : https://www.eurecom.fr/publication/8750