Evaluating SQL selection/projection over table embeddings

Mellouli, Mariam; Papotti, Paolo
TaDA 2025, 3rd International Workshop on Tabular Data Analysis (TaDA), collocated with the 51st International Conference on Very Large Data Bases (VLDB 2025), 5 September 2025, London, UK

Word embeddings are a powerful technique for representing and analyzing textual data in natural language processing (NLP) tasks. Notably, they possess a word-analogy property that represents real-world relationships through geometric operations in the embedding space. We study this property in the setting where embeddings are generated from relational databases. Our study aims to determine whether existing methods to obtain embeddings of relational tables preserve the inherent relationships of the relational model, akin to the word-analogy property in natural language text. By treating the learned vector space itself as an execution substrate, we ask a simple yet unexplored question: can an embedding faithfully stand in for the DBMS when answering basic SQL? To test this hypothesis, we develop a framework that assesses the capability of embeddings of relational data to answer SQL queries involving selection and projection. This framework encompasses the generation of embeddings, the querying of these embeddings through SQL, and the evaluation of the results. Our findings indicate that embedding methods that pretrain on the table can capture and reflect the original table/row/attribute relationships.
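
As an illustration of the word-analogy property mentioned in the abstract, the following minimal Python sketch resolves an analogy of the form "a is to b as c is to ?" by vector arithmetic and cosine similarity. It is not the framework described in the paper: the function names `cosine` and `analogy`, the token names, and the 3-dimensional vectors in `toy` are all hypothetical and hand-picked so that the offset between a country and its capital is constant. On a relational table, such a lookup would loosely correspond to a selection on one attribute followed by a projection on another.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def analogy(emb, a, b, c):
    """Return the token closest to emb[b] - emb[a] + emb[c], excluding a, b, c."""
    target = [vb - va + vc for va, vb, vc in zip(emb[a], emb[b], emb[c])]
    candidates = [t for t in emb if t not in (a, b, c)]
    return max(candidates, key=lambda t: cosine(emb[t], target))


# Hypothetical toy embeddings (assumption, not data from the paper):
# the second coordinate encodes "is a capital", so capital = country + [0, 1, 0].
toy = {
    "France": [1.0, 0.0, 0.0],
    "Paris": [1.0, 1.0, 0.0],
    "Italy": [0.0, 0.0, 1.0],
    "Rome": [0.0, 1.0, 1.0],
    "Spain": [1.0, 0.0, 1.0],
    "Madrid": [1.0, 1.0, 1.0],
}

# "France is to Paris as Italy is to ?"
print(analogy(toy, "France", "Paris", "Italy"))
```

Embeddings learned from real tables are noisy, so the nearest neighbour of the target vector need not be the correct value; measuring how often it is corresponds to the kind of evaluation the abstract describes.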


Type:
Conference
City:
London
Date:
2025-09-05
Department:
Data Science
Eurecom Ref:
8412
Copyright:
VLDB

PERMALINK : https://www.eurecom.fr/publication/8412