A Multilingual Translator to SQL with Database Schema Pruning to Improve Self-Attention

Jose, Marcelo Archanjo; Cozman, Fabio Gagliardi

doi:10.1007/s41870-023-01342-3

arXiv:2306.14256 (cs)

[Submitted on 25 Jun 2023]

Authors:Marcelo Archanjo Jose, Fabio Gagliardi Cozman

Abstract: Long sequences of text are challenging for transformers because the memory required by the self-attention mechanism grows quadratically with sequence length. This issue directly affects the translation of natural language into SQL queries, since techniques usually take as input a concatenation of the question and the database schema. We present techniques that allow such long text sequences to be handled by transformers with up to 512 input tokens. We propose a training process with database schema pruning (removal of table and column names that are useless for the query of interest). In addition, we use a multilingual approach with the mT5-large model, fine-tuned on a data-augmented Spider dataset in four languages simultaneously: English, Portuguese, Spanish, and French. On the Spider validation set (Dev), our technique increased exact set match accuracy from 0.718 to 0.736. Source code, evaluations, and checkpoints are available at: this https URL.
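The schema pruning idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `prune_schema` and `serialize`, the `question | table: col, col` input format, and the toy schema are all assumptions made for illustration. The sketch keeps only the tables and columns whose names occur in the target SQL query, which is why it applies to training examples, where that query is known.

```python
import re

def prune_schema(schema, sql):
    """Keep only tables and columns whose names appear in the target SQL.

    `schema` maps table name -> list of column names (hypothetical format).
    Matching is done on whole identifiers, case-insensitively.
    """
    identifiers = set(re.findall(r"[a-z_][a-z0-9_]*", sql.lower()))
    pruned = {}
    for table, columns in schema.items():
        if table.lower() not in identifiers:
            continue  # table is never mentioned in the query: drop it
        pruned[table] = [c for c in columns if c.lower() in identifiers]
    return pruned

def serialize(question, schema):
    """Concatenate question and schema into one input string (assumed format)."""
    parts = [f"{table}: {', '.join(columns)}" for table, columns in schema.items()]
    return " | ".join([question] + parts)

# Toy example (hypothetical schema, not taken from the paper).
schema = {
    "singer": ["singer_id", "name", "country", "age"],
    "concert": ["concert_id", "concert_name", "year"],
    "stadium": ["stadium_id", "location", "capacity"],
}
question = "Which singers are older than 30?"
sql = "SELECT name FROM singer WHERE age > 30"

full_input = serialize(question, schema)
pruned_input = serialize(question, prune_schema(schema, sql))
```

Here `pruned_input` is `"Which singers are older than 30? | singer: name, age"`, which is considerably shorter than `full_input`. Shorter inputs of this kind are what make a 512-token limit workable for databases with large schemas.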

Comments:	This preprint has not undergone peer review or any post-submission improvements or corrections. The Version of Record of this article is published in International Journal of Information Technology, and is available online at this https URL. SharedIt link: this https URL
Subjects:	Artificial Intelligence (cs.AI)
MSC classes:	68T07, 68T50
ACM classes:	I.2.7; H.3.3
Cite as:	arXiv:2306.14256 [cs.AI]
	(or arXiv:2306.14256v1 [cs.AI] for this version)
	https://doi.org/10.48550/arXiv.2306.14256
Related DOI:	https://doi.org/10.1007/s41870-023-01342-3

Submission history

From: Marcelo José Sc.D.
[v1] Sun, 25 Jun 2023 14:28:12 UTC (169 KB)
