#5889. MS-Transformer: Introduce multiple structural priors into a unified transformer for encoding sentences

Publication date: July 2026
Proposal available until 19-05-2025
Total number of authors per manuscript: 4

The title of the journal is available only to authors who have already paid.
Journal’s subject area:
Theoretical Computer Science;
Human-Computer Interaction;
Software;
Places in the authors’ list:
Place 1: free (for sale), 2350 $, Contract5889.1
Place 2: free (for sale), 1200 $, Contract5889.2
Place 3: free (for sale), 1050 $, Contract5889.3
Place 4: free (for sale), 900 $, Contract5889.4

Abstract:
Transformers have been widely utilized in recent NLP studies. Unlike CNNs or RNNs, the vanilla Transformer is position-insensitive and thus incapable of capturing the structural priors between sequences of words. Existing studies commonly apply a single mask strategy to Transformers to incorporate structural priors, but fail to model the richer structural information of texts. In this paper, we aim to introduce multiple types of structural priors into Transformers, proposing the Multiple Structural Priors Guided Transformer (MS-Transformer), which maps different structural priors to different attention heads through a novel multi-mask based multi-head attention mechanism. In particular, we integrate two categories of structural priors: the sequential order and the relative position of words. To capture the latent hierarchical structure of texts, we extract this information not only from word contexts but also from dependency syntax trees.
Keywords:
Natural language processing; Sentence representation; Transformer
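
The abstract describes a multi-mask based multi-head attention mechanism in which each attention head is guided by its own structural prior. The sketch below is a minimal, hypothetical PyTorch illustration of that idea under stated assumptions, not the authors' implementation: the names MultiMaskMultiHeadAttention, sequential_order_mask, and relative_window_mask, and the specific mask definitions, are assumptions made for illustration only.

```python
# Minimal sketch: per-head additive attention masks encoding structural priors.
# A mask value of 0 keeps an attention link; float("-inf") blocks it.
import math
import torch
import torch.nn as nn


class MultiMaskMultiHeadAttention(nn.Module):
    def __init__(self, d_model: int, num_heads: int):
        super().__init__()
        assert d_model % num_heads == 0
        self.num_heads = num_heads
        self.d_head = d_model // num_heads
        self.q_proj = nn.Linear(d_model, d_model)
        self.k_proj = nn.Linear(d_model, d_model)
        self.v_proj = nn.Linear(d_model, d_model)
        self.out_proj = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor, head_masks: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        # head_masks: (num_heads, seq_len, seq_len) additive masks, one per head.
        b, n, _ = x.shape
        q = self.q_proj(x).view(b, n, self.num_heads, self.d_head).transpose(1, 2)
        k = self.k_proj(x).view(b, n, self.num_heads, self.d_head).transpose(1, 2)
        v = self.v_proj(x).view(b, n, self.num_heads, self.d_head).transpose(1, 2)
        scores = q @ k.transpose(-2, -1) / math.sqrt(self.d_head)
        scores = scores + head_masks.unsqueeze(0)  # broadcast masks over the batch
        attn = scores.softmax(dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(b, n, -1)
        return self.out_proj(out)


def sequential_order_mask(seq_len: int) -> torch.Tensor:
    # Example prior: each word attends only to itself and earlier words.
    return torch.triu(torch.full((seq_len, seq_len), float("-inf")), diagonal=1)


def relative_window_mask(seq_len: int, window: int = 2) -> torch.Tensor:
    # Example prior: attention restricted to a local window of relative positions.
    idx = torch.arange(seq_len)
    allowed = (idx[:, None] - idx[None, :]).abs() <= window
    return torch.zeros(seq_len, seq_len).masked_fill(~allowed, float("-inf"))
```

In use, one would build one (seq_len x seq_len) mask per head, for example alternating the two priors above or masks derived from a dependency parse, stack them into a (num_heads, seq_len, seq_len) tensor, and pass that tensor as head_masks together with the token embeddings.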
