#3685. MS-Transformer: Introduce multiple structural priors into a unified transformer for encoding sentences
Publication date: October 2026
Proposal available till: 03-06-2025
Total number of authors per manuscript: 4
The journal's title is disclosed only to authors who have already paid.
Journal's subject areas:
Language and Linguistics;
Linguistics and Language;
Sociology and Political Science;
Speech and Hearing
Places in the authors' list:
1st place - free (for sale)
2nd place - free (for sale)
3rd place - free (for sale)
4th place - free (for sale)
Abstract:
Transformers have been widely utilized in recent NLP studies. Existing work commonly applies a single mask strategy to the Transformer to incorporate structural priors, and thus fails to model the richer structural information of texts. In this paper, we aim to introduce multiple types of structural priors into the Transformer, proposing the Multiple Structural Priors Guided Transformer (MS-Transformer), which maps different structural priors to different attention heads through a novel multi-mask based multi-head attention mechanism. We integrate two categories of structural priors: the sequential order and the relative position of words. Experimental results on three tasks show that MS-Transformer achieves significant improvements over strong baselines.
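The multi-mask mechanism described in the abstract can be illustrated with a minimal PyTorch sketch: each attention head receives its own boolean mask that encodes one structural prior. The specific mask constructions below (a lower-triangular mask for sequential order, a local-window mask for relative position), the window size, and all class and function names are illustrative assumptions for exposition only, not the paper's actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def sequential_order_mask(seq_len):
    # Assumed prior: each word attends to itself and preceding words only,
    # encoding left-to-right sequential order (an illustrative choice).
    return torch.tril(torch.ones(seq_len, seq_len)).bool()


def relative_position_mask(seq_len, window=3):
    # Assumed prior: each word attends only to words within a local window,
    # encoding relative position (the window size is an illustrative choice).
    idx = torch.arange(seq_len)
    return (idx[None, :] - idx[:, None]).abs() <= window


class MultiMaskMultiHeadAttention(nn.Module):
    """Multi-head attention in which each head is guided by its own
    structural-prior mask, in the spirit of the abstract's description."""

    def __init__(self, d_model, num_heads):
        super().__init__()
        assert d_model % num_heads == 0
        self.num_heads = num_heads
        self.d_head = d_model // num_heads
        self.qkv = nn.Linear(d_model, 3 * d_model)
        self.out = nn.Linear(d_model, d_model)

    def forward(self, x, head_masks):
        # x: (batch, seq_len, d_model)
        # head_masks: one (seq_len, seq_len) boolean mask per head (True = attend).
        b, n, _ = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        q = q.view(b, n, self.num_heads, self.d_head).transpose(1, 2)
        k = k.view(b, n, self.num_heads, self.d_head).transpose(1, 2)
        v = v.view(b, n, self.num_heads, self.d_head).transpose(1, 2)
        scores = (q @ k.transpose(-2, -1)) / self.d_head ** 0.5   # (b, h, n, n)
        mask = torch.stack(head_masks).unsqueeze(0)               # (1, h, n, n)
        scores = scores.masked_fill(~mask, float("-inf"))
        attn = F.softmax(scores, dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(b, n, -1)
        return self.out(out)


# Example: two heads follow the sequential-order prior, two the relative-position prior.
seq_len, d_model, num_heads = 8, 64, 4
masks = [sequential_order_mask(seq_len)] * 2 + [relative_position_mask(seq_len)] * 2
layer = MultiMaskMultiHeadAttention(d_model, num_heads)
y = layer(torch.randn(2, seq_len, d_model), masks)
print(y.shape)  # torch.Size([2, 8, 64])
```

In this sketch the masks are fixed per head, so each head specializes in one kind of structural information while the heads' outputs are still combined by the standard output projection.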
Keywords:
Natural language processing; Sentence representation; Transformer
Contacts: