Articles for sale

Head office address: Moscow, 123317, Moscow City, 8th Floor, Presnenskaya Embankment, 6, bldg. 2
article@123mi.ru

#5841. Multi-Tier Attention Network using Term-weighted Question Features for Visual Question Answering

Publication date	September 2026
Proposal available till	16-07-2025
Total number of authors per manuscript	4	0 $

The title of the journal is available only for the authors who have already paid for

Journal’s subject area:

Computer Vision and Pattern Recognition;
Signal Processing;

Places in the authors’ list:

place 1	place 2	place 3	place 4
Free	Free	Free	Free
2350 $	1200 $	1050 $	900 $
Contract №5841.1	Contract №5841.2	Contract №5841.3	Contract №5841.4

Place 1: free (for sale)
Place 2: free (for sale)
Place 3: free (for sale)
Place 4: free (for sale)

Abstract:
Visual Question Answering (VQA) is a challenging multi-modal task that takes an image and a natural language question about that image as inputs and aims to find the correct answer. This AI-complete task requires fine-grained joint understanding of the two input modalities. Inspired by the success of attention mechanisms in efficiently comprehending visual-language features for VQA, this paper proposes a Multi-Tier Attention Network (MTAN) whose major component is term-weighted, question-guided visual attention. Additionally, we introduce a novel Supervised Term Weighting (STW) scheme named ‘qf.obj.cos’ that semantically weights words using the notion of visual object detection. This scheme can be generalized to other vision-language comprehension tasks such as image captioning, text-to-image retrieval, and multi-modal summarization. In effect, the proposed system generates more discriminative visual features through progressive steps of question-guided visual attention, where the question embedding is itself guided by semantic term weighting. MTAN is quantitatively and qualitatively evaluated on the benchmark DAQUAR dataset, and an extensive set of ablations is studied to demonstrate the individual significance of each component of the system.
Keywords:
Attention mechanism; Deep learning; Semantic similarity; Supervised term weighting; Visual Question Answering
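
The general idea in the abstract, a term-weighted question embedding that guides several rounds of attention over image regions, can be illustrated with a minimal sketch. This is not the MTAN architecture and not the ‘qf.obj.cos’ scheme, neither of which is specified on this page. The function names, the dot-product scoring, the additive query update between tiers, and the example term weights are all assumptions chosen for illustration, and the question vector and region features are assumed to share one dimensionality.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_question_embedding(word_vectors, term_weights):
    """Weighted average of word vectors.

    term_weights are hypothetical per-word weights; the page does not
    say how 'qf.obj.cos' computes them.
    """
    total = sum(term_weights)
    dim = len(word_vectors[0])
    return [
        sum(w * vec[i] for w, vec in zip(term_weights, word_vectors)) / total
        for i in range(dim)
    ]


def attend(query, region_features):
    """One attention step: dot-product scores, softmax, weighted sum of regions."""
    scores = [sum(q * r for q, r in zip(query, region)) for region in region_features]
    alphas = softmax(scores)
    dim = len(region_features[0])
    attended = [
        sum(a * region[i] for a, region in zip(alphas, region_features))
        for i in range(dim)
    ]
    return alphas, attended


def multi_tier_attention(question_vec, region_features, tiers=2):
    """Repeat attention, refining the query with the attended feature each tier.

    The additive refinement is an assumed design, in the style of stacked
    attention, not the paper's confirmed method.
    """
    query = list(question_vec)
    alphas = []
    for _ in range(tiers):
        alphas, attended = attend(query, region_features)
        query = [q + a for q, a in zip(query, attended)]
    return alphas, query


if __name__ == "__main__":
    # Toy data: two words, two image regions, two-dimensional features.
    words = [[1.0, 0.0], [0.0, 1.0]]
    weights = [3.0, 1.0]  # hypothetical term weights
    regions = [[1.0, 0.0], [0.0, 1.0]]
    question = weighted_question_embedding(words, weights)
    final_alphas, final_query = multi_tier_attention(question, regions)
    print(final_alphas, final_query)
```

In this toy setup the first word carries the larger weight, so the question embedding leans toward the first region and the attention weights follow. A real system would learn the projections that place text and image features in a shared space.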
