Please use this identifier to cite or link to this item: https://gnanaganga.inflibnet.ac.in:8443/jspui/handle/123456789/16085
Title: Optimize Image Caption Generation Techniques
Authors: Indra Kumar, R
Ezil Sam Leni, A
Keywords: Automated Image Captioning
Transformer-Based Encoder-Decoder Models
Leveraging State-Of-The-Art Techniques
Issue Date: 1-May-2024
Publisher: Alliance College of Engineering and Design, Alliance University
Citation: 53p.
Series/Report no.: CSE_G16_2024 [20030141CSE021]
Abstract: In the realm of computer vision and natural language processing, the task of generating descriptive captions for images has garnered significant attention. This project explores the efficacy of transformer-based encoder-decoder models in addressing the challenge of image captioning. Leveraging state-of-the-art techniques, we develop a novel approach to generating contextually relevant and coherent captions for a diverse range of visual content. Drawing upon a rich dataset of images and corresponding captions, we employ a transformer-based architecture to learn the intricate relationship between visual features and textual descriptions. Through extensive experimentation and evaluation, we assess the performance of our model in terms of caption quality, semantic coherence, and generalization across different domains. The project adopts a collaborative, interdisciplinary approach, bringing together expertise from computer vision, natural language processing, and machine learning. By drawing on insights from these diverse fields, we aim to push the boundaries of image captioning and pave the way for a more advanced and nuanced understanding of visual content. The findings of this project hold promise for various applications, including accessibility, content indexing, and human-computer interaction. By enabling the automated generation of descriptive captions, our model has the potential to enhance user experience, improve content accessibility, and facilitate richer interaction with visual data. As we navigate the frontier of image captioning with transformer-based encoder-decoder models, this project contributes to the ongoing dialogue in the field and underscores the transformative potential of artificial intelligence in bridging the gap between visual and textual modalities.
URI: https://gnanaganga.inflibnet.ac.in:8443/jspui/handle/123456789/16085
Appears in Collections:Dissertations - Alliance College of Engineering & Design

Files in This Item:
File: CSE_G16_2024.pdf
Size: 1.85 MB
Format: Adobe PDF
Access: Restricted Access


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.