16 - 20 February 2025
San Diego, California, US
Conference 13407 > Paper 13407-111

Vision transformer for efficient chest x-ray and gastrointestinal image classification

19 February 2025 • 5:30 PM - 7:00 PM PST | Golden State Ballroom

Abstract

Medical image analysis has emerged as a critical research domain because of its usefulness in clinical applications such as early disease diagnosis and treatment. Convolutional neural networks (CNNs) have become standard in medical image analysis due to their superior ability to interpret complex features, often outperforming humans. In addition to CNNs, transformer architectures have also gained popularity for medical image analysis tasks. However, despite progress in the field, there is still room for improvement. This study evaluates and compares CNN and transformer-based methods, employing diverse data augmentation techniques, on three medical image datasets. For the Chest X-ray dataset, the vision transformer (ViT) model achieved the highest F1 score of 0.9532 and a Matthews correlation coefficient (MCC) of 0.9259. Similarly, for the Kvasir dataset, we achieved an F1 score of 0.9436 and an MCC of 0.9360. For the Kvasir-Capsule dataset, the ViT model achieved an F1 score of 0.7156 and an MCC of 0.3705. We found that the transformer-based models outperformed various CNN models at classifying different anatomical structures, findings, and abnormalities in medic
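
For readers unfamiliar with the two metrics reported in the abstract, the following is a minimal sketch of how an F1 score and an MCC can be computed from a multiclass confusion matrix. It is an illustration only: the abstract does not state which F1 averaging was used, so macro averaging is an assumption here, and the function names `macro_f1` and `mcc` and the toy matrix are hypothetical rather than taken from the paper.

```python
from math import sqrt


def macro_f1(cm):
    """Macro-averaged F1 from a square confusion matrix cm[true][pred].

    Macro averaging is an assumption; the abstract does not say which
    averaging scheme the authors used.
    """
    k = len(cm)
    scores = []
    for c in range(k):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                        # true class c, predicted as another class
        fp = sum(cm[r][c] for r in range(k)) - tp   # predicted c, true class differs
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / k


def mcc(cm):
    """Multiclass Matthews correlation coefficient from cm[true][pred]."""
    k = len(cm)
    n = sum(sum(row) for row in cm)
    correct = sum(cm[c][c] for c in range(k))
    true_totals = [sum(cm[c]) for c in range(k)]
    pred_totals = [sum(cm[r][c] for r in range(k)) for c in range(k)]
    numerator = correct * n - sum(t * p for t, p in zip(true_totals, pred_totals))
    denom = sqrt(
        (n * n - sum(t * t for t in true_totals))
        * (n * n - sum(p * p for p in pred_totals))
    )
    # By convention, return 0.0 when the denominator vanishes
    # (for example, when every sample is assigned to one class).
    return numerator / denom if denom else 0.0


if __name__ == "__main__":
    # Hypothetical two-class confusion matrix, not data from the paper.
    toy = [[90, 10],
           [5, 95]]
    print(f"macro F1 = {macro_f1(toy):.4f}")
    print(f"MCC      = {mcc(toy):.4f}")
```

The two metrics can diverge on imbalanced data, because MCC takes every cell of the confusion matrix into account while F1 ignores true negatives. This is one common reason for reporting both.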

Presenter

Northwestern Univ. (United States)
Application tracks: AI/ML, Sustainability
Presenter/Author
Northwestern Univ. (United States)
Author
Northwestern Univ. (United States)
Author
Smriti Regmi
IOE Pashchimanchal Campus (Nepal)
Author
Aliza Subedi
IOE Pashchimanchal Campus (Nepal)