Articles

Permanent URI for this collection: https://ds.uofallujah.edu.iq/handle/123456789/48

Explore Our Articles

Dive into a collection of insightful articles covering a wide range of topics, from academic research to thought-provoking ideas and innovative solutions. Our curated content is designed to inform, inspire, and engage readers from all walks of life.

Whether you're seeking in-depth analyses, practical tips, or the latest trends, our articles provide valuable perspectives and knowledge to keep you informed and connected.

Start exploring and discover content that broadens your horizons and fuels your curiosity.

News

Latest News

University Hosts Annual Research Conference

March 2025

The University of Fallujah recently hosted its annual research conference, bringing together scholars, students, and industry experts to discuss the latest developments in science and technology.

New Digital Repository Launched

November 15, 2024

We are excited to announce the launch of the Digital Repository, providing open access to the university's academic and research materials for global audiences.

New University of Fallujah System Released

November 15, 2024

The University of Fallujah has launched a new system to enhance administrative processes and improve student services. This system aims to streamline academic records, facilitate communication, and provide a user-friendly platform for students, faculty, and staff.

Stay up to date with more news.

Browse

Search Results

  • Item
    Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet
    (Cornell University, 2021-01-28) Li Yuan; Yunpeng Chen
    Transformers, which are popular for language modeling, have been explored for solving vision tasks recently, e.g., the Vision Transformer (ViT) for image classification. The ViT model splits each image into a sequence of tokens with fixed length and then applies multiple Transformer layers to model their global relation for classification. However, ViT achieves inferior performance to CNNs when trained from scratch on a midsize dataset like ImageNet. We find it is because: 1) the simple tokenization of input images fails to model the important local structure such as edges and lines among neighboring pixels, leading to low training sample efficiency; 2) the redundant attention backbone design of ViT leads to limited feature richness for fixed computation budgets and limited training samples. To overcome such limitations, we propose a new Tokens-To-Token Vision Transformer (T2T-ViT), which incorporates 1) a layer-wise Tokens-to-Token (T2T) transformation to progressively structurize the image to tokens by recursively aggregating neighboring tokens into one token (Tokens-to-Token), such that local structure represented by surrounding tokens can be modeled and token length can be reduced; 2) an efficient backbone with a deep-narrow structure for vision transformer motivated by CNN architecture design after empirical study. Notably, T2T-ViT reduces the parameter count and MACs of vanilla ViT by half, while achieving more than 3.0% improvement when trained from scratch on ImageNet. It also outperforms ResNets and achieves comparable performance with MobileNets by directly training on ImageNet. For example, T2T-ViT with comparable size to ResNet50 (21.5M parameters) can achieve 83.3% top-1 accuracy at 384×384 image resolution on ImageNet. (Code: this https URL)
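    The core idea of the T2T transformation in the abstract above — merging each local neighborhood of tokens into a single token so that token length shrinks while local structure is preserved — can be sketched roughly as follows. This is an illustrative simplification, not the authors' implementation: the paper's soft-split uses learned Transformer layers between aggregation steps, whereas this sketch (with assumed names like `tokens_to_token` and assumed kernel/stride values) only shows the neighborhood-concatenation step on a 2D token grid.

    ```python
    import numpy as np

    def tokens_to_token(tokens, h, w, k=3, stride=2):
        """One illustrative T2T aggregation step.

        Reshape a length-(h*w) token sequence back into an h x w grid,
        then concatenate each overlapping k x k neighborhood (stride < k)
        into a single new token. Token count shrinks roughly 4x while
        token dimension grows k*k-fold, so nearby-pixel structure ends up
        inside each new token.
        """
        n, c = tokens.shape
        assert n == h * w, "token count must match grid size"
        grid = tokens.reshape(h, w, c)
        pad = k // 2
        grid = np.pad(grid, ((pad, pad), (pad, pad), (0, 0)))
        out_h = (h + 2 * pad - k) // stride + 1
        out_w = (w + 2 * pad - k) // stride + 1
        out = np.empty((out_h * out_w, k * k * c))
        idx = 0
        for i in range(out_h):
            for j in range(out_w):
                # Flatten the k x k neighborhood into one longer token.
                patch = grid[i * stride:i * stride + k,
                             j * stride:j * stride + k, :]
                out[idx] = patch.reshape(-1)
                idx += 1
        return out, out_h, out_w

    # A 14 x 14 grid of 64-dim tokens becomes a 7 x 7 grid of 576-dim tokens.
    tokens = np.random.randn(14 * 14, 64)
    merged, h2, w2 = tokens_to_token(tokens, 14, 14)
    print(merged.shape)  # (49, 576)
    ```

    Applying this step recursively (with a Transformer layer processing the tokens between steps, as the paper does) is what progressively reduces the sequence length before the deep-narrow backbone.
    
    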

© 2025 University of Fallujah. All rights reserved.