[2506.24000] The Illusion of Progress? A Critical Look at Test-Time Adaptation for Vision-Language Models

October 15, 2025

[Submitted on 30 Jun 2025 (v1), last revised 13 Oct 2025 (this version, v2)]

By Lijun Sheng and 4 other authors

Abstract: Test-time adaptation (TTA) methods have gained significant attention for enhancing the performance of vision-language models (VLMs) such as CLIP during inference, without requiring additional labeled data. However, current TTA research generally suffers from major limitations such as duplication of baseline results, limited evaluation metrics, inconsistent experimental settings, and insufficient analysis. These problems hinder fair comparisons between TTA methods and make it difficult to assess their practical strengths and weaknesses. To address these challenges, we introduce TTA-VLM, a comprehensive benchmark for evaluating TTA methods on VLMs. Our benchmark implements 8 episodic TTA methods and 7 online TTA methods within a unified and reproducible framework, and evaluates them across 15 widely used datasets. Unlike prior studies focused solely on CLIP, we extend the evaluation to SigLIP (a model trained with a sigmoid loss) and include training-time tuning methods such as CoOp, MaPLe, and TeCoA to assess generality. Beyond classification accuracy, TTA-VLM incorporates various evaluation metrics, including robustness, calibration, out-of-distribution detection, and stability, enabling a more holistic assessment of TTA methods. Through extensive experiments, we find that 1) existing TTA methods produce limited gains compared with earlier pioneering work; 2) current TTA methods combine poorly with training-time fine-tuning methods; 3) accuracy gains frequently come at the cost of reduced model trustworthiness. We release TTA-VLM to provide fair comparison and comprehensive evaluation of TTA methods for VLMs, and we hope it encourages the community to develop more reliable and generalizable TTA strategies.
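The abstract lists calibration among the evaluation metrics but does not say how it is measured. As an illustration only, the sketch below computes expected calibration error (ECE), one common calibration measure. Using ECE here is an assumption, not a statement about what TTA-VLM implements; the function name `expected_calibration_error`, the `n_bins` parameter, and the equal-width binning are all hypothetical choices for this example.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Equal-width-bin ECE (illustrative; not taken from the TTA-VLM code).

    confidences: predicted top-class probabilities in [0, 1].
    correct: booleans, True where the prediction matched the label.
    Returns the sample-weighted mean of |accuracy - mean confidence| per bin.
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    if n == 0:
        raise ValueError("at least one prediction is required")

    # Assign each prediction to a bin [k/n_bins, (k+1)/n_bins);
    # a confidence of exactly 1.0 goes into the last bin.
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))

    ece = 0.0
    for members in bins:
        if not members:
            continue  # empty bins contribute nothing
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += (len(members) / n) * abs(accuracy - avg_conf)
    return ece


if __name__ == "__main__":
    # Four predictions at 90% confidence, three of them right:
    # the model is overconfident by roughly 0.15.
    print(expected_calibration_error([0.9, 0.9, 0.9, 0.9],
                                     [True, True, True, False]))
```

A lower value means confidence tracks accuracy more closely. Comparing this number before and after adaptation is one way to check whether an accuracy gain came with less reliable confidence estimates, which is the kind of trade-off the abstract's third finding describes.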

Submission history

From: Lijun Sheng
[v1] Mon, 30 Jun 2025 16:05:55 UTC (83 KB)
[v2] Mon, 13 Oct 2025 13:09:11 UTC (1,500 KB)
