Exploring Efficient Open-Vocabulary Segmentation in the Remote Sensing

September 16, 2025

arXiv:2509.12040v1 Announce Type: cross
Abstract: Open-Vocabulary Remote Sensing Image Segmentation (OVRSIS), an emerging task that adapts Open-Vocabulary Segmentation (OVS) to the remote sensing (RS) domain, remains underexplored due to the absence of a unified evaluation benchmark and the domain gap between natural and RS images. To bridge these gaps, we first establish a standardized OVRSIS benchmark (OVRSISBench) based on widely used RS segmentation datasets, enabling consistent evaluation across methods. Using this benchmark, we comprehensively evaluate several representative OVS/OVRSIS models and reveal their limitations when directly applied to remote sensing scenarios. Building on these insights, we propose RSKT-Seg, a novel open-vocabulary segmentation framework tailored for remote sensing. RSKT-Seg integrates three key components: (1) a Multi-Directional Cost Map Aggregation (RS-CMA) module that captures rotation-invariant visual cues by computing vision-language cosine similarities across multiple directions; (2) an Efficient Cost Map Fusion (RS-Fusion) transformer, which jointly models spatial and semantic dependencies with a lightweight dimensionality reduction strategy; and (3) a Remote Sensing Knowledge Transfer (RS-Transfer) module that injects pre-trained knowledge and facilitates domain adaptation via enhanced upsampling. Extensive experiments on the benchmark show that RSKT-Seg consistently outperforms strong OVS baselines by +3.8 mIoU and +5.9 mACC, while achieving 2x faster inference through efficient aggregation. Our code is available at https://github.com/LiBingyu01/RSKT-Seg.
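
The abstract describes RS-CMA only at a high level, so the following is a minimal sketch of the general idea of a multi-directional cost map, not the authors' implementation. It assumes that "multiple directions" means the four 90-degree rotations of the input, that the per-direction cost maps are rotated back and averaged, and that the encoder returns one feature vector per pixel. The names cosine_cost_map, multi_direction_cost_map, and toy_encode are hypothetical, and toy_encode is only a stand-in for a real vision-language image encoder.

```python
import numpy as np


def cosine_cost_map(pixel_feats, text_embeds, eps=1e-8):
    """Cosine similarity between every pixel feature and every class text embedding.

    pixel_feats: (H, W, D) array, text_embeds: (C, D) array -> (H, W, C) array.
    """
    f = pixel_feats / (np.linalg.norm(pixel_feats, axis=-1, keepdims=True) + eps)
    t = text_embeds / (np.linalg.norm(text_embeds, axis=-1, keepdims=True) + eps)
    return f @ t.T


def multi_direction_cost_map(image, encode, text_embeds, quarter_turns=(0, 1, 2, 3)):
    """Average the cost maps computed on rotated copies of the image.

    Assumption: directions are multiples of 90 degrees and fusion is a plain mean.
    `encode` is a hypothetical callable mapping (H, W, 3) -> (H, W, D).
    """
    maps = []
    for k in quarter_turns:
        rotated = np.rot90(image, k=k, axes=(0, 1))
        cost = cosine_cost_map(encode(rotated), text_embeds)
        # Rotate the cost map back so all directions share one pixel grid.
        maps.append(np.rot90(cost, k=-k, axes=(0, 1)))
    return np.mean(maps, axis=0)


def toy_encode(image):
    """Stand-in encoder (assumption): raw pixels plus a horizontal difference,
    which makes the features deliberately sensitive to orientation."""
    return np.concatenate([image, np.roll(image, 1, axis=1) - image], axis=-1)


rng = np.random.default_rng(0)
image = rng.normal(size=(4, 5, 3))       # toy "RS image"
text_embeds = rng.normal(size=(2, 6))    # two toy class embeddings, D = 6
cost = multi_direction_cost_map(image, toy_encode, text_embeds)
print(cost.shape)
```

Under these assumptions, averaging over all four quarter turns makes the result rotation-equivariant: rotating the input by 90 degrees simply rotates the aggregated cost map, even though the toy encoder on its own is orientation-sensitive.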
