Supervised Contrastive Learning for Low-Resource Language Identification

By Negar Foroutan and 3 other authors

Abstract: Language identification (LID) is a critical step in curating multilingual LLM pretraining corpora from web crawls. While many studies on LID model training focus on collecting diverse training data to improve performance, LID models continue to perform poorly on low-resource languages, whose training data is often limited to a single domain, such as the Bible. To address these data imbalance and bias issues, we propose a novel supervised contrastive learning (SCL) approach that learns domain-invariant representations for low-resource languages. We show that our approach improves LID performance on out-of-domain data for low-resource languages by 3.2 percentage points, while maintaining performance on high-resource languages.
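To make the core mechanism concrete: supervised contrastive learning pulls embeddings with the same label (here, the same language) together while pushing other samples apart, regardless of which domain each sample came from. Below is a minimal NumPy sketch of the standard SupCon loss (Khosla et al., 2020) applied to language labels; the function name, the toy embeddings, and the temperature value are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def supcon_loss(features, labels, temperature=0.1):
    """Supervised contrastive (SupCon) loss over a batch.

    features: (N, D) embeddings (L2-normalized internally)
    labels:   (N,) integer class labels, e.g. language IDs
    Note: illustrative sketch; not the paper's exact implementation.
    """
    features = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = features @ features.T / temperature        # (N, N) scaled cosine similarities
    n = sim.shape[0]
    logits_mask = ~np.eye(n, dtype=bool)             # exclude self-comparisons
    # positives: other samples sharing the anchor's label
    pos_mask = (labels[:, None] == labels[None, :]) & logits_mask
    # numerically stabilized log-softmax over all non-self samples
    sim_max = np.max(np.where(logits_mask, sim, -np.inf), axis=1, keepdims=True)
    exp_sim = np.exp(sim - sim_max) * logits_mask
    log_prob = sim - sim_max - np.log(exp_sim.sum(axis=1, keepdims=True))
    # average log-probability over each anchor's positives
    pos_counts = pos_mask.sum(axis=1)
    valid = pos_counts > 0                           # skip anchors with no positive
    mean_log_prob_pos = (pos_mask * log_prob).sum(axis=1)[valid] / pos_counts[valid]
    return -mean_log_prob_pos.mean()
```

Because positives are defined purely by language label, sentences from different domains (e.g. Bible text and web text) of the same language are pulled together, which is one way to encourage the domain-invariant representations the abstract describes.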

Submission history

From: Negar Foroutan
[v1] Wed, 18 Jun 2025 09:35:33 UTC (9,317 KB)
[v2] Mon, 9 Mar 2026 20:16:21 UTC (9,311 KB)
