Efficient Financial Language Understanding via Distillation with Synthetic Data
This paper presents a framework for financial sentiment analysis that leverages synthetic data to improve performance while minimizing human labeling effort.
Before the Labels: How Dataset Construction Shapes Suicidality Detection in Clinical Text
This study discusses how dataset construction influences the detection of suicidality in clinical text, drawing on an analysis of a well-known dataset.
Islamic Large Language Models: From Knowledge Acquisition to Trustworthy and Hallucination-Resistant AI
This survey reviews Islamic large language models, emphasizing the need for trustworthy systems built on curated sources and reliable methodologies.
Incentives Of EdTech: A Systematic Review Of EduNLP Research
This systematic review of 204 EduNLP papers highlights the disconnect between private-sector incentives and the needs of educational infrastructure.
Translating the Untranslatable: An Operationalizable Ontology for Untranslatability
The paper introduces an ontology for untranslatability, alongside a dataset to facilitate the study and modeling of translation behavior.
Surpassing Scale by Efficiency: A Compact 135M Parameter Foundational LLM Natively Adapted for the Bangla Language
Bangla-smollm-135m is a compact foundational model that performs competitively with larger models on Bangla language tasks while maintaining high efficiency.
Koshur Diacritizer: A Byte-Level Sequence-to-Sequence Model for Kashmiri Diacritic Restoration
This paper presents a byte-level sequence-to-sequence model for restoring diacritics in Kashmiri text and releases a dataset to support the task.
Unintended Effects of Geographic Conditioning in Large Language Models
The paper evaluates how conditioning on user location affects bias in the outputs of large language models.
What Makes Effective Supervision in Latent Chain-of-Thought: An Information-Theoretic Analysis
This work analyzes latent chain-of-thought supervision from an information-theoretic perspective, proposing metrics for evaluating reasoning.
Analyzing and Encoding the Al-Mawrid Arabic-English Dictionary with the ISO Language Markup Framework and TEI Lex-0
The paper discusses a methodology for digitizing and encoding the Al-Mawrid Arabic-English dictionary into a standardized computational lexicon.
