AI Research Trends

Efficient Financial Language Understanding via Distillation with Synthetic Data

This paper presents a framework for financial sentiment analysis that leverages synthetic data to enhance performance while minimizing human labeling efforts.

Before the Labels: How Dataset Construction Shapes Suicidality Detection in Clinical Text

This study examines how dataset construction influences the detection of suicidality in clinical text, drawing on an analysis of a well-known dataset.

Islamic Large Language Models: From Knowledge Acquisition to Trustworthy and Hallucination-Resistant AI

This survey reviews Islamic large language models, emphasizing the necessity for trustworthy systems that include curated sources and reliable methodologies.

Incentives Of EdTech: A Systematic Review Of EduNLP Research

This systematic review of 204 EduNLP papers highlights the disconnect between private-sector incentives and the needs of educational infrastructure.

Translating the Untranslatable: An Operationalizable Ontology for Untranslatability

The paper introduces an ontology for untranslatability, alongside a dataset to facilitate the study and modeling of translation behavior.

Surpassing Scale by Efficiency: A Compact 135M Parameter Foundational LLM Natively Adapted for the Bangla Language

Bangla-smollm-135m is a compact foundational model that performs competitively with larger models on Bangla language tasks while maintaining high efficiency.

Koshur Diacritizer: A Byte-Level Sequence-to-Sequence Model for Kashmiri Diacritic Restoration

This paper presents a byte-level sequence-to-sequence model for restoring diacritics in Kashmiri text and releases an accompanying dataset.

Unintended Effects of Geographic Conditioning in Large Language Models

The paper evaluates how user location data affects bias in the outputs of large language models.

What Makes Effective Supervision in Latent Chain-of-Thought: An Information-Theoretic Analysis

This work analyzes latent chain-of-thought supervision from an information-theoretic perspective, proposing metrics for evaluating reasoning.

Analyzing and Encoding the Al-Mawrid Arabic-English Dictionary with the ISO Lexical Markup Framework and TEI Lex-0

The paper discusses a methodology for digitizing and encoding the Al-Mawrid Arabic-English dictionary into a standardized computational lexicon.