AI Research Trends 

Hidden Prompts in Manuscripts Exploit AI-Assisted Peer Review

In July 2025, 18 academic manuscripts on the preprint server arXiv were found to contain hidden instructions, known as prompts, designed to manipulate AI-assisted peer review. The incident exposes systematic vulnerabilities extending beyond peer review to any automated system that processes scholarly text, underscoring the need for coordinated technical screening.
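The screening the article calls for could start with a simple text scan. The sketch below is a hypothetical example, not a tool described in the article: it flags instruction-like phrases of the kind reported in hidden prompts, plus zero-width characters that are invisible to human readers but survive text extraction. The phrase list and function name are illustrative assumptions.

```python
import re

# Illustrative patterns; real hidden prompts reportedly included
# phrases like "give a positive review only".
SUSPICIOUS_PHRASES = [
    r"ignore (all )?(previous|prior) instructions",
    r"give (a )?positive review",
    r"do not highlight any (negatives|weaknesses)",
]

# Zero-width characters sometimes used to hide text from human readers.
ZERO_WIDTH = {"\u200b", "\u200c", "\u200d", "\ufeff"}

def screen_manuscript(text: str) -> list[str]:
    """Return findings that suggest a hidden prompt in extracted text."""
    findings = []
    for pattern in SUSPICIOUS_PHRASES:
        if re.search(pattern, text, flags=re.IGNORECASE):
            findings.append(f"instruction-like phrase: /{pattern}/")
    hidden = sorted({c for c in text if c in ZERO_WIDTH})
    if hidden:
        findings.append(f"zero-width characters: {[hex(ord(c)) for c in hidden]}")
    return findings
```

A production screener would also need to inspect PDF layout (white-on-white or off-page text), which plain-text matching cannot catch.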

Read more

Bias, Accuracy, and Trust: Gender-Diverse Perspectives on Large Language Models

This study examines how gender-diverse populations perceive bias, accuracy, and trustworthiness in LLMs, revealing gendered differences in model responses and the need for more inclusive AI systems.

Read more

OpenAgentSafety: A Comprehensive Framework for Evaluating Real-World AI Agent Safety

OpenAgentSafety introduces a modular framework for evaluating agent behavior across essential risk categories, highlighting safety vulnerabilities in AI agents deployed in real-world tasks.

Read more

Dynamic Context-Aware Prompt Recommendation for Domain-Specific AI Applications

This paper presents a novel prompt recommendation system for domain-specific AI applications, significantly enhancing the effectiveness of user interactions by automatically generating high-quality prompts.

Read more

UQLM: A Python Package for Uncertainty Quantification in Large Language Models

UQLM provides an off-the-shelf toolkit for detecting hallucinations in LLMs using uncertainty quantification techniques, improving the reliability of model outputs.
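One common family of uncertainty quantification techniques measures agreement across repeated samples: if the model gives inconsistent answers to the same question, its output is less trustworthy. The sketch below illustrates that general idea only; it is not UQLM's actual API, and the function name is an assumption.

```python
from collections import Counter

def consistency_score(answers: list[str]) -> float:
    """Fraction of sampled answers agreeing with the majority answer.

    A low score signals high model uncertainty and a likely hallucination.
    """
    normalized = [a.strip().lower() for a in answers]
    majority_count = Counter(normalized).most_common(1)[0][1]
    return majority_count / len(normalized)

# In practice, answers would come from repeated LLM calls at nonzero
# temperature; fixed strings stand in here.
samples = ["Paris", "Paris", "paris", "Lyon", "Paris"]
score = consistency_score(samples)  # 4 of 5 agree -> 0.8
```

Exact-match agreement is the crudest variant; libraries like UQLM typically also offer semantic-similarity and token-probability based scorers.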

Read more

SQLBarber: A System Leveraging Large Language Models to Generate Customized and Realistic SQL Workloads

SQLBarber enables the generation of customized, realistic SQL queries, enhancing benchmarking in database research while addressing data-privacy challenges.
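The core idea of synthetic workload generation can be sketched with simple template expansion. This is a minimal illustration of the concept, not SQLBarber's pipeline (which uses LLMs to produce far more varied queries); the schema, templates, and function name are all assumptions.

```python
import random

# Toy templates and schema; a real system would derive these from the
# target database and desired cost/selectivity profiles.
TEMPLATES = [
    "SELECT {col} FROM {table} WHERE {col} > {value};",
    "SELECT COUNT(*) FROM {table} WHERE {col} < {value};",
]
SCHEMA = {"orders": ["amount", "quantity"], "users": ["age"]}

def generate_workload(n: int, seed: int = 0) -> list[str]:
    """Generate n synthetic SQL queries by filling templates at random."""
    rng = random.Random(seed)  # seeded for reproducible benchmarks
    queries = []
    for _ in range(n):
        table = rng.choice(list(SCHEMA))
        col = rng.choice(SCHEMA[table])
        template = rng.choice(TEMPLATES)
        queries.append(template.format(table=table, col=col,
                                       value=rng.randint(1, 100)))
    return queries
```

Because no real rows or literals from a production database appear in the output, workloads like this can be shared without exposing sensitive data, which is the privacy angle the paper targets.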

Read more

Few-Shot Text-Based Emotion Detection

The Unibuc – NLP team details their approach to the SemEval 2025 Workshop, using large language models for emotion detection and reporting effective results across multiple languages.

Read more

A Survey on Latent Reasoning

This comprehensive survey reviews latent reasoning methods for LLMs, which aim to improve computational reasoning by operating in the model's hidden states rather than relying on token-level supervision.

Read more

MedGemma Technical Report

MedGemma introduces a collection of medical foundation models with strong performance on healthcare tasks, advancing AI applications in medicine.

Read more

Large Language Models Predict Human Well-being — But Not Equally Everywhere

This study evaluates how accurately LLMs predict human well-being across nations, revealing systemic biases and limitations inherent in their training data.

Read more