Bounded Ratio Reinforcement Learning
The paper introduces the Bounded Ratio Reinforcement Learning (BRRL) framework, which addresses the disconnect between trust region methods and the heuristic clipped objective in Proximal Policy Optimization (PPO). The authors derive an optimal solution for a constrained policy optimization problem and evaluate their policy optimization algorithm Bounded Policy Optimization (BPO) across various environments.
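The summary does not reproduce BRRL's derivation, but the heuristic clipped objective in PPO that it seeks to ground theoretically is standard; a minimal sketch of that surrogate (with the usual clipping parameter `eps`) is:

```python
import numpy as np

def ppo_clip_objective(ratio, advantage, eps=0.2):
    """PPO's clipped surrogate: min(r * A, clip(r, 1-eps, 1+eps) * A),
    where r is the new/old policy probability ratio and A the advantage."""
    return np.minimum(ratio * advantage,
                      np.clip(ratio, 1 - eps, 1 + eps) * advantage)

# With positive advantages, gains are capped once the ratio exceeds 1 + eps.
out = ppo_clip_objective(np.array([0.5, 1.0, 1.5]), np.array([1.0, 1.0, 1.0]))
print(out)  # [0.5 1.  1.2]
```

BRRL replaces this heuristic clip with a bound derived from a constrained optimization problem; the details of that bound are in the paper itself.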
Agentic Forecasting using Sequential Bayesian Updating of Linguistic Beliefs
The authors present the BLF system for binary event forecasting, which maintains a Bayesian linguistic belief state that is updated dynamically as evidence arrives and aggregated hierarchically. Evaluations show that BLF outperforms existing public forecasting methods on a robust benchmark.
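BLF's belief state is linguistic, but its numeric analogue is ordinary sequential Bayesian updating of a binary event probability. A minimal sketch (the likelihood values here are made-up illustrations, not the paper's):

```python
def bayes_update(prior, likelihood_if_yes, likelihood_if_no):
    """Posterior P(yes | evidence) via Bayes' rule for a binary event."""
    num = prior * likelihood_if_yes
    return num / (num + (1 - prior) * likelihood_if_no)

# Start uninformed, then fold in two pieces of evidence sequentially.
p = 0.5
for l_yes, l_no in [(0.8, 0.4), (0.6, 0.5)]:
    p = bayes_update(p, l_yes, l_no)
print(round(p, 3))  # 0.706
```

Each update treats the previous posterior as the new prior, which is what makes the belief state sequential rather than a one-shot estimate.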
ReCap: Lightweight Referential Grounding for Coherent Story Visualization
ReCap introduces a consistency framework that enhances story visualization by maintaining character stability and visual fidelity without modifying the base diffusion model. It improves character consistency across benchmark stories.
When Can LLMs Learn to Reason with Weak Supervision?
This study investigates the ability of large language models to learn reasoning under weak supervision conditions, identifying key factors that influence generalization and performance when faced with limited or noisy reward signals.
Benchmarking System Dynamics AI Assistants: Cloud Versus Local LLMs on CLD Extraction and Discussion
The paper evaluates cloud and local language models on system dynamics tasks, documenting performance discrepancies across multiple benchmarks and providing guidelines for practitioners on model deployment.
GSQ: Highly-Accurate Low-Precision Scalar Quantization for LLMs via Gumbel-Softmax Sampling
The authors introduce GSQ, a low-precision scalar quantization method that improves model accuracy at low bit-widths while keeping memory costs low, demonstrating substantial gains on LLaMA models.
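The Gumbel-Softmax trick in GSQ's title is a standard technique for drawing differentiable, near-discrete samples from a categorical distribution, e.g. over candidate quantization levels. A minimal NumPy sketch (GSQ's actual use of it is more involved):

```python
import numpy as np

def gumbel_softmax(logits, tau=1.0, rng=None):
    """Differentiable sample from a categorical via the Gumbel-Softmax trick.
    Lower temperature tau pushes the output toward a one-hot vector."""
    rng = rng or np.random.default_rng(0)
    g = -np.log(-np.log(rng.uniform(size=logits.shape)))  # Gumbel(0, 1) noise
    y = (logits + g) / tau
    e = np.exp(y - y.max())  # numerically stable softmax
    return e / e.sum()

# Logits over three hypothetical quantization levels; low tau -> near one-hot.
probs = gumbel_softmax(np.array([2.0, 0.5, -1.0]), tau=0.1)
```

Because the output is a soft vector rather than a hard index, gradients can flow through the sampling step during training.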
Inference-Time Distillation: Cost-Efficient Agents Without Fine-Tuning or Manual Prompt Engineering
The authors propose a technique to improve efficiency in deploying LLM agents without cumbersome prompt engineering or fine-tuning, showcasing significant cost reductions while preserving accuracy.
FUSE: Ensembling Verifiers with Zero Labeled Data
FUSE aims to improve verification of model outputs by employing an ensemble method that does not rely on ground truth labels, showcasing competitive performance against semi-supervised counterparts.
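The summary does not specify FUSE's aggregation rule; the simplest label-free baseline such an ensemble improves upon is an unweighted majority vote over verifier verdicts, sketched below (function and verdict names are illustrative, not from the paper):

```python
from collections import Counter

def ensemble_verdict(verdicts):
    """Combine per-verifier accept/reject verdicts by unweighted majority vote.
    Requires no ground-truth labels, only the verifiers' outputs."""
    return Counter(verdicts).most_common(1)[0][0]

v = ensemble_verdict(["accept", "accept", "reject"])
print(v)  # accept
```

Label-free ensembling methods typically go further by estimating each verifier's reliability from inter-verifier agreement and weighting votes accordingly.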
Empowering Multi-Turn Tool-Integrated Agentic Reasoning with Group Turn Policy Optimization
This work introduces Group Turn Policy Optimization to enhance the training of LLMs on multi-turn reasoning tasks, demonstrating significant improvements in performance across various benchmarks.
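The exact turn-level formulation is not given in this summary, but group-based policy optimization methods of this family typically build on a group-normalized advantage: rewards from a group of rollouts for the same prompt are standardized against the group's own statistics. A minimal sketch of that baseline computation:

```python
import numpy as np

def group_relative_advantages(rewards):
    """Standardize rewards within one group of rollouts, so each rollout's
    advantage is measured relative to its peers rather than a learned critic."""
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + 1e-8)

# Four rollouts for the same prompt: two succeed (reward 1), two fail (reward 0).
adv = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

A turn-level variant would assign such advantages per turn of a multi-turn, tool-integrated trajectory rather than once per rollout; how GTPO does so is detailed in the paper.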
