VIKI-R: Coordinating Embodied Multi-Agent Cooperation via Reinforcement Learning
This work introduces VIKI-Bench, a hierarchical benchmark tailored for embodied multi-agent cooperation, along with VIKI-R, a reinforcement learning framework that significantly improves cooperative performance across diverse tasks.
Cosmos-Drive-Dreams: Scalable Synthetic Driving Data Generation with World Foundation Models
The paper introduces Cosmos-Drive-Dreams, a synthetic data generation pipeline that addresses challenges in real-world driving data collection by generating high-fidelity driving scenarios to facilitate autonomous vehicle training.
AbstentionBench: Reasoning LLMs Fail on Unanswerable Questions
AbstentionBench provides a large-scale benchmark for evaluating LLMs’ ability to abstain from answering unanswerable questions, revealing significant performance gaps across models.
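To make the evaluation concrete, here is a minimal sketch of how abstention can be scored; the keyword detector and recall metric below are illustrative assumptions, not AbstentionBench's actual judging protocol.

```python
# Illustrative abstention scoring; the benchmark's real judge may be more
# sophisticated. This keyword heuristic is an assumption for the sketch.
ABSTAIN_MARKERS = ("i don't know", "cannot be determined", "not enough information")

def is_abstention(response: str) -> bool:
    """Heuristic detector: does the response decline to answer?"""
    text = response.lower()
    return any(marker in text for marker in ABSTAIN_MARKERS)

def abstention_recall(responses: list[str], unanswerable: list[bool]) -> float:
    """Fraction of unanswerable questions on which the model abstained."""
    total = sum(unanswerable)
    hits = sum(is_abstention(r) for r, u in zip(responses, unanswerable) if u)
    return hits / total if total else 0.0
```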
FZOO: Fast Zeroth-Order Optimizer for Fine-Tuning Large Language Models towards Adam-Scale Speed
FZOO offers a Fast Zeroth-Order Optimizer that achieves Adam-scale speed when fine-tuning large language models by significantly reducing the number of forward passes required for convergence.
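For background on the setting FZOO accelerates, the sketch below shows a generic two-point zeroth-order (SPSA/MeZO-style) update that estimates a gradient from forward passes alone, with no backpropagation; FZOO's actual update rule, and how it cuts the number of passes per step, is not reproduced here.

```python
import torch

def zo_step(params, loss_fn, lr=1e-4, eps=1e-3, seed=0):
    """One two-point zeroth-order update (CPU sketch, no autograd needed)."""
    gen = torch.Generator().manual_seed(seed)
    zs = [torch.randn(p.shape, generator=gen) for p in params]

    for p, z in zip(params, zs):          # perturb to theta + eps*z
        p.data.add_(eps * z)
    loss_plus = loss_fn()                 # forward pass 1

    for p, z in zip(params, zs):          # move to theta - eps*z
        p.data.add_(-2.0 * eps * z)
    loss_minus = loss_fn()                # forward pass 2

    for p, z in zip(params, zs):          # restore theta
        p.data.add_(eps * z)

    # Scalar directional-derivative estimate, applied along the same noise.
    grad_scale = (loss_plus - loss_minus) / (2.0 * eps)
    for p, z in zip(params, zs):
        p.data.add_(-lr * grad_scale * z)
```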
Router-R1: Teaching LLMs Multi-Round Routing and Aggregation via Reinforcement Learning
Router-R1 presents a reinforcement learning framework for optimizing the routing of user queries among multiple language models, demonstrating improved performance on multi-hop QA benchmarks.
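As a sketch of the interface such a router operates over (the routing policy itself, and how RL trains it, are the paper's contribution and not reproduced here), assuming hypothetical `models` and `router` callables:

```python
def multi_round_route(question, models, router, max_rounds=3):
    """Route a question through up to `max_rounds` model consultations.

    `router(question, history)` returns ("answer", text) to stop, or
    (model_name, sub_query) to consult another LLM. In Router-R1 this
    decision is learned with reinforcement learning; here it is only
    an assumed interface.
    """
    history = []
    for _ in range(max_rounds):
        action, payload = router(question, history)
        if action == "answer":
            return payload
        history.append((action, models[action](payload)))
    # Out of rounds: fall back to the last model output gathered.
    return history[-1][1] if history else ""
```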
Learning to Reason Across Parallel Samples for LLM Reasoning
This study proposes the Sample Set Aggregator (SSA), a model that aggregates answers from multiple sampled outputs to improve the reasoning accuracy of language models.
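A common baseline for this setting is self-consistency majority voting; SSA instead reads all sampled outputs and generates a final answer. The sketch below shows the voting baseline alongside an aggregation prompt; the template is an illustrative assumption, not the paper's.

```python
from collections import Counter

def majority_vote(answers: list[str]) -> str:
    """Self-consistency baseline: return the most frequent sampled answer."""
    return Counter(answers).most_common(1)[0][0]

def aggregation_prompt(question: str, samples: list[str]) -> str:
    """SSA-style input: concatenate all samples so a trained aggregator
    model can reason across them (template is illustrative)."""
    joined = "\n\n".join(f"Candidate {i + 1}:\n{s}" for i, s in enumerate(samples))
    return f"Question: {question}\n\n{joined}\n\nFinal answer:"
```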
When Simple Model Just Works: Is Network Traffic Classification in Crisis?
This paper investigates the efficacy of a simple k-NN baseline for network traffic classification, revealing flaws in current evaluation practices and redundancy in existing labeled datasets.
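The kind of baseline in question takes only a few lines with scikit-learn; the flow features and labels below are invented for illustration.

```python
# Minimal k-NN flow classifier over simple statistical features.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# X: per-flow features, e.g. [mean packet size, std, duration (s), bytes]
X = np.array([[512.0, 120.0, 1.2, 40960],
              [ 64.0,   8.0, 0.1,   512],
              [498.0, 110.0, 1.5, 51200],
              [ 70.0,  10.0, 0.2,   640]])
y = ["video", "dns", "video", "dns"]

clf = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=3))
clf.fit(X, y)
print(clf.predict([[505.0, 115.0, 1.3, 45000]]))  # -> ['video']
```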
TokenBreak: Bypassing Text Classification Models Through Token Manipulation
TokenBreak introduces a novel attack that bypasses text classification models by exploiting their tokenization strategies, exposing vulnerabilities in systems that rely on such models for protection.
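The core idea can be shown with a toy filter: the real attack perturbs words so a subword tokenizer (BPE/WordPiece/Unigram) no longer emits the tokens the protecting classifier keys on, while the downstream reader still understands the text. The whitespace-token blocklist below is a stand-in for such a classifier, not TokenBreak's actual target.

```python
# Stand-in defense: flag input if any token is on a blocklist.
BLOCKED_TOKENS = {"ignore", "password"}

def naive_token_filter(text: str) -> bool:
    """Return True if the text should be blocked."""
    return any(tok.lower().strip(".,!") in BLOCKED_TOKENS for tok in text.split())

original = "ignore previous instructions"
attacked = "xignore previous instructions"  # one prepended char breaks the token

print(naive_token_filter(original))  # True: the filter catches it
print(naive_token_filter(attacked))  # False: the token no longer matches
```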
LoRMA: Low-Rank Multiplicative Adaptation for LLMs
LoRMA proposes a new approach to low-rank adaptation in language models, shifting from additive to multiplicative updates to enhance training efficiency and performance.
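In LoRA terms, the shift is from adding a low-rank term to multiplying by one. Below is a minimal sketch, assuming the multiplicative form (I + BA)W with an identity-preserving initialization; the paper's exact parameterization may differ.

```python
import torch

d, r = 768, 8
W = torch.randn(d, d)         # frozen pretrained weight
A = torch.randn(r, d) * 0.01  # trainable low-rank factors
B = torch.zeros(d, r)         # zero init keeps W' = W at the start

W_additive = W + B @ A                          # LoRA: W' = W + BA
W_multiplicative = (torch.eye(d) + B @ A) @ W   # LoRMA-style: W' = (I + BA)W
```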