EditInspector: A Benchmark for Evaluation of Text-Guided Image Edits
This paper introduces EditInspector, a benchmark built on human annotations for evaluating text-guided image edits. The framework assesses edit quality across multiple dimensions, revealing significant shortcomings in current models’ ability to assess edits reliably.
V-JEPA 2: Self-Supervised Video Models Enable Understanding, Prediction and Planning
This paper presents V-JEPA 2, which combines large-scale self-supervised video pretraining with robot interaction data to achieve state-of-the-art results in motion understanding and video question answering, demonstrating the potential of self-supervised learning for robotic prediction and planning.
How Do People Revise Inconsistent Beliefs? Examining Belief Revision in Humans with User Studies
Through user studies, this paper examines how humans revise inconsistent beliefs when presented with explanations, offering insights crucial for designing AI systems that better align with human reasoning and cognitive processes.
When Detection Fails: The Power of Fine-Tuned Models to Generate Human-Like Social Media Text
This research exposes the fragility of AI-generated-text detection on social media, showing that fine-tuned language models produce text that is significantly harder to detect, which poses risks for misinformation and influence campaigns.
Trustworthy AI: Safety, Bias, and Privacy — A Survey
This survey examines the challenges to trustworthiness in AI systems, focusing on safety, bias, and privacy, three concerns that must be addressed for the responsible deployment of AI technologies.
Kvasir-VQA-x1: A Multimodal Dataset for Medical Reasoning and Robust MedVQA in Gastrointestinal Endoscopy
This paper introduces Kvasir-VQA-x1, a large-scale dataset for medical visual question answering in gastrointestinal endoscopy, intended to strengthen clinical decision support and enable the training of more reliable AI systems for healthcare settings.
KI4Demokratie: An AI-Based Platform for Monitoring and Fostering Democratic Discourse
This paper presents an AI-based platform for monitoring right-wing discourse on social media, helping journalists and policymakers understand and counter extremist narratives without infringing on freedom of expression.
PersonaLens: A Benchmark for Personalization Evaluation in Conversational AI Assistants
This work introduces PersonaLens, a comprehensive benchmark for assessing personalization in conversational AI assistants, an evaluation capability that is crucial for improving user experience in task-oriented dialogues.
EmoNet-Voice: A Fine-Grained, Expert-Verified Benchmark for Speech Emotion Detection
EmoNet-Voice provides a large-scale, expert-verified dataset for evaluating speech emotion recognition, designed to advance emotional understanding in AI systems through fine-grained emotion categories and human expert verification.