ToolFuzz — Automated Agent Tool Testing
ToolFuzz is a method for the automated testing of tool documentation for LLM agents, surfacing documentation errors that lead agents to mishandle user queries and produce incorrect responses.
Runtime Detection of Adversarial Attacks in AI Accelerators Using Performance Counters
This study introduces SAMURAI, a framework for real-time detection of adversarial attacks on AI hardware accelerators using hardware performance counters.
Quantifying the Reasoning Abilities of LLMs on Real-world Clinical Cases
MedR-Bench is a benchmark dataset for evaluating LLMs’ reasoning on real-world clinical cases; it reveals reasoning limitations even when final answers are accurate.
Junior Software Developers’ Perspectives on Adopting LLMs for Software Engineering: a Systematic Literature Review
This systematic literature review examines how junior software developers perceive adopting LLMs for software engineering, highlighting both benefits and limitations.
NLP-enabled Trajectory Map-matching in Urban Road Networks using a Transformer-based Encoder-decoder
This research develops a transformer-based encoder-decoder that treats map-matching of vehicular trajectories onto urban road networks as a sequence-to-sequence task, improving matching accuracy with NLP techniques.
Enhancing NLP Robustness and Generalization through LLM-Generated Contrast Sets: A Scalable Framework for Systematic Evaluation and Adversarial Training
This study automates contrast set generation with LLMs to evaluate NLP model robustness, showing that adversarial training on the generated sets improves both performance and robustness.
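The idea behind a contrast set can be shown with a toy sketch (the word-swap rules and label scheme here are hypothetical stand-ins; the study uses LLM-generated perturbations at scale):

```python
# Toy contrast-set generation for binary sentiment labels (1 = positive,
# 0 = negative). A contrast example is a minimal edit to the input that
# flips the gold label, probing whether a model relies on the right cues.
NEGATIONS = {"good": "bad", "great": "terrible", "love": "hate"}

def make_contrast(example):
    """Return a minimally perturbed (text, label) pair with the label
    flipped, or None if no rule applies (hypothetical rule set)."""
    text, label = example
    for pos, neg in NEGATIONS.items():
        if pos in text:
            return text.replace(pos, neg), 1 - label
    return None

print(make_contrast(("the movie was good", 1)))  # → ('the movie was bad', 0)
```

An LLM-driven version would replace the fixed dictionary with prompted rewrites, which is what makes the approach scale across tasks.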
Is My Text in Your AI Model? Gradient-based Membership Inference Test applied to LLMs
The paper adapts the gradient-based Membership Inference Test to determine whether a given text was used to train an LLM, supporting privacy auditing of training data.
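The intuition behind a gradient-based membership test can be illustrated with a toy sketch (the distributions and threshold below are hypothetical; a real test computes per-sample gradient norms from the LLM's loss):

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for per-sample gradient norms of the model's loss.
# Training members tend to yield smaller gradients than non-members,
# since the model has already fit them (synthetic data for illustration).
member_grad_norms = rng.normal(loc=0.8, scale=0.2, size=500)
nonmember_grad_norms = rng.normal(loc=1.5, scale=0.3, size=500)

def infer_membership(grad_norm, threshold=1.15):
    """Flag a sample as a likely training member when its gradient
    norm falls below the threshold (hypothetical decision rule)."""
    return grad_norm < threshold

member_hits = np.mean([infer_membership(g) for g in member_grad_norms])
nonmember_hits = np.mean([infer_membership(g) for g in nonmember_grad_norms])
print(f"members flagged: {member_hits:.2f}, non-members flagged: {nonmember_hits:.2f}")
```

In practice the separation between the two populations is far noisier, which is why such tests are typically learned from gradient features rather than a single threshold.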
E-Gen: Leveraging E-Graphs to Improve Continuous Representations of Symbolic Expressions
E-Gen leverages e-graphs to improve continuous embeddings of symbolic mathematical expressions, enhancing performance on NLP tasks.
Plume: Scaffolding Text Composition in Dashboards
Plume is a system designed to assist authors in crafting effective text for dashboards, enhancing clarity with LLMs.
AI Biases as Asymmetries: A Review to Guide Practice
This review frames AI biases as asymmetries, advocating a framework that distinguishes and addresses both beneficial and detrimental biases in practice.