Artificial intelligence has significantly impacted medical applications,
particularly with the advent of Medical Large Vision Language Models
(Med-LVLMs), sparking optimism for the future of automated and personalized
healthcare. However, the trustworthiness of Med-LVLMs remains unverified,
posing significant risks for future model deployment. In this paper, we
introduce CARES, a benchmark for comprehensively evaluating the trustworthiness of
Med-LVLMs across the medical domain. We assess trustworthiness along five
dimensions: trustfulness, fairness, safety, privacy, and
robustness. CARES comprises about 41K question-answer pairs in both closed and
open-ended formats, covering 16 medical image modalities and 27 anatomical
regions. Our analysis reveals that the models consistently exhibit concerns
regarding trustworthiness, often displaying factual inaccuracies and failing to
maintain fairness across different demographic groups. Furthermore, they are
vulnerable to attacks and demonstrate a lack of privacy awareness. We publicly
release our benchmark and code at https://cares-ai.github.io/.
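Concretely, closed-ended items (e.g., yes/no or multiple-choice questions) can be scored by exact match against the ground truth and aggregated per trustworthiness dimension. The sketch below illustrates only this kind of evaluation loop; the record fields (`dimension`, `format`, `image`, `question`, `answer`) and the `query_med_lvlm` helper are assumptions for illustration, not the actual CARES data schema or API.

```python
import json
from collections import defaultdict

def query_med_lvlm(image_path: str, question: str) -> str:
    """Placeholder for the Med-LVLM under evaluation (interface is an assumption)."""
    raise NotImplementedError

def closed_ended_accuracy(benchmark_file: str) -> dict:
    """Exact-match accuracy on closed-ended items, aggregated per dimension."""
    correct, total = defaultdict(int), defaultdict(int)
    with open(benchmark_file) as f:
        samples = json.load(f)  # assumed: a list of QA records
    for s in samples:
        if s.get("format") != "closed":          # open-ended items need a different metric
            continue
        pred = query_med_lvlm(s["image"], s["question"]).strip().lower()
        gold = s["answer"].strip().lower()
        dim = s["dimension"]                     # e.g., trustfulness, fairness, safety, ...
        correct[dim] += int(pred == gold)
        total[dim] += 1
    return {d: correct[d] / total[d] for d in total}
```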
Fairness Without Harm: An Influence-Guided Active Sampling Approach
arXiv:2402.12789v3
The pursuit of fairness in machine learning (ML), ensuring that the models do
not exhibit biases toward protected demographic groups, typically results in a
compromise scenario. This compromise can be explained by a Pareto frontier
where given certain resources (e.g., data), reducing the fairness violations
often comes at the cost of lowering the model accuracy. In this work, we aim to
train models that mitigate group fairness disparity without causing harm to
model accuracy. Intuitively, acquiring more data is a natural and promising
approach to achieve this goal by reaching a better Pareto frontier of the
fairness-accuracy tradeoff. Current data acquisition methods, such as fair
active learning approaches, typically require annotating sensitive attributes.
However, these sensitive attribute annotations should be protected due to
privacy and safety concerns. In this paper, we propose a tractable active data
sampling algorithm that does not rely on training group annotations, instead
only requiring group annotations on a small validation set. Specifically, the
algorithm first scores each new example by its influence on fairness and
accuracy evaluated on the validation dataset, and then selects a certain number
of examples for training. We theoretically analyze how acquiring more data can
improve fairness without causing harm, and validate the feasibility of our
sampling approach in the context of risk disparity. We also provide upper
bounds on the generalization error and risk disparity, along with the
connections between them. Extensive experiments on real-world data demonstrate the
effectiveness of our proposed algorithm. Our code is available at
https://github.com/UCSC-REAL/FairnessWithoutHarm.
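As a rough illustration of the scoring idea, the sketch below uses a first-order influence approximation for a logistic-regression model: a small gradient step on a candidate example changes a validation objective by roughly the negative inner product of the candidate's gradient with that objective's gradient. The fairness surrogate (gap between two groups' mean validation losses), the assumption that candidate labels are known or proxied by model predictions, and the combined score are simplifications, not the paper's exact algorithm.

```python
import numpy as np

def logistic_grad(w, x, y):
    """Per-example gradient of the logistic loss (assumed model: logistic regression)."""
    p = 1.0 / (1.0 + np.exp(-x @ w))
    return (p - y) * x

def mean_grad(w, X, y):
    return np.mean([logistic_grad(w, xi, yi) for xi, yi in zip(X, y)], axis=0)

def mean_loss(w, X, y):
    p = 1.0 / (1.0 + np.exp(-X @ w))
    return float(np.mean(-(y * np.log(p) + (1 - y) * np.log(1 - p))))

def influence_scores(w, X_pool, y_pool, X_val, y_val, g_val, lam=1.0):
    """Score each pool example by its estimated effect on validation accuracy and on
    the group loss gap. Group labels g_val exist only on the validation set; two
    demographic groups (0 and 1) are assumed. Higher scores are better."""
    acc_dir = mean_grad(w, X_val, y_val)
    m0, m1 = (g_val == 0), (g_val == 1)
    gap = mean_loss(w, X_val[m0], y_val[m0]) - mean_loss(w, X_val[m1], y_val[m1])
    gap_dir = mean_grad(w, X_val[m0], y_val[m0]) - mean_grad(w, X_val[m1], y_val[m1])
    scores = []
    for x, y in zip(X_pool, y_pool):             # pool labels assumed known or proxied
        g = logistic_grad(w, x, y)
        acc_gain = g @ acc_dir                   # > 0: a step on this example lowers val loss
        fair_gain = np.sign(gap) * (g @ gap_dir) # > 0: a step on this example shrinks |gap|
        scores.append(acc_gain + lam * fair_gain)
    return np.asarray(scores)

# usage sketch: pick the k highest-scoring pool examples for annotation and training
# selected = np.argsort(-influence_scores(w, X_pool, y_pool, X_val, y_val, g_val))[:k]
```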
Fairness in Monotone k-submodular Maximization: Algorithms and
Applications
Submodular optimization has become increasingly prominent in machine learning,
and fairness has drawn much attention. In this paper, we propose to study the
fair k-submodular maximization problem and develop a
1/3-approximation greedy algorithm with a running time of
O(knB). To the best of our knowledge, our work is the first to
incorporate fairness in the context of k-submodular maximization, and our
theoretical guarantee matches the best-known k-submodular maximization
results without fairness constraints. In addition, we have developed a faster
threshold-based algorithm that achieves a (1/3 − ε)
approximation with O((kn/ε) log(B/ε))
evaluations of the function f. Furthermore, for both algorithms, we provide
approximation guarantees when the k-submodular function is not exactly accessible but
can only be accessed approximately. We have extensively validated our
theoretical findings through empirical research and examined the practical
implications of fairness. Specifically, we have addressed the question: "What
is the price of fairness?" through case studies on influence maximization with
k topics and sensor placement with k types. The experimental results show
that the fairness constraints do not significantly undermine the quality of
solutions.
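To make the greedy step concrete, the sketch below shows a generic greedy loop for k-submodular maximization under a simple fairness constraint, modeled here as a per-group cap on how many items from each demographic group may be selected; the paper's exact fairness constraint, tie-breaking, and analysis may differ. The loop performs O(knB) evaluations of the function f, matching the running time quoted above.

```python
from typing import Callable, Dict, List

def fair_greedy_k_submodular(
    items: List[str],
    k: int,
    B: int,
    group_of: Dict[str, str],        # demographic group of each item (assumed given)
    group_cap: Dict[str, int],       # fairness modeled as a per-group selection cap
    f: Callable[[Dict[str, int]], float],
) -> Dict[str, int]:
    """Greedy sketch: a solution assigns each selected item one of k types (1..k).
    At every step, add the feasible (item, type) pair with the largest marginal
    gain; feasibility means the item's group has not reached its cap and the total
    budget B is not exceeded. Uses O(k * n * B) evaluations of f."""
    solution: Dict[str, int] = {}    # item -> assigned type
    used = {g: 0 for g in group_cap}
    for _ in range(B):
        base = f(solution)
        best, best_gain = None, 0.0  # accept only strictly improving pairs (f assumed monotone)
        for item in items:
            if item in solution or used[group_of[item]] >= group_cap[group_of[item]]:
                continue
            for t in range(1, k + 1):
                gain = f({**solution, item: t}) - base
                if gain > best_gain:
                    best, best_gain = (item, t), gain
        if best is None:
            break
        solution[best[0]] = best[1]
        used[group_of[best[0]]] += 1
    return solution
```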
A Taxonomy of Multi-Layered Runtime Guardrails for Designing Foundation
Model-Based Agents: Swiss Cheese Model for AI Safety by Design
Foundation Model (FM) based agents are revolutionizing application
development across various domains. However, their rapidly growing capabilities
and autonomy have raised significant concerns about AI safety. Designing
effective guardrails for these agents is challenging due to their autonomous
and non-deterministic behavior, and the involvement of multiple artifacts --
such as goals, prompts, plans, tools, knowledge bases, and intermediate and
final results. Addressing these unique challenges at runtime requires
multi-layered guardrails that operate effectively at various levels of the
agent architecture, similar to the Swiss Cheese Model. In this paper, we
present a taxonomy of multi-layered runtime guardrails to classify and compare
their characteristics and design options, grounded in a systematic literature
review and guided by the Swiss Cheese Model. The taxonomy is organized into
categories of external and internal quality attributes and design options. We also
highlight the relationships between guardrails, the associated risks they
mitigate, and the quality attributes they impact in agent architectures. Thus,
the proposed taxonomy provides structured and concrete guidance for making
architectural design decisions when implementing multi-layered guardrails while
emphasizing the trade-offs inherent in these decisions.
MRJ-Agent: An Effective Jailbreak Agent for Multi-Round Dialogue
arXiv:2411.03814v1
Large Language Models (LLMs) demonstrate an outstanding reservoir of knowledge
and strong understanding capabilities, but they have also been
shown to be prone to illegal or unethical responses when subjected to jailbreak
attacks. To ensure their responsible deployment in critical applications, it is
crucial to understand the safety capabilities and vulnerabilities of LLMs.
Previous works mainly focus on jailbreak in single-round dialogue, overlooking
the potential jailbreak risks in multi-round dialogues, which are a vital way
humans interact with and extract information from LLMs. Some studies have
increasingly concentrated on the risks associated with jailbreak in multi-round
dialogues. These efforts typically involve the use of manually crafted
templates or prompt engineering techniques. However, due to the inherent
complexity of multi-round dialogues, their jailbreak performance is limited. To
solve this problem, we propose a novel multi-round dialogue jailbreaking agent,
emphasizing the importance of stealthiness in identifying and mitigating
potential threats to human values posed by LLMs. We propose a risk
decomposition strategy that distributes risks across multiple rounds of queries
and utilizes psychological strategies to enhance attack strength. Extensive
experiments show that our proposed method surpasses other attack methods and
achieves a state-of-the-art attack success rate. We will make the corresponding
code and dataset available for future research; the code will be released soon.
ADAPT: A Game-Theoretic and Neuro-Symbolic Framework for Automated
Distributed Adaptive Penetration Testing
arXiv:2411.00217v1
The integration of AI into modern critical infrastructure systems, such as
healthcare, has introduced new vulnerabilities that can significantly impact
workflow, efficiency, and safety. Additionally, the increased connectivity has
made traditional human-driven penetration testing insufficient for assessing
risks and developing remediation strategies. Consequently, there is a pressing
need for a distributed, adaptive, and efficient automated penetration testing
framework that not only identifies vulnerabilities but also provides
countermeasures to enhance security posture. This work presents ADAPT, a
game-theoretic and neuro-symbolic framework for automated distributed adaptive
penetration testing, specifically designed to address the unique cybersecurity
challenges of AI-enabled healthcare infrastructure networks. We use a
healthcare system case study to illustrate the methodologies within ADAPT. The
proposed solution enables a learning-based risk assessment. Numerical
experiments are used to demonstrate effective countermeasures against various
tactical techniques employed by adversarial AI.
Effective and Efficient Adversarial Detection for Vision-Language Models
via A Single Vector
arXiv:2410.22888v1
Visual Language Models (VLMs) are vulnerable to adversarial attacks,
especially those based on adversarial images, a threat that remains under-explored in the
literature. To facilitate research on this critical safety problem, we first
construct a new laRge-scale Adversarial images dataset with Diverse hArmful
Responses (RADAR), given that existing datasets are either small-scale or only
contain limited types of harmful responses. With the new RADAR dataset, we
further develop a novel and effective iN-time Embedding-based AdveRSarial Image
DEtection (NEARSIDE) method, which exploits a single vector distilled from
the hidden states of VLMs, which we call the attacking direction, to
distinguish adversarial images from benign ones in the input. Extensive
experiments with two victim VLMs, LLaVA and MiniGPT-4, well demonstrate the
effectiveness, efficiency, and cross-model transferability of our proposed
method. Our code is available at https://github.com/mob-scu/RADAR-NEARSIDE.
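Conceptually, such a detector can be as simple as a linear probe along one direction in the VLM's hidden-state space. The sketch below is an assumption-laden simplification rather than the published NEARSIDE procedure: it takes the attacking direction to be the normalized difference between the mean hidden states of adversarial and benign examples, and flags inputs whose projection exceeds a calibrated threshold.

```python
import numpy as np

def attacking_direction(h_adv: np.ndarray, h_benign: np.ndarray) -> np.ndarray:
    """Distill one detection vector from hidden states: here simply the normalized
    difference of class means. h_adv / h_benign have shape (num_samples, hidden_dim)
    and are assumed to be extracted from the victim VLM offline (not shown)."""
    direction = h_adv.mean(axis=0) - h_benign.mean(axis=0)
    return direction / np.linalg.norm(direction)

def is_adversarial(h_input: np.ndarray, direction: np.ndarray, threshold: float) -> bool:
    """Flag an input whose hidden state projects onto the attacking direction
    beyond a threshold calibrated on held-out benign and adversarial examples."""
    return float(h_input @ direction) > threshold
```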
GLBench: A Comprehensive Benchmark for Graph with Large Language Models
arXiv:2407.07457v4
The emergence of large language models (LLMs) has revolutionized the way we
interact with graphs, leading to a new paradigm called GraphLLM. Despite the
rapid development of GraphLLM methods in recent years, the progress and
understanding of this field remain unclear due to the lack of a benchmark with
consistent experimental protocols. To bridge this gap, we introduce GLBench,
the first comprehensive benchmark for evaluating GraphLLM methods in both
supervised and zero-shot scenarios. GLBench provides a fair and thorough
evaluation of different categories of GraphLLM methods, along with traditional
baselines such as graph neural networks. Through extensive experiments on a
collection of real-world datasets with consistent data processing and splitting
strategies, we have uncovered several key findings. Firstly, GraphLLM methods
outperform traditional baselines in supervised settings, with LLM-as-enhancers
showing the most robust performance. However, using LLMs as predictors is less
effective and often leads to uncontrollable output issues. We also notice that
no clear scaling laws exist for current GraphLLM methods. In addition, both
structures and semantics are crucial for effective zero-shot transfer, and our
proposed simple baseline can even outperform several models tailored for
zero-shot scenarios. The data and code of the benchmark can be found at
https://github.com/NineAbyss/GLBench.
Discriminative Pedestrian Features and Gated Channel Attention for
Clothes-Changing Person Re-Identification
The article has been accepted by the IEEE International Conference on
Multimedia and Expo 2024
In public safety and social life, the task of Clothes-Changing Person
Re-Identification (CC-ReID) has become increasingly significant. However, this
task faces considerable challenges due to appearance changes caused by clothing
alterations. To address this issue, this paper proposes an innovative method
for disentangled feature extraction, effectively extracting discriminative
features from pedestrian images that are invariant to clothing. This method
leverages pedestrian parsing techniques to identify and retain features closely
associated with individual identity while disregarding the variable nature of
clothing attributes. Furthermore, this study introduces a gated channel
attention mechanism, which, by adjusting the network's focus, aids the model in
more effectively learning and emphasizing features critical for pedestrian
identity recognition. Extensive experiments conducted on two standard CC-ReID
datasets validate the effectiveness of the proposed approach, with performance
surpassing current leading solutions. The Top-1 accuracy under clothing change
scenarios on the PRCC and VC-Clothes datasets reached 64.8% and 83.7%,
respectively.
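For intuition, a gated channel attention block can be realized as a squeeze-and-excitation-style gate that reweights feature channels so that identity-relevant cues are emphasized over clothing-driven ones. The PyTorch sketch below is a generic instantiation under that assumption; the paper's actual module design (gating function, placement, reduction ratio) is not specified here.

```python
import torch
import torch.nn as nn

class GatedChannelAttention(nn.Module):
    """Squeeze-and-excitation-style channel gate: global average pooling summarizes
    each channel, a bottleneck MLP maps the summary to per-channel gates in (0, 1),
    and the feature map is reweighted channel-wise."""

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.gate = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        gates = self.gate(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * gates  # emphasize identity-relevant channels, damp clothing-driven ones
```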
Stealthy Jailbreak Attacks on Large Language Models via Benign Data
Mirroring
arXiv:2410.21083v1
Large language model (LLM) safety is a critical issue, with numerous studies
employing red team testing to enhance model security. Among these, jailbreak
methods explore potential vulnerabilities by crafting malicious prompts that
induce model outputs contrary to safety alignments. Existing black-box
jailbreak methods often rely on model feedback, repeatedly submitting queries
with detectable malicious instructions during the attack search process.
Although these approaches are effective, the attacks may be intercepted by
content moderators during the search process. We propose an improved transfer
attack method that guides malicious prompt construction by locally training a
mirror model of the target black-box model through benign data distillation.
This method offers enhanced stealth, as it does not involve submitting
identifiable malicious instructions to the target model during the search
phase. Our approach achieved a maximum attack success rate of 92%, or a
balanced value of 80% with an average of 1.5 detectable jailbreak queries per
sample against GPT-3.5 Turbo on a subset of AdvBench. These results underscore
the need for more robust defense mechanisms.