Artificial intelligence has significantly impacted medical applications,
particularly with the advent of Medical Large Vision Language Models
(Med-LVLMs), sparking optimism for the future of automated and personalized
healthcare. However, the trustworthiness of Med-LVLMs remains unverified,
posing significant risks for future model deployment. In this paper, we
introduce CARES, a benchmark for comprehensively evaluating the trustworthiness of
Med-LVLMs across the medical domain. We assess trustworthiness along five
dimensions: trustfulness, fairness, safety, privacy, and
robustness. CARES comprises about 41K question-answer pairs in both closed and
open-ended formats, covering 16 medical image modalities and 27 anatomical
regions. Our analysis reveals that the models consistently exhibit concerns
regarding trustworthiness, often displaying factual inaccuracies and failing to
maintain fairness across different demographic groups. Furthermore, they are
vulnerable to attacks and demonstrate a lack of privacy awareness. We publicly
release our benchmark and code at https://cares-ai.github.io/.
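Concretely, closed-ended items (e.g., yes/no or multiple-choice questions) can be scored by exact match against the ground truth and aggregated per trustworthiness dimension. The sketch below illustrates only this kind of evaluation loop; the record fields (`dimension`, `format`, `image`, `question`, `answer`) and the `query_med_lvlm` helper are assumptions for illustration, not the actual CARES data schema or API.

```python
import json
from collections import defaultdict

def query_med_lvlm(image_path: str, question: str) -> str:
    """Placeholder for the Med-LVLM under evaluation (interface is an assumption)."""
    raise NotImplementedError

def closed_ended_accuracy(benchmark_file: str) -> dict:
    """Exact-match accuracy on closed-ended items, aggregated per dimension."""
    correct, total = defaultdict(int), defaultdict(int)
    with open(benchmark_file) as f:
        samples = json.load(f)  # assumed: a list of QA records
    for s in samples:
        if s.get("format") != "closed":          # open-ended items need a different metric
            continue
        pred = query_med_lvlm(s["image"], s["question"]).strip().lower()
        gold = s["answer"].strip().lower()
        dim = s["dimension"]                     # e.g., trustfulness, fairness, safety, ...
        correct[dim] += int(pred == gold)
        total[dim] += 1
    return {d: correct[d] / total[d] for d in total}
```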
Fairness Without Harm: An Influence-Guided Active Sampling Approach
arXiv:2402.12789v3
The pursuit of fairness in machine learning (ML), ensuring that the models do
not exhibit biases toward protected demographic groups, typically results in a
compromise scenario. This compromise can be explained by a Pareto frontier
where given certain resources (e.g., data), reducing the fairness violations
often comes at the cost of lowering the model accuracy. In this work, we aim to
train models that mitigate group fairness disparity without causing harm to
model accuracy. Intuitively, acquiring more data is a natural and promising
approach to achieve this goal by reaching a better Pareto frontier of the
fairness-accuracy tradeoff. Current data acquisition methods, such as fair
active learning approaches, typically require annotating sensitive attributes.
However, these sensitive attribute annotations should be protected due to
privacy and safety concerns. In this paper, we propose a tractable active data
sampling algorithm that does not rely on training group annotations, instead
only requiring group annotations on a small validation set. Specifically, the
algorithm first scores each new example by its influence on fairness and
accuracy evaluated on the validation dataset, and then selects a certain number
of examples for training. We theoretically analyze how acquiring more data can
improve fairness without causing harm, and validate the feasibility of our
sampling approach in the context of risk disparity. We also provide upper
bounds on the generalization error and risk disparity, along with the
connections between them. Extensive experiments on real-world data demonstrate the
effectiveness of our proposed algorithm. Our code is available at
https://github.com/UCSC-REAL/FairnessWithoutHarm.
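As a rough illustration of the scoring idea, the sketch below uses a first-order influence approximation for a logistic-regression model: a small gradient step on a candidate example changes a validation objective by roughly the negative inner product of the candidate's gradient with that objective's gradient. The fairness surrogate (gap between two groups' mean validation losses), the assumption that candidate labels are known or proxied by model predictions, and the combined score are simplifications, not the paper's exact algorithm.

```python
import numpy as np

def logistic_grad(w, x, y):
    """Per-example gradient of the logistic loss (assumed model: logistic regression)."""
    p = 1.0 / (1.0 + np.exp(-x @ w))
    return (p - y) * x

def mean_grad(w, X, y):
    return np.mean([logistic_grad(w, xi, yi) for xi, yi in zip(X, y)], axis=0)

def mean_loss(w, X, y):
    p = 1.0 / (1.0 + np.exp(-X @ w))
    return float(np.mean(-(y * np.log(p) + (1 - y) * np.log(1 - p))))

def influence_scores(w, X_pool, y_pool, X_val, y_val, g_val, lam=1.0):
    """Score each pool example by its estimated effect on validation accuracy and on
    the group loss gap. Group labels g_val exist only on the validation set; two
    demographic groups (0 and 1) are assumed. Higher scores are better."""
    acc_dir = mean_grad(w, X_val, y_val)
    m0, m1 = (g_val == 0), (g_val == 1)
    gap = mean_loss(w, X_val[m0], y_val[m0]) - mean_loss(w, X_val[m1], y_val[m1])
    gap_dir = mean_grad(w, X_val[m0], y_val[m0]) - mean_grad(w, X_val[m1], y_val[m1])
    scores = []
    for x, y in zip(X_pool, y_pool):             # pool labels assumed known or proxied
        g = logistic_grad(w, x, y)
        acc_gain = g @ acc_dir                   # > 0: a step on this example lowers val loss
        fair_gain = np.sign(gap) * (g @ gap_dir) # > 0: a step on this example shrinks |gap|
        scores.append(acc_gain + lam * fair_gain)
    return np.asarray(scores)

# usage sketch: pick the k highest-scoring pool examples for annotation and training
# selected = np.argsort(-influence_scores(w, X_pool, y_pool, X_val, y_val, g_val))[:k]
```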
Fairness in Monotone k-submodular Maximization: Algorithms and
Applications
Submodular optimization has become increasingly prominent in machine learning,
and fairness has drawn much attention. In this paper, we propose to study the
fair k-submodular maximization problem and develop a
1/3-approximation greedy algorithm with a running time of
O(knB). To the best of our knowledge, our work is the first to
incorporate fairness in the context of k-submodular maximization, and our
theoretical guarantee matches the best-known k-submodular maximization
results without fairness constraints. In addition, we have developed a faster
threshold-based algorithm that achieves a (1/3 − ε)
approximation with O((kn/ε) log(B/ε))
evaluations of the function f. Furthermore, for both algorithms, we provide
approximation guarantees when the k-submodular function is not exactly accessible but
can only be accessed approximately. We have extensively validated our
theoretical findings through empirical research and examined the practical
implications of fairness. Specifically, we have addressed the question: "What
is the price of fairness?" through case studies on influence maximization with
k topics and sensor placement with k types. The experimental results show
that the fairness constraints do not significantly undermine the quality of
solutions.
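To make the greedy step concrete, the sketch below shows a generic greedy loop for k-submodular maximization under a simple fairness constraint, modeled here as a per-group cap on how many items from each demographic group may be selected; the paper's exact fairness constraint, tie-breaking, and analysis may differ. The loop performs O(knB) evaluations of the function f, matching the running time quoted above.

```python
from typing import Callable, Dict, List

def fair_greedy_k_submodular(
    items: List[str],
    k: int,
    B: int,
    group_of: Dict[str, str],        # demographic group of each item (assumed given)
    group_cap: Dict[str, int],       # fairness modeled as a per-group selection cap
    f: Callable[[Dict[str, int]], float],
) -> Dict[str, int]:
    """Greedy sketch: a solution assigns each selected item one of k types (1..k).
    At every step, add the feasible (item, type) pair with the largest marginal
    gain; feasibility means the item's group has not reached its cap and the total
    budget B is not exceeded. Uses O(k * n * B) evaluations of f."""
    solution: Dict[str, int] = {}    # item -> assigned type
    used = {g: 0 for g in group_cap}
    for _ in range(B):
        base = f(solution)
        best, best_gain = None, 0.0  # accept only strictly improving pairs (f assumed monotone)
        for item in items:
            if item in solution or used[group_of[item]] >= group_cap[group_of[item]]:
                continue
            for t in range(1, k + 1):
                gain = f({**solution, item: t}) - base
                if gain > best_gain:
                    best, best_gain = (item, t), gain
        if best is None:
            break
        solution[best[0]] = best[1]
        used[group_of[best[0]]] += 1
    return solution
```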
A Taxonomy of Multi-Layered Runtime Guardrails for Designing Foundation
Model-Based Agents: Swiss Cheese Model for AI Safety by Design
Foundation Model (FM) based agents are revolutionizing application
development across various domains. However, their rapidly growing capabilities
and autonomy have raised significant concerns about AI safety. Designing
effective guardrails for these agents is challenging due to their autonomous
and non-deterministic behavior, and the involvement of multiple artifacts --
such as goals, prompts, plans, tools, knowledge bases, and intermediate and
final results. Addressing these unique challenges at runtime requires
multi-layered guardrails that operate effectively at various levels of the
agent architecture, similar to the Swiss Cheese Model. In this paper, we
present a taxonomy of multi-layered runtime guardrails to classify and compare
their characteristics and design options, grounded in a systematic literature
review and guided by the Swiss Cheese Model. The taxonomy is organized into
categories of external and internal quality attributes and design options. We also
highlight the relationships between guardrails, the associated risks they
mitigate, and the quality attributes they impact in agent architectures. Thus,
the proposed taxonomy provides structured and concrete guidance for making
architectural design decisions when implementing multi-layered guardrails while
emphasizing the trade-offs inherent in these decisions.
MRJ-Agent: An Effective Jailbreak Agent for Multi-Round Dialogue
arXiv:2411.03814v1
Large Language Models (LLMs) demonstrate an outstanding reservoir of knowledge
and strong understanding capabilities, but they have also been
shown to be prone to illegal or unethical responses when subjected to jailbreak
attacks. To ensure their responsible deployment in critical applications, it is
crucial to understand the safety capabilities and vulnerabilities of LLMs.
Previous works mainly focus on jailbreak in single-round dialogue, overlooking
the potential jailbreak risks in multi-round dialogues, which are a vital way
humans interact with and extract information from LLMs. Some studies have
increasingly concentrated on the risks associated with jailbreak in multi-round
dialogues. These efforts typically involve the use of manually crafted
templates or prompt engineering techniques. However, due to the inherent
complexity of multi-round dialogues, their jailbreak performance is limited. To
solve this problem, we propose a novel multi-round dialogue jailbreaking agent,
emphasizing the importance of stealthiness in identifying and mitigating
potential threats to human values posed by LLMs. We propose a risk
decomposition strategy that distributes risks across multiple rounds of queries
and utilizes psychological strategies to enhance attack strength. Extensive
experiments show that our proposed method surpasses other attack methods and
achieves a state-of-the-art attack success rate. We will make the corresponding
code and dataset available for future research; the code will be released soon.
ADAPT: A Game-Theoretic and Neuro-Symbolic Framework for Automated
Distributed Adaptive Penetration Testing
arXiv:2411.00217v1
The integration of AI into modern critical infrastructure systems, such as
healthcare, has introduced new vulnerabilities that can significantly impact
workflow, efficiency, and safety. Additionally, the increased connectivity has
made traditional human-driven penetration testing insufficient for assessing
risks and developing remediation strategies. Consequently, there is a pressing
need for a distributed, adaptive, and efficient automated penetration testing
framework that not only identifies vulnerabilities but also provides
countermeasures to enhance security posture. This work presents ADAPT, a
game-theoretic and neuro-symbolic framework for automated distributed adaptive
penetration testing, specifically designed to address the unique cybersecurity
challenges of AI-enabled healthcare infrastructure networks. We use a
healthcare system case study to illustrate the methodologies within ADAPT. The
proposed solution enables a learning-based risk assessment. Numerical
experiments are used to demonstrate effective countermeasures against various
tactical techniques employed by adversarial AI.
Effective and Efficient Adversarial Detection for Vision-Language Models
via A Single Vector
arXiv:2410.22888v1
Visual Language Models (VLMs) are vulnerable to adversarial attacks,
especially those based on adversarial images, a threat that remains under-explored in the
literature. To facilitate research on this critical safety problem, we first
construct a new laRge-scale Adversarial images dataset with Diverse hArmful
Responses (RADAR), given that existing datasets are either small-scale or only
contain limited types of harmful responses. With the new RADAR dataset, we
further develop a novel and effective iN-time Embedding-based AdveRSarial Image
DEtection (NEARSIDE) method, which exploits a single vector distilled from
the hidden states of VLMs, which we call the attacking direction, to
distinguish adversarial images from benign ones in the input. Extensive
experiments with two victim VLMs, LLaVA and MiniGPT-4, well demonstrate the
effectiveness, efficiency, and cross-model transferability of our proposed
method. Our code is available at https://github.com/mob-scu/RADAR-NEARSIDE.
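Conceptually, such a detector can be as simple as a linear probe along one direction in the VLM's hidden-state space. The sketch below is an assumption-laden simplification rather than the published NEARSIDE procedure: it takes the attacking direction to be the normalized difference between the mean hidden states of adversarial and benign examples, and flags inputs whose projection exceeds a calibrated threshold.

```python
import numpy as np

def attacking_direction(h_adv: np.ndarray, h_benign: np.ndarray) -> np.ndarray:
    """Distill one detection vector from hidden states: here simply the normalized
    difference of class means. h_adv / h_benign have shape (num_samples, hidden_dim)
    and are assumed to be extracted from the victim VLM offline (not shown)."""
    direction = h_adv.mean(axis=0) - h_benign.mean(axis=0)
    return direction / np.linalg.norm(direction)

def is_adversarial(h_input: np.ndarray, direction: np.ndarray, threshold: float) -> bool:
    """Flag an input whose hidden state projects onto the attacking direction
    beyond a threshold calibrated on held-out benign and adversarial examples."""
    return float(h_input @ direction) > threshold
```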
GLBench: A Comprehensive Benchmark for Graph with Large Language Models
arXiv:2407.07457v4
The emergence of large language models (LLMs) has revolutionized the way we
interact with graphs, leading to a new paradigm called GraphLLM. Despite the
rapid development of GraphLLM methods in recent years, the progress and
understanding of this field remain unclear due to the lack of a benchmark with
consistent experimental protocols. To bridge this gap, we introduce GLBench,
the first comprehensive benchmark for evaluating GraphLLM methods in both
supervised and zero-shot scenarios. GLBench provides a fair and thorough
evaluation of different categories of GraphLLM methods, along with traditional
baselines such as graph neural networks. Through extensive experiments on a
collection of real-world datasets with consistent data processing and splitting
strategies, we have uncovered several key findings. Firstly, GraphLLM methods
outperform traditional baselines in supervised settings, with LLM-as-enhancers
showing the most robust performance. However, using LLMs as predictors is less
effective and often leads to uncontrollable output issues. We also notice that
no clear scaling laws exist for current GraphLLM methods. In addition, both
structures and semantics are crucial for effective zero-shot transfer, and our
proposed simple baseline can even outperform several models tailored for
zero-shot scenarios. The data and code of the benchmark can be found at
https://github.com/NineAbyss/GLBench.
Discriminative Pedestrian Features and Gated Channel Attention for
Clothes-Changing Person Re-Identification
The article has been accepted by the IEEE International Conference on
Multimedia and Expo 2024
In public safety and social life, the task of Clothes-Changing Person
Re-Identification (CC-ReID) has become increasingly significant. However, this
task faces considerable challenges due to appearance changes caused by clothing
alterations. To address this issue, this paper proposes an innovative method
for disentangled feature extraction, effectively extracting discriminative
features from pedestrian images that are invariant to clothing. This method
leverages pedestrian parsing techniques to identify and retain features closely
associated with individual identity while disregarding the variable nature of
clothing attributes. Furthermore, this study introduces a gated channel
attention mechanism, which, by adjusting the network's focus, aids the model in
more effectively learning and emphasizing features critical for pedestrian
identity recognition. Extensive experiments conducted on two standard CC-ReID
datasets validate the effectiveness of the proposed approach, with performance
surpassing current leading solutions. The Top-1 accuracy under clothing change
scenarios on the PRCC and VC-Clothes datasets reached 64.8% and 83.7%,
respectively.
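For intuition, a gated channel attention block can be realized as a squeeze-and-excitation-style gate that reweights feature channels so that identity-relevant cues are emphasized over clothing-driven ones. The PyTorch sketch below is a generic instantiation under that assumption; the paper's actual module design (gating function, placement, reduction ratio) is not specified here.

```python
import torch
import torch.nn as nn

class GatedChannelAttention(nn.Module):
    """Squeeze-and-excitation-style channel gate: global average pooling summarizes
    each channel, a bottleneck MLP maps the summary to per-channel gates in (0, 1),
    and the feature map is reweighted channel-wise."""

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.gate = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        gates = self.gate(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * gates  # emphasize identity-relevant channels, damp clothing-driven ones
```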
Stealthy Jailbreak Attacks on Large Language Models via Benign Data
Mirroring
arXiv:2410.21083v1
Large language model (LLM) safety is a critical issue, with numerous studies
employing red team testing to enhance model security. Among these, jailbreak
methods explore potential vulnerabilities by crafting malicious prompts that
induce model outputs contrary to safety alignments. Existing black-box
jailbreak methods often rely on model feedback, repeatedly submitting queries
with detectable malicious instructions during the attack search process.
Although these approaches are effective, the attacks may be intercepted by
content moderators during the search process. We propose an improved transfer
attack method that guides malicious prompt construction by locally training a
mirror model of the target black-box model through benign data distillation.
This method offers enhanced stealth, as it does not involve submitting
identifiable malicious instructions to the target model during the search
phase. Our approach achieved a maximum attack success rate of 92%, or a
balanced value of 80% with an average of 1.5 detectable jailbreak queries per
sample against GPT-3.5 Turbo on a subset of AdvBench. These results underscore
the need for more robust defense mechanisms.