Large language models (LLMs) have exhibited great potential in autonomously
completing tasks across real-world applications. Despite this, these LLM agents
introduce unexpected safety risks when operating in interactive environments.
Whereas most prior studies center on the harmlessness of LLM-generated content,
this work addresses the need to benchmark the behavioral safety of LLM agents
within diverse environments. We introduce
R-Judge, a benchmark crafted to evaluate the proficiency of LLMs in judging and
identifying safety risks given agent interaction records. R-Judge comprises 569
records of multi-turn agent interaction, encompassing 27 key risk scenarios
among 5 application categories and 10 risk types. The benchmark is carefully
curated, with annotated safety labels and risk descriptions. Evaluation of 11
LLMs on R-Judge shows considerable room for enhancing the risk awareness of
LLMs: the best-performing model, GPT-4o, achieves 74.42%, while no other model
significantly exceeds a random baseline. Moreover, we reveal that risk awareness in
open agent scenarios is a multi-dimensional capability involving knowledge and
reasoning, and is thus challenging for LLMs. In further experiments, we find
that fine-tuning on safety judgments significantly improves model performance,
whereas straightforward prompting mechanisms fail. R-Judge is publicly available
at https://github.com/Lordog/R-Judge.
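To make the judging task concrete, below is a minimal sketch of how a judge
model's verdicts could be scored against annotated interaction records. The
record schema and the judge_record stub are hypothetical illustrations, not the
R-Judge data format or evaluation code.

```python
# Hypothetical sketch: scoring an LLM judge against annotated agent records.
# The record schema and the judge function are illustrative assumptions, not
# the actual R-Judge pipeline.
from dataclasses import dataclass
from typing import List


@dataclass
class AgentRecord:
    turns: List[str]       # multi-turn agent interaction, flattened to text
    unsafe: bool           # annotated safety label
    risk_description: str  # annotated description of the risk (if any)


def judge_record(record: AgentRecord) -> bool:
    """Placeholder for an LLM call that returns True if the record is judged unsafe."""
    raise NotImplementedError


def score(records: List[AgentRecord]) -> dict:
    tp = fp = fn = tn = 0
    for rec in records:
        pred = judge_record(rec)
        if pred and rec.unsafe:
            tp += 1
        elif pred and not rec.unsafe:
            fp += 1
        elif not pred and rec.unsafe:
            fn += 1
        else:
            tn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (tp + tn) / len(records), "precision": precision,
            "recall": recall, "f1": f1}
```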
EIA: Environmental Injection Attack on Generalist Web Agents for Privacy
Leakage
Generalist web agents have evolved rapidly and demonstrated remarkable
potential. However, they introduce unprecedented safety risks that remain
largely unexplored. In this work, we aim to narrow this
gap by conducting the first study on the privacy risks of generalist web agents
in adversarial environments. First, we present a threat model that discusses
the adversarial targets, constraints, and attack scenarios. Particularly, we
consider two types of adversarial targets: stealing users' specific personally
identifiable information (PII) or stealing the entire user request. To achieve
these objectives, we propose a novel attack method, termed Environmental
Injection Attack (EIA). This attack injects malicious content designed to adapt
well to different environments where the agents operate, causing them to
perform unintended actions. This work instantiates EIA specifically for the
privacy scenario. It inserts malicious web elements alongside persuasive
instructions that mislead web agents into leaking private information, and can
further leverage CSS and JavaScript features to remain stealthy. We collect 177
action steps that involve diverse PII categories on realistic websites from
the Mind2Web dataset, and conduct extensive experiments using one of the most
capable generalist web agent frameworks to date, SeeAct. The results
demonstrate that EIA achieves up to a 70% attack success rate (ASR) in stealing
users' specific PII. Stealing the full user request is more challenging, but a
relaxed version of EIA can still achieve a 16% ASR. Despite these concerning
results, the attack can still be detected through careful human inspection,
highlighting a trade-off between high autonomy and security. This leads to our
detailed discussion of the efficacy of EIA under different levels of human
supervision, as well as implications for defenses for generalist web agents.
arXiv:2408.14340v3
In recent years, foundation models (FMs) such as large language models (LLMs)
and latent diffusion models (LDMs) have profoundly impacted diverse sectors,
including music. This comprehensive review examines state-of-the-art (SOTA)
pre-trained models and foundation models in music, spanning representation
learning, generative learning, and multimodal learning. We first contextualise
the significance of music in various industries and trace the evolution of AI
in music. By delineating the modalities targeted by foundation models, we find
that many music representations remain underexplored in FM development. We then
emphasise the limited versatility of previous methods across diverse music
applications, along with the potential of FMs in music understanding,
generation, and medical applications. By comprehensively exploring
the details of the model pre-training paradigm, architectural choices,
tokenisation, finetuning methodologies and controllability, we emphasise
important topics that deserve deeper exploration, such as instruction tuning
and in-context learning, scaling laws and emergent abilities, and long-sequence
modelling. A dedicated section presents insights into music
agents, accompanied by a thorough analysis of datasets and evaluations
essential for pre-training and downstream tasks. Finally, underscoring the
vital importance of ethical considerations, we advocate that future research
on FMs for music focus more on issues such as interpretability, transparency,
human responsibility, and copyright. The paper offers
insights into future challenges and trends on FMs for music, aiming to shape
the trajectory of human-AI collaboration in the music realm.
Refuse Whenever You Feel Unsafe: Improving Safety in LLMs via Decoupled
Refusal Training
arXiv:2407.09121v1
This study addresses a critical gap in safety tuning practices for Large
Language Models (LLMs) by identifying and tackling a refusal position bias
within safety tuning data, which compromises the models' ability to
appropriately refuse generating unsafe content. We introduce a novel approach,
Decoupled Refusal Training (DeRTa), designed to empower LLMs to refuse to comply
with harmful prompts at any position in a response, significantly enhancing
their safety capabilities. DeRTa incorporates two novel components: (1) Maximum
Likelihood Estimation (MLE) with Harmful Response Prefix, which trains models
to recognize and avoid unsafe content by prepending a segment of a harmful
response to a safe response, and (2) Reinforced Transition
Optimization (RTO), which equips models with the ability to transition from
potential harm to safety refusal consistently throughout the harmful response
sequence. Our empirical evaluation, conducted using LLaMA3 and Mistral model
families across six attack scenarios, demonstrates that our method not only
improves model safety without compromising performance but also surpasses
well-known models such as GPT-4 in defending against attacks. Importantly, our
approach successfully defends against recent advanced attack methods (e.g.,
CodeAttack) that have jailbroken GPT-4 and LLaMA3-70B-Instruct. Our code and
data can be found at https://github.com/RobustNLP/DeRTa.
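As a rough illustration of the first component, the sketch below builds a
training example in which a randomly truncated harmful response is prepended to
a safe refusal, and the labels are masked so that only the refusal tokens
contribute to the loss. The tokenizer interface and masking convention are
assumptions for illustration, not the released DeRTa code.

```python
# Hypothetical sketch of "MLE with Harmful Response Prefix"-style data
# construction: prepend a truncated harmful response to a safe refusal and mask
# the labels so only the refusal continuation is supervised. Illustration only,
# not the released DeRTa training code.
import random
from typing import Dict, List

IGNORE_INDEX = -100  # label value conventionally ignored by cross-entropy losses


def build_example(tokenizer, prompt: str, harmful_response: str,
                  safe_refusal: str) -> Dict[str, List[int]]:
    prompt_ids = tokenizer.encode(prompt)
    harmful_ids = tokenizer.encode(harmful_response)
    refusal_ids = tokenizer.encode(safe_refusal)

    # Keep a random-length prefix of the harmful response so the model learns
    # to switch to a refusal at any position, not only at the very start.
    k = random.randint(0, len(harmful_ids))
    prefix_ids = harmful_ids[:k]

    input_ids = prompt_ids + prefix_ids + refusal_ids
    # Supervise only the safe refusal; prompt and harmful prefix are masked out.
    labels = [IGNORE_INDEX] * (len(prompt_ids) + len(prefix_ids)) + refusal_ids
    return {"input_ids": input_ids, "labels": labels}
```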
LLMs Meet Multimodal Generation and Editing: A Survey
52 Pages with 16 Figures, 12 Tables, and 545 References. GitHub
Repository at:
https://github.co...
With the recent advancement in large language models (LLMs), there is a
growing interest in combining LLMs with multimodal learning. Previous surveys
of multimodal large language models (MLLMs) mainly focus on multimodal
understanding. This survey elaborates on multimodal generation and editing
across various domains, comprising image, video, 3D, and audio. Specifically,
we summarize the notable advancements with milestone works in these fields and
categorize these studies into LLM-based and CLIP/T5-based methods. Then, we
summarize the various roles of LLMs in multimodal generation and exhaustively
investigate the critical technical components behind these methods and the
multimodal datasets utilized in these studies. Additionally, we dig into
tool-augmented multimodal agents that can leverage existing generative models
for human-computer interaction. Lastly, we discuss the advancements in the
generative AI safety field, investigate emerging applications, and discuss
future prospects. Our work provides a systematic and insightful overview of
multimodal generation and processing, which is expected to advance the
development of Artificial Intelligence for Generative Content (AIGC) and world
models. A curated list of all related papers can be found at
https://github.com/YingqingHe/Awesome-LLMs-meet-Multimodal-Generation
FairCLIP: Harnessing Fairness in Vision-Language Learning
Fairness is a critical concern in deep learning, especially in healthcare,
where these models influence diagnoses and treatment decisions. Although
fairness has been investigated in the vision-only domain, the fairness of
medical vision-language (VL) models remains unexplored due to the scarcity of
medical VL datasets for studying fairness. To bridge this research gap, we
introduce the first fair vision-language medical dataset Harvard-FairVLMed that
provides detailed demographic attributes, ground-truth labels, and clinical
notes to facilitate an in-depth examination of fairness within VL foundation
models. Using Harvard-FairVLMed, we conduct a comprehensive fairness analysis
of two widely-used VL models (CLIP and BLIP2), pre-trained on both natural and
medical domains, across four different protected attributes. Our results
highlight significant biases in all VL models, with Asian, Male, Non-Hispanic,
and Spanish being the preferred subgroups across the protected attributes of
race, gender, ethnicity, and language, respectively. In order to alleviate
these biases, we propose FairCLIP, an optimal-transport-based approach that
achieves a favorable trade-off between performance and fairness by reducing the
Sinkhorn distance between the overall sample distribution and the distributions
corresponding to each demographic group. As the first VL dataset of its kind,
Harvard-FairVLMed holds the potential to catalyze advancements in the
development of machine learning models that are both ethically aware and
clinically effective. Our dataset and code are available at
https://ophai.hms.harvard.edu/datasets/harvard-fairvlmed10k.
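To convey the flavor of the regularizer, the following self-contained sketch
computes an entropy-regularized Sinkhorn distance between two empirical feature
distributions (e.g., the overall batch versus one demographic group's samples).
It is a generic illustration of the Sinkhorn computation, not FairCLIP's
objective or implementation.

```python
# Generic sketch of an entropy-regularized Sinkhorn distance between two
# empirical feature distributions. Illustration only, not FairCLIP's code.
import numpy as np


def sinkhorn_distance(x: np.ndarray, y: np.ndarray,
                      eps: float = 1.0, n_iters: int = 200) -> float:
    """x: (n, d), y: (m, d) features; returns <P, C> for the entropic OT plan.

    eps is the entropic regularization strength; very small values can
    underflow this naive (non-log-domain) implementation.
    """
    n, m = x.shape[0], y.shape[0]
    a = np.full(n, 1.0 / n)  # uniform weights over samples
    b = np.full(m, 1.0 / m)
    cost = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)  # squared Euclidean
    K = np.exp(-cost / eps)  # Gibbs kernel
    u, v = np.ones(n), np.ones(m)
    for _ in range(n_iters):  # Sinkhorn fixed-point iterations
        u = a / (K @ v)
        v = b / (K.T @ u)
    plan = u[:, None] * K * v[None, :]
    return float((plan * cost).sum())


# Example: distance between the overall batch and one subgroup's features.
rng = np.random.default_rng(0)
all_feats = rng.normal(size=(64, 16))
group_feats = all_feats[:20] + 0.1
print(sinkhorn_distance(all_feats, group_feats))
```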
MARBLE: Music Audio Representation Benchmark for Universal Evaluation
In the era of extensive intersection between art and Artificial Intelligence
(AI), such as image generation and fiction co-creation, AI for music remains
relatively nascent, particularly in music understanding. This is evident in the
limited work on deep music representations, the scarcity of large-scale
datasets, and the absence of a universal and community-driven benchmark. To
address this issue, we introduce the Music Audio Representation Benchmark for
universaL Evaluation, termed MARBLE. It aims to provide a benchmark for various
Music Information Retrieval (MIR) tasks by defining a comprehensive taxonomy
with four hierarchy levels, including acoustic, performance, score, and
high-level description. We then establish a unified protocol based on 14 tasks
across 8 publicly available datasets, providing a fair and standard assessment
of the representations of all open-source pre-trained models developed on music
recordings as baselines. In addition, MARBLE offers an easy-to-use, extendable, and
reproducible suite for the community, with a clear statement on copyright
issues on datasets. Results suggest that recently proposed large-scale
pre-trained musical language models perform best on most tasks, with room for
further improvement. The leaderboard and toolkit repository are published at
https://marble-bm.shef.ac.uk to promote future music AI research.
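As a schematic of the kind of probing protocol such a benchmark implies, the
snippet below fits a linear probe on frozen audio representations for a single
downstream task; the feature-extraction stub is a placeholder, not the MARBLE
toolkit API.

```python
# Hypothetical sketch of a frozen-representation probing protocol: extract
# embeddings from a pre-trained audio model, then fit a linear classifier on a
# downstream task. Illustration only, not the MARBLE toolkit API.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score


def extract_embeddings(waveforms: list) -> np.ndarray:
    """Placeholder for pooled embeddings from a frozen pre-trained model."""
    raise NotImplementedError


def probe(train_wavs, train_labels, test_wavs, test_labels) -> float:
    x_train = extract_embeddings(train_wavs)
    x_test = extract_embeddings(test_wavs)
    clf = LogisticRegression(max_iter=1000).fit(x_train, train_labels)
    return accuracy_score(test_labels, clf.predict(x_test))
```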
Conditional Supervised Contrastive Learning for Fair Text Classification
Contrastive representation learning has gained much attention due to its
superior performance in learning representations from both image and sequential
data. However, the learned representations could potentially lead to
performance disparities in downstream tasks, such as increased silencing of
underrepresented groups in toxicity comment classification. In light of this
challenge, in this work, we study learning fair representations that satisfy a
notion of fairness known as equalized odds for text classification via
contrastive learning. Specifically, we first theoretically analyze the
connections between learning representations with a fairness constraint and
conditional supervised contrastive objectives, and then propose to use
conditional supervised contrastive objectives to learn fair representations for
text classification. We conduct experiments on two text datasets to demonstrate
the effectiveness of our approaches in balancing the trade-off between task
performance and bias mitigation compared with existing baselines for text
classification. Furthermore, we show that the proposed methods are stable
across different hyperparameter settings.
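For intuition, the sketch below implements one schematic reading of a
conditional supervised contrastive objective: the contrast is restricted to
samples sharing the same task label, and positives are same-label samples from
a different demographic group, encouraging alignment conditioned on the label.
This is an illustrative variant, not the paper's exact formulation.

```python
# Schematic conditional supervised contrastive loss: contrast within each task
# label; positives are same-label samples from a different demographic group.
# Illustrative only, not the paper's exact objective.
import torch
import torch.nn.functional as F


def conditional_supcon_loss(z, labels, groups, tau=0.1):
    """z: (n, d) embeddings; labels: (n,) task labels; groups: (n,) sensitive attribute."""
    z = F.normalize(z, dim=1)
    sim = z @ z.t() / tau                      # scaled cosine similarities
    n = z.size(0)
    eye = torch.eye(n, dtype=torch.bool, device=z.device)
    same_label = labels.unsqueeze(0) == labels.unsqueeze(1)
    diff_group = groups.unsqueeze(0) != groups.unsqueeze(1)
    pos_mask = same_label & diff_group & ~eye  # same label, other group
    denom_mask = same_label & ~eye             # contrast only within the label stratum
    # Log-softmax restricted to the within-class denominator.
    log_prob = sim - torch.logsumexp(sim.masked_fill(~denom_mask, float('-inf')),
                                     dim=1, keepdim=True)
    valid = pos_mask.sum(1) > 0                # anchors with at least one positive
    loss = -(log_prob * pos_mask).sum(1)[valid] / pos_mask.sum(1)[valid]
    return loss.mean()


# Example usage with random tensors:
z = torch.randn(16, 32)
y = torch.randint(0, 2, (16,))
g = torch.randint(0, 2, (16,))
print(conditional_supcon_loss(z, y, g))
```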
Towards Return Parity in Markov Decision Processes
AISTATS 2022. Code is released at
https://github.com/JFChi/Return-Parity-MDP
Algorithmic decisions made by machine learning models in high-stakes domains
may have lasting impacts over time. However, naively applying standard static
fairness criteria to temporal domains may lead to delayed and adverse effects.
To understand the dynamics of performance disparity, we
study a fairness problem in Markov decision processes (MDPs). Specifically, we
propose return parity, a fairness notion that requires MDPs from different
demographic groups that share the same state and action spaces to achieve
approximately the same expected time-discounted rewards. We first provide a
decomposition theorem for return disparity, which decomposes the return
disparity of any two MDPs sharing the same state and action spaces into the
distance between group-wise reward functions, the discrepancy of group
policies, and the discrepancy between state visitation distributions induced by
the group policies. Motivated by our decomposition theorem, we propose
algorithms to mitigate return disparity via learning a shared group policy with
state visitation distributional alignment using integral probability metrics.
We conduct experiments to corroborate our results, showing that the proposed
algorithm can successfully close the disparity gap while maintaining the
performance of policies on two real-world recommender system benchmark
datasets.
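As a rough formalization of the notion described above (with notation of our
own choosing, not necessarily the paper's), return parity asks that the
expected discounted returns of the two group MDPs be close:

```latex
% Schematic definition of return parity; notation is illustrative.
% M_a and M_b are the MDPs of two demographic groups sharing the same state
% space S and action space A, with group policies \pi_a and \pi_b.
\[
  \Big| \,
  \mathbb{E}_{\pi_a, M_a}\!\Big[ \sum_{t=0}^{\infty} \gamma^{t} r_a(s_t, a_t) \Big]
  -
  \mathbb{E}_{\pi_b, M_b}\!\Big[ \sum_{t=0}^{\infty} \gamma^{t} r_b(s_t, a_t) \Big]
  \, \Big| \;\le\; \epsilon,
  \qquad \gamma \in [0, 1).
\]
```

Under this reading, the decomposition theorem bounds the left-hand side by the
distance between the group-wise reward functions, the discrepancy between the
group policies, and the discrepancy between the state visitation distributions
they induce.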
Rapid COVID-19 Risk Screening by Eye-region Manifestations
arXiv:2106.06664v1
Developing a new, fast COVID-19 screening method that is easier to access and
cheaper remains nontrivial, given the technical and cost limitations of current
testing methods in districts with scarce medical resources. At the same time, a
growing body of clinical evidence reports ocular manifestations in COVID-19
patients [1], which inspired this project. We have conducted joint clinical
research since January 2021 in ShiJiaZhuang City, Hebei province, China,
approved by the ethics committee of The Fifth Hospital of ShiJiaZhuang of Hebei
Medical University. We also undertook several blind tests of COVID-19 patients
with Union Hospital, Tongji Medical College, Huazhong University of Science and
Technology, Wuhan, China. Meanwhile, as part of the ongoing global COVID-19 eye
test program run by AIMOMICS since February 2020, we propose a new fast
screening method that analyzes eye-region images captured by common CCD and
CMOS cameras. It can reliably perform rapid COVID-19 risk screening with
stable, sustained high performance across different countries and races. Our
model for rapid COVID-19 prescreening has the merits of low cost, fully
self-administered operation, non-invasiveness, and, importantly, real-time
response, thus enabling continuous health surveillance. We further implement it
as openly accessible APIs and provide a public service to the world. Our pilot
experiments show that the model is usable in a wide range of surveillance
scenarios, such as at infrared temperature-measurement devices in airports and
stations, or pushed directly to target users' smartphones as a packaged
application.