arXiv:2312.00029v3
Research into AI alignment has grown considerably since the recent
introduction of increasingly capable Large Language Models (LLMs).
Unfortunately, modern methods of alignment still fail to fully prevent harmful
responses when models are deliberately attacked. Such vulnerabilities can lead
to LLMs being manipulated into generating hazardous content: from instructions
for creating dangerous materials to inciting violence or endorsing unethical
behaviors. To help mitigate this issue, we introduce Bergeron: a framework
designed to improve the robustness of LLMs against attacks without any
additional parameter fine-tuning. Bergeron is organized into two tiers, with a
secondary LLM acting as a guardian to the primary LLM. This framework better
safeguards the primary model against incoming attacks while monitoring its
output for any harmful content. Empirical analysis reveals that by using
Bergeron to complement models with existing alignment training, we can
significantly improve the robustness and safety of multiple, commonly used
commercial and open-source LLMs. Specifically, we found that models integrated
with Bergeron are, on average, nearly seven times more resistant to attacks
compared to models without such support.
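The two-tier arrangement described above can be pictured as a thin wrapper around an ordinary chat-completion call: a guardian model screens the incoming prompt, the primary model answers, and the guardian then audits the draft reply. The Python sketch below illustrates only that control flow; the complete() function, model names, and screening prompts are placeholders, not Bergeron's actual implementation.

# Minimal sketch of a two-tier guardian wrapper (placeholder API, not Bergeron's code).

def complete(model: str, prompt: str) -> str:
    """Stand-in for any chat-completion call (commercial API or local model)."""
    raise NotImplementedError

def guarded_reply(user_prompt: str,
                  primary: str = "primary-llm",
                  guardian: str = "guardian-llm") -> str:
    # Tier 1: the guardian screens the incoming prompt for adversarial intent.
    verdict = complete(guardian,
                       "Does the following prompt attempt to elicit harmful content? "
                       f"Answer YES or NO.\n\nPrompt: {user_prompt}")
    if verdict.strip().upper().startswith("YES"):
        return "I can't help with that request."

    # The primary model responds as it normally would.
    draft = complete(primary, user_prompt)

    # Tier 2: the guardian audits the draft response before it is returned.
    audit = complete(guardian,
                     "Does the following response contain harmful content? "
                     f"Answer YES or NO.\n\nResponse: {draft}")
    if audit.strip().upper().startswith("YES"):
        return "I can't share the response that was generated."
    return draft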
"It's a Fair Game", or Is It? Examining How Users Navigate Disclosure
Risks and Benefits When Using LLM-Based Conversational Agents
The widespread use of Large Language Model (LLM)-based conversational agents
(CAs), especially in high-stakes domains, raises many privacy concerns.
Building ethical LLM-based CAs that respect user privacy requires an in-depth
understanding of the privacy risks that concern users the most. However,
existing research, primarily model-centered, does not provide insight into
users' perspectives. To bridge this gap, we analyzed sensitive disclosures in
real-world ChatGPT conversations and conducted semi-structured interviews with
19 LLM-based CA users. We found that users are constantly faced with trade-offs
between privacy, utility, and convenience when using LLM-based CAs. However,
users' erroneous mental models and the dark patterns in system design limited
their awareness and comprehension of the privacy risks. Additionally, the
human-like interactions encouraged more sensitive disclosures, which
complicated users' ability to navigate the trade-offs. We discuss practical
design guidelines and the need for paradigm shifts to protect the privacy of
LLM-based CA users.
Mental-LLM: Leveraging Large Language Models for Mental Health
Prediction via Online Text Data
Published at Proceedings of the ACM on Interactive, Mobile, Wearable
and Ubiquitous Technologies (...
Advances in large language models (LLMs) have empowered a variety of
applications. However, there is still a significant gap in research when it
comes to understanding and enhancing the capabilities of LLMs in the field of
mental health. In this work, we present a comprehensive evaluation of multiple
LLMs on various mental health prediction tasks via online text data, including
Alpaca, Alpaca-LoRA, FLAN-T5, GPT-3.5, and GPT-4. We conduct a broad range of
experiments, covering zero-shot prompting, few-shot prompting, and instruction
fine-tuning. The results indicate a promising yet limited performance of LLMs
with zero-shot and few-shot prompt designs for mental health tasks. More
importantly, our experiments show that instruction fine-tuning can significantly
boost the performance of LLMs for all tasks simultaneously. Our best fine-tuned
models, Mental-Alpaca and Mental-FLAN-T5, outperform the best prompt design of
GPT-3.5 (25 and 15 times bigger) by 10.9% on balanced accuracy and the best of
GPT-4 (250 and 150 times bigger) by 4.8%. They further perform on par with the
state-of-the-art task-specific language model. We also conduct an exploratory
case study on LLMs' capability on mental health reasoning tasks, illustrating
the promising capability of certain models such as GPT-4. We summarize our
findings into a set of action guidelines for potential methods to enhance LLMs'
capability for mental health tasks. Meanwhile, we also emphasize the important
limitations before achieving deployability in real-world mental health
settings, such as known racial and gender bias. We highlight the important
ethical risks accompanying this line of research.
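As a rough illustration of the zero-shot and few-shot prompt designs evaluated above, the sketch below assembles classification prompts for a hypothetical binary label; the wording, label set, and demonstrations are invented for illustration and are not the prompts used in the paper.

# Hypothetical prompt builders for a binary mental-health classification task (illustrative only).

ZERO_SHOT_TEMPLATE = (
    "Post: {post}\n"
    "Question: Does the author of this post show signs of depression? "
    "Answer yes or no.\n"
    "Answer:"
)

def zero_shot_prompt(post: str) -> str:
    return ZERO_SHOT_TEMPLATE.format(post=post)

def few_shot_prompt(post: str, examples: list[tuple[str, str]]) -> str:
    # `examples` holds (post, label) demonstrations that are prepended to the query.
    demos = "\n\n".join(ZERO_SHOT_TEMPLATE.format(post=p) + " " + label
                        for p, label in examples)
    return demos + "\n\n" + ZERO_SHOT_TEMPLATE.format(post=post)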
Is a Seat at the Table Enough? Engaging Teachers and Students in Dataset
Specification for ML in Education
arXiv:2311.05792v1
Despite the promises of ML in education, its adoption in the classroom has
surfaced numerous issues regarding fairness, accountability, and transparency,
as well as concerns about data privacy and student consent. A root cause of
these issues is the lack of understanding of the complex dynamics of education,
including teacher-student interactions, collaborative learning, and classroom
environment. To overcome these challenges and fully utilize the potential of ML
in education, software practitioners need to work closely with educators and
students to fully understand the context of the data (the backbone of ML
applications) and collaboratively define the ML data specifications. To gain a
deeper understanding of such a collaborative process, we conduct ten co-design
sessions with ML software practitioners, educators, and students. In the
sessions, teachers and students work with ML engineers, UX designers, and legal
practitioners to define dataset characteristics for a given ML application. We
find that stakeholders contextualize data based on their domain and procedural
knowledge, proactively design data requirements to mitigate downstream harms
and data reliability concerns, and exhibit role-based collaborative strategies
and contribution patterns. Further, we find that beyond a seat at the table,
meaningful stakeholder participation in ML requires structured supports:
defined processes for continuous iteration and co-evaluation, shared contextual
data quality standards, and information scaffolds for both technical and
non-technical stakeholders to traverse expertise boundaries.
Are Fairy Tales Fair? Analyzing Gender Bias in Temporal Narrative Event
Chains of Children's Fairy Tales
Social biases and stereotypes are embedded in our culture in part through
their presence in our stories, as evidenced by the rich history of humanities
and social science literature analyzing such biases in children's stories.
Because these analyses are often conducted manually and at a small scale, such
investigations can benefit from the use of more recent natural language
processing methods that examine social bias in models and data corpora. Our
work joins this interdisciplinary effort and makes a unique contribution by
taking into account the event narrative structures when analyzing the social
bias of stories. We propose a computational pipeline that automatically
extracts a story's temporal narrative verb-based event chain for each of its
characters as well as character attributes such as gender. We also present a
verb-based event annotation scheme that can facilitate bias analysis by
including categories such as those that align with traditional stereotypes.
Through a case study analyzing gender bias in fairy tales, we demonstrate that
our framework can reveal bias in not only the unigram verb-based events in
which female and male characters participate but also in the temporal narrative
order of such event participation.
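To make the idea of a per-character, temporally ordered verb event chain concrete, here is a minimal sketch using spaCy's part-of-speech tags and dependency parse; it matches characters by surface name only and omits the paper's coreference handling, attribute extraction, and event annotation scheme.

# Sketch: per-character verb event chains in document order (not the paper's full pipeline).
# Requires: pip install spacy && python -m spacy download en_core_web_sm
import spacy
from collections import defaultdict

nlp = spacy.load("en_core_web_sm")

def verb_event_chains(text: str, characters: list[str]) -> dict[str, list[str]]:
    doc = nlp(text)
    chains = defaultdict(list)
    for token in doc:
        if token.pos_ != "VERB":
            continue
        # A character "participates" in an event when it is the subject or object of the verb.
        for child in token.children:
            if child.dep_ in ("nsubj", "nsubjpass", "dobj") and child.text in characters:
                chains[child.text].append(token.lemma_)  # document order approximates narrative order
    return dict(chains)

print(verb_event_chains("Cinderella cleaned the hearth. The prince found Cinderella.",
                        ["Cinderella", "prince"]))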
7th ICML Workshop on Automated Machine Learning (2020)
The CASH (combined algorithm selection and hyperparameter optimization) problem has been widely studied in the context of automated
configurations of machine learning (ML) pipelines and various solvers and
toolkits are available. However, CASH solvers do not directly handle black-box
constraints such as fairness, robustness or other domain-specific custom
constraints. We present our recent approach [Liu, et al., 2020] that leverages
the ADMM optimization framework to decompose CASH into multiple small problems
and demonstrate how ADMM facilitates incorporation of black-box constraints.
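To give a flavor of the decomposition pattern, the toy sketch below runs consensus ADMM on a sum of scalar quadratics: each term is minimized independently and a consensus step reconciles the local solutions. It only illustrates the split-solve-reconcile mechanics; it is not the CASH formulation of Liu et al. (2020) and handles no black-box constraints.

# Toy consensus ADMM: minimize sum_i 0.5*(x - a_i)^2 by splitting into per-term subproblems.
# Illustrative only; not the ADMM-based CASH decomposition described above.
import numpy as np

a = np.array([1.0, 4.0, 7.0])   # per-subproblem targets
rho = 1.0                       # ADMM penalty parameter
x = np.zeros_like(a)            # local variables, one per subproblem
z = 0.0                         # consensus (global) variable
u = np.zeros_like(a)            # scaled dual variables

for _ in range(100):
    x = (a + rho * (z - u)) / (1.0 + rho)  # local proximal updates, solvable independently
    z = np.mean(x + u)                     # consensus update gathers the local solutions
    u = u + x - z                          # dual update enforces agreement over iterations

print(round(float(z), 3))  # -> 4.0, the minimizer of the summed objective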
PanGu-Bot: Efficient Generative Dialogue Pre-training from Pre-trained
Language Model
In this paper, we introduce PanGu-Bot, a Chinese pre-trained open-domain
dialogue generation model based on a large pre-trained language model (PLM)
PANGU-alpha (Zeng et al., 2021). Unlike other pre-trained dialogue models
trained from scratch on massive amounts of dialogue data, we aim to build a
powerful dialogue model with relatively little data and low computation cost
by inheriting valuable language capabilities and knowledge from PLMs. To
this end, we train PanGu-Bot from the large PLM PANGU-alpha, which has been
proven to perform well on a variety of Chinese natural language tasks. We
investigate different aspects of responses generated by PanGu-Bot, including
response quality, knowledge, and safety. We show that PanGu-Bot outperforms
state-of-the-art Chinese dialogue systems (CDIALGPT (Wang et al., 2020), EVA
(Zhou et al., 2021), EVA2.0 (Gu et al., 2022)) w.r.t. the above three aspects.
We also demonstrate that PanGu-Bot can be easily deployed to generate emotional
responses without further training. Throughout our empirical analysis, we also
point out that PanGu-Bot's response quality, knowledge correctness, and
safety are still far from perfect, and further explorations are indispensable
to building reliable and smart dialogue systems. Our model and code will be
available at
https://github.com/huawei-noah/Pretrained-Language-Model/tree/master/PanGu-Bot
soon.
CDR: Customizable Density Ratios of Strong-over-weak LLMs for Preference
Annotation
arXiv:2411.02481v2
Preference tuning of large language models (LLMs) relies on high-quality
human preference data, which is often expensive and time-consuming to gather.
While existing methods can use trained reward models or proprietary models as
judges for preference annotation, they have notable drawbacks: training reward
models remains dependent on initial human data, and using proprietary models
imposes license restrictions that inhibit commercial usage. In this paper, we
introduce customized density ratio (CDR), a training-free and highly effective
method that leverages off-the-shelf LLMs for preference data annotation. Our
approach uses the log-density ratio between a better-aligned LLM and a
less-aligned LLM as a reward signal. We explore 221 different LLM pairs and
empirically demonstrate that increasing the performance gap between paired LLMs
correlates with better reward generalization. Furthermore, we show that
tailoring the density ratio reward function with specific criteria and
preference exemplars enhances performance across domains and within target
areas.
In our experiment using the density ratio from a pair of Mistral-7B models, CDR
achieves a RewardBench score of 82.6, outperforming the best trained reward
functions from the same model class and demonstrating competitive performance
against SoTA models in Safety (91.0) and Reasoning (88.0) domains. We use CDR
to annotate an on-policy preference dataset with which we preference tune
Llama-3-8B-Instruct with SimPO. Using reward signals from two relatively weak
models, our approach pushes Llama-3-8B to achieve a 37.4% (+15.1%) win rate on
ArenaHard and a 40.7% (+17.8%) win rate on Length-Controlled AlpacaEval 2.0,
along with a score of 8.0 on MT-Bench.
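The density-ratio reward described above can be approximated with two off-the-shelf causal language models: score a candidate response by the difference between its token log-likelihoods under the stronger and the weaker model. The sketch below uses the Hugging Face transformers API with placeholder model paths, assumes both models share a tokenizer (as with a pair of Mistral-7B variants), and omits the criteria and exemplar tailoring that CDR adds on top.

# Sketch of a log-density-ratio reward r(x, y) = log p_strong(y | x) - log p_weak(y | x).
# Model paths are placeholders; CDR's criteria/exemplar tailoring is not reproduced here.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def response_logprob(model, tok, prompt: str, response: str) -> float:
    enc = tok(prompt + response, return_tensors="pt")
    prompt_len = tok(prompt, return_tensors="pt")["input_ids"].shape[1]
    with torch.no_grad():
        logits = model(**enc).logits                       # [1, seq_len, vocab]
    logprobs = torch.log_softmax(logits[:, :-1], dim=-1)   # predictions for tokens 1..seq_len-1
    targets = enc["input_ids"][:, 1:]
    token_lp = logprobs.gather(-1, targets.unsqueeze(-1)).squeeze(-1)
    return token_lp[:, prompt_len - 1:].sum().item()       # log-prob of the response tokens only

def density_ratio_reward(strong, weak, tok, prompt: str, response: str) -> float:
    return (response_logprob(strong, tok, prompt, response)
            - response_logprob(weak, tok, prompt, response))

# Usage (placeholder paths; both models must share the tokenizer):
# tok = AutoTokenizer.from_pretrained("path/to/strong-model")
# strong = AutoModelForCausalLM.from_pretrained("path/to/strong-model")
# weak = AutoModelForCausalLM.from_pretrained("path/to/weak-model")
# r = density_ratio_reward(strong, weak, tok, "User: ...\nAssistant:", " a candidate answer")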
BehaviorGPT: Smart Agent Simulation for Autonomous Driving with
Next-Patch Prediction
Simulating realistic behaviors of traffic agents is pivotal for efficiently
validating the safety of autonomous driving systems. Existing data-driven
simulators primarily use an encoder-decoder architecture to encode the
historical trajectories before decoding the future. However, the heterogeneity
between encoders and decoders complicates the models, and the manual separation
of historical and future trajectories leads to low data utilization. Given
these limitations, we propose BehaviorGPT, a homogeneous and fully
autoregressive Transformer designed to simulate the sequential behavior of
multiple agents. Crucially, our approach discards the traditional separation
between "history" and "future" by modeling each time step as the "current" one
for motion generation, leading to a simpler, more parameter- and data-efficient
agent simulator. We further introduce the Next-Patch Prediction Paradigm (NP3)
to mitigate the negative effects of autoregressive modeling, in which models
are trained to reason at the patch level of trajectories and capture long-range
spatial-temporal interactions. Despite having merely 3M model parameters,
BehaviorGPT won first place in the 2024 Waymo Open Sim Agents Challenge with a
realism score of 0.7473 and a minADE score of 1.4147, demonstrating its
exceptional performance in traffic agent simulation.
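One way to picture next-patch prediction is as ordinary next-token prediction applied to fixed-size chunks of an agent's trajectory: every patch is trained to be predicted from all earlier patches. The PyTorch sketch below shows only this tensor bookkeeping with made-up shapes; it is not the BehaviorGPT architecture or training objective.

# Sketch: building next-patch inputs and targets from agent trajectories (illustrative shapes).
import torch

def next_patch_targets(traj: torch.Tensor, patch_size: int):
    """traj: [num_agents, num_steps, state_dim], num_steps divisible by patch_size."""
    num_agents, num_steps, state_dim = traj.shape
    # Group consecutive timesteps into flat patch vectors.
    patches = traj.reshape(num_agents, num_steps // patch_size, patch_size * state_dim)
    inputs = patches[:, :-1]   # each patch may attend (causally) to all earlier patches
    targets = patches[:, 1:]   # ... and is trained to predict the patch that follows it
    return inputs, targets

inputs, targets = next_patch_targets(torch.randn(4, 80, 2), patch_size=10)
print(inputs.shape, targets.shape)  # torch.Size([4, 7, 20]) torch.Size([4, 7, 20])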
SIESEF-FusionNet: Spatial Inter-correlation Enhancement and
Spatially-Embedded Feature Fusion Network for LiDAR Point Cloud Semantic
Segmentation
The ambiguity at the boundaries of different semantic classes in point cloud
semantic segmentation often leads to incorrect decisions in intelligent
perception systems, such as autonomous driving. Hence, accurate delineation of
the boundaries is crucial for improving safety in autonomous driving. A novel
spatial inter-correlation enhancement and spatially-embedded feature fusion
network (SIESEF-FusionNet) is proposed in this paper, enhancing spatial
inter-correlation by combining inverse distance weighting and angular
compensation to extract more beneficial spatial information without causing
redundancy. Meanwhile, a new spatial adaptive pooling module is also designed,
embedding the enhanced spatial information into semantic features to strengthen
their context-awareness. Experimental results demonstrate
that 83.7% mIoU and 97.8% OA are achieved by SIESEF-FusionNet on the Toronto3D
dataset, with performance superior to other baseline methods. A value of 61.1%
mIoU is reached on the SemanticKITTI dataset, where a marked improvement in
segmentation performance is observed. In addition, the effectiveness and
plug-and-play capability of the proposed modules are further verified through
ablation studies.
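As a concrete (and deliberately simplified) picture of the inverse-distance-weighting idea, the PyTorch sketch below aggregates each point's k-nearest-neighbor features with weights proportional to 1/distance; the angular compensation, spatial adaptive pooling, and fusion modules of SIESEF-FusionNet are not represented.

# Toy sketch: inverse-distance-weighted aggregation of k-nearest-neighbor point features.
# Illustrates the basic IDW operation only, not SIESEF-FusionNet's actual modules.
import torch

def idw_aggregate(xyz: torch.Tensor, feats: torch.Tensor, k: int = 8, eps: float = 1e-8):
    """xyz: [N, 3] point coordinates; feats: [N, C] per-point features."""
    dists = torch.cdist(xyz, xyz)                     # [N, N] pairwise Euclidean distances
    knn_d, knn_i = dists.topk(k + 1, largest=False)   # smallest distances include the point itself
    knn_d, knn_i = knn_d[:, 1:], knn_i[:, 1:]         # drop the self-neighbor at distance 0
    w = 1.0 / (knn_d + eps)                           # inverse-distance weights
    w = w / w.sum(dim=1, keepdim=True)                # normalize per point
    neighbor_feats = feats[knn_i]                     # [N, k, C]
    return (w.unsqueeze(-1) * neighbor_feats).sum(dim=1)  # [N, C] aggregated features

out = idw_aggregate(torch.rand(1024, 3), torch.rand(1024, 16))
print(out.shape)  # torch.Size([1024, 16])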