arXiv:2408.07009v1
We introduce Imagen 3, a latent diffusion model that generates high-quality images from text prompts. We describe our quality and responsibility evaluations. Imagen 3 is preferred over other state-of-the-art (SOTA) models at the time of evaluation. In addition, we discuss issues around safety and representation, as well as methods we used to minimize the potential harm of our models.
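The abstract does not describe the model's internals; for orientation only, a latent diffusion model of this kind generates an image by iteratively denoising a random latent under text conditioning and then decoding the result. The sketch below is a generic DDIM-style sampling loop with assumed text_encoder, denoiser, noise_schedule, and vae_decoder components; it is not Imagen 3's actual pipeline.

import torch

def sample_latent_diffusion(prompt, text_encoder, denoiser, vae_decoder,
                            noise_schedule, num_steps=50, latent_shape=(1, 4, 64, 64)):
    """Generic DDIM-style latent diffusion sampling loop (illustrative only)."""
    cond = text_encoder(prompt)                  # text conditioning embeddings
    z = torch.randn(latent_shape)                # start from pure noise in latent space
    for t in reversed(range(num_steps)):
        alpha, alpha_prev = noise_schedule(t)    # cumulative signal-retention coefficients
        eps = denoiser(z, t, cond)               # predicted noise at this step
        z0_hat = (z - (1 - alpha) ** 0.5 * eps) / alpha ** 0.5           # estimate of the clean latent
        z = alpha_prev ** 0.5 * z0_hat + (1 - alpha_prev) ** 0.5 * eps   # deterministic update toward step t-1
    return vae_decoder(z)                        # decode the final latent into an image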
Underspecification in Scene Description-to-Depiction Tasks
arXiv:2210.05815v1
Questions regarding implicitness, ambiguity, and underspecification are crucial for understanding the task validity and ethical concerns of multimodal image+text systems, yet they have received little attention to date. This position paper maps out a conceptual framework to address this gap, focusing on systems that generate images depicting scenes from scene descriptions. In doing so, we account for how texts and images convey meaning differently. We outline a set of core challenges concerning textual and visual ambiguity, as well as risks that may be amplified by ambiguous and underspecified elements. We propose and discuss strategies for addressing these challenges, including generating visually ambiguous images and generating a set of diverse images.
BehaviorGPT: Smart Agent Simulation for Autonomous Driving with Next-Patch Prediction
Simulating realistic behaviors of traffic agents is pivotal for efficiently validating the safety of autonomous driving systems. Existing data-driven simulators primarily use an encoder-decoder architecture to encode the historical trajectories before decoding the future. However, the heterogeneity between encoders and decoders complicates the models, and the manual separation of historical and future trajectories leads to low data utilization. Given these limitations, we propose BehaviorGPT, a homogeneous and fully autoregressive Transformer designed to simulate the sequential behavior of multiple agents. Crucially, our approach discards the traditional separation between "history" and "future" by modeling each time step as the "current" one for motion generation, leading to a simpler, more parameter- and data-efficient agent simulator. We further introduce the Next-Patch Prediction Paradigm (NP3) to mitigate the negative effects of autoregressive modeling: models are trained to reason at the patch level of trajectories and to capture long-range spatiotemporal interactions. Despite having merely 3M model parameters, BehaviorGPT won first place in the 2024 Waymo Open Sim Agents Challenge with a realism score of 0.7473 and a minADE score of 1.4147, demonstrating its exceptional performance in traffic agent simulation.
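The abstract does not give architectural details; the sketch below only illustrates the general idea of a fully autoregressive, patch-level trajectory predictor in which every position is a training target (no manual history/future split). It uses a standard causal Transformer; the module names, shapes, and single-agent simplification are assumptions, not the actual BehaviorGPT code.

import torch
import torch.nn as nn

class PatchAutoregressiveSimulator(nn.Module):
    """Illustrative next-patch trajectory predictor.

    A trajectory of T steps with D features per step is split into patches of
    patch_len steps; a causal Transformer predicts patch k+1 from patches 1..k.
    """
    def __init__(self, step_dim=2, patch_len=5, d_model=128, n_layers=4, n_heads=4):
        super().__init__()
        self.patch_len = patch_len
        self.embed = nn.Linear(step_dim * patch_len, d_model)      # patch -> token
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, step_dim * patch_len)       # token -> next patch

    def forward(self, traj):
        # traj: (batch, T, step_dim) with T divisible by patch_len
        b, t, d = traj.shape
        patches = traj.reshape(b, t // self.patch_len, self.patch_len * d)
        tokens = self.embed(patches)
        causal = nn.Transformer.generate_square_subsequent_mask(tokens.size(1))
        h = self.backbone(tokens, mask=causal)                     # causal self-attention over patches
        return self.head(h).reshape(b, t, d)                       # each position predicts the *next* patch

def next_patch_loss(model, traj):
    """Shift targets by one patch so that the token for patch k predicts patch k+1."""
    pred = model(traj)
    p = model.patch_len
    return nn.functional.mse_loss(pred[:, :-p], traj[:, p:])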
Stochastic Monkeys at Play: Random Augmentations Cheaply Break LLM Safety Alignment
Safety alignment of Large Language Models (LLMs) has recently become a critical objective of model developers. In response, a growing body of work has been investigating how safety alignment can be bypassed through various jailbreaking methods, such as adversarial attacks. However, these jailbreak methods can be rather costly or involve a non-trivial amount of creativity and effort, implicitly assuming that malicious users are high-resource or sophisticated. In this paper, we study how simple random augmentations to the input prompt affect safety alignment effectiveness in state-of-the-art LLMs, such as Llama 3 and Qwen 2. We perform an in-depth evaluation of 17 different models and investigate how safety under random augmentations interacts with multiple dimensions: augmentation type, model size, quantization, fine-tuning-based defenses, and decoding strategies (e.g., sampling temperature). We show that low-resource and unsophisticated attackers, i.e., stochastic monkeys, can significantly improve their chances of bypassing alignment with just 25 random augmentations per prompt.
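The paper's exact augmentation set is not listed in this abstract; the sketch below only illustrates the kind of cheap, character-level random augmentation such an attacker could sample repeatedly (e.g., 25 variants per prompt). The function name, edit operations, and edit budget are assumptions for illustration.

import random
import string

def randomly_augment(prompt, num_edits=3, seed=None):
    """Apply a few cheap random character-level edits (insert/delete/swap) to a prompt."""
    rng = random.Random(seed)
    chars = list(prompt)
    for _ in range(num_edits):
        op = rng.choice(["insert", "delete", "swap"])
        i = rng.randrange(len(chars))
        if op == "insert":
            chars.insert(i, rng.choice(string.ascii_letters + " "))
        elif op == "delete" and len(chars) > 1:
            chars.pop(i)
        elif op == "swap" and i + 1 < len(chars):
            chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)

# 25 augmented variants of a single prompt, mirroring the budget described above.
variants = [randomly_augment("original prompt text", num_edits=3, seed=k) for k in range(25)]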
Whither Bias Goes, I Will Go: An Integrative, Systematic Review of Algorithmic Bias Mitigation
Machine learning (ML) models are increasingly used for personnel assessment and selection (e.g., resume screeners, automatically scored interviews). However, concerns have been raised throughout society that ML assessments may be biased and perpetuate or exacerbate inequality. Although organizational researchers have begun investigating ML assessments from traditional psychometric and legal perspectives, there is a need to understand, clarify, and integrate fairness operationalizations and algorithmic bias mitigation methods from the computer science, data science, and organizational research literatures. We present a four-stage model of developing ML assessments and applying bias mitigation methods: (1) generating the training data, (2) training the model, (3) testing the model, and (4) deploying the model. When introducing the four-stage model, we describe potential sources of bias and unfairness at each stage. Then, we systematically review definitions and operationalizations of algorithmic bias, legal requirements governing personnel selection in the United States and Europe, and research on algorithmic bias mitigation across multiple domains, and we integrate these findings into our framework. Our review provides insights for both research and practice by elucidating possible mechanisms of algorithmic bias while identifying which bias mitigation methods are legal and effective. This integrative framework also reveals gaps in the knowledge of algorithmic bias mitigation that should be addressed by future collaborative research between organizational researchers, computer scientists, and data scientists. We provide recommendations for developing and deploying ML assessments, as well as recommendations for future research into algorithmic bias and fairness.
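As a concrete example of the kind of check applied at the model-testing stage, US personnel-selection practice commonly compares selection rates across groups under the four-fifths (80%) rule. The sketch below computes that adverse impact ratio; the function name, inputs, and example data are illustrative, not the review's specific procedure.

from collections import defaultdict

def adverse_impact_ratio(selected, group):
    """Ratio of each group's selection rate to the highest group's selection rate.

    selected: list of 0/1 hiring-recommendation decisions from the ML assessment
    group:    list of group labels of the same length
    Ratios below 0.8 are commonly flagged under the four-fifths rule.
    """
    totals, hits = defaultdict(int), defaultdict(int)
    for s, g in zip(selected, group):
        totals[g] += 1
        hits[g] += s
    rates = {g: hits[g] / totals[g] for g in totals}
    best = max(rates.values())
    return {g: rate / best for g, rate in rates.items()}

# Example: group B is selected at 40% vs. 80% for group A -> ratio 0.5, below the 0.8 threshold.
print(adverse_impact_ratio([1, 1, 1, 0, 1, 1, 1, 0, 0, 0],
                           ["A", "A", "A", "A", "A", "B", "B", "B", "B", "B"]))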
arXiv:2410.21276v1
GPT-4o is an autoregressive omni model that accepts as input any combination of text, audio, image, and video, and generates any combination of text, audio, and image outputs. It is trained end-to-end across text, vision, and audio, meaning all inputs and outputs are processed by the same neural network. GPT-4o can respond to audio inputs in as little as 232 milliseconds, with an average of 320 milliseconds, which is similar to human response time in conversation. It matches GPT-4 Turbo performance on text in English and code, with significant improvement on text in non-English languages, while also being much faster and 50% cheaper in the API. GPT-4o is especially better at vision and audio understanding compared to existing models. In line with our commitment to building AI safely and consistent with our voluntary commitments to the White House, we are sharing the GPT-4o System Card, which includes our Preparedness Framework evaluations. In this System Card, we provide a detailed look at GPT-4o's capabilities, limitations, and safety evaluations across multiple categories, focusing on speech-to-speech while also evaluating text and image capabilities, and the measures we have implemented to ensure the model is safe and aligned. We also include third-party assessments of dangerous capabilities, as well as a discussion of the potential societal impacts of GPT-4o's text and vision capabilities.
Generative AI Agents in Autonomous Machines: A Safety Perspective
arXiv:2410.15489v1
The integration of Generative Artificial Intelligence (AI) into autonomous machines represents a major paradigm shift in how these systems operate and unlocks new solutions to problems once deemed intractable. Although generative AI agents provide unparalleled capabilities, they also have unique safety concerns. These challenges require robust safeguards, especially for autonomous machines that operate in high-stakes environments. This work investigates the evolving safety requirements when generative models are integrated as agents into physical autonomous machines, comparing these to safety considerations in less critical AI applications. We explore the challenges and opportunities in ensuring the safe deployment of generative AI-driven autonomous machines. Furthermore, we provide a forward-looking perspective on the future of AI-driven autonomous systems and emphasize the importance of evaluating and communicating safety risks. As an important step towards addressing these concerns, we recommend the development and implementation of comprehensive safety scorecards for the use of generative AI technologies in autonomous machines.
arXiv:2409.14586v1
Text generation has a fundamental limitation almost by definition: there is no taking back tokens that have been generated, even when they are clearly problematic. In the context of language model safety, when an unsafe partial generation is produced, language models by their nature tend to happily keep generating similarly unsafe additional text. This is in fact how safety alignment of frontier models gets circumvented in the wild, despite great efforts to improve their safety. Deviating from the paradigm of approaching safety alignment as prevention (decreasing the probability of harmful responses), we propose backtracking, a technique that allows language models to "undo" and recover from their own unsafe generation through the introduction of a special [RESET] token. Our method can be incorporated into either SFT or DPO training to optimize helpfulness and harmlessness. We show that models trained to backtrack are consistently safer than baseline models: backtracking Llama-3-8B is four times safer than the baseline model (6.1% → 1.5%) in our evaluations, without regression in helpfulness. Our method additionally provides protection against four adversarial attacks, including an adaptive attack, despite not being trained to do so.
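The abstract does not spell out the decoding-time mechanics; a minimal sketch, assuming that an emitted [RESET] token signals the partial response should be discarded and generation should restart from the prompt, might look like the following. The model/tokenizer interfaces and the restart policy are assumptions for illustration, not the paper's exact procedure.

def generate_with_backtracking(model, tokenizer, prompt, max_new_tokens=256,
                               reset_token="[RESET]", max_resets=3):
    """Illustrative decoding loop for a model trained to emit a special reset token.

    When reset_token is emitted, the partial response produced so far is dropped and
    generation continues fresh from the original prompt, up to max_resets times.
    Interfaces (model.generate_one_token, tokenizer.*) are hypothetical.
    """
    reset_id = tokenizer.token_to_id(reset_token)
    response_ids = []
    resets = 0
    while len(response_ids) < max_new_tokens:
        next_id = model.generate_one_token(tokenizer.encode(prompt) + response_ids)
        if next_id == reset_id and resets < max_resets:
            response_ids = []          # undo: discard the unsafe partial response
            resets += 1
            continue
        if next_id == tokenizer.eos_id:
            break
        response_ids.append(next_id)
    return tokenizer.decode(response_ids)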
Evaluating Fairness in Transaction Fraud Models: Fairness Metrics, Bias Audits, and Challenges
arXiv:2409.04373v1
Ensuring fairness in transaction fraud detection models is vital due to the potential harms and legal implications of biased decision-making. Despite extensive research on algorithmic fairness, there is a notable gap in the study of bias in fraud detection models, mainly due to the field's unique challenges. These challenges include the need for fairness metrics that account for fraud data's imbalanced nature and the tradeoff between fraud protection and service quality. To address this gap, we present a comprehensive fairness evaluation of transaction fraud models using public synthetic datasets, marking the first algorithmic bias audit in this domain. Our findings reveal three critical insights: (1) Certain fairness metrics expose significant bias only after normalization, highlighting the impact of class imbalance. (2) Bias is significant in both service quality-related parity metrics and fraud protection-related parity metrics. (3) The fairness-through-unawareness approach, which involves removing sensitive attributes such as gender, does not improve bias mitigation within these datasets, likely due to the presence of correlated proxies. We also discuss socio-technical fairness-related challenges in transaction fraud models. These insights underscore the need for a nuanced approach to fairness in fraud detection, one that balances protection and service quality and moves beyond simple bias mitigation strategies. Future work must focus on refining fairness metrics and developing methods tailored to the unique complexities of the transaction fraud domain.
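The specific metrics audited are not named in this abstract; as one example of a fraud-relevant parity check, the sketch below compares per-group false positive rates (a service-quality harm: legitimate transactions wrongly flagged) and per-group false negative rates (a fraud-protection harm: fraud that slips through). The input format and function name are assumptions.

def groupwise_error_rates(y_true, y_pred, group):
    """Per-group false positive rate and false negative rate for a simple parity comparison.

    y_true: 1 = fraud, 0 = legitimate;  y_pred: 1 = flagged;  group: group labels.
    """
    stats = {}
    for g in set(group):
        idx = [i for i, gi in enumerate(group) if gi == g]
        fp = sum(1 for i in idx if y_true[i] == 0 and y_pred[i] == 1)
        tn = sum(1 for i in idx if y_true[i] == 0 and y_pred[i] == 0)
        fn = sum(1 for i in idx if y_true[i] == 1 and y_pred[i] == 0)
        tp = sum(1 for i in idx if y_true[i] == 1 and y_pred[i] == 1)
        stats[g] = {
            "fpr": fp / (fp + tn) if fp + tn else 0.0,   # service-quality harm
            "fnr": fn / (fn + tp) if fn + tp else 0.0,   # fraud-protection harm
        }
    return stats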
Recursively Feasible Probabilistic Safe Online Learning with Control Barrier Functions
Journal article. Includes the results of the 2021 CDC paper titled "Pointwise feasibility of gauss...
Learning-based control has recently shown great efficacy in performing complex tasks for various applications. However, to deploy it in real systems, it is of vital importance to guarantee that the system will stay safe. Control Barrier Functions (CBFs) offer mathematical tools for designing safety-preserving controllers for systems with known dynamics. In this article, we first introduce a model-uncertainty-aware reformulation of CBF-based safety-critical controllers using Gaussian Process (GP) regression to close the gap between an approximate mathematical model and the real system, which results in a second-order cone program (SOCP)-based control design. We then present the pointwise feasibility conditions of the resulting safety controller, highlighting the level of richness that the available system information must meet to ensure safety. We use these conditions to devise an event-triggered online data collection strategy that ensures the recursive feasibility of the learned safety controller. Our method works by constantly reasoning about whether the current information is sufficient to ensure safety or whether new measurements under active safe exploration are required to reduce the uncertainty. As a result, our proposed framework can guarantee the forward invariance of the safe set defined by the CBF with high probability, even if it contains a priori unexplored regions. We validate the proposed framework in two numerical simulation experiments.
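The article's GP-uncertainty-aware SOCP design is more involved than can be sketched here; for orientation, the sketch below shows the standard known-dynamics CBF quadratic-program safety filter that it builds on, solved with cvxpy. The dynamics, barrier function, and gain in the example are placeholder assumptions, not the article's setup.

import numpy as np
import cvxpy as cp

def cbf_qp_filter(x, u_des, f, g, h, grad_h, alpha=1.0):
    """Standard CBF quadratic-program safety filter for known control-affine dynamics
    x_dot = f(x) + g(x) u. Returns the control closest to u_des that satisfies
    grad_h(x) @ (f(x) + g(x) u) + alpha * h(x) >= 0, keeping the set {h(x) >= 0} invariant.
    """
    u = cp.Variable(len(u_des))
    lf_h = grad_h(x) @ f(x)                      # Lie derivative of h along f
    lg_h = grad_h(x) @ g(x)                      # Lie derivative of h along g
    constraints = [lf_h + lg_h @ u + alpha * h(x) >= 0]
    prob = cp.Problem(cp.Minimize(cp.sum_squares(u - u_des)), constraints)
    prob.solve()
    return u.value

# Example: keep a single integrator x_dot = u inside the ball ||x|| <= 1, with h(x) = 1 - ||x||^2.
x = np.array([0.9, 0.0])
u_safe = cbf_qp_filter(
    x, u_des=np.array([1.0, 0.0]),
    f=lambda x: np.zeros(2), g=lambda x: np.eye(2),
    h=lambda x: 1.0 - x @ x, grad_h=lambda x: -2.0 * x,
)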