arXiv:2411.03395v1 »Full PDF »Large language models (LLMs) have shown remarkable progress in encoding
clinical knowledge and responding to complex medical queries with appropriate
clinical reasoning. However, their applicability in subspecialist or complex
medical settings remains underexplored. In this work, we probe the performance
of AMIE, a research conversational diagnostic AI system, in the subspecialist
domain of breast oncology care without specific fine-tuning to this challenging
domain. To perform this evaluation, we curated a set of 50 synthetic breast
cancer vignettes representing a range of treatment-naive and
treatment-refractory cases and mirroring the key information available to a
multidisciplinary tumor board for decision-making (openly released with this
work). We developed a detailed clinical rubric for evaluating management plans,
including axes such as the quality of case summarization, safety of the
proposed care plan, and recommendations for chemotherapy, radiotherapy, surgery
and hormonal therapy. To improve performance, we enhanced AMIE with the
inference-time ability to perform web search retrieval to gather relevant and
up-to-date clinical knowledge and refine its responses with a multi-stage
self-critique pipeline. We compare response quality of AMIE with internal
medicine trainees, oncology fellows, and general oncology attendings under both
automated and specialist clinician evaluations. In our evaluations, AMIE
outperformed trainees and fellows demonstrating the potential of the system in
this challenging and important domain. We further demonstrate through
qualitative examples, how systems such as AMIE might facilitate conversational
interactions to assist clinicians in their decision making. However, AMIE's
performance was overall inferior to attending oncologists suggesting that
further research is needed prior to consideration of prospective uses.Abstract
arXiv:2404.18416v2 »Full PDF »Excellence in a wide variety of medical applications poses considerable
challenges for AI, requiring advanced reasoning, access to up-to-date medical
knowledge and understanding of complex multimodal data. Gemini models, with
strong general capabilities in multimodal and long-context reasoning, offer
exciting possibilities in medicine. Building on these core strengths of Gemini,
we introduce Med-Gemini, a family of highly capable multimodal models that are
specialized in medicine with the ability to seamlessly use web search, and that
can be efficiently tailored to novel modalities using custom encoders. We
evaluate Med-Gemini on 14 medical benchmarks, establishing new state-of-the-art
(SoTA) performance on 10 of them, and surpass the GPT-4 model family on every
benchmark where a direct comparison is viable, often by a wide margin. On the
popular MedQA (USMLE) benchmark, our best-performing Med-Gemini model achieves
SoTA performance of 91.1% accuracy, using a novel uncertainty-guided search
strategy. On 7 multimodal benchmarks including NEJM Image Challenges and MMMU
(health & medicine), Med-Gemini improves over GPT-4V by an average relative
margin of 44.5%. We demonstrate the effectiveness of Med-Gemini's long-context
capabilities through SoTA performance on a needle-in-a-haystack retrieval task
from long de-identified health records and medical video question answering,
surpassing prior bespoke methods using only in-context learning. Finally,
Med-Gemini's performance suggests real-world utility by surpassing human
experts on tasks such as medical text summarization, alongside demonstrations
of promising potential for multimodal medical dialogue, medical research and
education. Taken together, our results offer compelling evidence for
Med-Gemini's potential, although further rigorous evaluation will be crucial
before real-world deployment in this safety-critical domain.Abstract
Conformal Prediction with Large Language Models for Multi-Choice
Question Answering
Updated sections on prompt engineering. Expanded sections 4.1 and 4.2
and appendix. Included addit...
As large language models continue to be widely developed, robust uncertainty
quantification techniques will become crucial for their safe deployment in
high-stakes scenarios. In this work, we explore how conformal prediction can be
used to provide uncertainty quantification in language models for the specific
task of multiple-choice question-answering. We find that the uncertainty
estimates from conformal prediction are tightly correlated with prediction
accuracy. This observation can be useful for downstream applications such as
selective classification and filtering out low-quality predictions. We also
investigate the exchangeability assumption required by conformal prediction to
out-of-subject questions, which may be a more realistic scenario for many
practical applications. Our work contributes towards more trustworthy and
reliable usage of large language models in safety-critical situations, where
robust guarantees of error rate are required.Abstract
Connecting the Dots: LLMs can Infer and Verbalize Latent Structure from
Disparate Training Data
arXiv:2406.14546v2 »Full PDF »One way to address safety risks from large language models (LLMs) is to
censor dangerous knowledge from their training data. While this removes the
explicit information, implicit information can remain scattered across various
training documents. Could an LLM infer the censored knowledge by piecing
together these implicit hints? As a step towards answering this question, we
study inductive out-of-context reasoning (OOCR), a type of generalization in
which LLMs infer latent information from evidence distributed across training
documents and apply it to downstream tasks without in-context learning. Using a
suite of five tasks, we demonstrate that frontier LLMs can perform inductive
OOCR. In one experiment we finetune an LLM on a corpus consisting only of
distances between an unknown city and other known cities. Remarkably, without
in-context examples or Chain of Thought, the LLM can verbalize that the unknown
city is Paris and use this fact to answer downstream questions. Further
experiments show that LLMs trained only on individual coin flip outcomes can
verbalize whether the coin is biased, and those trained only on pairs (x, f
(x)) can articulate a definition of f and compute inverses. While OOCR succeeds
in a range of cases, we also show that it is unreliable, particularly for
smaller LLMs learning complex structures. Overall, the ability of LLMs to
"connect the dots" without explicit in-context learning poses a potential
obstacle to monitoring and controlling the knowledge acquired by LLMs.Abstract
Gemma: Open Models Based on Gemini Research and Technology
arXiv:2403.08295v4 »Full PDF »This work introduces Gemma, a family of lightweight, state-of-the art open
models built from the research and technology used to create Gemini models.
Gemma models demonstrate strong performance across academic benchmarks for
language understanding, reasoning, and safety. We release two sizes of models
(2 billion and 7 billion parameters), and provide both pretrained and
fine-tuned checkpoints. Gemma outperforms similarly sized open models on 11 out
of 18 text-based tasks, and we present comprehensive evaluations of safety and
responsibility aspects of the models, alongside a detailed description of model
development. We believe the responsible release of LLMs is critical for
improving the safety of frontier models, and for enabling the next wave of LLM
innovations.Abstract
SpikingJET: Enhancing Fault Injection for Fully and Convolutional
Spiking Neural Networks
arXiv:2404.00383v1 »Full PDF »As artificial neural networks become increasingly integrated into
safety-critical systems such as autonomous vehicles, devices for medical
diagnosis, and industrial automation, ensuring their reliability in the face of
random hardware faults becomes paramount. This paper introduces SpikingJET, a
novel fault injector designed specifically for fully connected and
convolutional Spiking Neural Networks (SNNs). Our work underscores the critical
need to evaluate the resilience of SNNs to hardware faults, considering their
growing prominence in real-world applications. SpikingJET provides a
comprehensive platform for assessing the resilience of SNNs by inducing errors
and injecting faults into critical components such as synaptic weights, neuron
model parameters, internal states, and activation functions. This paper
demonstrates the effectiveness of Spiking-JET through extensive software-level
experiments on various SNN architectures, revealing insights into their
vulnerability and resilience to hardware faults. Moreover, highlighting the
importance of fault resilience in SNNs contributes to the ongoing effort to
enhance the reliability and safety of Neural Network (NN)-powered systems in
diverse domains.Abstract
Confidence-Aware Decision-Making and Control for Tool Selection
arXiv:2403.03808v1 »Full PDF »Self-reflecting about our performance (e.g., how confident we are) before
doing a task is essential for decision making, such as selecting the most
suitable tool or choosing the best route to drive. While this form of awareness
-- thinking about our performance or metacognitive performance -- is well-known
in humans, robots still lack this cognitive ability. This reflective monitoring
can enhance their embodied decision power, robustness and safety. Here, we take
a step in this direction by introducing a mathematical framework that allows
robots to use their control self-confidence to make better-informed decisions.
We derive a mathematical closed-form expression for control confidence for
dynamic systems (i.e., the posterior inverse covariance of the control action).
This control confidence seamlessly integrates within an objective function for
decision making, that balances the: i) performance for task completion, ii)
control effort, and iii) self-confidence. To evaluate our theoretical account,
we framed the decision-making within the tool selection problem, where the
agent has to select the best robot arm for a particular control task. The
statistical analysis of the numerical simulations with randomized 2DOF arms
shows that using control confidence during tool selection improves both real
task performance, and the reliability of the tool for performance under
unmodelled perturbations (e.g., external forces). Furthermore, our results
indicate that control confidence is an early indicator of performance and thus,
it can be used as a heuristic for making decisions when computation power is
restricted or decision-making is intractable. Overall, we show the advantages
of using confidence-aware decision-making and control scheme for dynamic
systems.Abstract
Sleeper Agents: Training Deceptive LLMs that Persist Through Safety
Training
Humans are capable of strategically deceptive behavior: behaving helpfully in
most situations, but then behaving very differently in order to pursue
alternative objectives when given the opportunity. If an AI system learned such
a deceptive strategy, could we detect it and remove it using current
state-of-the-art safety training techniques? To study this question, we
construct proof-of-concept examples of deceptive behavior in large language
models (LLMs). For example, we train models that write secure code when the
prompt states that the year is 2023, but insert exploitable code when the
stated year is 2024. We find that such backdoor behavior can be made
persistent, so that it is not removed by standard safety training techniques,
including supervised fine-tuning, reinforcement learning, and adversarial
training (eliciting unsafe behavior and then training to remove it). The
backdoor behavior is most persistent in the largest models and in models
trained to produce chain-of-thought reasoning about deceiving the training
process, with the persistence remaining even when the chain-of-thought is
distilled away. Furthermore, rather than removing backdoors, we find that
adversarial training can teach models to better recognize their backdoor
triggers, effectively hiding the unsafe behavior. Our results suggest that,
once a model exhibits deceptive behavior, standard techniques could fail to
remove such deception and create a false impression of safety.Abstract
LADRI: LeArning-based Dynamic Risk Indicator in Automated Driving System
2023 IEEE International Test Conference, 8th Edition of Automotive,
Reliability, Test & Safety Wor...
As the horizon of intelligent transportation expands with the evolution of
Automated Driving Systems (ADS), ensuring paramount safety becomes more
imperative than ever. Traditional risk assessment methodologies, primarily
crafted for human-driven vehicles, grapple to adequately adapt to the
multifaceted, evolving environments of ADS. This paper introduces a framework
for real-time Dynamic Risk Assessment (DRA) in ADS, harnessing the potency of
Artificial Neural Networks (ANNs).
Our proposed solution transcends these limitations, drawing upon ANNs, a
cornerstone of deep learning, to meticulously analyze and categorize risk
dimensions using real-time On-board Sensor (OBS) data. This learning-centric
approach not only elevates the ADS's situational awareness but also enriches
its understanding of immediate operational contexts. By dissecting OBS data,
the system is empowered to pinpoint its current risk profile, thereby enhancing
safety prospects for onboard passengers and the broader traffic ecosystem.
Through this framework, we chart a direction in risk assessment, bridging the
conventional voids and enhancing the proficiency of ADS. By utilizing ANNs, our
methodology offers a perspective, allowing ADS to adeptly navigate and react to
potential risk factors, ensuring safer and more informed autonomous journeys.Abstract
JAB: Joint Adversarial Prompting and Belief Augmentation
arXiv:2311.09473v1 »Full PDF »With the recent surge of language models in different applications, attention
to safety and robustness of these models has gained significant importance.
Here we introduce a joint framework in which we simultaneously probe and
improve the robustness of a black-box target model via adversarial prompting
and belief augmentation using iterative feedback loops. This framework utilizes
an automated red teaming approach to probe the target model, along with a
belief augmenter to generate instructions for the target model to improve its
robustness to those adversarial probes. Importantly, the adversarial model and
the belief generator leverage the feedback from past interactions to improve
the effectiveness of the adversarial prompts and beliefs, respectively. In our
experiments, we demonstrate that such a framework can reduce toxic content
generation both in dynamic cases where an adversary directly interacts with a
target model and static cases where we use a static benchmark dataset to
evaluate our model.Abstract