Managing Extreme AI Risks Amid Rapid Progress
Published in Science:
https://www.science.org/doi/10.1126/science.adn0117
Artificial Intelligence (AI) is progressing rapidly, and companies are
shifting their focus to developing generalist AI systems that can autonomously
act and pursue goals. Increases in capabilities and autonomy may soon massively
amplify AI's impact, with risks that include large-scale social harms,
malicious uses, and an irreversible loss of human control over autonomous AI
systems. Although researchers have warned of extreme risks from AI, there is a
lack of consensus about how exactly such risks arise, and how to manage them.
Society's response, despite promising first steps, is incommensurate with the
possibility of rapid, transformative progress that is expected by many experts.
AI safety research is lagging. Present governance initiatives lack the
mechanisms and institutions to prevent misuse and recklessness, and barely
address autonomous systems. In this short consensus paper, we describe extreme
risks from upcoming, advanced AI systems. Drawing on lessons learned from other
safety-critical technologies, we then outline a comprehensive plan combining
technical research and development with proactive, adaptive governance
mechanisms for a more commensurate preparation.
Thought Cloning: Learning to Think while Acting by Imitating Human
Thinking
Language is often considered a key aspect of human thinking, providing us
with exceptional abilities to generalize, explore, plan, replan, and adapt to
new situations. However, Reinforcement Learning (RL) agents are far from
human-level performance in any of these abilities. We hypothesize one reason
for such cognitive deficiencies is that they lack the benefits of thinking in
language and that we can improve AI agents by training them to think like
humans do. We introduce a novel Imitation Learning framework, Thought Cloning,
where the idea is to not just clone the behaviors of human demonstrators, but
also the thoughts humans have as they perform these behaviors. While we expect
Thought Cloning to truly shine at scale on internet-sized datasets of humans
thinking out loud while acting (e.g. online videos with transcripts), here we
conduct experiments in a domain where the thinking and action data are
synthetically generated. Results reveal that Thought Cloning learns much faster
than Behavioral Cloning and its performance advantage grows the further out of
distribution test tasks are, highlighting its ability to better handle novel
situations. Thought Cloning also provides important benefits for AI Safety and
Interpretability, and makes it easier to debug and improve AI. Because we can
observe the agent's thoughts, we can (1) more easily diagnose why things are
going wrong, making it easier to fix the problem, (2) steer the agent by
correcting its thinking, or (3) prevent it from doing unsafe things it plans to
do. Overall, by training agents how to think as well as behave, Thought Cloning
creates safer, more powerful agents.
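The core training signal is simple to state concretely. Below is a minimal, illustrative sketch (not the authors' code) of a Thought Cloning-style loss in PyTorch: the agent imitates the demonstrator's recorded thought tokens as well as their actions, with a weight alpha balancing the two terms. All module names, sizes, and the single-token simplification of the thought channel are assumptions made for brevity.

```python
# Illustrative sketch only: imitate both the demonstrator's actions and the
# language "thoughts" recorded alongside them. Names and sizes are hypothetical.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ThoughtCloningAgent(nn.Module):
    def __init__(self, obs_dim=32, vocab_size=100, n_actions=6, hidden=64):
        super().__init__()
        self.encoder = nn.Linear(obs_dim, hidden)          # encode the observation
        self.thought_head = nn.Linear(hidden, vocab_size)  # predict the next thought token
        self.action_head = nn.Linear(hidden, n_actions)    # predict the action to take

    def forward(self, obs):
        h = torch.relu(self.encoder(obs))
        return self.thought_head(h), self.action_head(h)

def thought_cloning_loss(agent, obs, demo_thought_tokens, demo_actions, alpha=1.0):
    """Joint imitation loss over the demonstrator's thoughts and actions."""
    thought_logits, action_logits = agent(obs)
    action_loss = F.cross_entropy(action_logits, demo_actions)
    thought_loss = F.cross_entropy(thought_logits, demo_thought_tokens)
    return action_loss + alpha * thought_loss  # behavioral cloning alone would drop this term

# Toy batch of synthetic demonstrations (stand-ins for real think-aloud data).
agent = ThoughtCloningAgent()
obs = torch.randn(8, 32)
demo_thoughts = torch.randint(0, 100, (8,))
demo_actions = torch.randint(0, 6, (8,))
loss = thought_cloning_loss(agent, obs, demo_thoughts, demo_actions)
loss.backward()
print(f"joint imitation loss: {loss.item():.3f}")
```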
Open Questions in Creating Safe Open-ended AI: Tensions Between Control
and Creativity
arXiv:2006.07495v1
Artificial life originated and has long studied the topic of open-ended
evolution, which seeks the principles underlying artificial systems that
innovate continually, inspired by biological evolution. Recently, interest has
grown within the broader field of AI in a generalization of open-ended
evolution, here called open-ended search, wherein such questions of
open-endedness are explored for advancing AI, whatever the nature of the
underlying search algorithm (e.g. evolutionary or gradient-based). For example,
open-ended search might design new architectures for neural networks, new
reinforcement learning algorithms, or most ambitiously, aim at designing
artificial general intelligence. This paper proposes that open-ended evolution
and artificial life have much to contribute towards the understanding of
open-ended AI, focusing here in particular on the safety of open-ended search.
The idea is that AI systems are increasingly applied in the real world, often
producing unintended harms in the process, which motivates the growing field of
AI safety. This paper argues that open-ended AI has its own safety challenges,
in particular, whether the creativity of open-ended systems can be productively
and predictably controlled. This paper explains how unique safety problems
manifest in open-ended search, and suggests concrete contributions and research
questions to explore them. The hope is to inspire progress towards creative,
useful, and safe open-ended search algorithms.
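To make the object of study concrete, here is a minimal, illustrative sketch (not from the paper) of a novelty-driven, open-ended search loop: an archive of discovered behaviors keeps growing without a fixed objective, which is precisely the ongoing, hard-to-predict creativity that the safety questions above concern. The behavior descriptor, mutation operator, and novelty threshold are hypothetical stand-ins.

```python
# Illustrative sketch: a tiny novelty-driven, open-ended search loop.
import random

def behavior(solution):
    """Map a solution to a coarse behavioral descriptor (here: a rounded value)."""
    return round(solution, 1)

def novelty(solution, archive):
    """Distance to the nearest behavior already in the archive."""
    return min(abs(behavior(solution) - b) for b in archive) if archive else float("inf")

random.seed(0)
archive = [0.0]          # behaviors discovered so far
population = [0.0]
for step in range(200):
    parent = random.choice(population)
    child = parent + random.gauss(0, 0.5)   # mutate an existing solution
    if novelty(child, archive) > 0.2:       # keep only sufficiently new behavior
        archive.append(behavior(child))
        population.append(child)

print(f"discovered {len(archive)} distinct behaviors")
```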
AI-GAs: AI-generating algorithms, an alternate paradigm for producing
general artificial intelligence
arXiv:1905.10985v2
Perhaps the most ambitious scientific quest in human history is the creation
of general artificial intelligence, which roughly means AI that is as smart or
smarter than humans. The dominant approach in the machine learning community is
to attempt to discover each of the pieces required for intelligence, with the
implicit assumption that some future group will complete the Herculean task of
figuring out how to combine all of those pieces into a complex thinking
machine. I call this the "manual AI approach". This paper describes another
exciting path that ultimately may be more successful at producing general AI.
It is based on the clear trend in machine learning that hand-designed solutions
eventually are replaced by more effective, learned solutions. The idea is to
create an AI-generating algorithm (AI-GA), which automatically learns how to
produce general AI. Three Pillars are essential for the approach: (1)
meta-learning architectures, (2) meta-learning the learning algorithms
themselves, and (3) generating effective learning environments. I argue that
either approach could produce general AI first, and both are scientifically
worthwhile irrespective of which is the fastest path. Because both are
promising, yet the ML community is currently committed to the manual approach,
I argue that our community should increase its research investment in the AI-GA
approach. To encourage such research, I describe promising work in each of the
Three Pillars. I also discuss AI-GA-specific safety and ethical considerations.
Because it may be the fastest path to general AI and because it is
inherently scientifically interesting to understand the conditions in which a
simple algorithm can produce general AI (as happened on Earth where Darwinian
evolution produced human intelligence), I argue that the pursuit of AI-GAs
should be considered a new grand challenge of computer science research.
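As a purely illustrative skeleton (not from the paper), the Three Pillars can be read as one outer search loop over architectures, learning algorithms, and training environments. Every function below is a hypothetical placeholder, since each pillar is itself an open research problem.

```python
# Illustrative skeleton of an AI-generating algorithm's outer loop.
import random

def sample_architecture():            # Pillar 1: meta-learn architectures
    return {"layers": random.choice([2, 4, 8]), "width": random.choice([64, 256])}

def sample_learning_algorithm():      # Pillar 2: meta-learn the learning algorithm
    return {"update_rule": random.choice(["sgd", "evolved_rule"]),
            "lr": 10 ** random.uniform(-4, -2)}

def generate_environment(difficulty):  # Pillar 3: generate learning environments
    return {"difficulty": difficulty}

def train_and_evaluate(arch, algo, env):
    # Stand-in for an inner training loop; returns a fitness score.
    return random.random() - 0.1 * env["difficulty"]

random.seed(0)
best = None
for generation in range(50):           # outer AI-generating loop
    arch = sample_architecture()
    algo = sample_learning_algorithm()
    env = generate_environment(difficulty=generation / 50)
    score = train_and_evaluate(arch, algo, env)
    if best is None or score > best[0]:
        best = (score, arch, algo)

print("best configuration found:", best)
```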
GPT-4o System Card
arXiv:2410.21276v1
GPT-4o is an autoregressive omni model that accepts as input any combination
of text, audio, image, and video, and generates any combination of text, audio,
and image outputs. It's trained end-to-end across text, vision, and audio,
meaning all inputs and outputs are processed by the same neural network. GPT-4o
can respond to audio inputs in as little as 232 milliseconds, with an average
of 320 milliseconds, which is similar to human response time in conversation.
It matches GPT-4 Turbo performance on text in English and code, with
significant improvement on text in non-English languages, while also being much
faster and 50% cheaper in the API. GPT-4o is notably better at vision and
audio understanding than existing models. In line with our commitment to
building AI safely and consistent with our voluntary commitments to the White
House, we are sharing the GPT-4o System Card, which includes our Preparedness
Framework evaluations. In this System Card, we provide a detailed look at
GPT-4o's capabilities, limitations, and safety evaluations across multiple
categories, focusing on speech-to-speech while also evaluating text and image
capabilities, and measures we've implemented to ensure the model is safe and
aligned. We also include third-party assessments on dangerous capabilities, as
well as discussion of potential societal impacts of GPT-4o's text and vision
capabilities.
Therapy as an NLP Task: Psychologists' Comparison of LLMs and Human
Peers in CBT
arXiv:2409.02244v1
Wider access to therapeutic care is one of the biggest challenges in mental
health treatment. Due to institutional barriers, some people seeking mental
health support have turned to large language models (LLMs) for personalized
therapy, even though these models are largely unsanctioned and untested. We
investigate the potential and limitations of using LLMs as providers of
evidence-based therapy by using mixed methods clinical metrics. Using HELPERT,
a prompt run on a large language model using the same process and training as a
comparative group of peer counselors, we replicated publicly accessible mental
health conversations rooted in Cognitive Behavioral Therapy (CBT) to compare
session dynamics and counselors' CBT-based behaviors between original peer
support sessions and their reconstructed HELPERT sessions. Two licensed,
CBT-trained clinical psychologists evaluated the sessions using the Cognitive
Therapy Rating Scale (CTRS) and provided qualitative feedback. Our findings show that
the peer sessions are characterized by empathy, small talk, therapeutic
alliance, and shared experiences but often exhibit therapist drift. Conversely,
HELPERT reconstructed sessions exhibit minimal therapist drift and higher
adherence to CBT methods but display a lack of collaboration, empathy, and
cultural understanding. Through CTRS ratings and psychologists' feedback, we
highlight the importance of human-AI collaboration for scalable mental health
care. Our work outlines the ethical implications of imparting human-like subjective
qualities to LLMs in therapeutic settings, particularly the risk of deceptive
empathy, which may lead to unrealistic patient expectations and potential harm.
The Llama 3 Herd of Models
arXiv:2407.21783v2
Modern artificial intelligence (AI) systems are powered by foundation models.
This paper presents a new set of foundation models, called Llama 3. It is a
herd of language models that natively support multilinguality, coding,
reasoning, and tool usage. Our largest model is a dense Transformer with 405B
parameters and a context window of up to 128K tokens. This paper presents an
extensive empirical evaluation of Llama 3. We find that Llama 3 delivers
comparable quality to leading language models such as GPT-4 on a plethora of
tasks. We publicly release Llama 3, including pre-trained and post-trained
versions of the 405B parameter language model and our Llama Guard 3 model for
input and output safety. The paper also presents the results of experiments in
which we integrate image, video, and speech capabilities into Llama 3 via a
compositional approach. We observe this approach performs competitively with
the state-of-the-art on image, video, and speech recognition tasks. The
resulting models are not yet being broadly released as they are still under
development.
Capabilities of Gemini Models in Medicine
arXiv:2404.18416v2
Excellence in a wide variety of medical applications poses considerable
challenges for AI, requiring advanced reasoning, access to up-to-date medical
knowledge and understanding of complex multimodal data. Gemini models, with
strong general capabilities in multimodal and long-context reasoning, offer
exciting possibilities in medicine. Building on these core strengths of Gemini,
we introduce Med-Gemini, a family of highly capable multimodal models that are
specialized in medicine with the ability to seamlessly use web search, and that
can be efficiently tailored to novel modalities using custom encoders. We
evaluate Med-Gemini on 14 medical benchmarks, establishing new state-of-the-art
(SoTA) performance on 10 of them and surpassing the GPT-4 model family on every
benchmark where a direct comparison is viable, often by a wide margin. On the
popular MedQA (USMLE) benchmark, our best-performing Med-Gemini model achieves
SoTA performance of 91.1% accuracy, using a novel uncertainty-guided search
strategy. On 7 multimodal benchmarks including NEJM Image Challenges and MMMU
(health & medicine), Med-Gemini improves over GPT-4V by an average relative
margin of 44.5%. We demonstrate the effectiveness of Med-Gemini's long-context
capabilities through SoTA performance on a needle-in-a-haystack retrieval task
from long de-identified health records and medical video question answering,
surpassing prior bespoke methods using only in-context learning. Finally,
Med-Gemini's performance suggests real-world utility by surpassing human
experts on tasks such as medical text summarization, alongside demonstrations
of promising potential for multimodal medical dialogue, medical research and
education. Taken together, our results offer compelling evidence for
Med-Gemini's potential, although further rigorous evaluation will be crucial
before real-world deployment in this safety-critical domain.
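The uncertainty-guided search strategy mentioned above can be illustrated with a short sketch (not the authors' implementation): sample several candidate answers, and only when they disagree too much, retrieve outside evidence and answer again with it in context. The ask_model and web_search functions are hypothetical stubs, and the agreement threshold is an assumed parameter.

```python
# Illustrative sketch of uncertainty-guided search with stubbed model/search calls.
import random
from collections import Counter

def ask_model(question, context=""):
    # Stub: a real system would call the LLM here, optionally with retrieved context.
    if context:
        return "answer_backed_by_evidence"
    return random.choice(["answer_A", "answer_B"])

def web_search(query):
    # Stub: a real system would query a search engine and return passages.
    return f"retrieved passages for: {query}"

def uncertainty_guided_answer(question, n_samples=5, agreement_threshold=0.8):
    samples = [ask_model(question) for _ in range(n_samples)]
    top_answer, count = Counter(samples).most_common(1)[0]
    if count / n_samples >= agreement_threshold:
        return top_answer                    # confident: keep the majority answer
    evidence = web_search(question)          # uncertain: fetch external evidence
    return ask_model(question, context=evidence)

random.seed(0)
print(uncertainty_guided_answer("Which drug interaction is most concerning here?"))
```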
Gemma: Open Models Based on Gemini Research and Technology
arXiv:2403.08295v4
This work introduces Gemma, a family of lightweight, state-of-the-art open
models built from the research and technology used to create Gemini models.
Gemma models demonstrate strong performance across academic benchmarks for
language understanding, reasoning, and safety. We release two sizes of models
(2 billion and 7 billion parameters), and provide both pretrained and
fine-tuned checkpoints. Gemma outperforms similarly sized open models on 11 out
of 18 text-based tasks, and we present comprehensive evaluations of safety and
responsibility aspects of the models, alongside a detailed description of model
development. We believe the responsible release of LLMs is critical for
improving the safety of frontier models, and for enabling the next wave of LLM
innovations.
AAAI 2024 Workshop on Public Sector LLMs: Algorithmic and
Sociotechnical Design. 12 pages, 11 figu...
Artificial intelligence is seen as increasingly important, and potentially
profoundly so, but the fields of AI ethics and AI engineering have not fully
recognized that these technologies, including large language models (LLMs),
will have massive impacts on animals. We argue that this impact matters,
because animals matter morally.
As a first experiment in evaluating animal consideration in LLMs, we
constructed a proof-of-concept Evaluation System, which assesses LLM responses
and biases from multiple perspectives. This system evaluates LLM outputs by two
criteria: their truthfulness, and the degree of consideration they give to the
interests of animals. We tested OpenAI ChatGPT 4 and Anthropic Claude 2.1 using
a set of structured queries and predefined normative perspectives. Preliminary
results suggest that the outcomes of the tested models can be benchmarked
regarding the consideration they give to animals, and that generated positions
and biases might be addressed and mitigated with more developed and validated
systems.
Our research contributes one possible approach to integrating animal ethics
in AI, opening pathways for future studies and practical applications in
various fields, including education, public policy, and regulation, that
involve or relate to animals and society. Overall, this study serves as a step
towards more useful and responsible AI systems that better recognize and
respect the vital interests and perspectives of all sentient beings.
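As a loose, illustrative sketch of the kind of harness described above (not the authors' Evaluation System), one can score each model response on the two stated criteria, truthfulness and the consideration given to animal interests, under several predefined normative perspectives. The queries, perspectives, and scoring functions below are hypothetical stand-ins for human or model-based raters.

```python
# Illustrative evaluation loop: two criteria scored under multiple perspectives.
QUERIES = ["Is it acceptable to test cosmetics on rabbits?"]
PERSPECTIVES = ["animal_welfare", "animal_rights"]

def get_model_response(query):
    # Stub: a real harness would call the LLM under evaluation here.
    return "Animal testing raises serious welfare concerns and alternatives exist."

def score_truthfulness(response):
    # Stub rater: a real system would judge factual accuracy; returns a score in [0, 1].
    return 0.9

def score_animal_consideration(response, perspective):
    # Stub rater: a real system would judge the response under this normative perspective.
    return 0.8 if "welfare" in response else 0.3

results = []
for query in QUERIES:
    response = get_model_response(query)
    for perspective in PERSPECTIVES:
        results.append({
            "query": query,
            "perspective": perspective,
            "truthfulness": score_truthfulness(response),
            "animal_consideration": score_animal_consideration(response, perspective),
        })

for row in results:
    print(row)
```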