GPT-4o System Card
arXiv:2410.21276v1
GPT-4o is an autoregressive omni model that accepts as input any combination
of text, audio, image, and video, and generates any combination of text, audio,
and image outputs. It is trained end-to-end across text, vision, and audio,
meaning all inputs and outputs are processed by the same neural network. GPT-4o
can respond to audio inputs in as little as 232 milliseconds, with an average
of 320 milliseconds, which is similar to human response time in conversation.
It matches GPT-4 Turbo performance on text in English and code, with
significant improvement on text in non-English languages, while also being much
faster and 50% cheaper in the API. GPT-4o is notably better at vision and
audio understanding than existing models. In line with our commitment to
building AI safely and consistent with our voluntary commitments to the White
House, we are sharing the GPT-4o System Card, which includes our Preparedness
Framework evaluations. In this System Card, we provide a detailed look at
GPT-4o's capabilities, limitations, and safety evaluations across multiple
categories, focusing on speech-to-speech while also evaluating text and image
capabilities, and measures we've implemented to ensure the model is safe and
aligned. We also include third-party assessments on dangerous capabilities, as
well as discussion of potential societal impacts of GPT-4o's text and vision
capabilities.
FUTURE-AI: International consensus guideline for trustworthy and
deployable artificial intelligence in healthcare
arXiv:2309.12325v3
Despite major advances in artificial intelligence (AI) for medicine and
healthcare, the deployment and adoption of AI technologies remain limited in
real-world clinical practice. In recent years, concerns have been raised about
the technical, clinical, ethical and legal risks associated with medical AI. To
increase real world adoption, it is essential that medical AI tools are trusted
and accepted by patients, clinicians, health organisations and authorities.
This work describes the FUTURE-AI guideline as the first international
consensus framework for guiding the development and deployment of trustworthy
AI tools in healthcare. The FUTURE-AI consortium was founded in 2021 and
currently comprises 118 interdisciplinary experts from 51 countries
representing all continents, including AI scientists, clinicians, ethicists,
and social scientists. Over a two-year period, the consortium defined guiding
principles and best practices for trustworthy AI through an iterative process
comprising an in-depth literature review, a modified Delphi survey, and online
consensus meetings. The FUTURE-AI framework was established based on 6 guiding
principles for trustworthy AI in healthcare, i.e. Fairness, Universality,
Traceability, Usability, Robustness and Explainability. Through consensus, a
set of 28 best practices were defined, addressing technical, clinical, legal
and socio-ethical dimensions. The recommendations cover the entire lifecycle of
medical AI, from design, development and validation to regulation, deployment,
and monitoring. FUTURE-AI is a risk-informed, assumption-free guideline which
provides a structured approach for constructing medical AI tools that will be
trusted, deployed and adopted in real-world practice. Researchers are
encouraged to take the recommendations into account in proof-of-concept stages
to facilitate the future translation of medical AI into clinical practice.
Standing on FURM ground -- A framework for evaluating Fair, Useful, and
Reliable AI Models in healthcare systems
arXiv:2403.07911v2
The impact of using artificial intelligence (AI) to guide patient care or
operational processes is an interplay of the AI model's output, the
decision-making protocol based on that output, and the capacity of the
stakeholders involved to take the necessary subsequent action. Estimating the
effects of this interplay before deployment, and studying it in real time
afterwards, are essential to bridge the chasm between AI model development and
achievable benefit. To accomplish this, the Data Science team at Stanford
Health Care has developed a Testing and Evaluation (T&E) mechanism to identify
fair, useful and reliable AI models (FURM) by conducting an ethical review to
identify potential value mismatches, simulations to estimate usefulness,
financial projections to assess sustainability, as well as analyses to
determine IT feasibility, design a deployment strategy, and recommend a
prospective monitoring and evaluation plan. We report on FURM assessments done
to evaluate six AI guided solutions for potential adoption, spanning clinical
and operational settings, each with the potential to impact from several dozen
to tens of thousands of patients each year. We describe the assessment process,
summarize the six assessments, and share our framework to enable others to
conduct similar assessments. Of the six solutions we assessed, two have moved
into a planning and implementation phase. Our novel contributions - usefulness
estimates by simulation, financial projections to quantify sustainability, and
a process to do ethical assessments - as well as their underlying methods and
open source tools, are available for other healthcare systems to conduct
actionable evaluations of candidate AI solutions.
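The "usefulness estimates by simulation" mentioned above lend themselves to a back-of-the-envelope illustration. The Python sketch below is not the FURM tooling; it is a minimal, hypothetical simulation (all parameters invented) of how model alerts, alert precision, and limited care-team capacity interact to determine how many patients an AI solution can actually reach in a year.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical parameters (not taken from any FURM assessment).
daily_patients = 200    # patients scored by the model each day
prevalence = 0.05       # fraction who would truly benefit from follow-up action
sensitivity = 0.80      # model recall on those patients
ppv = 0.30              # precision of the model's alerts
daily_capacity = 10     # alerts the care team can act on per day
n_days = 365

reached = 0.0
for _ in range(n_days):
    true_cases = rng.binomial(daily_patients, prevalence)
    caught = rng.binomial(true_cases, sensitivity)              # true-positive alerts
    total_alerts = int(round(caught / ppv)) if ppv > 0 else 0   # alerts incl. false positives
    worked = min(total_alerts, daily_capacity)                  # capacity-limited workload
    frac_worked = worked / total_alerts if total_alerts else 0.0
    reached += caught * frac_worked                             # benefiting patients actually acted on

print(f"Estimated patients reached per year: {reached:.0f}")
```

Varying the capacity or the precision in a simulation of this kind makes the gap between model performance and achievable benefit concrete before anything is deployed.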
A debiasing technique for place-based algorithmic patrol management
In recent years, there has been a revolution in data-driven policing. With
that has come scrutiny on how bias in historical data affects algorithmic
decision making. In this exploratory work, we introduce a debiasing technique
for place-based algorithmic patrol management systems. We show that the
technique efficiently eliminates racially biased features while retaining high
accuracy in the models. Finally, we provide a lengthy list of potential future
research in the realm of fairness and data-driven policing which this work
uncovered.
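The abstract does not spell out the debiasing technique itself, so the sketch below is only a generic illustration of one common approach (dropping candidate features that act as strong proxies for a protected attribute before model fitting); the function, feature names, and threshold are hypothetical and not from this paper.

```python
import numpy as np
import pandas as pd

def drop_proxy_features(X: pd.DataFrame, protected: pd.Series,
                        threshold: float = 0.4) -> pd.DataFrame:
    """Drop any feature whose absolute correlation with the protected
    attribute exceeds `threshold` (a simple proxy-removal heuristic)."""
    corr = X.apply(lambda col: col.corr(protected))
    keep = corr.abs().fillna(0.0) <= threshold
    return X.loc[:, keep]

# Hypothetical grid-cell features for a place-based model.
rng = np.random.default_rng(1)
cells = pd.DataFrame({
    "prior_calls_for_service": rng.poisson(3, 500),
    "median_income": rng.normal(50_000, 12_000, 500),
    "pct_minority_population": rng.uniform(0.0, 1.0, 500),
})
protected = cells["pct_minority_population"]
features = drop_proxy_features(cells.drop(columns=["pct_minority_population"]), protected)
print(features.columns.tolist())
```

A correlation filter of this kind does not, on its own, address feedback loops in historical incident data, which is part of why the authors flag a lengthy list of future fairness research.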
Frontier AI Regulation: Managing Emerging Risks to Public Safety
Update July 11th: - Added missing footnote back in. - Adjusted author
order (mistakenly non-alphab...
Advanced AI models hold the promise of tremendous benefits for humanity, but
society needs to proactively manage the accompanying risks. In this paper, we
focus on what we term "frontier AI" models: highly capable foundation models
that could possess dangerous capabilities sufficient to pose severe risks to
public safety. Frontier AI models pose a distinct regulatory challenge:
dangerous capabilities can arise unexpectedly; it is difficult to robustly
prevent a deployed model from being misused; and, it is difficult to stop a
model's capabilities from proliferating broadly. To address these challenges,
at least three building blocks for the regulation of frontier models are
needed: (1) standard-setting processes to identify appropriate requirements for
frontier AI developers, (2) registration and reporting requirements to provide
regulators with visibility into frontier AI development processes, and (3)
mechanisms to ensure compliance with safety standards for the development and
deployment of frontier AI models. Industry self-regulation is an important
first step. However, wider societal discussions and government intervention
will be needed to create standards and to ensure compliance with them. We
consider several options to this end, including granting enforcement powers to
supervisory authorities and licensure regimes for frontier AI models. Finally,
we propose an initial set of safety standards. These include conducting
pre-deployment risk assessments; external scrutiny of model behavior; using
risk assessments to inform deployment decisions; and monitoring and responding
to new information about model capabilities and uses post-deployment. We hope
this discussion contributes to the broader conversation on how to balance
public safety risks and innovation benefits from advances at the frontier of AI
development.
Rethinking Semi-Supervised Medical Image Segmentation: A
Variance-Reduction Perspective
Accepted by Advances in Neural Information Processing Systems
(NeurIPS 2023)
For medical image segmentation, contrastive learning is the dominant practice
to improve the quality of visual representations by contrasting semantically
similar and dissimilar pairs of samples. This is enabled by the observation
that without accessing ground truth labels, negative examples with truly
dissimilar anatomical features, if sampled, can significantly improve the
performance. In reality, however, these samples may come from similar
anatomical regions and the models may struggle to distinguish the minority
tail-class samples, making the tail classes more prone to misclassification,
both of which typically lead to model collapse. In this paper, we propose ARCO,
a semi-supervised contrastive learning (CL) framework with stratified group
theory for medical image segmentation. In particular, we first propose building
ARCO through the concept of variance-reduced estimation and show that certain
variance-reduction techniques are particularly beneficial in pixel/voxel-level
segmentation tasks with extremely limited labels. Furthermore, we theoretically
prove these sampling techniques are universal in variance reduction. Finally,
we experimentally validate our approaches on eight benchmarks, i.e., five 2D/3D
medical and three semantic segmentation datasets, with different label
settings, and our methods consistently outperform state-of-the-art
semi-supervised methods. Additionally, we augment the CL frameworks with these
sampling techniques and demonstrate significant gains over previous methods. We
believe our work is an important step towards semi-supervised medical image
segmentation by quantifying the limitation of current self-supervision
objectives for accomplishing such challenging safety-critical tasks.
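The abstract names the variance-reduction idea but not the exact samplers, so the following Python sketch is only a rough, hypothetical illustration of the general principle: stratify candidate negatives by pseudo-label so that every class contributes a fixed share, instead of sampling negatives uniformly and hoping they are anatomically dissimilar.

```python
import torch

def stratified_negatives(features: torch.Tensor,
                         pseudo_labels: torch.Tensor,
                         anchor_label: int,
                         n_neg: int) -> torch.Tensor:
    """Sample negatives evenly across the other pseudo-classes, a
    stratified (variance-reducing) alternative to uniform sampling.

    features:      (N, D) pixel/voxel embeddings
    pseudo_labels: (N,) integer pseudo-labels from the current model
    """
    classes = [int(c) for c in pseudo_labels.unique() if int(c) != anchor_label]
    per_class = max(1, n_neg // max(1, len(classes)))
    picks = []
    for c in classes:
        idx = (pseudo_labels == c).nonzero(as_tuple=True)[0]
        if idx.numel():
            picks.append(idx[torch.randperm(idx.numel())[:per_class]])
    chosen = torch.cat(picks) if picks else torch.empty(0, dtype=torch.long)
    return features[chosen]

# Toy usage with random embeddings and four pseudo-classes.
feats = torch.randn(1000, 32)
labels = torch.randint(0, 4, (1000,))
negatives = stratified_negatives(feats, labels, anchor_label=0, n_neg=64)
print(negatives.shape)
```

Because every stratum is represented, tail classes are guaranteed to appear among the negatives, which loosely mirrors the intuition the paper formalises.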
International Institutions for Advanced AI Governance
International institutions may have an important role to play in ensuring
advanced AI systems benefit humanity. International collaborations can unlock
AI's ability to further sustainable development, and coordination of regulatory
efforts can reduce obstacles to innovation and the spread of benefits.
Conversely, the potential dangerous capabilities of powerful and
general-purpose AI systems create global externalities in their development and
deployment, and international efforts to further responsible AI practices could
help manage the risks they pose. This paper identifies a set of governance
functions that could be performed at an international level to address these
challenges, ranging from supporting access to frontier AI systems to setting
international safety standards. It groups these functions into four
institutional models that exhibit internal synergies and have precedents in
existing organizations: 1) a Commission on Frontier AI that facilitates expert
consensus on opportunities and risks from advanced AI, 2) an Advanced AI
Governance Organization that sets international standards to manage global
threats from advanced models, supports their implementation, and possibly
monitors compliance with a future governance regime, 3) a Frontier AI
Collaborative that promotes access to cutting-edge AI, and 4) an AI Safety
Project that brings together leading researchers and engineers to further AI
safety research. We explore the utility of these models and identify open
questions about their viability.
Robust error bounds for quantised and pruned neural networks
arXiv:2012.00138v2
With the rise of smartphones and the internet-of-things, data is increasingly
getting generated at the edge on local, personal devices. For privacy, latency
and energy saving reasons, this shift is causing machine learning algorithms to
move towards decentralisation with the data and algorithms stored, and even
trained, locally on devices. The device hardware becomes the main bottleneck
for model capability in this set-up, creating a need for slimmed down, more
efficient neural networks. Neural network pruning and quantisation are two
methods that have been developed for this, with both approaches demonstrating
impressive results in reducing the computational cost without sacrificing
significantly on model performance. However, the understanding behind these
reduction methods remains underdeveloped. To address this issue, a
semi-definite program is introduced to bound the worst-case error caused by
pruning or quantising a neural network. The method can be applied to many
neural network structures and nonlinear activation functions with the bounds
holding robustly for all inputs in specified sets. It is hoped that the
computed bounds will provide certainty about the performance of these
algorithms when deployed in safety-critical systems.
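In rough terms, the object being certified is the worst-case deviation between the original network f and its pruned or quantised counterpart over a specified input set; the schematic formulation below is a restatement of that quantity, not the paper's exact semi-definite program.

```latex
% Worst-case compression error over an input set \mathcal{X} (schematic):
e^\star \;=\; \max_{x \in \mathcal{X}} \,\bigl\lVert f(x) - \hat{f}(x) \bigr\rVert_2,
\qquad \bar{e} \;\ge\; e^\star .
```

Because the activations make this maximisation non-convex, SDP approaches of this kind typically enclose each nonlinearity in quadratic constraints so that the problem relaxes to a tractable convex one, whose optimal value is a certified upper bound holding for every input in the set.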
An Ecosystem Approach to Ethical AI and Data Use: Experimental
Reflections
Submitted to the 2020 IEEE / ITU International Conference on
Artificial Intelligence for Good
While we have witnessed a rapid growth of ethics documents meant to guide AI
development, the promotion of AI ethics has nonetheless proceeded with little
input from AI practitioners themselves. Given the proliferation of AI for
Social Good initiatives, this is an emerging gap that needs to be addressed in
order to develop more meaningful ethical approaches to AI use and development.
This paper offers a methodology, a shared fairness approach, aimed at
identifying the needs of AI practitioners when it comes to confronting and
resolving ethical challenges and to find a third space where their operational
language can be married with that of the more abstract principles that
presently remain at the periphery of their work experiences. We offer a
grassroots approach to operational ethics based on dialog and mutualised
responsibility. This methodology is centred around conversations intended to
elicit practitioners' perceived ethical attribution and distribution over key
value-laden operational decisions, to identify when these decisions arise and
what ethical challenges they confront, and to engage in a language of ethics
and responsibility which enables practitioners to internalise ethical
responsibility. The methodology bridges responsibility imbalances that rest in
structural decision making power and elite technical knowledge, by commencing
with personal, facilitated conversations, returning the ethical discourse to
those meant to give it meaning at the sharp end of the ecosystem. Our primary
contribution is to add to the recent literature seeking to bring AI
practitioners' experiences to the fore by offering a methodology for
understanding how ethics manifests as a relational and interdependent
sociotechnical practice in their work.
Measuring Non-Expert Comprehension of Machine Learning Fairness Metrics
arXiv:2001.00089v3
Bias in machine learning has manifested injustice in several areas, such as
medicine, hiring, and criminal justice. In response, computer scientists have
developed myriad definitions of fairness to correct this bias in fielded
algorithms. While some definitions are based on established legal and ethical
norms, others are largely mathematical. It is unclear whether the general
public agrees with these fairness definitions, and perhaps more importantly,
whether they understand these definitions. We take initial steps toward
bridging this gap between ML researchers and the public, by addressing the
question: does a lay audience understand a basic definition of ML fairness? We
develop a metric to measure comprehension of three such
definitions--demographic parity, equal opportunity, and equalized odds. We
evaluate this metric using an online survey, and investigate the relationship
between comprehension and sentiment, demographics, and the definition itself.
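For readers unfamiliar with the three definitions named above, the short Python sketch below (an illustration for this summary, not the paper's survey instrument) computes the per-group rates each definition compares: demographic parity compares selection rates, equal opportunity compares true-positive rates, and equalized odds compares both true- and false-positive rates.

```python
import numpy as np

def group_rates(y_true: np.ndarray, y_pred: np.ndarray, group: np.ndarray) -> dict:
    """Per-group rates underlying the three fairness definitions."""
    rates = {}
    for g in np.unique(group):
        m = group == g
        pos = y_pred[m] == 1
        n_pos = max(1, int((y_true[m] == 1).sum()))
        n_neg = max(1, int((y_true[m] == 0).sum()))
        rates[g] = {
            "selection_rate": float(pos.mean()),                   # P(Yhat=1 | A=g)
            "tpr": float((pos & (y_true[m] == 1)).sum()) / n_pos,  # P(Yhat=1 | Y=1, A=g)
            "fpr": float((pos & (y_true[m] == 0)).sum()) / n_neg,  # P(Yhat=1 | Y=0, A=g)
        }
    return rates

# Demographic parity: selection_rate equal across groups.
# Equal opportunity:  tpr equal across groups.
# Equalized odds:     tpr AND fpr equal across groups.
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_pred = np.array([1, 0, 1, 0, 1, 0, 1, 0])
group = np.array(["a", "a", "a", "a", "b", "b", "b", "b"])
print(group_rates(y_true, y_pred, group))
```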