arXiv:2408.07009v1 »Full PDF »We introduce Imagen 3, a latent diffusion model that generates high quality
images from text prompts. We describe our quality and responsibility
evaluations. Imagen 3 is preferred over other state-of-the-art (SOTA) models at
the time of evaluation. In addition, we discuss issues around safety and
representation, as well as methods we used to minimize the potential harm of
our models.Abstract
Web Scraping for Research: Legal, Ethical, Institutional, and Scientific
Considerations
arXiv:2410.23432v1 »Full PDF »Scientists across disciplines often use data from the internet to conduct
research, generating valuable insights about human behavior. However, as
generative AI relying on massive text corpora becomes increasingly valuable,
platforms have greatly restricted access to data through official channels. As
a result, researchers will likely engage in more web scraping to collect data,
introducing new challenges and concerns for researchers. This paper proposes a
comprehensive framework for web scraping in social science research for
U.S.-based researchers, examining the legal, ethical, institutional, and
scientific factors that researchers should consider when scraping the web. We
present an overview of the current regulatory environment impacting when and
how researchers can access, collect, store, and share data via scraping. We
then provide researchers with recommendations to conduct scraping in a
scientifically legitimate and ethical manner. We aim to equip researchers with
the relevant information to mitigate risks and maximize the impact of their
research amidst this evolving data access landscape.Abstract
ILeSiA: Interactive Learning of Situational Awareness from Camera Input
Learning from demonstration is a promising way of teaching robots new skills.
However, a central problem when executing acquired skills is to recognize risks
and failures. This is essential since the demonstrations usually cover only a
few mostly successful cases. Inevitable errors during execution require
specific reactions that were not apparent in the demonstrations. In this paper,
we focus on teaching the robot situational awareness from an initial skill
demonstration via kinesthetic teaching and sparse labeling of autonomous skill
executions as safe or risky. At runtime, our system, called ILeSiA, detects
risks based on the perceived camera images by encoding the images into a
low-dimensional latent space representation and training a classifier based on
the encoding and the provided labels. In this way, ILeSiA boosts the confidence
and safety with which robotic skills can be executed. Our experiments
demonstrate that classifiers, trained with only a small amount of user-provided
data, can successfully detect numerous risks. The system is flexible because
the risk cases are defined by labeling data. This also means that labels can be
added as soon as risks are identified by a human supervisor. We provide all
code and data required to reproduce our experiments at
imitrob.ciirc.cvut.cz/publications/ilesia.Abstract
FUTURE-AI: International consensus guideline for trustworthy and
deployable artificial intelligence in healthcare
arXiv:2309.12325v3 »Full PDF »Despite major advances in artificial intelligence (AI) for medicine and
healthcare, the deployment and adoption of AI technologies remain limited in
real-world clinical practice. In recent years, concerns have been raised about
the technical, clinical, ethical and legal risks associated with medical AI. To
increase real world adoption, it is essential that medical AI tools are trusted
and accepted by patients, clinicians, health organisations and authorities.
This work describes the FUTURE-AI guideline as the first international
consensus framework for guiding the development and deployment of trustworthy
AI tools in healthcare. The FUTURE-AI consortium was founded in 2021 and
currently comprises 118 inter-disciplinary experts from 51 countries
representing all continents, including AI scientists, clinicians, ethicists,
and social scientists. Over a two-year period, the consortium defined guiding
principles and best practices for trustworthy AI through an iterative process
comprising an in-depth literature review, a modified Delphi survey, and online
consensus meetings. The FUTURE-AI framework was established based on 6 guiding
principles for trustworthy AI in healthcare, i.e. Fairness, Universality,
Traceability, Usability, Robustness and Explainability. Through consensus, a
set of 28 best practices were defined, addressing technical, clinical, legal
and socio-ethical dimensions. The recommendations cover the entire lifecycle of
medical AI, from design, development and validation to regulation, deployment,
and monitoring. FUTURE-AI is a risk-informed, assumption-free guideline which
provides a structured approach for constructing medical AI tools that will be
trusted, deployed and adopted in real-world practice. Researchers are
encouraged to take the recommendations into account in proof-of-concept stages
to facilitate future translation towards clinical practice of medical AI.Abstract
AI Sandbagging: Language Models can Strategically Underperform on
Evaluations
arXiv:2406.07358v3 »Full PDF »Trustworthy capability evaluations are crucial for ensuring the safety of AI
systems, and are becoming a key component of AI regulation. However, the
developers of an AI system, or the AI system itself, may have incentives for
evaluations to understate the AI's actual capability. These conflicting
interests lead to the problem of sandbagging – which we define
as "strategic underperformance on an evaluation". In this paper we assess
sandbagging capabilities in contemporary language models (LMs). We prompt
frontier LMs, like GPT-4 and Claude 3 Opus, to selectively underperform on
dangerous capability evaluations, while maintaining performance on general
(harmless) capability evaluations. Moreover, we find that models can be
fine-tuned, on a synthetic dataset, to hide specific capabilities unless given
a password. This behaviour generalizes to high-quality, held-out benchmarks
such as WMDP. In addition, we show that both frontier and smaller models can be
prompted, or password-locked, to target specific scores on a capability
evaluation. Even more, we found that a capable password-locked model (Llama 3
70b) is reasonably able to emulate a less capable model (Llama 2 7b). Overall,
our results suggest that capability evaluations are vulnerable to sandbagging.
This vulnerability decreases the trustworthiness of evaluations, and thereby
undermines important safety decisions regarding the development and deployment
of advanced AI systems.Abstract
Enhancing Trust in LLMs: Algorithms for Comparing and Interpreting LLMs
An extensive survey of the literature specifying algorithms and
techniques enhancing the trustwort...
This paper surveys evaluation techniques to enhance the trustworthiness and
understanding of Large Language Models (LLMs). As reliance on LLMs grows,
ensuring their reliability, fairness, and transparency is crucial. We explore
algorithmic methods and metrics to assess LLM performance, identify weaknesses,
and guide development towards more trustworthy applications. Key evaluation
metrics include Perplexity Measurement, NLP metrics (BLEU, ROUGE, METEOR,
BERTScore, GLEU, Word Error Rate, Character Error Rate), Zero-Shot and Few-Shot
Learning Performance, Transfer Learning Evaluation, Adversarial Testing, and
Fairness and Bias Evaluation. We introduce innovative approaches like LLMMaps
for stratified evaluation, Benchmarking and Leaderboards for competitive
assessment, Stratified Analysis for in-depth understanding, Visualization of
Blooms Taxonomy for cognitive level accuracy distribution, Hallucination Score
for quantifying inaccuracies, Knowledge Stratification Strategy for
hierarchical analysis, and Machine Learning Models for Hierarchy Generation.
Human Evaluation is highlighted for capturing nuances that automated metrics
may miss. These techniques form a framework for evaluating LLMs, aiming to
enhance transparency, guide development, and establish user trust. Future
papers will describe metric visualization and demonstrate each approach on
practical examples.Abstract
A Framework for Assurance Audits of Algorithmic Systems
arXiv:2401.14908v2 »Full PDF »An increasing number of regulations propose AI audits as a mechanism for
achieving transparency and accountability for artificial intelligence (AI)
systems. Despite some converging norms around various forms of AI auditing,
auditing for the purpose of compliance and assurance currently lacks
agreed-upon practices, procedures, taxonomies, and standards. We propose the
criterion audit as an operationalizable compliance and assurance external audit
framework. We model elements of this approach after financial auditing
practices, and argue that AI audits should similarly provide assurance to their
stakeholders about AI organizations' ability to govern their algorithms in ways
that mitigate harms and uphold human values. We discuss the necessary
conditions for the criterion audit and provide a procedural blueprint for
performing an audit engagement in practice. We illustrate how this framework
can be adapted to current regulations by deriving the criteria on which bias
audits can be performed for in-scope hiring algorithms, as required by the
recently effective New York City Local Law 144 of 2021. We conclude by offering
a critical discussion on the benefits, inherent limitations, and implementation
challenges of applying practices of the more mature financial auditing industry
to AI auditing where robust guardrails against quality assurance issues are
only starting to emerge. Our discussion -- informed by experiences in
performing these audits in practice -- highlights the critical role that an
audit ecosystem plays in ensuring the effectiveness of audits.Abstract
Laboratory-Scale AI: Open-Weight Models are Competitive with ChatGPT
Even in Low-Resource Settings
Accepted at the ACM Conference on Fairness, Accountability, and
Transparency (FAccT) 2024
The rapid proliferation of generative AI has raised questions about the
competitiveness of lower-parameter, locally tunable, open-weight models
relative to high-parameter, API-guarded, closed-weight models in terms of
performance, domain adaptation, cost, and generalization. Centering
under-resourced yet risk-intolerant settings in government, research, and
healthcare, we see for-profit closed-weight models as incompatible with
requirements for transparency, privacy, adaptability, and standards of
evidence. Yet the performance penalty in using open-weight models, especially
in low-data and low-resource settings, is unclear.
We assess the feasibility of using smaller, open-weight models to replace
GPT-4-Turbo in zero-shot, few-shot, and fine-tuned regimes, assuming access to
only a single, low-cost GPU. We assess value-sensitive issues around bias,
privacy, and abstention on three additional tasks relevant to those topics. We
find that with relatively low effort, very low absolute monetary cost, and
relatively little data for fine-tuning, small open-weight models can achieve
competitive performance in domain-adapted tasks without sacrificing generality.
We then run experiments considering practical issues in bias, privacy, and
hallucination risk, finding that open models offer several benefits over closed
models. We intend this work as a case study in understanding the opportunity
cost of reproducibility and transparency over for-profit state-of-the-art zero
shot performance, finding this cost to be marginal under realistic settings.Abstract
arXiv:2404.16244v2 »Full PDF »This paper focuses on the opportunities and the ethical and societal risks
posed by advanced AI assistants. We define advanced AI assistants as artificial
agents with natural language interfaces, whose function is to plan and execute
sequences of actions on behalf of a user, across one or more domains, in line
with the user's expectations. The paper starts by considering the technology
itself, providing an overview of AI assistants, their technical foundations and
potential range of applications. It then explores questions around AI value
alignment, well-being, safety and malicious uses. Extending the circle of
inquiry further, we next consider the relationship between advanced AI
assistants and individual users in more detail, exploring topics such as
manipulation and persuasion, anthropomorphism, appropriate relationships, trust
and privacy. With this analysis in place, we consider the deployment of
advanced assistants at a societal scale, focusing on cooperation, equity and
access, misinformation, economic impact, the environment and how best to
evaluate advanced AI assistants. Finally, we conclude by providing a range of
recommendations for researchers, developers, policymakers and public
stakeholders.Abstract
Datasheets for Machine Learning Sensors: Towards Transparency,
Auditability, and Responsibility for Intelligent Sensing
arXiv:2306.08848v3 »Full PDF »Machine learning (ML) sensors are enabling intelligence at the edge by
empowering end-users with greater control over their data. ML sensors offer a
new paradigm for sensing that moves the processing and analysis to the device
itself rather than relying on the cloud, bringing benefits like lower latency
and greater data privacy. The rise of these intelligent edge devices, while
revolutionizing areas like the internet of things (IoT) and healthcare, also
throws open critical questions about privacy, security, and the opacity of AI
decision-making. As ML sensors become more pervasive, it requires judicious
governance regarding transparency, accountability, and fairness. To this end,
we introduce a standard datasheet template for these ML sensors and discuss and
evaluate the design and motivation for each section of the datasheet in detail
including: standard dasheet components like the system's hardware
specifications, IoT and AI components like the ML model and dataset attributes,
as well as novel components like end-to-end performance metrics, and expanded
environmental impact metrics. To provide a case study of the application of our
datasheet template, we also designed and developed two examples for ML sensors
performing computer vision-based person detection: one an open-source ML sensor
designed and developed in-house, and a second commercial ML sensor developed by
our industry collaborators. Together, ML sensors and their datasheets provide
greater privacy, security, transparency, explainability, auditability, and
user-friendliness for ML-enabled embedded systems. We conclude by emphasizing
the need for standardization of datasheets across the broader ML community to
ensure the responsible use of sensor data.Abstract