Social biases and stereotypes are embedded in our culture in part through
their presence in our stories, as evidenced by the rich history of humanities
and social science literature analyzing such biases in children's stories.
Because these analyses are often conducted manually and at a small scale, such
investigations can benefit from the use of more recent natural language
processing methods that examine social bias in models and data corpora. Our
work joins this interdisciplinary effort and makes a unique contribution by
taking event narrative structure into account when analyzing the social
bias of stories. We propose a computational pipeline that automatically
extracts a story's temporal narrative verb-based event chain for each of its
characters as well as character attributes such as gender. We also present a
verb-based event annotation scheme that can facilitate bias analysis by
including categories such as those that align with traditional stereotypes.
Through a case study analyzing gender bias in fairy tales, we demonstrate that
our framework can reveal bias not only in the unigram verb-based events in
which female and male characters participate but also in the temporal narrative
order of such event participation.
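As a rough illustration of such a pipeline, here is a minimal sketch of per-character verb-event chain extraction using spaCy dependency parses. The character list, example sentences, and chain format are illustrative assumptions, not the paper's actual implementation.

```python
# Minimal sketch: per-character verb-event chains via spaCy (assumes the
# en_core_web_sm model is installed). Illustrative only, not the paper's
# actual pipeline.
import spacy
from collections import defaultdict

nlp = spacy.load("en_core_web_sm")

def extract_event_chains(text, characters):
    """Map each character to the ordered (sentence index, verb lemma) events
    in which that character is the grammatical subject."""
    chains = defaultdict(list)
    doc = nlp(text)
    for sent_idx, sent in enumerate(doc.sents):
        for token in sent:
            if token.pos_ != "VERB":
                continue
            # Subjects of this verb identify the participating character.
            for child in token.children:
                if child.dep_ in ("nsubj", "nsubjpass") and child.text in characters:
                    chains[child.text].append((sent_idx, token.lemma_))
    return chains

story = "Cinderella wept. The prince searched the kingdom. Cinderella fled."
print(extract_event_chains(story, {"Cinderella", "prince"}))
```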
Accepted for publication at NeurIPS 2024; 34 pages, 9 figures.
This paper examines the issue of fairness in the estimation of graphical
models (GMs), particularly Gaussian, covariance, and Ising models. These models
play a vital role in understanding complex relationships in high-dimensional
data. However, standard GMs can result in biased outcomes, especially when the
underlying data involves sensitive characteristics or protected groups. To
address this, we introduce a comprehensive framework designed to reduce bias in
the estimation of GMs related to protected attributes. Our approach involves
the integration of the pairwise graph disparity error and a tailored loss
function into a nonsmooth multi-objective optimization problem, striving to
achieve fairness across different sensitive groups while maintaining the
effectiveness of the GMs. Experimental evaluations on synthetic and real-world
datasets demonstrate that our framework effectively mitigates bias without
undermining GMs' performance.
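To make the multi-objective structure concrete, a minimal sketch of a fairness-penalized Gaussian graphical model objective follows. The specific disparity term, penalty weights, and l1 regularizer are illustrative assumptions rather than the paper's exact formulation.

```python
# Hedged sketch of a fairness-penalized Gaussian graphical model objective.
# The paper's pairwise graph disparity error and optimizer may differ; this
# only illustrates the nonsmooth multi-objective structure.
import numpy as np

def group_nll(Theta, S_g):
    """Gaussian log-det loss of precision matrix Theta on one group's
    sample covariance S_g."""
    _, logdet = np.linalg.slogdet(Theta)
    return np.trace(S_g @ Theta) - logdet

def fair_gm_loss(Theta, S_groups, lam=0.1, gamma=1.0):
    """Fit term + l1 sparsity + pairwise disparity across protected groups."""
    losses = [group_nll(Theta, S_g) for S_g in S_groups]
    fit = np.mean(losses)
    sparsity = lam * np.abs(Theta).sum()          # nonsmooth l1 term
    # Pairwise disparity: penalize unequal fit quality across groups.
    disparity = sum(
        (losses[i] - losses[j]) ** 2
        for i in range(len(losses)) for j in range(i + 1, len(losses))
    )
    return fit + sparsity + gamma * disparity
```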
Finite-Sample and Distribution-Free Fair Classification: Optimal
Trade-off Between Excess Risk and Fairness, and the Cost of Group-Blindness
arXiv:2410.16477v2
Algorithmic fairness in machine learning has recently garnered significant
attention. However, two pressing challenges remain: (1) The fairness guarantees
of existing fair classification methods often rely on specific data
distribution assumptions and large sample sizes, which can lead to fairness
violations when the sample size is moderate, a common situation in practice. (2)
Due to legal and societal considerations, using sensitive group attributes
during decision-making (referred to as the group-blind setting) may not always
be feasible.
In this work, we quantify the impact of enforcing algorithmic fairness and
group-blindness in binary classification under group fairness constraints.
Specifically, we propose a unified framework for fair classification that
provides distribution-free and finite-sample fairness guarantees with
controlled excess risk. This framework is applicable to various group fairness
notions in both group-aware and group-blind scenarios. Furthermore, we
establish a minimax lower bound on the excess risk, showing the minimax
optimality of our proposed algorithm up to logarithmic factors. Through
extensive simulation studies and real data analysis, we further demonstrate the
superior performance of our algorithm compared to existing methods, and provide
empirical support for our theoretical findings.
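As one hedged illustration of trading off fairness against excess risk, the sketch below adjusts per-group decision thresholds to approximately equalize positive rates, a group-aware demographic-parity heuristic. It is not the paper's algorithm and carries none of its distribution-free, finite-sample guarantees.

```python
# Hedged sketch: group-aware thresholding toward demographic parity.
# Illustrates the fairness/accuracy trade-off knob only.
import numpy as np

def group_thresholds(scores, groups, target_rate):
    """Pick a per-group score threshold so each group's positive
    prediction rate is approximately `target_rate`."""
    thresholds = {}
    for g in np.unique(groups):
        s = np.sort(scores[groups == g])
        # Quantile threshold: top `target_rate` fraction predicted positive.
        k = int(np.floor((1 - target_rate) * len(s)))
        thresholds[g] = s[min(k, len(s) - 1)]
    return thresholds

rng = np.random.default_rng(0)
scores = rng.uniform(size=1000)
groups = rng.integers(0, 2, size=1000)
print(group_thresholds(scores, groups, target_rate=0.3))
```

Note that this heuristic is group-aware by construction; the group-blind setting the paper studies rules out exactly this kind of per-group adjustment.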
CARES: A Comprehensive Benchmark of Trustworthiness in Medical Vision
Language Models
Artificial intelligence has significantly impacted medical applications,
particularly with the advent of Medical Large Vision Language Models
(Med-LVLMs), sparking optimism for the future of automated and personalized
healthcare. However, the trustworthiness of Med-LVLMs remains unverified,
posing significant risks for future model deployment. In this paper, we
introduce CARES, a benchmark aimed at comprehensively evaluating the
trustworthiness of Med-LVLMs across the medical domain. We assess
trustworthiness along five dimensions: trustfulness, fairness, safety, privacy, and
robustness. CARES comprises about 41K question-answer pairs in both closed and
open-ended formats, covering 16 medical image modalities and 27 anatomical
regions. Our analysis reveals that the models consistently exhibit concerns
regarding trustworthiness, often displaying factual inaccuracies and failing to
maintain fairness across different demographic groups. Furthermore, they are
vulnerable to attacks and demonstrate a lack of privacy awareness. We publicly
release our benchmark and code at https://cares-ai.github.io/.
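A benchmark of this shape is typically consumed as a scoring loop over QA items sliced by demographic group. The following sketch assumes a hypothetical ask_model interface and an exact-match scorer, which may differ from the actual CARES evaluation code.

```python
# Hedged sketch of a CARES-style fairness slice: per-group accuracy over
# closed-ended QA pairs. `ask_model` is a hypothetical stand-in for
# querying a Med-LVLM.
from collections import defaultdict

def fairness_report(qa_pairs, ask_model):
    """qa_pairs: dicts with 'image', 'question', 'answer', 'group' keys."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for item in qa_pairs:
        pred = ask_model(item["image"], item["question"])
        total[item["group"]] += 1
        correct[item["group"]] += int(pred.strip().lower() == item["answer"].lower())
    # Large gaps between groups indicate a fairness concern.
    return {g: correct[g] / total[g] for g in total}
```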
CIDGMed: Causal Inference-Driven Medication Recommendation with Enhanced
Dual-Granularity Learning
arXiv:2403.00880v2
Medication recommendation aims to integrate patients' long-term health
records to provide accurate and safe medication combinations for specific
health states. Existing methods often fail to deeply explore the true causal
relationships between diseases/procedures and medications, resulting in biased
recommendations. Additionally, in medication representation learning,
information at different granularities, coarse-grained (the medication itself)
and fine-grained (the molecular level), is not effectively integrated, leading
to biased representations. To
address these limitations, we propose the Causal Inference-driven
Dual-Granularity Medication Recommendation method (CIDGMed). Our approach
leverages causal inference to uncover the relationships between
diseases/procedures and medications, thereby enhancing the rationality and
interpretability of recommendations. By integrating coarse-grained medication
effects with fine-grained molecular structure information, CIDGMed provides a
comprehensive representation of medications. Additionally, we employ a bias
correction model during the prediction phase to further refine recommendations,
ensuring both accuracy and safety. In extensive experiments, CIDGMed
significantly outperforms current state-of-the-art models across multiple
metrics, achieving a 2.54% increase in accuracy, a 3.65% reduction in side
effects, and a 39.42% improvement in time efficiency. Additionally, we
demonstrate the rationale of CIDGMed through a case study.
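To illustrate the dual-granularity idea, here is a minimal PyTorch sketch that fuses a coarse per-medication embedding with pooled fine-grained substructure embeddings. The dimensions, pooling, and fusion layer are assumptions, not CIDGMed's actual architecture.

```python
# Hedged sketch of dual-granularity medication representations: a coarse
# per-medication embedding fused with pooled fine-grained (molecular
# substructure) embeddings. Illustrative only.
import torch
import torch.nn as nn

class DualGranularityMedEncoder(nn.Module):
    def __init__(self, n_meds, n_substructs, dim=64):
        super().__init__()
        self.coarse = nn.Embedding(n_meds, dim)      # medication itself
        self.fine = nn.Embedding(n_substructs, dim)  # molecular pieces
        self.fuse = nn.Linear(2 * dim, dim)

    def forward(self, med_ids, substruct_ids):
        # substruct_ids: (batch, max_substructs); index 0 is assumed to be
        # padding, which mean-pooling treats as real for simplicity here.
        coarse = self.coarse(med_ids)                # (batch, dim)
        fine = self.fine(substruct_ids).mean(dim=1)  # pool substructures
        return self.fuse(torch.cat([coarse, fine], dim=-1))
```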
Demystifying Large Language Models for Medicine: A Primer
arXiv:2410.18856v1
Large language models (LLMs) represent a transformative class of AI tools
capable of revolutionizing various aspects of healthcare by generating
human-like responses across diverse contexts and adapting to novel tasks
following human instructions. Their potential application spans a broad range
of medical tasks, such as clinical documentation, matching patients to clinical
trials, and answering medical questions. In this primer paper, we propose an
actionable guideline to help healthcare professionals more efficiently utilize
LLMs in their work, along with a set of best practices. This approach consists
of several main phases, including formulating the task, choosing LLMs, prompt
engineering, fine-tuning, and deployment. We begin by discussing critical
considerations in identifying healthcare tasks that align with the core
capabilities of LLMs and in selecting models based on the task, data,
performance requirements, and model interface. We then review
strategies, such as prompt engineering and fine-tuning, to adapt standard LLMs
to specialized medical tasks. Deployment considerations, including regulatory
compliance, ethical guidelines, and continuous monitoring for fairness and
bias, are also discussed. By providing a structured step-by-step methodology,
this tutorial aims to equip healthcare professionals with the tools necessary
to effectively integrate LLMs into clinical practice, ensuring that these
powerful technologies are applied in a safe, reliable, and impactful manner.
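As a small illustration of the prompt-engineering phase, the sketch below assembles a structured few-shot prompt for a clinical note task. The template wording and fields are illustrative assumptions, not a template prescribed by the primer.

```python
# Hedged sketch of the prompt-engineering phase: a structured template
# with a role, task instructions, and few-shot examples. Illustrative only.
def build_clinical_prompt(task, note, examples):
    shots = "\n\n".join(
        f"Note: {ex['note']}\nAnswer: {ex['answer']}" for ex in examples
    )
    return (
        "You are a clinical documentation assistant.\n"
        f"Task: {task}\n"
        "Answer using only information present in the note.\n\n"
        f"{shots}\n\nNote: {note}\nAnswer:"
    )
```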
Beyond Multiple-Choice Accuracy: Real-World Challenges of Implementing
Large Language Models in Healthcare
arXiv:2410.18460v1
Large Language Models (LLMs) have gained significant attention in the medical
domain for their human-level capabilities, leading to increased efforts to
explore their potential in various healthcare applications. However, despite
this promise, multiple challenges and obstacles remain for their real-world
use in practical settings. This work discusses key
challenges for LLMs in medical applications from four unique aspects:
operational vulnerabilities, ethical and social considerations, performance and
assessment difficulties, and legal and regulatory compliance. Addressing these
challenges is crucial for leveraging LLMs to their full potential and ensuring
their responsible integration into healthcare.
Conditional Language Policy: A General Framework for Steerable
Multi-Objective Finetuning
Reward-based finetuning is crucial for aligning language policies with
intended behaviors (e.g., creativity and safety). A key challenge is to develop
steerable language models that trade-off multiple (conflicting) objectives in a
flexible and efficient manner. This paper presents Conditional Language Policy
(CLP), a general framework for finetuning language models on multiple
objectives. Building on techniques from multi-task training and
parameter-efficient finetuning, CLP learns steerable models that effectively
trade-off conflicting objectives at inference time. Notably, this does not
require training or maintaining multiple models to achieve different trade-offs
between the objectives. Through extensive experiments and ablations on two
summarization datasets, we show that CLP learns steerable language models that
outperform and Pareto-dominate existing approaches for multi-objective
finetuning.
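A common way to realize this kind of steerability, sketched below under assumptions about the training loop, is to sample a trade-off vector over objectives, mix the per-objective rewards with it, and condition the policy on the same vector. CLP's actual parameter-efficient conditioning is more involved than these string tags.

```python
# Hedged sketch of weight-conditioned multi-objective finetuning in the
# spirit of CLP. The trainer and conditioning mechanism are assumptions.
import numpy as np

def sample_tradeoff(n_objectives, rng):
    """Draw a random trade-off vector from the probability simplex."""
    return rng.dirichlet(np.ones(n_objectives))

def mixed_reward(rewards, w):
    """rewards: per-objective scores for one response; w: trade-off vector."""
    return float(np.dot(rewards, w))

def conditioned_prompt(prompt, w):
    # The policy sees the trade-off, so a single model can serve any
    # trade-off at inference time, without maintaining multiple models.
    tags = " ".join(f"<w{i}={wi:.2f}>" for i, wi in enumerate(w))
    return f"{tags} {prompt}"
```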
OPTIMA: Optimized Policy for Intelligent Multi-Agent Systems Enables
Coordination-Aware Autonomous Vehicles
arXiv:2410.18112v1
Coordination among connected and autonomous vehicles (CAVs) is advancing due
to developments in control and communication technologies. However, much of the
current work is based on oversimplified and unrealistic task-specific
assumptions, which may introduce vulnerabilities. This is critical because CAVs
not only interact with their environment but are also integral parts of it.
Insufficient exploration can result in policies that carry latent risks,
highlighting the need for methods that explore the environment both extensively
and efficiently. This work introduces OPTIMA, a novel distributed reinforcement
learning framework for cooperative autonomous vehicle tasks. OPTIMA alternates
between thorough data sampling from environmental interactions and multi-agent
reinforcement learning updates to optimize CAV cooperation, emphasizing both
safety and efficiency. Our goal is to improve the generality and performance of
CAVs in highly complex and crowded scenarios. Furthermore, our industrial-scale
distributed training system easily adapts to different algorithms, reward
functions, and strategies.
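The alternation the abstract describes can be pictured as the loop below; env.rollout and policy.update are hypothetical interfaces standing in for OPTIMA's distributed sampling and multi-agent RL update phases.

```python
# Hedged sketch of OPTIMA-style alternation between large-scale environment
# sampling and multi-agent policy updates. Interfaces are hypothetical.
def train(env, policy, iterations, episodes_per_iter):
    for it in range(iterations):
        # Phase 1: thorough data sampling from environment interactions.
        batch = [env.rollout(policy) for _ in range(episodes_per_iter)]
        # Phase 2: multi-agent RL update on the collected trajectories.
        stats = policy.update(batch)
        print(f"iter {it}: reward={stats['mean_reward']:.2f}")
```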
A Two-Stage Proactive Dialogue Generator for Efficient Clinical
Information Collection Using Large Language Model
Efficient patient-doctor interaction is among the key factors for successful
disease diagnosis. During the conversation, the doctor may query
complementary diagnostic information, such as the patient's symptoms, previous
surgery, and other related information that goes beyond medical evidence data
(test results) to enhance disease diagnosis. However, this procedure is usually
time-consuming and inefficient, and can potentially be optimized through
computer-assisted systems. We therefore propose a diagnostic dialogue system to
automate the patient information collection procedure. By exploiting medical
history and conversation logic, our conversation agents, particularly the
doctor agent, can pose multi-round clinical queries to effectively collect the
most relevant disease diagnostic information. Moreover, benefiting from our
two-stage recommendation structure, carefully designed ranking criteria, and
interactive patient agent, our model is able to overcome the under-exploration
and inflexibility challenges in dialogue generation. Our experimental results on
a real-world medical conversation dataset show that our model can generate
clinical queries that mimic the conversation style of real doctors, with
strong fluency, professionalism, and safety, while effectively collecting
relevant disease diagnostic information.
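As a hedged sketch of the two-stage structure, the function below first retrieves candidate clinical questions and then re-ranks them; the retrieve and rank callables are hypothetical stand-ins for the paper's recommendation model and ranking criteria.

```python
# Hedged sketch of a two-stage query recommender for the doctor agent:
# stage 1 retrieves candidate clinical questions, stage 2 ranks them by
# expected diagnostic value. Scoring functions are illustrative assumptions.
def next_query(dialogue_history, candidate_pool, retrieve, rank, k=10):
    # Stage 1: coarse retrieval of plausible follow-up questions.
    candidates = retrieve(dialogue_history, candidate_pool, top_k=k)
    # Stage 2: fine-grained ranking, e.g., by relevance, novelty, safety.
    scored = sorted(candidates, key=lambda q: rank(q, dialogue_history),
                    reverse=True)
    return scored[0]
```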