Quantum computing promises to enhance machine learning and artificial
intelligence. Different quantum algorithms have been proposed to improve a wide
spectrum of machine learning tasks. Yet, recent theoretical works show that,
similar to traditional classifiers based on deep classical neural networks,
quantum classifiers suffer from the same vulnerability: adding tiny, carefully crafted perturbations to legitimate input samples can induce incorrect predictions at notably high confidence. This poses serious problems for future quantum machine learning applications in safety- and security-critical scenarios. Here, we report the first experimental
demonstration of quantum adversarial learning with programmable superconducting
qubits. We train quantum classifiers, which are built upon variational quantum
circuits consisting of ten transmon qubits featuring average lifetimes of 150
μs, and average fidelities of simultaneous single- and two-qubit gates
above 99.94% and 99.4% respectively, with both real-life images (e.g., medical
magnetic resonance imaging scans) and quantum data. We demonstrate that these
well-trained classifiers (with testing accuracy up to 99%) can be practically
deceived by small adversarial perturbations, whereas an adversarial training
process would significantly enhance their robustness to such perturbations. Our
results experimentally reveal a crucial vulnerability of quantum learning systems under adversarial scenarios and demonstrate an effective defense strategy against adversarial attacks, providing a valuable guide for quantum artificial intelligence applications with both near-term and future quantum devices.
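To make the attack-and-defense loop concrete, the sketch below shows a gradient-based (FGSM-style) perturbation and a single adversarial-training step for a generic differentiable classifier. It is a minimal classical illustration only; the model, data, and `epsilon` value are hypothetical stand-ins, not the variational quantum circuits or perturbation scheme used in the experiment.

```python
import torch
import torch.nn.functional as F

def fgsm_perturb(model, x, y, epsilon=0.05):
    """Craft an FGSM-style adversarial example (epsilon is a placeholder)."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    # Step in the direction that increases the loss, then clamp to a valid range.
    return (x_adv + epsilon * x_adv.grad.sign()).clamp(0, 1).detach()

def adversarial_training_step(model, optimizer, x, y, epsilon=0.05):
    """One adversarial-training step: fit the model on perturbed inputs."""
    x_adv = fgsm_perturb(model, x, y, epsilon)
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    optimizer.step()
    return loss.item()
```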
Unsupervised Abnormal Stop Detection for Long Distance Coaches with
Low-Frequency GPS
arXiv:2411.04422v1
In urban life, long-distance coaches provide a convenient yet economical means of public transportation. A notable problem is to detect abnormal stops of coaches, chiefly caused by illegal pick-ups en route, which can endanger passenger safety. Detecting such abnormal stops from low-quality GPS data has become a pressing issue. In this paper, we propose an unsupervised method that helps transportation managers efficiently perform Abnormal Stop Detection (ASD) for long-distance coaches. Concretely, our method casts the ASD problem as an unsupervised clustering framework in which both normal and abnormal stops are decomposed. Firstly, we propose a stop-duration model for low-frequency GPS based on the assumption that a coach changes speed approximately linearly. Secondly, we separate abnormal stops from normal stop points using a low-rank assumption. The proposed method is conceptually simple yet efficient: by leveraging the low-rank assumption to model normal stop points, it enables domain experts to discover abnormal stops of coaches, as illustrated by a case study motivated by traffic managers. Dataset and code are publicly available at https://github.com/pangjunbiao/IPPs.
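The low-rank idea can be illustrated with a minimal sketch: if rows of a stop-feature matrix share a common normal-stop pattern, that pattern is low-rank, while abnormal stops show up as sparse residuals. The decomposition below is a generic robust-PCA-style heuristic with placeholder thresholds, not the paper's exact formulation.

```python
import numpy as np

def low_rank_sparse_split(M, lam=None, n_iter=100):
    """Split M into low-rank L (shared normal-stop pattern) plus sparse S
    (candidate abnormal stops) via alternating singular-value and
    soft-thresholding steps -- an illustrative heuristic only."""
    m, n = M.shape
    lam = lam or 1.0 / np.sqrt(max(m, n))
    mu = 0.25 * m * n / (np.abs(M).sum() + 1e-12)
    S = np.zeros_like(M)
    for _ in range(n_iter):
        # Singular-value thresholding keeps only the dominant shared structure.
        U, sig, Vt = np.linalg.svd(M - S, full_matrices=False)
        L = U @ np.diag(np.maximum(sig - 1.0 / mu, 0)) @ Vt
        # Soft-thresholding isolates large residuals as sparse outliers.
        R = M - L
        S = np.sign(R) * np.maximum(np.abs(R) - lam / mu, 0)
    return L, S
```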
EXAGREE: Towards Explanation Agreement in Explainable Machine Learning
arXiv:2411.01956v1
Explanations in machine learning are critical for trust, transparency, and
fairness. Yet, complex disagreements among these explanations limit the
reliability and applicability of machine learning models, especially in
high-stakes environments. We formalize four fundamental ranking-based
explanation disagreement problems and introduce a novel framework, EXplanation
AGREEment (EXAGREE), to bridge diverse interpretations in explainable machine
learning, particularly from stakeholder-centered perspectives. Our approach
leverages a Rashomon set for attribution predictions and then optimizes within
this set to identify Stakeholder-Aligned Explanation Models (SAEMs) that
minimize disagreement with diverse stakeholder needs while maintaining
predictive performance. Rigorous empirical analysis on synthetic and real-world
datasets demonstrates that EXAGREE reduces explanation disagreement and
improves fairness across subgroups in various domains. EXAGREE not only
provides researchers with a new direction for studying explanation disagreement
problems but also offers data scientists a tool for making better-informed
decisions in practical applications.
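As a rough illustration of the ranking-based disagreement the framework formalizes, the sketch below scores how differently two attribution methods rank the same features; Kendall's tau is an assumed choice of rank correlation, not necessarily the measure EXAGREE optimizes.

```python
import numpy as np
from scipy.stats import kendalltau

def ranking_disagreement(attr_a, attr_b):
    """Disagreement between two feature-attribution vectors, scored as a
    rescaled (1 - Kendall tau) over their importance rankings."""
    rank_a = np.argsort(np.argsort(-np.abs(attr_a)))
    rank_b = np.argsort(np.argsort(-np.abs(attr_b)))
    tau, _ = kendalltau(rank_a, rank_b)
    return (1.0 - tau) / 2.0  # 0 = identical ranking, 1 = fully reversed

# Example: SHAP-like vs. gradient-like attributions for five features.
print(ranking_disagreement(np.array([0.6, 0.1, 0.2, 0.05, 0.05]),
                           np.array([0.1, 0.5, 0.2, 0.1, 0.1])))
```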
GLBench: A Comprehensive Benchmark for Graph with Large Language Models
arXiv:2407.07457v4
The emergence of large language models (LLMs) has revolutionized the way we
interact with graphs, leading to a new paradigm called GraphLLM. Despite the
rapid development of GraphLLM methods in recent years, the progress and
understanding of this field remain unclear due to the lack of a benchmark with
consistent experimental protocols. To bridge this gap, we introduce GLBench,
the first comprehensive benchmark for evaluating GraphLLM methods in both
supervised and zero-shot scenarios. GLBench provides a fair and thorough
evaluation of different categories of GraphLLM methods, along with traditional
baselines such as graph neural networks. Through extensive experiments on a
collection of real-world datasets with consistent data processing and splitting
strategies, we have uncovered several key findings. Firstly, GraphLLM methods
outperform traditional baselines in supervised settings, with LLM-as-enhancers
showing the most robust performance. However, using LLMs as predictors is less
effective and often leads to uncontrollable output issues. We also notice that
no clear scaling laws exist for current GraphLLM methods. In addition, both
structures and semantics are crucial for effective zero-shot transfer, and our
proposed simple baseline can even outperform several models tailored for
zero-shot scenarios. The data and code of the benchmark can be found at
https://github.com/NineAbyss/GLBench.
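For reference, a traditional baseline of the kind GLBench compares against might look like the minimal GCN node-classification sketch below; the dataset, split, and hyperparameters are placeholders rather than GLBench's actual protocol or API.

```python
import torch
import torch.nn.functional as F
from torch_geometric.datasets import Planetoid
from torch_geometric.nn import GCNConv

# Placeholder citation-graph dataset with fixed public splits.
data = Planetoid(root="data", name="Cora")[0]

class GCN(torch.nn.Module):
    def __init__(self, in_dim, hid, out_dim):
        super().__init__()
        self.conv1 = GCNConv(in_dim, hid)
        self.conv2 = GCNConv(hid, out_dim)
    def forward(self, x, edge_index):
        return self.conv2(F.relu(self.conv1(x, edge_index)), edge_index)

model = GCN(data.num_features, 64, int(data.y.max()) + 1)
opt = torch.optim.Adam(model.parameters(), lr=0.01, weight_decay=5e-4)
for _ in range(200):
    opt.zero_grad()
    out = model(data.x, data.edge_index)
    F.cross_entropy(out[data.train_mask], data.y[data.train_mask]).backward()
    opt.step()

# Evaluate on the held-out test split defined by the dataset.
acc = (out[data.test_mask].argmax(1) == data.y[data.test_mask]).float().mean().item()
print(f"test accuracy: {acc:.3f}")
```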
GPT4Video: A Unified Multimodal Large Language Model for
Instruction-Followed Understanding and Safety-Aware Generation
While the recent advances in Multimodal Large Language Models (MLLMs)
constitute a significant leap forward in the field, these models are
predominantly confined to the realm of input-side multimodal comprehension,
lacking the capacity for multimodal content generation. To fill this gap, we
present GPT4Video, a unified multimodal framework that empowers Large Language Models (LLMs) with the capability of both video understanding and generation. Specifically, we develop an instruction-following-based approach integrated with the Stable Diffusion generative model, which has been demonstrated to handle video generation scenarios effectively and securely. GPT4Video offers
the following benefits: 1) It exhibits impressive capabilities in both video
understanding and generation scenarios. For example, GPT4Video outperforms
Valley by 11.8\% on the Video Question Answering task, and surpasses NExt-GPT
by 2.3\% on the Text-to-Video generation task. 2) It endows the LLM/MLLM with
video generation capabilities without requiring additional training parameters
and can flexibly interface with a wide range of models to perform video
generation. 3) It maintains a safe and healthy conversation not only on the output side but also on the input side, in an end-to-end manner. Qualitative and quantitative experiments demonstrate that GPT4Video holds the potential to function as an effective, safe, and human-like video assistant that can handle both video understanding and generation scenarios.
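One way to picture the training-free generation interface is the routing sketch below: the frozen LLM emits an instruction-following response that may embed a video prompt, which is then forwarded to an off-the-shelf text-to-video generator. The `<video_prompt>` tag format and the `generate_video` callable are hypothetical, not GPT4Video's actual prompt schema.

```python
import re

# Hypothetical tag the LLM is instructed to wrap video prompts in.
VIDEO_TAG = re.compile(r"<video_prompt>(.*?)</video_prompt>", re.S)

def route_llm_output(llm_text, generate_video):
    """If the LLM response embeds a video-generation instruction, forward the
    enclosed prompt to a text-to-video model; otherwise return the text reply.
    `generate_video` is any callable wrapping a diffusion-based generator."""
    match = VIDEO_TAG.search(llm_text)
    if match is None:
        return {"type": "text", "content": llm_text}
    prompt = match.group(1).strip()
    return {"type": "video", "content": generate_video(prompt)}

# Usage with a stand-in generator:
reply = route_llm_output(
    "Sure! <video_prompt>a timelapse of clouds over mountains</video_prompt>",
    generate_video=lambda p: f"frames for: {p}",
)
print(reply["type"])
```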
arXiv:2410.21276v1
GPT-4o is an autoregressive omni model that accepts as input any combination
of text, audio, image, and video, and generates any combination of text, audio,
and image outputs. It's trained end-to-end across text, vision, and audio,
meaning all inputs and outputs are processed by the same neural network. GPT-4o
can respond to audio inputs in as little as 232 milliseconds, with an average
of 320 milliseconds, which is similar to human response time in conversation.
It matches GPT-4 Turbo performance on text in English and code, with
significant improvement on text in non-English languages, while also being much
faster and 50\% cheaper in the API. GPT-4o is notably better at vision and
audio understanding compared to existing models. In line with our commitment to
building AI safely and consistent with our voluntary commitments to the White
House, we are sharing the GPT-4o System Card, which includes our Preparedness
Framework evaluations. In this System Card, we provide a detailed look at
GPT-4o's capabilities, limitations, and safety evaluations across multiple
categories, focusing on speech-to-speech while also evaluating text and image
capabilities, and measures we've implemented to ensure the model is safe and
aligned. We also include third-party assessments on dangerous capabilities, as
well as discussion of potential societal impacts of GPT-4o's text and vision
capabilities.
Deconstructing The Ethics of Large Language Models from Long-standing
Issues to New-emerging Dilemmas: A Survey
arXiv:2406.05392v2
Large Language Models (LLMs) have achieved unparalleled success across
diverse language modeling tasks in recent years. However, this progress has
also intensified ethical concerns, impacting the deployment of LLMs in everyday
contexts. This paper provides a comprehensive survey of ethical challenges
associated with LLMs, from longstanding issues such as copyright infringement,
systematic bias, and data privacy, to emerging problems like truthfulness and
social norms. We critically analyze existing research aimed at understanding,
examining, and mitigating these ethical risks. Our survey underscores the importance of integrating ethical standards and societal values into the development of LLMs, thereby guiding the creation of responsible and ethically aligned language models.
arXiv:2410.04555v1
Data attribution methods aim to quantify the influence of individual training
samples on the prediction of artificial intelligence (AI) models. As training
data plays an increasingly crucial role in the modern development of
large-scale AI models, data attribution has found broad applications in
improving AI performance and safety. However, despite a recent surge of newly developed data attribution methods, there is still no comprehensive library that facilitates the development, benchmarking, and deployment of different data attribution methods. In this work, we introduce
dattri, an open-source data attribution library that addresses the
above needs. Specifically, dattri highlights three novel design
features. Firstly, dattri proposes a unified and easy-to-use API,
allowing users to integrate different data attribution methods into their
PyTorch-based machine learning pipeline with a few lines of code changed.
Secondly, dattri modularizes low-level utility functions that are
commonly used in data attribution methods, such as Hessian-vector product,
inverse-Hessian-vector product or random projection, making it easier for
researchers to develop new data attribution methods. Thirdly, dattri
provides a comprehensive benchmark framework with pre-trained models and ground
truth annotations for a variety of benchmark settings, including generative AI
settings. We have implemented a variety of state-of-the-art efficient data
attribution methods that can be applied to large-scale neural network models,
and will continuously update the library in the future. Using the developed
dattri library, we are able to perform a comprehensive and fair
benchmark analysis across a wide range of data attribution methods. The source
code of dattri is available at https://github.com/TRAIS-Lab/dattri.
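As an illustration of the kind of low-level utility dattri modularizes, the sketch below computes a Hessian-vector product with plain PyTorch double backpropagation; it is not dattri's own API, whose function names and signatures may differ.

```python
import torch

def hessian_vector_product(loss_fn, params, vec):
    """Hessian-vector product H @ vec via double backprop -- the kind of
    low-level primitive that influence-function-style data attribution
    methods rely on. Plain PyTorch autograd, shown for a single tensor."""
    loss = loss_fn(params)
    grad = torch.autograd.grad(loss, params, create_graph=True)[0]
    return torch.autograd.grad(grad, params, grad_outputs=vec)[0]

# Toy check: f(w) = sum(w**3) has Hessian diag(6w), so H @ v = 6 * w * v.
w = torch.tensor([1.0, 2.0], requires_grad=True)
v = torch.tensor([1.0, 1.0])
print(hessian_vector_product(lambda p: (p ** 3).sum(), w, v))  # tensor([ 6., 12.])
```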
Toward a Holistic Evaluation of Robustness in CLIP Models
17 pages, 10 figures, extension of NeurIPS'23 work: A Closer Look at
the Robustness of Contrastive...
Contrastive Language-Image Pre-training (CLIP) models have shown significant
potential, particularly in zero-shot classification across diverse distribution
shifts. Building on existing evaluations of overall classification robustness,
this work aims to provide a more comprehensive assessment of CLIP by
introducing several new perspectives. First, we investigate their robustness to
variations in specific visual factors. Second, we assess two critical safety
objectives--confidence uncertainty and out-of-distribution detection--beyond
mere classification accuracy. Third, we evaluate the finesse with which CLIP
models bridge the image and text modalities. Fourth, we extend our examination
to 3D awareness in CLIP models, moving beyond traditional 2D image
understanding. Finally, we explore the interaction between vision and language
encoders within modern large multimodal models (LMMs) that utilize CLIP as the
visual backbone, focusing on how this interaction impacts classification
robustness. In each aspect, we consider the impact of six factors on CLIP
models: model architecture, training distribution, training set size,
fine-tuning, contrastive loss, and test-time prompts. Our study uncovers
several previously unknown insights into CLIP. For instance, the architecture of the visual encoder in CLIP plays a significant role in its robustness against 3D corruption. CLIP models tend to exhibit a bias towards shape when
making predictions. Moreover, this bias tends to diminish after fine-tuning on
ImageNet. Vision-language models like LLaVA, leveraging the CLIP vision
encoder, could exhibit benefits in classification performance for challenging
categories over CLIP alone. Our findings are poised to offer valuable guidance
for enhancing the robustness and reliability of CLIP models.
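For context on the zero-shot setting being stress-tested, the sketch below runs CLIP zero-shot classification with the reference OpenAI `clip` package; the model variant, image path, and prompts are illustrative placeholders.

```python
import torch
import clip  # reference OpenAI package: pip install git+https://github.com/openai/CLIP.git
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)  # one of several architectures compared

image = preprocess(Image.open("example.jpg")).unsqueeze(0).to(device)  # placeholder image
prompts = ["a photo of a dog", "a photo of a cat"]                      # test-time prompts under study
text = clip.tokenize(prompts).to(device)

with torch.no_grad():
    image_feat = model.encode_image(image)
    text_feat = model.encode_text(text)
    image_feat = image_feat / image_feat.norm(dim=-1, keepdim=True)
    text_feat = text_feat / text_feat.norm(dim=-1, keepdim=True)
    # Cosine similarities turned into zero-shot class probabilities.
    probs = (100.0 * image_feat @ text_feat.T).softmax(dim=-1)

print(dict(zip(prompts, probs[0].tolist())))
```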
Marginal Debiased Network for Fair Visual Recognition
arXiv:2401.02150v2
Deep neural networks (DNNs) are often prone to learning spurious correlations between target classes and bias attributes, such as gender and race, inherent in a major portion of the training data (bias-aligned samples), thus exhibiting unfair behavior and raising controversy in a modern pluralistic and egalitarian society. In this paper, we propose a novel marginal debiased
network (MDN) to learn debiased representations. More specifically, a marginal
softmax loss (MSL) is designed by introducing the idea of margin penalty into
the fairness problem, which assigns a larger margin for bias-conflicting
samples (data without spurious correlations) than for bias-aligned ones, so as
to deemphasize the spurious correlations and improve generalization on unbiased
test criteria. To determine the margins, our MDN is optimized through a meta-learning framework. We propose a meta equalized loss (MEL) to measure model fairness and adaptively update the margin parameters via meta-optimization, which requires that the model trained under the optimal margins minimize the MEL computed on an unbiased meta-validation set.
Extensive experiments on the BiasedMNIST, Corrupted CIFAR-10, CelebA, and UTK-Face datasets demonstrate that our MDN achieves remarkable performance on under-represented samples and obtains superior debiasing results compared with previous approaches.
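A minimal sketch of the margin-penalty idea is shown below: subtracting a per-sample margin from the target logit before the softmax forces a larger separation, with a bigger margin assigned to bias-conflicting samples. The margin values and exact form are illustrative assumptions, not the paper's MSL definition.

```python
import torch
import torch.nn.functional as F

def marginal_softmax_loss(logits, targets, bias_conflicting,
                          m_conflict=0.5, m_aligned=0.1):
    """Cross-entropy with a per-sample margin subtracted from the target logit.
    Bias-conflicting samples receive a larger margin so the model must separate
    them more confidently (placeholder margin values)."""
    margins = torch.where(bias_conflicting,
                          torch.full_like(targets, m_conflict, dtype=logits.dtype),
                          torch.full_like(targets, m_aligned, dtype=logits.dtype))
    adjusted = logits.clone()
    adjusted[torch.arange(len(targets)), targets] -= margins
    return F.cross_entropy(adjusted, targets)

# Toy batch: 2 samples, 3 classes; the second sample is bias-conflicting.
logits = torch.randn(2, 3)
loss = marginal_softmax_loss(logits, torch.tensor([0, 2]),
                             torch.tensor([False, True]))
print(loss.item())
```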