Compressing high-capability Large Language Models (LLMs) has emerged as a
favored strategy for resource-efficient inferences. While state-of-the-art
(SoTA) compression methods boast impressive advancements in preserving benign
task performance, the potential risks of compression in terms of safety and
trustworthiness have been largely neglected. This study conducts the first,
thorough evaluation of three (3) leading LLMs using five (5) SoTA compression
techniques across eight (8) trustworthiness dimensions. Our experiments
highlight the intricate interplay between compression and trustworthiness,
revealing some interesting patterns. We find that quantization is currently a
more effective approach than pruning in achieving efficiency and
trustworthiness simultaneously. For instance, a 4-bit quantized model retains
the trustworthiness of its original counterpart, but model pruning
significantly degrades trustworthiness, even at 50% sparsity. Moreover,
employing quantization within a moderate bit range could unexpectedly improve
certain trustworthiness dimensions such as ethics and fairness. Conversely,
extreme quantization to very low bit levels (3 bits) tends to reduce
trustworthiness significantly. This increased risk cannot be uncovered by
looking at benign performance alone, in turn, mandating comprehensive
trustworthiness evaluation in practice. These findings culminate in practical
recommendations for simultaneously achieving high utility, efficiency, and
trustworthiness in LLMs. Code and models are available at
https://decoding-comp-trust.github.io.Abstract
DecodingTrust: A Comprehensive Assessment of Trustworthiness in GPT
Models
NeurIPS 2023 Outstanding Paper (Datasets and Benchmarks Track)
Generative Pre-trained Transformer (GPT) models have exhibited exciting
progress in their capabilities, capturing the interest of practitioners and the
public alike. Yet, while the literature on the trustworthiness of GPT models
remains limited, practitioners have proposed employing capable GPT models for
sensitive applications such as healthcare and finance -- where mistakes can be
costly. To this end, this work proposes a comprehensive trustworthiness
evaluation for large language models with a focus on GPT-4 and GPT-3.5,
considering diverse perspectives -- including toxicity, stereotype bias,
adversarial robustness, out-of-distribution robustness, robustness on
adversarial demonstrations, privacy, machine ethics, and fairness. Based on our
evaluations, we discover previously unpublished vulnerabilities to
trustworthiness threats. For instance, we find that GPT models can be easily
misled to generate toxic and biased outputs and leak private information in
both training data and conversation history. We also find that although GPT-4
is usually more trustworthy than GPT-3.5 on standard benchmarks, GPT-4 is more
vulnerable given jailbreaking system or user prompts, potentially because GPT-4
follows (misleading) instructions more precisely. Our work illustrates a
comprehensive trustworthiness evaluation of GPT models and sheds light on the
trustworthiness gaps. Our benchmark is publicly available at
https://decodingtrust.github.io/ ; our dataset can be previewed at
https://huggingface.co/datasets/AI-Secure/DecodingTrust ; a concise version of
this work is at https://openreview.net/pdf?id=kaHpo8OZw2 .Abstract
Ethics and Governance of Artificial Intelligence: Evidence from a Survey
of Machine Learning Researchers
arXiv:2105.02117v1 »Full PDF »Machine learning (ML) and artificial intelligence (AI) researchers play an
important role in the ethics and governance of AI, including taking action
against what they perceive to be unethical uses of AI (Belfield, 2020; Van
Noorden, 2020). Nevertheless, this influential group's attitudes are not well
understood, which undermines our ability to discern consensuses or
disagreements between AI/ML researchers. To examine these researchers' views,
we conducted a survey of those who published in the top AI/ML conferences (N =
524). We compare these results with those from a 2016 survey of AI/ML
researchers (Grace, Salvatier, Dafoe, Zhang, & Evans, 2018) and a 2018 survey
of the US public (Zhang & Dafoe, 2020). We find that AI/ML researchers place
high levels of trust in international organizations and scientific
organizations to shape the development and use of AI in the public interest;
moderate trust in most Western tech companies; and low trust in national
militaries, Chinese tech companies, and Facebook. While the respondents were
overwhelmingly opposed to AI/ML researchers working on lethal autonomous
weapons, they are less opposed to researchers working on other military
applications of AI, particularly logistics algorithms. A strong majority of
respondents think that AI safety research should be prioritized and that ML
institutions should conduct pre-publication review to assess potential harms.
Being closer to the technology itself, AI/ML re-searchers are well placed to
highlight new risks and develop technical solutions, so this novel attempt to
measure their attitudes has broad relevance. The findings should help to
improve how researchers, private sector executives, and policymakers think
about regulations, governance frameworks, guiding principles, and national and
international governance strategies for AI.Abstract
Accurate real time crime prediction is a fundamental issue for public safety,
but remains a challenging problem for the scientific community. Crime
occurrences depend on many complex factors. Compared to many predictable
events, crime is sparse. At different spatio-temporal scales, crime
distributions display dramatically different patterns. These distributions are
of very low regularity in both space and time. In this work, we adapt the
state-of-the-art deep learning spatio-temporal predictor, ST-ResNet [Zhang et
al, AAAI, 2017], to collectively predict crime distribution over the Los
Angeles area. Our models are two staged. First, we preprocess the raw crime
data. This includes regularization in both space and time to enhance
predictable signals. Second, we adapt hierarchical structures of residual
convolutional units to train multi-factor crime prediction models. Experiments
over a half year period in Los Angeles reveal highly accurate predictive power
of our models.Abstract
Hire Me or Not? Examining Language Model's Behavior with Occupation
Attributes
With the impressive performance in various downstream tasks, large language
models (LLMs) have been widely integrated into production pipelines, like
recruitment and recommendation systems. A known issue of models trained on
natural language data is the presence of human biases, which can impact the
fairness of the system. This paper investigates LLMs' behavior with respect to
gender stereotypes, in the context of occupation decision making. Our framework
is designed to investigate and quantify the presence of gender stereotypes in
LLMs' behavior via multi-round question answering. Inspired by prior works, we
construct a dataset by leveraging a standard occupation classification
knowledge base released by authoritative agencies. We tested three LLMs
(RoBERTa-large, GPT-3.5-turbo, and Llama2-70b-chat) and found that all models
exhibit gender stereotypes analogous to human biases, but with different
preferences. The distinct preferences of GPT-3.5-turbo and Llama2-70b-chat may
imply the current alignment methods are insufficient for debiasing and could
introduce new biases contradicting the traditional gender stereotypes.Abstract
Online Mirror Descent for Tchebycheff Scalarization in Multi-Objective
Optimization
The goal of multi-objective optimization (MOO) is to learn under multiple,
potentially conflicting, objectives. One widely used technique to tackle MOO is
through linear scalarization, where one fixed preference vector is used to
combine the objectives into a single scalar value for optimization. However,
recent work (Hu et al., 2024) has shown linear scalarization often fails to
capture the non-convex regions of the Pareto Front, failing to recover the
complete set of Pareto optimal solutions. In light of the above limitations,
this paper focuses on Tchebycheff scalarization that optimizes for the
worst-case objective. In particular, we propose an online mirror descent
algorithm for Tchebycheff scalarization, which we call OMD-TCH. We show that
OMD-TCH enjoys a convergence rate of O(√logm/T) where m is the
number of objectives and T is the number of iteration rounds. We also propose
a novel adaptive online-to-batch conversion scheme that significantly improves
the practical performance of OMD-TCH while maintaining the same convergence
guarantees. We demonstrate the effectiveness of OMD-TCH and the adaptive
conversion scheme on both synthetic problems and federated learning tasks under
fairness constraints, showing state-of-the-art performance.Abstract
SIESEF-FusionNet: Spatial Inter-correlation Enhancement and
Spatially-Embedded Feature Fusion Network for LiDAR Point Cloud Semantic
Segmentation
The ambiguity at the boundaries of different semantic classes in point cloud
semantic segmentation often leads to incorrect decisions in intelligent
perception systems, such as autonomous driving. Hence, accurate delineation of
the boundaries is crucial for improving safety in autonomous driving. A novel
spatial inter-correlation enhancement and spatially-embedded feature fusion
network (SIESEF-FusionNet) is proposed in this paper, enhancing spatial
inter-correlation by combining inverse distance weighting and angular
compensation to extract more beneficial spatial information without causing
redundancy. Meanwhile, a new spatial adaptive pooling module is also designed,
embedding enhanced spatial information into semantic features for strengthening
the context-awareness of semantic features. Experimental results demonstrate
that 83.7% mIoU and 97.8% OA are achieved by SIESEF-FusionNet on the Toronto3D
dataset, with performance superior to other baseline methods. A value of 61.1%
mIoU is reached on the semanticKITTI dataset, where a marked improvement in
segmentation performance is observed. In addition, the effectiveness and
plug-and-play capability of the proposed modules are further verified through
ablation studies.Abstract
Revisiting, Benchmarking and Understanding Unsupervised Graph Domain
Adaptation
Unsupervised Graph Domain Adaptation (UGDA) involves the transfer of
knowledge from a label-rich source graph to an unlabeled target graph under
domain discrepancies. Despite the proliferation of methods designed for this
emerging task, the lack of standard experimental settings and fair performance
comparisons makes it challenging to understand which and when models perform
well across different scenarios. To fill this gap, we present the first
comprehensive benchmark for unsupervised graph domain adaptation named
GDABench, which encompasses 16 algorithms across 5 datasets with 74 adaptation
tasks. Through extensive experiments, we observe that the performance of
current UGDA models varies significantly across different datasets and
adaptation scenarios. Specifically, we recognize that when the source and
target graphs face significant distribution shifts, it is imperative to
formulate strategies to effectively address and mitigate graph structural
shifts. We also find that with appropriate neighbourhood aggregation
mechanisms, simple GNN variants can even surpass state-of-the-art UGDA
baselines. To facilitate reproducibility, we have developed an easy-to-use
library PyGDA for training and evaluating existing UGDA methods, providing a
standardized platform in this community. Our source codes and datasets can be
found at: https://github.com/pygda-team/pygda.Abstract
LongSafetyBench: Long-Context LLMs Struggle with Safety Issues
arXiv:2411.06899v1 »Full PDF »With the development of large language models (LLMs), the sequence length of
these models continues to increase, drawing significant attention to
long-context language models. However, the evaluation of these models has been
primarily limited to their capabilities, with a lack of research focusing on
their safety. Existing work, such as ManyShotJailbreak, has to some extent
demonstrated that long-context language models can exhibit safety concerns.
However, the methods used are limited and lack comprehensiveness. In response,
we introduce \textbf{LongSafetyBench}, the first benchmark designed to
objectively and comprehensively evaluate the safety of long-context models.
LongSafetyBench consists of 10 task categories, with an average length of
41,889 words. After testing eight long-context language models on
LongSafetyBench, we found that existing models generally exhibit insufficient
safety capabilities. The proportion of safe responses from most mainstream
long-context LLMs is below 50\%. Moreover, models' safety performance in
long-context scenarios does not always align with that in short-context
scenarios. Further investigation revealed that long-context models tend to
overlook harmful content within lengthy texts. We also proposed a simple yet
effective solution, allowing open-source models to achieve performance
comparable to that of top-tier closed-source models. We believe that
LongSafetyBench can serve as a valuable benchmark for evaluating the safety
capabilities of long-context language models. We hope that our work will
encourage the broader community to pay attention to the safety of long-context
models and contribute to the development of solutions to improve the safety of
long-context LLMs.Abstract
Federated Learning (FL) employs a training approach to address scenarios
where users' data cannot be shared across clients. Achieving fairness in FL is
imperative since training data in FL is inherently geographically distributed
among diverse user groups. Existing research on fairness predominantly assumes
access to the entire training data, making direct transfer to FL challenging.
However, the limited existing research on fairness in FL does not effectively
address two key challenges, i.e., (CH1) Current methods fail to deal with the
inconsistency between fair optimization results obtained with surrogate
functions and fair classification results. (CH2) Directly aggregating local
fair models does not always yield a globally fair model due to non Identical
and Independent data Distributions (non-IID) among clients. To address these
challenges, we propose a Wasserstein Fair Federated Learning framework, namely
WassFFed. To tackle CH1, we ensure that the outputs of local models, rather
than the loss calculated with surrogate functions or classification results
with a threshold, remain independent of various user groups. To resolve CH2, we
employ a Wasserstein barycenter calculation of all local models' outputs for
each user group, bringing local model outputs closer to the global output
distribution to ensure consistency between the global model and local models.
We conduct extensive experiments on three real-world datasets, demonstrating
that WassFFed outperforms existing approaches in striking a balance between
accuracy and fairness.Abstract