arXiv:2411.03814v1
Large Language Models (LLMs) demonstrate outstanding knowledge and understanding capabilities, but they have also been shown to be prone to producing illegal or unethical responses when subjected to jailbreak
attacks. To ensure their responsible deployment in critical applications, it is
crucial to understand the safety capabilities and vulnerabilities of LLMs.
Previous works mainly focus on jailbreaks in single-round dialogue, overlooking the potential jailbreak risks in multi-round dialogues, a vital way in which humans interact with and extract information from LLMs. Some studies have begun to concentrate on the risks associated with jailbreaks in multi-round dialogues; these efforts typically rely on manually crafted templates or prompt engineering techniques, but due to the inherent complexity of multi-round dialogues, their jailbreak performance is limited. To
solve this problem, we propose a novel multi-round dialogue jailbreaking agent,
emphasizing the importance of stealthiness in identifying and mitigating
potential threats to human values posed by LLMs. We propose a risk
decomposition strategy that distributes risks across multiple rounds of queries
and utilizes psychological strategies to enhance attack strength. Extensive
experiments show that our proposed method surpasses other attack methods and achieves a state-of-the-art attack success rate. We will make the corresponding code and dataset available for future research.
PanGu-Bot: Efficient Generative Dialogue Pre-training from Pre-trained
Language Model
In this paper, we introduce PanGu-Bot, a Chinese pre-trained open-domain
dialogue generation model based on a large pre-trained language model (PLM)
PANGU-alpha (Zeng et al., 2021). Different from other pre-trained dialogue
models trained over a massive amount of dialogue data from scratch, we aim to
build a powerful dialogue model with relatively fewer data and computation
costs by inheriting valuable language capabilities and knowledge from PLMs. To
this end, we train PanGu-Bot from the large PLM PANGU-alpha, which has been shown to perform well on a variety of Chinese natural language tasks. We
investigate different aspects of responses generated by PanGu-Bot, including
response quality, knowledge, and safety. We show that PanGu-Bot outperforms
state-of-the-art Chinese dialogue systems (CDIALGPT (Wang et al., 2020), EVA
(Zhou et al., 2021), EVA2.0 (Gu et al., 2022)) w.r.t. the above three aspects.
We also demonstrate that PanGu-Bot can be easily deployed to generate emotional
responses without further training. Throughout our empirical analysis, we also
point out that the PanGu-Bot response quality, knowledge correctness, and
safety are still far from perfect, and further explorations are indispensable
to building reliable and smart dialogue systems. Our model and code will be
available at
https://github.com/huawei-noah/Pretrained-Language-Model/tree/master/PanGu-Bot
soon.
CDR: Customizable Density Ratios of Strong-over-weak LLMs for Preference
Annotation
arXiv:2411.02481v2
Preference tuning of large language models (LLMs) relies on high-quality
human preference data, which is often expensive and time-consuming to gather.
While existing methods can use trained reward models or proprietary models as judges for preference annotation, they have notable drawbacks: training reward models remains dependent on initial human data, and using proprietary models imposes license restrictions that inhibit commercial use. In this paper, we
introduce customized density ratio (CDR), a training-free and highly effective
method that leverages off-the-shelf LLMs for preference data annotation. Our
approach uses the log-density ratio between a better-aligned LLM and a less-aligned LLM as a reward signal. We explore 221 different LLM pairs and
empirically demonstrate that increasing the performance gap between paired LLMs
correlates with better reward generalization. Furthermore, we show that
tailoring the density ratio reward function with specific criteria and
preference exemplars enhances performance across domains and within target
areas.
In our experiment using the density ratio from a pair of Mistral-7B models, CDR achieves a RewardBench score of 82.6, outperforming the best trained reward functions from the same model class and demonstrating competitive performance
against SoTA models in Safety (91.0) and Reasoning (88.0) domains. We use CDR
to annotate an on-policy preference dataset with which we preference tune
Llama-3-8B-Instruct with SimPO. Using reward signals from two relatively weak
models, our approach pushes Llama-3-8B to achieve a 37.4% (+15.1%) win rate on
ArenaHard and a 40.7% (+17.8%) win rate on Length-Controlled AlpacaEval 2.0,
along with a score of 8.0 on MT-Bench.
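To make the density-ratio reward concrete, the sketch below scores a response by the difference in log-likelihood under a better-aligned and a less-aligned model; the model choices and the helper function are illustrative assumptions, not the authors' released code.

```python
# Minimal sketch of a density-ratio reward: reward(x, y) = log p_strong(y|x) - log p_weak(y|x).
# Model names and this helper are illustrative assumptions, not the CDR release.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def response_logprob(model, tokenizer, prompt, response):
    """Sum of token log-probabilities of `response` conditioned on `prompt`."""
    prompt_len = tokenizer(prompt, return_tensors="pt").input_ids.shape[1]
    full_ids = tokenizer(prompt + response, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(full_ids).logits
    log_probs = torch.log_softmax(logits[:, :-1], dim=-1)
    token_lp = log_probs.gather(-1, full_ids[:, 1:].unsqueeze(-1)).squeeze(-1)
    # Keep only the response tokens (boundary tokenization effects ignored in this sketch).
    return token_lp[:, prompt_len - 1:].sum().item()

# Illustrative pairing of a better-aligned model with a less-aligned one.
tok = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.2")
strong = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-Instruct-v0.2")
weak = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1")

def density_ratio_reward(prompt, response):
    return (response_logprob(strong, tok, prompt, response)
            - response_logprob(weak, tok, prompt, response))
```

The preference label for a prompt is then simply whichever candidate response receives the higher reward.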
BehaviorGPT: Smart Agent Simulation for Autonomous Driving with
Next-Patch Prediction
Simulating realistic behaviors of traffic agents is pivotal for efficiently
validating the safety of autonomous driving systems. Existing data-driven
simulators primarily use an encoder-decoder architecture to encode the
historical trajectories before decoding the future. However, the heterogeneity
between encoders and decoders complicates the models, and the manual separation
of historical and future trajectories leads to low data utilization. Given
these limitations, we propose BehaviorGPT, a homogeneous and fully
autoregressive Transformer designed to simulate the sequential behavior of
multiple agents. Crucially, our approach discards the traditional separation
between "history" and "future" by modeling each time step as the "current" one
for motion generation, leading to a simpler, more parameter- and data-efficient
agent simulator. We further introduce the Next-Patch Prediction Paradigm (NP3)
to mitigate the negative effects of autoregressive modeling, in which models
are trained to reason at the patch level of trajectories and capture long-range
spatial-temporal interactions. Despite having merely 3M model parameters,
BehaviorGPT won first place in the 2024 Waymo Open Sim Agents Challenge with a
realism score of 0.7473 and a minADE score of 1.4147, demonstrating its
exceptional performance in traffic agent simulation.
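As a rough illustration of patch-level autoregression over trajectories (shapes, patch size, and the plain Transformer encoder are assumptions; this is not the BehaviorGPT architecture), one can group consecutive timesteps into patches and train a causal model to predict each next patch:

```python
# Illustrative next-patch prediction over agent trajectories (assumed shapes and sizes;
# not the BehaviorGPT implementation). Positional embeddings are omitted for brevity.
import torch
import torch.nn as nn

class NextPatchPredictor(nn.Module):
    def __init__(self, state_dim=4, patch_len=5, d_model=128, n_layers=4, n_heads=4):
        super().__init__()
        self.patch_len = patch_len
        self.embed = nn.Linear(state_dim * patch_len, d_model)   # one token per patch
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, state_dim * patch_len)    # regress the next patch

    def forward(self, traj):
        # traj: [batch, T, state_dim], with T divisible by patch_len
        B, T, D = traj.shape
        patches = traj.reshape(B, T // self.patch_len, self.patch_len * D)
        n = patches.size(1)
        causal_mask = torch.triu(torch.full((n, n), float("-inf")), diagonal=1)
        h = self.backbone(self.embed(patches), mask=causal_mask)  # patches attend only to the past
        return self.head(h).reshape(B, n, self.patch_len, D)

model = NextPatchPredictor()
traj = torch.randn(2, 40, 4)                        # e.g., (x, y, heading, speed) per step
pred = model(traj)
target = traj.reshape(2, 8, 5, 4)
loss = nn.functional.mse_loss(pred[:, :-1], target[:, 1:])  # each patch predicts the next
```

Because the whole trajectory is modeled autoregressively, no manual split into "history" and "future" segments is needed during training, which is the data-efficiency argument made above.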
SIESEF-FusionNet: Spatial Inter-correlation Enhancement and
Spatially-Embedded Feature Fusion Network for LiDAR Point Cloud Semantic
Segmentation
The ambiguity at the boundaries of different semantic classes in point cloud
semantic segmentation often leads to incorrect decisions in intelligent
perception systems, such as autonomous driving. Hence, accurate delineation of
the boundaries is crucial for improving safety in autonomous driving. A novel
spatial inter-correlation enhancement and spatially-embedded feature fusion
network (SIESEF-FusionNet) is proposed in this paper, enhancing spatial
inter-correlation by combining inverse distance weighting and angular
compensation to extract more beneficial spatial information without causing
redundancy. Meanwhile, a new spatial adaptive pooling module is also designed,
embedding enhanced spatial information into semantic features for strengthening
the context-awareness of semantic features. Experimental results demonstrate
that 83.7% mIoU and 97.8% OA are achieved by SIESEF-FusionNet on the Toronto3D
dataset, with performance superior to other baseline methods. A value of 61.1% mIoU is reached on the SemanticKITTI dataset, where a marked improvement in
segmentation performance is observed. In addition, the effectiveness and
plug-and-play capability of the proposed modules are further verified through
ablation studies.
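For intuition only, inverse-distance weighting of neighbor features can be sketched as below; the neighborhood size is an assumption, and the angular compensation and spatially-embedded fusion described above are omitted.

```python
# Illustrative inverse-distance-weighted neighbor aggregation for point-cloud features
# (a simplified sketch, not the SIESEF-FusionNet module; k is an assumed neighborhood size).
import torch

def idw_aggregate(points, features, k=16, eps=1e-8):
    """points: [N, 3] xyz coordinates; features: [N, C]. Returns [N, C]."""
    dists = torch.cdist(points, points)                      # [N, N] pairwise distances
    knn_dists, knn_idx = dists.topk(k + 1, dim=1, largest=False)
    knn_dists, knn_idx = knn_dists[:, 1:], knn_idx[:, 1:]    # drop the self-match
    weights = 1.0 / (knn_dists + eps)                        # closer neighbors weigh more
    weights = weights / weights.sum(dim=1, keepdim=True)
    neighbor_feats = features[knn_idx]                       # [N, k, C]
    return (weights.unsqueeze(-1) * neighbor_feats).sum(dim=1)

pts, feats = torch.randn(1024, 3), torch.randn(1024, 64)
enhanced = idw_aggregate(pts, feats)                         # [1024, 64]
```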
LongSafetyBench: Long-Context LLMs Struggle with Safety Issues
arXiv:2411.06899v1
With the development of large language models (LLMs), the sequence length of
these models continues to increase, drawing significant attention to
long-context language models. However, the evaluation of these models has been
primarily limited to their capabilities, with a lack of research focusing on
their safety. Existing work, such as ManyShotJailbreak, has to some extent
demonstrated that long-context language models can exhibit safety concerns.
However, the methods used are limited and lack comprehensiveness. In response,
we introduce \textbf{LongSafetyBench}, the first benchmark designed to
objectively and comprehensively evaluate the safety of long-context models.
LongSafetyBench consists of 10 task categories, with an average length of
41,889 words. After testing eight long-context language models on
LongSafetyBench, we found that existing models generally exhibit insufficient
safety capabilities. The proportion of safe responses from most mainstream
long-context LLMs is below 50\%. Moreover, models' safety performance in
long-context scenarios does not always align with that in short-context
scenarios. Further investigation revealed that long-context models tend to
overlook harmful content within lengthy texts. We also propose a simple yet
effective solution, allowing open-source models to achieve performance
comparable to that of top-tier closed-source models. We believe that
LongSafetyBench can serve as a valuable benchmark for evaluating the safety
capabilities of long-context language models. We hope that our work will
encourage the broader community to pay attention to the safety of long-context
models and contribute to the development of solutions to improve the safety of
long-context LLMs.
The Multiple Dimensions of Spuriousness in Machine Learning
arXiv:2411.04696v2
Learning correlations from data forms the foundation of today's machine
learning (ML) and artificial intelligence (AI) research. While such an approach
enables the automatic discovery of patterned relationships within big data
corpora, it is susceptible to failure modes when unintended correlations are
captured. This vulnerability has expanded interest in interrogating
spuriousness, often critiqued as an impediment to model performance, fairness,
and robustness. In this article, we trace deviations from the conventional definition of statistical spuriousness (which denotes a non-causal observation arising from either coincidence or confounding variables) to articulate how ML
researchers make sense of spuriousness in practice. Drawing on a broad survey
of ML literature, we conceptualize the "multiple dimensions of spuriousness,"
encompassing: relevance ("Models should only use correlations that are relevant
to the task."), generalizability ("Models should only use correlations that
generalize to unseen data"), human-likeness ("Models should only use
correlations that a human would use to perform the same task"), and harmfulness
("Models should only use correlations that are not harmful"). These dimensions
demonstrate that ML spuriousness goes beyond the causal/non-causal dichotomy
and that the disparate interpretative paths researchers choose could
meaningfully influence the trajectory of ML development. By underscoring how a
fundamental problem in ML is contingently negotiated in research contexts, we
contribute to ongoing debates about responsible practices in AI development.
RoCar: A Relationship Network-based Evaluation Method for Large Language
Models
arXiv:2307.15997v2
Large language models (LLMs) have received increasing attention. However, due to the complexity of their capabilities, how to rationally evaluate the capabilities of LLMs remains an open problem. We propose the RoCar method,
which utilizes the defined basic schemas to randomly construct a task graph and
generates natural language evaluation tasks based on the task graph to evaluate
the reasoning and memory abilities of LLMs, respectively. Due to the high randomness of the task construction process, it is possible to ensure that none of the LLMs to be tested has directly learned the evaluation tasks, guaranteeing the fairness of the evaluation method.
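A minimal sketch of the idea of sampling a random relationship graph and turning it into a natural-language task is given below; the entities, relation names, and question template are assumptions for illustration, not the RoCar schemas.

```python
# Illustrative random relationship graph and a memory-style evaluation question
# (entities, relations, and templates are assumed; not the RoCar schemas).
import random

PEOPLE = ["Alice", "Bob", "Carol", "Dave", "Eve", "Frank"]
RELATIONS = ["is the parent of", "is a colleague of", "is a neighbor of"]

def build_task_graph(n_edges=5, seed=None):
    rng = random.Random(seed)
    edges = set()
    while len(edges) < n_edges:
        a, b = rng.sample(PEOPLE, 2)
        edges.add((a, rng.choice(RELATIONS), b))
    return sorted(edges)

def to_evaluation_task(edges, seed=None):
    rng = random.Random(seed)
    facts = ". ".join(f"{a} {rel} {b}" for a, rel, b in edges) + "."
    a, rel, b = rng.choice(edges)             # query one stated fact (memory ability)
    return f"{facts}\nQuestion: Who {rel} {b}? Answer: {a}"

print(to_evaluation_task(build_task_graph(seed=0), seed=1))
```

Because the facts are freshly sampled each time, the resulting task is very unlikely to appear verbatim in any model's training data, which is the fairness argument made above.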
A Survey on Large Language Models for Code Generation
arXiv:2406.00515v2
Large Language Models (LLMs) have achieved remarkable advancements across diverse code-related tasks, known as Code LLMs, particularly in code generation, which produces source code from natural language descriptions. This
burgeoning field has captured significant interest from both academic
researchers and industry professionals due to its practical significance in
software development, e.g., GitHub Copilot. Despite the active exploration of
LLMs for a variety of code tasks, either from the perspective of natural
language processing (NLP) or software engineering (SE) or both, there is a
noticeable absence of a comprehensive and up-to-date literature review
dedicated to LLMs for code generation. In this survey, we aim to bridge this gap
by providing a systematic literature review that serves as a valuable reference
for researchers investigating the cutting-edge progress in LLMs for code
generation. We introduce a taxonomy to categorize and discuss the recent
developments in LLMs for code generation, covering aspects such as data
curation, latest advances, performance evaluation, ethical implications,
environmental impact, and real-world applications. In addition, we present a
historical overview of the evolution of LLMs for code generation and offer an
empirical comparison using the HumanEval, MBPP, and BigCodeBench benchmarks
across various levels of difficulty and types of programming tasks to highlight
the progressive enhancements in LLM capabilities for code generation. We
identify critical challenges and promising opportunities regarding the gap
between academia and practical development. Furthermore, we have established a
dedicated resource GitHub page (https://github.com/juyongjiang/CodeLLMSurvey)
to continuously document and disseminate the most recent advances in the field.
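Comparisons on HumanEval-style benchmarks are usually reported with the unbiased pass@k estimator; the snippet below shows that standard formula with purely illustrative numbers.

```python
# Standard unbiased pass@k estimator used with HumanEval/MBPP-style benchmarks.
# n = samples generated per problem, c = samples that pass the unit tests, k = budget.
import numpy as np

def pass_at_k(n, c, k):
    """Estimate of P(at least one of k drawn samples is correct) = 1 - C(n-c, k) / C(n, k)."""
    if n - c < k:
        return 1.0
    return 1.0 - np.prod(1.0 - k / np.arange(n - c + 1, n + 1))

# Illustrative numbers only: 200 samples per problem, per-problem pass counts.
per_problem_correct = [12, 0, 57, 3]
print(np.mean([pass_at_k(200, c, k=1) for c in per_problem_correct]))
```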
Variational Imbalanced Regression: Fair Uncertainty Quantification via
Probabilistic Smoothing
Existing regression models tend to fall short in both accuracy and
uncertainty estimation when the label distribution is imbalanced. In this
paper, we propose a probabilistic deep learning model, dubbed variational
imbalanced regression (VIR), which not only performs well in imbalanced
regression but naturally produces reasonable uncertainty estimation as a
byproduct. Different from typical variational autoencoders assuming I.I.D.
representations (a data point's representation is not directly affected by
other data points), our VIR borrows data with similar regression labels to
compute the latent representation's variational distribution; furthermore,
different from deterministic regression models producing point estimates, VIR
predicts the entire normal-inverse-gamma distributions and modulates the
associated conjugate distributions to impose probabilistic reweighting on the
imbalanced data, thereby providing better uncertainty estimation. Experiments on several real-world datasets show that our VIR can outperform
state-of-the-art imbalanced regression models in terms of both accuracy and
uncertainty estimation. Code will soon be available at
https://github.com/Wang-ML-Lab/variational-imbalanced-regression.
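To make the normal-inverse-gamma output concrete, the sketch below shows a generic evidential-style regression head that predicts the four NIG parameters and derives the mean plus aleatoric and epistemic uncertainty; the parameterization is an assumption, not the released VIR code.

```python
# Sketch of a normal-inverse-gamma (NIG) regression head and its uncertainty decomposition
# (generic evidential-regression-style parameterization; not the VIR implementation).
import torch
import torch.nn as nn
import torch.nn.functional as F

class NIGHead(nn.Module):
    def __init__(self, in_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(), nn.Linear(hidden, 4))

    def forward(self, z):
        gamma, log_nu, log_alpha, log_beta = self.net(z).unbind(dim=-1)
        nu = F.softplus(log_nu)                  # > 0
        alpha = F.softplus(log_alpha) + 1.0      # > 1 so the variances below are finite
        beta = F.softplus(log_beta)              # > 0
        mean = gamma                             # predicted label
        aleatoric = beta / (alpha - 1.0)         # expected data noise
        epistemic = beta / (nu * (alpha - 1.0))  # uncertainty about the mean itself
        return mean, aleatoric, epistemic

head = NIGHead(in_dim=32)
z = torch.randn(8, 32)                           # latent representations of 8 inputs
mean, aleatoric, epistemic = head(z)
```

In the VIR setting described above, the latent representation z would itself be computed by borrowing data with similar regression labels before being decoded into these distribution parameters.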