arXiv:2201.13001v8
Deep discriminative approaches like random forests and deep neural networks
have recently found applications in many important real-world scenarios.
However, deploying these learning algorithms in safety-critical applications
raises concerns, particularly when it comes to ensuring confidence calibration
for both in-distribution and out-of-distribution data points. Many popular
methods for in-distribution (ID) calibration, such as isotonic and Platt's
sigmoidal regression, exhibit excellent ID calibration performance. However,
these methods are not calibrated for the entire feature space, leading to
overconfidence in the case of out-of-distribution (OOD) samples. On the other
end of the spectrum, existing OOD calibration methods generally exhibit poor
ID calibration. In this paper, we
address the ID and OOD calibration problems jointly. We leverage the fact that
deep models, including both random forests and deep-nets, learn internal
representations which are unions of polytopes with affine activation functions,
and conceptualize both model families as partitioning rules of the feature space.
We then replace the affine function in each polytope populated by the training data
with a Gaussian kernel. Our experiments on both tabular and vision benchmarks
show that the proposed approaches obtain well-calibrated posteriors while
mostly preserving or improving the classification accuracy of the original
algorithm in the ID region, and extrapolate beyond the training data to handle OOD
inputs appropriately.
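As a rough illustration of the idea, here is a minimal sketch, assuming a scikit-learn random forest as the partitioning model (not the authors' implementation): each leaf's constant vote is replaced by a Gaussian kernel fitted to the training points routed to that leaf, so confidence decays away from the training data.

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier

    # Illustrative data and forest; each leaf plays the role of a polytope.
    X, y = make_classification(n_samples=500, n_features=4, random_state=0)
    forest = RandomForestClassifier(n_estimators=20, random_state=0).fit(X, y)
    n_classes = len(forest.classes_)

    # For every (tree, leaf) pair, store the mean, an isotropic scale, and the
    # class counts of the training samples routed to that leaf.
    leaf_stats = []
    for tree in forest.estimators_:
        leaves = tree.apply(X)
        stats = {}
        for leaf in np.unique(leaves):
            pts = X[leaves == leaf]
            counts = np.bincount(y[leaves == leaf], minlength=n_classes)
            stats[leaf] = (pts.mean(axis=0), pts.std() + 1e-6, counts)
        leaf_stats.append(stats)

    def predict_proba(x):
        """Kernel-weighted class posterior; weights shrink far from the training data."""
        votes = np.zeros(n_classes)
        for tree, stats in zip(forest.estimators_, leaf_stats):
            mu, sigma, counts = stats[tree.apply(x.reshape(1, -1))[0]]
            w = np.exp(-np.sum((x - mu) ** 2) / (2 * sigma ** 2))
            votes += w * counts / max(counts.sum(), 1)
        total = votes.sum()
        return votes / total if total > 0 else np.full(n_classes, 1.0 / n_classes)

    print(predict_proba(X[0]))        # confident near the training data
    print(predict_proba(X[0] + 100))  # far OOD point falls back to a near-uniform posterior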
Abstraction and Refinement: Towards Scalable and Exact Verification of
Neural Networks
arXiv:2207.00759v1
As a new programming paradigm, deep neural networks (DNNs) have been
increasingly deployed in practice, but the lack of robustness hinders their
applications in safety-critical domains. While there are techniques for
verifying DNNs with formal guarantees, they are limited in scalability and
accuracy. In this paper, we present a novel abstraction-refinement approach for
scalable and exact DNN verification. Specifically, we propose a novel
abstraction to reduce the size of DNNs by over-approximation. The result of
verifying the abstract DNN is always conclusive if no spurious counterexample
is reported. To eliminate spurious counterexamples introduced by abstraction,
we propose a novel counterexample-guided refinement that refines the abstract
DNN to exclude a given spurious counterexample while still over-approximating
the original one. Our approach is orthogonal to and can be integrated with many
existing verification techniques. For demonstration, we implement our approach
using two promising exact tools, Marabou and Planet, as the underlying
verification engines, and evaluate on widely-used benchmarks ACAS Xu, MNIST and
CIFAR-10. The results show that our approach can boost their performance by
solving more problems and reducing verification time by up to 86.3% and 78.0%,
respectively. Compared to the most relevant abstraction-refinement approach,
our approach is 11.6-26.6 times faster.
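For intuition, the following is a schematic sketch of the counterexample-guided abstraction-refinement loop the abstract describes; the abstraction, verification, and refinement routines are placeholder callables, not the paper's Marabou/Planet integration.

    from typing import Callable, Optional, Tuple

    def verify_with_refinement(
        network,                    # the concrete DNN
        abstract: Callable,         # builds an over-approximating abstract DNN
        verify: Callable,           # returns (holds, counterexample_or_None) for a DNN
        is_real: Callable,          # replays a counterexample on the concrete DNN
        refine: Callable,           # excludes a spurious counterexample, still over-approximating
        max_rounds: int = 100,
    ) -> Tuple[str, Optional[object]]:
        abs_net = abstract(network)
        for _ in range(max_rounds):
            holds, cex = verify(abs_net)
            if holds:
                # Over-approximation: if the abstract DNN satisfies the property,
                # so does the original network.
                return "SAFE", None
            if is_real(network, cex):
                return "UNSAFE", cex
            # Spurious counterexample introduced by abstraction: refine and retry.
            abs_net = refine(abs_net, cex)
        return "UNKNOWN", None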
CDR: Customizable Density Ratios of Strong-over-weak LLMs for Preference
Annotation
arXiv:2411.02481v2
Preference tuning of large language models (LLMs) relies on high-quality
human preference data, which is often expensive and time-consuming to gather.
While existing methods can use trained reward models or proprietary models as
judges for preference annotation, they have notable drawbacks: training reward
models still depends on initial human data, and using proprietary models
imposes license restrictions that inhibit commercial usage. In this paper, we
introduce customized density ratio (CDR), a training-free and highly effective
method that leverages off-the-shelf LLMs for preference data annotation. Our
approach uses the log-density ratio between a better-aligned LLM and a less
aligned LLM as a reward signal. We explore 221 different LLM pairs and
empirically demonstrate that increasing the performance gap between paired LLMs
correlates with better reward generalization. Furthermore, we show that
tailoring the density ratio reward function with specific criteria and
preference exemplars enhances performance across domains and within target
areas.
In our experiment using the density ratio from a pair of Mistral-7B models, CDR
achieves a RewardBench score of 82.6, outperforming the best trained reward
functions from the same model class and demonstrating competitive performance
against SoTA models in Safety (91.0) and Reasoning (88.0) domains. We use CDR
to annotate an on-policy preference dataset with which we preference-tune
Llama-3-8B-Instruct using SimPO. Using reward signals from two relatively weak
models, our approach pushes Llama-3-8B to achieve a 37.4% (+15.1%) win rate on
ArenaHard and a 40.7% (+17.8%) win rate on Length-Controlled AlpacaEval 2.0,
along with a score of 8.0 on MT-Bench.
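A minimal sketch of the core reward, assuming two off-the-shelf Hugging Face causal LMs (placeholder GPT-2 checkpoints rather than the Mistral-7B pair used in the paper): a response is scored by its log-likelihood under the better-aligned model minus its log-likelihood under the less aligned one.

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    def response_logprob(model, tokenizer, prompt: str, response: str) -> float:
        """Sum of log p(response tokens | prompt) under a causal LM."""
        prompt_len = tokenizer(prompt, return_tensors="pt").input_ids.shape[1]
        full_ids = tokenizer(prompt + response, return_tensors="pt").input_ids
        with torch.no_grad():
            logits = model(full_ids).logits
        log_probs = torch.log_softmax(logits[:, :-1], dim=-1)
        token_lp = log_probs.gather(-1, full_ids[:, 1:].unsqueeze(-1)).squeeze(-1)
        return token_lp[:, prompt_len - 1:].sum().item()  # keep only response positions

    def density_ratio_reward(strong, weak, tokenizer, prompt, response) -> float:
        return (response_logprob(strong, tokenizer, prompt, response)
                - response_logprob(weak, tokenizer, prompt, response))

    # Placeholder checkpoints that share a tokenizer; a preference label is obtained
    # by ranking candidate responses with this reward.
    tok = AutoTokenizer.from_pretrained("gpt2")
    strong = AutoModelForCausalLM.from_pretrained("gpt2-medium").eval()
    weak = AutoModelForCausalLM.from_pretrained("gpt2").eval()
    print(density_ratio_reward(strong, weak, tok, "Q: Is the sky blue?\nA:", " Yes, it is."))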
LongSafetyBench: Long-Context LLMs Struggle with Safety Issues
arXiv:2411.06899v1
With the development of large language models (LLMs), the sequence length of
these models continues to increase, drawing significant attention to
long-context language models. However, the evaluation of these models has been
primarily limited to their capabilities, with a lack of research focusing on
their safety. Existing work, such as ManyShotJailbreak, has to some extent
demonstrated that long-context language models can exhibit safety concerns.
However, the methods used are limited and lack comprehensiveness. In response,
we introduce LongSafetyBench, the first benchmark designed to
objectively and comprehensively evaluate the safety of long-context models.
LongSafetyBench consists of 10 task categories, with an average length of
41,889 words. After testing eight long-context language models on
LongSafetyBench, we found that existing models generally exhibit insufficient
safety capabilities. The proportion of safe responses from most mainstream
long-context LLMs is below 50%. Moreover, models' safety performance in
long-context scenarios does not always align with that in short-context
scenarios. Further investigation revealed that long-context models tend to
overlook harmful content within lengthy texts. We also propose a simple yet
effective solution that allows open-source models to achieve performance
comparable to that of top-tier closed-source models. We believe that
LongSafetyBench can serve as a valuable benchmark for evaluating the safety
capabilities of long-context language models. We hope that our work will
encourage the broader community to pay attention to the safety of long-context
models and contribute to the development of solutions to improve the safety of
long-context LLMs.
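For concreteness, the headline metric can be computed as the per-category proportion of responses judged safe; the records below are made-up placeholders, not LongSafetyBench data.

    from collections import defaultdict

    # (task_category, response_judged_safe) records; placeholders only.
    results = [
        ("harmful_advice", True), ("harmful_advice", False), ("privacy_leak", True),
    ]

    per_category = defaultdict(list)
    for category, safe in results:
        per_category[category].append(safe)

    for category, flags in per_category.items():
        print(f"{category}: {100 * sum(flags) / len(flags):.1f}% safe responses")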
CIMRL: Combining IMitation and Reinforcement Learning for Safe
Autonomous Driving
arXiv:2406.08878v4
Modern approaches to autonomous driving rely heavily on learned components
trained with large amounts of human driving data via imitation learning.
However, these methods require large amounts of expensive data collection and
even then face challenges with safely handling long-tail scenarios and
compounding errors over time. At the same time, pure Reinforcement Learning
(RL) methods can fail to learn performant policies in sparse, constrained, and
challenging-to-define reward settings such as autonomous driving. Both of these
challenges make deploying purely cloned or pure RL policies in safety-critical
applications such as autonomous vehicles difficult. In this paper we propose
the Combining IMitation and Reinforcement Learning (CIMRL) approach, a safe
reinforcement learning framework that enables training driving policies in
simulation by leveraging imitative motion priors and safety constraints.
CIMRL does not require extensive reward specification and improves on the
closed loop behavior of pure cloning methods. By combining RL and imitation, we
demonstrate that our method achieves state-of-the-art results in closed loop
simulation and real-world driving benchmarks.
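As a hedged sketch of one generic way to combine an imitative prior with constrained RL (not CIMRL's actual architecture), the policy can be optimized for reward minus a safety cost while staying close, in KL terms, to a behavior-cloned prior.

    import torch
    import torch.nn.functional as F

    def safe_rl_loss(policy_logits, prior_logits, actions, advantages, costs,
                     kl_weight=0.1, cost_weight=1.0):
        """Policy-gradient loss with a safety-cost penalty and a KL term to the imitation prior."""
        log_probs = F.log_softmax(policy_logits, dim=-1)
        chosen = log_probs.gather(-1, actions.unsqueeze(-1)).squeeze(-1)
        pg_term = -(chosen * (advantages - cost_weight * costs)).mean()
        kl_term = F.kl_div(log_probs, F.log_softmax(prior_logits, dim=-1),
                           log_target=True, reduction="batchmean")
        return pg_term + kl_weight * kl_term

    # Toy shapes: a batch of 8 states, 5 discrete driving maneuvers.
    policy_logits = torch.randn(8, 5, requires_grad=True)
    loss = safe_rl_loss(policy_logits, torch.randn(8, 5),
                        actions=torch.randint(0, 5, (8,)),
                        advantages=torch.randn(8), costs=torch.rand(8))
    loss.backward()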
arXiv:2411.06428v1
Machine learning models deployed in sensitive areas such as healthcare must
be interpretable to ensure accountability and fairness. Rule lists (if Age < 35
∧ Priors > 0 then Recidivism = True, else if Next Condition . . . )
offer full transparency, making them well-suited for high-stakes decisions.
However, learning such rule lists presents significant challenges. Existing
methods based on combinatorial optimization require feature pre-discretization
and impose restrictions on rule size. Neuro-symbolic methods use more scalable
continuous optimization yet place similar pre-discretization constraints and
suffer from unstable optimization. To address the existing limitations, we
introduce NeuRules, an end-to-end trainable model that unifies discretization,
rule learning, and rule order into a single differentiable framework. We
formulate a continuous relaxation of the rule list learning problem that
converges to a strict rule list through temperature annealing. NeuRules learns
both the discretizations of individual features and their combination
into conjunctive rules without any pre-processing or restrictions. Extensive
experiments demonstrate that NeuRules consistently outperforms both
combinatorial and neuro-symbolic methods, effectively learning simple and
complex rules, as well as their order, across a wide range of datasets.
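To illustrate the kind of relaxation described, the sketch below softens a single conjunctive rule with sigmoid thresholds and a temperature that is annealed toward hard 0/1 predicates; the thresholds and data are illustrative, not the NeuRules architecture itself.

    import torch

    def soft_rule(x, thresholds, temperature):
        """Soft AND of per-feature predicates x_j > t_j; approaches a hard rule as temperature -> 0."""
        predicates = torch.sigmoid((x - thresholds) / temperature)  # (batch, n_features)
        return predicates.prod(dim=-1)                              # (batch,)

    x = torch.tensor([[40.0, 2.0], [30.0, 0.0]])  # e.g. (Age, Priors)
    thresholds = torch.tensor([35.0, 0.5])        # learnable in a real model

    for temperature in (5.0, 1.0, 0.1):           # annealing schedule
        print(temperature, soft_rule(x, thresholds, temperature))
    # As the temperature shrinks, the first row tends to 1 (rule fires) and the second to 0.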
OpenCoder: The Open Cookbook for Top-Tier Code Large Language Models
arXiv:2411.04905v2
Large language models (LLMs) for code have become indispensable in various
domains, including code generation, reasoning tasks and agent systems. While
open-access code LLMs are increasingly approaching the performance levels of
proprietary models, high-quality code LLMs suitable for rigorous scientific
investigation, particularly those with reproducible data processing pipelines
and transparent training protocols, remain limited. This scarcity is due to
various challenges, including resource constraints, ethical considerations, and
the competitive advantages of keeping models advanced. To address this gap, we
introduce OpenCoder, a top-tier code LLM that not only achieves performance
comparable to leading models but also serves as an "open cookbook" for the
research community. Unlike most prior efforts, we release not only model
weights and inference code, but also the reproducible training data, complete
data processing pipeline, rigorous experimental ablation results, and detailed
training protocols for open scientific research. Through this comprehensive
release, we identify the key ingredients for building a top-tier code LLM: (1)
code-optimized heuristic rules for data cleaning and methods for data
deduplication, (2) recall of text corpora related to code, and (3) high-quality
synthetic data in both annealing and supervised fine-tuning stages. By offering
this level of openness, we aim to broaden access to all aspects of a top-tier
code LLM, with OpenCoder serving as both a powerful model and an open
foundation to accelerate research and enable reproducible advancements in code
AI.
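As a toy illustration of ingredient (1), the sketch below applies a few hypothetical heuristic cleaning rules and exact hash-based deduplication; the specific rules and thresholds are placeholders, not OpenCoder's actual pipeline.

    import hashlib

    def keep(code: str) -> bool:
        """Toy heuristic filter: drop near-empty files, very long lines, and autogenerated stubs."""
        lines = code.splitlines()
        if len(lines) < 3 or max((len(line) for line in lines), default=0) > 1000:
            return False
        return "do not edit" not in code.lower()

    def deduplicate(files):
        """Keep the first occurrence of each whitespace-normalized content hash."""
        seen, unique = set(), []
        for code in files:
            digest = hashlib.sha256(" ".join(code.split()).encode()).hexdigest()
            if digest not in seen:
                seen.add(digest)
                unique.append(code)
        return unique

    corpus = ["def f():\n    return 1\n# end", "def  f():\n    return 1\n# end", "x=1"]
    print(len(deduplicate([c for c in corpus if keep(c)])))  # -> 1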
STAND-Guard: A Small Task-Adaptive Content Moderation Model
Content moderation, the process of reviewing and monitoring the safety of
generated content, is important for the development of welcoming online platforms
and responsible large language models. Content moderation comprises various
tasks, each with its unique requirements tailored to specific scenarios.
Therefore, it is crucial to develop a model that can be easily adapted to novel
or customized content moderation tasks accurately without extensive model
tuning. This paper presents STAND-GUARD, a Small Task-Adaptive coNtent
moDeration model. The basic motivation is that, by performing instruction tuning on
various content moderation tasks, we can unleash the power of small language
models (SLMs) on unseen (out-of-distribution) content moderation tasks. We also
carefully study the effects of training tasks and model size on the efficacy of
the cross-task fine-tuning mechanism. Experiments demonstrate that STAND-Guard is
comparable to GPT-3.5-Turbo across over 40 public datasets, as well as
proprietary datasets derived from real-world business scenarios. Remarkably,
STAND-Guard achieved nearly equivalent results to GPT-4-Turbo on unseen English
binary classification tasks.
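A hedged sketch of how heterogeneous moderation tasks can be cast into one instruction format so a small model can transfer to unseen tasks; the field names and guideline text are illustrative, not STAND-Guard's actual templates.

    def to_instruction_example(guideline: str, labels: list, text: str, answer: str) -> dict:
        """Render one moderation example in a shared instruction-tuning format."""
        prompt = (
            "You are a content moderation assistant.\n"
            f"Guideline: {guideline}\n"
            f"Possible labels: {', '.join(labels)}\n"
            f"Content: {text}\n"
            "Answer with exactly one label."
        )
        return {"prompt": prompt, "completion": answer}

    example = to_instruction_example(
        guideline="Flag messages that contain harassment directed at an individual.",
        labels=["harassment", "safe"],
        text="You are worthless and everyone knows it.",
        answer="harassment",
    )
    print(example["prompt"])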
Unsupervised Abnormal Stop Detection for Long Distance Coaches with
Low-Frequency GPS
arXiv:2411.04422v1
In urban life, long-distance coaches provide a convenient yet economical means of
public transportation. One notable problem is discovering abnormal stops of
coaches, which often indicate illegal passenger pick-ups along the route that may
endanger passenger safety. Detecting such abnormal coach stops from low-quality
GPS data has become a pressing issue. In this paper, we propose an unsupervised method that helps
transportation managers efficiently perform Abnormal Stop Detection
(ASD) for long-distance coaches. Concretely, our method converts the ASD
problem into an unsupervised clustering framework in which normal and abnormal
stops are separated by decomposition. Firstly, we propose a stop duration model
for low-frequency GPS data based on the assumption that a coach changes speed
approximately linearly. Secondly, we separate the abnormal stops from
the normal stop points under a low-rank assumption. The proposed method is
conceptually simple yet efficient: by leveraging the low-rank assumption to model
normal stop points, our approach enables domain experts to discover abnormal stops
of coaches, as illustrated by a case study motivated by traffic managers. Dataset and code are
publicly available at: https://github.com/pangjunbiao/IPPs.
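As a schematic illustration of the low-rank idea (not the paper's exact decomposition), a coach-by-location stop-duration matrix can be approximated by a truncated SVD, with large residuals flagged as abnormal stops.

    import numpy as np

    rng = np.random.default_rng(0)
    routine = np.outer(rng.random(30), rng.random(12)) * 10  # shared stop pattern (rank one)
    durations = routine.copy()
    durations[5, 3] += 25.0                                   # inject one anomalous long stop

    U, s, Vt = np.linalg.svd(durations, full_matrices=False)
    k = 1
    low_rank = (U[:, :k] * s[:k]) @ Vt[:k]                    # normal stop component
    residual = np.abs(durations - low_rank)                   # abnormal component

    print(np.unravel_index(residual.argmax(), residual.shape))  # -> (5, 3), the injected anomaly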
SA3DIP: Segment Any 3D Instance with Potential 3D Priors
arXiv:2411.03819v1
The proliferation of 2D foundation models has sparked research into adapting
them for open-world 3D instance segmentation. Recent methods introduce a
paradigm that leverages superpoints as geometric primitives and incorporates 2D
multi-view masks from the Segment Anything Model (SAM) as merging guidance,
achieving outstanding zero-shot instance segmentation results. However, the
limited use of 3D priors restricts the segmentation performance. Previous
methods compute the 3D superpoints solely based on normals estimated from
spatial coordinates, resulting in under-segmentation of instances with similar
geometry. In addition, the heavy reliance on SAM and hand-crafted algorithms in 2D
space leads to over-segmentation due to SAM's inherent part-level
segmentation tendency. To address these issues, we propose SA3DIP, a novel
method for Segmenting Any 3D Instances via exploiting potential 3D Priors.
Specifically, on one hand, we generate complementary 3D primitives based on
both geometric and textural priors, which reduces the initial errors that
accumulate in subsequent procedures. On the other hand, we introduce
supplemental constraints from the 3D space by using a 3D detector to guide a
further merging process. Furthermore, we notice a considerable portion of
low-quality ground truth annotations in the ScanNetV2 benchmark, which affects
fair evaluation. Thus, we present ScanNetV2-INS with complete ground truth
labels and supplement additional instances for 3D class-agnostic instance
segmentation. Experimental evaluations on various 2D-3D datasets demonstrate
the effectiveness and robustness of our approach. Our code and proposed
ScanNetV2-INS dataset are available HERE.
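To illustrate the role of complementary primitives, the toy sketch below decides whether two adjacent superpoints should merge using both a geometric prior (normal similarity) and a textural prior (color similarity) rather than normals alone; the thresholds and features are illustrative, not SA3DIP's actual criteria.

    import numpy as np

    def should_merge(sp_a, sp_b, normal_thresh=0.95, color_thresh=20.0):
        """sp_* holds a mean unit 'normal' and a mean 'color' (RGB in 0-255) per superpoint."""
        normal_sim = float(np.dot(sp_a["normal"], sp_b["normal"]))
        color_dist = float(np.linalg.norm(sp_a["color"] - sp_b["color"]))
        return normal_sim > normal_thresh and color_dist < color_thresh

    table_top = {"normal": np.array([0.0, 0.0, 1.0]), "color": np.array([120.0, 90.0, 60.0])}
    paper_sheet = {"normal": np.array([0.0, 0.0, 1.0]), "color": np.array([240.0, 240.0, 240.0])}

    # Same geometry, very different texture: a normals-only criterion would merge these,
    # while the combined criterion keeps them separate.
    print(should_merge(table_top, paper_sheet))  # -> False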