Preprint; 37 pages, 8 figures, 11 tables; Code at
https://github.com/ldkong1205/Calib3D
Safety-critical 3D scene understanding tasks necessitate not only accurate
but also confident predictions from 3D perception models. This study introduces
Calib3D, a pioneering effort to benchmark and scrutinize the reliability of 3D
scene understanding models from an uncertainty estimation viewpoint. We
comprehensively evaluate 28 state-of-the-art models across 10 diverse 3D
datasets, uncovering insightful phenomena concerning both the aleatoric and
epistemic uncertainty in 3D scene understanding. We discover that despite
achieving impressive levels of accuracy, existing models frequently fail to
provide reliable uncertainty estimates -- a pitfall that critically undermines
their applicability in safety-sensitive contexts. Through extensive analysis of
key factors such as network capacity, LiDAR representations, rasterization
resolutions, and 3D data augmentation techniques, we correlate these aspects
directly with model calibration efficacy. Furthermore, we introduce DeptS,
a novel depth-aware scaling approach aimed at enhancing 3D model calibration.
Extensive experiments across a wide range of configurations validate the
superiority of our method. We hope this work can serve as a cornerstone for
fostering reliable 3D scene understanding. Code and benchmark toolkits are
publicly available.
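The abstract does not spell out the calibration metric or the exact form of DeptS, so the sketch below only illustrates the general setting: the standard expected calibration error (ECE) commonly reported by calibration benchmarks, plus a hypothetical depth-conditioned temperature applied to per-point logits, loosely inspired by the "depth-aware scaling" idea. All names, shapes, and parameters are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def expected_calibration_error(probs, labels, n_bins=15):
    """Standard ECE: bin predictions by confidence and compare the average
    confidence with the empirical accuracy inside each bin."""
    confidences = probs.max(axis=1)
    predictions = probs.argmax(axis=1)
    accuracies = (predictions == labels).astype(float)

    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            ece += mask.mean() * abs(accuracies[mask].mean() - confidences[mask].mean())
    return ece

def depth_aware_softmax(logits, depths, t_near=1.0, t_far=2.0, max_depth=50.0):
    """Hypothetical depth-conditioned temperature scaling: points farther from
    the sensor receive a larger (softer) temperature. Illustrative only."""
    w = np.clip(depths / max_depth, 0.0, 1.0)            # (N,) normalized depth
    temperature = (1.0 - w) * t_near + w * t_far          # (N,) per-point temperature
    scaled = logits / temperature[:, None]                # (N, C)
    scaled -= scaled.max(axis=1, keepdims=True)           # numerical stability
    exp = np.exp(scaled)
    return exp / exp.sum(axis=1, keepdims=True)
```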
On the Trustworthiness Landscape of State-of-the-art Generative Models:
A Survey and Outlook
Diffusion models and large language models have emerged as leading-edge
generative models, revolutionizing various aspects of human life. However, the
practical implementations of these models have also exposed inherent risks,
bringing their harmful sides to the forefront and sparking concerns regarding
their trustworthiness. Despite the wealth of literature on this subject, a
comprehensive survey specifically delving into the intersection of large-scale
generative models and their trustworthiness remains largely absent. To bridge
this gap, this paper investigates both the long-standing and emerging threats
associated with these models across four fundamental dimensions: 1) privacy, 2)
security, 3) fairness, and 4) responsibility. Based on the investigation
results, we develop an extensive map outlining the trustworthiness of large
generative models. After that, we provide practical recommendations and
potential research directions for future secure applications equipped with
large generative models, ultimately promoting the trustworthiness of the models
and benefiting society as a whole.
Deep Metric Learning for Open World Semantic Segmentation
Classical closed-set semantic segmentation networks have limited ability to
detect out-of-distribution (OOD) objects, which is important for
safety-critical applications such as autonomous driving. Incrementally learning
these OOD objects with few annotations is an ideal way to enlarge the knowledge
base of the deep learning models. In this paper, we propose an open world
semantic segmentation system that includes two modules: (1) an open-set
semantic segmentation module to detect both in-distribution and OOD objects,
and (2) an incremental few-shot learning module to gradually incorporate those OOD
objects into its existing knowledge base. This open world semantic segmentation
system behaves like a human, identifying OOD objects and gradually learning
them with corresponding supervision. We adopt the Deep Metric
Learning Network (DMLNet) with contrastive clustering to implement open-set
semantic segmentation. Compared to other open-set semantic segmentation
methods, our DMLNet achieves state-of-the-art performance on three challenging
open-set semantic segmentation datasets without using additional data or
generative models. On this basis, two incremental few-shot learning methods are
further proposed to progressively improve the DMLNet with the annotations of
OOD objects.
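The abstract describes open-set segmentation via metric learning with contrastive clustering but does not state DMLNet's exact scoring rule. The following is a minimal sketch of one common realization of the idea: per-class prototypes in embedding space, with the distance to the nearest prototype serving as the OOD score. Names and the threshold are illustrative assumptions.

```python
import torch

def ood_scores(pixel_embeddings, prototypes):
    """pixel_embeddings: (N, D) per-pixel embeddings from the segmentation head.
    prototypes: (C, D) one learned prototype per known class.
    Returns per-pixel class predictions and an OOD score (distance to the
    nearest prototype); a larger distance suggests out-of-distribution."""
    dists = torch.cdist(pixel_embeddings, prototypes)   # (N, C)
    min_dist, pred_class = dists.min(dim=1)
    return pred_class, min_dist

# Usage (illustrative): flag pixels whose nearest-prototype distance exceeds
# a validation-tuned threshold tau as OOD.
# pred, score = ood_scores(emb, protos)
# is_ood = score > tau
```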
MRJ-Agent: An Effective Jailbreak Agent for Multi-Round Dialogue
arXiv:2411.03814v1
Large Language Models (LLMs) demonstrate an outstanding reservoir of knowledge
and strong understanding capabilities, but they have also been
shown to be prone to illegal or unethical responses when subjected to jailbreak
attacks. To ensure their responsible deployment in critical applications, it is
crucial to understand the safety capabilities and vulnerabilities of LLMs.
Previous works mainly focus on jailbreaks in single-round dialogue, overlooking
the potential jailbreak risks in multi-round dialogues, which are a vital way
humans interact with and extract information from LLMs. Some recent studies
have turned to the risks associated with jailbreaks in multi-round dialogues,
but these efforts typically rely on manually crafted templates or prompt
engineering techniques. However, due to the inherent
complexity of multi-round dialogues, their jailbreak performance is limited. To
solve this problem, we propose a novel multi-round dialogue jailbreaking agent,
emphasizing the importance of stealthiness in identifying and mitigating
potential threats to human values posed by LLMs. We propose a risk
decomposition strategy that distributes risks across multiple rounds of queries
and utilizes psychological strategies to enhance attack strength. Extensive
experiments show that our proposed method surpasses other attack methods and
achieves a state-of-the-art attack success rate. We will make the corresponding
code and dataset available for future research.
Post-translational modifications (PTMs) profoundly expand the complexity and
functionality of the proteome, regulating protein attributes and interactions
that are crucial for biological processes. Accurately predicting PTM sites and
their specific types is therefore essential for elucidating protein function
and understanding disease mechanisms. Existing computational approaches
predominantly focus on protein sequences to predict PTM sites, driven by the
recognition of sequence-dependent motifs. However, these approaches often
overlook protein structural contexts. In this work, we first compile a
large-scale sequence-structure PTM dataset, which serves as the foundation for
fair comparison. We introduce the MeToken model, which tokenizes the
micro-environment of each amino acid, integrating both sequence and structural
information into unified discrete tokens. This model not only captures the
typical sequence motifs associated with PTMs but also leverages the spatial
arrangements dictated by protein tertiary structures, thus providing a holistic
view of the factors influencing PTM sites. Designed to address the long-tail
distribution of PTM types, MeToken employs uniform sub-codebooks that ensure
even the rarest PTMs are adequately represented and distinguished. We validate
the effectiveness and generalizability of MeToken across multiple datasets,
demonstrating its superior performance in accurately identifying PTM types. The
results underscore the importance of incorporating structural data and
highlight MeToken's potential in facilitating accurate and comprehensive PTM
predictions, which could significantly impact proteomics research. The code and
datasets are available at https://github.com/A4Bio/MeToken.
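The abstract describes tokenizing each residue's micro-environment into discrete codes via (sub-)codebooks, without giving the precise MeToken architecture. Below is a minimal nearest-neighbor vector-quantization sketch of that idea under assumed shapes; the function and variable names are illustrative, not the released code.

```python
import torch

def quantize_microenvironments(features, codebook):
    """features: (L, D) one fused sequence+structure feature per residue.
    codebook: (K, D) learnable code vectors (a single sub-codebook here).
    Returns the discrete token id and quantized embedding for each residue."""
    dists = torch.cdist(features, codebook)      # (L, K) residue-to-code distances
    tokens = dists.argmin(dim=1)                 # (L,) nearest-code token ids
    quantized = codebook[tokens]                 # (L, D) quantized embeddings
    # Straight-through estimator so gradients still reach the encoder features.
    quantized = features + (quantized - features).detach()
    return tokens, quantized
```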
arXiv:2406.12747v2
Effective imputation is a crucial preprocessing step for time series
analysis. Despite the development of numerous deep learning algorithms for time
series imputation, the community lacks standardized and comprehensive benchmark
platforms to effectively evaluate imputation performance across different
settings. Moreover, although many deep learning forecasting algorithms have
demonstrated excellent performance, whether their modelling achievements can be
transferred to time series imputation tasks remains unexplored. To bridge these
gaps, we develop TSI-Bench, the first (to our knowledge) comprehensive
benchmark suite for time series imputation utilizing deep learning techniques.
The TSI-Bench pipeline standardizes experimental settings to enable fair
evaluation of imputation algorithms and identification of meaningful insights
into the influence of domain-appropriate missing rates and patterns on model
performance. Furthermore, TSI-Bench innovatively provides a systematic paradigm
to tailor time series forecasting algorithms for imputation purposes. Our
extensive study across 34,804 experiments, 28 algorithms, and 8 datasets with
diverse missingness scenarios demonstrates TSI-Bench's effectiveness in diverse
downstream tasks and its potential to unlock future directions in time series
imputation research and analysis. All source code and experiment logs are
released at https://github.com/WenjieDu/AwesomeImputation.
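The abstract does not detail the evaluation protocol, but imputation benchmarks of this kind typically hold out a fraction of the observed values, impute them, and score the error only on the held-out positions. The sketch below illustrates that paradigm with illustrative names; it is not TSI-Bench's actual API.

```python
import numpy as np

def evaluate_imputation(series, impute_fn, missing_rate=0.2, seed=0):
    """series: (T, D) array with np.nan marking originally missing values.
    impute_fn: callable mapping a (T, D) array with NaNs to a filled array.
    Artificially masks a fraction of the observed entries, imputes, and
    reports MAE on the artificially masked positions only."""
    rng = np.random.default_rng(seed)
    observed = ~np.isnan(series)
    hold_out = observed & (rng.random(series.shape) < missing_rate)

    corrupted = series.copy()
    corrupted[hold_out] = np.nan
    filled = impute_fn(corrupted)

    return np.abs(filled[hold_out] - series[hold_out]).mean()

# Usage (illustrative): a naive baseline that fills with each column's mean.
# evaluate_imputation(data, lambda x: np.where(np.isnan(x), np.nanmean(x, axis=0), x))
```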
arXiv:2410.21276v1
GPT-4o is an autoregressive omni model that accepts as input any combination
of text, audio, image, and video, and generates any combination of text, audio,
and image outputs. It is trained end-to-end across text, vision, and audio,
meaning all inputs and outputs are processed by the same neural network. GPT-4o
can respond to audio inputs in as little as 232 milliseconds, with an average
of 320 milliseconds, which is similar to human response time in conversation.
It matches GPT-4 Turbo performance on text in English and code, with
significant improvement on text in non-English languages, while also being much
faster and 50% cheaper in the API. GPT-4o is notably better at vision and
audio understanding compared to existing models. In line with our commitment to
building AI safely and consistent with our voluntary commitments to the White
House, we are sharing the GPT-4o System Card, which includes our Preparedness
Framework evaluations. In this System Card, we provide a detailed look at
GPT-4o's capabilities, limitations, and safety evaluations across multiple
categories, focusing on speech-to-speech while also evaluating text and image
capabilities, and measures we've implemented to ensure the model is safe and
aligned. We also include third-party assessments on dangerous capabilities, as
well as discussion of potential societal impacts of GPT-4o's text and vision
capabilities.
FaceChain-FACT: Face Adapter with Decoupled Training for
Identity-preserved Personalization
In the field of human-centric personalized image generation, the
adapter-based method obtains the ability to customize and generate portraits by
text-to-image training on facial data. This allows for identity-preserved
personalization without additional fine-tuning in inference. Although there are
improvements in efficiency and fidelity, there is often a significant
performance decrease in text-following ability, controllability, and diversity
of generated faces compared to the base model. In this paper, we attribute this
performance degradation to the failure to decouple identity
features from other attributes during extraction, as well as the failure to
decouple the portrait generation training from the overall generation task. To
address these issues, we propose the Face Adapter with deCoupled Training
(FACT) framework, focusing on both model architecture and training strategy. To
decouple identity features from others, we leverage a transformer-based
face-expert encoder and harness fine-grained identity features. To decouple the
portrait generation training, we propose Face Adapting Increment
Regularization~(FAIR), which effectively constrains the effect of face adapters
on the facial region, preserving the generative ability of the base model.
Additionally, we incorporate a face condition drop and shuffle mechanism,
combined with curriculum learning, to enhance facial controllability and
diversity. As a result, FACT solely learns identity preservation from training
data, thereby minimizing the impact on the original text-to-image capabilities
of the base model. Extensive experiments show that FACT achieves both
controllability and fidelity in text-to-image generation and inpainting
solutions for portrait generation.
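The abstract mentions a face condition drop-and-shuffle mechanism without defining it; the sketch below shows one plausible reading: with some probability, replace a sample's face-identity condition with a learned null embedding (drop) or swap in another sample's condition (shuffle), so the model does not over-rely on the face adapter. The probabilities, shapes, and names are illustrative assumptions, not the paper's implementation.

```python
import torch

def drop_and_shuffle_conditions(face_cond, null_cond, p_drop=0.1, p_shuffle=0.1):
    """face_cond: (B, N, D) per-sample face-identity condition tokens.
    null_cond: (N, D) learned embedding used when the condition is dropped."""
    B = face_cond.size(0)
    out = face_cond.clone()

    # Drop: replace some samples' condition with the null embedding.
    drop_mask = torch.rand(B, device=face_cond.device) < p_drop
    out[drop_mask] = null_cond

    # Shuffle: for some samples, swap in another sample's condition.
    shuffle_mask = torch.rand(B, device=face_cond.device) < p_shuffle
    perm = torch.randperm(B, device=face_cond.device)
    out[shuffle_mask] = face_cond[perm][shuffle_mask]
    return out
```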
Deconstructing The Ethics of Large Language Models from Long-standing
Issues to New-emerging Dilemmas: A Survey
arXiv:2406.05392v2
Large Language Models (LLMs) have achieved unparalleled success across
diverse language modeling tasks in recent years. However, this progress has
also intensified ethical concerns, impacting the deployment of LLMs in everyday
contexts. This paper provides a comprehensive survey of ethical challenges
associated with LLMs, from longstanding issues such as copyright infringement,
systematic bias, and data privacy, to emerging problems like truthfulness and
social norms. We critically analyze existing research aimed at understanding,
examining, and mitigating these ethical risks. Our survey underscores the
importance of integrating ethical standards and societal values into the
development of LLMs, thereby guiding the creation of responsible and ethically
aligned language
models.
FlexMol: A Flexible Toolkit for Benchmarking Molecular Relational
Learning
arXiv:2410.15010v1
Molecular relational learning (MRL) is crucial for understanding the
interaction behaviors between molecular pairs, a critical aspect of drug
discovery and development. However, the large feasible model space of MRL poses
significant challenges to benchmarking, and existing MRL frameworks face
limitations in flexibility and scope. To address these challenges, avoid
repetitive coding efforts, and ensure fair comparison of models, we introduce
FlexMol, a comprehensive toolkit designed to facilitate the construction and
evaluation of diverse model architectures across various datasets and
performance metrics. FlexMol offers a robust suite of preset model components,
including 16 drug encoders, 13 protein sequence encoders, 9 protein structure
encoders, and 7 interaction layers. With its easy-to-use API and flexibility,
FlexMol supports the dynamic construction of over 70,000 distinct combinations
of model architectures. Additionally, we provide detailed benchmark results and
code examples to demonstrate FlexMol's effectiveness in simplifying and
standardizing MRL model development and comparison.
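FlexMol's actual interface is not shown in the abstract, so the sketch below only illustrates the general idea of assembling a model space from component registries and enumerating combinations for benchmarking. The registry contents and the factory/training functions are hypothetical, not FlexMol's API.

```python
from itertools import product

# Hypothetical component registries (names illustrative, not FlexMol's API).
DRUG_ENCODERS = ["gcn", "gin", "morgan_mlp"]
PROTEIN_SEQ_ENCODERS = ["cnn", "transformer"]
INTERACTION_LAYERS = ["concat_mlp", "bilinear", "cross_attention"]

def enumerate_architectures():
    """Yield every (drug encoder, protein encoder, interaction layer) triple.
    A real toolkit would map each name to a constructor and training config."""
    for drug, prot, inter in product(DRUG_ENCODERS, PROTEIN_SEQ_ENCODERS, INTERACTION_LAYERS):
        yield {"drug_encoder": drug, "protein_encoder": prot, "interaction": inter}

# Usage (illustrative): benchmark each configuration with a shared train/eval loop.
# for cfg in enumerate_architectures():
#     model = build_model(cfg)              # hypothetical factory
#     results = train_and_evaluate(model)   # hypothetical shared pipeline
```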