Driving systems often rely on high-definition (HD) maps for precise
environmental information, which is crucial for planning and navigation. While
current HD map constructors perform well under ideal conditions, their
resilience to real-world challenges, e.g., adverse weather and sensor failures,
is not well understood, raising safety concerns. This work introduces MapBench,
the first comprehensive benchmark designed to evaluate the robustness of HD map
construction methods against various sensor corruptions. Our benchmark
encompasses a total of 29 corruption types that arise from camera and LiDAR
sensors. Extensive evaluations across 31 HD map constructors reveal
significant performance degradation of existing methods under adverse weather
conditions and sensor failures, underscoring critical safety concerns. We
identify effective strategies for enhancing robustness, including innovative
approaches that leverage multi-modal fusion, advanced data augmentation, and
architectural techniques. These insights provide a pathway for developing more
reliable HD map construction methods, which are essential for the advancement
of autonomous driving technology. The benchmark toolkit and affiliated code and
model checkpoints have been made publicly accessible.
The RoboDepth Challenge: Methods and Advancements Towards Robust Depth
Estimation
Accurate depth estimation under out-of-distribution (OoD) scenarios, such as
adverse weather conditions, sensor failure, and noise contamination, is
desirable for safety-critical applications. Existing depth estimation systems,
however, inevitably suffer from real-world corruptions and perturbations and
struggle to provide reliable depth predictions in such cases. In this
paper, we summarize the winning solutions from the RoboDepth Challenge -- an
academic competition designed to facilitate and advance robust OoD depth
estimation. This challenge was developed based on the newly established KITTI-C
and NYUDepth2-C benchmarks. We hosted two stand-alone tracks, with an emphasis
on robust self-supervised and robust fully-supervised depth estimation,
respectively. Out of more than two hundred participants, nine unique and
top-performing solutions have appeared, with novel designs ranging from the
following aspects: spatial- and frequency-domain augmentations, masked image
modeling, image restoration and super-resolution, adversarial training,
diffusion-based noise suppression, vision-language pre-training, learned model
ensembling, and hierarchical feature enhancement. Extensive experimental
analyses along with insightful observations are drawn to better understand the
rationale behind each design. We hope this challenge could lay a solid
foundation for future research on robust and reliable depth estimation and
beyond. The datasets, competition toolkit, workshop recordings, and source code
from the winning teams are publicly available on the challenge website.
An Empirical Study of Training State-of-the-Art LiDAR Segmentation
Models
Preprint; 17 pages, 4 figures, 7 tables; Code at
https://github.com/open-mmlab/mmdetection3d
In the rapidly evolving field of autonomous driving, precise segmentation of
LiDAR data is crucial for understanding complex 3D environments. Traditional
approaches often rely on disparate, standalone codebases, hindering unified
advancements and fair benchmarking across models. To address these challenges,
we introduce MMDetection3D-lidarseg, a comprehensive toolbox designed for the
efficient training and evaluation of state-of-the-art LiDAR segmentation
models. We support a wide range of segmentation models and integrate advanced
data augmentation techniques to enhance robustness and generalization.
Additionally, the toolbox provides support for multiple leading sparse
convolution backends, optimizing computational efficiency and performance. By
fostering a unified framework, MMDetection3D-lidarseg streamlines development
and benchmarking, setting new standards for research and application. Our
extensive benchmark experiments on widely-used datasets demonstrate the
effectiveness of the toolbox. The codebase and trained models have been made
publicly available, promoting further research and innovation in the field of
LiDAR segmentation for autonomous driving.
Calib3D: Calibrating Model Preferences for Reliable 3D Scene
Understanding
Preprint; 37 pages, 8 figures, 11 tables; Code at
https://github.com/ldkong1205/Calib3D
Safety-critical 3D scene understanding tasks necessitate not only accurate
but also confident predictions from 3D perception models. This study introduces
Calib3D, a pioneering effort to benchmark and scrutinize the reliability of 3D
scene understanding models from an uncertainty estimation viewpoint. We
comprehensively evaluate 28 state-of-the-art models across 10 diverse 3D
datasets, uncovering insightful phenomena related to both the aleatoric and
epistemic uncertainties in 3D scene understanding. We discover that despite
achieving impressive levels of accuracy, existing models frequently fail to
provide reliable uncertainty estimates -- a pitfall that critically undermines
their applicability in safety-sensitive contexts. Through extensive analysis of
key factors such as network capacity, LiDAR representations, rasterization
resolutions, and 3D data augmentation techniques, we correlate these aspects
directly with the model calibration efficacy. Furthermore, we introduce DeptS,
a novel depth-aware scaling approach aimed at enhancing 3D model calibration.
Extensive experiments across a wide range of configurations validate the
superiority of our method. We hope this work could serve as a cornerstone for
fostering reliable 3D scene understanding. Code and benchmark toolkits are
publicly available.
Optimizing LiDAR Placements for Robust Driving Perception in Adverse
Conditions
Preprint; 40 pages, 11 figures, 15 tables; Code at
https://github.com/ywyeli/Place3D
The robustness of driving perception systems under unprecedented conditions
is crucial for safety-critical usages. Recent advancements have prompted
increasing interest in multi-LiDAR perception. However, prevailing
driving datasets predominantly utilize single-LiDAR systems and collect data
devoid of adverse conditions, failing to capture the complexities of real-world
environments accurately. Addressing these gaps, we propose Place3D, a
full-cycle pipeline that encompasses LiDAR placement optimization, data
generation, and downstream evaluations. Our framework makes three appealing
contributions. 1) To identify the most effective configurations for multi-LiDAR
systems, we introduce a Surrogate Metric of the Semantic Occupancy Grids
(M-SOG) to evaluate LiDAR placement quality. 2) Leveraging the M-SOG metric, we
propose a novel optimization strategy to refine multi-LiDAR placements. 3)
Centered around the theme of multi-condition multi-LiDAR perception, we collect
a 364,000-frame dataset from both clean and adverse conditions. Extensive
experiments demonstrate that LiDAR placements optimized using our approach
outperform various baselines. We showcase exceptional robustness in both 3D
object detection and LiDAR semantic segmentation tasks, under diverse adverse
weather and sensor failure conditions. Code and benchmark toolkit are publicly
available.
RoboDepth: Robust Out-of-Distribution Depth Estimation under Corruptions
Depth estimation from monocular images is pivotal for real-world visual
perception systems. While current learning-based depth estimation models train
and test on meticulously curated data, they often overlook out-of-distribution
(OoD) situations. Yet, in practical settings -- especially safety-critical ones
like autonomous driving -- common corruptions can arise. Addressing this
oversight, we introduce a comprehensive robustness test suite, RoboDepth,
encompassing 18 corruptions spanning three categories: i) weather and lighting
conditions; ii) sensor failures and movement; and iii) data processing
anomalies. We subsequently benchmark 42 depth estimation models across indoor
and outdoor scenes to assess their resilience to these corruptions. Our
findings underscore that, in the absence of a dedicated robustness evaluation
framework, many leading depth estimation models may be susceptible to typical
corruptions. We delve into design considerations for crafting more robust depth
estimation models, touching upon pre-training, augmentation, modality, model
capacity, and learning paradigms. We anticipate our benchmark will establish a
foundational platform for advancing robust OoD depth estimation.
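Robustness suites of this kind typically aggregate per-corruption scores into a single number relative to clean performance. As a hedged illustration of that aggregation idea (not RoboDepth's exact protocol or API; all names below are placeholders), a mean relative corruption error can be computed as:

```python
# Hypothetical aggregation of depth-estimation errors under corruptions.
# `errors[c]` lists a model's error at each severity for corruption c;
# `clean_error` is its error on uncorrupted data. Illustrative only.

def corruption_error(errors, clean_error):
    """Average error across corruptions and severities, relative to clean."""
    per_corruption = [sum(sev) / len(sev) for sev in errors.values()]
    mean_err = sum(per_corruption) / len(per_corruption)
    return mean_err / clean_error  # > 1.0 indicates degradation under corruption

errors = {"fog": [0.12, 0.18], "motion_blur": [0.15, 0.25]}
print(round(corruption_error(errors, clean_error=0.10), 2))  # → 1.75
```

Normalizing by the clean-data error lets models of different absolute accuracy be compared on how much they degrade, which is the quantity such benchmarks care about.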
Robo3D: Towards Robust and Reliable 3D Perception against Corruptions
The robustness of 3D perception systems under natural corruptions from
environments and sensors is pivotal for safety-critical applications. Existing
large-scale 3D perception datasets often contain data that are meticulously
cleaned. Such configurations, however, cannot reflect the reliability of
perception models during the deployment stage. In this work, we present Robo3D,
the first comprehensive benchmark heading toward probing the robustness of 3D
detectors and segmentors under out-of-distribution scenarios against natural
corruptions that occur in real-world environments. Specifically, we consider
eight corruption types stemming from severe weather conditions, external
disturbances, and internal sensor failure. We uncover that, although promising
results have been progressively achieved on standard benchmarks,
state-of-the-art 3D perception models are at risk of being vulnerable to
corruptions. We draw key observations on the use of data representations,
augmentation schemes, and training strategies that can severely affect a
model's performance. To pursue better robustness, we propose a
density-insensitive training framework along with a simple yet flexible
voxelization strategy to enhance model resiliency. We hope our benchmark
and approach could inspire future research in designing more robust and
reliable 3D perception models. Our robustness benchmark suite is publicly
available.
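The voxelization step that such 3D pipelines rest on can be sketched generically: points are quantized to integer cell indices and grouped per cell. This is a minimal illustration of the common technique, not Robo3D's actual implementation:

```python
# Minimal point-cloud voxelization sketch (illustrative, not Robo3D's code).
# Points are (x, y, z) tuples; voxel_size is the edge length of a cubic voxel.

def voxelize(points, voxel_size):
    """Group points by the integer voxel cell they fall into."""
    voxels = {}
    for x, y, z in points:
        key = (int(x // voxel_size), int(y // voxel_size), int(z // voxel_size))
        voxels.setdefault(key, []).append((x, y, z))
    return voxels

pts = [(0.1, 0.2, 0.0), (0.3, 0.1, 0.0), (1.2, 0.0, 0.0)]
cells = voxelize(pts, voxel_size=0.5)
print(len(cells))  # → 2: the first two points share a cell, the third does not
```

A fixed grid like this is sensitive to point density per cell, which is exactly the failure mode a density-insensitive training scheme aims to mitigate.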
From Prohibition to Adoption: How Hong Kong Universities Are Navigating
ChatGPT in Academic Workflows
arXiv:2410.01695v3
This paper compares the period when Hong Kong universities banned ChatGPT with
the current era, in which the tool has become integrated into academic
processes. Prompted by concerns over integrity and ethical issues in
technology, institutions have adapted by shifting toward AI literacy and
responsibility policies. This study examines new paradigms that have been
developed to realize these benefits while preventing negative effects on
academia.
Keywords: ChatGPT, Academic Integrity, AI Literacy, Ethical AI Use, Generative
AI in Education, University Policy, AI Integration in Academia, Higher
Education and Technology
Scaling Diffusion Language Models via Adaptation from Autoregressive
Models
Diffusion Language Models (DLMs) have emerged as a promising new paradigm for
text generative modeling, potentially addressing limitations of autoregressive
(AR) models. However, current DLMs have been studied at a smaller scale
compared to their AR counterparts and lack fair comparison on language modeling
benchmarks. Additionally, training diffusion models from scratch at scale
remains challenging. Given the prevalence of open-source AR language models, we
propose adapting these models to build text diffusion models. We demonstrate
connections between AR and diffusion modeling objectives and introduce a simple
continual pre-training approach for training diffusion models. Through
systematic evaluation on language modeling, reasoning, and commonsense
benchmarks, we show that we can convert AR models ranging from 127M to 7B
parameters (GPT2 and LLaMA) into diffusion models DiffuGPT and DiffuLLaMA,
using less than 200B tokens for training. Our experimental results reveal that
these models outperform earlier DLMs and are competitive with their AR
counterparts. We release a suite of DLMs (with 127M, 355M, and 7B parameters)
capable of generating fluent text, performing in-context learning, filling in
the middle without prompt re-ordering, and following instructions:
https://github.com/HKUNLP/DiffuLLaMA.
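Absorbing-state discrete diffusion, the family such text diffusion models build on, trains by masking tokens at a sampled noise level and predicting them back, which is what makes continual pre-training from an AR model tractable. A hedged sketch of the forward masking step (the mask id and function names are illustrative, not the authors' code):

```python
import random

MASK = -1  # illustrative mask-token id, not the models' real vocabulary entry

def mask_tokens(tokens, t, rng=random):
    """Absorbing-state forward process: independently replace each token
    with MASK with probability t, the diffusion time in [0, 1]."""
    noised, targets = [], []
    for tok in tokens:
        if rng.random() < t:
            noised.append(MASK)
            targets.append(tok)   # loss is computed only on masked positions
        else:
            noised.append(tok)
            targets.append(None)  # position excluded from the loss
    return noised, targets

noised, _ = mask_tokens([5, 9, 2, 7], t=1.0)
print(noised)  # → [-1, -1, -1, -1]: at t=1.0 every token is masked
```

At t=1.0 this reduces to predicting every token from nothing, and at small t to an almost-clean denoising step; the training objective averages a masked-token cross-entropy over sampled t.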
Forewarned is Forearmed: Leveraging LLMs for Data Synthesis through
Failure-Inducing Exploration
arXiv:2410.16736v1
Large language models (LLMs) have significantly benefited from training on
diverse, high-quality task-specific data, leading to impressive performance
across a range of downstream applications. Current methods often rely on
human-annotated data or predefined task templates to direct powerful LLMs in
synthesizing task-relevant data for effective model training. However, this
dependence on manually designed components may constrain the scope of generated
data, potentially overlooking critical edge cases or novel scenarios that could
challenge the model. In this paper, we present a novel approach, ReverseGen,
designed to automatically generate effective training samples that expose the
weaknesses of LLMs. Specifically, we introduce a dedicated proposer trained to
produce queries that lead target models to generate unsatisfactory responses.
These failure-inducing queries are then used to construct training data,
helping to address the models' shortcomings and improve overall performance.
Our approach is flexible and can be applied to models of various scales (3B,
7B, and 8B). We evaluate ReverseGen on three key applications (safety, honesty,
and math), demonstrating that our generated data is both highly effective and
diverse. Models fine-tuned with ReverseGen-generated data consistently
outperform those trained on human-annotated or general model-generated data,
offering a new perspective on data synthesis for task-specific LLM enhancement.
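The failure-inducing pipeline described above can be sketched as a generate-probe-filter loop. The proposer, target, and judge below are placeholder callables standing in for models, not ReverseGen's actual interfaces:

```python
# Hedged sketch of a failure-inducing data-synthesis loop (illustrative;
# proposer, target, and judge are placeholders, not ReverseGen's API).

def collect_failure_data(proposer, target, judge, n_queries):
    """Keep only queries whose target responses the judge rejects."""
    training_data = []
    for _ in range(n_queries):
        query = proposer()
        response = target(query)
        if not judge(query, response):    # unsatisfactory response
            training_data.append(query)   # becomes new training signal
    return training_data

# Toy stand-ins: the target "fails" on any query containing a digit.
queries = iter(["q1", "plain", "q2"])
data = collect_failure_data(
    proposer=lambda: next(queries),
    target=lambda q: "bad" if any(c.isdigit() for c in q) else "ok",
    judge=lambda q, r: r == "ok",
    n_queries=3,
)
print(data)  # → ['q1', 'q2']
```

In the full method the proposer itself is trained to produce such failure cases, so the filter's yield improves over iterations rather than relying on random exploration.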