Accurate depth estimation under out-of-distribution (OoD) scenarios, such as
adverse weather conditions, sensor failure, and noise contamination, is
desirable for safety-critical applications. Existing depth estimation systems,
however, inevitably suffer from real-world corruptions and perturbations and
struggle to provide reliable depth predictions in such cases. In this
paper, we summarize the winning solutions from the RoboDepth Challenge -- an
academic competition designed to facilitate and advance robust OoD depth
estimation. This challenge was developed based on the newly established KITTI-C
and NYUDepth2-C benchmarks. We hosted two stand-alone tracks, with an emphasis
on robust self-supervised and robust fully-supervised depth estimation,
respectively. Out of more than two hundred participants, nine unique,
top-performing solutions emerged, with novel designs spanning the following
aspects: spatial- and frequency-domain augmentations, masked image
modeling, image restoration and super-resolution, adversarial training,
diffusion-based noise suppression, vision-language pre-training, learned model
ensembling, and hierarchical feature enhancement. Extensive experimental
analyses along with insightful observations are drawn to better understand the
rationale behind each design. We hope this challenge could lay a solid
foundation for future research on robust and reliable depth estimation and
beyond. The datasets, competition toolkit, workshop recordings, and source code
from the winning teams are publicly available on the challenge website.
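To make one of the listed design directions concrete, the sketch below shows a generic frequency-domain augmentation that mixes the low-frequency amplitude spectrum of one training image into another while keeping the original phase. It is an illustrative example of the augmentation family mentioned above, not code from any winning team, and the function name and `alpha` parameter are placeholders.

```python
import numpy as np

def freq_domain_mix(img_a: np.ndarray, img_b: np.ndarray, alpha: float = 0.3) -> np.ndarray:
    """Mix the low-frequency amplitude of img_b into img_a, keeping img_a's phase.

    Both inputs are HxWxC float arrays in [0, 1]; `alpha` controls the size of the
    swapped low-frequency band. Generic amplitude-mixing augmentation, shown only
    to illustrate the 'frequency-domain augmentation' idea mentioned above.
    """
    fft_a = np.fft.fft2(img_a, axes=(0, 1))
    fft_b = np.fft.fft2(img_b, axes=(0, 1))
    amp_a, pha_a = np.abs(fft_a), np.angle(fft_a)
    amp_b = np.abs(fft_b)

    # Swap a centered low-frequency block of the shifted amplitude spectrum.
    amp_a_shift = np.fft.fftshift(amp_a, axes=(0, 1))
    amp_b_shift = np.fft.fftshift(amp_b, axes=(0, 1))
    h, w = img_a.shape[:2]
    bh, bw = int(h * alpha) // 2, int(w * alpha) // 2
    ch, cw = h // 2, w // 2
    amp_a_shift[ch - bh:ch + bh, cw - bw:cw + bw] = \
        amp_b_shift[ch - bh:ch + bh, cw - bw:cw + bw]
    amp_mixed = np.fft.ifftshift(amp_a_shift, axes=(0, 1))

    mixed = np.fft.ifft2(amp_mixed * np.exp(1j * pha_a), axes=(0, 1))
    return np.clip(np.real(mixed), 0.0, 1.0)
```
The intent of such spectrum mixing is to expose a depth network during training to frequency statistics it would not otherwise see, which is the spirit of the spatial- and frequency-domain augmentations listed above.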
SynFacePAD 2023: Competition on Face Presentation Attack Detection Based
on Privacy-aware Synthetic Training Data
This paper presents a summary of the Competition on Face Presentation Attack
Detection Based on Privacy-aware Synthetic Training Data (SynFacePAD 2023) held
at the 2023 International Joint Conference on Biometrics (IJCB 2023). The
competition attracted a total of 8 participating teams with valid submissions
from academia and industry. The competition aimed to motivate and attract
solutions that detect face presentation attacks while relying on synthetic
training data, a choice motivated by the privacy, legal, and ethical concerns
associated with personal data. To achieve that, the training data used by the
participants was limited to synthetic data provided by the organizers. The
submitted solutions presented innovations and novel approaches that
outperformed the considered baseline on the investigated benchmarks.
Guided depth map super-resolution (GDSR), which aims to reconstruct a
high-resolution (HR) depth map from a low-resolution (LR) observation with the
help of a paired HR color image, is a longstanding and fundamental problem that
has attracted considerable attention from the computer vision and image
processing communities. A myriad of novel and effective approaches have been
proposed recently, especially with powerful deep learning techniques. This
survey presents a comprehensive review of recent progress in GDSR. We start
by summarizing the problem of GDSR and explaining why it is challenging. Next,
we introduce some commonly used datasets and image quality assessment methods.
In addition, we roughly classify existing GDSR methods into three categories,
i.e., filtering-based methods, prior-based methods, and learning-based methods.
In each category, we give a general description of the published algorithms
and their design principles, summarize the representative methods, and
discuss their highlights and limitations. Moreover, depth-related
applications are introduced. Furthermore, we conduct experiments to evaluate
the performance of some representative methods based on unified experimental
configurations, so as to offer a systematic and fair performance evaluation to
readers. Finally, we conclude this survey with possible directions and open
problems for further research. All the related materials can be found at
\url{https://github.com/zhwzhong/Guided-Depth-Map-Super-resolution-A-Survey}.
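To ground the problem statement, here is a minimal sketch of a classical filtering-based GDSR baseline, joint bilateral upsampling, in which the HR color image guides the interpolation of the LR depth map. This is a naive illustrative implementation, not one of the surveyed methods, and `sigma_s`, `sigma_r`, and `radius` are assumed parameters.

```python
import numpy as np

def joint_bilateral_upsample(depth_lr, color_hr, scale, sigma_s=2.0, sigma_r=0.1, radius=2):
    """Naive joint bilateral upsampling (filtering-based GDSR baseline).

    depth_lr: (h, w) low-resolution depth; color_hr: (H, W, 3) guidance image in
    [0, 1] with H = h * scale, W = w * scale. Spatial weights are computed on the
    LR grid; range weights come from the HR color image.
    """
    H, W = color_hr.shape[:2]
    out = np.zeros((H, W), dtype=np.float64)
    for y in range(H):
        for x in range(W):
            yl, xl = y / scale, x / scale            # position on the LR grid
            y0, x0 = int(round(yl)), int(round(xl))
            w_sum, v_sum = 0.0, 0.0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    yn, xn = y0 + dy, x0 + dx
                    if not (0 <= yn < depth_lr.shape[0] and 0 <= xn < depth_lr.shape[1]):
                        continue
                    # Spatial term in LR coordinates, range term in HR color space.
                    d_s = (yn - yl) ** 2 + (xn - xl) ** 2
                    c_hr = color_hr[min(int(yn * scale), H - 1), min(int(xn * scale), W - 1)]
                    d_r = np.sum((color_hr[y, x] - c_hr) ** 2)
                    w_k = np.exp(-d_s / (2 * sigma_s ** 2) - d_r / (2 * sigma_r ** 2))
                    w_sum += w_k
                    v_sum += w_k * depth_lr[yn, xn]
            out[y, x] = v_sum / max(w_sum, 1e-8)
    return out
```
Depth edges in the output follow color edges in the guidance image, which is the key idea behind filtering-based GDSR; learning-based methods replace the hand-crafted weights with learned ones.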
A Survey on Large Language Models for Code Generation
arXiv:2406.00515v2
Large Language Models (LLMs) have achieved remarkable advances across diverse
code-related tasks; such models, known as Code LLMs, are particularly prominent
in code generation, which produces source code from natural language
descriptions. This
burgeoning field has captured significant interest from both academic
researchers and industry professionals due to its practical significance in
software development, e.g., GitHub Copilot. Despite the active exploration of
LLMs for a variety of code tasks, either from the perspective of natural
language processing (NLP) or software engineering (SE) or both, there is a
noticeable absence of a comprehensive and up-to-date literature review
dedicated to LLMs for code generation. In this survey, we aim to bridge this gap
by providing a systematic literature review that serves as a valuable reference
for researchers investigating the cutting-edge progress in LLMs for code
generation. We introduce a taxonomy to categorize and discuss the recent
developments in LLMs for code generation, covering aspects such as data
curation, latest advances, performance evaluation, ethical implications,
environmental impact, and real-world applications. In addition, we present a
historical overview of the evolution of LLMs for code generation and offer an
empirical comparison using the HumanEval, MBPP, and BigCodeBench benchmarks
across various levels of difficulty and types of programming tasks to highlight
the progressive enhancements in LLM capabilities for code generation. We
identify critical challenges and promising opportunities regarding the gap
between academia and practical development. Furthermore, we have established a
dedicated resource GitHub page (https://github.com/juyongjiang/CodeLLMSurvey)
to continuously document and disseminate the most recent advances in the field.
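As a rough illustration of the functional-correctness evaluation behind benchmarks such as HumanEval and MBPP (this is not their official harness), the sketch below executes a model-generated completion against assert-based unit tests and reports pass@1; `generate` is a placeholder for any Code LLM call, and real harnesses add sandboxing and time limits.

```python
def passes_tests(candidate_code: str, test_code: str) -> bool:
    """Run a generated solution against its unit tests in a scratch namespace.

    NOTE: exec'ing model output is unsafe; real harnesses sandbox and time-limit
    this step. Shown only to illustrate pass@1-style functional evaluation.
    """
    namespace: dict = {}
    try:
        exec(candidate_code, namespace)   # define the generated function
        exec(test_code, namespace)        # run assert-based unit tests
        return True
    except Exception:
        return False

def pass_at_1(problems: list[dict], generate) -> float:
    """problems: [{'prompt': ..., 'tests': ...}]; generate: prompt -> code string."""
    solved = sum(passes_tests(generate(p["prompt"]), p["tests"]) for p in problems)
    return solved / max(len(problems), 1)
```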
Fairness Risks for Group-conditionally Missing Demographics
arXiv:2402.13393v2
Fairness-aware classification models have gained increasing attention in
recent years as concerns grow on discrimination against some demographic
groups. Most existing models require full knowledge of the sensitive features,
which can be impractical due to privacy, legal issues, and an individual's fear
of discrimination. The key challenge we address is that this unavailability is
group-dependent, e.g., people in some age ranges may be more reluctant to
reveal their age. Our solution augments general fairness risks with
probabilistic imputations of the sensitive features, while jointly learning the
group-conditionally missing probabilities in a variational auto-encoder. Our
model is demonstrated effective on both image and tabular datasets, achieving
an improved balance between accuracy and fairness.
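One way to read the core idea, replacing a hard fairness penalty with its expectation under imputed group probabilities, is sketched below; this is an illustrative interpretation rather than the paper's implementation, and the demographic-parity surrogate is an assumed choice of fairness risk.

```python
import numpy as np

def expected_dp_gap(scores: np.ndarray, group_probs: np.ndarray) -> float:
    """Soft demographic-parity gap with probabilistically imputed groups.

    scores:      (n,) model scores in [0, 1] (e.g., sigmoid outputs).
    group_probs: (n, k) imputed probabilities that each sample belongs to each of
                 k sensitive groups (e.g., produced by a VAE-style imputer when
                 the demographic is missing; a one-hot row when it is observed).
    Returns the largest pairwise gap between group-wise expected scores.
    """
    # Soft group sizes and soft group-conditional mean scores.
    group_mass = group_probs.sum(axis=0) + 1e-8            # (k,)
    group_mean = (group_probs * scores[:, None]).sum(axis=0) / group_mass
    return float(group_mean.max() - group_mean.min())

# Training would then minimize: classification_loss + lam * expected_dp_gap(...)
```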
Narrative Feature or Structured Feature? A Study of Large Language
Models to Identify Cancer Patients at Risk of Heart Failure
Cancer treatments are known to introduce cardiotoxicity, negatively impacting
outcomes and survivorship. Identifying cancer patients at risk of heart failure
(HF) is critical to improving cancer treatment outcomes and safety. This study
examined machine learning (ML) models to identify cancer patients at risk of HF
using electronic health records (EHRs), including traditional ML, Time-Aware
long short-term memory (T-LSTM), and large language models (LLMs) using novel
narrative features derived from the structured medical codes. We identified a
cancer cohort of 12,806 patients from the University of Florida Health,
diagnosed with lung, breast, and colorectal cancers, among which 1,602
individuals developed HF after cancer. The LLM, GatorTron-3.9B, achieved the
best F1 scores, outperforming the traditional support vector machines by 39%,
the T-LSTM deep learning model by 7%, and a widely used transformer model,
BERT, by 5.6%. The analysis shows that the proposed narrative features
remarkably increased feature density and improved performance.
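A hedged sketch of what narrative features derived from structured medical codes might look like in practice is given below; the code-to-description mapping and the sentence template are invented for illustration and are not taken from the study.

```python
# Hypothetical mapping from structured codes to plain-language descriptions.
CODE_DESCRIPTIONS = {
    "C34.90": "malignant neoplasm of the lung",
    "E11.9": "type 2 diabetes mellitus",
    "I10": "essential hypertension",
}

def codes_to_narrative(visits: list[dict]) -> str:
    """Turn a time-ordered list of coded visits into a narrative paragraph that
    can be fed to a clinical language model instead of sparse one-hot codes."""
    sentences = []
    for visit in visits:
        described = [CODE_DESCRIPTIONS.get(c, c) for c in visit["codes"]]
        sentences.append(
            f"On {visit['date']}, the patient was diagnosed with "
            + ", ".join(described) + "."
        )
    return " ".join(sentences)

# Example:
# codes_to_narrative([{"date": "2019-03-02", "codes": ["C34.90", "I10"]}])
```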
FEED: Fairness-Enhanced Meta-Learning for Domain Generalization
Generalizing to out-of-distribution data while being aware of model fairness
is a significant and challenging problem in meta-learning. The goal is to learn
a set of fairness-aware, invariant parameters for a classifier trained on data
drawn from a family of related training domains, which exhibit distribution
shift on non-sensitive features as well as different levels of dependence
between model predictions and sensitive features, so that the classifier
generalizes well to unknown but distinct test domains. To tackle this
challenge, existing state-of-the-art methods
either address the domain generalization problem but completely ignore learning
with fairness or solely specify shifted domains with various fairness levels.
This paper introduces an approach to fairness-aware meta-learning that
significantly enhances domain generalization capabilities. Our framework,
Fairness-Enhanced Meta-Learning for Domain Generalization (FEED), disentangles
latent data representations into content, style, and sensitive vectors. This
disentanglement facilitates the robust generalization of machine learning
models across diverse domains while adhering to fairness constraints. Unlike
traditional methods that focus primarily on domain invariance or sensitivity to
shifts, our model integrates a fairness-aware invariance criterion directly
into the meta-learning process. This integration ensures that the learned
parameters uphold fairness consistently, even when domain characteristics vary
widely. We validate our approach through extensive experiments across multiple
benchmarks, demonstrating not only superior performance in maintaining high
accuracy and fairness but also significant improvements over existing
state-of-the-art methods in domain generalization tasks.
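To make the disentanglement idea concrete, the following minimal PyTorch-style sketch splits the latent representation into content, style, and sensitive vectors and adds a fairness penalty to the task loss; the architecture, loss form, and weighting are assumptions for illustration, not the FEED implementation.

```python
import torch
import torch.nn as nn

class DisentangledEncoder(nn.Module):
    """Encoder that splits features into content, style, and sensitive parts."""
    def __init__(self, in_dim: int, content_dim: int, style_dim: int, sens_dim: int):
        super().__init__()
        self.backbone = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU())
        self.content_head = nn.Linear(256, content_dim)  # task-relevant, domain-invariant
        self.style_head = nn.Linear(256, style_dim)      # domain-specific nuisance
        self.sens_head = nn.Linear(256, sens_dim)        # captures sensitive information

    def forward(self, x):
        h = self.backbone(x)
        return self.content_head(h), self.style_head(h), self.sens_head(h)

def fairness_penalty(logits: torch.Tensor, groups: torch.Tensor) -> torch.Tensor:
    """Gap between group-wise mean positive scores (demographic-parity surrogate)."""
    probs = logits.softmax(dim=-1)[:, 1]
    means = [probs[groups == g].mean() for g in groups.unique()]
    return torch.stack(means).max() - torch.stack(means).min()

# Per meta-learning step (illustrative): classify from content only and penalize unfairness.
# loss = task_loss(classifier(content), y) + lam * fairness_penalty(classifier(content), s)
```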
Protecting Privacy in Multimodal Large Language Models with MLLMU-Bench
Generative models such as Large Language Models (LLMs) and Multimodal Large
Language Models (MLLMs) trained on massive web corpora can memorize and
disclose individuals' confidential and private data, raising legal and ethical
concerns. While many previous works have addressed this issue in LLMs via
machine unlearning, it remains largely unexplored for MLLMs. To tackle this
challenge, we introduce Multimodal Large Language Model Unlearning Benchmark
(MLLMU-Bench), a novel benchmark aimed at advancing the understanding of
multimodal machine unlearning. MLLMU-Bench consists of 500 fictitious profiles
and 153 profiles of public celebrities, each featuring over 14
customized question-answer pairs, evaluated from both multimodal (image+text)
and unimodal (text) perspectives. The benchmark is divided into four sets to
assess unlearning algorithms in terms of efficacy, generalizability, and model
utility. Finally, we provide baseline results using existing generative model
unlearning algorithms. Surprisingly, our experiments show that unimodal
unlearning algorithms excel in generation and cloze tasks, while multimodal
unlearning approaches perform better in classification tasks with multimodal
inputs.
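A small sketch of how the trade-off between unlearning efficacy and preserved model utility could be summarized from per-split accuracies is shown below; the split names mirror the abstract's description, but the scoring is illustrative and not MLLMU-Bench's official protocol.

```python
def unlearning_report(accuracy: dict) -> dict:
    """Summarize an unlearning run from per-split accuracies.

    accuracy: {'forget': ..., 'test_generalization': ..., 'retain': ..., 'real_world': ...}
    Lower accuracy on the forget split indicates better unlearning efficacy;
    accuracy on the retain and real-world splits measures preserved model utility.
    """
    return {
        "efficacy": 1.0 - accuracy["forget"],
        "generalizability": 1.0 - accuracy["test_generalization"],
        "utility": (accuracy["retain"] + accuracy["real_world"]) / 2.0,
    }
```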
GLBench: A Comprehensive Benchmark for Graph with Large Language Models
arXiv:2407.07457v4
The emergence of large language models (LLMs) has revolutionized the way we
interact with graphs, leading to a new paradigm called GraphLLM. Despite the
rapid development of GraphLLM methods in recent years, the progress and
understanding of this field remain unclear due to the lack of a benchmark with
consistent experimental protocols. To bridge this gap, we introduce GLBench,
the first comprehensive benchmark for evaluating GraphLLM methods in both
supervised and zero-shot scenarios. GLBench provides a fair and thorough
evaluation of different categories of GraphLLM methods, along with traditional
baselines such as graph neural networks. Through extensive experiments on a
collection of real-world datasets with consistent data processing and splitting
strategies, we have uncovered several key findings. Firstly, GraphLLM methods
outperform traditional baselines in supervised settings, with LLM-as-enhancers
showing the most robust performance. However, using LLMs as predictors is less
effective and often leads to uncontrollable output issues. We also notice that
no clear scaling laws exist for current GraphLLM methods. In addition, both
structures and semantics are crucial for effective zero-shot transfer, and our
proposed simple baseline can even outperform several models tailored for
zero-shot scenarios. The data and code of the benchmark can be found at
https://github.com/NineAbyss/GLBench.
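As an illustration of the LLM-as-enhancer pattern reported as most robust (a generic sketch, not GLBench code), node text is embedded offline by an LLM or text encoder, and a standard GNN then classifies nodes from those embeddings; the layer sizes and normalization are assumptions.

```python
import torch
import torch.nn as nn

class GCNLayer(nn.Module):
    """Minimal graph convolution: aggregate neighbor features, then project."""
    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.lin = nn.Linear(in_dim, out_dim)

    def forward(self, x: torch.Tensor, adj_norm: torch.Tensor) -> torch.Tensor:
        return torch.relu(self.lin(adj_norm @ x))

class EnhancedNodeClassifier(nn.Module):
    """LLM-as-enhancer pipeline: frozen text embeddings -> 2-layer GCN -> classes."""
    def __init__(self, emb_dim: int, hidden: int, num_classes: int):
        super().__init__()
        self.gcn1 = GCNLayer(emb_dim, hidden)
        self.gcn2 = GCNLayer(hidden, hidden)
        self.head = nn.Linear(hidden, num_classes)

    def forward(self, node_text_emb: torch.Tensor, adj_norm: torch.Tensor):
        # node_text_emb: (N, emb_dim) embeddings of each node's text produced offline
        # by an LLM or text encoder (the "enhancer"); adj_norm: (N, N) normalized adjacency.
        h = self.gcn1(node_text_emb, adj_norm)
        h = self.gcn2(h, adj_norm)
        return self.head(h)
```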
arXiv:2411.00827v1
As large Vision-Language Models (VLMs) continue to gain prominence, ensuring
their safe deployment in real-world applications has become a critical
concern. Recently, significant research efforts have focused on evaluating the
robustness of VLMs against jailbreak attacks. Due to challenges in obtaining
multi-modal data, current studies often assess VLM robustness by generating
adversarial or query-relevant images based on harmful text datasets. However,
the jailbreak images generated this way exhibit certain limitations.
Adversarial images require white-box access to the target VLM and are
relatively easy to defend against, while query-relevant images must be linked
to the target harmful content, limiting their diversity and effectiveness. In
this paper, we propose a novel jailbreak method named IDEATOR, which
autonomously generates malicious image-text pairs for black-box jailbreak
attacks. IDEATOR is a VLM-based approach inspired by our conjecture that a VLM
itself might be a powerful red team model for generating jailbreak prompts.
Specifically, IDEATOR employs a VLM to generate jailbreak texts while
leveraging a state-of-the-art diffusion model to create corresponding jailbreak
images. Extensive experiments demonstrate the high effectiveness and
transferability of IDEATOR. It successfully jailbreaks MiniGPT-4 with a 94%
success rate and transfers seamlessly to LLaVA and InstructBLIP, achieving high
success rates of 82% and 88%, respectively. IDEATOR uncovers previously
unrecognized vulnerabilities in VLMs, calling for advanced safety mechanisms.