The goal of multi-objective optimization (MOO) is to learn under multiple,
potentially conflicting, objectives. One widely used technique to tackle MOO is
through linear scalarization, where one fixed preference vector is used to
combine the objectives into a single scalar value for optimization. However,
recent work (Hu et al., 2024) has shown that linear scalarization often fails
to capture the non-convex regions of the Pareto front and thus cannot recover
the complete set of Pareto-optimal solutions. In light of these limitations,
this paper focuses on Tchebycheff scalarization, which optimizes the
worst-case objective. In particular, we propose an online mirror descent
algorithm for Tchebycheff scalarization, which we call OMD-TCH. We show that
OMD-TCH enjoys a convergence rate of O(√(log m / T)), where m is the number
of objectives and T is the number of iteration rounds. We also propose
a novel adaptive online-to-batch conversion scheme that significantly improves
the practical performance of OMD-TCH while maintaining the same convergence
guarantees. We demonstrate the effectiveness of OMD-TCH and the adaptive
conversion scheme on both synthetic problems and federated learning tasks under
fairness constraints, showing state-of-the-art performance.
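To make the setup concrete, below is a minimal NumPy sketch of Tchebycheff-style
min-max training with an online-mirror-descent (exponentiated-gradient) update on
the objective weights. The toy quadratic objectives, the ideal point z_star, the
step sizes, and the iteration count are illustrative assumptions; this is a sketch
of the underlying min-max structure, not the paper's OMD-TCH algorithm or its
adaptive online-to-batch conversion.

```python
import numpy as np

# Two toy quadratic objectives with conflicting minimizers (illustrative assumptions).
def objectives(x):
    return np.array([np.sum((x - 1.0) ** 2), np.sum((x + 1.0) ** 2)])

def objective_grads(x):
    return np.stack([2.0 * (x - 1.0), 2.0 * (x + 1.0)])

z_star = np.zeros(2)      # assumed ideal point
x = np.zeros(3)           # model parameters
lam = np.ones(2) / 2      # simplex weights over the m = 2 objectives
eta_x, eta_lam, T = 0.05, 0.5, 200

for t in range(T):
    gaps = objectives(x) - z_star
    # Mirror ascent (exponentiated gradient) on the simplex: mass shifts toward
    # the currently worst objective, approximating the Tchebycheff max.
    lam *= np.exp(eta_lam * gaps)
    lam /= lam.sum()
    # Gradient step on the parameters using the lambda-weighted objective gradients.
    x -= eta_x * (lam @ objective_grads(x))

print("final parameters:", x, "objective values:", objectives(x))
```

A standard online-to-batch conversion would simply average the iterates across the
T rounds; the adaptive scheme described above modifies that step to improve
practical performance while keeping the same guarantees.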
7th ICML Workshop on Automated Machine Learning (2020)
The CASH problem has been widely studied in the context of automated
configurations of machine learning (ML) pipelines and various solvers and
toolkits are available. However, CASH solvers do not directly handle
black-box constraints such as fairness, robustness, or other domain-specific
custom constraints. We present our recent approach [Liu et al., 2020], which
leverages the ADMM optimization framework to decompose CASH into multiple
smaller sub-problems, and demonstrate how ADMM facilitates the incorporation
of black-box constraints.
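For readers unfamiliar with the mechanics, the snippet below is a generic
consensus-ADMM loop on a toy problem (min f(x) + g(z) subject to x = z, with
quadratic f and g). It only illustrates the alternating split-and-coordinate
structure ADMM provides; the actual CASH decomposition and black-box constraint
handling of Liu et al. (2020) are not reproduced here.

```python
import numpy as np

# Consensus ADMM for min f(x) + g(z) s.t. x = z, with toy quadratics
# f(x) = ||x - a||^2 and g(z) = ||z - b||^2 (illustrative assumptions).
a, b = np.array([1.0, 2.0]), np.array([3.0, 0.0])
rho = 1.0                      # augmented-Lagrangian penalty parameter
x = z = u = np.zeros(2)        # primal blocks and scaled dual variable

for _ in range(100):
    # x-update: argmin_x ||x - a||^2 + (rho/2)||x - z + u||^2  (closed form)
    x = (2 * a + rho * (z - u)) / (2 + rho)
    # z-update: argmin_z ||z - b||^2 + (rho/2)||x - z + u||^2  (closed form)
    z = (2 * b + rho * (x + u)) / (2 + rho)
    # dual ascent on the scaled multiplier enforcing x = z
    u = u + x - z

print("x:", x, "z:", z)        # both blocks converge to (a + b) / 2
```

The appeal in a CASH-style setting is that each block update can be handled by
its own simpler solver, which is part of what makes it easier to attach
constraints to individual sub-problems.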
Revisiting, Benchmarking and Understanding Unsupervised Graph Domain
Adaptation
Unsupervised Graph Domain Adaptation (UGDA) involves the transfer of
knowledge from a label-rich source graph to an unlabeled target graph under
domain discrepancies. Despite the proliferation of methods designed for this
emerging task, the lack of standard experimental settings and fair
performance comparisons makes it challenging to understand which models
perform well, and when, across different scenarios. To fill this gap, we
present the first
comprehensive benchmark for unsupervised graph domain adaptation named
GDABench, which encompasses 16 algorithms across 5 datasets with 74 adaptation
tasks. Through extensive experiments, we observe that the performance of
current UGDA models varies significantly across different datasets and
adaptation scenarios. Specifically, we recognize that when the source and
target graphs face significant distribution shifts, it is imperative to
formulate strategies to effectively address and mitigate graph structural
shifts. We also find that with appropriate neighbourhood aggregation
mechanisms, simple GNN variants can even surpass state-of-the-art UGDA
baselines. To facilitate reproducibility, we have developed an easy-to-use
library, PyGDA, for training and evaluating existing UGDA methods, providing
a standardized platform for the community. Our source code and datasets can
be found at: https://github.com/pygda-team/pygda.
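As an illustration of the "simple GNN variants" with appropriate neighbourhood
aggregation mentioned above, here is a minimal NumPy sketch of a single
mean-aggregation message-passing layer. The toy graph, feature sizes, and random
weights are illustrative assumptions, and this is not PyGDA's API.

```python
import numpy as np

def mean_aggregation_layer(adj, feats, weight):
    """One simple GNN layer: average neighbour (and self) features, then
    apply a linear transform and a ReLU. adj is a dense 0/1 adjacency matrix."""
    adj_self = adj + np.eye(adj.shape[0])          # add self-loops
    deg = adj_self.sum(axis=1, keepdims=True)      # node degrees
    aggregated = adj_self @ feats / deg            # mean over the neighbourhood
    return np.maximum(aggregated @ weight, 0.0)    # linear transform + ReLU

# Toy 4-node path graph and random features (illustrative assumptions).
rng = np.random.default_rng(0)
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
feats = rng.normal(size=(4, 8))
weight = rng.normal(size=(8, 16))
print(mean_aggregation_layer(adj, feats, weight).shape)  # (4, 16)
```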
LongSafetyBench: Long-Context LLMs Struggle with Safety Issues
arXiv:2411.06899v1
With the development of large language models (LLMs), the sequence length of
these models continues to increase, drawing significant attention to
long-context language models. However, the evaluation of these models has been
primarily limited to their capabilities, with a lack of research focusing on
their safety. Existing work, such as ManyShotJailbreak, has to some extent
demonstrated that long-context language models can exhibit safety concerns.
However, the methods used are limited and lack comprehensiveness. In response,
we introduce LongSafetyBench, the first benchmark designed to
objectively and comprehensively evaluate the safety of long-context models.
LongSafetyBench consists of 10 task categories, with an average length of
41,889 words. After testing eight long-context language models on
LongSafetyBench, we found that existing models generally exhibit insufficient
safety capabilities. The proportion of safe responses from most mainstream
long-context LLMs is below 50%. Moreover, models' safety performance in
long-context scenarios does not always align with that in short-context
scenarios. Further investigation revealed that long-context models tend to
overlook harmful content within lengthy texts. We also proposed a simple yet
effective solution, allowing open-source models to achieve performance
comparable to that of top-tier closed-source models. We believe that
LongSafetyBench can serve as a valuable benchmark for evaluating the safety
capabilities of long-context language models. We hope that our work will
encourage the broader community to pay attention to the safety of long-context
models and contribute to the development of solutions to improve the safety of
long-context LLMs.
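As a small illustration of the headline metric (the proportion of safe responses
per model), the snippet below computes a safe-response rate from hypothetical
per-example safety judgments; the model names and labels are made up and this is
not LongSafetyBench's evaluation code.

```python
# Hypothetical per-example safety judgments (1 = safe, 0 = unsafe) per model.
judgments = {
    "model_a": [1, 0, 0, 1, 0, 1, 0, 0],
    "model_b": [1, 1, 0, 1, 1, 0, 1, 1],
}
for model, labels in judgments.items():
    safe_rate = sum(labels) / len(labels)
    print(f"{model}: {safe_rate:.1%} safe responses")
```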
CIMRL: Combining IMitation and Reinforcement Learning for Safe
Autonomous Driving
arXiv:2406.08878v4
Modern approaches to autonomous driving rely heavily on learned components
trained with large amounts of human driving data via imitation learning.
However, these methods require large amounts of expensive data collection and
even then face challenges with safely handling long-tail scenarios and
compounding errors over time. At the same time, pure Reinforcement Learning
(RL) methods can fail to learn performant policies in sparse, constrained, and
challenging-to-define reward settings such as autonomous driving. Both of
these challenges make it difficult to deploy purely cloned or pure RL
policies in safety-critical applications such as autonomous vehicles. In this
paper, we propose the Combining IMitation and Reinforcement Learning (CIMRL)
approach, a safe reinforcement learning framework that enables training
driving policies in simulation by leveraging imitative motion priors and
safety constraints.
CIMRL does not require extensive reward specification and improves on the
closed-loop behavior of pure cloning methods. By combining RL and imitation,
we demonstrate that our method achieves state-of-the-art results in
closed-loop simulation and real-world driving benchmarks.
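As a generic illustration of mixing an imitation term with an RL term (not
CIMRL's actual formulation, which relies on imitative motion priors and safety
constraints in closed-loop simulation), the sketch below combines a
REINFORCE-style surrogate with a behavioral-cloning regularizer; the inputs and
the mixing weight are illustrative assumptions.

```python
import numpy as np

def combined_policy_loss(logp_taken, advantages, logp_expert, bc_weight=0.5):
    """Policy-gradient surrogate plus a behavioral-cloning regularizer.
    logp_taken: log-probs of actions taken by the current policy in rollouts
    advantages: estimated advantages for those actions
    logp_expert: log-probs the policy assigns to expert (human) actions
    """
    rl_loss = -np.mean(logp_taken * advantages)   # REINFORCE-style surrogate
    bc_loss = -np.mean(logp_expert)               # imitate the expert actions
    return rl_loss + bc_weight * bc_loss

# Toy numbers standing in for one training batch (illustrative assumptions).
rng = np.random.default_rng(0)
logp_taken = np.log(rng.uniform(0.1, 1.0, size=32))
advantages = rng.normal(size=32)
logp_expert = np.log(rng.uniform(0.1, 1.0, size=32))
print(combined_policy_loss(logp_taken, advantages, logp_expert))
```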
A Comprehensive Survey and Guide to Multimodal Large Language Models in
Vision-Language Tasks
arXiv:2411.06284v1
This survey and application guide to multimodal large language models (MLLMs)
explores the rapidly developing field of MLLMs, examining their architectures,
applications, and impact on AI and Generative Models. Starting with
foundational concepts, we delve into how MLLMs integrate various data types,
including text, images, video, and audio, to enable complex AI systems for
cross-modal understanding and generation. The guide covers essential topics
such as
training methods, architectural components, and practical applications in
various fields, from visual storytelling to enhanced accessibility. Through
detailed case studies and technical analysis, the text examines prominent MLLM
implementations while addressing key challenges in scalability, robustness, and
cross-modal learning. Concluding with a discussion of ethical considerations,
responsible AI development, and future directions, this authoritative resource
provides both theoretical frameworks and practical insights. It offers a
balanced perspective on the opportunities and challenges in the development and
deployment of MLLMs, and is highly valuable for researchers, practitioners, and
students interested in the intersection of natural language processing and
computer vision.
OpenCoder: The Open Cookbook for Top-Tier Code Large Language Models
arXiv:2411.04905v2
Large language models (LLMs) for code have become indispensable in various
domains, including code generation, reasoning tasks and agent systems. While
open-access code LLMs are increasingly approaching the performance levels of
proprietary models, high-quality code LLMs suitable for rigorous scientific
investigation, particularly those with reproducible data processing pipelines
and transparent training protocols, remain limited. This scarcity stems from
various challenges, including resource constraints, ethical considerations, and
the competitive advantages of keeping models advanced. To address the gap, we
introduce OpenCoder, a top-tier code LLM that not only achieves performance
comparable to leading models but also serves as an "open cookbook" for the
research community. Unlike most prior efforts, we release not only model
weights and inference code, but also the reproducible training data, complete
data processing pipeline, rigorous experimental ablation results, and detailed
training protocols for open scientific research. Through this comprehensive
release, we identify the key ingredients for building a top-tier code LLM: (1)
code-optimized heuristic rules for data cleaning and methods for data
deduplication, (2) recall of text corpora related to code, and (3)
high-quality synthetic data in both the annealing and supervised fine-tuning
stages. By offering
this level of openness, we aim to broaden access to all aspects of a top-tier
code LLM, with OpenCoder serving as both a powerful model and an open
foundation to accelerate research and enable reproducible advancements in
code AI.
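To give a flavor of the deduplication ingredient, here is a minimal sketch of
exact, hash-based document deduplication with whitespace normalization; it is a
generic stand-in, not OpenCoder's actual cleaning or deduplication pipeline.

```python
import hashlib

def exact_dedup(documents):
    """Keep the first occurrence of each document, keyed by a content hash of
    its whitespace-normalized text. A toy stand-in for real dedup stages."""
    seen, kept = set(), []
    for doc in documents:
        key = hashlib.sha256(" ".join(doc.split()).encode("utf-8")).hexdigest()
        if key not in seen:
            seen.add(key)
            kept.append(doc)
    return kept

corpus = ["def add(a, b):\n    return a + b",
          "def add(a, b):  \n    return a + b",   # same code, extra whitespace
          "print('hello world')"]
print(len(exact_dedup(corpus)))  # 2
```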
Achievable Fairness on Your Data With Utility Guarantees
arXiv:2402.17106v4
In machine learning fairness, training models that minimize disparity across
different sensitive groups often leads to diminished accuracy, a phenomenon
known as the fairness-accuracy trade-off. The severity of this trade-off
inherently depends on dataset characteristics such as dataset imbalances or
biases and therefore, using a uniform fairness requirement across diverse
datasets remains questionable. To address this, we present a computationally
efficient approach to approximate the fairness-accuracy trade-off curve
tailored to individual datasets, backed by rigorous statistical guarantees. By
utilizing the You-Only-Train-Once (YOTO) framework, our approach mitigates the
computational burden of having to train multiple models when approximating the
trade-off curve. Crucially, we introduce a novel methodology for quantifying
uncertainty in our estimates, thereby providing practitioners with a robust
framework for auditing model fairness while avoiding false conclusions due to
estimation errors. Our experiments spanning tabular (e.g., Adult), image
(CelebA), and language (Jigsaw) datasets underscore that our approach not only
reliably quantifies the optimum achievable trade-offs across various data
modalities but also helps detect suboptimality in SOTA fairness methods.
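To make the trade-off curve concrete, the sketch below shows the kind of
scalarized objective one could sweep to trace it: cross-entropy plus lambda times
a demographic-parity gap. Naively, every lambda would require retraining a model,
which is exactly the cost the YOTO-based approach avoids; the penalty choice and
the toy data here are illustrative assumptions, not the paper's method.

```python
import numpy as np

def fairness_accuracy_loss(probs, labels, groups, lam):
    """Scalarized objective: binary cross-entropy plus lam times a
    demographic-parity gap (difference in mean predicted positive rate
    between two sensitive groups)."""
    eps = 1e-9
    ce = -np.mean(labels * np.log(probs + eps)
                  + (1 - labels) * np.log(1 - probs + eps))
    gap = abs(probs[groups == 0].mean() - probs[groups == 1].mean())
    return ce + lam * gap

# Toy predictions, labels, and group membership (illustrative assumptions).
rng = np.random.default_rng(0)
probs = rng.uniform(0.05, 0.95, size=200)
labels = rng.integers(0, 2, size=200)
groups = rng.integers(0, 2, size=200)
for lam in (0.0, 0.5, 2.0):
    print(lam, round(fairness_accuracy_loss(probs, labels, groups, lam), 3))
```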
GlitchMiner: Mining Glitch Tokens in Large Language Models via
Gradient-based Discrete Optimization
arXiv:2410.15052v4
Glitch tokens in Large Language Models (LLMs) can trigger unpredictable
behaviors, threatening model reliability and safety. Existing detection methods
rely on predefined patterns, limiting their adaptability across diverse LLM
architectures. We propose GlitchMiner, a gradient-based discrete optimization
framework that efficiently identifies glitch tokens by introducing entropy as a
measure of prediction uncertainty and employing a local search strategy to
explore the token space. Experiments across multiple LLM architectures
demonstrate that GlitchMiner outperforms existing methods in detection accuracy
and adaptability, achieving an average efficiency improvement of over 10%.
This
method enhances vulnerability assessment in LLMs, contributing to the
development of more robust and reliable applications. Code is available at
https://github.com/wooozihui/GlitchMiner.
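The sketch below illustrates the entropy-guided local search idea on a toy
vocabulary: a stand-in scoring function computes the entropy of a fake next-token
distribution, and a greedy search moves through embedding-space neighbourhoods
toward higher entropy. GlitchMiner itself is gradient-based and queries a real
LLM; the embeddings, scoring function, and neighbourhood rule here are
illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, dim = 500, 16
embeddings = rng.normal(size=(vocab_size, dim))   # toy token embeddings

def prediction_entropy(token_id):
    """Entropy of a toy next-token distribution conditioned on token_id.
    A real implementation would query the LLM; this is a stand-in."""
    logits = embeddings @ embeddings[token_id]
    p = np.exp(logits - logits.max())
    p /= p.sum()
    return float(-(p * np.log(p + 1e-12)).sum())

def local_search(start, steps=20, k=10):
    """Greedy local search: repeatedly move to the nearby token (by embedding
    distance) with the highest prediction entropy, a proxy for 'glitchiness'."""
    current = start
    for _ in range(steps):
        dists = np.linalg.norm(embeddings - embeddings[current], axis=1)
        neighbours = np.argsort(dists)[1:k + 1]   # k nearest tokens, excluding self
        best = max(neighbours, key=prediction_entropy)
        if prediction_entropy(best) <= prediction_entropy(current):
            break
        current = best
    return current, prediction_entropy(current)

print(local_search(start=0))
```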
Counterfactual Fairness by Combining Factual and Counterfactual
Predictions
In high-stake domains such as healthcare and hiring, the role of machine
learning (ML) in decision-making raises significant fairness concerns. This
work focuses on Counterfactual Fairness (CF), which posits that an ML model's
outcome on any individual should remain unchanged if they had belonged to a
different demographic group. Prior work has proposed methods that guarantee
CF; however, their effects on the model's predictive performance remain
largely unclear. To fill this gap, we provide a theoretical study of the
inherent trade-off between CF and predictive performance in a model-agnostic
manner. We first propose a simple but effective method to cast an optimal but
potentially unfair predictor into a fair one without losing optimality. By
analyzing the excess risk it incurs to achieve CF, we quantify this inherent
trade-off. We further analyze our method's performance when only incomplete
causal knowledge is available and, building on this analysis, propose a
performant algorithm for such scenarios. Experiments on both synthetic and
semi-synthetic datasets demonstrate the validity of our analysis and methods.
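As a minimal illustration of the idea in the title (combining factual and
counterfactual predictions), the snippet below averages a predictor's outputs on
an individual's observed features and on a counterfactual version with the
sensitive attribute flipped, so the combined output no longer depends on which
version was observed. The averaging rule, the toy linear predictor, and the
hand-made counterfactual are illustrative assumptions, not the paper's exact
construction.

```python
import numpy as np

def counterfactually_fair_predict(predictor, x_factual, x_counterfactual):
    """Combine predictions on the factual and counterfactual versions of an
    individual. Averaging is one simple combination rule; the paper's exact
    rule and its counterfactual generation are not reproduced here."""
    return 0.5 * (predictor(x_factual) + predictor(x_counterfactual))

# Toy linear predictor; the first feature plays the sensitive attribute's role.
weights = np.array([2.0, 0.5, -1.0])
predictor = lambda x: float(weights @ x)
x_factual = np.array([1.0, 0.3, 0.7])          # individual as observed
x_counterfactual = np.array([0.0, 0.3, 0.7])   # same individual, flipped group
print(counterfactually_fair_predict(predictor, x_factual, x_counterfactual))
```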