arXiv:2201.13001v8
Deep discriminative approaches like random forests and deep neural networks
have recently found applications in many important real-world scenarios.
However, deploying these learning algorithms in safety-critical applications
raises concerns, particularly when it comes to ensuring confidence calibration
for both in-distribution and out-of-distribution data points. Many popular
methods for in-distribution (ID) calibration, such as isotonic and Platt's
sigmoidal regression, exhibit excellent ID calibration performance. However,
these methods are not calibrated for the entire feature space, leading to
overconfidence on out-of-distribution (OOD) samples. At the other end of the
spectrum, existing OOD calibration methods generally exhibit poor ID
calibration. In this paper, we
address the ID and OOD calibration problems jointly. We leverage the fact that
deep models, including both random forests and deep networks, learn internal
representations that are unions of polytopes with affine activation functions,
and conceptualize both as partitioning rules of the feature space. We
replace the affine function in each polytope populated by the training data
with a Gaussian kernel. Our experiments on both tabular and vision benchmarks
show that the proposed approaches obtain well-calibrated posteriors while
mostly preserving or improving the classification accuracy of the original
algorithm in the ID region, and extrapolate beyond the training data to handle
OOD inputs appropriately.
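To make the construction concrete, here is a minimal sketch, not the authors' exact method: each random-forest leaf is treated as a polytope, and each leaf's contribution to the posterior is weighted by a Gaussian kernel centred on the mean of the training points that populate it, so predictions decay toward the class prior away from the training data. The bandwidth, the prior-shrinkage rule, and all names below are assumptions.

```python
# Sketch: Gaussian-kernel posteriors over random-forest leaf polytopes.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=2, n_redundant=0,
                           random_state=0)
rf = RandomForestClassifier(n_estimators=20, random_state=0).fit(X, y)

# For every (tree, leaf) polytope, store the mean of its training points
# (the Gaussian kernel centre) and the class counts in that leaf.
leaf_stats = {}
for t, tree in enumerate(rf.estimators_):
    leaves = tree.apply(X)
    for leaf in np.unique(leaves):
        pts = X[leaves == leaf]
        counts = np.bincount(y[leaves == leaf], minlength=2)
        leaf_stats[(t, leaf)] = (pts.mean(axis=0), counts)

def kernel_posterior(x, bandwidth=1.0):
    """Kernel-weighted class posterior; the weight decays with distance to
    the activated leaf's centre, so far-OOD points revert to the prior."""
    num, den = np.zeros(2), 0.0
    for t, tree in enumerate(rf.estimators_):
        leaf = tree.apply(x.reshape(1, -1))[0]
        centre, counts = leaf_stats[(t, leaf)]
        w = np.exp(-np.sum((x - centre) ** 2) / (2 * bandwidth ** 2))
        num += w * counts / counts.sum()
        den += w
    prior = np.bincount(y, minlength=2) / len(y)
    return (num + prior) / (den + 1.0)  # shrink toward the prior when den is small

print(kernel_posterior(np.array([0.0, 0.0])))    # ID point: confident posterior
print(kernel_posterior(np.array([50.0, 50.0])))  # far-OOD point: near the prior
```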
OpenCoder: The Open Cookbook for Top-Tier Code Large Language Models
arXiv:2411.04905v2
Large language models (LLMs) for code have become indispensable in various
domains, including code generation, reasoning tasks and agent systems. While
open-access code LLMs are increasingly approaching the performance levels of
proprietary models, high-quality code LLMs suitable for rigorous scientific
investigation, particularly those with reproducible data processing pipelines
and transparent training protocols, remain limited. This scarcity is due to
various challenges, including resource constraints, ethical considerations, and
the competitive advantages of keeping models advanced. To address this gap, we
introduce OpenCoder, a top-tier code LLM that not only achieves performance
comparable to leading models but also serves as an "open cookbook" for the
research community. Unlike most prior efforts, we release not only model
weights and inference code, but also the reproducible training data, complete
data processing pipeline, rigorous experimental ablation results, and detailed
training protocols for open scientific research. Through this comprehensive
release, we identify the key ingredients for building a top-tier code LLM: (1)
code-optimized heuristic rules for data cleaning and methods for data
deduplication, (2) recall of text corpora related to code, and (3) high-quality
synthetic data in both annealing and supervised fine-tuning stages. By offering
this level of openness, we aim to broaden access to all aspects of a top-tier
code LLM, with OpenCoder serving as both a powerful model and an open
foundation to accelerate research and enable reproducible advancements in code
AI.
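Of the ingredients listed above, deduplication is the easiest to illustrate. The sketch below shows generic file-level exact deduplication after light normalisation; OpenCoder's actual pipeline (including near-deduplication) is more involved, and the `normalise` rule here is an assumption.

```python
# Generic exact deduplication of a code corpus by content hash.
import hashlib

def normalise(code: str) -> str:
    # Assumption: drop trailing whitespace and blank lines before hashing,
    # so trivially reformatted copies collapse to one entry.
    return "\n".join(line.rstrip() for line in code.splitlines() if line.strip())

def dedup(files: dict[str, str]) -> dict[str, str]:
    seen, kept = set(), {}
    for path, code in files.items():
        digest = hashlib.sha256(normalise(code).encode()).hexdigest()
        if digest not in seen:
            seen.add(digest)
            kept[path] = code
    return kept

corpus = {"a.py": "x = 1\n", "b.py": "x = 1   \n\n", "c.py": "y = 2\n"}
print(list(dedup(corpus)))  # ['a.py', 'c.py']: b.py collapses onto a.py
```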
Pedestrian Volume Prediction Using a Diffusion Convolutional Gated
Recurrent Unit Model
arXiv:2411.03360v1
Effective models for analysing and predicting pedestrian flow are important
to ensure the safety of both pedestrians and other road users. These tools also
play a key role in optimising infrastructure design and geometry and supporting
the economic utility of interconnected communities. The implementation of
city-wide automatic pedestrian counting systems provides researchers with
invaluable data, enabling the development and training of deep learning
applications that offer better insights into traffic and crowd flows.
Benefiting from real-world data provided by the City of Melbourne pedestrian
counting system, this study presents a pedestrian flow prediction model named
DCGRU-DTW, an extension of the Diffusion Convolutional Gated Recurrent Unit
(DCGRU) with dynamic time warping. The model captures the spatial dependencies
of pedestrian flow through the diffusion process and the temporal dependencies
through the Gated Recurrent Unit (GRU). Through extensive numerical
experiments, we demonstrate that the proposed model outperforms the classic
vector autoregressive model and the original DCGRU across multiple model
accuracy metrics.
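For readers unfamiliar with DCGRU, the diffusion convolution at its core replaces the ordinary matrix multiplications inside the GRU gates with weighted sums of powers of the graph's random-walk transition matrix. A toy one-direction version, with made-up adjacency and weights, looks like this:

```python
# Toy diffusion-convolution step over a small pedestrian-sensor graph.
import numpy as np

W = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)   # sensor adjacency (assumed)
P = W / W.sum(axis=1, keepdims=True)     # random-walk transition matrix D^-1 W

def diffusion_conv(x, thetas):
    """sum_k theta_k * P^k x  (one diffusion direction, K = len(thetas))."""
    out, Pk = np.zeros_like(x), np.eye(len(x))
    for theta in thetas:
        out += theta * Pk @ x
        Pk = P @ Pk
    return out

x = np.array([10.0, 4.0, 7.0])           # pedestrian counts at 3 sensors
print(diffusion_conv(x, thetas=[0.5, 0.3, 0.2]))
```

In DCGRU the scalars `thetas` become learned weight matrices and the convolution runs in both diffusion directions; the snippet above shows only the graph-smoothing mechanics.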
arXiv:2304.13917v3
In recent years, there has been a surge of effort to formalize notions of
fairness in machine learning. We focus on centroid clustering, one of the
fundamental tasks in unsupervised machine learning. We propose a new axiom
"proportionally representative fairness" (PRF) that is designed for
clustering problems where the selection of centroids reflects the distribution
of data points and how tightly they are clustered together. Our fairness
concept is not satisfied by existing fair clustering algorithms. We design
efficient algorithms to achieve PRF for both unconstrained and discrete
clustering problems. Our algorithm for the unconstrained setting is also the
first known polynomial-time approximation algorithm for the well-studied
Proportional Fairness (PF) axiom. Our algorithm for the discrete setting also
matches the best known approximation factor for PF.
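For intuition about the Proportional Fairness (PF) axiom mentioned above: a centroid set is blocked if some coalition of at least ⌈n/k⌉ points would all be strictly closer to a common alternative centre. The brute-force checker below illustrates the definition only; the candidate set and example data are assumptions, and the paper's polynomial-time approximation algorithm is far more sophisticated.

```python
# Brute-force PF-violation check: find a blocking coalition if one exists.
import numpy as np

def pf_blocking_point(points, centroids, candidates):
    n, k = len(points), len(centroids)
    # distance of each point to its nearest current centroid
    cur = np.min(np.linalg.norm(points[:, None] - centroids[None], axis=2), axis=1)
    for y in candidates:
        closer = np.linalg.norm(points - y, axis=1) < cur
        if closer.sum() >= np.ceil(n / k):
            return y          # this coalition can justifiably deviate to y
    return None               # no blocking coalition: the solution is PF

points = np.array([[0., 0.], [0., 1.], [10., 0.], [10., 1.]])
bad = np.array([[0., 0.], [0., 1.]])  # k = 2; both centroids ignore the right cluster
print(pf_blocking_point(points, bad, candidates=points))  # -> [10.  0.]
```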
Training Fair Models in Federated Learning without Data Privacy
Infringement
Accepted by IEEE International Conference on Big Data (2024)
Training fair machine learning models is becoming increasingly important. As
many powerful models are trained by collaboration among multiple parties, each
holding some sensitive data, it is natural to explore the feasibility of
training fair models in federated learning so that the fairness of trained
models, the data privacy of clients, and the collaboration between clients can
be fully respected simultaneously. However, the task of training fair models in
federated learning is challenging, since it is far from trivial to estimate the
fairness of a model without access to the private data of the participating
parties, access that is typically barred by the privacy requirements of
federated learning. In this paper, we first propose a federated estimation
method to
accurately estimate the fairness of a model without infringing the data privacy
of any party. Then, we use the fairness estimation to formulate a novel problem
of training fair models in federated learning. We develop FedFair, a
well-designed federated learning framework, which can successfully train a fair
model with high performance without data privacy infringement. Our extensive
experiments on three real-world data sets demonstrate the excellent fair model
training performance of our method.
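As a rough intuition for federated fairness estimation (not FedFair's actual estimator or privacy mechanism), each client can report only aggregate per-group counts, from which a server computes a demographic-parity gap without ever seeing raw data:

```python
# Server-side fairness estimate from clients' aggregate statistics only.
from dataclasses import dataclass

@dataclass
class LocalStats:
    pos: dict   # group -> number of positive predictions
    tot: dict   # group -> number of samples

def demographic_parity_gap(clients: list[LocalStats]) -> float:
    pos, tot = {}, {}
    for c in clients:
        for g in c.tot:
            pos[g] = pos.get(g, 0) + c.pos.get(g, 0)
            tot[g] = tot.get(g, 0) + c.tot[g]
    rates = {g: pos[g] / tot[g] for g in tot}
    return max(rates.values()) - min(rates.values())

clients = [LocalStats({"a": 30, "b": 10}, {"a": 50, "b": 40}),
           LocalStats({"a": 20, "b": 25}, {"a": 40, "b": 60})]
print(demographic_parity_gap(clients))  # 50/90 - 35/100 ~= 0.206
```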
V2X-Assisted Distributed Computing and Control Framework for Connected
and Automated Vehicles under Ramp Merging Scenario
This paper has been submitted to an IEEE journal. The source code has
been released at:
https://git...
This paper investigates distributed computing and cooperative control of
connected and automated vehicles (CAVs) in a ramp merging scenario under a
transportation cyber-physical system. First, a centralized cooperative
trajectory planning problem is formulated subject to safety constraints and
traffic performance requirements in the ramp merging scenario, where the
trajectories of all vehicles are jointly optimized. To remove the reliance on a
central controller and reduce computation time, a distributed solution to this
problem, implemented among CAVs through Vehicle-to-Everything (V2X)
communication, is proposed. Unlike existing methods, ours distributes the
computational task among CAVs and carries out parallel solving through V2X
communication. Then,
a multi-vehicle model predictive control (MPC) problem aimed at maximizing
system stability and minimizing control input is formulated based on the
solution of the first problem, subject to strict safety constraints and input
limits. Due to these complex constraints, this problem becomes
high-dimensional, centralized, and non-convex. To solve it in a short time, a
decomposition and convex reformulation method, namely distributed cooperative
iterative model predictive control (DCIMPC), is proposed. This method leverages
the communication capability of CAVs to decompose the problem, making full use
of the computational resources on vehicles to achieve fast solutions and
distributed control. The two problems above, together with their corresponding
solution methods, form the overall framework of V2X-assisted distributed
computing and control. Simulations have been conducted to evaluate the
framework's
convergence, safety, and solving speed. Additional experiments are conducted
to validate the performance of DCIMPC. The results show that our
method can greatly improve computation speed without sacrificing system
performance.
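The decomposition pattern, though not DCIMPC itself, can be illustrated with a toy one-dimensional merge-spacing problem: each vehicle repeatedly re-solves its own local problem while holding neighbours' most recently broadcast plans fixed, so the per-round computation parallelises across CAVs. All numbers and the constraint form below are assumptions.

```python
# Toy iterate-and-exchange decomposition for merge spacing.
import numpy as np

desired = np.array([0.0, 5.0, 10.0])     # desired merge positions (assumed)
gap = 6.0                                # required inter-vehicle gap
pos = desired.copy()

for it in range(50):                     # V2X exchange rounds
    new = pos.copy()
    for i in range(len(pos)):            # each CAV solves locally, in parallel
        p = desired[i]                   # local objective: stay near desired spot
        if i > 0:                        # keep 'gap' from the neighbour ahead,
            p = max(p, pos[i - 1] + gap) # using its last broadcast plan
        new[i] = p
    if np.allclose(new, pos):
        break
    pos = new

print(pos)  # [0., 6., 12.]: feasible spacing with minimal shifts
```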
arXiv:2410.21276v1
GPT-4o is an autoregressive omni model that accepts as input any combination
of text, audio, image, and video, and generates any combination of text, audio,
and image outputs. It is trained end-to-end across text, vision, and audio,
meaning all inputs and outputs are processed by the same neural network. GPT-4o
can respond to audio inputs in as little as 232 milliseconds, with an average
of 320 milliseconds, which is similar to human response time in conversation.
It matches GPT-4 Turbo performance on text in English and code, with
significant improvement on text in non-English languages, while also being much
faster and 50% cheaper in the API. GPT-4o is especially strong at vision and
audio understanding compared to existing models. In line with our commitment to
building AI safely and consistent with our voluntary commitments to the White
House, we are sharing the GPT-4o System Card, which includes our Preparedness
Framework evaluations. In this System Card, we provide a detailed look at
GPT-4o's capabilities, limitations, and safety evaluations across multiple
categories, focusing on speech-to-speech while also evaluating text and image
capabilities, and measures we've implemented to ensure the model is safe and
aligned. We also include third-party assessments on dangerous capabilities, as
well as discussion of potential societal impacts of GPT-4o's text and vision
capabilities.
Should We Really Edit Language Models? On the Evaluation of Edited
Language Models
Model editing has become an increasingly popular alternative for efficiently
updating knowledge within language models. Current methods mainly focus on
reliability, generalization, and locality, with many methods excelling across
these criteria. Some recent works disclose pitfalls of these editing methods,
such as knowledge distortion or conflict. However, the general
abilities of post-edited language models remain unexplored. In this paper, we
perform a comprehensive evaluation on various editing methods and different
language models, and report the following findings. (1) Existing editing
methods lead to inevitable performance deterioration on general benchmarks,
preserving the general abilities of the model for only a few dozen edits. As
the number of edits grows, the intrinsic knowledge structure of the model is
disrupted or even completely
damaged. (2) Instruction-tuned models are more robust to editing, showing less
performance drop on general knowledge after editing. (3) Larger language
models are more resistant to editing than smaller ones. (4) The safety of the
edited model is significantly weakened, even for
safety-aligned models. Our findings indicate that current editing methods are
only suitable for small-scale knowledge updates within language models, which
motivates further research on more practical and reliable editing methods.
Code and reproduction details can be found at
https://github.com/lqinfdim/EditingEvaluation.
Integrating Large Language Models for UAV Control in Simulated
Environments: A Modular Interaction Approach
arXiv:2410.17602v1
The intersection of LLMs (Large Language Models) and UAV (Unoccupied Aerial
Vehicles) technology represents a promising field of research with the
potential to enhance UAV capabilities significantly. This study explores the
application of LLMs in UAV control, focusing on the opportunities for
integrating advanced natural language processing into autonomous aerial
systems. By enabling UAVs to interpret and respond to natural language
commands, LLMs simplify UAV control and usage, making UAVs accessible to a
broader user base and facilitating more intuitive human-machine interactions.
The paper discusses several key areas where LLMs can impact UAV technology,
including autonomous decision-making, dynamic mission planning, enhanced
situational awareness, and improved safety protocols. Through a comprehensive
review of current developments and potential future directions, this study aims
to highlight how LLMs can transform UAV operations, making them more adaptable,
responsive, and efficient in complex environments. A template development
framework for integrating LLMs in UAV control is also described.
Proof-of-concept results integrating existing LLMs with popular robotic
simulation platforms are demonstrated. The findings suggest that while there
are substantial technical and ethical challenges to address, integrating LLMs
into UAV control holds promising implications for advancing autonomous aerial
systems.
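One plausible minimal shape for such a modular interface (entirely hypothetical, including the command schema and action names) is to require the LLM to emit a constrained JSON command that the flight stack validates before execution:

```python
# Hypothetical LLM-to-UAV boundary: validate structured commands before flight.
import json

ALLOWED = {"takeoff", "land", "goto", "hold"}  # assumed action vocabulary

def parse_llm_reply(reply: str) -> dict:
    cmd = json.loads(reply)                    # LLM must return strict JSON
    if cmd.get("action") not in ALLOWED:
        raise ValueError(f"unsupported action: {cmd.get('action')}")
    return cmd

# Example LLM reply for "fly to the north pad at 20 m altitude":
reply = '{"action": "goto", "x": 0.0, "y": 120.0, "alt_m": 20.0}'
print(parse_llm_reply(reply))
```

Keeping the LLM behind a schema check like this is one way to reconcile free-form language input with the safety protocols the paper emphasises.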
CalibraEval: Calibrating Prediction Distribution to Mitigate Selection
Bias in LLMs-as-Judges
The use of large language models (LLMs) as automated evaluation tools to
assess the quality of generated natural language, known as LLMs-as-Judges, has
demonstrated promising capabilities and is rapidly gaining widespread
attention. However, when applied to pairwise comparisons of candidate
responses, LLM-based evaluators often exhibit selection bias. Specifically,
their judgments may become inconsistent when the option positions or ID tokens
are swapped, compromising the effectiveness and fairness of the evaluation
result. To address this challenge, we introduce CalibraEval, a novel label-free
method for mitigating selection bias during inference. Specifically,
CalibraEval reformulates debiasing as an optimization task aimed at adjusting
observed prediction distributions to align with unbiased prediction
distributions. To solve this optimization problem, we propose a non-parametric
order-preserving algorithm (NOA). This algorithm leverages the partial order
relationships between model prediction distributions, thereby eliminating the
need for explicit labels and precise mathematical function modeling. Empirical
evaluations of LLMs on multiple representative benchmarks demonstrate that
CalibraEval effectively mitigates selection bias and improves performance
compared to existing debiasing methods. This work marks a step toward building
more robust and unbiased automated evaluation frameworks, paving the way for
improved reliability in AI-driven assessments.
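To see the bias CalibraEval targets, consider a judge scored on the same pairs in both presentation orders. The sketch below uses made-up probabilities and shows only the naive swap-and-average baseline, not CalibraEval's NOA, which is label-free and order-preserving.

```python
# Illustrating position-based selection bias in pairwise LLM judging.
import numpy as np

# P(judge prefers the option in slot 1) for the same pair in both orders.
p_ab = np.array([0.80, 0.65, 0.90])   # pair shown as (A, B)
p_ba = np.array([0.45, 0.30, 0.60])   # pair shown as (B, A); slot 1 now holds B

# A position-invariant judge would satisfy p_ab + p_ba = 1 for every pair;
# any surplus measures the first-slot bias.
print("first-slot bias:", p_ab + p_ba - 1)        # [0.25, -0.05, 0.5]

# Naive debias: average the two orderings' estimates of P(A beats B).
p_a_wins = (p_ab + (1 - p_ba)) / 2
print("debiased P(A wins):", p_a_wins)            # [0.675, 0.675, 0.65]
```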