Accepted for publication in ISPRS Journal of Photogrammetry and
Remote Sensing
We present PolyGNN, a polyhedron-based graph neural network for 3D building
reconstruction from point clouds. PolyGNN learns to assemble primitives
obtained by polyhedral decomposition via graph node classification, achieving a
watertight and compact reconstruction. To effectively represent
arbitrary-shaped polyhedra in the neural network, we propose a skeleton-based
sampling strategy to generate polyhedron-wise queries. These queries are then
combined with inter-polyhedron adjacency information to enhance the classification.
PolyGNN is end-to-end optimizable and is designed to accommodate variable-size
input points, polyhedra, and queries with an index-driven batching technique.
To address the abstraction gap between existing city-building models and the
underlying instances, and provide a fair evaluation of the proposed method, we
develop our method on a large-scale synthetic dataset with well-defined ground
truths of polyhedral labels. We further conduct a transferability analysis
across cities and on real-world point clouds. Both qualitative and quantitative
results demonstrate the effectiveness of our method, particularly its
efficiency for large-scale reconstructions. The source code and data are
available at https://github.com/chenzhaiyu/polygnn.
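The graph-node-classification idea can be illustrated with a toy sketch. This is not PolyGNN's actual architecture — the averaging scheme and all names are illustrative: each candidate polyhedron carries an occupancy score that is smoothed over the adjacency graph before being thresholded into inside/outside labels.

```python
import numpy as np

def propagate(features, adjacency, steps=2):
    """Average-pooling message passing over the polyhedron adjacency graph."""
    deg = adjacency.sum(axis=1, keepdims=True)
    for _ in range(steps):
        features = 0.5 * features + 0.5 * (adjacency @ features) / np.maximum(deg, 1)
    return features

def classify_polyhedra(features, adjacency, threshold=0.5):
    """Label each polyhedron interior (1) or exterior (0) from a smoothed score."""
    smoothed = propagate(features, adjacency)
    return (smoothed[:, 0] > threshold).astype(int)

# Toy example: 3 polyhedra in a chain, with raw occupancy scores.
adj = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
scores = np.array([[0.9], [0.55], [0.1]])
labels = classify_polyhedra(scores, adj)
```

The union of polyhedra labeled interior would then form the watertight reconstruction.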
BehaviorGPT: Smart Agent Simulation for Autonomous Driving with
Next-Patch Prediction
Simulating realistic behaviors of traffic agents is pivotal for efficiently
validating the safety of autonomous driving systems. Existing data-driven
simulators primarily use an encoder-decoder architecture to encode the
historical trajectories before decoding the future. However, the heterogeneity
between encoders and decoders complicates the models, and the manual separation
of historical and future trajectories leads to low data utilization. Given
these limitations, we propose BehaviorGPT, a homogeneous and fully
autoregressive Transformer designed to simulate the sequential behavior of
multiple agents. Crucially, our approach discards the traditional separation
between "history" and "future" by modeling each time step as the "current" one
for motion generation, leading to a simpler, more parameter- and data-efficient
agent simulator. We further introduce the Next-Patch Prediction Paradigm (NP3)
to mitigate the negative effects of autoregressive modeling, in which models
are trained to reason at the patch level of trajectories and capture long-range
spatial-temporal interactions. Despite having merely 3M model parameters,
BehaviorGPT won first place in the 2024 Waymo Open Sim Agents Challenge with a
realism score of 0.7473 and a minADE score of 1.4147, demonstrating its
exceptional performance in traffic agent simulation.
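The next-patch setup can be sketched with a minimal, hypothetical example (patch size and layout are illustrative, not the paper's configuration): a trajectory is grouped into fixed-length patches of time steps, and each patch serves as the prediction target for the patch preceding it.

```python
import numpy as np

def to_patches(trajectory, patch_size):
    """Split a (T, D) trajectory into (T // patch_size, patch_size * D) patches."""
    T, D = trajectory.shape
    n = T // patch_size
    return trajectory[: n * patch_size].reshape(n, patch_size * D)

def next_patch_pairs(trajectory, patch_size):
    """Build (input patch, target patch) pairs for next-patch prediction."""
    patches = to_patches(trajectory, patch_size)
    return patches[:-1], patches[1:]

# 12 time steps of 2-D positions, grouped into patches of 4 steps each.
traj = np.arange(24, dtype=float).reshape(12, 2)
inputs, targets = next_patch_pairs(traj, patch_size=4)
```

Reasoning over patches rather than single steps is what lets the model capture longer-range spatial-temporal structure per autoregressive step.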
When AI Eats Itself: On the Caveats of AI Autophagy
arXiv:2405.09597v3
Generative Artificial Intelligence (AI) technologies and large models are
producing realistic outputs across various domains, such as images, text,
speech, and music. Creating these advanced generative models requires
significant resources, particularly large and high-quality datasets. To
minimise training expenses, many algorithm developers use data created by the
models themselves as a cost-effective training solution. However, not all
synthetic data effectively improve model performance, necessitating a strategic
balance in the use of real versus synthetic data to optimise outcomes.
The once well-controlled integration of real and synthetic
data is now becoming uncontrollable. The widespread and unregulated dissemination
of synthetic data online leads to the contamination of datasets traditionally
compiled through web scraping, now mixed with unlabeled synthetic data. This
trend, known as the AI autophagy phenomenon, suggests a future where generative
AI systems may increasingly consume their own outputs without discernment,
raising concerns about model performance, reliability, and ethical
implications. What will happen if generative AI continuously consumes itself
without discernment? What measures can we take to mitigate the potential
adverse effects? To address these research questions, this study examines the
existing literature, delving into the consequences of AI autophagy, analyzing
the associated risks, and exploring strategies to mitigate its impact. Our aim
is to provide a comprehensive perspective on this phenomenon, advocating for a
balanced approach that promotes the sustainable development of generative AI
technologies in the era of large models.
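The self-consumption loop can be illustrated with a toy simulation (a deliberately simplified stand-in, not any model from the literature): a Gaussian is repeatedly refit to samples drawn from its own previous fit, the simplest form of a generative model training on its own outputs.

```python
import random
import statistics

def retrain_on_own_samples(mean, std, generations, n=1000, seed=0):
    """Iteratively fit a Gaussian to samples drawn from the previous fit.
    In expectation the fitted standard deviation shrinks each generation,
    a toy analogue of the diversity loss associated with AI autophagy."""
    rng = random.Random(seed)
    history = [std]
    for _ in range(generations):
        samples = [rng.gauss(mean, std) for _ in range(n)]
        mean = statistics.fmean(samples)
        std = statistics.stdev(samples)
        history.append(std)
    return history

stds = retrain_on_own_samples(mean=0.0, std=1.0, generations=30)
```

Any single run is noisy, but the expected drift of the fitted spread is downward — mixing in fresh real data is what breaks the loop.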
V2X-Assisted Distributed Computing and Control Framework for Connected
and Automated Vehicles under Ramp Merging Scenario
This paper has been submitted to an IEEE journal. The source code has
been released at:
https://git...
This paper investigates distributed computing and cooperative control of
connected and automated vehicles (CAVs) in a ramp merging scenario under a
transportation cyber-physical system. First, a centralized cooperative
trajectory planning problem is formulated subject to safety constraints and
traffic performance requirements in the ramp merging scenario, where the trajectories of all
vehicles are jointly optimized. To get rid of the reliance on a central
controller and reduce computation time, a distributed solution to this problem
implemented among CAVs through Vehicle-to-Everything (V2X) communication is
proposed. Unlike existing methods, ours distributes the computational
task among CAVs and solves it in parallel through V2X communication. Then,
a multi-vehicle model predictive control (MPC) problem aimed at maximizing
system stability and minimizing control input is formulated based on the
solution of the first problem, subject to strict safety constraints and input
limits. Due to these complex constraints, this problem becomes
high-dimensional, centralized, and non-convex. To solve it in a short time, a
decomposition and convex reformulation method, namely distributed cooperative
iterative model predictive control (DCIMPC), is proposed. This method leverages
the communication capability of CAVs to decompose the problem, making full use
of the computational resources on vehicles to achieve fast solutions and
distributed control. The two problems above, with their corresponding solving
methods, form the overall framework of V2X-assisted distributed computing
and control. Simulations have been conducted to evaluate the framework's
convergence, safety, and solving speed. Additional experiments are
conducted to validate the performance of DCIMPC. The results show that our
method can greatly improve computation speed without sacrificing system
performance.
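The decomposition-and-parallel-solving idea can be sketched with a toy stand-in (not the DCIMPC formulation — the cost and update rule here are illustrative): each vehicle repeatedly refines its own decision variable while holding its neighbours' latest V2X-shared values fixed, a Jacobi-style iteration that converges to the coupled optimum without a central solver.

```python
import numpy as np

def jacobi_step(x, desired, adjacency, coupling):
    """One synchronous round: each vehicle minimizes its own term of
    sum_i (x_i - d_i)^2 + coupling * sum_{edges} (x_i - x_j)^2
    given neighbours' latest shared values."""
    deg = adjacency.sum(axis=1)
    return (desired + coupling * (adjacency @ x)) / (1.0 + coupling * deg)

def distributed_solve(desired, adjacency, coupling=0.5, iters=100):
    x = desired.copy()
    for _ in range(iters):
        x = jacobi_step(x, desired, adjacency, coupling)
    return x

# Three vehicles in a communication chain; coupling pulls neighbours together.
adj = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
desired = np.array([0.0, 1.0, 4.0])
x = distributed_solve(desired, adj)
```

Each update uses only local and neighbour information, which is what allows the computation to be spread across vehicles and run in parallel.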
Beyond Efficiency: A Systematic Survey of Resource-Efficient Large
Language Models
The burgeoning field of Large Language Models (LLMs), exemplified by
sophisticated models like OpenAI's ChatGPT, represents a significant
advancement in artificial intelligence. These models, however, bring forth
substantial challenges in the high consumption of computational, memory,
energy, and financial resources, especially in environments with limited
resource capabilities. This survey aims to systematically address these
challenges by reviewing a broad spectrum of techniques designed to enhance the
resource efficiency of LLMs. We categorize methods based on their optimization
focus (computational, memory, energy, financial, and network resources) and
their applicability across various stages of an LLM's lifecycle, including
architecture design, pretraining, finetuning, and system design. Additionally,
the survey introduces a nuanced categorization of resource efficiency
techniques by their specific resource types, which uncovers the intricate
relationships and mappings between various resources and corresponding
optimization techniques. A standardized set of evaluation metrics and datasets
is also presented to facilitate consistent and fair comparisons across
different models and techniques. By offering a comprehensive overview of the
current state of the art and identifying open research avenues, this survey serves as a
foundational reference for researchers and practitioners, aiding them in
developing more sustainable and efficient LLMs in a rapidly evolving landscape.
Class-RAG: Content Moderation with Retrieval Augmented Generation
Robust content moderation classifiers are essential for the safety of
Generative AI systems. Content moderation, or safety classification, is
notoriously ambiguous: differences between safe and unsafe inputs are often
extremely subtle, making it difficult for classifiers (and indeed, even humans)
to properly distinguish violating vs. benign samples without further context or
explanation. Furthermore, as these technologies are deployed across various
applications and audiences, scaling risk discovery and mitigation through
continuous model fine-tuning becomes increasingly challenging and costly. To
address these challenges, we propose a Classification approach employing
Retrieval-Augmented Generation (Class-RAG). Class-RAG extends the capability of
its base LLM through access to a retrieval library which can be dynamically
updated to enable semantic hotfixing for immediate, flexible risk mitigation.
Compared to traditional fine-tuned models, Class-RAG demonstrates flexibility
and transparency in decision-making. As evidenced by empirical studies,
Class-RAG outperforms traditional fine-tuned models on classification and is more
robust against adversarial attacks. Moreover, our findings suggest that Class-RAG performance scales with
retrieval library size, indicating that increasing the library size is a viable
and low-cost approach to improve content moderation.
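The retrieval-augmented classification idea can be sketched as follows (a minimal stand-in, not Class-RAG's actual pipeline — the embeddings, voting rule, and labels are illustrative): a query is compared against a labelled retrieval library, and the nearest entries vote on the safety decision, so adding entries to the library changes behaviour without retraining.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def retrieve_and_classify(query_vec, library, k=3):
    """Vote among the k most similar labelled library entries; the library can
    be extended at any time ('semantic hotfixing') without retraining."""
    scored = sorted(library, key=lambda e: cosine(query_vec, e["vec"]), reverse=True)
    votes = sum(e["label"] for e in scored[:k])
    return "unsafe" if votes > k / 2 else "safe"

# Tiny library of labelled embeddings (label 1 = unsafe, 0 = safe).
library = [
    {"vec": np.array([1.0, 0.0]), "label": 1},
    {"vec": np.array([0.9, 0.1]), "label": 1},
    {"vec": np.array([0.0, 1.0]), "label": 0},
    {"vec": np.array([0.1, 0.9]), "label": 0},
]
decision = retrieve_and_classify(np.array([0.95, 0.05]), library, k=3)
```

Patching a newly discovered risk then amounts to appending a few labelled examples to the library.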
Retrieval Augmented Generation for 10 Large Language Models and its
Generalizability in Assessing Medical Fitness
arXiv admin note: substantial text overlap with arXiv:2402.01733
Large Language Models (LLMs) show potential for medical applications but
often lack specialized clinical knowledge. Retrieval Augmented Generation (RAG)
allows customization with domain-specific information, making it suitable for
healthcare. This study evaluates the accuracy, consistency, and safety of RAG
models in determining fitness for surgery and providing preoperative
instructions. We developed LLM-RAG models using 35 local and 23 international
preoperative guidelines and tested them against human-generated responses. A
total of 3,682 responses were evaluated. Clinical documents were processed
using LlamaIndex, and 10 LLMs, including GPT3.5, GPT4, and Claude-3, were
assessed. Fourteen clinical scenarios were analyzed, focusing on seven aspects
of preoperative instructions. Established guidelines and expert judgment were
used to determine correct responses, with human-generated answers serving as
comparisons. The LLM-RAG models generated responses within 20 seconds,
significantly faster than clinicians (10 minutes). The GPT4 LLM-RAG model
achieved the highest accuracy (96.4% vs. 86.6%, p=0.016), produced no
hallucinations, and gave correct instructions comparable to those of clinicians.
Results were consistent across both local and international guidelines. This
study demonstrates the potential of LLM-RAG models for preoperative healthcare
tasks, highlighting their efficiency, scalability, and reliability.
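The retrieval step that grounds such a model in guidelines can be sketched crudely (keyword overlap here is only a stand-in for the dense retrieval a framework like LlamaIndex performs; the guideline snippets are invented): the best-matching passage is fetched and would be placed in the LLM's context.

```python
def score(query, passage):
    """Overlap of lowercase word sets — a crude stand-in for dense retrieval."""
    q, p = set(query.lower().split()), set(passage.lower().split())
    return len(q & p) / max(len(q), 1)

def top_passage(query, passages):
    """Return the passage most relevant to the query."""
    return max(passages, key=lambda p: score(query, p))

# Hypothetical preoperative guideline snippets.
guidelines = [
    "Continue beta blockers on the morning of surgery",
    "Stop metformin 24 hours before surgery with contrast imaging",
    "Fast from solid food for six hours before anaesthesia",
]
best = top_passage("should the patient stop metformin before surgery", guidelines)
```

Grounding answers in retrieved guideline text is what lets the same base LLM stay consistent across local and international guideline sets.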
DeMo: Decoupling Motion Forecasting into Directional Intentions and
Dynamic States
Accurate motion forecasting for traffic agents is crucial for ensuring the
safety and efficiency of autonomous driving systems in dynamically changing
environments. Mainstream methods adopt a one-query-one-trajectory paradigm,
where each query corresponds to a unique trajectory for predicting multi-modal
trajectories. While straightforward and effective, the absence of detailed
representation of future trajectories may yield suboptimal outcomes, given that
the agent states dynamically evolve over time. To address this problem, we
introduce DeMo, a framework that decouples multi-modal trajectory queries into
two types: mode queries capturing distinct directional intentions and state
queries tracking the agent's dynamic states over time. By leveraging this
format, we separately optimize the multi-modality and dynamic evolutionary
properties of trajectories. Subsequently, the mode and state queries are
integrated to obtain a comprehensive and detailed representation of the
trajectories. To achieve these operations, we additionally introduce combined
Attention and Mamba techniques for global information aggregation and state
sequence modeling, leveraging their respective strengths. Extensive experiments
on both the Argoverse 2 and nuScenes benchmarks demonstrate that our DeMo
achieves state-of-the-art performance in motion forecasting.
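The mode/state decoupling can be sketched in shapes alone (additive combination and all dimensions here are illustrative, not DeMo's actual operators): M mode queries, one per directional intention, are broadcast against T state queries, one per future time step, yielding one detailed per-step representation for each candidate trajectory.

```python
import numpy as np

def combine_queries(mode_queries, state_queries):
    """Broadcast (M, D) mode queries against (T, D) state queries into an
    (M, T, D) tensor: one time-resolved representation per intention."""
    return mode_queries[:, None, :] + state_queries[None, :, :]

modes = np.zeros((6, 4))   # 6 directional intentions
states = np.ones((30, 4))  # 30 future time steps
combined = combine_queries(modes, states)
```

Optimizing the two query sets separately targets multi-modality and temporal dynamics independently before they are fused.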
FairFML: Fair Federated Machine Learning with a Case Study on Reducing
Gender Disparities in Cardiac Arrest Outcome Prediction
arXiv:2410.17269v1
Objective: Mitigating algorithmic disparities is a critical challenge in
healthcare research, where ensuring equity and fairness is paramount. While
large-scale healthcare data exist across multiple institutions,
cross-institutional collaborations often face privacy constraints, highlighting
the need for privacy-preserving solutions that also promote fairness.
Materials and Methods: In this study, we present Fair Federated Machine
Learning (FairFML), a model-agnostic solution designed to reduce algorithmic
bias in cross-institutional healthcare collaborations while preserving patient
privacy. As a proof of concept, we validated FairFML using a real-world
clinical case study focused on reducing gender disparities in cardiac arrest
outcome prediction.
Results: We demonstrate that the proposed FairFML framework enhances fairness
in federated learning (FL) models without compromising predictive performance.
Our findings show that FairFML improves model fairness by up to 65% compared to
the centralized model, while maintaining performance comparable to both local
and centralized models, as measured by receiver operating characteristic
analysis.
Discussion and Conclusion: FairFML offers a promising and flexible solution
for FL collaborations, with its adaptability allowing seamless integration with
various FL frameworks and models, from traditional statistical methods to deep
learning techniques. This makes FairFML a robust approach for developing fairer
FL models across diverse clinical and biomedical applications.
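One common way to quantify the kind of group disparity targeted here is a positive-prediction-rate gap between demographic groups (this metric is a generic illustration, not necessarily the measure FairFML optimizes):

```python
def fairness_gap(y_pred, groups):
    """Largest gap in positive prediction rate across demographic groups
    (0 = demographic parity)."""
    rates = {}
    for g in set(groups):
        preds = [p for p, gg in zip(y_pred, groups) if gg == g]
        rates[g] = sum(preds) / len(preds)
    vals = sorted(rates.values())
    return vals[-1] - vals[0]

y_pred = [1, 0, 1, 1, 0, 0]
groups = ["f", "f", "f", "m", "m", "m"]
gap = fairness_gap(y_pred, groups)
```

In a federated setting, each site would compute such group statistics locally and share only aggregates, preserving patient-level privacy.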
SELP: Generating Safe and Efficient Task Plans for Robot Agents with
Large Language Models
arXiv:2409.19471v1
Despite significant advancements in large language models (LLMs) that enhance
robot agents' understanding and execution of natural language (NL) commands,
ensuring the agents adhere to user-specified constraints remains challenging,
particularly for complex commands and long-horizon tasks. To address this
challenge, we present three key insights: equivalence voting, constrained
decoding, and domain-specific fine-tuning, which together significantly enhance LLM
planners' capability in handling complex tasks. Equivalence voting ensures
consistency by generating and sampling multiple Linear Temporal Logic (LTL)
formulas from NL commands, grouping equivalent LTL formulas, and selecting the
majority group of formulas as the final LTL formula. Constrained decoding then
uses the generated LTL formula to enforce the autoregressive inference of
plans, ensuring the generated plans conform to the LTL. Domain-specific
fine-tuning customizes LLMs to produce safe and efficient plans within specific
task domains. Our approach, Safe Efficient LLM Planner (SELP), combines these
insights to create LLM planners that generate plans adhering to user commands
with high confidence. We demonstrate the effectiveness and generalizability of
SELP across different robot agents and tasks, including drone navigation and
robot manipulation. For drone navigation tasks, SELP outperforms
state-of-the-art planners by 10.8% in safety rate (i.e., finishing tasks
conforming to NL commands) and by 19.8% in plan efficiency. For robot
manipulation tasks, SELP achieves 20.4% improvement in safety rate. Our
datasets for evaluating NL-to-LTL and robot task planning will be released at
github.com/lt-asset/selp.
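The equivalence-voting step can be sketched in miniature (real LTL equivalence checking needs a model checker; the string normalization below is a deliberately crude stand-in, and the formulas are invented): sampled formulas are grouped by equivalence and the largest group's representative wins.

```python
from collections import Counter

def normalize(formula):
    """Crude stand-in for LTL equivalence: drop spaces and redundant outer parens."""
    f = formula.replace(" ", "")
    while f.startswith("(") and f.endswith(")"):
        f = f[1:-1]
    return f

def equivalence_vote(candidates):
    """Group sampled formulas by (approximate) equivalence and return the
    representative of the largest group."""
    groups = Counter(normalize(f) for f in candidates)
    winner, _ = groups.most_common(1)[0]
    return winner

samples = ["G(!collision)", "G (! collision)", "(G(!collision))", "F(goal)"]
chosen = equivalence_vote(samples)
```

The winning formula would then drive constrained decoding, forcing every generated plan step to stay consistent with it.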