arXiv:2403.17338v2
Optimal control methods provide solutions to safety-critical problems but
easily become intractable. Control Barrier Functions (CBFs) have emerged as a
popular technique that facilitates their solution by provably guaranteeing
safety, through their forward invariance property, at the expense of some
performance loss. This approach involves defining a performance objective
alongside CBF-based safety constraints that must always be enforced.
Unfortunately, both performance and solution feasibility can be significantly
impacted by two key factors: (i) the selection of the cost function and
associated parameters, and (ii) the calibration of parameters within the
CBF-based constraints, which capture the trade-off between performance and
conservativeness. To address these challenges, we
propose a Reinforcement Learning (RL)-based Receding Horizon Control (RHC)
approach leveraging Model Predictive Control (MPC) with CBFs (MPC-CBF). In
particular, we parameterize our controller and use bilevel optimization, where
RL is used to learn the optimal parameters while MPC computes the optimal
control input. We validate our method by applying it to the challenging
automated merging control problem for Connected and Automated Vehicles (CAVs)
at conflicting roadways. Results demonstrate improved performance and a
significant reduction in the number of infeasible cases compared to traditional
heuristic approaches used for tuning CBF-based controllers, showcasing the
effectiveness of the proposed method.
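To make the mechanics concrete, the following is a minimal sketch (not the authors' implementation) of a single discrete-time CBF-constrained control step for a 1-D double integrator; the dynamics, bounds, and safety function are illustrative assumptions, and the decay rate gamma stands in for the kind of parameter an outer RL loop would tune.

    # Minimal one-step CBF-constrained controller (illustrative sketch).
    # Assumptions: 1-D double integrator, h(x) = margin to a boundary,
    # discrete-time CBF condition h(x_{k+1}) >= (1 - gamma) * h(x_k).
    import numpy as np
    import cvxpy as cp

    dt = 0.1
    A = np.array([[1.0, dt], [0.0, 1.0]])   # state: [position, velocity]
    B = np.array([[0.5 * dt**2], [dt]])
    x = np.array([0.0, 8.0])                 # current state
    x_boundary = 30.0                        # hypothetical safety boundary

    def h(state):
        # Safety function: distance remaining to the boundary (h >= 0 is safe).
        return x_boundary - state[0]

    gamma = 0.3      # CBF decay rate; in the paper's setting, RL would tune such parameters
    u_ref = 2.0      # nominal, performance-oriented acceleration command

    u = cp.Variable(1)
    x_next = A @ x + B @ u                   # affine in u, so the problem stays a QP
    constraints = [h(x_next) >= (1 - gamma) * h(x),
                   u >= -5.0, u <= 5.0]
    prob = cp.Problem(cp.Minimize(cp.sum_squares(u - u_ref)), constraints)
    prob.solve()
    print("safe input:", u.value)

The MPC-CBF controller in the paper solves a horizon of such constrained problems; the sketch shows only why the per-step problem is a quadratic program whose conservativeness depends on the chosen parameters.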
Tangled Program Graphs as an alternative to DRL-based control algorithms
for UAVs
The paper was accepted for the 2024 Signal Processing: Algorithms,
Architectures, Arrangements, a...
Deep reinforcement learning (DRL) is currently the most popular AI-based
approach to autonomous vehicle control. An agent, trained for this purpose in
simulation, can interact with the real environment with a human-level
performance. Despite very good results in terms of selected metrics, this
approach has some significant drawbacks: high computational requirements and
low explainability. Because of that, a DRL-based agent cannot be used in some
control tasks, especially when safety is the key issue. Therefore we propose to
use Tangled Program Graphs (TPGs) as an alternative for deep reinforcement
learning in control-related tasks. In this approach, input signals are
processed by simple programs that are combined in a graph structure. As a
result, TPGs are less computationally demanding and their actions can be
explained based on the graph structure. In this paper, we present our studies
on the use of TPGs as an alternative for DRL in control-related tasks. In
particular, we consider the problem of navigating an unmanned aerial vehicle
(UAV) through the unknown environment based solely on the on-board LiDAR
sensor. The results of our work show promising prospects for the use of TPGs in
control-related tasks.
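As a rough illustration of the execution model described above (a sketch only; actual TPG implementations differ in detail, and the linear "programs" here are stand-ins), a team of simple programs can bid on the current observation and the action registered to the highest bidder is emitted:

    # Illustrative sketch of a TPG-style decision step (not a full TPG implementation).
    import numpy as np

    class Program:
        def __init__(self, weights, action):
            self.weights = weights          # a tiny linear "program" for illustration
            self.action = action            # terminal action, or a pointer to another team

        def bid(self, observation):
            return float(np.dot(self.weights, observation))

    class Team:
        def __init__(self, programs):
            self.programs = programs

        def act(self, observation):
            winner = max(self.programs, key=lambda p: p.bid(observation))
            # In a full TPG, the winner's action may reference another team
            # (graph traversal); here it is a terminal action for simplicity.
            return winner.action

    rng = np.random.default_rng(0)
    lidar = rng.uniform(0.0, 10.0, size=16)   # hypothetical LiDAR-like input vector
    team = Team([Program(rng.normal(size=16), a) for a in ("left", "straight", "right")])
    print(team.act(lidar))

Because a decision is a chain of such bids through the graph, it can be inspected program by program, which is the explainability advantage the abstract refers to.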
A Barrier Certificate-based Simplex Architecture for Systems with
Approximate and Hybrid Dynamics
This version includes the following new contributions. (1) We extend
Bb-Simplex to hybrid systems ...
We present Barrier-based Simplex (Bb-Simplex), a new, provably correct design
for runtime assurance of continuous dynamical systems. Bb-Simplex is centered
around the Simplex control architecture, which consists of a high-performance
advanced controller that is not guaranteed to maintain safety of the plant, a
verified-safe baseline controller, and a decision module that switches control
of the plant between the two controllers to ensure safety without sacrificing
performance. In Bb-Simplex, barrier certificates are used to prove that the
baseline controller ensures safety. Furthermore, Bb-Simplex features a new
automated method for deriving, from the barrier certificate, the conditions for
switching between the controllers. Our method is based on the Taylor expansion
of the barrier certificate and yields computationally inexpensive switching
conditions.
We also propose extensions to Bb-Simplex to enable its use in hybrid systems,
which have multiple modes each with its own dynamics, and to support its use
when only approximate dynamics (not exact dynamics) are available, for both
continuous-time and hybrid dynamical systems.
We consider significant applications of Bb-Simplex to microgrids featuring
advanced controllers in the form of neural networks trained using reinforcement
learning. These microgrids are modeled in RTDS, an industry-standard
high-fidelity, real-time power systems simulator. Our results demonstrate that
Bb-Simplex can automatically derive switching conditions for complex
continuous-time and hybrid systems, the switching conditions are not overly
conservative, and Bb-Simplex ensures safety even in the presence of adversarial
attacks on the neural controller when only approximate dynamics (with an error
bound) are available.
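A minimal sketch of the kind of check such a decision module might perform, assuming a known barrier certificate B(x) (safe when B(x) >= 0), a first-order Taylor bound over the decision period, and a hypothetical constant bounding the remainder and model error; the actual Bb-Simplex derivation is more involved:

    # Illustrative decision-module check (sketch, not the Bb-Simplex derivation).
    # Keep the advanced controller only if a first-order Taylor bound on the
    # barrier certificate keeps B nonnegative over the next decision period.
    import numpy as np

    def barrier(x):
        # Hypothetical barrier certificate: the safe set is the unit disk.
        return 1.0 - x[0] ** 2 - x[1] ** 2

    def barrier_grad(x):
        return np.array([-2.0 * x[0], -2.0 * x[1]])

    def dynamics(x, u):
        # Approximate plant model (assumed); true dynamics may differ within a bound.
        return np.array([x[1], u])

    def keep_advanced_controller(x, u_advanced, dt, remainder_bound):
        b = barrier(x)
        b_dot = barrier_grad(x) @ dynamics(x, u_advanced)
        # First-order prediction of B after dt, minus an assumed bound on
        # higher-order terms and model error.
        return b + dt * b_dot - remainder_bound >= 0.0

    x = np.array([0.6, 0.1])
    if keep_advanced_controller(x, u_advanced=0.5, dt=0.05, remainder_bound=0.02):
        print("advanced controller stays in control")
    else:
        print("switch to verified-safe baseline controller")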
A Comparative Study of Deep Reinforcement Learning for Crop Production
Management
Crop production management is essential for optimizing yield and minimizing the
environmental impact of crop fields, yet it remains challenging due to
the complex and stochastic processes involved. Recently, researchers have
turned to machine learning to address these complexities. Specifically,
reinforcement learning (RL), a cutting-edge approach designed to learn optimal
decision-making strategies through trial and error in dynamic environments, has
emerged as a promising tool for developing adaptive crop management policies.
RL models aim to optimize long-term rewards by continuously interacting with
the environment, making them well-suited for tackling the uncertainties and
variability inherent in crop management. Studies have shown that RL can
generate crop management policies that compete with, and even outperform,
expert-designed policies within simulation-based crop models. In the gym-DSSAT
crop model environment, one of the most widely used simulators for crop
management, proximal policy optimization (PPO) and deep Q-networks (DQN) have
shown promising results. However, these methods have not yet been
systematically evaluated under identical conditions. In this study, we
evaluated PPO and DQN against static baseline policies across three different
RL tasks (fertilization, irrigation, and mixed management) provided by the
gym-DSSAT environment. To ensure a fair comparison, we used consistent default
parameters, identical reward functions, and the same environment settings. Our
results indicate that PPO outperforms DQN in fertilization and irrigation
tasks, while DQN excels in the mixed management task. This comparative analysis
provides critical insights into the strengths and limitations of each approach,
advancing the development of more effective RL-based crop management
strategies.
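A sketch of the kind of training setup such a comparison implies, using stable-baselines3 with default hyperparameters; the environment id is a placeholder, since gym-DSSAT's exact registration name and task configuration are not given in the abstract:

    # Sketch of a PPO-vs-DQN comparison with default hyperparameters (illustrative).
    # "CropManagement-v0" is a placeholder id; substitute the actual gym-DSSAT
    # environment and its fertilization/irrigation/mixed task configuration.
    import gymnasium as gym
    from stable_baselines3 import PPO, DQN

    def train(algo_cls, env_id="CropManagement-v0", steps=100_000):
        env = gym.make(env_id)
        model = algo_cls("MlpPolicy", env, verbose=0)   # same defaults for both algorithms
        model.learn(total_timesteps=steps)
        return model

    ppo_model = train(PPO)
    dqn_model = train(DQN)   # note: DQN requires a discrete action space

Holding the environment, reward function, and defaults fixed across both algorithms is what makes the comparison in the study fair.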
Contraction Theory for Nonlinear Stability Analysis and Learning-based
Control: A Tutorial Overview
Contraction theory is an analytical tool to study differential dynamics of a
non-autonomous (i.e., time-varying) nonlinear system under a contraction metric
defined with a uniformly positive definite matrix, the existence of which
results in a necessary and sufficient characterization of incremental
exponential stability of multiple solution trajectories with respect to each
other. By using a squared differential length as a Lyapunov-like function, its
nonlinear stability analysis boils down to finding a suitable contraction
metric that satisfies a stability condition expressed as a linear matrix
inequality, indicating that many parallels can be drawn between well-known
linear systems theory and contraction theory for nonlinear systems.
Furthermore, contraction theory takes advantage of a superior robustness
property of exponential stability used in conjunction with the comparison
lemma. This yields much-needed safety and stability guarantees for neural
network-based control and estimation schemes, without resorting to a more
involved method of using uniform asymptotic stability for input-to-state
stability. Such distinctive features permit systematic construction of a
contraction metric via convex optimization, thereby obtaining an explicit
exponential bound on the distance between a time-varying target trajectory and
solution trajectories perturbed externally due to disturbances and learning
errors. The objective of this paper is therefore to present a tutorial overview
of contraction theory and its advantages in nonlinear stability analysis of
deterministic and stochastic systems, with an emphasis on deriving formal
robustness and stability guarantees for various learning-based and data-driven
automatic control methods. In particular, we provide a detailed review of
techniques for finding contraction metrics and associated control and
estimation laws using deep neural networks.
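To make the condition summarized above concrete, the standard statement from the contraction-theory literature (notation assumed here, not quoted from this particular paper) is: for dynamics $\dot{x} = f(x, t)$ and a uniformly positive definite metric $M(x, t) \succ 0$,

    \dot{M} + M \frac{\partial f}{\partial x} + \left(\frac{\partial f}{\partial x}\right)^{\top} M \preceq -2\alpha M, \qquad \alpha > 0.

Taking the squared differential length $V = \delta x^{\top} M \delta x$ as the Lyapunov-like function then gives $\dot{V} \le -2\alpha V$, i.e., incremental exponential stability of trajectories with respect to each other at rate $\alpha$; the search for a metric $M$ satisfying this matrix inequality is what reduces to convex, LMI-type optimization.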
The Future of Intelligent Healthcare: A Systematic Analysis and
Discussion on the Integration and Impact of Robots Using Large Language
Models for Healthcare
arXiv:2411.03287v1
The potential use of large language models (LLMs) in healthcare robotics can
help address the significant demand put on healthcare systems around the world
with respect to an aging demographic and a shortage of healthcare
professionals. Even though LLMs have already been integrated into medicine to
assist both clinicians and patients, the integration of LLMs within healthcare
robots has not yet been explored for clinical settings. In this perspective
paper, we investigate the groundbreaking developments in robotics and LLMs to
uniquely identify the needed system requirements for designing health-specific
LLM-based robots in terms of multi-modal communication through human-robot
interactions (HRIs), semantic reasoning, and task planning. Furthermore, we
discuss the ethical issues, open challenges, and potential future research
directions for this emerging innovative field.
Transformer-Based Fault-Tolerant Control for Fixed-Wing UAVs Using
Knowledge Distillation and In-Context Adaptation
arXiv:2411.02975v1
This study presents a transformer-based approach for fault-tolerant control
in fixed-wing Unmanned Aerial Vehicles (UAVs), designed to adapt in real time
to dynamic changes caused by structural damage or actuator failures. Unlike
traditional Flight Control Systems (FCSs) that rely on classical control theory
and struggle under severe alterations in dynamics, our method directly maps
outer-loop reference values -- altitude, heading, and airspeed -- into control
commands using the in-context learning and attention mechanisms of
transformers, thus bypassing inner-loop controllers and fault-detection layers.
Employing a teacher-student knowledge distillation framework, the proposed
approach trains a student agent with partial observations by transferring
knowledge from a privileged expert agent with full observability, enabling
robust performance across diverse failure scenarios. Experimental results
demonstrate that our transformer-based controller outperforms industry-standard
FCS and state-of-the-art reinforcement learning (RL) methods, maintaining high
tracking accuracy and stability in nominal conditions and extreme failure
cases, highlighting its potential for enhancing UAV operational safety and
reliability.
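A minimal sketch of the teacher-student distillation idea described above (PyTorch, illustrative only; simple MLPs stand in for the transformer policy, and the observation split, network sizes, and loss are assumptions rather than the paper's design):

    # Sketch of privileged teacher -> partial-observation student distillation.
    # The teacher sees the full state; the student sees only a partial observation
    # and is regressed onto the teacher's control commands.
    import torch
    import torch.nn as nn

    full_dim, partial_dim, act_dim = 32, 12, 3   # assumed sizes (illustrative)

    teacher = nn.Sequential(nn.Linear(full_dim, 64), nn.ReLU(), nn.Linear(64, act_dim))
    student = nn.Sequential(nn.Linear(partial_dim, 64), nn.ReLU(), nn.Linear(64, act_dim))

    optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)
    loss_fn = nn.MSELoss()

    for _ in range(100):                              # stand-in for a training loop
        full_obs = torch.randn(256, full_dim)         # placeholder rollout data
        partial_obs = full_obs[:, :partial_dim]       # assumed observation split
        with torch.no_grad():
            target_cmd = teacher(full_obs)            # privileged expert commands
        loss = loss_fn(student(partial_obs), target_cmd)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

In the paper, the student is additionally conditioned on a context of recent observation-action pairs, which is what provides the in-context adaptation to damage and actuator failures.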
Embedding Safety into RL: A New Take on Trust Region Methods
arXiv:2411.02957v1
Reinforcement Learning (RL) agents are able to solve a wide variety of tasks
but are prone to producing unsafe behaviors. Constrained Markov Decision
Processes (CMDPs) provide a popular framework for incorporating safety
constraints. However, common solution methods often compromise reward
maximization by being overly conservative or allow unsafe behavior during
training. We propose Constrained Trust Region Policy Optimization (C-TRPO), a
novel approach that modifies the geometry of the policy space based on the
safety constraints and yields trust regions composed exclusively of safe
policies, ensuring constraint satisfaction throughout training. We
theoretically study the convergence and update properties of C-TRPO and
highlight connections to TRPO, Natural Policy Gradient (NPG), and Constrained
Policy Optimization (CPO). Finally, we demonstrate experimentally that C-TRPO
significantly reduces constraint violations while achieving competitive reward
maximization compared to state-of-the-art CMDP algorithms.
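For context, the generic constrained trust-region update that methods in this family (TRPO with a CMDP constraint, CPO) solve can be stated as below; this is standard background notation, not C-TRPO's specific constraint-aware geometry, which is the paper's contribution:

    \max_{\pi} \; \mathbb{E}_{s \sim d^{\pi_k},\, a \sim \pi}\!\left[ A^{\pi_k}(s, a) \right]
    \quad \text{s.t.} \quad \bar{D}_{\mathrm{KL}}(\pi \,\|\, \pi_k) \le \delta,
    \qquad J_C(\pi) \le d,

where $A^{\pi_k}$ is the advantage under the current policy, $\bar{D}_{\mathrm{KL}}$ the average KL divergence defining the trust region, $J_C$ the expected cumulative constraint cost, and $d$ the safety budget. Per the abstract, C-TRPO instead reshapes the trust region itself so that every policy inside it already satisfies $J_C(\pi) \le d$.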
Excluding the Irrelevant: Focusing Reinforcement Learning through
Continuous Action Masking
arXiv:2406.03704v2
Continuous action spaces in reinforcement learning (RL) are commonly defined
as multidimensional intervals. While intervals usually reflect the action
boundaries for tasks well, they can be challenging for learning because the
typically large global action space leads to frequent exploration of irrelevant
actions. Yet, little task knowledge can be sufficient to identify significantly
smaller state-specific sets of relevant actions. Focusing learning on these
relevant actions can significantly improve training efficiency and
effectiveness. In this paper, we propose to focus learning on the set of
relevant actions and introduce three continuous action masking methods for
exactly mapping the action space to the state-dependent set of relevant
actions. Thus, our methods ensure that only relevant actions are executed,
enhancing the predictability of the RL agent and enabling its use in
safety-critical applications. We further derive the implications of the
proposed methods on the policy gradient. Using proximal policy optimization
(PPO), we evaluate our methods on four control tasks, where the relevant action
set is computed based on the system dynamics and a relevant state set. Our
experiments show that the three action masking methods achieve higher final
rewards and converge faster than the baseline without action masking.
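One simple way to realize a continuous mask is to map the policy's raw output onto a state-dependent interval of relevant actions; the sketch below illustrates only that idea under an assumed form of the relevant set, and is not necessarily one of the three mapping methods studied in the paper:

    # Illustrative continuous action mask: rescale a raw action in [-1, 1] onto a
    # state-dependent interval of relevant actions (assumed form of the relevant set).
    import numpy as np

    def relevant_interval(state):
        # Hypothetical task knowledge: allowed acceleration shrinks at high speed.
        speed = abs(state[1])
        bound = max(0.2, 2.0 - 0.5 * speed)
        return -bound, bound

    def mask_action(raw_action, state):
        lo, hi = relevant_interval(state)
        # Affine map from the global interval [-1, 1] onto the relevant set [lo, hi],
        # so every executed action is relevant by construction.
        return lo + (raw_action + 1.0) * 0.5 * (hi - lo)

    state = np.array([0.0, 3.0])
    print(mask_action(raw_action=0.8, state=state))

In a PPO pipeline, the raw action would come from the policy distribution and this mapping would be applied before stepping the environment; the paper additionally derives the corresponding implications for the policy gradient.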
Towards safe Bayesian optimization with Wiener kernel regression
arXiv:2411.02253v1
Bayesian Optimization (BO) is a data-driven strategy for
minimizing/maximizing black-box functions based on probabilistic surrogate
models. In the presence of safety constraints, the performance of BO crucially
relies on tight probabilistic error bounds related to the uncertainty
surrounding the surrogate model. For the case of Gaussian Process surrogates
and Gaussian measurement noise, we present a novel error bound based on the
recently proposed Wiener kernel regression. We prove that under rather mild
assumptions, the proposed error bound is tighter than bounds previously
documented in the literature, which leads to enlarged safety regions. We draw
upon a numerical example to demonstrate the efficacy of the proposed error
bound in safe BO.
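A sketch of how such an error bound is typically used inside safe BO, with a standard GP surrogate and a generic confidence radius beta * sigma standing in for the paper's Wiener-kernel-regression bound (the actual bound and its constants are the paper's contribution and are not reproduced here):

    # Sketch of a safe-BO candidate filter (illustrative; generic GP bound as a stand-in).
    # A candidate is admitted to the safe set only if the surrogate's upper error
    # bound keeps the safety constraint g(x) <= 0 satisfied.
    import numpy as np
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import RBF

    rng = np.random.default_rng(1)
    X_train = rng.uniform(-2.0, 2.0, size=(10, 1))
    g_train = X_train[:, 0] ** 2 - 2.0     # toy safety-constraint values, g <= 0 is safe

    gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.5), alpha=1e-2)
    gp.fit(X_train, g_train)

    beta = 2.0                              # placeholder confidence multiplier
    candidates = np.linspace(-2.0, 2.0, 41).reshape(-1, 1)
    mean, std = gp.predict(candidates, return_std=True)

    safe = candidates[mean + beta * std <= 0.0]   # tighter bounds -> larger safe set
    print(f"{len(safe)} of {len(candidates)} candidates certified safe")

The last line is where a tighter error bound pays off: the smaller the certified uncertainty, the more candidates pass the safety filter, which is the enlarged safety region the abstract refers to.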