Fairness in Serving Large Language Models
arXiv:2401.00588v2
High-demand LLM inference services (e.g., ChatGPT and BARD) support a wide
range of requests from short chat conversations to long document reading. To
ensure that all client requests are processed fairly, most major LLM inference
services impose request rate limits so that no single client can dominate the
request queue. However, this rudimentary notion of fairness also results in
under-utilization of the resources and poor client experience when there is
spare capacity. While there is a rich literature on fair scheduling, serving
LLMs presents new challenges due to their unpredictable request lengths and
their unique batching characteristics on parallel accelerators. This paper
introduces the definition of LLM serving fairness based on a cost function that
accounts for the number of input and output tokens processed. To achieve
fairness in serving, we propose a novel scheduling algorithm, the Virtual Token
Counter (VTC), a fair scheduler based on the continuous batching mechanism. We
prove a 2x tight upper bound on the service difference between two backlogged
clients, adhering to the requirement of work-conserving. Through extensive
experiments, we demonstrate the superior performance of VTC in ensuring
fairness, especially in contrast to other baseline methods, which exhibit
shortcomings under various conditions. The reproducible code is available at
https://github.com/Ying1123/VTC-artifact
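
The core mechanism is easy to sketch. Below is a minimal, illustrative Python
rendering of a virtual-token-counter style scheduler: each client accrues a
counter of weighted service (input plus output tokens), and the scheduler
always serves the backlogged client with the smallest counter. The weights,
interfaces, and counter-lifting rule here are assumptions for illustration,
not the paper's exact algorithm.

    from collections import defaultdict, deque

    class VirtualTokenCounterScheduler:
        """Illustrative fair scheduler: always serve the backlogged client
        whose virtual counter (weighted tokens served so far) is smallest."""

        def __init__(self, input_weight=1.0, output_weight=2.0):
            self.w_in = input_weight    # assumed cost per input token
            self.w_out = output_weight  # assumed cost per output token
            self.counters = defaultdict(float)  # client -> service received
            self.queues = defaultdict(deque)    # client -> pending requests

        def submit(self, client, request):
            # Lift a newly active client to the minimum backlogged counter
            # so it cannot bank credit for periods in which it was idle.
            backlogged = [c for c, q in self.queues.items() if q]
            if not self.queues[client] and backlogged:
                floor = min(self.counters[c] for c in backlogged)
                self.counters[client] = max(self.counters[client], floor)
            self.queues[client].append(request)

        def next_request(self):
            # Work-conserving: choose only among clients with pending work.
            backlogged = [c for c, q in self.queues.items() if q]
            if not backlogged:
                return None
            client = min(backlogged, key=lambda c: self.counters[c])
            return client, self.queues[client].popleft()

        def account(self, client, n_in, n_out):
            # Charge the tokens actually processed once the step completes.
            self.counters[client] += self.w_in * n_in + self.w_out * n_out
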
LMSYS-Chat-1M: A Large-Scale Real-World LLM Conversation Dataset
arXiv:2309.11998v4
Studying how people interact with large language models (LLMs) in real-world
scenarios is increasingly important due to their widespread use in various
applications. In this paper, we introduce LMSYS-Chat-1M, a large-scale dataset
containing one million real-world conversations with 25 state-of-the-art LLMs.
This dataset is collected from 210K unique IP addresses in the wild on our
Vicuna demo and Chatbot Arena website. We offer an overview of the dataset's
content, including its curation process, basic statistics, and topic
distribution, highlighting its diversity, originality, and scale. We
demonstrate its versatility through four use cases: developing content
moderation models that perform similarly to GPT-4, building a safety benchmark,
training instruction-following models that perform similarly to Vicuna, and
creating challenging benchmark questions. We believe that this dataset will
serve as a valuable resource for understanding and advancing LLM capabilities.
The dataset is publicly available at
https://huggingface.co/datasets/lmsys/lmsys-chat-1m.
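
For readers who want to poke at the data, a minimal sketch of loading it with
the Hugging Face datasets library is below (the dataset is gated, so
authentication may be required; the field names follow the dataset card and
should be checked against the live schema):

    from datasets import load_dataset  # pip install datasets

    # Streaming avoids downloading all one million conversations up front.
    ds = load_dataset("lmsys/lmsys-chat-1m", split="train", streaming=True)

    for example in ds.take(3):
        print(example["model"], example["language"])
        for turn in example["conversation"]:
            print(f'  {turn["role"]}: {turn["content"][:80]}')
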
Learning Competitive Equilibria in Exchange Economies with Bandit
Feedback
Proceedings of The 25th International Conference on Artificial
Intelligence and Statistics; 32 pages
The sharing of scarce resources among multiple rational agents is one of the
classical problems in economics. In exchange economies, which are used to model
such situations, agents begin with an initial endowment of resources and
exchange them in a way that is mutually beneficial until they reach a
competitive equilibrium (CE). The allocations at a CE are Pareto efficient and
fair. Consequently, they are used widely in designing mechanisms for fair
division. However, computing CEs requires the knowledge of agent preferences
which are unknown in several applications of interest. In this work, we explore
a new online learning mechanism, which, on each round, allocates resources to
the agents and collects stochastic feedback on their experience in using that
allocation. Its goal is to learn the agent utilities via this feedback and
imitate the allocations at a CE in the long run. We quantify CE behavior via
two losses and propose a randomized algorithm which achieves sublinear loss
under a parametric class of utilities. Empirically, we demonstrate the
effectiveness of this mechanism through numerical simulations.
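
The paper's mechanism is more general, but the loop is concrete in a special
case. The sketch below assumes Cobb-Douglas utilities and equal budgets (a
Fisher-market simplification under which the CE has a closed form): explore
with randomized allocations, fit each agent's exponents by least squares on
noisy log-utility feedback, then allocate at the CE of the estimated
utilities. All modelling choices here are illustrative assumptions, not the
paper's algorithm.

    import numpy as np

    rng = np.random.default_rng(0)
    n_agents, n_goods, T_explore = 3, 4, 2000

    # Ground-truth Cobb-Douglas exponents (rows sum to 1), unknown to us.
    A = rng.dirichlet(np.ones(n_goods), size=n_agents)

    X, Y = [[] for _ in range(n_agents)], [[] for _ in range(n_agents)]
    for t in range(T_explore):
        # Randomized exploration: split each good across agents at random.
        shares = rng.dirichlet(np.ones(n_agents), size=n_goods).T
        for i in range(n_agents):
            X[i].append(np.log(shares[i]))
            log_u = A[i] @ np.log(shares[i])  # log Cobb-Douglas utility
            Y[i].append(log_u + 0.1 * rng.standard_normal())  # noisy feedback

    # Least squares in log space recovers each agent's exponent vector.
    A_hat = np.array([np.linalg.lstsq(np.array(X[i]), np.array(Y[i]),
                                      rcond=None)[0] for i in range(n_agents)])
    A_hat = np.clip(A_hat, 1e-6, None)

    # Equal-budget Cobb-Douglas CE: agent i's share of good j is
    # proportional to its (estimated) exponent on that good.
    ce_shares = A_hat / A_hat.sum(axis=0, keepdims=True)
    print(np.round(ce_shares, 3))
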
On Guiding Visual Attention with Language Specification
While real world challenges typically define visual categories with language
words or phrases, most visual classification methods define categories with
numerical indices. However, the language specification of the classes provides
an especially useful prior for biased and noisy datasets, where it can help
disambiguate what features are task-relevant. Recently, large-scale multimodal
models have been shown to recognize a wide variety of high-level concepts from
a language specification even without additional image training data, but they
are often unable to distinguish classes for more fine-grained tasks. CNNs, in
contrast, can extract subtle image features that are required for fine-grained
discrimination, but will overfit to any bias or noise in datasets. Our insight
is to use high-level language specification as advice for constraining the
classification evidence to task-relevant features, instead of distractors. To
do this, we ground task-relevant words or phrases with attention maps from a
pretrained large-scale model. We then use this grounding to supervise a
classifier's spatial attention away from distracting context. We show that
supervising spatial attention in this way improves performance on
classification tasks with biased and noisy data, including about 3-15%
worst-group accuracy improvements and 41-45% relative improvements on fairness
metrics.
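
A simple way to render the supervision signal in code: normalize the
classifier's spatial attention and penalize the mass that falls outside
regions grounded by the language phrase. The tensor interfaces below are
assumptions for the sketch, not the paper's exact formulation.

    import torch

    def attention_alignment_loss(model_attn, language_mask, eps=1e-8):
        """model_attn:    (B, H, W) non-negative attention from the
                          classifier (e.g., a Grad-CAM style map).
           language_mask: (B, H, W) in [0, 1], produced by grounding a
                          task-relevant phrase with a pretrained
                          vision-language model."""
        attn = model_attn / (model_attn.sum(dim=(1, 2), keepdim=True) + eps)
        # Fraction of attention mass on distracting, non-grounded regions.
        off_mask = (attn * (1.0 - language_mask)).sum(dim=(1, 2))
        return off_mask.mean()

    # Usage: loss = task_loss + lam * attention_alignment_loss(cam, mask)
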
LS3: Latent Space Safe Sets for Long-Horizon Visuomotor Control of
Sparse Reward Iterative Tasks
Conference on Robot Learning (CoRL) 2021. First two authors
contributed equally
Reinforcement learning (RL) has shown impressive success in exploring
high-dimensional environments to learn complex tasks, but can often exhibit
unsafe behaviors and require extensive environment interaction when exploration
is unconstrained. A promising strategy for learning in dynamically uncertain
environments is requiring that the agent can robustly return to learned safe
sets, where task success (and therefore safety) can be guaranteed. While this
approach has been successful in low-dimensions, enforcing this constraint in
environments with visual observations is exceedingly challenging. We present a
novel continuous representation for safe sets by framing safe-set membership
as a binary classification problem in a learned latent space, which flexibly
scales to
image observations. We then present a new algorithm, Latent Space Safe Sets
(LS3), which uses this representation for long-horizon tasks with sparse
rewards. We evaluate LS3 on 4 domains, including a challenging sequential
pushing task in simulation and a physical cable routing task. We find that LS3
can use prior task successes to restrict exploration and learn more efficiently
than prior algorithms while satisfying constraints. See
https://tinyurl.com/latent-ss for code and supplementary material.
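
The safe-set idea translates directly into a small classifier over latent
states. The sketch below (encoder, latent size, and threshold are assumed for
illustration) labels latent states from successful trajectories as safe and
trains a binary classifier; a planner would reject candidate trajectories
whose terminal latent state scores below a threshold.

    import torch
    import torch.nn as nn

    class LatentSafeSet(nn.Module):
        """Continuous safe-set representation: an MLP classifying encoded
        observations as inside or outside the safe set."""

        def __init__(self, latent_dim=32):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(latent_dim, 128), nn.ReLU(),
                nn.Linear(128, 128), nn.ReLU(),
                nn.Linear(128, 1),
            )

        def prob_safe(self, z):
            return torch.sigmoid(self.net(z)).squeeze(-1)

    def safe_set_loss(model, z, labels):
        # labels: 1.0 for latents from successful trajectories, else 0.0.
        return nn.functional.binary_cross_entropy(model.prob_safe(z), labels)
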
Recovery RL: Safe Reinforcement Learning with Learned Recovery Zones
RA-L and ICRA 2021. First two authors contributed equally
Safety remains a central obstacle preventing widespread use of RL in the real
world: learning new tasks in uncertain environments requires extensive
exploration, but safety requires limiting exploration. We propose Recovery RL,
an algorithm which navigates this tradeoff by (1) leveraging offline data to
learn about constraint violating zones before policy learning and (2)
separating the goals of improving task performance and constraint satisfaction
across two policies: a task policy that only optimizes the task reward and a
recovery policy that guides the agent to safety when constraint violation is
likely. We evaluate Recovery RL on 6 simulation domains, including two
contact-rich manipulation tasks and an image-based navigation task, and an
image-based obstacle avoidance task on a physical robot. We compare Recovery RL
to 5 prior safe RL methods which jointly optimize for task performance and
safety via constrained optimization or reward shaping and find that Recovery RL
outperforms the next best prior method across all domains. Results suggest that
Recovery RL trades off constraint violations and task successes 2-20 times
more efficiently in simulation domains and 3 times more efficiently in physical
experiments. See https://tinyurl.com/rl-recovery for videos and supplementary
material.
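
The two-policy decomposition reduces to a very small action filter at
execution time. A minimal sketch, with the policy and risk-critic interfaces
assumed for illustration:

    def select_action(task_policy, recovery_policy, q_risk, state,
                      eps_risk=0.3):
        """Execute the task policy unless the learned risk critic predicts
        that a constraint violation is likely, in which case defer to the
        recovery policy. eps_risk is an assumed threshold."""
        a_task = task_policy(state)
        if q_risk(state, a_task) > eps_risk:  # violation likely: recover
            return recovery_policy(state)
        return a_task                         # otherwise pursue the task
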
Online Learning Demands in Max-min Fairness
arXiv:2012.08648v1
We describe mechanisms for the allocation of a scarce resource among multiple
users in a way that is efficient, fair, and strategy-proof, but when users do
not know their resource requirements. The mechanism is repeated for multiple
rounds and a user's requirements can change on each round. At the end of each
round, users provide feedback about the allocation they received, enabling the
mechanism to learn user preferences over time. Such situations are common in
the shared usage of a compute cluster among many users in an organisation,
where all teams may not precisely know the amount of resources needed to
execute their jobs. By understating their requirements, users will receive less
than they need and consequently not achieve their goals. By overstating them,
they may siphon away precious resources that could be useful to others in the
organisation. We formalise this task of online learning in fair division via
notions of efficiency, fairness, and strategy-proofness applicable to this
setting, and study this problem under three types of feedback: when the users'
observations are deterministic, when they are stochastic and follow a
parametric model, and when they are stochastic and nonparametric. We derive
mechanisms inspired by the classical max-min fairness procedure that achieve
these requisites, and quantify the extent to which they are achieved via
asymptotic rates. We corroborate these insights with an experimental evaluation
on synthetic problems and a web-serving task.
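
The classical procedure the mechanisms build on is water-filling max-min
fairness, sketched below; in the paper's setting the demands fed to it are
estimates learned from user feedback rather than truthful reports.

    def max_min_allocate(capacity, demands):
        """Water-filling: repeatedly give every unsatisfied user an equal
        share of what remains; users whose demand is met return surplus."""
        alloc = {u: 0.0 for u in demands}
        active, remaining = set(demands), capacity
        while active and remaining > 1e-12:
            share = remaining / len(active)
            remaining = 0.0
            for u in list(active):
                give = min(share, demands[u] - alloc[u])
                alloc[u] += give
                remaining += share - give
                if alloc[u] >= demands[u] - 1e-12:
                    active.remove(u)
        return alloc

    # Capacity 10 with demands 2, 4, 8 yields allocations 2, 4, 4.
    print(max_min_allocate(10.0, {"a": 2.0, "b": 4.0, "c": 8.0}))
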
Safety Augmented Value Estimation from Demonstrations (SAVED): Safe Deep
Model-Based RL for Sparse Cost Robotic Tasks
Robotics and Automation Letters and International Conference on
Robotics and Automation 2020. First two authors contributed equally
Reinforcement learning (RL) for robotics is challenging due to the difficulty
in hand-engineering a dense cost function, which can lead to unintended
behavior, and dynamical uncertainty, which makes exploration and constraint
satisfaction challenging. We address these issues with a new model-based
reinforcement learning algorithm, Safety Augmented Value Estimation from
Demonstrations (SAVED), which uses supervision that only identifies task
completion and a modest set of suboptimal demonstrations to constrain
exploration and learn efficiently while handling complex constraints. We then
compare SAVED with 3 state-of-the-art model-based and model-free RL algorithms
on 6 standard simulation benchmarks involving navigation and manipulation and a
physical knot-tying task on the da Vinci surgical robot. Results suggest that
SAVED outperforms prior methods in terms of success rate, constraint
satisfaction, and sample efficiency, making it feasible to safely learn a
control policy directly on a real robot in less than an hour. For tasks on the
robot, baselines succeed less than 5% of the time while SAVED has a success
rate of over 75% in the first 50 training iterations. Code and supplementary
material are available at https://tinyurl.com/saved-rl.
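
One concrete way to read "using prior successes to constrain exploration" is
a density model over states from task-completing trajectories, with
low-density terminal states rejected during planning. The model choice and
thresholds below are assumptions for the sketch, not SAVED's exact machinery.

    import numpy as np
    from sklearn.neighbors import KernelDensity

    class DemoSafeSet:
        """Kernel density estimate over states from successful
        trajectories; a planned trajectory is kept only if its terminal
        state has sufficient density under the demonstrations."""

        def __init__(self, demo_states, bandwidth=0.1, log_density_min=-5.0):
            self.kde = KernelDensity(bandwidth=bandwidth).fit(demo_states)
            self.log_density_min = log_density_min

        def terminal_state_ok(self, state):
            score = self.kde.score_samples(np.atleast_2d(state))[0]
            return score > self.log_density_min
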
Fair Algorithms for Infinite and Contextual Bandits
arXiv:1610.09559v4
We study fairness in linear bandit problems. Starting from the notion of
meritocratic fairness introduced in Joseph et al. [2016], we carry out a more
refined analysis of a more general problem, achieving better performance
guarantees with fewer modelling assumptions on the number and structure of
available choices as well as the number selected. We also analyze the
previously-unstudied question of fairness in infinite linear bandit problems,
obtaining instance-dependent regret upper bounds as well as lower bounds
demonstrating that this instance-dependence is necessary. The result is a
framework for meritocratic fairness in an online linear setting that is
substantially more powerful, general, and realistic than the current state of
the art.
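
The meritocratic principle is easiest to see in the finite-armed case: an arm
may be favored over another only when its confidence interval strictly
dominates, so all arms chained to the best upper bound are played with equal
probability. A minimal finite-armed sketch (the paper's linear and
infinite-armed analyses are substantially more involved):

    import numpy as np

    def fair_round(means, widths, rng):
        """One round of interval chaining. means/widths give per-arm
        confidence intervals [means - widths, means + widths]."""
        lo, hi = means - widths, means + widths
        order = np.argsort(-hi)            # highest upper bound first
        chained, chain_lo = {int(order[0])}, lo[order[0]]
        for k in order[1:]:
            if hi[k] >= chain_lo:          # interval overlaps the chain
                chained.add(int(k))
                chain_lo = min(chain_lo, lo[k])
            else:
                break                      # sorted, so no later arm chains
        return rng.choice(sorted(chained))  # uniform over chained arms
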
Ultra-marginal Feature Importance: Learning from Data with Causal
Guarantees
arXiv:2204.09938v5
Scientists frequently prioritize learning from data rather than training the
best possible model; however, research in machine learning often prioritizes
the latter. Marginal contribution feature importance (MCI) was developed to
break this trend by providing a useful framework for quantifying the
relationships in data. In this work, we aim to improve upon the theoretical
properties, performance, and runtime of MCI by introducing ultra-marginal
feature importance (UMFI), which uses dependence removal techniques from the AI
fairness literature as its foundation. We first propose axioms for feature
importance methods that seek to explain the causal and associative
relationships in data, and we prove that UMFI satisfies these axioms under
basic assumptions. We then show on real and simulated data that UMFI performs
better than MCI, especially in the presence of correlated interactions and
unrelated features, while partially learning the structure of the causal graph
and reducing the exponential runtime of MCI to super-linear.
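
A rough rendering of the idea in code: strip the remaining features of their
dependence on the feature of interest, then score how much adding that
feature back improves a model. The residualization below is a simple linear
stand-in for the stronger dependence-removal tools the paper draws from the
fairness literature, and the model and scoring choices are illustrative.

    import numpy as np
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.linear_model import LinearRegression
    from sklearn.model_selection import cross_val_score

    def umfi_sketch(X, y, j):
        xj = X[:, [j]]
        others = np.delete(X, j, axis=1)
        # Remove the linear component of every other feature on x_j.
        resid = others - LinearRegression().fit(xj, others).predict(xj)
        model = lambda: RandomForestRegressor(n_estimators=100, random_state=0)
        base = cross_val_score(model(), resid, y, cv=3).mean()
        with_j = cross_val_score(model(), np.hstack([resid, xj]), y,
                                 cv=3).mean()
        return with_j - base  # importance of feature j
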