arXiv:2309.14258v1
Event understanding aims at understanding the content and relationship of
events within texts, which covers multiple complicated information extraction
tasks: event detection, event argument extraction, and event relation
extraction. To facilitate related research and application, we present an event
understanding toolkit OmniEvent, which features three desiderata: (1)
Comprehensive. OmniEvent supports mainstream modeling paradigms of all the
event understanding tasks and the processing of 15 widely-used English and
Chinese datasets. (2) Fair. OmniEvent carefully handles the inconspicuous
evaluation pitfalls reported in Peng et al. (2023), which ensures fair
comparisons between different models. (3) Easy-to-use. OmniEvent is designed to
be easily used by users with varying needs. We provide off-the-shelf models
that can be directly deployed as web services. The modular framework also
enables users to easily implement and evaluate new event understanding models
with OmniEvent. The toolkit (https://github.com/THU-KEG/OmniEvent) is publicly
released along with the demonstration website and video
(https://omnievent.xlore.cn/).
Scaling Diffusion Language Models via Adaptation from Autoregressive
Models
Diffusion Language Models (DLMs) have emerged as a promising new paradigm for
text generative modeling, potentially addressing limitations of autoregressive
(AR) models. However, current DLMs have been studied at a smaller scale
compared to their AR counterparts and lack fair comparison on language modeling
benchmarks. Additionally, training diffusion models from scratch at scale
remains challenging. Given the prevalence of open-source AR language models, we
propose adapting these models to build text diffusion models. We demonstrate
connections between AR and diffusion modeling objectives and introduce a simple
continual pre-training approach for training diffusion models. Through
systematic evaluation on language modeling, reasoning, and commonsense
benchmarks, we show that we can convert AR models ranging from 127M to 7B
parameters (GPT2 and LLaMA) into diffusion models DiffuGPT and DiffuLLaMA,
using less than 200B tokens for training. Our experimental results reveal that
these models outperform earlier DLMs and are competitive with their AR
counterparts. We release a suite of DLMs (with 127M, 355M, and 7B parameters)
capable of generating fluent text, performing in-context learning, filling in
the middle without prompt re-ordering, and following instructions
(\url{https://github.com/HKUNLP/DiffuLLaMA}).
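The abstract only sketches the continual pre-training recipe. A common formulation of the absorbing-state discrete diffusion objective that such AR-to-diffusion adaptations build on masks each token independently with a sampled noise level t and reweights the denoising loss by 1/t. A minimal pure-Python sketch (the mask id, the uniform noise schedule, and the `logprob_fn` interface are illustrative assumptions, not the paper's actual implementation):

```python
import math
import random

MASK_ID = 0  # hypothetical id of the absorbing [MASK] token

def noise_sequence(tokens, t, rng):
    """Forward process: mask each token independently with probability t."""
    noisy = []
    targets = []  # (position, original token) pairs the model must recover
    for pos, tok in enumerate(tokens):
        if rng.random() < t:
            noisy.append(MASK_ID)
            targets.append((pos, tok))
        else:
            noisy.append(tok)
    return noisy, targets

def diffusion_loss(logprob_fn, tokens, rng):
    """One training-step loss estimate: sample a noise level t ~ U(0, 1],
    corrupt the sequence, score only the masked positions, weight by 1/t."""
    t = rng.uniform(1e-3, 1.0)  # avoid t = 0, where nothing is masked
    noisy, targets = noise_sequence(tokens, t, rng)
    if not targets:
        return 0.0
    nll = -sum(logprob_fn(noisy, pos, tok) for pos, tok in targets)
    return nll / t  # importance weight for the sampled noise level
```

With `logprob_fn = lambda seq, pos, tok: math.log(0.01)` (a toy uniform model over 100 tokens), `diffusion_loss` yields a non-negative Monte Carlo estimate of the denoising objective.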
Privacy in Large Language Models: Attacks, Defenses and Future
Directions
We updated the survey to cover more recent papers and include privacy
research on multi-modality.
The advancement of large language models (LLMs) has significantly enhanced
the ability to effectively tackle various downstream NLP tasks and unify these
tasks into generative pipelines. On the one hand, powerful language models,
trained on massive textual data, have brought unparalleled accessibility and
usability for both models and users. On the other hand, unrestricted access to
these models can also introduce potential malicious and unintentional privacy
risks. Despite ongoing efforts to address the safety and privacy concerns
associated with LLMs, the problem remains unresolved. In this paper, we provide
a comprehensive analysis of the current privacy attacks targeting LLMs and
categorize them according to the adversary's assumed capabilities to shed light
on the potential vulnerabilities present in LLMs. Then, we present a detailed
overview of prominent defense strategies that have been developed to counter
these privacy attacks. Beyond existing works, we identify upcoming privacy
concerns as LLMs evolve. Lastly, we point out several potential avenues for
future exploration.
Evaluation of waterway lock service quality in Yangtze Delta: from the
perspectives of customer and supplier
arXiv:2410.07132v1
In recent decades, the waterway locks in the Yangtze Delta, China, have
become major traffic bottlenecks. To gain a comprehensive understanding of the
crew's perspectives and primary concerns regarding lock services during vessel
lockage, and to enhance customer satisfaction and improve vessel lockage
efficiency, it is necessary to assess the waterway lock service quality (WLSQ).
This paper presents an evaluation system for WLSQ from various stakeholders'
viewpoints. Firstly, by employing questionnaire surveys and the structural
equation model method, in conjunction with factor analysis, the WLSQ and its
influencing factors in the Yangtze River Delta region are analyzed from a
customer perspective. Secondly, the Analytic Hierarchy Process method is
utilized, along with a dedicated questionnaire for service suppliers, to
examine their concerns regarding the performance of vessel lock services. The
findings indicate that there exists a cognitive bias towards factors
influencing the WLSQ. Crew members express the greatest concern over vessel
lockage delays, whereas vessel lockage safety is the primary concern for
management department administrators. Furthermore, enhancing the supporting
facilities of waterway locks can significantly increase crew members'
satisfaction during vessel lockage. Improving staff skills and safety
conditions can also greatly enhance customers' tolerance for lockage delays.
The results of this study will provide valuable insights for the lock
management department, operators, and the government in formulating relevant
policies to improve WLSQ and implementing ongoing service quality evaluations.
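The Analytic Hierarchy Process step described above derives priority weights for the suppliers' concerns from pairwise comparison matrices. A minimal sketch of the standard AHP computation, with geometric-mean weights and Saaty's consistency ratio (the example matrix is purely illustrative, not data from the study):

```python
import math

# Saaty's random consistency indices for matrix sizes 1..9
RI = [0.0, 0.0, 0.58, 0.90, 1.12, 1.24, 1.32, 1.41, 1.45]

def ahp_weights(matrix):
    """Priority weights from a pairwise comparison matrix via the
    geometric-mean (logarithmic least squares) method."""
    n = len(matrix)
    gm = [math.prod(row) ** (1.0 / n) for row in matrix]
    total = sum(gm)
    return [g / total for g in gm]

def consistency_ratio(matrix):
    """Saaty's CR = CI / RI (defined for n >= 3); values below 0.1
    are conventionally considered acceptable."""
    n = len(matrix)
    w = ahp_weights(matrix)
    # lambda_max estimated from (A w)_i / w_i, averaged over rows
    aw = [sum(matrix[i][j] * w[j] for j in range(n)) for i in range(n)]
    lam = sum(aw[i] / w[i] for i in range(n)) / n
    ci = (lam - n) / (n - 1)
    return ci / RI[n - 1]
```

For a perfectly consistent 3x3 matrix such as `[[1, 2, 4], [0.5, 1, 2], [0.25, 0.5, 1]]`, the weights come out proportional to 4:2:1 and the consistency ratio is 0.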
Adaptive Differentially Private Structural Entropy Minimization for
Unsupervised Social Event Detection
Social event detection refers to extracting relevant message clusters from
social media data streams to represent specific events in the real world.
It is important in numerous areas, such as opinion
analysis, social safety, and decision-making. Most current methods are
supervised and require access to large amounts of data. These methods need
prior knowledge of the events and carry a high risk of leaking sensitive
information in the messages, making them less applicable in open-world
settings. Therefore, conducting unsupervised detection while fully utilizing
the rich information in the messages and protecting data privacy remains a
significant challenge. To this end, we propose a novel social event detection
framework, ADP-SEMEvent, an unsupervised social event detection method that
prioritizes privacy. Specifically, ADP-SEMEvent is divided into two stages,
i.e., the construction stage of the private message graph and the clustering
stage of the private message graph. In the first stage, an adaptive
differential privacy approach is used to construct a private message graph. In
this process, our method can adaptively apply differential privacy based on the
events occurring each day in an open environment to maximize the use of the
privacy budget. In the second stage, to address the reduction in data utility
caused by noise, a novel 2-dimensional structural entropy minimization
algorithm based on optimal subgraphs is used to detect events in the message
graph. The highlight of this process is that it is unsupervised and does not
compromise
differential privacy. Extensive experiments on two public datasets demonstrate
that ADP-SEMEvent can achieve detection performance comparable to
state-of-the-art methods while maintaining reasonable privacy budget
parameters.
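The abstract does not spell out the entropy formulation. The standard definition of two-dimensional structural entropy for a graph under a node partition, the quantity that such minimization methods optimize, can be sketched as follows (the toy two-triangle graph is illustrative only):

```python
import math

def two_dim_structural_entropy(edges, partition):
    """Two-dimensional structural entropy of an undirected graph under a
    given node partition (lower entropy = better community structure).

    edges: list of (u, v) pairs; partition: list of sets of nodes.
    """
    deg = {}
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    vol_g = 2 * len(edges)  # total volume = sum of all degrees
    h = 0.0
    for module in partition:
        vol = sum(deg[u] for u in module)
        # g: edges crossing the module boundary (exactly one endpoint inside)
        g = sum(1 for u, v in edges if (u in module) != (v in module))
        # cost of entering the module from the root of the encoding tree
        h -= (g / vol_g) * math.log2(vol / vol_g)
        # cost of locating each node inside its module
        for u in module:
            h -= (deg[u] / vol_g) * math.log2(deg[u] / vol)
    return h
```

On two disconnected triangles, splitting the partition along the triangles gives entropy log2(3), lower than the log2(6) of the trivial one-module partition, which is why minimizing this quantity recovers event clusters.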
KoLA: Carefully Benchmarking World Knowledge of Large Language Models
The unprecedented performance of large language models (LLMs) necessitates
improvements in evaluations. Rather than merely exploring the breadth of LLM
abilities, we believe meticulous and thoughtful designs are essential to
thorough, unbiased, and applicable evaluations. Given the importance of world
knowledge to LLMs, we construct a Knowledge-oriented LLM Assessment benchmark
(KoLA), in which we carefully design three crucial factors: (1) For
\textbf{ability modeling}, we mimic human cognition to form a four-level
taxonomy of knowledge-related abilities, covering 19 tasks. (2) For
\textbf{data}, to ensure fair comparisons, we use both Wikipedia, a corpus
on which LLMs are prevalently pre-trained, and continuously collected emerging
corpora, aiming to evaluate the capacity to handle unseen data and evolving
knowledge. (3) For \textbf{evaluation criteria}, we adopt a contrastive system,
including overall standard scores for better numerical comparability across
tasks and models and a unique self-contrast metric for automatically evaluating
knowledge-creating ability. We evaluate 28 open-source and commercial LLMs
and obtain some intriguing findings. The KoLA dataset and open-participation
leaderboard are publicly released at https://kola.xlore.cn and will be
continuously updated to provide references for developing LLMs and
knowledge-related systems.
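The abstract does not give the formula behind the "overall standard scores". A generic standardization of this kind, mapping each task's raw scores across models onto a shared scale so that metrics with different ranges become numerically comparable, might look like the following (the mean-50/sd-10 target scale is an assumption for illustration, not KoLA's documented choice):

```python
import math

def standard_scores(raw, mean_target=50.0, sd_target=10.0):
    """Map raw per-task scores of several models onto a common scale:
    a z-score shifted to the target mean and standard deviation."""
    n = len(raw)
    mu = sum(raw) / n
    sd = math.sqrt(sum((x - mu) ** 2 for x in raw) / n)
    if sd == 0:
        return [mean_target] * n  # all models tied on this task
    return [mean_target + sd_target * (x - mu) / sd for x in raw]
```

For example, raw accuracies of 0 and 10 on one task map to 40 and 60, directly comparable with standardized scores from a task measured in, say, F1.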
Simulating the Integration of Urban Air Mobility into Existing
Transportation Systems: A Survey
arXiv:2301.12901v4
Urban air mobility (UAM) has the potential to revolutionize transportation in
metropolitan areas, providing a new mode of transportation that could alleviate
congestion and improve accessibility. However, the integration of UAM into
existing transportation systems is a complex task that requires a thorough
understanding of its impact on traffic flow and capacity. In this paper, we
conduct a survey to investigate the current state of research on UAM in
metropolitan-scale traffic using simulation techniques. We identify key
challenges and opportunities for the integration of UAM into urban
transportation systems, including impacts on existing traffic patterns and
congestion; safety analysis and risk assessment; potential economic and
environmental benefits; and the development of shared infrastructure and routes
for UAM and ground-based transportation. We also discuss the potential benefits
of UAM, such as reduced travel times and improved accessibility for underserved
areas. Our survey provides a comprehensive overview of the current state of
research on UAM in metropolitan-scale traffic using simulation and highlights
key areas for future research and development.
IGL-Bench: Establishing the Comprehensive Benchmark for Imbalanced Graph
Learning
The Thirty-eighth Conference on Neural Information Processing Systems,
Datasets and Benchmarks Track.
Deep graph learning has gained grand popularity over the past years due to
its versatility and success in representing graph data across a wide range of
domains. However, the pervasive issue of imbalanced graph data distributions,
where certain parts exhibit disproportionally abundant data while others remain
sparse, undermines the efficacy of conventional graph learning algorithms,
leading to biased outcomes. To address this challenge, Imbalanced Graph
Learning (IGL) has garnered substantial attention, enabling more balanced data
distributions and better task performance. Despite the proliferation of IGL
algorithms, the absence of consistent experimental protocols and fair
performance comparisons poses a significant barrier to comprehending
advancements in this field. To bridge this gap, we introduce IGL-Bench, a
foundational comprehensive benchmark for imbalanced graph learning,
encompassing 16 diverse graph datasets and 24 distinct IGL algorithms with
uniform data
processing and splitting strategies. Specifically, IGL-Bench systematically
investigates state-of-the-art IGL algorithms in terms of effectiveness,
robustness, and efficiency on node-level and graph-level tasks, with the scope
of class-imbalance and topology-imbalance. Extensive experiments demonstrate
the potential benefits of IGL algorithms on various imbalanced conditions,
offering insights and opportunities in the IGL field. Further, we have
developed an open-sourced and unified package to facilitate reproducible
evaluation and inspire further innovative research, which is available at
https://github.com/RingBDStack/IGL-Bench.
On Prompt-Driven Safeguarding for Large Language Models
Prepending model inputs with safety prompts is a common practice for
safeguarding large language models (LLMs) against queries with harmful intents.
However, the underlying working mechanisms of safety prompts have not been
unraveled yet, restricting the possibility of automatically optimizing them to
improve LLM safety. In this work, we investigate how LLMs' behavior (i.e.,
complying with or refusing user queries) is affected by safety prompts from the
perspective of model representation. We find that in the representation space,
the input queries are typically moved by safety prompts in a "higher-refusal"
direction, in which models become more prone to refusing to provide assistance,
even when the queries are harmless. On the other hand, LLMs are naturally
capable of distinguishing harmful and harmless queries without safety prompts.
Inspired by these findings, we propose a method for safety prompt optimization,
namely DRO (Directed Representation Optimization). Treating a safety prompt as
continuous, trainable embeddings, DRO learns to move the queries'
representations along or opposite the refusal direction, depending on their
harmfulness. Experiments with eight LLMs on out-of-domain and jailbreak
benchmarks demonstrate that DRO remarkably improves the safeguarding
performance of human-crafted safety prompts, without compromising the models'
general performance.
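DRO itself optimizes continuous prompt embeddings through training; the underlying geometry, shifting a query representation along the refusal direction when the query is harmful and against it when harmless, can be illustrated with plain vector arithmetic (all vectors below are toy values, not actual model activations):

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    return dot(a, a) ** 0.5

def move_along_direction(rep, refusal_dir, step, harmful):
    """Shift a query representation along the refusal direction if the
    query is harmful, and opposite it if harmless (the effect DRO trains
    safety-prompt embeddings to produce)."""
    scale = norm(refusal_dir)
    unit = [x / scale for x in refusal_dir]
    sign = 1.0 if harmful else -1.0
    return [r + sign * step * u for r, u in zip(rep, unit)]

def refusal_score(rep, refusal_dir):
    """Projection onto the refusal direction: higher means the model is
    more prone to refuse."""
    return dot(rep, refusal_dir) / norm(refusal_dir)
```

Moving a harmful query's representation along the direction raises its refusal score, while moving a harmless one against it lowers the score, matching the paper's account of how safety prompts should act selectively.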
CycLight: learning traffic signal cooperation with a cycle-level
strategy
arXiv:2401.08121v1
This study introduces CycLight, a novel cycle-level deep reinforcement
learning (RL) approach for network-level adaptive traffic signal control
(NATSC) systems. Unlike most traditional RL-based traffic controllers that
focus on step-by-step decision making, CycLight adopts a cycle-level strategy,
optimizing cycle length and splits simultaneously using the Parameterized Deep
Q-Network (PDQN) algorithm. This cycle-level approach effectively reduces the
computational burden associated with frequent data communication while
enhancing the practicality and safety of real-world applications. A
decentralized framework is formulated for multi-agent cooperation, while
an attention mechanism is integrated to accurately assess the impact of the
surroundings on the current intersection. CycLight is tested in a large
synthetic traffic grid using the microscopic traffic simulation tool, SUMO.
Experimental results not only demonstrate the superiority of CycLight over
other state-of-the-art approaches but also showcase its robustness against
information transmission delays.