arXiv:2406.17092v1
Safety backdoor attacks in large language models (LLMs) enable the stealthy
triggering of unsafe behaviors while evading detection during normal
interactions. The high dimensionality of potential triggers in the token space
and the diverse range of malicious behaviors make this a critical challenge. We
present BEEAR, a mitigation approach leveraging the insight that backdoor
triggers induce relatively uniform drifts in the model's embedding space. Our
bi-level optimization method identifies universal embedding perturbations that
elicit unwanted behaviors and adjusts the model parameters to reinforce safe
behaviors against these perturbations. Experiments show BEEAR reduces the
success rate of RLHF-time backdoor attacks from >95% to <1% and from 47% to 0%
for instruction-tuning-time backdoors targeting malicious code generation,
without compromising model utility. Requiring only defender-defined safe and
unwanted behaviors, BEEAR represents a step towards practical defenses against
safety backdoors in LLMs, providing a foundation for further advancements in AI
safety and security.
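The bi-level loop described in the BEEAR abstract above can be illustrated with a deliberately tiny sketch. It is not the authors' implementation: it assumes a toy logistic "unsafe behavior" classifier over 4-dimensional embeddings instead of an LLM, it omits any utility-preserving term, and every name in it (`find_perturbation`, `reinforce_safe`, `toy_bilevel_defense`, `attacked_unsafe_rate`) is hypothetical. The inner step searches for one norm-bounded perturbation shared by all inputs that raises the unsafe probability; the outer step updates the parameters to lower that probability under the perturbation.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def unsafe_probs(w, b, embeddings, delta):
    """P(unsafe) for each embedding after adding the shared perturbation."""
    return [sigmoid(dot(w, [e + d for e, d in zip(emb, delta)]) + b)
            for emb in embeddings]


def find_perturbation(w, b, embeddings, radius=1.0, steps=20, lr=0.5):
    """Inner level: one universal, L2-bounded delta that raises P(unsafe)."""
    delta = [0.0] * len(w)
    for _ in range(steps):
        probs = unsafe_probs(w, b, embeddings, delta)
        # Gradient of mean log P(unsafe) w.r.t. delta is mean(1 - p) * w.
        scale = sum(1.0 - p for p in probs) / len(probs)
        delta = [d + lr * scale * wi for d, wi in zip(delta, w)]
        norm = math.sqrt(dot(delta, delta))
        if norm > radius:  # project back onto the L2 ball
            delta = [d * radius / norm for d in delta]
    return delta


def reinforce_safe(w, b, embeddings, delta, lr=0.1):
    """Outer level: one gradient step on -mean log(1 - P(unsafe)) under delta."""
    probs = unsafe_probs(w, b, embeddings, delta)
    n = len(embeddings)
    grad_w = [sum(p * (emb[j] + delta[j]) for p, emb in zip(probs, embeddings)) / n
              for j in range(len(w))]
    grad_b = sum(probs) / n
    return [wi - lr * g for wi, g in zip(w, grad_w)], b - lr * grad_b


def toy_bilevel_defense(w, b, embeddings, rounds=100):
    """Alternate perturbation search and parameter update."""
    for _ in range(rounds):
        delta = find_perturbation(w, b, embeddings)
        w, b = reinforce_safe(w, b, embeddings, delta)
    return w, b


def attacked_unsafe_rate(w, b, embeddings):
    """Mean P(unsafe) under a freshly optimized perturbation."""
    delta = find_perturbation(w, b, embeddings)
    probs = unsafe_probs(w, b, embeddings, delta)
    return sum(probs) / len(probs)


rng = random.Random(0)
embeddings = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(16)]
w0, b0 = [1.0, -0.5, 0.25, 2.0], 0.0
w1, b1 = toy_bilevel_defense(w0, b0, embeddings)
print(attacked_unsafe_rate(w0, b0, embeddings), attacked_unsafe_rate(w1, b1, embeddings))
```

In this toy setting the second printed value should be lower than the first, which mirrors the qualitative claim of the abstract only; the reported attack success rates come from the paper's own experiments, not from anything like this sketch.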
arXiv:2404.18416v2
Excellence in a wide variety of medical applications poses considerable
challenges for AI, requiring advanced reasoning, access to up-to-date medical
knowledge and understanding of complex multimodal data. Gemini models, with
strong general capabilities in multimodal and long-context reasoning, offer
exciting possibilities in medicine. Building on these core strengths of Gemini,
we introduce Med-Gemini, a family of highly capable multimodal models that are
specialized in medicine with the ability to seamlessly use web search, and that
can be efficiently tailored to novel modalities using custom encoders. We
evaluate Med-Gemini on 14 medical benchmarks, establishing new state-of-the-art
(SoTA) performance on 10 of them, and surpass the GPT-4 model family on every
benchmark where a direct comparison is viable, often by a wide margin. On the
popular MedQA (USMLE) benchmark, our best-performing Med-Gemini model achieves
SoTA performance of 91.1% accuracy, using a novel uncertainty-guided search
strategy. On 7 multimodal benchmarks including NEJM Image Challenges and MMMU
(health & medicine), Med-Gemini improves over GPT-4V by an average relative
margin of 44.5%. We demonstrate the effectiveness of Med-Gemini's long-context
capabilities through SoTA performance on a needle-in-a-haystack retrieval task
from long de-identified health records and medical video question answering,
surpassing prior bespoke methods using only in-context learning. Finally,
Med-Gemini's performance suggests real-world utility by surpassing human
experts on tasks such as medical text summarization, alongside demonstrations
of promising potential for multimodal medical dialogue, medical research and
education. Taken together, our results offer compelling evidence for
Med-Gemini's potential, although further rigorous evaluation will be crucial
before real-world deployment in this safety-critical domain.
RAI4IoE: Responsible AI for Enabling the Internet of Energy
Accepted to IEEE International Conference on Trust, Privacy and
Security in Intelligent Systems, a...
This paper sets out to develop an equitable and responsible AI framework, with
enabling techniques and algorithms, for the Internet of Energy (IoE), in short
RAI4IoE. The energy sector is going through substantial changes fueled by two
key drivers: building a zero-carbon energy sector and the digital
transformation of the energy infrastructure. We expect to see the convergence
of these two drivers resulting in the IoE, where renewable distributed energy
resources (DERs), such as electric cars, storage batteries, wind turbines and
photovoltaics (PV), can be connected and integrated for reliable energy
distribution by leveraging advanced 5G-6G networks and AI technology. This
allows DER owners as prosumers to participate in the energy market and derive
economic incentives. DERs are inherently asset-driven and raise equity
challenges (i.e., fairness, diversity and inclusion). Without equitable access,
privileged individuals, groups and organizations can participate and benefit at
the cost of disadvantaged groups. Real-time management of DERs not only raises
this equity problem in the IoE but also collects highly sensitive location-,
time- and activity-dependent data for AI-enhanced prediction, optimization and
prioritization services and for automated management of flexible resources;
this data must be handled responsibly (e.g., with respect to privacy, security
and safety). The vision of our project is to ensure equitable participation of
the community members and responsible use of their data in IoE so that it could
reap the benefits of advances in AI to provide safe, reliable and sustainable
energy services.
DLUE: Benchmarking Document Language Understanding
arXiv:2305.09520v1
Understanding documents is central to many real-world tasks but remains a
challenging topic. Unfortunately, there is no well-established consensus on how
to comprehensively evaluate document understanding abilities, which
significantly hinders fair comparison and the measurement of progress in the
field. To benchmark document understanding research, this paper summarizes
four representative abilities, i.e., document classification, document
structural analysis, document information extraction, and document
transcription. Under the new evaluation framework, we propose Document
Language Understanding Evaluation (DLUE), a new task suite which
covers a wide range of tasks in various forms, domains and document genres. We
also systematically evaluate six well-established transformer models on DLUE,
and find that due to the lengthy content, complicated underlying structure and
dispersed knowledge, document understanding is still far from being solved, and
currently there is no neural architecture that dominates all tasks, raising
requirements for a universal document understanding architecture.
Is Your LLM Secretly a World Model of the Internet? Model-Based Planning
for Web Agents
Language agents have demonstrated promising capabilities in automating
web-based tasks, though their current reactive approaches still fall well
short of human performance. While incorporating advanced planning algorithms,
particularly tree search methods, could enhance these agents' performance,
implementing tree search directly on live websites poses significant safety
risks and practical constraints due to irreversible actions such as confirming
a purchase. In this paper, we introduce a novel paradigm that augments language
agents with model-based planning, pioneering the innovative use of large
language models (LLMs) as world models in complex web environments. Our method,
WebDreamer, builds on the key insight that LLMs inherently encode comprehensive
knowledge about website structures and functionalities. Specifically,
WebDreamer uses LLMs to simulate outcomes for each candidate action (e.g.,
"what would happen if I click this button?") using natural language
descriptions, and then evaluates these imagined outcomes to determine the
optimal action at each step. Empirical results on two representative web agent
benchmarks with online interaction -- VisualWebArena and Mind2Web-live --
demonstrate that WebDreamer achieves substantial improvements over reactive
baselines. By establishing the viability of LLMs as world models in web
environments, this work lays the groundwork for a paradigm shift in automated
web interaction. More broadly, our findings open exciting new avenues for
future research into 1) optimizing LLMs specifically for world modeling in
complex, dynamic environments, and 2) model-based speculative planning for
language agents.
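The simulate-then-evaluate step described in the WebDreamer abstract above can be sketched as a single planning function. This is an illustration under stated assumptions, not the authors' code: `choose_action`, `fake_simulate` and `fake_score` are hypothetical names, and the two "LLM calls" are hard-coded stand-ins so the example runs offline.

```python
from typing import Callable, Sequence


def choose_action(
    task: str,
    observation: str,
    candidates: Sequence[str],
    simulate: Callable[[str, str], str],
    score: Callable[[str, str], float],
) -> str:
    """Return the candidate whose imagined outcome scores highest for the task."""
    if not candidates:
        raise ValueError("need at least one candidate action")
    # 1) Imagine: describe in natural language what each action would lead to.
    imagined = {a: simulate(observation, a) for a in candidates}
    # 2) Evaluate each imagined outcome against the task, then
    # 3) keep the best-rated action (ties go to the earliest candidate).
    return max(candidates, key=lambda a: score(task, imagined[a]))


# Stand-ins for the two LLM calls (assumptions for this sketch only).
def fake_simulate(observation: str, action: str) -> str:
    outcomes = {
        "click 'Add to cart'": "The cart now contains the selected item.",
        "click 'Home'": "The site's landing page is shown.",
    }
    return outcomes.get(action, "Nothing noticeable changes.")


def fake_score(task: str, outcome: str) -> float:
    return 1.0 if "cart" in outcome.lower() else 0.0


best = choose_action(
    "Add the item to the cart",
    "Product page for a desk lamp",
    ["click 'Home'", "click 'Add to cart'", "scroll down"],
    fake_simulate,
    fake_score,
)
print(best)  # click 'Add to cart'
```

Because outcomes are only imagined, no irreversible action is taken on the live site until one candidate has been selected, which is the safety argument the abstract makes for planning with a world model.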
Integrating Object Detection Modality into Visual Language Model for
Enhanced Autonomous Driving Agent
In this paper, we propose a novel framework for enhancing visual
comprehension in autonomous driving systems by integrating visual language
models (VLMs) with an additional visual perception module specialised in object
detection. We extend the Llama-Adapter architecture by incorporating a
YOLOS-based detection network alongside the CLIP perception network, addressing
limitations in object detection and localisation. Our approach introduces
camera ID-separators to improve multi-view processing, crucial for
comprehensive environmental awareness. Experiments on the DriveLM visual
question answering challenge demonstrate significant improvements over baseline
models, with enhanced performance in ChatGPT scores, BLEU scores, and CIDEr
metrics, which indicate how close the model's answers are to the ground truth. Our method
represents a promising step towards more capable and interpretable autonomous
driving systems. Possible safety enhancements enabled by the detection modality
are also discussed.
Trusting Your AI Agent Emotionally and Cognitively: Development and
Validation of a Semantic Differential Scale for AI Trust
arXiv:2408.05354v2
Trust is not just a cognitive issue but also an emotional one, yet the
research in human-AI interactions has primarily focused on the cognitive route
of trust development. Recent work has highlighted the importance of studying
affective trust towards AI, especially in the context of emerging human-like
LLM-powered conversational agents. However, there is a lack of validated and
generalizable measures for the two-dimensional construct of trust in AI agents.
To address this gap, we developed and validated a set of 27-item semantic
differential scales for affective and cognitive trust through a scenario-based
survey study. We then further validated and applied the scale through an
experimental study. Our empirical findings showed how the emotional and cognitive
aspects of trust interact with each other and collectively shape a person's
overall trust in AI agents. Our study methodology and findings also provide
insights into the capability of state-of-the-art LLMs to foster trust through
different routes.
From Word Vectors to Multimodal Embeddings: Techniques, Applications,
and Future Directions For Large Language Models
Word embeddings and language models have transformed natural language
processing (NLP) by facilitating the representation of linguistic elements in
continuous vector spaces. This review visits foundational concepts such as the
distributional hypothesis and contextual similarity, tracing the evolution from
sparse representations like one-hot encoding to dense embeddings including
Word2Vec, GloVe, and fastText. We examine both static and contextualized
embeddings, underscoring advancements in models such as ELMo, BERT, and GPT and
their adaptations for cross-lingual and personalized applications. The
discussion extends to sentence and document embeddings, covering aggregation
methods and generative topic models, along with the application of embeddings
in multimodal domains, including vision, robotics, and cognitive science.
Advanced topics such as model compression, interpretability, numerical
encoding, and bias mitigation are analyzed, addressing both technical
challenges and ethical implications. Additionally, we identify future research
directions, emphasizing the need for scalable training techniques, enhanced
interpretability, and robust grounding in non-textual modalities. By
synthesizing current methodologies and emerging trends, this survey offers
researchers and practitioners an in-depth resource to push the boundaries of
embedding-based language models.
ShieldLM: Empowering LLMs as Aligned, Customizable and Explainable
Safety Detectors
19 pages. Camera ready version of EMNLP 2024 Findings
The safety of Large Language Models (LLMs) has gained increasing attention in
recent years, but there is still no comprehensive approach for detecting
safety issues within LLMs' responses in an aligned, customizable and
explainable manner. In this paper, we propose ShieldLM, an LLM-based safety
detector, which aligns with common safety standards, supports customizable
detection rules, and provides explanations for its decisions. To train
ShieldLM, we compile a large bilingual dataset comprising 14,387 query-response
pairs, annotating the safety of responses based on various safety standards.
Through extensive experiments, we demonstrate that ShieldLM surpasses strong
baselines across four test sets, showcasing remarkable customizability and
explainability. Besides performing well on standard detection datasets,
ShieldLM has also been shown to be effective as a safety evaluator for advanced
LLMs. ShieldLM is released at https://github.com/thu-coai/ShieldLM to
support accurate and explainable safety detection under various safety
standards.
Modeling Uncertainty in 3D Gaussian Splatting through Continuous
Semantic Splatting
arXiv:2411.02547v1
In this paper, we present a novel algorithm for probabilistically updating
and rasterizing semantic maps within 3D Gaussian Splatting (3D-GS). Although
previous methods have introduced algorithms that learn to rasterize features
in 3D-GS for enhanced scene understanding, 3D-GS can fail without warning, which
presents a challenge for safety-critical robotic applications. To address this
gap, we propose a method which advances the literature of continuous semantic
mapping from voxels to ellipsoids, combining the precise structure of 3D-GS
with the ability to quantify uncertainty of probabilistic robotic maps. Given a
set of images, our algorithm performs a probabilistic semantic update directly
on the 3D ellipsoids to obtain an expectation and variance through the use of
conjugate priors. We also propose a probabilistic rasterization which returns
per-pixel segmentation predictions with quantifiable uncertainty. We compare
our method with similar probabilistic voxel-based methods to verify our
extension to 3D ellipsoids, and perform ablation studies on uncertainty
quantification and temporal smoothing.
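As a rough illustration of how a conjugate prior yields both an expectation and a variance for a semantic label, the sketch below uses a Dirichlet prior over class probabilities with categorical observations. The abstract above does not say which conjugate pair the authors use, so the Dirichlet-categorical choice is an assumption here, and `dirichlet_update` and `dirichlet_mean_and_variance` are hypothetical helper names for a single ellipsoid's label distribution.

```python
def dirichlet_update(prior_alpha, class_counts):
    """Posterior concentration = prior concentration + observed class counts."""
    if len(prior_alpha) != len(class_counts):
        raise ValueError("prior and counts must cover the same classes")
    return [a + c for a, c in zip(prior_alpha, class_counts)]


def dirichlet_mean_and_variance(alpha):
    """Per-class expectation and variance of a Dirichlet(alpha) distribution."""
    total = sum(alpha)
    mean = [a / total for a in alpha]
    var = [a * (total - a) / (total * total * (total + 1)) for a in alpha]
    return mean, var


# Uniform prior over three classes, then four observations: 3x class 0, 1x class 2.
alpha = dirichlet_update([1.0, 1.0, 1.0], [3, 0, 1])
mean, var = dirichlet_mean_and_variance(alpha)
print(alpha)  # [4.0, 1.0, 2.0]
print(mean, var)
```

The expectation gives the predicted class probabilities and the variance shrinks as more observations accumulate, which is the kind of quantifiable uncertainty the abstract refers to.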