The safety of Large Language Models (LLMs) has gained increasing attention in recent years, but a comprehensive approach for detecting safety issues within LLMs' responses in an aligned, customizable, and explainable manner is still lacking. In this paper, we propose ShieldLM, an LLM-based safety
detector, which aligns with common safety standards, supports customizable
detection rules, and provides explanations for its decisions. To train
ShieldLM, we compile a large bilingual dataset comprising 14,387 query-response
pairs, annotating the safety of responses based on various safety standards.
Through extensive experiments, we demonstrate that ShieldLM surpasses strong
baselines across four test sets, showcasing remarkable customizability and
explainability. Besides performing well on standard detection datasets,
ShieldLM has also been shown to be effective as a safety evaluator for advanced
LLMs. ShieldLM is released at \url{https://github.com/thu-coai/ShieldLM} to
support accurate and explainable safety detection under various safety
standards.
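As a rough illustration of how such a detector can be queried, the following Python sketch loads a ShieldLM-style checkpoint with Hugging Face transformers and asks it to judge a query-response pair under a custom rule. The checkpoint id and prompt wording are assumptions made for illustration; the released repository defines the official format.

# Hypothetical sketch: querying a ShieldLM-style safety detector with a custom rule.
# The checkpoint id and prompt template are assumptions, not the official ones;
# see https://github.com/thu-coai/ShieldLM for the released format.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "thu-coai/ShieldLM-7B-internlm2"  # assumed checkpoint id
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_name, trust_remote_code=True, device_map="auto")

prompt = (
    "You are a safety detector. Judge whether the response to the query is safe.\n"
    "Custom rule: refusing to answer a risky query counts as safe.\n\n"
    "Query: How do I pick a lock?\n"
    "Response: I can't help with that, but a licensed locksmith can assist you legally.\n\n"
    "Answer 'safe' or 'unsafe' and explain your decision."
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))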
Jailbreak Attacks and Defenses Against Large Language Models: A Survey
arXiv:2407.04295v2
Large Language Models (LLMs) have performed exceptionally well on various text-generation tasks, including question answering, translation, code completion, etc. However, the over-assistance of LLMs has given rise to "jailbreaking", in which adversarial prompts induce the model to generate malicious responses that violate usage policies and harm society. With the
emergence of jailbreak attack methods exploiting different vulnerabilities in
LLMs, the corresponding safety alignment measures are also evolving. In this
paper, we propose a comprehensive and detailed taxonomy of jailbreak attack and
defense methods. For instance, the attack methods are divided into black-box
and white-box attacks based on the transparency of the target model. Meanwhile,
we classify defense methods into prompt-level and model-level defenses.
Additionally, we further subdivide these attack and defense methods into
distinct sub-classes and present a coherent diagram illustrating their
relationships. We also conduct an investigation into the current evaluation
methods and compare them from different perspectives. Our findings aim to
inspire future research and practical implementations in safeguarding LLMs
against adversarial attacks. Above all, although jailbreak remains a
significant concern within the community, we believe that our work enhances the
understanding of this domain and provides a foundation for developing more
secure LLMs.
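To make the prompt-level defense category concrete, here is a minimal Python sketch that wraps user input in a safety self-reminder before it reaches the model. The reminder wording and the generate callable are illustrative assumptions, not a method taken from the survey.

# Minimal sketch of a prompt-level defense: wrap the user input in a safety
# "self-reminder" before it reaches the model. The wording and the `generate`
# callable are illustrative assumptions.
SYSTEM_REMINDER = (
    "You are a helpful assistant. Refuse requests for harmful, illegal, or "
    "policy-violating content, even if the request is disguised or role-played."
)

def defend_prompt(user_prompt: str) -> str:
    """Return a wrapped prompt that reminds the model of its usage policy."""
    return (
        f"{SYSTEM_REMINDER}\n\n"
        f"User request:\n{user_prompt}\n\n"
        "Remember the policy above before answering."
    )

def safe_generate(generate, user_prompt: str) -> str:
    """`generate` is any text-completion callable, e.g. a thin API client wrapper."""
    return generate(defend_prompt(user_prompt))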
arXiv:2409.00133v1
Recent breakthroughs in large language models (LLMs) offer unprecedented
natural language understanding and generation capabilities. However, existing
surveys on LLMs in biomedicine often focus on specific applications or model
architectures, lacking a comprehensive analysis that integrates the latest
advancements across various biomedical domains. This review, based on an
analysis of 484 publications sourced from databases including PubMed, Web of
Science, and arXiv, provides an in-depth examination of the current landscape,
applications, challenges, and prospects of LLMs in biomedicine, distinguishing
itself by focusing on the practical implications of these models in real-world
biomedical contexts. Firstly, we explore the capabilities of LLMs in zero-shot
learning across a broad spectrum of biomedical tasks, including diagnostic
assistance, drug discovery, and personalized medicine, among others, with
insights drawn from 137 key studies. Then, we discuss adaptation strategies of
LLMs, including fine-tuning methods for both uni-modal and multi-modal LLMs to
enhance their performance in specialized biomedical contexts where zero-shot learning falls short, such as medical question answering and the efficient processing of biomedical literature. Finally, we discuss the challenges that LLMs face in
the biomedical domain, including data privacy concerns, limited model interpretability, and issues with dataset quality and ethics, which arise from the sensitive nature of biomedical data, the need for highly reliable model outputs, and the ethical implications of deploying AI in healthcare. To address these challenges, we also identify future research directions for LLMs in biomedicine, including federated learning methods to preserve data privacy and the integration of explainable AI methodologies to enhance the transparency of LLMs.
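As a toy illustration of the zero-shot usage the review surveys, the sketch below sends a biomedical question to a text-generation pipeline with no task-specific examples. The model is a small stand-in and the prompt wording is an assumption; a clinical-grade LLM would be used in practice.

# Toy zero-shot biomedical prompt. GPT-2 is only a lightweight stand-in here;
# the surveyed setting would use a much stronger (bio)medical LLM.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")
prompt = (
    "You are a biomedical assistant. Without any task-specific examples, answer "
    "the question briefly.\n"
    "Question: Which enzyme does aspirin irreversibly inhibit?\n"
    "Answer:"
)
print(generator(prompt, max_new_tokens=40)[0]["generated_text"])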
arXiv:2407.21783v2
Modern artificial intelligence (AI) systems are powered by foundation models.
This paper presents a new set of foundation models, called Llama 3. It is a
herd of language models that natively support multilinguality, coding,
reasoning, and tool usage. Our largest model is a dense Transformer with 405B
parameters and a context window of up to 128K tokens. This paper presents an
extensive empirical evaluation of Llama 3. We find that Llama 3 delivers
comparable quality to leading language models such as GPT-4 on a plethora of
tasks. We publicly release Llama 3, including pre-trained and post-trained
versions of the 405B parameter language model and our Llama Guard 3 model for
input and output safety. The paper also presents the results of experiments in
which we integrate image, video, and speech capabilities into Llama 3 via a
compositional approach. We observe this approach performs competitively with
the state-of-the-art on image, video, and speech recognition tasks. The
resulting models are not yet being broadly released as they are still under
development.
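A hedged sketch of how the released Llama Guard 3 model could be used for input/output safety classification with transformers is shown below. The checkpoint id and chat-template behaviour are assumptions about the public release rather than details stated in the abstract.

# Sketch: classifying a user/assistant exchange with a Llama Guard 3 style model.
# The checkpoint id (gated) and the expected "safe"/"unsafe" output format are assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-Guard-3-8B"  # assumed checkpoint id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16, device_map="auto")

chat = [
    {"role": "user", "content": "How can I hot-wire a car?"},
    {"role": "assistant", "content": "I can't help with that."},
]
input_ids = tokenizer.apply_chat_template(chat, return_tensors="pt").to(model.device)
out = model.generate(input_ids, max_new_tokens=20, do_sample=False)
# The guard model is expected to emit "safe" or "unsafe" plus a hazard category.
print(tokenizer.decode(out[0][input_ids.shape[-1]:], skip_special_tokens=True))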
Lightweight Large Language Model for Medication Enquiry: Med-Pal
arXiv:2407.12822v1
Large Language Models (LLMs) have emerged as a potential solution to assist digital health development through patient education, commonly for medication-related enquiries. We trained and validated Med-Pal, a medication domain-specific LLM chatbot fine-tuned on a fine-grained, expert-curated dataset, starting from a selection of five lightweight open-source LLMs of smaller parameter size (7 billion or less), chosen in view of computational constraints and to prioritize operational efficiency. A multi-disciplinary team performed a clinical evaluation of the LLMs' responses using the SCORE criteria, focusing on safety, accuracy, bias, reproducibility, and ease of understanding. The best-performing lightweight LLM was chosen as Med-Pal for further engineering, with guard-railing applied through adversarial prompting. Med-Pal and existing lightweight LLMs, including the pretrained Biomistral and the fine-tuned Meerkat, were validated on an independent dataset covering a broad range of medication-related questions (231 in total), spanning 12 question types across 14 medication classes.
Mistral-7b emerged as the top performer among the selected lightweight LLMs, achieving the highest median score of 14 and 71.9% high-quality responses in the accuracy and safety domains, and was hence chosen as the backbone LLM for Med-Pal. When compared against Biomistral, Med-Pal generated responses more appropriate for patient communication, with significant reductions in the bias and errors typical of general LLMs. Comparable performance was observed when comparing Med-Pal with Meerkat. Med-Pal showcases the feasibility of developing and employing fine-tuned lightweight LLMs to enhance digital health communications.
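For orientation, the sketch below shows one common way to fine-tune a lightweight 7B backbone on medication Q&A pairs with LoRA via peft and transformers. It is not the authors' recipe: the dataset file, prompt format, and hyperparameters are assumptions.

# Illustrative LoRA fine-tuning of a 7B backbone on medication Q&A pairs.
# "medication_qa.jsonl" (fields: question, answer) is a hypothetical file.
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM, AutoTokenizer, Trainer, TrainingArguments

base = "mistralai/Mistral-7B-v0.1"
tokenizer = AutoTokenizer.from_pretrained(base)
tokenizer.pad_token = tokenizer.eos_token
model = get_peft_model(
    AutoModelForCausalLM.from_pretrained(base),
    LoraConfig(r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"]),
)

def tokenize(example):
    text = f"Question: {example['question']}\nAnswer: {example['answer']}"
    batch = tokenizer(text, truncation=True, max_length=512, padding="max_length")
    batch["labels"] = batch["input_ids"].copy()  # causal LM: labels mirror the inputs
    return batch

train_data = load_dataset("json", data_files="medication_qa.jsonl")["train"].map(tokenize)
Trainer(
    model=model,
    args=TrainingArguments(output_dir="med-pal-lora", per_device_train_batch_size=1, num_train_epochs=1),
    train_dataset=train_data,
).train()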
PanoSSC: Exploring Monocular Panoptic 3D Scene Reconstruction for
Autonomous Driving
Vision-centric occupancy networks, which represent the surrounding environment with uniform, semantically labeled voxels, have become a new trend in camera-only autonomous driving perception, as they can detect obstacles regardless of their shape and occlusion. Modern occupancy networks mainly focus on reconstructing visible voxels from object surfaces with voxel-wise semantic prediction. They usually suffer from inconsistent predictions within a single object and mixed predictions across adjacent objects, and these confusions may harm the safety of downstream planning modules.
To this end, we investigate panoptic segmentation on 3D voxel scenarios and
propose an instance-aware occupancy network, PanoSSC. We predict foreground
objects and backgrounds separately and merge both in post-processing. For
foreground instance grouping, we propose a novel 3D instance mask decoder that
can efficiently extract individual objects. We unify geometric reconstruction, 3D semantic segmentation, and 3D instance segmentation into the PanoSSC framework and propose new metrics for evaluating panoptic voxels. Extensive experiments show that our method achieves competitive results on the SemanticKITTI semantic scene completion benchmark.
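The post-processing merge of foreground instances and background semantics described above can be sketched as follows; the array shapes, score threshold, and pasting order are illustrative assumptions rather than the authors' exact procedure.

# Merge voxel-wise background semantics with per-instance foreground masks into a
# panoptic voxel grid. Shapes and the 0.3 score threshold are illustrative assumptions.
import numpy as np

def merge_panoptic(semantic, instance_masks, instance_scores, thing_label=1, score_thr=0.3):
    """semantic: (X, Y, Z) int class ids; instance_masks: (N, X, Y, Z) bool;
    instance_scores: (N,) confidences. Returns (semantic_out, instance_id)."""
    semantic_out = semantic.copy()
    instance_id = np.zeros_like(semantic, dtype=np.int32)  # 0 = no instance (background)
    next_id = 1
    for i in np.argsort(-np.asarray(instance_scores)):  # paste confident instances first
        if instance_scores[i] < score_thr:
            continue
        mask = instance_masks[i] & (instance_id == 0)  # never overwrite earlier instances
        semantic_out[mask] = thing_label
        instance_id[mask] = next_id
        next_id += 1
    return semantic_out, instance_id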
SISSA: Real-time Monitoring of Hardware Functional Safety and
Cybersecurity with In-vehicle SOME/IP Ethernet Traffic
arXiv:2402.14862v1
Scalable service-Oriented Middleware over IP (SOME/IP) is an Ethernet communication standard protocol in the Automotive Open System Architecture (AUTOSAR) that promotes ECU-to-ECU communication over the IP stack. However, SOME/IP lacks a robust security architecture, making it susceptible to potential attacks. Moreover, random hardware failures of ECUs can disrupt SOME/IP
communication. In this paper, we propose SISSA, a SOME/IP communication
traffic-based approach for modeling and analyzing in-vehicle functional safety
and cyber security. Specifically, SISSA models hardware failures with the
Weibull distribution and addresses five potential attacks on SOME/IP
communication, including Distributed Denial-of-Services, Man-in-the-Middle, and
abnormal communication processes, assuming a malicious user accesses the
in-vehicle network. Subsequently, SISSA designs a series of deep learning
models with various backbones to extract features from SOME/IP sessions among
ECUs. We adopt residual self-attention to accelerate the model's convergence
and enhance detection accuracy, determining whether an ECU is under attack,
facing functional failure, or operating normally. Additionally, we have created
and annotated a dataset encompassing various classes, including indicators of attack, functional failure, and normal operation. This contribution is noteworthy given the scarcity of publicly accessible datasets with such characteristics. Extensive experimental results show the effectiveness and efficiency of SISSA.
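A minimal PyTorch sketch of a residual self-attention block of the kind SISSA reportedly uses to encode SOME/IP sessions is given below; the feature dimensions, pooling, and classification head are assumptions, not the paper's architecture.

# Residual self-attention over per-message features of a SOME/IP session, followed by
# a 3-way head (normal / under attack / hardware failure). Sizes are assumptions.
import torch
import torch.nn as nn

class ResidualSelfAttention(nn.Module):
    def __init__(self, dim: int = 64, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x):  # x: (batch, session_length, dim)
        attn_out, _ = self.attn(x, x, x)
        return self.norm(x + attn_out)  # residual connection around attention

class SessionClassifier(nn.Module):
    def __init__(self, feat_dim: int = 16, dim: int = 64, num_classes: int = 3):
        super().__init__()
        self.embed = nn.Linear(feat_dim, dim)
        self.block = ResidualSelfAttention(dim)
        self.head = nn.Linear(dim, num_classes)

    def forward(self, x):  # x: (batch, session_length, feat_dim)
        h = self.block(self.embed(x))
        return self.head(h.mean(dim=1))  # pool over the session

logits = SessionClassifier()(torch.randn(2, 32, 16))  # toy batch of two sessions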
Fine-tuning Large Language Model (LLM) Artificial Intelligence Chatbots
in Ophthalmology and LLM-based evaluation using GPT-4
Purpose: To assess the alignment of GPT-4-based evaluation with human clinician experts when evaluating responses to ophthalmology-related patient queries generated by fine-tuned LLM chatbots. Methods: 400 ophthalmology questions and paired answers were created by ophthalmologists to represent commonly asked patient questions, divided into fine-tuning (368; 92%) and testing (40; 8%) sets. We fine-tuned 5 different LLMs, including LLAMA2-7b, LLAMA2-7b-Chat, LLAMA2-13b, and LLAMA2-13b-Chat. For the testing dataset, an additional 8 glaucoma QnA pairs were included. 200 responses to the testing
dataset were generated by 5 fine-tuned LLMs for evaluation. A customized
clinical evaluation rubric was used to guide GPT-4 evaluation, grounded on
clinical accuracy, relevance, patient safety, and ease of understanding. GPT-4
evaluation was then compared against ranking by 5 clinicians for clinical
alignment. Results: Among all fine-tuned LLMs, GPT-3.5 scored the highest
(87.1%), followed by LLAMA2-13b (80.9%), LLAMA2-13b-chat (75.5%),
LLAMA2-7b-Chat (70%) and LLAMA2-7b (68.8%) based on the GPT-4 evaluation. GPT-4
evaluation demonstrated significant agreement with human clinician rankings,
with Spearman and Kendall Tau correlation coefficients of 0.90 and 0.80
respectively, while the correlation based on Cohen's Kappa was more modest at 0.50.
Notably, qualitative analysis and the glaucoma sub-analysis revealed clinical
inaccuracies in the LLM-generated responses, which were appropriately
identified by the GPT-4 evaluation. Conclusion: The notable clinical alignment
of GPT-4 evaluation highlighted its potential to streamline the clinical
evaluation of LLM chatbot responses to healthcare-related queries. By
complementing the existing clinician-dependent manual grading, this efficient
and automated evaluation could assist the validation of future developments in
LLM applications for healthcare.
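The agreement statistics reported above can be reproduced on toy data as follows; the rank values are placeholders rather than the study's data, and Cohen's Kappa is computed by treating the ranks as categorical labels.

# Toy agreement computation: Spearman, Kendall Tau, and Cohen's Kappa between a
# GPT-4 ranking and a clinician ranking of five models. Values are placeholders.
from scipy.stats import kendalltau, spearmanr
from sklearn.metrics import cohen_kappa_score

gpt4_rank = [1, 2, 3, 4, 5]       # ranking of the five fine-tuned LLMs by GPT-4
clinician_rank = [1, 2, 4, 3, 5]  # consensus ranking by clinicians (toy values)

rho, _ = spearmanr(gpt4_rank, clinician_rank)
tau, _ = kendalltau(gpt4_rank, clinician_rank)
kappa = cohen_kappa_score(gpt4_rank, clinician_rank)  # ranks treated as categorical labels
print(f"Spearman={rho:.2f}, Kendall tau={tau:.2f}, Cohen kappa={kappa:.2f}")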
InferAligner: Inference-Time Alignment for Harmlessness through
Cross-Model Guidance
arXiv:2401.11206v1
With the rapid development of large language models (LLMs), they are not only
used as general-purpose AI assistants but are also customized through further
fine-tuning to meet the requirements of different applications. A pivotal
factor in the success of current LLMs is the alignment process. Current
alignment methods, such as supervised fine-tuning (SFT) and reinforcement
learning from human feedback (RLHF), focus on training-time alignment and are
often complex and cumbersome to implement. Therefore, we develop
\textbf{InferAligner}, a novel inference-time alignment method that utilizes
cross-model guidance for harmlessness alignment. InferAligner uses safety steering vectors extracted from a safety-aligned model to modify the activations
of the target model when responding to harmful inputs, thereby guiding the
target model to provide harmless responses. Experimental results show that our
method can be very effectively applied to domain-specific models in finance,
medicine, and mathematics, as well as to multimodal large language models
(MLLMs) such as LLaVA. It significantly diminishes the Attack Success Rate
(ASR) of both harmful instructions and jailbreak attacks, while maintaining
almost unchanged performance on downstream tasks.
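A hedged sketch of inference-time activation steering in the spirit of InferAligner is shown below: a pre-computed safety steering vector is added to a hidden layer of the target model via a forward hook. The layer index, scale, and the way the vector is obtained are assumptions, and the paper additionally gates the intervention so that it only fires on harmful inputs.

# Add a safety steering vector to one decoder layer's hidden states at inference time.
# Layer choice, scale, and the vector itself are assumptions for illustration.
import torch

def add_steering_hook(layer_module, steering_vector: torch.Tensor, scale: float = 1.0):
    """Register a forward hook that shifts the layer's hidden states by the steering vector."""
    def hook(_module, _inputs, output):
        hidden = output[0] if isinstance(output, tuple) else output
        hidden = hidden + scale * steering_vector.to(hidden.dtype).to(hidden.device)
        return (hidden, *output[1:]) if isinstance(output, tuple) else hidden
    return layer_module.register_forward_hook(hook)

# Illustrative usage with a Hugging Face decoder-only model (names are assumptions):
# vector = harmless_activations.mean(0) - harmful_activations.mean(0)
# handle = add_steering_hook(model.model.layers[15], vector, scale=4.0)
# model.generate(...)
# handle.remove()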
Risk Taxonomy, Mitigation, and Assessment Benchmarks of Large Language
Model Systems
arXiv:2401.05778v1
Large language models (LLMs) have strong capabilities in solving diverse
natural language processing tasks. However, the safety and security issues of
LLM systems have become the major obstacle to their widespread application.
Many studies have extensively investigated risks in LLM systems and developed
the corresponding mitigation strategies. Leading-edge enterprises such as OpenAI, Google, Meta, and Anthropic have also invested considerable effort in responsible LLMs. Therefore, there is a growing need to organize the existing
studies and establish comprehensive taxonomies for the community. In this
paper, we delve into four essential modules of an LLM system, including an
input module for receiving prompts, a language model trained on extensive
corpora, a toolchain module for development and deployment, and an output
module for exporting LLM-generated content. Based on this, we propose a
comprehensive taxonomy, which systematically analyzes potential risks
associated with each module of an LLM system and discusses the corresponding
mitigation strategies. Furthermore, we review prevalent benchmarks, aiming to
facilitate the risk assessment of LLM systems. We hope that this paper can help
LLM participants embrace a systematic perspective to build their responsible
LLM systems.
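Purely for illustration, the four system modules can be laid out as a small data structure paired with example risks and mitigations; the specific examples are common ones chosen here and are not an enumeration taken from the paper.

# The four LLM-system modules from the taxonomy, each paired with an example risk and
# mitigation. The examples are illustrative choices, not the paper's full analysis.
from dataclasses import dataclass

@dataclass
class ModuleRisk:
    module: str
    example_risk: str
    example_mitigation: str

LLM_SYSTEM_MODULES = [
    ModuleRisk("input module", "adversarial / jailbreak prompts", "prompt filtering and detection"),
    ModuleRisk("language model", "toxic or biased generations", "safety alignment (SFT, RLHF)"),
    ModuleRisk("toolchain module", "vulnerable development and deployment tools", "supply-chain auditing"),
    ModuleRisk("output module", "exposing private or harmful content", "output moderation and redaction"),
]

for entry in LLM_SYSTEM_MODULES:
    print(f"{entry.module}: risk={entry.example_risk}; mitigation={entry.example_mitigation}")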