arXiv:2409.00133v1
Recent breakthroughs in large language models (LLMs) offer unprecedented
natural language understanding and generation capabilities. However, existing
surveys on LLMs in biomedicine often focus on specific applications or model
architectures, lacking a comprehensive analysis that integrates the latest
advancements across various biomedical domains. This review, based on an
analysis of 484 publications sourced from databases including PubMed, Web of
Science, and arXiv, provides an in-depth examination of the current landscape,
applications, challenges, and prospects of LLMs in biomedicine, distinguishing
itself by focusing on the practical implications of these models in real-world
biomedical contexts. Firstly, we explore the capabilities of LLMs in zero-shot
learning across a broad spectrum of biomedical tasks, including diagnostic
assistance, drug discovery, and personalized medicine, among others, with
insights drawn from 137 key studies. Then, we discuss adaptation strategies for
LLMs, including fine-tuning methods for both uni-modal and multi-modal LLMs, to
enhance their performance in specialized biomedical contexts where zero-shot
learning falls short, such as medical question answering and efficient processing
of biomedical literature. Finally, we discuss the challenges that LLMs face in
the biomedical domain, including data privacy concerns, limited model
interpretability, and issues with dataset quality, as well as ethical concerns
arising from the sensitive nature of biomedical data, the need for highly
reliable model outputs, and the implications of deploying AI in healthcare. To
address these challenges, we also identify future research directions for LLMs
in biomedicine, including federated learning methods to preserve data privacy
and the integration of explainable AI methodologies to enhance the transparency of LLMs.
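As a minimal illustration of the zero-shot setting surveyed above, the sketch below poses a diagnostic-assistance style question to a general-purpose chat model with no task-specific examples or fine-tuning. The model name, prompt wording, and use of the `openai` client are illustrative assumptions, not details taken from the survey, and such output is not clinical advice.

```python
# Zero-shot prompting of a general-purpose chat model on a diagnostic-assistance
# style question; no task-specific examples or fine-tuning are involved.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def zero_shot_diagnosis(findings: str) -> str:
    """Ask for a differential diagnosis using only a plain instruction."""
    prompt = (
        "You are assisting a clinician. Given the patient findings below, "
        "list the three most likely diagnoses and the key evidence for each.\n\n"
        f"Findings: {findings}"
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",                 # assumed model; any chat LLM works
        messages=[{"role": "user", "content": prompt}],
        temperature=0.0,                     # deterministic output for comparability
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    print(zero_shot_diagnosis("fever, productive cough, and pleuritic chest pain for 3 days"))
```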
A Survey on Large Language Models for Code Generation
arXiv:2406.00515v2
Large Language Models (LLMs) have achieved remarkable advances across
diverse code-related tasks, known as Code LLMs, particularly in code generation,
which produces source code from natural language descriptions. This
burgeoning field has captured significant interest from both academic
researchers and industry professionals due to its practical significance in
software development, e.g., GitHub Copilot. Despite the active exploration of
LLMs for a variety of code tasks, either from the perspective of natural
language processing (NLP) or software engineering (SE) or both, there is a
noticeable absence of a comprehensive and up-to-date literature review
dedicated to LLMs for code generation. In this survey, we aim to bridge this gap
by providing a systematic literature review that serves as a valuable reference
for researchers investigating the cutting-edge progress in LLMs for code
generation. We introduce a taxonomy to categorize and discuss the recent
developments in LLMs for code generation, covering aspects such as data
curation, latest advances, performance evaluation, ethical implications,
environmental impact, and real-world applications. In addition, we present a
historical overview of the evolution of LLMs for code generation and offer an
empirical comparison using the HumanEval, MBPP, and BigCodeBench benchmarks
across various levels of difficulty and types of programming tasks to highlight
the progressive enhancements in LLM capabilities for code generation. We
identify critical challenges and promising opportunities regarding the gap
between academia and practical development. Furthermore, we have established a
dedicated resource GitHub page (https://github.com/juyongjiang/CodeLLMSurvey)
to continuously document and disseminate the most recent advances in the field.
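To make the kind of benchmark comparison mentioned above concrete, the snippet below implements the standard unbiased pass@k estimator commonly reported on HumanEval, MBPP, and BigCodeBench (n samples per problem, c of which pass the unit tests). It is a generic sketch of the metric, not the survey's own evaluation code.

```python
# Unbiased pass@k estimator used with benchmarks such as HumanEval and MBPP:
# n completions are sampled per problem and c of them pass the unit tests.
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Probability that at least one of k samples drawn from n is correct."""
    if n - c < k:                      # fewer incorrect samples than draws -> success guaranteed
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# Example: 200 samples per problem, 37 of which pass the tests.
print(round(pass_at_k(n=200, c=37, k=1), 4))    # equals c/n = 0.185
print(round(pass_at_k(n=200, c=37, k=10), 4))
```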
Accepted for publication at NeurIPS 2024, 34 Pages, 9 Figures
This paper examines the issue of fairness in the estimation of graphical
models (GMs), particularly Gaussian, Covariance, and Ising models. These models
play a vital role in understanding complex relationships in high-dimensional
data. However, standard GMs can result in biased outcomes, especially when the
underlying data involves sensitive characteristics or protected groups. To
address this, we introduce a comprehensive framework designed to reduce bias in
the estimation of GMs related to protected attributes. Our approach involves
the integration of the pairwise graph disparity error and a tailored loss
function into a nonsmooth multi-objective optimization problem, striving to
achieve fairness across different sensitive groups while maintaining the
effectiveness of the GMs. Experimental evaluations on synthetic and real-world
datasets demonstrate that our framework effectively mitigates bias without
undermining GMs' performance.
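As a rough, single-objective sketch of the idea described above (the paper itself solves a nonsmooth multi-objective problem), the snippet below combines per-group Gaussian graphical-model fit losses with a pairwise disparity penalty on the gaps between those losses. The loss form and the weighting scheme are assumptions for illustration, not the authors' formulation.

```python
# Toy scalarized objective: per-group Gaussian graphical-model fit loss
# -log det(Theta) + tr(S_g @ Theta), plus a pairwise disparity penalty on the
# gaps between group losses (weighted by lam).
import numpy as np

def group_loss(theta: np.ndarray, S_g: np.ndarray) -> float:
    """Negative Gaussian log-likelihood of precision matrix theta on group covariance S_g."""
    _, logdet = np.linalg.slogdet(theta)
    return float(-logdet + np.trace(S_g @ theta))

def fair_objective(theta: np.ndarray, group_covs, lam: float = 0.1) -> float:
    """Average fit loss across groups plus pairwise disparity between group losses."""
    losses = [group_loss(theta, S) for S in group_covs]
    disparity = sum(abs(a - b) for i, a in enumerate(losses) for b in losses[i + 1:])
    return float(np.mean(losses)) + lam * disparity

# Two synthetic groups with different covariance structure.
rng = np.random.default_rng(0)
X_a = rng.normal(size=(200, 5))
X_b = rng.normal(size=(200, 5)) @ np.diag([1.0, 1.0, 1.0, 2.0, 2.0])
covs = [np.cov(X_a, rowvar=False), np.cov(X_b, rowvar=False)]
print(fair_objective(np.eye(5), covs))   # evaluate a candidate precision matrix
```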
SMoA: Improving Multi-agent Large Language Models with Sparse
Mixture-of-Agents
While multi-agent systems have been shown to significantly enhance the
performance of Large Language Models (LLMs) across various tasks and
applications, the dense interactions among agents as they scale potentially hamper
their efficiency and diversity. To address these challenges, we draw
inspiration from sparse mixture-of-experts (SMoE) and propose a sparse
mixture-of-agents (SMoA) framework to improve the efficiency and diversity of
multi-agent LLMs. Unlike completely connected structures, SMoA introduces novel
Response Selection and Early Stopping mechanisms to sparsify information flows
among individual LLM agents, striking a balance between performance and
efficiency. Additionally, inspired by the expert diversity principle in SMoE
frameworks for workload balance between experts, we assign distinct role
descriptions to each LLM agent, fostering diverse and divergent thinking.
Extensive experiments on reasoning, alignment, and fairness benchmarks
demonstrate that SMoA achieves performance comparable to traditional
mixture-of-agents approaches but with significantly lower computational costs.
Further analysis reveals that SMoA is more stable, has a greater capacity to
scale, and offers considerable potential through hyper-parameter optimization.
Code and data will be available at: https://github.com/David-Li0406/SMoA.
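A schematic of the sparsified information flow described above: in each round a judge keeps only the top-k responses to forward to the next round (Response Selection), and the loop can terminate once the selected responses are judged sufficient (Early Stopping). The agent, judge, and stopping callables are stand-ins, not the authors' implementation.

```python
# Per round, every agent answers using only the previously selected responses as
# context; a judge ranks the answers, only the top-k are forwarded (Response
# Selection), and the loop may terminate early once they are judged sufficient.
from typing import Callable, List

Agent = Callable[[str, List[str]], str]        # (task, selected context) -> response
Judge = Callable[[str, List[str]], List[int]]  # (task, responses) -> indices, best first

def smoa_rounds(task: str, agents: List[Agent], judge: Judge, k: int = 2,
                max_rounds: int = 3,
                is_sufficient: Callable[[List[str]], bool] = lambda sel: False) -> List[str]:
    selected: List[str] = []
    for _ in range(max_rounds):
        responses = [agent(task, selected) for agent in agents]  # distinct roles encourage diversity
        ranked = judge(task, responses)
        selected = [responses[i] for i in ranked[:k]]            # sparsify: keep top-k only
        if is_sufficient(selected):                              # early stopping
            break
    return selected

# Dummy usage; real agents would prompt an LLM with role-specific instructions.
dummy_agents = [lambda t, ctx, i=i: f"agent-{i} answer to {t!r}" for i in range(4)]
dummy_judge = lambda t, rs: sorted(range(len(rs)), key=lambda i: len(rs[i]))
print(smoa_rounds("Is 17 prime?", dummy_agents, dummy_judge))
```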
Enhancing Multiple Dimensions of Trustworthiness in LLMs via Sparse
Activation Control
arXiv:2411.02461v1
As the development and application of Large Language Models (LLMs) continue
to advance rapidly, enhancing their trustworthiness and aligning them with
human preferences has become a critical area of research. Traditional methods
rely heavily on extensive data for Reinforcement Learning from Human Feedback
(RLHF), but representation engineering offers a new, training-free approach.
This technique leverages semantic features to control the representation of
an LLM's intermediate hidden states, enabling the model to meet specific
requirements such as increased honesty or heightened safety awareness. However,
a significant challenge arises when attempting to fulfill multiple requirements
simultaneously. It proves difficult to encode various semantic contents, such as
honesty and safety, into a single semantic feature, restricting its
practicality. In this work, we address this issue through "Sparse Activation
Control". By delving into the intrinsic mechanisms of LLMs, we manage to
identify and pinpoint components that are closely related to specific tasks
within the model, i.e., attention heads. These heads display sparse
characteristics that allow for near-independent control over different tasks.
Our experiments, conducted on the open-source Llama series models, have yielded
encouraging results. The models were able to align with human preferences on
issues of safety, factuality, and bias concurrently.
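A minimal PyTorch sketch of the control mechanism described above: at inference time, a forward hook adds a steering direction only to the output slices of a few selected attention heads. The toy module, head indices, and steering vector are placeholders; the paper identifies task-relevant heads inside Llama models, which this sketch does not reproduce.

```python
# A forward hook adds a steering direction only to the output slices of selected
# attention heads; everything else passes through unchanged. The linear layer
# stands in for the module producing the concatenated per-head attention output.
import torch
import torch.nn as nn

n_heads, head_dim = 8, 16
hidden = n_heads * head_dim
per_head_output = nn.Linear(hidden, hidden)      # toy stand-in, not a real Llama block

target_heads = [2, 5]                            # assumed task-relevant heads
steer = torch.zeros(hidden)
for h in target_heads:
    steer[h * head_dim:(h + 1) * head_dim] = 0.1 # placeholder steering direction

def steering_hook(module, inputs, output):
    return output + steer                        # shift only the targeted head slices

handle = per_head_output.register_forward_hook(steering_hook)
x = torch.randn(4, hidden)
print(per_head_output(x).shape)                  # forward pass with steering applied
handle.remove()                                  # detach the hook to restore behavior
```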
Risk Sources and Risk Management Measures in Support of Standards for
General-Purpose AI Systems
There is an urgent need to identify both short- and long-term risks from newly
emerging types of Artificial Intelligence (AI), as well as available risk
management measures. In response, and to support global efforts in regulating
AI and writing safety standards, we compile an extensive catalog of risk
sources and risk management measures for general-purpose AI (GPAI) systems,
complete with descriptions and supporting examples where relevant. This work
involves identifying technical, operational, and societal risks across model
development, training, and deployment stages, as well as surveying established
and experimental methods for managing these risks. To the best of our
knowledge, this paper is the first of its kind to provide extensive
documentation of both GPAI risk sources and risk management measures that are
descriptive, self-contained and neutral with respect to any existing regulatory
framework. This work intends to help AI providers, standards experts,
researchers, policymakers, and regulators in identifying and mitigating
systemic risks from GPAI systems. For this reason, the catalog is released
under a public domain license for ease of direct use by stakeholders in AI
governance and standards.
Rethinking the Uncertainty: A Critical Review and Analysis in the Era of
Large Language Models
arXiv:2410.20199v1
In recent years, Large Language Models (LLMs) have become fundamental to a
broad spectrum of artificial intelligence applications. As the use of LLMs
expands, precisely estimating the uncertainty in their predictions has become
crucial. Current methods often struggle to accurately identify, measure, and
address the true uncertainty, with many focusing primarily on estimating model
confidence. This discrepancy is largely due to an incomplete understanding of
where, when, and how uncertainties are injected into models. This paper
introduces a comprehensive framework specifically designed to identify and
understand the types and sources of uncertainty, aligned with the unique
characteristics of LLMs. Our framework enhances the understanding of the
diverse landscape of uncertainties by systematically categorizing and defining
each type, establishing a solid foundation for developing targeted methods that
can precisely quantify these uncertainties. We also provide a detailed
introduction to key related concepts and examine the limitations of current
methods in mission-critical and safety-sensitive applications. The paper
concludes with a perspective on future directions aimed at enhancing the
reliability and practical adoption of these methods in real-world scenarios.
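For reference, the snippet below shows one widely used confidence proxy of the kind the review argues is incomplete: sample the model several times and take the entropy of the empirical answer distribution. The `generate` callable is a placeholder for any stochastic LLM call.

```python
# Confidence proxy via self-consistency: sample the model n times at nonzero
# temperature and compute the entropy of the empirical answer distribution
# (0.0 means the answers were fully consistent).
import random
from collections import Counter
from math import log
from typing import Callable

def answer_entropy(generate: Callable[[str], str], prompt: str, n: int = 10) -> float:
    answers = [generate(prompt) for _ in range(n)]
    counts = Counter(answers)
    return -sum((c / n) * log(c / n) for c in counts.values())

# Toy usage with a fake stochastic sampler standing in for an LLM call.
fake_llm = lambda _prompt: random.choice(["Paris", "Paris", "Lyon"])
print(answer_entropy(fake_llm, "What is the capital of France?"))
```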
A Survey on Knowledge Distillation of Large Language Models
In the era of Large Language Models (LLMs), Knowledge Distillation (KD)
emerges as a pivotal methodology for transferring advanced capabilities from
leading proprietary LLMs, such as GPT-4, to their open-source counterparts like
LLaMA and Mistral. Additionally, as open-source LLMs flourish, KD plays a
crucial role both in compressing these models and in facilitating their
self-improvement by employing them as their own teachers. This paper presents a
comprehensive survey of KD's role within the realm of LLMs, highlighting its
critical function in imparting advanced knowledge to smaller models and its
utility in model compression and self-improvement. Our survey is meticulously
structured around three foundational pillars: algorithm, skill, and
verticalization -- providing a comprehensive
examination of KD mechanisms, the enhancement of specific cognitive abilities,
and their practical implications across diverse fields. Crucially, the survey
navigates the intricate interplay between data augmentation (DA) and KD,
illustrating how DA emerges as a powerful paradigm within the KD framework to
bolster LLMs' performance. By leveraging DA to generate context-rich,
skill-specific training data, KD transcends traditional boundaries, enabling
open-source models to approximate the contextual adeptness, ethical alignment,
and deep semantic insights characteristic of their proprietary counterparts.
This work aims to provide an insightful guide for researchers and
practitioners, offering a detailed overview of current methodologies in KD and
proposing future research directions. Importantly, we firmly advocate for
compliance with the legal terms that regulate the use of LLMs, ensuring ethical
and lawful application of KD of LLMs. An associated GitHub repository is
available at https://github.com/Tebmer/Awesome-Knowledge-Distillation-of-LLMs.
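As a minimal sketch of the white-box side of KD (black-box distillation from proprietary teachers such as GPT-4 instead uses teacher-generated text as supervised data), the snippet below implements the classic temperature-softened KL loss between teacher and student next-token logits. Tensor shapes and the temperature value are illustrative assumptions.

```python
# Temperature-softened KL divergence between teacher and student next-token
# distributions; the T^2 factor keeps gradient magnitudes comparable across T.
import torch
import torch.nn.functional as F

def kd_loss(student_logits: torch.Tensor, teacher_logits: torch.Tensor, T: float = 2.0) -> torch.Tensor:
    """KL(teacher || student) on logits of shape (batch, vocab)."""
    log_p_student = F.log_softmax(student_logits / T, dim=-1)
    p_teacher = F.softmax(teacher_logits / T, dim=-1)
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * (T * T)

# Toy shapes: 4 positions over a 32k-token vocabulary.
student = torch.randn(4, 32000, requires_grad=True)
teacher = torch.randn(4, 32000)
loss = kd_loss(student, teacher)
loss.backward()                     # gradients flow to the student only
print(loss.item())
```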
A Hybrid Defense Strategy for Boosting Adversarial Robustness in
Vision-Language Models
arXiv:2410.14911v1
The robustness of Vision-Language Models (VLMs) such as CLIP is critical for
their deployment in safety-critical applications like autonomous driving,
healthcare diagnostics, and security systems, where accurate interpretation of
visual and textual data is essential. However, these models are highly
susceptible to adversarial attacks, which can severely compromise their
performance and reliability in real-world scenarios. Previous methods have
primarily focused on improving robustness through adversarial training and
generating adversarial examples using attack methods such as FGSM, AutoAttack,
and DeepFool. However, these approaches often rely on strong assumptions, such as
fixed perturbation norms or predefined attack patterns, and involve high
computational complexity, making them challenging to implement in practical
settings. In this paper, we propose a novel adversarial training framework that
integrates multiple attack strategies and advanced machine learning techniques
to significantly enhance the robustness of VLMs against a broad range of
adversarial attacks. Experiments conducted on real-world datasets, including
CIFAR-10 and CIFAR-100, demonstrate that the proposed method significantly
enhances model robustness. The fine-tuned CLIP model achieved an accuracy of
43.5% on adversarially perturbed images, compared to only 4% for the baseline
model. The neural network model achieved a high accuracy of 98% in these
challenging classification tasks, while the XGBoost model reached a success
rate of 85.26% in prediction tasks.
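For context, the snippet below sketches FGSM, one of the attack generators named above: a single step in the direction of the loss gradient's sign, bounded by a perturbation budget. It is shown on a tiny stand-in classifier rather than a CLIP-style VLM, and the epsilon value is only a common default.

```python
# Fast Gradient Sign Method: a single perturbation step of size eps in the
# direction of the loss gradient's sign, with pixels clamped to [0, 1].
import torch
import torch.nn as nn

def fgsm_attack(model: nn.Module, x: torch.Tensor, y: torch.Tensor, eps: float = 8 / 255) -> torch.Tensor:
    x_adv = x.clone().detach().requires_grad_(True)
    loss = nn.functional.cross_entropy(model(x_adv), y)
    loss.backward()
    x_adv = x_adv + eps * x_adv.grad.sign()      # gradient-sign step
    return x_adv.clamp(0.0, 1.0).detach()        # keep the image in a valid range

# Toy usage on random CIFAR-sized inputs with a tiny stand-in classifier.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
x = torch.rand(8, 3, 32, 32)
y = torch.randint(0, 10, (8,))
x_adv = fgsm_attack(model, x, y)
print((x_adv - x).abs().max().item())            # perturbation bounded by eps
```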
Class-RAG: Content Moderation with Retrieval Augmented Generation
Robust content moderation classifiers are essential for the safety of
Generative AI systems. Content moderation, or safety classification, is
notoriously ambiguous: differences between safe and unsafe inputs are often
extremely subtle, making it difficult for classifiers (and indeed, even humans)
to properly distinguish violating vs. benign samples without further context or
explanation. Furthermore, as these technologies are deployed across various
applications and audiences, scaling risk discovery and mitigation through
continuous model fine-tuning becomes increasingly challenging and costly. To
address these challenges, we propose a Classification approach employing
Retrieval-Augmented Generation (Class-RAG). Class-RAG extends the capability of
its base LLM through access to a retrieval library which can be dynamically
updated to enable semantic hotfixing for immediate, flexible risk mitigation.
Compared to traditional fine-tuned models, Class-RAG demonstrates flexibility
and transparency in decision-making. As evidenced by empirical studies,
Class-RAG outperforms traditional fine-tuned models on classification and is more
robust against adversarial attacks. Moreover, our findings suggest that Class-RAG performance scales with
retrieval library size, indicating that increasing the library size is a viable
and low-cost approach to improving content moderation.
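A schematic of retrieval-augmented classification along the lines described above: embed the input, retrieve the nearest labeled examples from a library that can be updated at any time (the "semantic hotfix"), and let an LLM classify with those neighbors as context. The embedding function, LLM callable, and prompt format are placeholders, not the Class-RAG implementation.

```python
# Embed the input, retrieve the k most similar labeled examples from an
# updatable library, and ask an LLM to classify with those neighbors as context.
import numpy as np
from typing import Callable, List, Tuple

class RetrievalLibrary:
    def __init__(self, embed: Callable[[str], np.ndarray]):
        self.embed = embed
        self.items: List[Tuple[str, str]] = []   # (text, "safe" | "unsafe")
        self.vecs: List[np.ndarray] = []

    def add(self, text: str, label: str) -> None:
        # New examples can be added at any time -> "semantic hotfix" without retraining.
        self.items.append((text, label))
        self.vecs.append(self.embed(text))

    def nearest(self, query: str, k: int = 4) -> List[Tuple[str, str]]:
        q = self.embed(query)
        sims = [float(q @ v) / (np.linalg.norm(q) * np.linalg.norm(v) + 1e-9) for v in self.vecs]
        order = np.argsort(sims)[::-1][:k]
        return [self.items[i] for i in order]

def classify(llm: Callable[[str], str], library: RetrievalLibrary, text: str) -> str:
    neighbors = library.nearest(text)
    context = "\n".join(f"- {t!r} -> {label}" for t, label in neighbors)
    prompt = (f"Labeled reference examples:\n{context}\n\n"
              f"Classify the following input as safe or unsafe:\n{text!r}")
    return llm(prompt)
```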