Word embeddings represent a transformative technology for analyzing text data
in social work research, offering sophisticated tools for understanding case
notes, policy documents, research literature, and other text-based materials.
This methodological paper introduces word embeddings to social work
researchers, explaining how these mathematical representations capture meaning
and relationships in text data more effectively than traditional keyword-based
approaches. We discuss fundamental concepts, technical foundations, and
practical applications, including semantic search, clustering, and retrieval
augmented generation. The paper demonstrates how embeddings can enhance
research workflows through concrete examples from social work practice, such as
analyzing case notes for housing instability patterns and comparing social work
licensing examinations across languages. While highlighting the potential of
embeddings for advancing social work research, we acknowledge limitations
including information loss, training data constraints, and potential biases. We
conclude that successfully implementing embedding technologies in social work
requires developing domain-specific models, creating accessible tools, and
establishing best practices aligned with social work's ethical principles. This
integration can enhance our ability to analyze complex patterns in text data
while supporting more effective services and interventions.
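To make the embedding workflow concrete, here is a minimal, hedged sketch of embedding-based semantic search over case-note snippets. It assumes the open-source sentence-transformers library and the all-MiniLM-L6-v2 model as stand-ins; the snippets and the query are illustrative and are not the tooling or data described in this paper.

```python
# Minimal sketch: semantic search over case-note snippets with sentence embeddings.
# Assumes the sentence-transformers library; snippets and query are illustrative.
import numpy as np
from sentence_transformers import SentenceTransformer

notes = [
    "Client reports falling behind on rent and fears eviction next month.",
    "Family doubled up with relatives after losing their apartment.",
    "Client attended a job-readiness workshop and updated their resume.",
]
query = "signs of housing instability"

model = SentenceTransformer("all-MiniLM-L6-v2")
note_vecs = model.encode(notes, normalize_embeddings=True)
query_vec = model.encode([query], normalize_embeddings=True)[0]

# With normalized embeddings, the dot product equals cosine similarity.
scores = note_vecs @ query_vec
for idx in np.argsort(-scores):
    print(f"{scores[idx]:.3f}  {notes[idx]}")
```

The same embedding matrix can be reused for clustering or as the retrieval layer in a retrieval-augmented-generation pipeline.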
MRJ-Agent: An Effective Jailbreak Agent for Multi-Round Dialogue
arXiv:2411.03814v1
Large Language Models (LLMs) demonstrate an outstanding reservoir of knowledge
and strong understanding capabilities, but they have also been shown to be
prone to generating illegal or unethical responses when subjected to jailbreak
attacks. To ensure their responsible deployment in critical applications, it is
crucial to understand the safety capabilities and vulnerabilities of LLMs.
Previous works mainly focus on jailbreaks in single-round dialogue, overlooking
the potential jailbreak risks in multi-round dialogues, which are a vital way in
which humans interact with and extract information from LLMs. Some studies have
begun to examine the risks associated with jailbreaks in multi-round
dialogues; these efforts typically rely on manually crafted
templates or prompt-engineering techniques. However, due to the inherent
complexity of multi-round dialogues, their jailbreak performance is limited. To
solve this problem, we propose a novel multi-round dialogue jailbreaking agent,
emphasizing the importance of stealthiness in identifying and mitigating
potential threats to human values posed by LLMs. We propose a risk
decomposition strategy that distributes risks across multiple rounds of queries
and utilizes psychological strategies to enhance attack strength. Extensive
experiments show that our proposed method surpasses other attack methods and
achieves a state-of-the-art attack success rate. We will make the corresponding
code and dataset available for future research; the code will be released soon.
An Empirical Study of LLM-as-a-Judge for LLM Evaluation: Fine-tuned
Judge Model is not a General Substitute for GPT-4
arXiv:2403.02839v3
Recently, there has been a growing trend of utilizing Large Language Models
(LLMs) to evaluate the quality of other LLMs. Many studies have employed
proprietary closed-source models, especially GPT-4, as the evaluator.
Alternatively, other works have fine-tuned judge models based on open-source
LLMs as the evaluator. While the fine-tuned judge models are claimed to achieve
comparable evaluation capability with GPT-4, in this work, we conduct an
empirical study of judge models. Our findings indicate that although the
fine-tuned judge models achieve high performance on in-domain test sets, even
surpassing GPT-4, they underperform GPT-4 across several dimensions, including
generalizability, fairness, aspect-specific evaluation, and scalability. We
also reveal that the fine-tuned judge model inherently operates as a
task-specific classifier, which imposes these limitations. Finally, we
introduce an integrated method that leverages GPT-4 to compensate for these
limitations and improve the fine-tuned judges. Experimental results show that
our method achieves accuracy on par with GPT-4 at only 50% of the API expense.
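The abstract does not spell out how the two judges are combined; the sketch below shows one plausible cascading scheme under that reading, in which the cheap fine-tuned judge handles confident cases and uncertain ones are escalated to GPT-4. The function names, the (verdict, confidence) interface, and the threshold are assumptions for illustration, not the paper's implementation.

```python
# Hypothetical cascade: keep the cheap fine-tuned judge's verdict when it is
# confident, and fall back to a stronger (costly) judge otherwise. This is NOT
# necessarily the paper's integrated method; all names here are placeholders.
from typing import Callable, Tuple

def cascade_judge(
    sample: str,
    judge_finetuned: Callable[[str], Tuple[str, float]],  # returns (verdict, confidence)
    judge_gpt4: Callable[[str], str],                      # returns verdict (API call)
    confidence_threshold: float = 0.9,
) -> str:
    verdict, confidence = judge_finetuned(sample)
    if confidence >= confidence_threshold:
        return verdict          # cheap, in-domain judgment
    return judge_gpt4(sample)   # escalate uncertain cases to the stronger judge
```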
Building Altruistic and Moral AI Agent with Brain-inspired Affective
Empathy Mechanisms
arXiv:2410.21882v1
As AI closely interacts with human society, it is crucial to ensure that its
decision-making is safe, altruistic, and aligned with human ethical and moral
values. However, existing research on embedding ethical and moral
considerations into AI remains insufficient, and previous external constraints
based on principles and rules are inadequate to provide AI with long-term
stability and generalization capabilities. In contrast, intrinsic
altruistic motivation based on empathy is more voluntary, spontaneous, and
robust. Therefore, this paper aims to enable intelligent
agents to autonomously acquire moral behaviors through human-like affective
empathy mechanisms. We draw inspiration from the neural mechanisms underlying
the human brain's intuitive moral decision-making, and simulate the mirror neuron system to
construct a brain-inspired affective empathy-driven altruistic decision-making
model. Here, empathy directly impacts dopamine release to form intrinsic
altruistic motivation. Based on the principle of moral utilitarianism, we
design a moral reward function that integrates intrinsic empathy and
extrinsic self-task goals. A comprehensive experimental scenario incorporating
empathetic processes, personal objectives, and altruistic goals is developed.
The proposed model enables the agent to make consistent moral decisions
(prioritizing altruism) by balancing self-interest with the well-being of
others. We further introduce inhibitory neurons to regulate different levels of
empathy and verify the positive correlation between empathy levels and
altruistic preferences, yielding conclusions consistent with findings from
psychological behavioral experiments. This work provides a feasible solution
for the development of ethical AI by leveraging the intrinsic human-like
empathy mechanisms, and contributes to the harmonious coexistence between
humans and AI.
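The abstract does not give the reward's functional form; a weighted combination of the empathy-driven intrinsic reward and the extrinsic task reward is one natural reading, sketched below with an assumed mixing coefficient λ.

```latex
% One possible reading of the moral reward described above; the additive form
% and the mixing weight \lambda are assumptions, not details from the abstract.
\[
  R_t \;=\; \lambda\, r^{\text{empathy}}_t \;+\; (1-\lambda)\, r^{\text{task}}_t,
  \qquad \lambda \in [0,1].
\]
```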
Is Your HD Map Constructor Reliable under Sensor Corruptions?
Driving systems often rely on high-definition (HD) maps for precise
environmental information, which is crucial for planning and navigation. While
current HD map constructors perform well under ideal conditions, their
resilience to real-world challenges, e.g., adverse weather and sensor failures,
is not well understood, raising safety concerns. This work introduces MapBench,
the first comprehensive benchmark designed to evaluate the robustness of HD map
construction methods against various sensor corruptions. Our benchmark
encompasses a total of 29 types of corruptions arising from camera and
LiDAR sensors. Extensive evaluations across 31 HD map constructors reveal
significant performance degradation of existing methods under adverse weather
conditions and sensor failures, underscoring critical safety concerns. We
identify effective strategies for enhancing robustness, including innovative
approaches that leverage multi-modal fusion, advanced data augmentation, and
architectural techniques. These insights provide a pathway for developing more
reliable HD map construction methods, which are essential for the advancement
of autonomous driving technology. The benchmark toolkit and affiliated code and
model checkpoints have been made publicly accessible.
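MapBench's 29 corruption types are not reproduced here; the sketch below only illustrates, generically, what a camera-side corruption can look like, using additive Gaussian noise and a crude low-contrast haze as stand-ins. The functions and parameters are assumptions for illustration, not the benchmark's implementation.

```python
# Generic illustration of camera-style corruptions (not MapBench's actual suite):
# additive Gaussian noise and a crude low-contrast "fog".
import numpy as np

def gaussian_noise(img: np.ndarray, sigma: float = 0.08) -> np.ndarray:
    """img is a float array in [0, 1], shape (H, W, 3)."""
    noisy = img + np.random.normal(0.0, sigma, img.shape)
    return np.clip(noisy, 0.0, 1.0)

def fog_like(img: np.ndarray, strength: float = 0.5) -> np.ndarray:
    """Blend the image toward a bright haze to mimic reduced visibility."""
    haze = np.ones_like(img)
    return np.clip((1 - strength) * img + strength * haze, 0.0, 1.0)

# Example: corrupt a dummy frame and compare pixel statistics.
frame = np.random.rand(128, 256, 3)
print(frame.std(), gaussian_noise(frame).std(), fog_like(frame).std())
```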
A Common Pitfall of Margin-based Language Model Alignment: Gradient
Entanglement
arXiv:2410.13828v1
Reinforcement Learning from Human Feedback (RLHF) has become the predominant
approach for language model (LM) alignment. At its core, RLHF uses a
margin-based loss for preference optimization, specifying ideal LM behavior
only by the difference between preferred and dispreferred responses. In this
paper, we identify a common pitfall of margin-based methods -- the
under-specification of ideal LM behavior on preferred and dispreferred
responses individually, which leads to two unintended consequences as the
margin increases: (1) The probability of dispreferred (e.g., unsafe) responses
may increase, resulting in potential safety alignment failures. (2) The
probability of preferred responses may decrease, even when those responses are
ideal. We demystify the reasons behind these problematic behaviors:
margin-based losses couple the change in the preferred probability to the
gradient of the dispreferred one, and vice versa, often preventing the
preferred probability from increasing while the dispreferred one decreases, and
thus causing a synchronized increase or decrease in both probabilities. We term
this effect, inherent in margin-based objectives, gradient entanglement.
Formally, we derive conditions for general margin-based alignment objectives
under which gradient entanglement becomes concerning: the inner product of the
gradients of preferred and dispreferred log-probabilities is large relative to
the individual gradient norms. We theoretically investigate why such inner
products can be large when aligning language models and empirically validate
our findings. Empirical implications of our framework extend to explaining
important differences in the training dynamics of various preference
optimization algorithms, and suggesting potential algorithm designs to mitigate
the under-specification issue of margin-based methods and thereby improve
language model alignment.
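As a rough formal sketch of the setup described above (not the paper's exact statement): a margin-based objective acts only on the gap between the log-probabilities of the preferred response y_w and the dispreferred response y_l, and entanglement becomes concerning roughly when the two log-probability gradients point in nearly the same direction.

```latex
% Simplified notation, assumed for illustration: x is the prompt, y_w / y_l the
% preferred / dispreferred responses, and \ell any decreasing loss on the margin.
\[
  \mathcal{L}_{\text{margin}}(\theta)
    = \ell\!\big(\log \pi_\theta(y_w \mid x) - \log \pi_\theta(y_l \mid x)\big).
\]
% Entanglement becomes concerning roughly when the inner product of the two
% log-probability gradients is large relative to their norms (cosine near 1):
\[
  \frac{\big\langle \nabla_\theta \log \pi_\theta(y_w \mid x),\;
        \nabla_\theta \log \pi_\theta(y_l \mid x) \big\rangle}
       {\big\|\nabla_\theta \log \pi_\theta(y_w \mid x)\big\|\,
        \big\|\nabla_\theta \log \pi_\theta(y_l \mid x)\big\|}
  \;\approx\; 1.
\]
```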
Weak-to-Strong Generalization beyond Accuracy: a Pilot Study in Safety,
Toxicity, and Legal Reasoning
arXiv:2410.12621v1
As large language models (LLMs) continue to advance, ensuring their alignment
with human values becomes increasingly critical. Traditional alignment methods
heavily rely on human feedback to fine-tune models. With the emergence of
superhuman models whose outputs may surpass human understanding, evaluating and
aligning these models using human judgments poses significant challenges. To
address the challenges, recent works use weak supervisors to elicit knowledge
from much stronger models. However, there are important disanalogies between
the empirical setup in the existing works and the genuine goal of alignment. We
remark that existing works investigate the phenomenon of weak-to-strong
generalization in an analogous setup (i.e., binary classification), rather than
in practical alignment-relevant tasks (e.g., safety). In this paper, we bridge
this gap by extending weak-to-strong generalization to the context of practical
alignment. We empirically demonstrate the widespread phenomenon of
weak-to-strong generalization in three complicated alignment tasks: safety,
toxicity, and legal reasoning. Furthermore, we explore efficient strategies
for improving alignment performance to enhance the quality of model outcomes.
Lastly, we summarize and analyze the challenges and potential solutions in
regard to specific alignment tasks, which we hope to catalyze the research
progress on the topic of weak-to-strong generalization. Our code is released at
https://github.com/yeruimeng/WTS.git.
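For readers unfamiliar with the protocol, the sketch below mimics the analogous binary-classification setting mentioned above, with scikit-learn models as stand-ins: a weak supervisor is trained on ground truth, its predictions supervise a higher-capacity student, and both are scored on held-out labels. This illustrates the weak-to-strong setup only, not the paper's alignment tasks or models.

```python
# Minimal sketch of the weak-to-strong setup on binary classification; the
# weak/strong model choices and the synthetic data are illustrative stand-ins.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=4000, n_features=20, random_state=0)
X_weak, X_rest, y_weak, y_rest = train_test_split(X, y, test_size=0.5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X_rest, y_rest, test_size=0.5, random_state=0)

weak = LogisticRegression(max_iter=1000).fit(X_weak, y_weak)      # weak supervisor
weak_labels = weak.predict(X_train)                               # weak supervision

strong = GradientBoostingClassifier().fit(X_train, weak_labels)   # strong student on weak labels

print("weak supervisor accuracy:", weak.score(X_test, y_test))
print("strong-on-weak accuracy: ", strong.score(X_test, y_test))
```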
Gender Bias of LLM in Economics: An Existentialism Perspective
Keywords: Gender Bias, Large Language Models, Decision-Making
Large Language Models (LLMs), such as GPT-4 and BERT, have rapidly gained
traction in natural language processing (NLP) and are now integral to financial
decision-making. However, their deployment introduces critical challenges,
particularly in perpetuating gender biases that can distort decision-making
outcomes in high-stakes economic environments. This paper investigates gender
bias in LLMs through both mathematical proofs and empirical experiments using
the Word Embedding Association Test (WEAT), demonstrating that LLMs inherently
reinforce gender stereotypes even without explicit gender markers. By comparing
the decision-making processes of humans and LLMs, we reveal fundamental
differences: while humans can override biases through ethical reasoning and
individualized understanding, LLMs maintain bias as a rational outcome of their
mathematical optimization on biased data. Our analysis proves that bias in LLMs
is not an unintended flaw but a systematic result of their rational processing,
which tends to preserve and amplify existing societal biases encoded in
training data. Drawing on existentialist theory, we argue that LLM-generated
bias reflects entrenched societal structures and highlights the limitations of
purely technical debiasing methods. This research underscores the need for new
theoretical frameworks and interdisciplinary methodologies that address the
ethical implications of integrating LLMs into economic and financial
decision-making. We advocate for a reconceptualization of how LLMs influence
economic decisions, emphasizing the importance of incorporating human-like
ethical considerations into AI governance to ensure fairness and equity in
AI-driven financial systems.
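The WEAT referenced above has a standard formulation (Caliskan et al., 2017); the sketch below computes its effect size over word vectors. Random vectors stand in for real embeddings, and the example target/attribute pairing (e.g., career vs. family targets against male vs. female attribute terms) is an assumption rather than the paper's exact setup.

```python
# Sketch of the WEAT effect size (Caliskan et al., 2017) on word vectors; the
# random vectors below are placeholders for real word embeddings and word lists.
import numpy as np

def cosine(u: np.ndarray, v: np.ndarray) -> float:
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def assoc(w: np.ndarray, A: list, B: list) -> float:
    """s(w, A, B): mean similarity to attribute set A minus attribute set B."""
    return np.mean([cosine(w, a) for a in A]) - np.mean([cosine(w, b) for b in B])

def weat_effect_size(X: list, Y: list, A: list, B: list) -> float:
    """Two target word sets X, Y and two attribute word sets A, B."""
    s_X = [assoc(x, A, B) for x in X]
    s_Y = [assoc(y, A, B) for y in Y]
    return (np.mean(s_X) - np.mean(s_Y)) / np.std(s_X + s_Y, ddof=1)

# Toy usage with random vectors standing in for real word embeddings.
rng = np.random.default_rng(0)
vecs = lambda n: [rng.normal(size=50) for _ in range(n)]
print(weat_effect_size(vecs(8), vecs(8), vecs(8), vecs(8)))
```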
Learning Fair Models without Sensitive Attributes: A Generative Approach
arXiv:2203.16413v2
Most existing fair classifiers rely on sensitive attributes to achieve
fairness. However, for many scenarios, we cannot obtain sensitive attributes
due to privacy and legal issues. The lack of sensitive attributes challenges
many existing fair classifiers. Though sensitive attributes are unavailable, for
many applications there usually exist features or information in various formats
that are relevant to sensitive attributes. For example, a person's purchase
history can reflect his or her race, which can help in learning fair
classifiers with respect to race. However, the work on exploring relevant features for
learning fair models without sensitive attributes is rather limited. Therefore,
in this paper, we study a novel problem of learning fair models without
sensitive attributes by exploring relevant features. We propose a probabilistic
generative framework to effectively estimate the sensitive attribute from the
training data with relevant features in various formats and utilize the
estimated sensitive attribute information to learn fair models. Experimental
results on real-world datasets show the effectiveness of our framework in terms
of both accuracy and fairness.
arXiv:2409.19545v1
Labor market forecasting on talent demand and supply is essential for
business management and economic development. With accurate and timely
forecasts, employers can adapt their recruitment strategies to align with the
evolving labor market, and employees can have proactive career path planning
according to future demand and supply. However, previous studies ignore the
interconnections among demand-supply sequences across different companies and
positions when predicting variations. Moreover, companies are reluctant to share
their private human resource data for global labor market analysis due to
concerns over jeopardizing competitive advantage, security threats, and
potential ethical or legal violations. To this end, in this paper, we formulate
the Federated Labor Market Forecasting (FedLMF) problem and propose a
Meta-personalized Convergence-aware Clustered Federated Learning (MPCAC-FL)
framework to provide accurate and timely collaborative talent demand and supply
prediction in a privacy-preserving way. First, we design a graph-based
sequential model to capture the inherent correlation between demand and supply
sequences and company-position pairs. Second, we adopt meta-learning techniques
to learn effective initial model parameters that can be shared across
companies, allowing personalized models to be optimized for forecasting
company-specific demand and supply, even when companies have heterogeneous
data. Third, we devise a Convergence-aware Clustering algorithm to dynamically
divide companies into groups according to model similarity and apply federated
aggregation in each group. This alleviates data heterogeneity, yielding more stable
convergence and better performance. Extensive experiments demonstrate that
MPCAC-FL outperforms the compared baselines on three real-world datasets and
achieves over 97% of the performance of the state-of-the-art model, DH-GEM,
without exposing private company data.
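The MPCAC-FL algorithm itself is not reproduced here; the sketch below only illustrates the general idea of similarity-aware clustered federated averaging, grouping clients by the similarity of their model parameters and averaging within each group. The clustering choice (k-means) and all names are assumptions for illustration.

```python
# Generic sketch of similarity-aware clustered federated averaging, not the
# MPCAC-FL algorithm: group clients by model similarity, then average per group.
import numpy as np
from sklearn.cluster import KMeans

def clustered_fedavg(client_weights: list, n_groups: int = 2):
    """client_weights: one flattened parameter vector per company/client."""
    W = np.stack(client_weights)
    # Cluster clients whose local models look similar (cosine-normalized weights).
    normed = W / np.linalg.norm(W, axis=1, keepdims=True)
    groups = KMeans(n_clusters=n_groups, n_init=10, random_state=0).fit_predict(normed)
    # Federated averaging inside each group only.
    return {g: W[groups == g].mean(axis=0) for g in np.unique(groups)}

# Toy usage: 6 clients with 100-dimensional parameter vectors.
rng = np.random.default_rng(0)
clients = [rng.normal(size=100) for _ in range(6)]
print({g: v[:3] for g, v in clustered_fedavg(clients).items()})
```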