Time Perspective-Enhanced Suicidal Ideation Detection Using Multi-Task Learning

Qianyi Yang; Jing Zhou; Zheng Wei

doi:10.53941/ijndi.2024.100011

Article

Qianyi Yang ¹, Jing Zhou ¹,*, and Zheng Wei ²

¹ School of Computer and Cyber Sciences, Communication University of China, Beijing 100024, China

² Zhongxing Telecommunication Equipment Corporation, Shenzhen 518057, China

* Correspondence: zhoujing@cuc.edu.cn

Received: 25 October 2023

Accepted: 5 March 2024

Published: 26 June 2024

Abstract: Suicide notes are written documents left behind by suicide victims, either on paper or on social media, and can help us understand the mentality and thought processes of those struggling with suicidal thoughts. In our preliminary work, we proposed using Time Perspective (TP), which considers how the way people think of or appraise their past, present, or future life shapes their behavior, for suicide tendency detection based on suicide notes. The detection result is highly dependent upon a ternary classification task that groups any suicide tendency into one of three pre-defined types. In this work, we define the suicidal emotion trajectory, a concept that is based on TP and used for depicting the dynamic evolution of an individual's emotional state over time, and this trajectory serves as an auxiliary task to the primary ternary classification task in a multi-task learning model, i.e., TP-MultiBert. The model features Bidirectional Encoder Representations from Transformers (BERT) components, replacing their counterpart, i.e., GloVe, in the previous model. Thanks to BERT's capability of understanding word contextual relationships, as well as the capability of multi-task learning to leverage complementary information from various tasks, the model shows promising results in further improving the performance of suicidal ideation detection.

Keywords: suicide tendency; Time Perspective (TP); suicidal emotion trajectory; multi-task learning; BERT

1. Introduction

According to the World Health Organization (WHO), there are about 703,000 suicides globally each year, with millions more experiencing intense grief or being deeply affected by suicidal behavior [1]. A plethora of suicide tendency detection models have, therefore, been developed to capture the potential warning signals of suicide at an early stage, using a deluge of user activity records on social media [2−4]. Individuals committing suicide often leave a suicide note behind, a written document that can serve as an indicator of suicidal intent.

In our preliminary work on the detection of suicidal ideation [5], Time Perspective (TP) [6] was introduced to facilitate gaining insights into real motives behind suicide ideation. TPs revolve around the exploration, analysis, and study of how individuals focus on their past, present, or future and how this state of mind affects their behavior and psychological state. Part-of-Speech Tagging (POS_TAG) [7] was used for tense detection, and the Emotion English DistilRoBERTa-base (EED-Bert) algorithm [8] was used for emotion classification. Under weakly supervised learning, TP-oriented labels for suicide notes (in the form of free-text in the CEASE dataset [5]) were obtained and used as annotations attached to the original notes. A deep learning model, Time Perspective-Global Vectors for Word Representation-Gated Recurrent Unit (TP-GloVe-GRU), was also proposed for suicide tendency classification. Both the original notes and related annotations were fed into the model in pursuit of improving the accuracy of suicide risk assessment.

The TP-GloVe-GRU model is lightweight yet effective, and outperforms all the benchmarks on the CEASE dataset. However, the TP labels representing psychological states at different TPs are attached to the same suicide note and treated as independent of one another during processing. Handling TP labels independently in deep learning models does not fully exploit the correlations between the labels and can have a negative impact on the accuracy of suicide tendency detection.

Suicidal ideation, often called suicidal thoughts or ideas, is a time-dependent evolving process. Studies have demonstrated that individuals with suicidal thoughts typically undergo stages of emotional turbulence and dysregulation [9−12]. Their emotions can fluctuate between hopelessness, despair, and fleeting moments of positivity, and sudden deteriorations in their emotional state can significantly impact their psychological distress. Presenting this dynamic process and capturing emotional changes across various time intervals may aid in more precise identification of potential suicide tendency.

For instance, a TP label sequence like < (past, positive), (present, neutral), (future, negative) > reflects a gradual emotional downturn in an individual. This approach is more natural and reasonable than that of analyzing a series of specific emotional snapshots within different time periods without considering the chronological order.

In this work, we use TP label sequences to construct suicidal emotion trajectories, depict the dynamic evolution of an individual's emotional state over time (see Sect. 3.1), and explore the impact of dynamic emotional changes on suicide tendencies. The suicidal emotion trajectory serves as an auxiliary task to the primary ternary classification task in a multi-task learning model, TP-MultiBert. The model features a Bidirectional Encoder Representations from Transformers (BERT) component that replaces its counterpart, i.e., GloVe, in the previous model. BERT is a machine learning framework for natural language processing (NLP) and is pre-trained using only a plain text corpus. BERT takes into account the context for each occurrence of a given word. This feature makes BERT well suited to applications such as suicidal tendency detection based on textual suicide notes. We believe that the introduction of a BERT component in our detection model may enable more accurate capture of subtle emotional changes or differences hidden in suicide notes.

Our main contributions are summarized as follows.

(1) We map the nine independent TP labels from our previous work into five categories of suicidal emotion trajectories to investigate the impact of emotional changes reflected by TP labels on the formation of suicidal ideation. This mapping reveals the intricate relationship between temporal perspective and emotional dynamics, shedding new light on the emotional journey within suicide notes and its implications for suicide risk assessment.

(2) We construct multiple tasks, including task A that involves suicidal emotion trajectories and task B that deals with suicide tendency detection. By adopting a model based on BERT, we validate the enhancement brought by multi-task learning on suicide tendency detection.

The paper is structured as follows. Sect. 2 summarises our previous efforts in the field of suicide tendency detection and the application of multi-task learning techniques in this domain. In Sect. 3 we introduce an approach to TP labeling and present the proposed TP-MultiBert model. A detailed account of our experimental process is provided in Sect. 4. We discuss and analyse the experimental results in Sect. 5, and conclude the paper and offer insights into future directions in Sect. 6.

2. Background

2.1. Preliminary Work

We considered TP as the synergy between time and emotions [5], which is key to the understanding of our TP label design. To begin with, we selected 2,393 sentences from the CEASE dataset, which originated from real-world suicide notes from various sources, and we retained 1,805 sentences with rich information on emotions.

As shown in Figure 1, we obtained three kinds of time labels for each sentence (i.e., past, present, and future) using POS_TAG [7] and seven emotion labels via EED-Bert [8]. In the preliminary stage of our research, we simply mapped these seven emotion labels into three more coarse-grained emotion groups (including positive, neutral, and negative). Then, we placed sentences from suicide notes into categories such as single-tensed single-emotion (\( {T}_{0}{E}_{0} \)), multi-tensed single-emotion (\( {T}_{1}{E}_{0} \)), single-tensed multi-emotion (\( {T}_{0}{E}_{1} \)), and multi-tensed multi-emotion (\( {T}_{1}{E}_{1} \)). Finally, each sentence was labeled with its corresponding TP label. Specifically, we carried out TP labeling on these sentences, calculating values for nine TP labels using Equation (1).

\( T{P}_{i}=\left\{\left({t}_{i},{e}_{j}\right)|{t}_{i}\in \left\{past,present,future\right\},{e}_{j}\in \left\{positive,neutral,negative\right\}\right\} \) (1)

where \( {t}_{i} \) corresponds to one of the three time periods: past, present, or future, and \( {e}_{j} \) represents one of the three emotions: positive, neutral, or negative. As a result, each sentence is assigned values for the following nine TP labels: (past, negative), (past, neutral), (past, positive), (present, negative), (present, neutral), (present, positive), (future, negative), (future, neutral), and (future, positive).
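The label space of Equation (1) can be illustrated with a minimal Python sketch. The 0/1 indicator encoding and the function name `tp_label_vector` are assumptions made for illustration only; the source states that each sentence receives values for the nine labels but does not fix how those values are encoded.

```python
from itertools import product

TIMES = ("past", "present", "future")
EMOTIONS = ("negative", "neutral", "positive")

# All nine (time, emotion) pairs from Equation (1), in the order listed in the text.
TP_LABELS = list(product(TIMES, EMOTIONS))


def tp_label_vector(observed):
    """Return a 0/1 indicator over the nine TP labels for one sentence.

    `observed` is an iterable of (time, emotion) pairs found in the sentence.
    The indicator encoding is an illustrative assumption, not the paper's.
    """
    observed = set(observed)
    unknown = observed - set(TP_LABELS)
    if unknown:
        raise ValueError(f"unknown TP labels: {sorted(unknown)}")
    return [1 if label in observed else 0 for label in TP_LABELS]


# Hypothetical sentence tagged with (past, positive) and (future, negative).
vector = tp_label_vector([("past", "positive"), ("future", "negative")])
```

Here `vector` has one slot per TP label, so a multi-tensed multi-emotion sentence simply switches on several slots at once.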

The nine TP labels are considered independent of one another. To put it simply, for a suicide note, if its TP label contains (past, positive), (present, neutral), and (future, negative), each of the three labels will be fed into the TP-GloVe-GRU model with specific weights, and the final suicide probability is calculated as follows:

\( \begin{split} P\left(Suicide\right)=&{\omega }_{past,positive}P\left(past,positive\right)+{\omega }_{present,neutral}P\left(present,neutral\right) \\&+{\omega }_{future,negative}P\left(future,negative\right) \end{split} \) (2)

where \( P\left(Suicide\right) \) represents the suicide probability for a given suicide note, \( P\left(past,positive\right) \), \( P(present, neutral) \), and \( P\left(future,negative\right) \) denote the suicide probabilities under different TP labels, and \( {\omega }_{past,positive} \), \( {\omega }_{present,neutral} \) and \( {\omega }_{future,negative} \) correspond to the impact weights of each label.
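As a worked illustration of Equation (2), the sketch below computes the weighted sum for the three labels in the example. The probabilities and weights are made-up numbers chosen only to show the arithmetic, and `suicide_probability` is a hypothetical helper name.

```python
def suicide_probability(label_probs, weights):
    """Weighted sum of per-label suicide probabilities, as in Equation (2)."""
    missing = set(label_probs) - set(weights)
    if missing:
        raise KeyError(f"no weight for labels: {sorted(missing)}")
    return sum(weights[label] * p for label, p in label_probs.items())


# Hypothetical values, not taken from the paper.
label_probs = {
    ("past", "positive"): 0.2,
    ("present", "neutral"): 0.5,
    ("future", "negative"): 0.9,
}
weights = {
    ("past", "positive"): 0.2,
    ("present", "neutral"): 0.3,
    ("future", "negative"): 0.5,
}
p_suicide = suicide_probability(label_probs, weights)  # 0.04 + 0.15 + 0.45
```

Because each label contributes its own term, swapping the time order of the labels leaves the result unchanged, which is the independence assumption discussed in this section.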

We proposed a TP-enhanced Recurrent Neural Network (RNN) model, a.k.a. TP-GloVe-GRU. Both binary and ternary classification labeling for suicide tendencies were done for each sentence of a suicide note. The binary classification categorizes sentences into two classes: “no suicidal tendencies” and “suicidal tendencies” while the ternary classification groups the sentences into three categories including “no suicidal tendencies,” “implicit suicidal tendencies,” and “explicit and intense suicidal tendencies”. Thanks to TP labels, the accuracy of the suicide tendency detection model shows significant improvement. Particularly, the accuracy of the ternary classification task increases from 69.29% to 71.29%.

2.2. Enabling Technique: Multi-Task Learning

Multi-task learning is an inductive transfer method that leverages domain information contained in training signals from related tasks to improve model generalization [13]. During the learning process, multiple tasks are trained in parallel, allowing the model to acquire auxiliary and constraint information from other tasks, thereby enhancing the learning performance of the task at hand.

In addition to natural language processing, computer vision, speech recognition, and recommender systems [14−17], multi-task learning models are also used in solving tasks such as suicide tendency detection. For example, by using artificial neural networks to predict suicide risk from the daily language of social media users, researchers found that multi-task models outperformed single-task models [18]. Similarly, in predicting suicide risk and mental health, multi-task learning models had lower false alarm rates and higher accuracy compared with single-task baselines [19]. Furthermore, a multi-task neural model that jointly predicts the degree of depression and the reasons for depression, achieved satisfactory results on a manually annotated Chinese microblog comment dataset [20]. The effectiveness of multi-task learning was leveraged in [21] to discover the clear correlation between temporal orientation and sentiment classification when jointly analyzing the emotional state of victims. Furthermore, an end-to-end transformer-based multi-task network was proposed for detecting emotions and their intensity in suicide notes [22].

Meanwhile, suicide tendency detection often faces the challenge of data scarcity, as labeled data related to suicide tendencies is relatively scarce. Multi-task learning, however, allows more efficient utilization of existing data through sharing different perspectives. Moreover, multi-task learning leverages the sharing of lower-level feature learning across different tasks. For example, emotional expression and mental health can be potential indicators of suicide tendencies. By sharing feature learning, a multi-task learning model can better capture these common features, thereby improving task performance.

Hence, we endeavor to incorporate effective auxiliary tasks into the main task of suicide tendency detection, and we aim at fully leveraging the limited dataset of suicide notes to capture essential features. It has been shown in recent studies that fluctuating emotions can enhance specific memories [14], and more neurotic individuals experience greater variability in negative emotions in their daily lives [15]. It is possible to experience a rapid transition from positive to negative emotion, as opposed to maintaining a consistently mildly negative emotional state, and this may inflict greater psychological distress. Based on this premise, we will innovatively introduce the concept of suicidal emotion trajectories which is designed to help analyse the impact of emotional fluctuations identified within suicide notes over time on suicide tendency.

3. Methodology

3.1. Annotation of Suicidal Emotion Trajectory

We define the suicidal emotion trajectory \( T \) as a temporally sensitive sequence of TP labels:

\(\begin{split} T=& < \left({t}_{past},{e}_{i}\right),\left({t}_{present},{e}_{j}\right),\left({t}_{future},{e}_{k}\right) > ,\\& {e}_{m}\in \left\{positive,neutral,negative\right\},m\in \left\{i,j,k\right\} \end{split} \) (3)

where the TP label in the trajectory \( T \) is considered empty if the corresponding emotional information is missing within a particular time span.

We categorize all possible sequences of TP labels identified from suicide notes into five different suicidal emotion trajectories, including “Persistent Elation,” “Positive Attitude,” “Negative Attitude,” “Emotional Disorder,” and “Persistent Depressed,” to investigate the impact of dynamic TPs on suicide tendency. Take “Persistent Elation” from Table 1 as an example (Group 0), which covers two patterns: the absence of any negative emotions, or an emotion trajectory such as < (past, neutral), (present, negative), (future, neutral) >. This suggests that authors of the corresponding suicide notes maintain largely consistent and neutral emotions in different time periods, which is often indicative of low suicide tendency. On the other hand, “Emotional Disorder” (Group 3) includes trajectories with both negative and positive emotions within the same time span, indicating that the author exhibits complex and disordered emotions during specific time intervals, which can be mapped to high suicide tendency.

Table 1 Classification of Suicidal Emotion Trajectory
Groups	Explanation
Persistent Elation	(1) No negative emotions; (2) With a suicidal emotion trajectory \( T= < (past, neutral), (present, negative), (future, neutral) > \)
Positive Attitude	Transition from negative to neutral or even positive emotions
Negative Attitude	Transition from positive to neutral or even negative emotions
Emotional Disorder	Simultaneous presence of both negative and positive emotions during the same time period
Persistent Depressed	(1) Simultaneous presence of both negative and neutral emotions during the same time span; (2) Solely containing negative emotions

The process of classifying suicidal emotion trajectories is as follows. Taking Figure 2 as an example, if a note corresponds to a suicidal emotion trajectory such as < (past, positive), (present, neutral), (future, negative) > or < (past, neutral), (present, neutral), (future, negative) >, it will be categorised into the Negative Attitude group. If the trajectory is < (past, negative), (present, negative), (future, negative) >, it will fall into the Persistent Depressed group. In particular, if labels such as (past, positive) and (past, negative) occur together, indicating significant fluctuations in emotions within the same time period, the trajectory will be considered a member of the Emotional Disorder group. The calculation of the suicide tendency probability \( {P}'\left(Suicide\right) \) associated with this note is shown in Equation (4).

\( \begin{split} {P}'\left(Suicide\right)=&{\omega }_{1}P\left(\left(past,positive\right),\left(present,neutral\right)\right)\\&+ {\omega }_{2}P\left(\left(present,neutral\right),\left(future,positive\right)\right) \end{split} \) (4)

where, for an emotional trend expressed by two TP labels \( \left(past,positive\right),\left(present,neutral\right) \), \( {\omega }_{1} \) represents the weight associated with the trend and \( P\left(\left(past,positive\right),\left(present,neutral\right)\right) \) denotes the suicide probability corresponding to such a trend.
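The grouping rules of Table 1 can be sketched as a small rule-based function. This is only one possible reading under stated assumptions: the precedence of the rules, the numeric emotion scores, and the use of the first-to-last trend to separate Positive Attitude from Negative Attitude are not specified in the text, and `classify_trajectory` is a hypothetical name.

```python
SCORE = {"negative": -1, "neutral": 0, "positive": 1}
ORDER = ("past", "present", "future")
# The specific trajectory that Table 1 lists under Persistent Elation.
SPECIAL_ELATION = {"past": {"neutral"}, "present": {"negative"}, "future": {"neutral"}}


def classify_trajectory(trajectory):
    """Map a trajectory to one of the five groups in Table 1.

    `trajectory` maps "past"/"present"/"future" to a set of emotions; a missing
    or empty entry means no emotional information for that time span.
    The rule order below is an assumption made for this sketch.
    """
    spans = [set(trajectory.get(t, ())) for t in ORDER]
    observed = [s for s in spans if s]
    if not observed:
        raise ValueError("trajectory has no TP labels")

    # Emotional Disorder: positive and negative emotions in the same time span.
    if any({"positive", "negative"} <= s for s in observed):
        return "Emotional Disorder"

    all_emotions = set().union(*observed)
    # Persistent Elation: no negative emotion, or the special pattern in Table 1.
    if "negative" not in all_emotions:
        return "Persistent Elation"
    if dict(zip(ORDER, spans)) == SPECIAL_ELATION:
        return "Persistent Elation"

    # Persistent Depressed: solely negative, or negative and neutral in one span.
    if all_emotions == {"negative"} or any({"negative", "neutral"} <= s for s in observed):
        return "Persistent Depressed"

    # Remaining cases: direction of the overall trend (assumed tie-break:
    # fall back to the final transition when first and last spans agree).
    means = [sum(SCORE[e] for e in s) / len(s) for s in observed]
    trend = means[-1] - means[0]
    if trend == 0:
        trend = means[-1] - means[-2]
    return "Positive Attitude" if trend > 0 else "Negative Attitude"


group = classify_trajectory(
    {"past": {"positive"}, "present": {"neutral"}, "future": {"negative"}}
)
```

With this reading, the worked examples in the text land in the groups the text assigns them; trajectories the text does not discuss may be grouped differently by the authors' actual annotation procedure.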

3.2. The TP-MultiBert Model

We construct a multi-task learning model, TP-MultiBert, which is intended to accomplish two tasks: Task A, the auxiliary task responsible for identifying suicidal emotion trajectories, and Task B, the primary task of detecting suicide tendency. Note that Task B is equivalent to the ternary classification task in TP-GloVe-GRU.

The model structure is illustrated in Figure 3. Sentences from a suicide note are input into the model, and after being processed by BERT, they are used for both Task A and Task B. Task A classifies an emotion trajectory into one of the following groups: Persistent Elation, Positive Attitude, Negative Attitude, Emotional Disorder, and Persistent Depressed, while Task B determines whether a suicide note shows no suicide tendency, implicit suicide tendency, or explicit and intense suicide tendency.

Input Layer: The input sentence \( S \) is represented as a token sequence \( \left\{{x}_{1},{x}_{2},\cdots ,{x}_{n}\right\} \) where each \( {x}_{i} \) is a token in the sentence. The BERT model used in this experiment, specifically bert-base-uncased, is a pre-trained version of BERT with 110 million parameters, designed for processing English text and suitable for the requirements of the CEASE dataset.

\( S=\left\{{x}_{1},{x}_{2},\cdots ,{x}_{n}\right\} \) (5)

BERT Layer: \( S \) is fed into the BERT model, which utilises BERT's self-attention mechanism and feed-forward neural network to obtain a feature representation of the sentence \( H\left(S\right) \). In this multi-task learning model, we assume that there is certain correlation between Task A and Task B, so they can share the BERT representation.

\( H\left(S\right)=BERT\left(\left\{{x}_{1},{x}_{2},\cdots ,{x}_{n}\right\}\right) \) (6)

Classifier: For the sake of computational efficiency and dataset size, two linear classifiers are added on top of the BERT output to map to the label spaces of Task A and Task B.

\( {\text{logits}}_{A}=H\left(S\right){W}_{A}+{b}_{A} \) (7)

\( {\text{logits}}_{B}=H\left(S\right){W}_{B}+{b}_{B} \) (8)

where \( {W}_{A} \) and \( {b}_{A} \) are the weights and biases of the linear classifier for Task A, and \( {W}_{B} \) and \( {b}_{B} \) are the weights and biases of the linear classifier for Task B. These parameters are initialized with random values and updated iteratively through optimization techniques to minimize the loss associated with their own tasks. \( {\text{logits}}_{A} \) represents the logits for Task A, while \( {\text{logits}}_{B} \) denotes the logits for Task B.
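Equations (7) and (8) amount to two independent linear maps applied to the same shared vector. The sketch below shows this in plain Python with a toy dimension; the random vector standing in for \( H(S) \), the Gaussian initialisation scale, the zero biases, and the helper names are all assumptions for illustration, and the text does not state how the BERT output is pooled into a single vector.

```python
import random


def linear_head(h, weight, bias):
    """Compute h @ weight + bias for one sentence vector h (Equations (7)-(8))."""
    return [
        sum(h[i] * weight[i][k] for i in range(len(h))) + bias[k]
        for k in range(len(bias))
    ]


def init_head(dim, n_classes, rng):
    """Randomly initialise a (dim x n_classes) weight matrix with a zero bias."""
    weight = [[rng.gauss(0.0, 0.02) for _ in range(n_classes)] for _ in range(dim)]
    bias = [0.0] * n_classes
    return weight, bias


rng = random.Random(42)
dim = 8  # toy size; bert-base-uncased produces 768-dimensional hidden vectors
h = [rng.uniform(-1.0, 1.0) for _ in range(dim)]  # stand-in for H(S)

W_A, b_A = init_head(dim, 5, rng)  # Task A: five trajectory groups
W_B, b_B = init_head(dim, 3, rng)  # Task B: three suicide tendency classes

logits_A = linear_head(h, W_A, b_A)
logits_B = linear_head(h, W_B, b_B)
```

Both heads read the same `h`, so gradients from either task would flow back into the shared encoder, which is what lets the auxiliary task influence the primary one.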

Model Training: The training process makes use of the cross-entropy loss function. To train both Task A and Task B simultaneously, we use a weighted total loss function, \( {L}_{total} \).

\( {L}_{A}=-\sum {\text{Labels}}_{A}\ast \log \left(\text{softmax}\left({\text{logits}}_{A}\right)\right) \) (9)

\( {L}_{B}=-\sum {\text{Labels}}_{B}\ast \log \left(\text{softmax}\left({\text{logits}}_{B}\right)\right) \) (10)

\( {L}_{total}=\alpha \mathrm{*}{L}_{A}+\beta \mathrm{*}{L}_{B} \) (11)

where \( {L}_{A} \) is the loss function for Task A, \( {L}_{B} \) is the loss function for Task B, and \( \alpha \) and \( \beta \) represent the weights of \( {L}_{A} \) and \( {L}_{B} \) that indicate the importance assigned to individual tasks. To explore the best ratio of \( \alpha \) to \( \beta \), we run a number of experiments and find out that the most desirable performance of our TP-MultiBert model is obtained when \( \alpha /\beta =1/1 \).
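The loss in Equations (9) to (11) can be sketched for a single sentence as follows. With one-hot labels, the sum over classes reduces to the negative log-probability of the gold class; the function names and the example logits are hypothetical.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def cross_entropy(logits, label):
    """Cross-entropy for one example with a one-hot label (Equations (9)-(10))."""
    return -log_softmax(logits)[label]


def total_loss(logits_a, label_a, logits_b, label_b, alpha=1.0, beta=1.0):
    """Weighted sum of the two task losses (Equation (11))."""
    return alpha * cross_entropy(logits_a, label_a) + beta * cross_entropy(logits_b, label_b)


# Hypothetical logits: five trajectory groups for Task A, three classes for Task B.
loss = total_loss([0.1, 2.0, -0.5, 0.3, 0.0], 1, [1.5, 0.2, -1.0], 0)
```

The defaults `alpha=1.0` and `beta=1.0` mirror the ratio \( \alpha /\beta =1/1 \) reported above; other ratios would shift the balance between the auxiliary and primary tasks.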

4. Experiments

4.1. Evaluation Metrics

We employ multiple metrics to assess the performance of the TP-MultiBert model, including Accuracy, Precision, Recall, F1-Score, and Support.

• Accuracy measures the proportion of correctly classified instances out of the total instances, with higher accuracy indicating better alignment between the model's predictions and the actual labels.

• Precision gauges the proportion of true positive predictions among all positive predictions made by the model.

• Recall measures the proportion of true positive predictions among all actual positive instances.

• F1-Score is the harmonic mean of precision and recall, providing a balanced evaluation that considers both precision and recall.

• Support indicates the number of suicide notes in each class.

\( \mathrm{Accuracy}=\frac{TP+TN}{TP+TN+FP+FN} \) (12)

\( \mathrm{Precision}=\frac{TP}{TP+FP} \) (13)

\( \mathrm{Recall}=\frac{TP}{TP+FN} \) (14)

\( \mathrm{F1\text{-}Score}=\frac{2\times \left(Precision\times Recall\right)}{Precision+Recall} \) (15)

\( {\text{Support}}_{i}={n}_{i} \) (16)

where \( TP \) stands for true positives (that is, suicide notes predicted by the model as positive and proven to be positive; not to be confused with Time Perspective), \( TN \) for true negatives, \( FP \) for false positives, and \( FN \) for false negatives. \( {n}_{i} \) represents the number of samples in class \( i \), and \( {\text{Support}}_{i} \) denotes the support for class \( i \).

Moreover, \( Macro\_Avg \) and \( Weighted\_Avg \) are used to measure the average performance across classes.

• \( Macro\_Avg \) computes the unweighted mean of a metric (precision, recall, or F1-Score) over all classes, without considering the actual distribution of each class in the dataset.

• \( Weighted\_Avg \), on the other hand, takes into account the actual distribution of each class in the dataset. It calculates the weighted average of the performance of each class, with weights based on the class proportions in the dataset.

\(Macro\_{Avg}=\frac{1}{n}{\sum }_{i=1}^{n}{\text{metric}}_{i} \) (17)

\( Weighted\_Avg=\frac{{\sum }_{i=1}^{n}{\text{metric}}_{i}\times {\text{Support}}_{i}}{{\sum }_{i=1}^{n}{\text{Support}}_{i}} \) (18)

where \( n \) represents the number of classes, \( {\text{metric}}_{i} \) stands for the value of the metric (precision, recall, F1-Score) for class \( \mathrm{i} \), and \( {\text{Support}}_{i} \) denotes the support for class \( i \).
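Equations (12) to (18) can be checked on a small example with the sketch below, which computes the per-class metrics one-vs-rest and then the two averages. The function name `classification_report` and the toy labels are assumptions for illustration; returning 0 when a denominator is zero is a convention chosen here, not something stated in the text.

```python
def classification_report(y_true, y_pred, classes):
    """Per-class precision/recall/F1/support plus macro and weighted averages."""
    pairs = list(zip(y_true, y_pred))
    per_class = {}
    for c in classes:
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        per_class[c] = {"precision": precision, "recall": recall, "f1": f1, "support": tp + fn}

    keys = ("precision", "recall", "f1")
    total = sum(m["support"] for m in per_class.values())
    # Equation (17): unweighted mean over classes.
    macro = {k: sum(per_class[c][k] for c in classes) / len(classes) for k in keys}
    # Equation (18): mean weighted by each class's support.
    weighted = {k: sum(per_class[c][k] * per_class[c]["support"] for c in classes) / total for k in keys}
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    return {"per_class": per_class, "macro": macro, "weighted": weighted, "accuracy": accuracy}


# Hypothetical gold labels and predictions for the three tendency classes.
y_true = [0, 0, 0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 0, 1, 1, 1, 2, 0]
report = classification_report(y_true, y_pred, classes=[0, 1, 2])
```

In this toy case the weighted-average recall equals the accuracy (both 0.75), which holds in general for single-label classification and is a quick sanity check on any reported table.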

4.2. Experimental Methods

The experiments are intended to serve two purposes in the context of classifying the suicide tendency identified from suicide notes. First, we would like to assess the strengths and weaknesses of Global Vectors for Word Representation (GloVe) and BERT. Second, we are keen on exploring the varying effectiveness of the single-task and multi-task learning models.

To begin with, we compare the performance of the multi-task learning model TP-MultiBert and its counterpart (the single-task learning model TP-Bert) in dealing with three-class suicide tendency detection. Then, we conduct a comparative analysis between the components involved in TP-GloVe-GRU from our preliminary work and the components in TP-MultiBert.

4.3. Experimental Settings

To mitigate the challenges posed by limited data sets and prevent model bias stemming from uneven data distribution, we employ stratified k-fold cross-validation on the entire dataset. In this approach, k is set to be 5. Each fold of cross-validation ensures that 80% of the dataset is used for training, and the remaining 20% is reserved for testing.
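The splitting scheme described above can be sketched in plain Python as follows. This is an illustrative re-implementation, not the authors' code: dealing each class's shuffled indices round-robin across the folds is one simple way to keep class proportions similar, and the function name and toy label counts are hypothetical.

```python
import random
from collections import defaultdict


def stratified_k_fold(labels, k=5, seed=42):
    """Yield (train_indices, test_indices) pairs that preserve class proportions."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal the indices of each class round-robin across the k folds.
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)

    for i in range(k):
        test = sorted(folds[i])
        train = sorted(idx for j, fold in enumerate(folds) if j != i for idx in fold)
        yield train, test


# Hypothetical label distribution over three classes.
labels = [0] * 50 + [1] * 30 + [2] * 20
splits = list(stratified_k_fold(labels, k=5))
```

With k = 5, every sample is used for testing exactly once, and each fold trains on the remaining four fifths of the data.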

We conduct multiple experiments, testing different learning rates to strike a balance between training speed and model performance. Eventually, we settle on a learning rate of \( 2\times {10}^{-5} \), as it comes with good model performance and stability during training. According to the best practice reported in [23, 24], the random seed is set to 42 and the maximum string length read by the tokenizer is set to 128.

To fully utilize hardware resources and accommodate the dataset size during training, we set the batch size to 16. The Adam optimization algorithm, known for its efficiency in updating neural network parameters, is employed in both the single-task and the multi-task learning scenarios. It facilitates the convergence of the neural network during training and has shown excellent performance in multi-task learning applications [25].

5. Results and Analysis

5.1. Single-Task vs. Multi-task

Figure 4 provides a comparison of the classification results in the ternary classification task (Class 0, Class 1, and Class 2) between the single-task TP-Bert model and the multi-task TP-MultiBert model, as well as a comparison of the overall \( Macro\_Avg \) and \( Weighted\_Avg \) indicators.

The benchmark used in our evaluation, TP-Bert, is a single-task deep learning model based on BERT. Compared with the deep learning model based on GloVe and GRU used in our previous work, this benchmark demonstrates superior performance in suicidal ideation detection. We observe that TP-MultiBert improves further over TP-Bert in most metrics. For instance, in identifying Class 2, which corresponds to strong and clear suicidal tendency, TP-MultiBert brings improvements across all the metrics. The Supports for Class 0, Class 1, and Class 2 are 185.2, 65.8, and 37.8, respectively, indicating that the data are not extremely imbalanced.

The result confirms the synergy between the main task of suicidal tendency detection and the auxiliary task of identification of the suicidal emotion trajectory. The knowledge acquired from identification of suicidal emotion trajectory has been effectively transferred to suicidal tendency detection. We believe that it is the multi-task learning model that enables bidirectional information sharing and exchange, and leads to enhanced model performance.

5.2. GloVe vs. Bert

Previously, the TP-GloVe-GRU model incorporated the nine TP labels (considered independent) into the deep learning model, leading to an accuracy increase from 69.29% to 71.29% (see Figure 5). In this work, we replace the GloVe component in the model with BERT, which is better at understanding word contextual relationships.

By introducing the concept of suicidal emotion trajectories to construct a multi-task learning model, we obtain an enhanced suicide tendency detection accuracy of 74.34%. Compared with TP-GloVe-GRU, TP-MultiBert increases accuracy by 3.05 percentage points. TP-MultiBert also improves accuracy by 1.94 percentage points over the single-task model without suicidal emotion trajectories, i.e., TP-Bert in Figure 5.

When constructing a suicide tendency detection model, most researchers tend to utilize GloVe [26, 27] despite its limitations in capturing contextually relevant semantic information. By incorporating RNNs (e.g., GRU and LSTM), it is still possible for the detection model to capture the contextual information, thereby completing feature extraction and obtaining the final classification results. In this work, we attempt to introduce BERT, a dynamic word embedding model, to replace the combination of GloVe and GRU. Experimental results demonstrate that BERT not only performs the same tasks competently but also improves the detection model's performance on important metrics such as accuracy. Compared with GloVe, BERT is capable of computing the contextual representation of text during pre-training, thus more accurately capturing subtle emotional changes or differences hidden in suicide notes. We speculate that this might be one of the reasons that TP-Bert outperforms TP-GloVe-GRU.

5.3. TP-MultiBert vs. ChatGPT

Meanwhile, we select a few sentences in a random fashion from the CEASE dataset for a comparison between TP-MultiBert and ChatGPT, one of the state-of-the-art language models. The result is shown in Table 2.

Table 2 Comparison of TP-MultiBert and ChatGPT
Sentence	Classification	ChatGPT	TP-MultiBert
S1: i think about suicide on a daily basis sometimes it is all that i can think about.	2	\( \surd \)	\( \surd \)
S2: i should have liked to finish my account of working for arthur a story which began when our paths happened to cross in 1949.	0	\( \surd \)	\( \surd \)
S3: if i cannot see my daughter here i will see her from above	1	\( \surd \)	\( \surd \)
S4: please comfort my wife and tell her that this was no ordinary suicide and that she can rest assured that god will still gather me up in his great mercy god protect my dearest ones god bless you dear pastor evermore.	1	\( \surd \)	\( \surd \)
S5: all my thoughts are with you with edda and all my beloved the last beats of my hear i will mark our great and eternal love your NAME	1	0	\( \surd \)
S6: i wish with all my heart that they might have been better rewarded all of you my dear ones i ask to keep my memory alive in your hearts to live on in the hearts of our dear ones is all that i can conceive of immortality.	1	0	\( \surd \)
S7: i cannot keep on going because it should be me that is gone from this earth not her.	2	\( \surd \)	\( \surd \)
S8: and i cannot wait till the day i get to see you again.	0	\( \surd \)	1

The comparison demonstrates that ChatGPT's insight into suicide tendency detection is slightly weaker than that of TP-MultiBert. Although the former can discern the hint of a heart stopping in sentence S6, it still believes that the author of S6 bears no suicidal tendency after considering the overall positive emotion expressed in the sentence. TP-MultiBert, on the other hand, exhibits a higher sensitivity but may also deliver misclassifications, especially for short texts that are difficult to label with an explicit suicidal emotion trajectory. For instance, sentence S8 only reveals positive emotions about one’s future, and TP-MultiBert fails to classify its related suicide tendency accurately.

In summary, TP-MultiBert excels at suicide tendency detection on complex texts, especially those that reveal more pronounced changes in emotional features. For simpler texts, however, Large Language Models (LLMs) such as ChatGPT may provide more accurate judgments.

6. Conclusion and Future Work

The evolution of an individual’s suicidal ideation from inception to clear manifestation is a dynamic process closely linked to emotions. Therefore, we have argued that the suicidal emotion trajectory, which describes emotional changes over time, can be integrated as an auxiliary task into a multi-task learning model, TP-MultiBert, to detect suicide tendency with enhanced accuracy. Compared with single-task learning models (e.g. TP-Bert) that do not incorporate the suicidal emotion trajectory, TP-MultiBert improves metrics such as accuracy and F1-Score.

Today, the practice of using LLMs such as ChatGPT for natural language processing tasks has become prevalent, yet our model still holds significance for certain scenarios. On one hand, general language models like ChatGPT perform well across various domains and tasks. However, a specific task such as suicide tendency detection requires more authoritative and sensitive professional judgments, and model transparency and interpretability are essential for individual autonomy. Individuals should be able to understand how the model makes decisions so that they can make informed choices about whether to accept intervention and what kind of intervention to accept. In this respect, customized models offer a better solution for suicide tendency detection than large models. On the other hand, suicide tendency detection faces challenges from data scarcity and the need for specific skills in feature engineering, data preprocessing, and post-processing. Under such circumstances, customization and targeted training of models are highly valuable for taking full advantage of limited data resources. Moreover, AI ethics demand that we manage suicide-related issues in a confidential manner, rather than relying on large models for automated decisions that could have an adverse impact on individuals.

We believe that suicide tendency detection with the integration of multimodal data will be an intriguing research direction. As the ways in which individuals share information diversify, suicide tendency detection may require the amalgamation of various information sources, including text, images, audio, and other multimodal data [28, 29]. Among others, the advanced learning models proposed in [30, 31] provide insights into dealing with images for object/pedestrian attribute detection from a computer vision perspective. Taking relevant multimodal data into account will allow us to gain a deeper understanding of an individual’s emotional state and mental health, thus helping to improve the accuracy and predictability of suicide risk detection.

Transfer learning-based algorithms, including Evolutionary Transfer Optimization (ETO), open up new possibilities for knowledge transfer between different domains [32]. When dealing with niche topics (e.g. suicide tendency detection), interdisciplinary knowledge transfer can enhance our understanding and handling of complex societal issues, offering intriguing insights and tools for relevant research and practice.

Author Contributions: Qianyi Yang: conceptualization, data curation, analysis, methodology, writing — original draft; Jing Zhou: writing — review and editing, supervision; Zheng Wei: writing — review and editing. All authors have read and agreed to the published version of the manuscript.

Funding: This research received no external funding.

Data Availability Statement: Source code is available at https://github.com/qianyiyang1/SuicideClassification and the dataset can be accessed via https://www.iitp.ac.in/%7eai-nlp-ml/resources.html.

Conflicts of Interest: The authors declare no conflicts of interest.

References

World Health Organization. World suicide prevention day 2022. Available online: https://www.who.int/campaigns/world-suicide-prevention-day/2022 (accessed on 21 March 2023).
Lee, D.; Kang, M.; Kim, M.; et al. Detecting suicidality with a contextual graph neural network. In Proceedings of the Eighth Workshop on Computational Linguistics and Clinical Psychology, Seattle, USA, July 2022; ACL, 2022; pp. 116–125. doi: 10.18653/v1/2022.clpsych-1.10
Shaygan, M.; Hosseini, F.A.; Shemiran, M.; et al. The effect of mobile-based logotherapy on depression, suicidal ideation, and hopelessness in patients with major depressive disorder: A mixed-methods study. Sci. Rep., 2023, 13: 15828. doi: 10.1038/s41598-023-43051-8
Tadesse, M.M.; Lin, H.F.; Xu, B.; et al. Detection of suicide ideation in social media forums using deep learning. Algorithms, 2019, 13: 7. doi: 10.3390/a13010007
Yang, Q.Y.; Zhou, J. Incorporating time perspectives into detection of suicidal ideation. In Proceedings of the 2023 IEEE Smart World Congress (SWC), Portsmouth, UK, 28–31 August 2023; IEEE: New York, 2023; pp. 1–8. doi: 10.1109/SWC57546.2023.10448961
Lewin, K. Field theory in Social Science: Selected Theoretical Papers; Harper & Brothers: New York, 1951.
Manning, C.D.; Schütze, H. Foundations of Statistical Natural Language Processing; MIT Press: Cambridge, 1999.
Hartmann, J. Emotion English Distilroberta-base. Available online: https://huggingface.co/j-hartmann/emotion-english-distilroberta-base/ (accessed on 12 September 2023).
Law, K.C.; Khazem, L.R.; Anestis, M.D. The role of emotion dysregulation in suicide as considered through the ideation to action framework. Curr. Opin. Psychol., 2015, 3: 30−35. doi: 10.1016/j.copsyc.2015.01.014
Rajappa, K.; Gallagher, M.; Miranda, R. Emotion dysregulation and vulnerability to suicidal ideation and attempts. Cogn. Ther. Res., 2012, 36: 833−839. doi: 10.1007/s10608-011-9419-2
Everall, R.D.; Bostik, K.E.; Paulson, B.L. Being in the safety zone: Emotional experiences of suicidal adolescents and emerging adults. J. Adolesc. Res., 2006, 21: 370−392. doi: 10.1177/0743558406289753
Heffer, T.; Willoughby, T. The role of emotion dysregulation: A longitudinal investigation of the interpersonal theory of suicide. Psychiatry Res., 2018, 260: 379−383. doi: 10.1016/j.psychres.2017.11.075
Caruana, R. Multitask learning. Mach. Learn., 1997, 28: 41−75. doi: 10.1023/A:1007379606734
Li, N.; Chow, C.Y.; Zhang, J.D. SEML: A semi-supervised multi-task learning framework for aspect-based sentiment analysis. IEEE Access, 2020, 8: 189287−189297. doi: 10.1109/ACCESS.2020.3031665
Liu, P.F.; Qiu, X.P.; Huang, X.J. Recurrent neural network for text classification with multi-task learning. In Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence, New York, USA, 9–15 July 2016; IJCAI, 2016; pp. 2873–2879.
He, K.M.; Gkioxari, G.; Dollár, P.; et al. Mask R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; IEEE: New York, 2017; pp. 2980–2988. doi: 10.1109/ICCV.2017.322
Hadash, G.; Shalom, O.S.; Osadchy, R. Rank and rate: Multi-task learning for recommender systems. In Proceedings of the 12th ACM Conference on Recommender Systems, Vancouver, Canada, September 2018; ACM: New York, 2018; pp. 451–454. doi: 10.1145/3240323.3240406
Ghosh, S.; Ekbal, A.; Bhattacharyya, P. A multitask framework to detect depression, sentiment and multi-label emotion from suicide notes. Cogn. Comput., 2022, 14: 110−129. doi: 10.1007/s12559-021-09828-7
Benton, A.; Mitchell, M.; Hovy, D. Multi-task learning for mental health using social media text. arXiv: 1712.03538, 2017. doi: 10.48550/arXiv.1712.03538
Yang, T.T.; Li, F.; Ji, D.H.; et al. Fine-grained depression analysis based on Chinese micro-blog reviews. Inf. Process. Manag., 2021, 58: 102681. doi: 10.1016/j.ipm.2021.102681
Ghosh, S.; Ekbal, A.; Bhattacharyya, P. Deep cascaded multitask framework for detection of temporal orientation, sentiment and emotion from suicide notes. Sci. Rep., 2022, 12: 4457. doi: 10.1038/s41598-022-08438-z
Ghosh, S.; Ekbal, A.; Bhattacharyya, P. VAD-assisted multitask transformer framework for emotion recognition and intensity prediction on suicide notes. Inf. Process. Manag., 2023, 60: 103234. doi: 10.1016/j.ipm.2022.103234
Ranade, A.; Telge, S.; Mate, Y. Predicting disasters from tweets using GloVe Embeddings and BERT layer classification. In 11th International Conference on Advanced Computing, Msida, Malta, 18–19 December 2021; Springer: Berlin/Heidelberg, Germany, 2022; pp. 492–503. doi: 10.1007/978-3-030-95502-1_37
Kaushal, A.; Mahowald, K. What do tokens know about their characters and how do they know it? In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Seattle, USA, July 2022; ACL, 2022; pp. 2487–2507. doi: 10.18653/v1/2022.naacl-main.179
Jha, A.; Kumar, A.; Banerjee, B.; et al. AdaMT-Net: An adaptive weight learning based multi-task learning model for scene understanding. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, Seattle, USA, 14–19 June 2020; IEEE: New York, 2020; pp. 3027–3035. doi: 10.1109/CVPRW50498.2020.00361
Mohammadi, E.; Amini, H.; Kosseim, L. CLaC at CLPsych 2019: Fusion of neural features and predicted class probabilities for suicide risk assessment based on online posts. In Proceedings of the Sixth Workshop on Computational Linguistics and Clinical Psychology, Minneapolis, Minnesota, June 2019; ACL, 2019; pp. 34–38. doi: 10.18653/v1/W19-3004
Du, J.C.; Zhang, Y.Y.; Luo, J.H.; et al. Extracting psychiatric stressors for suicide from social media using deep learning. BMC Med. Inf. Decis. Mak., 2018, 18: 43. doi: 10.1186/s12911-018-0632-8
Gao, M.X.; Wong, N.M.L.; Lin, C.M.; et al. Multimodal brain connectome-based prediction of suicide risk in people with late-life depression. Nat. Mental Health, 2023, 1: 100−113. doi: 10.1038/s44220-022-00007-7
Garg, M. Mental health analysis in social media posts: A survey. Arch. Comput. Methods Eng., 2023, 30: 1819−1842. doi: 10.1007/s11831-022-09863-z
Zeng, N.Y.; Wu, P.S.; Wang, Z.D.; et al. A small-sized object detection oriented multi-scale feature fusion approach with application to defect detection. IEEE Trans. Instrum. Meas., 2022, 71: 3507014. doi: 10.1109/TIM.2022.3153997
Wu, P.S.; Wang, Z.D.; Li, H.; et al. KD-PAR: A knowledge distillation-based pedestrian attribute recognition model with multi-label mixed feature learning network. Expert Syst. Appl., 2024, 237: 121305. doi: 10.1016/j.eswa.2023.121305
Li, H.; Wang, Z.D.; Lan, C.B.; et al. A novel dynamic multiobjective optimization algorithm with non-inductive transfer learning based on multi-strategy adaptive selection. IEEE Trans. Neural Netw. Learn. Syst., 2023, in press. doi: 10.1109/TNNLS.2023.3295461
