Mapping loneliness through social intelligence analysis: a step towards creating global loneliness map

Results

Data on the keywords associated with loneliness were collected during October 2022 through the Twitter developer API. The purpose of this paper is not to find the number of people with loneliness in a particular area or country; that kind of study would require collecting billions of tweets. Rather, the purpose of this study is to find the correlations of loneliness with socioeconomic, political and personal-psychological categories. For this purpose, we do not need to go deeper into a user’s timeline and monitor their activity. We are more interested in the aggregate behaviour of users in relation to the expression of loneliness. We deidentify the tweets before analysing them, that is, we remove the users’ names and IDs as part of the data cleaning process. The data are publicly available, but we will not disclose the collected data without anonymising them.

Globally, 4.1 million tweets were collected, of which 841 796 were from the USA. Five cities each had more than 10 000 tweets, and we analysed all five. We also analysed one city with fewer than 10 000 but more than 5000 tweets, to see whether its results conform to those of the cities with more than 10 000 tweets. That city was Orlando, with 5535 tweets.
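The city selection rule described above can be sketched in a few lines of Python. This is only an illustration under assumptions: the function name `select_cities`, its parameters and the toy input are hypothetical, and the thresholds are scaled down so the example stays small.

```python
from collections import Counter

def select_cities(tweet_cities, main_threshold=10_000, check_threshold=5_000):
    """Group cities by tweet count.

    Returns (main, check): cities above main_threshold, and cities between
    check_threshold and main_threshold that serve as a consistency check.
    """
    counts = Counter(tweet_cities)
    main = {city: n for city, n in counts.items() if n > main_threshold}
    check = {city: n for city, n in counts.items()
             if check_threshold < n <= main_threshold}
    return main, check

# Hypothetical toy input: one city label per tweet, thresholds scaled down.
toy = ["city_a"] * 12 + ["city_b"] * 7 + ["city_c"] * 3
main, check = select_cities(toy, main_threshold=10, check_threshold=5)
print(main)   # cities in the main analysis
print(check)  # smaller cities used only to check consistency
```

With the thresholds from the text (10 000 and 5000), Orlando with 5535 tweets would fall into the second group.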

Figure 1 presents our pipeline for analysing the data collected from Twitter. Twitter gives access to users’ data through its publicly available API for developers. The data we gathered were based on topic modelling through open-vocabulary topics. The relevant tweets about loneliness were gathered and stored in a database. Topics, which are combinations of clusters of co-occurring words, were created. These topics were then analysed further through a dictionary-based approach.
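As a rough illustration of what a dictionary-based step can look like, the sketch below tags a tweet with every topic whose word list it touches. The topic names echo themes reported later in the paper, but the word lists, the `TOPIC_DICTIONARY` name and the `tag_topics` function are hypothetical and are not the dictionaries used in the study.

```python
import re

# Hypothetical topic dictionary: topic name -> set of indicator words.
TOPIC_DICTIONARY = {
    "intimate relationships": {"partner", "boyfriend", "girlfriend", "breakup"},
    "interpersonal relationships": {"friend", "friends", "family"},
    "addiction": {"sober", "sobriety", "alcohol", "drug"},
}

def tag_topics(text, dictionary=TOPIC_DICTIONARY):
    """Return the sorted list of topics whose word set overlaps the tweet's tokens."""
    tokens = set(re.findall(r"[a-z]+", text.lower()))
    return sorted(topic for topic, words in dictionary.items() if tokens & words)

print(tag_topics("Six months sober and my friends are gone"))
```

A real dictionary would be far larger and would normally be matched against stemmed tokens rather than raw words.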

Pipeline for processing Twitter data.

Tweets containing the keywords mentioned in the last subsection were collected. Tweets were extracted from these two countries to make a subdataset belonging to the USA. This was meant to reflect the majority composition of the dataset. Sentiment analysis was carried out after cleaning the data, which included removing redundant characters, numbers, special characters, users’ profile IDs and markers such as ‘retweet’. Sentiment analysis is needed to separate the phrases and topics that carry meaningful information on loneliness from metaphorical and non-sequitur uses of the terms and topics associated with loneliness. Figure 2 shows the process of collecting data from Twitter and analysing the tweets.
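The cleaning step can be sketched with a few regular expressions. This is a minimal example, not the authors' code: the function name `clean_tweet` is hypothetical, and the URL removal and lowercasing are assumptions that the text does not mention.

```python
import re

def clean_tweet(text):
    """Strip retweet markers, user mentions, numbers and special characters."""
    text = re.sub(r"^\s*RT\s+", "", text)      # leading retweet marker
    text = re.sub(r"https?://\S+", "", text)   # URLs (assumption, not in the source)
    text = re.sub(r"@\w+:?", "", text)         # user handles, which also deidentifies
    text = re.sub(r"[^A-Za-z\s]", " ", text)   # numbers and special characters
    return re.sub(r"\s+", " ", text).strip().lower()

print(clean_tweet("RT @user123: Feeling so lonely!!! 2 weeks alone... https://t.co/abc"))
```

Removing handles at this stage serves both cleaning and deidentification, since no user identifier survives into the text passed to sentiment analysis.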

Strengthening the Reporting of Observational Studies in Epidemiology diagram for the Twitter data.

Table 1 gives the sentiment analysis for the different cities, as explained above, and for the overall dataset, which covers the USA. Table 1 also points to an interesting outlier in the dataset: Houston accounts for almost all the neutral tweets. Some of the cities have a more balanced split between negative and other tweets (ie, positive and neutral), while two clear outliers stand out. For Houston, only 21.2% of tweets are negative, while for Queens 80.6% of tweets are negative. The data were collected over 2 weeks and are not wide or deep enough to establish the causes of these outliers with certainty; that would require long-term data for each city. As mentioned, this study is a proof of concept for a wider loneliness map based on SIA, that is, on analysing various social media and web-based data with the tools of machine learning and AI. However, the neutral and positive tweets do not add to the analysis of loneliness as carried out in this paper. In further studies, long-term data will be collected to obtain a balanced dataset for each city and to find the reasons for the proportion of each category of tweets.
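A table of per-city sentiment proportions like the one described here can be derived from labelled tweets as sketched below. The sentiment classifier is not specified in this section, so the sketch assumes each tweet already carries a label; the function name `sentiment_shares` and the toy data are hypothetical.

```python
from collections import Counter, defaultdict

LABELS = ("negative", "neutral", "positive")

def sentiment_shares(labelled_tweets):
    """Return {city: {label: percentage}} from (city, label) pairs."""
    per_city = defaultdict(Counter)
    for city, label in labelled_tweets:
        per_city[city][label] += 1
    shares = {}
    for city, counts in per_city.items():
        total = sum(counts.values())
        # A Counter returns 0 for labels that never occurred in this city.
        shares[city] = {lab: round(100 * counts[lab] / total, 1) for lab in LABELS}
    return shares

# Hypothetical toy data, not figures from the study.
toy = [("city_a", "negative")] * 4 + [("city_a", "neutral")] * 1
print(sentiment_shares(toy))
```

Comparing these shares across cities is what makes outliers such as Houston and Queens visible.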

Sentiment analysis of tweets containing the keywords/topics of loneliness

The aim of the loneliness map and of this paper is to find the correlation between loneliness and mental health issues and other topics, which can range from personal expression to socioeconomic factors. Before analysing the tweets on loneliness in detail, it was important to identify the tweets that are metaphorical or non-sequitur. Neutral sentiment can also reflect a mention of loneliness in descriptive terms. The data suggest that the results are consistent across sample sizes, as Orlando, with the smallest sample, has results similar to those of the other cities.

Figure 3 presents word clouds of the tweets by sentiment. The figure illustrates the words most highly associated with the groups of users tweeting with keywords associated with loneliness. It is important to plot word clouds of both positive and negative tweets to differentiate metaphorical use from the meaningful use intended by the study design of this paper. The figure shows that the words associated with positive-sentiment mentions of loneliness are positive words such as commitment, sobriety, sober and months (number of months). The word clouds were generated after redundant tokens, such as ‘RT’ (retweeted) and mentions of the user’s ID, were removed.

Words more likely to be posted by Twitter users (A) when the sentiment of the tweet is positive, (B) when sentiment of the tweet is negative.

Table 2 presents the topics most highly correlated with negative mentions of loneliness. The tweets with negative sentiment were first tokenised and stemmed to get a concise list of words and topics associated with loneliness. The list was then analysed to identify meaningful words representing topics of interest, such as emotional, social and health identifiers. Words such as ‘oh’, ‘yeah’ and ‘ur’ were ignored in composing the list. Table 2 shows that, for the overall US dataset, intimate relationships followed by interpersonal relationships are the most highly correlated topics and thus the issues most associated with loneliness. ‘COVID-19’ is the single most frequent word in the dataset. The search keywords contained ‘isolated’ and ‘isolation’, and given the social and physical distancing required by COVID-19 prevention guidelines, the high occurrence of COVID-19 in association with negative sentiment of loneliness is expected. This suggests that isolation because of COVID-19 has negative effects on people’s sentiments and thus on their overall mental health. We also found an association of drug and addiction words with loneliness. The same was found in figure 3B, where the word ‘sober’, which is associated with recovery from addiction, was used, although in a positive sense. Together, figure 3B and table 2 show an association of drug/alcohol addiction with loneliness, which can be investigated further with keywords associated with both loneliness and addiction.
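The tokenise, stem and count steps can be sketched as follows. This is a simplified illustration: `crude_stem` is a toy suffix stripper standing in for a real stemmer (the stemmer used in the study is not named here), and the function names, suffix list and sample tweets are all hypothetical. Only the ignored words ‘oh’, ‘yeah’ and ‘ur’ come from the text.

```python
import re
from collections import Counter

FILLER = {"oh", "yeah", "ur"}            # words the text says were ignored
SUFFIXES = ("ness", "ing", "ed", "s")    # toy suffix list (assumption)

def crude_stem(word):
    """Strip one common suffix, keeping a stem of at least three letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def top_terms(tweets, n=10):
    """Tokenise, drop filler words, stem, and return the n most frequent stems."""
    counts = Counter()
    for tweet in tweets:
        tokens = re.findall(r"[a-z]+", tweet.lower())
        counts.update(crude_stem(t) for t in tokens if t not in FILLER)
    return counts.most_common(n)

# Hypothetical sample tweets, assumed already cleaned and labelled negative.
print(top_terms(["oh yeah feeling isolated", "isolated friends", "ur friend"]))
```

Because counting happens on stems, variants such as ‘friend’ and ‘friends’ collapse into a single entry, which is why a stemmed list is shorter than a list of full words.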

Highly correlated topics with mentions of loneliness

Table 3 gives the city-wise association of themes with loneliness, based on the analysis of tweets with negative sentiment. The sample, however limited, contained variation in the themes and topics associated with the negative consequences of loneliness. While these topics and their association with loneliness are not definitive, that is, they may change as more data become available for each city, they nonetheless provide proof of concept for the idea of mapping loneliness. Some of the correlations are intuitive and self-explanatory; for example, Queens being a big city, with the peculiar nature of a big city, one would expect more self-oriented or self-focused expressions. The data analysed here offer only a glimpse, as they were collected over a limited period, but they indicate that the expression and dynamics of loneliness can change with geography, which in turn can depend on the particular urban infrastructure, healthcare system, socioeconomic issues and culture of the region. Similarly, figure 4 shows a few selected examples of city-wise association of topics with loneliness. Each word cloud is different, and each contains some meaningful words. For example, the word ‘lgbtq’ can be seen for Houston, while words such as ‘love’ can be spotted for Orlando. This again drives home the point that the experience and expression of loneliness vary. It must be noted that the word clouds are based on full words and phrases, while the list in table 2 is based on stemmed words.

Selected city-wise examples of word clouds of words/topics associated with negative sentiment of loneliness: (A) Houston, (B) Orlando, (C) Nashville.

Top correlated topics with negative mention of loneliness across cities analysed
