Tweet Analysis

Type

Course Project

Date

Dec 2022
(1 week)

Frameworks & Tech

R, R Shiny, leaflet, highcharter, rtweet, syuzhet

Throughout COMM 497DB: Survey of Digital Behavioral Data, students used natural language processing to explore social issues on a more individualized level, gathering and analyzing data from Twitter's API.

We often discussed the emotional impacts of world events, reporting biases, and location, while acknowledging the production and consumption of misinformation and disinformation.

Requirements

Per the project guidelines, students were required to include a written component to accompany the data, detailing the purpose and scope of the app, the type of data collected, key findings, and analytical limitations. While an R Shiny application was required, design choices such as using multiple pages or including radio panels were left to our discretion. Displaying more than one graph was preferred.



Gathering Data

The project used Twitter's API to gather tweets with the hashtag "NATO," pulled from the dates November 26, 2022 to November 30, 2022. My goal was to pick a somewhat polarizing topic and put it into a social context while possibly revealing sentiments that aren't captured in official media.

Gathering data proved to be challenging since many fields were optional for users, such as location, which was needed to create the main map display. As a result, a set of 5,000 tweets provided by the API was condensed down to 97 viable ones with 48 variables each to analyze.
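A minimal sketch of this kind of filtering, assuming the tweets sit in a data frame with a free-text `location` column. The sample rows and column names are illustrative stand-ins, not the project's actual data, and turning a location string into map coordinates is a separate step not shown here.

```r
# Hypothetical sample standing in for the API output (the real data had 48 columns).
tweets <- data.frame(
  text = c("#NATO summit today", "Thoughts on #NATO", "#NATO news", "More #NATO"),
  location = c("Berlin, Germany", "", NA, "   "),
  stringsAsFactors = FALSE
)

# Keep only rows whose location is present and not blank.
has_location <- !is.na(tweets$location) & nzchar(trimws(tweets$location))
viable <- tweets[has_location, , drop = FALSE]

# Number of tweets that can be placed on a map.
nrow(viable)
```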

The sentiment analysis did not depend on geocodes, so the entire dataset could be used. The package syuzhet provided a list of sentiments and scores for each tweet. These were then aggregated by date and sentiment to produce the final dataset, with each row holding a date, a feeling, and a value between 0 and 1, where 1 is strongest and 0 is weakest.
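The aggregation step might look roughly like the sketch below. It starts from per-tweet scores shaped like the output of `syuzhet::get_nrc_sentiment()` (one column per sentiment), with a date column attached. The sample values are made up, and the rescaling to the 0 to 1 range is an assumption, since the exact normalization used is not described here.

```r
# Hypothetical per-tweet scores, shaped like syuzhet::get_nrc_sentiment() output
# plus a date column; only three sentiment columns are shown for brevity.
scores <- data.frame(
  date = as.Date(c("2022-11-26", "2022-11-26", "2022-11-27")),
  positive = c(2, 0, 1),
  negative = c(1, 3, 0),
  fear = c(0, 1, 1)
)

sentiment_cols <- setdiff(names(scores), "date")

# Sum each sentiment's scores per day.
daily <- aggregate(scores[sentiment_cols], by = list(date = scores$date), FUN = sum)

# Reshape to long format: one row per date and sentiment.
long <- data.frame(
  date = rep(daily$date, times = length(sentiment_cols)),
  sentiment = rep(sentiment_cols, each = nrow(daily)),
  value = unlist(daily[sentiment_cols], use.names = FALSE)
)

# Assumption: rescale by the largest daily total so every value falls in [0, 1].
long$value <- long$value / max(long$value)
```

With ten sentiment columns and five dates, the same reshaping would yield 50 rows of 3 variables.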


Dataset Samples

Aggregated Sentiments (50 obs. of 3 variables)

API Output (97 obs. of 48 variables)



Product & Improvements

I settled on creating a two-page app with the first page displaying the results of the geocode dataset and the second page focusing on the results of the sentiment analysis.

The map page provides a visually pleasing interactive display, with a radio panel that further emphasizes my written contributions through its focus on specific continents. I hoped to show users where discussions using #NATO were most prevalent, with the notion that an area is affected either by proximity to the issue (e.g., Europe) or by political closeness (e.g., North America, Asia). The analysis was fairly strong in that regard. However, it could have been strengthened with further commentary on the contents of an area's tweets through a more targeted sentiment analysis.
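The subsetting logic behind a continent radio panel can be sketched in base R. The data frame, its column names, the `filter_by_continent()` helper, and the "All" option are all hypothetical. In a Shiny app, a function like this would typically be called inside a reactive expression with the selected radio value, and its result passed to the map.

```r
# Hypothetical geocoded tweets; column names are assumptions.
geo_tweets <- data.frame(
  lat = c(52.52, 38.90, 35.68),
  lng = c(13.40, -77.04, 139.69),
  continent = c("Europe", "North America", "Asia"),
  stringsAsFactors = FALSE
)

# Return the rows for the selected continent, or every row for "All".
filter_by_continent <- function(data, choice = "All") {
  if (identical(choice, "All")) {
    return(data)
  }
  data[data$continent == choice, , drop = FALSE]
}

# Example: the subset a user would see after selecting "Europe".
europe <- filter_by_continent(geo_tweets, "Europe")
```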

The sentiment analysis had interesting results considering the variation present in the most overarching emotions (negative, positive). I attempted to explain these changes through unbiased, recent news, though it's worth exploring other variables since no topic can be fully explained by one event. I would further research cyclical changes in sentiment and how adjacent topics such as the publicity of a war and the number of countries impacted affect the strength of sentiment values on a daily basis.

Final Images