Type
Course Project
Date
Dec 2022
(1 week)
Frameworks & Tech
R, R Shiny, leaflet, highcharter, rtweet, syuzhet
Throughout COMM 497DB: Survey of Digital Behavioral Data, students
used natural language processing to explore social issues on a more
individualized level, gathering and analyzing data from Twitter's API.
We often discussed the emotional impacts of world events, reporting
biases, and location, while acknowledging the production and
consumption of misinformation and disinformation.
Per the project guidelines, students were required to include a written component to accompany the data, detailing the purpose and scope of the app, the type of data collected, key findings, and analytical limitations. We were required to build an R Shiny application, but design choices such as whether to use multiple pages or include radio panels were left to our discretion. Displaying more than one graph was preferred.
The project used Twitter's API to gather tweets containing the
hashtag #NATO, posted between November 26, 2022 and November 30,
2022. My goal was to pick a somewhat polarizing topic and put it into
a social context while possibly revealing sentiments that aren't
captured in official media.
Gathering data proved challenging because many fields were optional
for users, including location, which was needed to create the main
map display. As a result, a set of 5,000 tweets provided by the API
was condensed to 97 viable ones, each with 48 variables, to analyze.
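The location filter described above can be sketched in base R. This is a minimal illustration on a toy table, not the project's actual code: the column names `text`, `lat`, and `lng` and the sample rows are assumptions made for the example.

```r
# Toy stand-in for the tweet table. The column names `text`, `lat`, and `lng`
# are assumptions for illustration, not the project's actual schema.
tweets <- data.frame(
  text = c("NATO summit today", "Thoughts on #NATO", "#NATO expansion", "News"),
  lat  = c(50.85, NA, 38.90, NA),
  lng  = c(4.35, NA, -77.04, 12.50),
  stringsAsFactors = FALSE
)

# Keep only rows where both coordinates are present, since a map marker
# needs a latitude and a longitude.
has_coords <- !is.na(tweets$lat) & !is.na(tweets$lng)
geo_tweets <- tweets[has_coords, ]

# Report how much of the sample survives the filter.
cat(nrow(geo_tweets), "of", nrow(tweets), "tweets have usable coordinates\n")
```

A filter like this explains the sharp drop in sample size: any tweet missing either coordinate cannot be placed on the map and is excluded.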
The sentiment analysis did not depend on geocodes, so the entire
dataset could be used. The syuzhet package provided a list of
sentiments and scores for each tweet. These were then aggregated by
date and sentiment to produce the final dataset, in which each row
holds a date, a feeling, and a value between 0 and 1, with 1 being
strongest and 0 being weakest.
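As a rough sketch of the aggregation step, the base R snippet below sums per-tweet sentiment scores by date, reshapes them to one row per date and feeling, and rescales the values to the 0 to 1 range. The toy scores stand in for the per-tweet output of syuzhet, and dividing by the largest daily total is only one assumed way to reach a 0 to 1 scale; the write-up does not state which scaling was used.

```r
# Toy stand-in for per-tweet sentiment scores (the kind of table syuzhet
# produces, one column per sentiment). Values and columns are illustrative.
scores <- data.frame(
  date     = as.Date(c("2022-11-26", "2022-11-26", "2022-11-27", "2022-11-27")),
  positive = c(2, 0, 1, 3),
  negative = c(1, 3, 0, 1),
  fear     = c(0, 2, 1, 0)
)

# Sum each sentiment column within each date.
daily <- aggregate(cbind(positive, negative, fear) ~ date, data = scores, FUN = sum)

# Reshape to long form: one row per (date, feeling) pair.
sentiments <- setdiff(names(daily), "date")
long <- data.frame(
  date    = rep(daily$date, times = length(sentiments)),
  feeling = rep(sentiments, each = nrow(daily)),
  value   = unlist(daily[sentiments], use.names = FALSE)
)

# ASSUMPTION: scale by the largest daily total so values fall in [0, 1].
long$value <- long$value / max(long$value)

print(long)
```

Long form is convenient here because charting libraries typically expect one row per plotted point, with the feeling used as the grouping variable.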
I settled on creating a two-page app with the first page displaying
the results of the geocode dataset and the second page focusing on the
results of the sentiment analysis.
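A two-page layout with a radio panel can be sketched with Shiny's `navbarPage()`. This is a hypothetical skeleton rather than the app's real code: the page titles, the input ID `continent`, and the text placeholders are assumptions. In a full app the placeholders would presumably be replaced by map and chart outputs from leaflet and highcharter.

```r
library(shiny)

# Hypothetical skeleton: page titles, input IDs, and output IDs are
# illustrative and not taken from the actual application.
ui <- navbarPage(
  "#NATO on Twitter",
  tabPanel(
    "Map",
    # Radio panel that narrows the view to one continent.
    radioButtons(
      "continent", "Continent",
      choices = c("All", "Europe", "North America", "Asia")
    ),
    textOutput("map_note")        # placeholder where a map output would go
  ),
  tabPanel(
    "Sentiment",
    textOutput("sentiment_note")  # placeholder where a chart output would go
  )
)

server <- function(input, output, session) {
  # Echo the selected continent so the radio panel visibly drives the page.
  output$map_note <- renderText(paste("Showing tweets for:", input$continent))
  output$sentiment_note <- renderText("Daily sentiment chart goes here.")
}

# Build the app object; call runApp(app) in an interactive session to launch it.
app <- shinyApp(ui, server)
```

Keeping each dataset on its own tab mirrors the split between the geocoded subset and the full sentiment dataset.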
The map page provides a visually pleasing interactive map, with a
radio panel that reinforces my written commentary by focusing on
specific continents. I hoped to show users where discussions using
#NATO were most prevalent, with the notion that an area is affected
either by proximity to the issue (e.g., Europe) or by political
closeness (e.g., North America, Asia). The analysis was fairly strong
in that regard. However, it could have been strengthened with further
commentary on the contents of each area's tweets through a more
targeted sentiment analysis.
The sentiment analysis had interesting results considering the
variation present in the most overarching emotions (negative,
positive). I attempted to explain these changes through unbiased,
recent news, though it's worth exploring other variables since no
topic can be fully explained by one event. I would further research
cyclical changes in sentiment and how adjacent topics such as the
publicity of a war and the number of countries impacted affect the
strength of sentiment values on a daily basis.
See the final application here!