1. Problem Statement, Objectives, and Motivation
How music influences society, and how it reflects societal values, has been studied in a variety of ways throughout modern history, mostly with qualitative methods. Although qualitative approaches can yield a substantial amount of information, data analytics could considerably strengthen the findings made in sociological contexts. Using the Billboard Top 100 as a proxy for popular music, our group sought to develop a better understanding of how popular music has changed over time. We focused on three questions:
– How has general sentiment in popular music changed over time?
– Who are the top artists/influencers of each decade, based on the charts?
– Has our culture changed the way we convey messages over time? If so, how can popular music show this?
The data we collected came from Billboard’s “Top 100”. Billboard uses sales, streaming, and radio plays to compile a list of the top 100 songs from each year, which is considered to be an accurate reflection of that year’s most popular music. Our data covers 1946 through 2016. Since 1990, Billboard has drawn its data from Nielsen SoundScan, an information and sales tracking system. Prior to 1990, Billboard called stores to gather sales information for its calculations.
2.2 Analysis of Word Usage Using R
We began our analysis with a simple visualization of the most used words across the entire data set. Figure 1 below is a bar chart of all words that appeared in the data set more than 2,000 times. “Love” was clearly the most frequently used word. Figure 2 presents the same word frequencies as a word cloud, in which the size and boldness of each word reflect how often it appears.
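The counting behind a chart like Figure 1 can be sketched as follows. This is a minimal Python illustration, not the R code used for the figures; the function names, the tokenizing rule, and the toy lyrics are all assumptions made for the example.

```python
import re
from collections import Counter

def word_counts(lyrics_list):
    """Count how often each lowercase word appears across all songs."""
    counts = Counter()
    for lyrics in lyrics_list:
        # Assumed tokenizer: runs of letters and apostrophes.
        counts.update(re.findall(r"[a-z']+", lyrics.lower()))
    return counts

def frequent_words(counts, threshold):
    """Return (word, count) pairs with count above threshold, most frequent first."""
    return [(w, c) for w, c in counts.most_common() if c > threshold]

# Toy lyrics standing in for the corpus (hypothetical, not from the data set).
songs = ["Love me tender, love me true", "All you need is love", "Baby love"]
counts = word_counts(songs)
print(frequent_words(counts, 1))
```

On the full data set, the threshold would be 2,000 rather than 1.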
Figure 3 below dives a bit deeper into the data. This graph was generated in R and seems to indicate an increasing trend in the number of unique words per song over the years in the data set. In other words, songs seem to be getting “wordier”. Our hypothesis was that new technological advances contributed to songs with more unique words, which may indicate that we, as a society, are placing more value on lyrics with more complicated messages. Several lines on the graph mark specific technological advances of the time, and the number of unique words per song appears to rise more steeply after the invention of the Walkman.
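One way to compute the quantity plotted in a graph like Figure 3 is sketched here. It is a Python illustration under stated assumptions, not the R code behind the figure: the input is assumed to be (year, lyrics) pairs, and the yearly value is assumed to be the mean number of distinct words per song.

```python
import re
from collections import defaultdict

def unique_word_count(lyrics):
    """Number of distinct lowercase words in one song."""
    return len(set(re.findall(r"[a-z']+", lyrics.lower())))

def mean_unique_words_by_year(songs):
    """songs: iterable of (year, lyrics). Returns {year: mean unique words per song}."""
    per_year = defaultdict(list)
    for year, lyrics in songs:
        per_year[year].append(unique_word_count(lyrics))
    return {year: sum(v) / len(v) for year, v in sorted(per_year.items())}

# Toy rows (hypothetical lyrics, not from the data set).
toy = [
    (1950, "la la la love"),
    (1950, "oh my love"),
    (2010, "we run this town tonight all night"),
]
print(mean_unique_words_by_year(toy))
```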
Figure 4 explores the relationship between a song’s chart position and its number of unique words using a simple linear regression. Our initial regression yielded a significant p-value of 0.003. It is important to note, however, that the fit is not ideal and the corresponding R-squared value is very low. The data are very noisy, so a straight line cannot fit the individual points well.
Because of the noise illustrated in Figure 4, we averaged the number of unique words used at each chart position and replotted the data in Figure 5. This resulted in a higher R-squared value and an even more significant p-value. Although the R-squared value is still very low, there appears to be a relationship between chart position and the number of unique words used in a song.
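The two regressions can be illustrated with a small least-squares sketch. This is Python written for illustration, not the R models behind Figures 4 and 5; the helper names and the toy numbers are assumptions, and the sketch reports only slope, intercept, and R-squared (no p-values).

```python
from collections import defaultdict

def ols(xs, ys):
    """Fit y = intercept + slope * x by least squares.
    Returns (slope, intercept, r_squared)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    syy = sum((y - my) ** 2 for y in ys)
    slope = sxy / sxx
    return slope, my - slope * mx, (sxy ** 2) / (sxx * syy)

def average_by_position(rows):
    """rows: (chart_position, unique_words) pairs.
    Returns sorted positions and the mean unique-word count at each one."""
    groups = defaultdict(list)
    for position, n_unique in rows:
        groups[position].append(n_unique)
    positions = sorted(groups)
    return positions, [sum(groups[p]) / len(groups[p]) for p in positions]

# Toy rows (hypothetical): two songs at each of three chart positions.
rows = [(1, 40), (1, 60), (2, 50), (2, 70), (3, 65), (3, 75)]
raw_fit = ols([p for p, _ in rows], [u for _, u in rows])
avg_fit = ols(*average_by_position(rows))
print("raw R^2:", raw_fit[2], "averaged R^2:", avg_fit[2])
```

In this toy example the slope is the same in both fits, but R-squared rises after averaging because the within-position scatter is removed before fitting. An increase in R-squared from averaging should therefore be read with some caution.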
2.3 Sentiment Analysis
The next portion of our analysis examined lyrical sentiment to determine whether the sentiment of popular music shows any trend over time. To accomplish this, we used the Bing and Afinn sentiment lexicons in R. All stop words (common words such as “the” and “and” that carry little meaning or sentiment) were removed, and the remaining words were assigned sentiment values based on these two lexicons. After normalizing by the total number of words found in each year, the data suggest a potential negative trend in overall sentiment over time. This finding is illustrated in Figures 6 and 7 below.
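The scoring procedure can be sketched as follows. This is a Python illustration rather than the R workflow used in the analysis. The stop-word list and both lexicons are tiny stand-ins with made-up entries and scores, not the real Bing or Afinn contents, and normalizing by the number of words left after stop-word removal is an assumption.

```python
import re
from collections import defaultdict

# Illustrative stand-ins only; the real lexicons are much larger.
STOP_WORDS = {"the", "a", "and", "i", "you", "my", "is", "in"}
BING_LIKE = {"love": "positive", "happy": "positive",
             "cry": "negative", "lonely": "negative"}
AFINN_LIKE = {"love": 3, "happy": 3, "cry": -1, "lonely": -2}

def tokens(lyrics):
    """Lowercase words with stop words removed."""
    words = re.findall(r"[a-z']+", lyrics.lower())
    return [w for w in words if w not in STOP_WORDS]

def yearly_sentiment(songs):
    """songs: (year, lyrics) pairs. Returns {year: (net_binary, mean_score)},
    both divided by the number of remaining words in that year."""
    words_by_year = defaultdict(list)
    for year, lyrics in songs:
        words_by_year[year].extend(tokens(lyrics))
    result = {}
    for year, words in sorted(words_by_year.items()):
        if not words:
            continue  # nothing left to score after stop-word removal
        pos = sum(BING_LIKE.get(w) == "positive" for w in words)
        neg = sum(BING_LIKE.get(w) == "negative" for w in words)
        score = sum(AFINN_LIKE.get(w, 0) for w in words)
        result[year] = ((pos - neg) / len(words), score / len(words))
    return result

# Toy lyrics (hypothetical, not from the data set).
toy = [(1960, "I love you and my happy heart"),
       (2010, "lonely in the dark I cry")]
print(yearly_sentiment(toy))
```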
To test whether a trend actually exists, we conducted another set of regression analyses, one for the Bing Lexicon sentiment results and another for the Afinn Lexicon sentiment results. Because the Bing Lexicon assigns each word a strictly binary value, “positive” or “negative”, a logistic regression was chosen. This regression produced a significant p-value, suggesting that there could be a significant downward trend in overall sentiment in this data set. A simple linear regression on the Afinn Lexicon sentiment results also produced a significant p-value. The fitted line for this regression can be seen in Figure 8 below.
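To make the logistic regression step concrete, here is a minimal Python sketch that fits a one-predictor logistic model by Newton's method. It is not the model code used in the analysis, which was run in R. The data are invented, the predictor is assumed to be time measured in decades since the start of the series, and the sketch estimates only the coefficients, not a p-value.

```python
import math

def fit_logistic(xs, ys, iterations=25):
    """Fit P(y = 1) = 1 / (1 + exp(-(b0 + b1 * x))) by Newton's method.
    xs: predictor values; ys: 0/1 outcomes. Returns (b0, b1)."""
    b0 = b1 = 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            w = p * (1.0 - p)
            g0 += y - p            # gradient of the log-likelihood
            g1 += (y - p) * x
            h00 += w               # entries of X^T W X
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system H * delta = g.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Invented data: 1 = word tagged positive, 0 = negative; x = decades elapsed.
# The share of positive words falls from 4/5 to 1/5 across four decades.
xs, ys = [], []
for decade, n_positive in enumerate([4, 3, 2, 1]):
    xs += [decade] * 5
    ys += [1] * n_positive + [0] * (5 - n_positive)

b0, b1 = fit_logistic(xs, ys)
print("intercept:", b0, "slope per decade:", b1)
```

A negative slope means the estimated probability that a sentiment-bearing word is positive declines over time. The sketch does not handle perfectly separable data, where the Newton iterations would not converge.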
2.4 Artist and Word Trend Analysis using Tableau
We began our Tableau analysis by plotting the number of charted songs per artist, partitioned by decade (see Figure 9 below). To focus on the most influential artists of each decade, we used Tableau’s filter feature to require that an artist charted at least 10 songs in that decade. Note that a few artists, including Elvis Presley and Madonna, met this threshold in multiple decades, which shows the longevity and overall success of their musical careers.
Next, we investigated how average chart position varies among artists. To include additional artists in this analysis, we lowered the minimum requirement to 5 charted songs in that decade. We created a plot for each decade and implemented a drop-down menu that allows the user to select and view a specific decade.
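The filtering and averaging done in Tableau can also be expressed in code. The sketch below is a Python illustration of the same logic under stated assumptions, not a description of the Tableau workbook: rows are assumed to be (year, artist, chart position) tuples, the artist names are placeholders, and the threshold is scaled down to 3 so the toy data stay small.

```python
from collections import defaultdict

def artist_stats_by_decade(chart_rows, min_songs):
    """chart_rows: (year, artist, position) tuples.
    Returns {decade: {artist: (n_songs, mean_position)}}, keeping only
    artists with at least min_songs charted songs in that decade."""
    groups = defaultdict(list)
    for year, artist, position in chart_rows:
        groups[(year // 10 * 10, artist)].append(position)
    result = defaultdict(dict)
    for (decade, artist), positions in groups.items():
        if len(positions) >= min_songs:
            mean_position = sum(positions) / len(positions)
            result[decade][artist] = (len(positions), mean_position)
    return dict(result)

# Toy rows with placeholder artists (hypothetical).
rows = [
    (1984, "Artist A", 1), (1985, "Artist A", 5), (1987, "Artist A", 12),
    (1986, "Artist B", 40),
    (1991, "Artist A", 30),
]
print(artist_stats_by_decade(rows, min_songs=3))
```

Calling the function with `min_songs=10` would correspond to the first filter described above and `min_songs=5` to the second.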
The plot for the current decade (the 2010s) is shown below in Figure 10. Although our dataset did not include genre data, we were able to observe general trends in how certain genres chart on average. Many of the country artists chart lower on average (usually between spots 50 and 75), while artists in the pop or hip-hop/rap genres tend to appear near the top of the charts. This can likely be attributed to pop and hip-hop/rap appealing to a large subset of the listening audience.
The next set of graphs focuses on word trends and usage over time. In the first graph, the top 20 words are plotted over time (see Figure 11 below). Note that in this graph, word usage has been normalized by decade to account for the varying total number of words in each decade. Even with normalized data, many of the words that appeared in the frequency plot in R (Figure 1), such as love, baby, and heart, also appear. One interesting trend is that the word love, which peaked in usage in the 1970s and 1980s, seems to have been used noticeably less in the 2000s. One possible explanation is that artists found other ways to convey the idea of love in their music during this decade.
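The normalization by decade can be sketched for a single word as follows. This is a Python illustration, not the Tableau calculation itself; it assumes that normalizing means dividing a word's count in a decade by the total number of words in that decade, and the lyrics are invented.

```python
import re
from collections import Counter

def normalized_usage_by_decade(songs, word):
    """songs: (year, lyrics) pairs.
    Returns {decade: occurrences of word / total words in that decade}."""
    word_hits = Counter()
    total_words = Counter()
    for year, lyrics in songs:
        decade = year // 10 * 10
        tokens = re.findall(r"[a-z']+", lyrics.lower())
        word_hits[decade] += tokens.count(word)
        total_words[decade] += len(tokens)
    return {d: word_hits[d] / total_words[d]
            for d in sorted(total_words) if total_words[d]}

# Toy lyrics (hypothetical, not from the data set).
toy = [(1975, "love love me do"), (1978, "my love"), (2005, "hey ya love hey")]
print(normalized_usage_by_decade(toy, "love"))
```

Repeating the calculation for each of the top 20 words would give the series plotted over time.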
3. Discussion and Results
In summary, our team found that general sentiment in popular music does appear to be becoming more negative by a statistically significant but small margin, while overall average sentiment still appears to be close to neutral. Additionally, we found an apparent upward trend in the number of unique words used in songs in recent years compared with songs from previous eras. This could suggest that people value lyrical messages more now than in the past; however, several other factors, such as technological advances, could be strongly influencing this trend.
There also appears to be a relationship between a song’s ranking and the number of unique words it uses. The analysis described above suggests that a song with a higher number of unique words may tend to rank lower on the chart than a song with fewer unique words. This could suggest that even if a song is very popular, it may need to be less lyrically dense in order to chart extremely high. This finding could also be affected by the distribution of genres on the charts; for instance, hip-hop/rap tends to be more lyrically heavy than other genres. Further analysis of song genres would be needed to get a better idea of how a song’s genre affects the number of unique words used and chart position.