Billboard Top 100 Lyric Analysis

October 1, 2018

Authors: Rachel Nachman, Zach DeGroote, Benjamin Morris

1. Problem Statement, Objectives, and Motivation

How music influences society, and how it reflects societal values, has been studied in a variety of mostly qualitative ways throughout modern history. Although qualitative approaches can yield a substantial amount of information, data analytics could substantially improve the quality of findings in sociological contexts. Using the Billboard Top 100 as a proxy for popular music, our group sought to better understand how popular music has changed over time.

General Questions:
– How has general sentiment in popular music changed over time?
– Who are the top artists/influencers for each decade, based on the charts?
– Has our culture changed the way we convey messages over time? If so, how can popular music show this?

2. Analysis

2.1 Data

The data we collected came from Billboard’s “Top 100”. Billboard uses sales, streaming, and radio plays to compile a list of the top 100 songs from each year, which is considered an accurate reflection of the most popular music of that year. Our data covers 1946 through 2016. Since 1990, Billboard has drawn its data from Nielsen Soundscan, an information and sales tracking system. Prior to 1990, Billboard called stores to gather sales information for its calculations.

2.2 Analysis of Word Usage Using R

We began our analysis with a simple visualization of the most used words across the entire data set. Figure 1 below is a bar chart of all words that appeared in the data set more than 2,000 times. The chart shows that “love” was by far the most frequently used word. Figure 2 is a word cloud in which the size and boldness of each word reflect its frequency. It presents the same results in a different format.

Figure 1: Frequency bar chart, words with frequency over 2000.

Figure 2: Word cloud of top words used.
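The word counts behind Figures 1 and 2 can be illustrated with a minimal sketch. The original analysis was done in R; the Python below is only an illustration, and the sample lyrics, the tokenization rule, and the function names are all assumptions rather than the authors' code.

```python
import re
from collections import Counter

# Hypothetical toy lyrics; the real data set is not reproduced here.
SAMPLE_LYRICS = [
    "Love me, love me, say that you love me",
    "Baby, I need your love",
    "Ooh la la, my heart",
]

def tokenize(text):
    """Lowercase the text and split it into word tokens (letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())

def word_frequencies(lyrics):
    """Count how often each word appears across all songs."""
    counts = Counter()
    for song in lyrics:
        counts.update(tokenize(song))
    return counts

def words_above(counts, threshold):
    """Return (word, count) pairs with count strictly above threshold, most frequent first."""
    return [(word, n) for word, n in counts.most_common() if n > threshold]

counts = word_frequencies(SAMPLE_LYRICS)
# The post used a threshold of 2,000 on the full data set; 1 suits the toy sample.
print(words_above(counts, 1))
```

On the full data set, the same filter with a threshold of 2,000 would give the words plotted in Figure 1.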

Figure 3 below dives a bit deeper into the data. This graph was generated in R and seems to indicate an increasing trend in the number of unique words per song over the years in the data set. In other words, we seem to be getting “wordier”. Our hypothesis was that new technological advances contributed to songs with more unique words, which may indicate that we, as a society, are valuing songs with more complicated messages. Vertical lines on the graph mark specific technological advances of the time, and it appears that the number of unique words per song began to increase more sharply after the invention of the Walkman.

Figure 3: Average unique words used per song over time.
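As a rough sketch of the quantity plotted in Figure 3, the snippet below computes the average number of unique words per song for each year. The records and function names are hypothetical, and the tokenization rule is an assumption; the authors' R code may have cleaned the lyrics differently.

```python
import re
from collections import defaultdict
from statistics import mean

# Hypothetical (year, lyrics) records standing in for the real chart data.
SONGS = [
    (1950, "la la la la"),
    (1950, "my heart my love"),
    (2010, "one two three four five"),
    (2010, "we run we hide we fall"),
]

def unique_word_count(lyrics):
    """Number of distinct words in one song, ignoring case and punctuation."""
    return len(set(re.findall(r"[a-z']+", lyrics.lower())))

def average_unique_words_by_year(songs):
    """Map each year to the mean unique-word count of its songs."""
    per_year = defaultdict(list)
    for year, lyrics in songs:
        per_year[year].append(unique_word_count(lyrics))
    return {year: mean(counts) for year, counts in sorted(per_year.items())}

averages = average_unique_words_by_year(SONGS)
print(averages)
```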

Figure 4 explores the relationship between a song’s chart position and the number of unique words using a simple linear regression. Our initial regression produced a significant p-value of 0.003. It’s important to note that the fit is not ideal and the corresponding R-squared value is very low. This is because of the large amount of noise in the data, which a straight line cannot capture.

Figure 4: Simple regression fit line, predicting chart position as a function of unique words used in song.
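To make the regression behind Figure 4 concrete, here is a minimal ordinary least squares sketch in plain Python. The data points and names are hypothetical, and the sketch reports only the slope, intercept, and R-squared; it does not compute the p-value quoted above. A low R-squared can coexist with a significant p-value when there are many observations, because the p-value tests whether the slope differs from zero while R-squared measures how much of the scatter the line explains.

```python
def simple_linear_regression(x, y):
    """Fit y = intercept + slope * x by least squares; return (slope, intercept, r_squared)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R-squared: share of the variance in y explained by the fitted line.
    ss_total = sum((yi - mean_y) ** 2 for yi in y)
    ss_residual = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    return slope, intercept, 1 - ss_residual / ss_total

# Hypothetical songs: unique words per song and the chart position each reached.
unique_words = [40, 60, 80, 100, 120]
chart_position = [20, 35, 30, 60, 55]

slope, intercept, r_squared = simple_linear_regression(unique_words, chart_position)
print(f"slope={slope:.3f} intercept={intercept:.3f} R^2={r_squared:.3f}")
```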

Because of the noise illustrated in Figure 4, we averaged the number of unique words used at each chart position and replotted the data in Figure 5. This resulted in a higher R-squared value and an even more significant p-value. Although the R-squared value is still very low, it appears that there could be a relationship between chart position and the number of unique words used in a song.

Figure 5: Simple regression fit line, predicting chart position as a function of unique words used in song (collapsed on each chart position).
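The collapsing step can be sketched as follows, using hypothetical data and names. Averaging within each chart position removes the song-to-song scatter, so R-squared generally rises even though the underlying relationship is unchanged; the collapsed fit describes the average song at each position, not individual songs.

```python
from collections import defaultdict
from statistics import mean

def r_squared(x, y):
    """Squared Pearson correlation, which equals R-squared for a one-predictor linear fit."""
    mean_x, mean_y = mean(x), mean(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)

def collapse_by_position(songs):
    """Average the unique-word counts of all songs that share a chart position."""
    groups = defaultdict(list)
    for position, unique_words in songs:
        groups[position].append(unique_words)
    return {position: mean(values) for position, values in sorted(groups.items())}

# Hypothetical (chart_position, unique_words) pairs with heavy within-position scatter.
SONGS = [(1, 50), (1, 90), (2, 60), (2, 100), (3, 70), (3, 110)]

raw_r2 = r_squared([p for p, _ in SONGS], [w for _, w in SONGS])
collapsed = collapse_by_position(SONGS)
collapsed_r2 = r_squared(list(collapsed.keys()), list(collapsed.values()))
print(raw_r2, collapsed_r2)
```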

2.3 Sentiment Analysis

The next portion of our analysis focused on lyrical sentiment, to determine whether the sentiment of popular music shows any specific trends over time. To accomplish this, we utilized R’s Bing and Afinn sentiment lexicons. All stop words (common words such as “the” and “and” that carry little sentiment or meaning) were removed, and the remaining words were assigned sentiment values based on these two lexicons. After normalizing for the total number of words found in each year, the data suggests a potential negative trend in overall sentiment over time. This finding is illustrated in Figures 6 and 7 below.

Figure 6: Bing Lexicon sentiment plot over time.

Figure 7: Afinn Lexicon sentiment plot over time.
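The scoring and normalization described above can be sketched in a few lines. The lexicon, stop-word list, and songs below are toy stand-ins, not the real Afinn lexicon or the authors' data, and the function name is hypothetical. The sketch also assumes that the per-year word total is counted after stop words are removed, which the post does not specify.

```python
import re
from collections import defaultdict

# Toy stand-ins: an Afinn-style lexicon with integer scores and a tiny stop-word list.
TOY_LEXICON = {"love": 3, "happy": 3, "sad": -2, "hate": -3}
TOY_STOP_WORDS = {"i", "you", "the", "a", "so", "am"}

# Hypothetical (year, lyrics) records.
SONGS = [
    (1960, "I love you so"),
    (1960, "Happy happy love"),
    (2010, "I hate the sad sad song"),
    (2010, "Love"),
]

def normalized_sentiment_by_year(songs, lexicon, stop_words):
    """Sum lexicon scores per year and divide by that year's non-stop-word count."""
    totals = defaultdict(int)
    word_counts = defaultdict(int)
    for year, lyrics in songs:
        words = [w for w in re.findall(r"[a-z']+", lyrics.lower()) if w not in stop_words]
        word_counts[year] += len(words)
        # Words missing from the lexicon contribute a score of 0.
        totals[year] += sum(lexicon.get(w, 0) for w in words)
    return {
        year: totals[year] / word_counts[year]
        for year in sorted(totals)
        if word_counts[year] > 0
    }

sentiment = normalized_sentiment_by_year(SONGS, TOY_LEXICON, TOY_STOP_WORDS)
print(sentiment)
```

A Bing-style variant would use +1 and -1 in place of the integer scores.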

To test whether a trend actually exists, we conducted another set of regression analyses, one for the Bing Lexicon sentiment results and another for the Afinn Lexicon sentiment results. Because the Bing Lexicon assigns each word a strictly binary “positive” or “negative” value, a logistic regression was chosen. This regression produced a significant p-value, suggesting that there could be a significant downward trend in overall sentiment in this data set. A simple linear regression was then conducted using the Afinn Lexicon sentiment results, and this model also produced a significant p-value. The fitted line for this regression can be seen in Figure 8 below.

Figure 8: Simple linear regression fit for Afinn Lexicon, predicting normalized sentiment as a function of time.
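For readers unfamiliar with logistic regression, the sketch below fits a one-predictor model of the form P(positive) = sigmoid(b0 + b1 * time) using Newton-Raphson in plain Python. It is not the authors' R model: the data are hypothetical, the names are made up for illustration, time is expressed as a centered decade offset, and no p-value is computed. A negative b1 means the odds that a sentiment word is positive fall over time.

```python
import math

# Hypothetical word-level data: X is the decade offset from the middle of the
# sample (centered to keep the fit numerically stable); Y is 1 for a word the
# lexicon labels "positive" and 0 for one it labels "negative".
X = [-2] * 4 + [-1] * 4 + [0] * 4 + [1] * 4 + [2] * 4
Y = [1, 1, 1, 0,  1, 1, 1, 0,  1, 1, 0, 0,  1, 0, 0, 0,  1, 0, 0, 0]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(x, y, iterations=25):
    """Fit P(y=1) = sigmoid(b0 + b1 * x) by Newton-Raphson; return (b0, b1)."""
    b0 = b1 = 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = sigmoid(b0 + b1 * xi)
            w = p * (1.0 - p)
            g0 += yi - p            # gradient of the log-likelihood w.r.t. b0
            g1 += (yi - p) * xi     # gradient w.r.t. b1
            h00 += w                # entries of the 2x2 information matrix
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system (information matrix) * delta = gradient.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

b0, b1 = fit_logistic(X, Y)
print(f"intercept={b0:.3f} slope per decade={b1:.3f}")
```

This toy data is not perfectly separable, so the iteration converges; real analyses would normally rely on a statistics package, which also reports standard errors and p-values.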

2.4 Artist and Word Trend Analysis using Tableau

We began our Tableau analysis by plotting the number of charted songs by artist, partitioned by decade (see Figure 9 below). To focus on the most influential artists of each decade, we used the filter feature in Tableau with the constraint that the artist must have charted a minimum of 10 songs in that decade. Note that a few artists, including Elvis Presley and Madonna, charted at least 10 songs in multiple decades, which shows the longevity and overall success of their musical careers.

Figure 9: Tableau graph of the artists that appeared on the Billboard Top 100 Chart most frequently in each decade (minimum of 10 songs charted).
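The decade filter applied in Tableau amounts to a grouped count with a minimum threshold. The sketch below shows the same logic in Python with hypothetical chart entries and placeholder artist names; the toy threshold is 2 rather than the 10 used in Figure 9.

```python
from collections import Counter

# Hypothetical (year, artist) chart entries; artist names are placeholders.
CHART_ENTRIES = [
    (1961, "Artist A"), (1963, "Artist A"), (1969, "Artist A"),
    (1965, "Artist B"),
    (1972, "Artist A"), (1975, "Artist A"),
    (1978, "Artist B"), (1979, "Artist B"),
]

def decade_of(year):
    """Map a year to the first year of its decade, e.g. 1963 -> 1960."""
    return year // 10 * 10

def top_artists_by_decade(chart_entries, min_songs):
    """Return {decade: [(artist, songs_charted), ...]} for artists meeting the minimum."""
    counts = Counter((decade_of(year), artist) for year, artist in chart_entries)
    result = {}
    for (decade, artist), n in counts.items():
        if n >= min_songs:
            result.setdefault(decade, []).append((artist, n))
    # Within each decade, list the most-charted artists first (ties broken by name).
    for artists in result.values():
        artists.sort(key=lambda pair: (-pair[1], pair[0]))
    return result

top_artists = top_artists_by_decade(CHART_ENTRIES, min_songs=2)
print(top_artists)
```

An artist who appears under more than one decade key met the threshold in multiple decades.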

Next, we wanted to investigate how average chart position varies among different artists. For this analysis, we wanted to include additional artists, so we changed the minimum song requirement to 5 charted songs in that decade. We created a plot for each decade and implemented a drop-down menu that allows the user to select and view a specific decade.

The plot for the current decade (the 2010s) is shown below in Figure 10. While our dataset did not include genre data, we were able to observe general trends in how certain genres chart on average. We noted that many of the country artists chart lower on average (usually between spots 50 and 75), while artists in the pop or hip-hop/rap genres tend to appear near the top of the charts. This can likely be attributed to these two genres appealing to a large subset of the listening audience.

Figure 10: Tableau bar graph of average chart position by artist for the current decade (minimum of 5 charted songs).

The next set of graphs focuses on word trends and usage over time. In the first graph, the top 20 words are plotted over time (see Figure 11 below). Note that in this graph, word usage has been normalized by decade to account for the varying total number of words in each decade. Even with normalized data, many of the words that appeared in the frequency plot in R (Figure 1), such as love, baby, and heart, also appear. One interesting trend is that the word love, which peaked in usage in the 1970s and 1980s, seems to have been used significantly less in the 2000s. One possible explanation is that artists found other ways to convey love in their music during this decade.

Figure 11: Tableau graph of the top 20 words by percent usage (normalized for the total number of words used in that decade).
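The normalization used for Figure 11 divides each word's count by the total number of words in its decade. A minimal sketch, with hypothetical records and function names and an assumed tokenization rule:

```python
import re
from collections import Counter, defaultdict

# Hypothetical (year, lyrics) records.
SONGS = [
    (1975, "love love baby heart"),
    (1978, "love me"),
    (2005, "baby baby love you now"),
]

def word_share_by_decade(songs, words):
    """Return {decade: {word: percent of all words in that decade}} for the given words."""
    counts = defaultdict(Counter)
    for year, lyrics in songs:
        counts[year // 10 * 10].update(re.findall(r"[a-z']+", lyrics.lower()))
    shares = {}
    for decade, counter in sorted(counts.items()):
        total = sum(counter.values())
        if total == 0:
            continue  # skip decades whose songs had no recognizable words
        shares[decade] = {w: 100.0 * counter[w] / total for w in words}
    return shares

shares = word_share_by_decade(SONGS, ["love", "baby"])
print(shares)
```

Expressing usage as a share makes decades comparable even when later decades contain many more words overall.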

3. Discussion and Results

In summary, our team found that general sentiment in popular music does appear to be becoming significantly more negative (on the margin), while overall average sentiment still appears to be very neutral. Additionally, we found an apparent upward trend in the number of unique words used in songs in recent years compared with songs from previous eras. This could suggest that people value lyrical messages more now than in the past; however, several other factors, such as technological advances, could be strongly affecting this trend.

There also appears to be some kind of relationship between a song’s ranking and the number of unique words used in the song. The analysis described above suggests that a song with a higher number of unique words may rank lower on the chart than a song with fewer unique words. This could suggest that even if a song is very popular, it may need to be less lyrically dense in order to chart extremely high. This finding could also be affected by the distribution of different genres on the charts – for instance, hip-hop/rap tends to be more lyrically heavy than other genres. Further analysis of song genres would be needed to get a better idea of how a song’s genre impacts the number of unique words used and chart position.


7 thoughts on “Billboard Top 100 Lyric Analysis”

  1. Tom Dreher

    This is a very interesting report focusing on a topic that permeates nearly everyone’s daily life: music. It is interesting to see that unique word count is on the rise even today, because a general complaint with current pop music is that it is very repetitive compared to the singer/songwriter era of the 1970’s (e.g. Drake vs. Fleetwood Mac). It would be interesting to see how artists could apply these results to their music careers to produce work that charts higher or is more lucrative. One aspect of the study that could be elaborated on more is the sentiment analysis; was not really sure what conclusions to draw from this data analysis.

  2. Dylan Weber

    Very intriguing findings! I think it’s awesome that R has so many packages that are available for use. Who knew they had ways of quantifying sentiment?! I also like the use of ggplot to supplement your visualizations in Tableau. Nice work!

  3. David Wilkins

    This might be one of the funnier analyses I’ve ever seen, although I’m quite sure you didn’t intend it to be. I was a little surprised that the number of unique words in a song has increased over time, although I could see how the rise in the popularity of rap could result in such a change. What I found funny was the linear regression model that shows as the number of unique words in a song increases, the average chart position decreases. I was also a little surprised at how high Pitbull’s average chart position is, although of course, the analysis only captures songs that enter the top 100, which could be described as a limitation or shortcoming of the analysis.

  4. Lauren Chiang

    Thank you for sharing your data analysis. I found it interesting that some of the most common words are not even words that we would consider in our everyday discussion, such as “ooh” and “la.” The word cloud was a great visualization to demonstrate how each of the words compared to one another. Another point that I found interesting was that there were more unique words over time but fewer sentiment words. I would have thought that these would correlate positively, since more unique words would suggest more sentiment. I was surprised to find that use of the word “love” has decreased since the 1970s and 1980s. The conclusion that fewer unique words seems to correlate with a higher chart ranking makes sense to me, though, because the more unique words a song has, the more difficult it is for people to learn the lyrics. Overall, very interesting analysis.

  5. Tyler Behle

    Thank you for sharing your analysis. I thought it very interesting that the advancement of listening technology, specifically the Walkman, resulted in more unique words per song. The figure showing that a lower number of unique words leads to a slightly better average chart position makes sense. I would expect a song with fewer words to have a catchier, more repeated hook. A place for future work might be to explore what topics the songs with more unique words are about. Are there more words because the artist is trying to convey a deeper meaning?

  6. Pooja Shivale Patil

    I really enjoyed this report. This post showcases the evolution of song recording and statistics related to it brilliantly. I found it interesting that the number of unique words in a song has increased over time. This is great work because many people can relate to the content in this post!

  7. Blake

    I find Figure 3 to be very interesting. Though many people think that up-and-coming music genres like hip-hop and rap tend to be very repetitive in the topics they address, there is greater variability in their music than in years past (before the 1980s/1990s). In addition to Figure 1, I’d like to see a frequency bar chart of the words used in each decade and see how those compare.
