Authors: Rachel Nachman, Zach DeGroote, Benjamin Morris
1. Problem Statement, Objectives, and Motivation
How music influences society, and how it simultaneously reflects societal values, has been studied in a variety of ways (mostly qualitative) throughout modern history. Although qualitative approaches can yield a substantial amount of information, leveraging the power of data analytics could drastically enhance the quality of findings made in sociological contexts. Using the Billboard Top 100 as a proxy for popular music, our group sought to develop a better understanding of how popular music has changed over time.
General Questions:
– How has general sentiment in popular music changed over time?
– Who are the top artists/influencers of each decade, based on the charts?
– Has our culture changed the way we convey messages over time? If so, how can popular music show this?
2. Analysis
2.1 Data
The data we collected came from Billboard’s “Top 100”. Billboard uses sales, streaming, and radio plays to compile a list of the top 100 songs from each year, which is considered to be an accurate reflection of the most popular music of that year. Our data covers 1946 through 2016. Since 1990, Billboard has gotten its data from Nielsen SoundScan, an information and sales tracking system. Prior to 1990, Billboard called stores to gather sales information for its calculations.
2.2 Analysis of Word Usage Using R
We began our analysis with a simple visualization of the most used words across the entire data set. Figure 1 below is a bar chart of all words that appeared in the data set more than 2,000 times. The graph shows that “love” was clearly the most frequently used word. Figure 2 is a word cloud in which the size and boldness of each word reflect its frequency. It presents the same results in a different format.
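The counting behind a frequency chart like Figure 1 can be sketched in a few lines. The example below uses Python rather than the R code used for this report (which is not reproduced here), and the toy lyrics, function names, and threshold are assumptions made purely for illustration.

```python
from collections import Counter
import re


def word_counts(lyrics_by_song):
    """Count how often each lowercase word appears across all songs."""
    counts = Counter()
    for lyrics in lyrics_by_song:
        # Keep runs of letters and apostrophes; everything else separates words.
        counts.update(re.findall(r"[a-z']+", lyrics.lower()))
    return counts


def frequent_words(counts, threshold):
    """Return (word, count) pairs whose count exceeds the threshold."""
    return [(w, c) for w, c in counts.most_common() if c > threshold]


# Hypothetical toy lyrics, not taken from the Billboard data set.
songs = [
    "love me love me do",
    "baby I love you",
    "ooh baby baby",
]
counts = word_counts(songs)
top = frequent_words(counts, threshold=2)
print(top)
```

With the full data set, the same idea would apply with a threshold of 2,000 instead of 2.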
Figure 3 below dives a bit deeper into the data. This graph was generated in R and seems to indicate an increasing trend in the number of unique words per song over the years in the data set. In other words, we seem to be getting “wordier”. Our hypothesis was that new technological advances contributed to songs with more unique words, which may indicate that we, as a society, are valuing words with more complicated messages. Several lines on the graph indicate specific technological advances of the time, and it appears that after the invention of the Walkman, the number of unique words per song began to really increase.
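A minimal sketch of the "unique words per song" measure plotted in Figure 3 follows, written in Python as a stand-in for the R code behind the figure. The tokenization (splitting on whitespace), the toy songs, and the function names are assumptions for illustration only.

```python
from collections import defaultdict
from statistics import mean


def unique_word_count(lyrics):
    """Number of distinct lowercase words in one song."""
    return len(set(lyrics.lower().split()))


def mean_unique_words_by_year(songs):
    """Average the per-song unique-word counts within each year."""
    by_year = defaultdict(list)
    for year, lyrics in songs:
        by_year[year].append(unique_word_count(lyrics))
    return {year: mean(vals) for year, vals in sorted(by_year.items())}


# Hypothetical (year, lyrics) pairs, not taken from the Billboard data set.
songs = [
    (1960, "la la la la"),
    (1960, "oh my love my love"),
    (2010, "we run the night and never stop"),
]
result = mean_unique_words_by_year(songs)
print(result)
```

Plotting these yearly averages against the year would give a curve comparable in spirit to Figure 3.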
Figure 4 explores the relationship between a song’s chart position and the number of unique words using a simple linear regression. Our initial regression yielded a significant p-value of 0.003. It’s important to note, however, that the fit is not ideal and the corresponding R-squared value is very low. This is because of the large amount of noise in the data, which a straight line cannot fit well.

Figure 4: Simple regression fit line, predicting chart position as a function of unique words used in song.
Due to the noise in the data illustrated in Figure 4, we averaged the number of unique words used for each chart position and replotted the data in Figure 5. This resulted in a higher R-squared value and an even more significant p-value. Although the R-squared value is still very low, it appears that there could be a relationship between chart position and the number of unique words used in a song.

Figure 5: Simple regression fit line, predicting chart position as a function of unique words used in song (collapsed on each chart position).
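The two steps described above, averaging unique-word counts per chart position and then fitting a straight line, can be sketched as follows. This is a Python illustration under stated assumptions rather than the R analysis itself. The toy rows and function names are hypothetical, and the sketch computes only the slope, intercept, and R-squared, not a p-value.

```python
from collections import defaultdict


def collapse_by_position(rows):
    """Average the unique-word counts of all songs sharing a chart position."""
    grouped = defaultdict(list)
    for position, unique_words in rows:
        grouped[position].append(unique_words)
    return sorted((pos, sum(v) / len(v)) for pos, v in grouped.items())


def simple_linear_regression(xs, ys):
    """Ordinary least squares for y = intercept + slope * x, plus R-squared."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return slope, intercept, 1 - ss_res / ss_tot


# Hypothetical (chart position, unique words) rows, not real chart data.
rows = [(1, 60), (1, 80), (2, 70), (2, 90), (3, 80), (3, 90)]
collapsed = collapse_by_position(rows)

# As in the figure captions, chart position is predicted from unique words.
avg_words = [w for _, w in collapsed]
positions = [p for p, _ in collapsed]
slope, intercept, r_squared = simple_linear_regression(avg_words, positions)
print(collapsed, slope, r_squared)
```

Collapsing removes the song-to-song scatter within each position, which is why R-squared tends to rise after averaging even when the underlying relationship is unchanged.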
2.3 Sentiment Analysis
The next portion of our analysis focused on lyrical sentiment, in order to determine whether there were any specific trends in the sentiment of popular music over time. To accomplish this, we used the Bing and Afinn sentiment lexicons available in R. All stop words (words that have no sentiment or meaning associated with them) were removed, and the remaining words were assigned sentiment values based on these two lexicons. After normalizing for the total number of words found in each year, the data suggests a potential negative trend in overall sentiment over time. This finding is illustrated in the two plots, Figures 6 and 7, found below.
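The lexicon-based scoring described above can be sketched as follows, in Python rather than R. The mini-lexicon and stop-word list are hypothetical stand-ins, not the real Bing or Afinn lexicons. The sketch also assumes that "normalizing" means dividing each year's summed score by the number of words left after stop-word removal, which the report does not spell out.

```python
from collections import defaultdict

# Hypothetical integer-scored mini-lexicon (Afinn-style); not the real lexicon.
LEXICON = {"love": 3, "happy": 3, "sad": -2, "hate": -3}
# Hypothetical stop-word list for the example.
STOP_WORDS = {"the", "a", "i", "you", "and", "so"}


def normalized_sentiment_by_year(songs):
    """Sum lexicon scores per year and divide by that year's word count."""
    totals = defaultdict(int)
    word_counts = defaultdict(int)
    for year, lyrics in songs:
        words = [w for w in lyrics.lower().split() if w not in STOP_WORDS]
        word_counts[year] += len(words)
        # Words missing from the lexicon contribute a score of zero.
        totals[year] += sum(LEXICON.get(w, 0) for w in words)
    return {y: totals[y] / word_counts[y]
            for y in sorted(totals) if word_counts[y]}


# Hypothetical (year, lyrics) pairs, not taken from the Billboard data set.
songs = [
    (1960, "I love you so happy"),
    (2010, "I hate the sad rain"),
]
sentiment = normalized_sentiment_by_year(songs)
print(sentiment)
```

For a binary lexicon in the style of Bing, the same structure would apply with scores of +1 and -1 in place of the integer values.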
In order to test whether a trend may actually exist, another set of regression analyses was conducted, one for the Bing Lexicon sentiment results and another for the Afinn Lexicon sentiment results. Because the Bing Lexicon strictly assigns a binary “positive” or “negative” value to each word, a logistic regression was chosen. This regression produced a significant p-value, suggesting that there could be a significant downward trend in overall sentiment based on this data set. A simple linear regression was then conducted using the Afinn Lexicon sentiment results, and this model produced a significant p-value as well. The fitted line for this regression can be seen in Figure 8 below.

Figure 8: Simple linear regression fit for Afinn Lexicon, predicting normalized sentiment as a function of time.
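To make the logistic regression on binary sentiment labels more concrete, here is a minimal Python sketch. It fits the model by plain gradient ascent, which is an assumption made for readability and not necessarily how the R fit was computed, and it reports only the coefficients, not a p-value. The observations are hypothetical.

```python
import math


def fit_logistic(xs, ys, lr=0.1, steps=5000):
    """Fit P(y = 1) = sigmoid(b0 + b1 * x) by gradient ascent on the mean log-likelihood."""
    b0 = b1 = 0.0
    n = len(xs)
    for _ in range(steps):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            g0 += y - p
            g1 += (y - p) * x
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    return b0, b1


# Hypothetical data: seven time periods, centered at zero, with ten sentiment
# words each; the number labeled "positive" (1) shrinks over time.
positives_out_of_ten = [8, 7, 7, 6, 5, 4, 3]
observations = []
for offset, n_pos in zip(range(-3, 4), positives_out_of_ten):
    observations += [(offset, 1)] * n_pos + [(offset, 0)] * (10 - n_pos)

xs = [x for x, _ in observations]
ys = [y for _, y in observations]
intercept, slope = fit_logistic(xs, ys)
print(intercept, slope)
```

A negative slope means the odds of a sentiment word being positive fall as time advances, which is the direction of the trend reported above.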
2.4 Artist and Word Trend Analysis using Tableau
We began our Tableau analysis by plotting the number of charted songs by artist and creating a graph that is partitioned by decade (see Figure 9 below). In order to focus on the most influential artists of each decade, we used the filter feature in Tableau with the constraint that the artist must have charted a minimum of 10 songs in that decade. Note that a few artists, including Elvis Presley and Madonna, charted more than 10 songs in multiple decades, which shows the longevity and overall success of their musical careers.

Figure 9: Tableau graph of the artists that appeared on the Billboard Top 100 Chart most frequently in each decade (minimum of 10 songs charted).
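The Tableau filter behind Figure 9 amounts to counting charted songs per artist within each decade and keeping artists at or above a minimum. A Python sketch of that logic follows. The artist names, the entries, and the assumption that each (year, artist) row represents one charted song are all hypothetical.

```python
from collections import Counter


def decade_of(year):
    """Map a year to the first year of its decade, e.g. 1965 -> 1960."""
    return year // 10 * 10


def top_artists_by_decade(entries, min_songs=10):
    """Return {decade: {artist: song_count}} for artists meeting the minimum."""
    counts = Counter((decade_of(year), artist) for year, artist in entries)
    result = {}
    for (decade, artist), n in counts.items():
        if n >= min_songs:
            result.setdefault(decade, {})[artist] = n
    return result


# Hypothetical chart entries; each row stands for one charted song.
entries = [
    (1961, "Artist A"),
    (1965, "Artist A"),
    (1968, "Artist B"),
    (1972, "Artist A"),
]
# A minimum of 2 is used here only because the toy data is tiny.
frequent = top_artists_by_decade(entries, min_songs=2)
print(frequent)
```

An artist who appears under more than one decade key in the result corresponds to the multi-decade cases noted above.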
Next, we wanted to investigate how average chart position varies among different artists. For this analysis, we wanted to include additional artists, so we changed the minimum song requirement to 5 charted songs in that decade. We created a plot for each decade and implemented a drop-down menu that allows the user to select and view a specific decade.
The plot for the current decade is shown below in Figure 10. While our dataset did not include genre data, we were able to draw general trends about how certain genres chart on average. We noted that many of the country artists chart lower on average (usually between spots 50 and 75), while artists in the pop or hip-hop/rap genres tend to appear near the top of the charts. This can likely be attributed to these two genres appealing to a large subset of the listening audience.

Figure 10: Tableau bar graph of average chart position by artist for the current decade (minimum of 5 charted songs).
The next set of graphs focuses on word trends and usage over time. In the first graph, the top 20 words are plotted over time (see Figure 11 below). Note that in this graph, word usage has been normalized by decade in order to account for the varying total number of words in each decade. Even with normalized data, many of the words that appeared in the frequency plot in R (Figure 1), such as love, baby, and heart, also appear. One interesting trend to note is that the word love, which peaked in usage in the 1970s and 1980s, seems to have been used significantly less in the 2000s. We concluded that artists must have found other ways to convey love in their music during this decade.

Figure 11: Tableau graph of the top 20 words by percent usage (normalized for the total number of words used in that decade).
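The per-decade normalization used for Figure 11 can be sketched as a word's share of all words sung in that decade. The Python example below illustrates the idea and is not the Tableau calculation itself. The toy lyrics and function name are assumptions.

```python
from collections import defaultdict


def word_share_by_decade(songs, word):
    """Percent of all words in each decade that are the given word."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for year, lyrics in songs:
        decade = year // 10 * 10
        words = lyrics.lower().split()
        totals[decade] += len(words)
        hits[decade] += words.count(word)
    return {d: 100 * hits[d] / totals[d] for d in sorted(totals) if totals[d]}


# Hypothetical (year, lyrics) pairs, not taken from the Billboard data set.
songs = [
    (1975, "love love me do"),
    (1978, "my love"),
    (2005, "we dance all night love"),
]
share = word_share_by_decade(songs, "love")
print(share)
```

Dividing by the decade's total word count keeps a decade with longer songs from appearing to use a word more often simply because it contains more words overall.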
3. Discussion and Results
In summary, our team found that general sentiment in popular music does appear to be growing significantly more negative (on the margin), while overall average sentiment still appears to be very neutral. Additionally, we found that there seems to be an upward trend in the number of unique words used in songs in recent years versus songs from previous eras. This could suggest that people value lyrical messages more now than in the past; however, several other factors, such as technological advances, could be drastically affecting this trend.
There also does appear to be some kind of relationship between a song’s ranking and the number of unique words used in a song. The analysis described above suggests that a higher number of unique words in a song may be indicative of a lower chart ranking compared to a song with fewer unique words. This could suggest that even if a song is very popular, in order to be charted extremely high, the song may need to be less lyrically inclined. This finding could also be affected by the distribution of different genres on the charts – for instance, hip-hop/rap tends to be more lyrically heavy than other genres. Further analysis would need to be done on song genres in order to get a better idea of how a song’s genre impacts the number of unique words used and chart position.
This is a very interesting report focusing on a topic that permeates nearly everyone’s daily life: music. It is interesting to see that unique word count is on the rise even today, because a general complaint with current pop music is that it is very repetitive compared to the singer/songwriter era of the 1970s (e.g. Drake vs. Fleetwood Mac). It would be interesting to see how artists could apply these results to their music careers to produce work that charts higher or is more lucrative. One aspect of the study that could be elaborated on more is the sentiment analysis; I was not really sure what conclusions to draw from this data analysis.
Very intriguing findings! I think it’s awesome that R has so many packages that are available for use. Who knew they had ways of quantifying sentiment?! I also like the use of ggplot to supplement your visualizations in Tableau. Nice work!
This might be one of the funnier analyses I’ve ever seen, although I’m quite sure you didn’t intend it to be. I was a little surprised that the number of unique words in a song has increased over time, although I could see how the rise in the popularity of rap could result in such a change. What I found funny was the linear regression model that shows as the number of unique words in a song increases, the average chart position decreases. I was also a little surprised at how high Pitbull’s average chart position is, although of course, the analysis only captures songs that enter the top 100, which could be described as a limitation or shortcoming of the analysis.
Thank you for sharing your data analysis. I found it interesting that some of the most common words are not even words that we would consider in our everyday discussion, such as “ooh” and “la.” The word cloud was a great visualization to demonstrate how each of the words compared to one another. Another point that I found interesting was that there were more unique words over time but fewer sentiment words. I would have thought that these would correlate positively, since more unique words would seem to indicate more sentiment. I was surprised to find that use of the word “love” has decreased since the 1970s and 1980s. The conclusion that fewer unique words seems to correlate with a higher chart ranking makes sense to me, though, because the more unique words there are, the more difficult it is for people to learn the lyrics. Overall, very interesting analysis.
Thank you for sharing your analysis. I thought it very interesting that the advancement of listening technology, specifically the Walkman, resulted in more unique words per song. The figure showing that a lower number of unique words leads to a slightly better average chart position makes sense. I would expect a song with fewer words to have a catchier, more repeated hook. A place for future work might be to explore what topics the songs with more unique words are about. Are there more words because the artist is trying to convey a deeper meaning?
I really enjoyed this report. This post showcases the evolution of song recording and statistics related to it brilliantly. I found it interesting that the number of unique words in a song has increased over time. This is great work because many people can relate to the content in this post!
I find Figure 3 to be very interesting. Though many people think that up-and-coming music genres like hip-hop and rap tend to be very repetitive in the topics that they’re addressing, there is greater variability in their music than in years past (before the 1980s/1990s). In addition to Figure 1, I’d like to see a frequency bar chart of the words used in each decade and see how those compare.
This is extremely interesting! I’ve never thought to analyze top songs by specific keywords. Great work, I definitely learned something.
I think this was an excellent report that was very relatable. I think it would be interesting to analyze the overall themes of the songs. I have a constant debate with my parents about how music has changed over time. I think it would be interesting to see if music has really become more explicit, sexually suggestive and violent through the years. It seems the older generation believes this to be true but I am not so convinced.
I’m curious whether the technology advancements mentioned by several other commenters are causal or merely correlated with an increase in overall human intelligence (there are many studies that suggest overall human intelligence has grown with time). Perhaps the advances in technology point to an overall increase in our access to information, which would allow songwriters to have a larger pool of both words and concepts to pull from.
Another interesting thought is how the popular genres of music over the listed time periods have impacted the reported increase in wordiness. I’d bet that an analysis of rap vs big band (think Frank Sinatra) songs would show a significant increase in overall number of words used, as well as an increase in unique words used.
Finally, it would be interesting to see a detailed analysis of how song duration plays into this phenomenon. For example, this article (https://www.vox.com/2014/8/18/6003271/why-are-songs-3-minutes-long) shows a slight, but appreciable increase in song duration (up until the 2000s). The increased duration could well be a platform for more lyrics.
Great work!
This was very interesting to read and see how words have changed over time. As far as songs having more lyrics in it I believe that this is related to a couple of things:
1. How open society has become, in the past if certain words were used the song was banned from the radio or the artist was told to change the lyrics. This could affect the data.
2. Rap and Hip-Hop with rhyming is continually coming up with new words or compound words that flow easily.
3. To Dominic’s point above, and I am probably his parents’ age, I think society has changed and is more forgiving. Again, depending on the word, the producer would probably have made the artist or writer change it.
These are just some thoughts on how words could have been chosen and in turn affected your results.
This topic is really interesting. It would be great to see some analysis of the pace of songs and types of lyrics, and their relationship to the populations of different regions.
These visualizations are really great. I actually just completed a related data visualization project, where my group and I analyzed what makes a song popular on Spotify. We took a look at the Top 100 Songs on Spotify in 2018, and had wanted to do some sort of analysis with lyrics, but we did not have enough time to do it. Very interesting findings, thanks for sharing!
My roommate was telling me about something she learned in her marketing class. The person who created the song Old Town Road did a lot of research into what previous chart toppers did to get their songs to the top, and he used this data to successfully create a chart-topping song.
This is extremely interesting! I love music and learning about how it influences society using data analytics was awesome to see.
Interesting to see that the most prevalent word in songs was ‘love’. I think that just says something about us as a society and how heartbreak can truly be inspiring, even as inspiring as the initial honeymoon phase. Love is the thing that is most written about, sung about, talked about, and thought about.
I thought this was very interesting too! Love seems to have a large impact and makes people listen to songs, causing it to be the most popular.
This was a very interesting analysis to read. After reading it, I began thinking of further applications and more analyses that can be done beyond word-use, such as type of musical instruments used in the songs or other “sounds” frequently used. Or even further, testing the pitches/tones of words in songs to analyze trends. Overall, I found this to be very eye-opening and enjoyable to learn from.
Your conclusion from Figure 10 makes sense as I have seen far less engagement with Country music compared to Hip-Hop/Rap amongst people I know. In my opinion, rap music uses a large number of slang words. I would be interested in learning if the use of slang has increased in recent years due to the increased popularity of rap. I connected this thought to Figure 3 where the average number of unique words in a song is graphed by year. If it is true that the use of slang has increased, this could at least partially explain the increasing number of unique words in songs. Also, rap songs tend to be more fast pace, so the increased number of unique words could be attributed to songs having a higher total number of words in them.
I found this research very intriguing. I liked the visuals and thought that the dataset was very vast.
I liked the research and the visuals are easy to understand. It is surprising to find out that the number of unique words in a song has increased over time.
What I thought was most interesting in your results is that music is becoming increasingly more unique. This is very surprising to me because it seems as though many popular songs, especially hip hop and pop songs (which are the most popular based on your results) seem to have very simple lyrics with repetition of words.
I am very interested in the fact that unique word usage has gone up now compared to in the past. It seems like many of the popular songs in today’s world are generally pretty repetitive. I suppose this may be because rap and hip hop have become very popular in the last 20 years or so. Faster singing can cause the need for more words, resulting in more unique words. I also agree with the statement that the lyrical message may mean more to artists today than it did in the past. Artists appear to want to send a message to the listener, not just make a good tune.
As someone who loves listening to music and thinks they know a decent bit about the industry, I loved reading your post. Some of the results surprised me, like Figure 3 and how songs are getting more unique over time, while other results, like Figure 1 and Figure 2 regarding word frequency, made total sense. Also, I appreciated the word cloud for Figure 2; I think that was a fun way to display data and get the readers to engage with the graph. Awesome work!
I had a great time reading this report! I wouldn’t have associated a song’s popularity with the number of unique words it contains and found this take to be extremely interesting. Also interesting to note how some words have been topping max usage over multiple decades.
I was surprised to learn that Bruno Mars had the best average position in the charts in 2010.
What I thought was the most surprising thing about your analysis is that music is only getting more unique. What I didn’t find surprising was that love was the most common word used.
^what Richard Berry said. I believe the word cloud was a really fun way to display the data.
Thank you for your analysis on current music trends. As music streaming has become much more popular and has experienced immense growth in widespread use and access, this analysis has extreme relevance today. There are some implications of today’s popular music that can potentially explain some of your results. For example, the emergence of hip-hop in massive popularity has increased lyrical diversity, which could offset some of the results for less-lyrically inclined songs being higher on the charts (resulting in your low p-values). Further, it might be worthwhile to look at not only top 100 songs but other songs, as well, since top 100 songs are all definitively popular songs already. Further, rap artists often talk about controversial topics, which could be seen as negative sentiment in many sentiment analysis programs. The negative sentiment could also be a result of the rise in emo artists, both decades ago and now in the form of emo rap.
As a music enthusiast, I enjoyed the analysis of music over time. It’s interesting to see how the top genres change from decade to decade.
Music being unique was an interesting observation. I was also intrigued by the fact that Bruno Mars has one of the highest standings for this decade.
I really enjoyed reading this report. The finding I found most interesting was the decrease in sentiment over time. I am curious about which factors go in to assigning sentiment values to words. I also would find it interesting to see how the trend applies to different genres of music.
Very interesting article!
I enjoyed reading this article!
I really enjoyed reading this article. It was very interesting to learn that most popular songs use a lower-than-average number of unique words. I guess I find this a little disturbing, but it also makes sense. After reading, it also made sense that technology, and I believe maybe even being more global and connecting with other cultures, added a lot of words, increasing the number of unique words in a song compared to before!
Wow! This is so interesting and makes sense when you think about it! I love listening to music and seeing that most popular songs use fewer unique words.