Author: Qi Chen
There are roughly 2 billion alcohol drinkers worldwide, and many of them engage in binge drinking, defined as consuming 5 or more drinks on a single occasion for men or 4 or more drinks for women. Alcohol misuse causes serious harm. In the United States, about 88,000 people die from alcohol-related causes each year, and alcohol misuse has cost about $223.5 billion, roughly 75% of which is related to binge drinking. Understanding binge drinking is therefore important for the well-being of society. In this article, we analyze data from the Behavioral Risk Factor Surveillance System (BRFSS) of the Centers for Disease Control and Prevention. Our aim is to uncover relationships between binge drinking and other factors that could help reduce the injuries, deaths, and costs associated with it.
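The binge-drinking definition above is a simple threshold rule. The following is a minimal Python sketch of it, together with the cost arithmetic quoted in the paragraph. The function and variable names are hypothetical, and Python is used only for illustration; the article's own analysis was done in R.

```python
def is_binge_episode(sex: str, drinks: int) -> bool:
    """True if a single drinking occasion meets the binge threshold:
    5 or more drinks for men, 4 or more for women."""
    thresholds = {"male": 5, "female": 4}
    try:
        threshold = thresholds[sex.lower()]
    except KeyError:
        raise ValueError(f"unknown sex: {sex!r}") from None
    return drinks >= threshold


# Share of the quoted $223.5 billion attributed to binge drinking (75%).
binge_cost_billion = 0.75 * 223.5

print(is_binge_episode("male", 4))    # False: below the male threshold
print(is_binge_episode("female", 4))  # True: meets the female threshold
print(f"${binge_cost_billion:.1f} billion")  # $167.6 billion
```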
2.1. Software Used
In this article, we use Tableau and R to visualize and analyze the data.
2.2. Data Visualization with Tableau
Graph 1 shows the proportion of people who drink alcohol and the proportion classified as binge drinkers between 2002 and 2012. The proportions increase slightly over these 10 years, and roughly 50% of people drink alcohol.
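Each point in Graph 1 is a per-year share. The following minimal Python sketch shows that computation. It assumes one hypothetical (year, drinks, binge) record per respondent and measures both shares against all respondents. It also ignores the BRFSS sampling weights that published estimates rely on, so it only illustrates the arithmetic. The article's own charts were made in Tableau.

```python
from collections import defaultdict


def yearly_proportions(records):
    """records: iterable of (year, drinks_alcohol, binge_drinks) with boolean flags.
    Returns {year: (share_of_drinkers, share_of_binge_drinkers)}, each share
    taken over all respondents in that year (unweighted)."""
    counts = defaultdict(lambda: [0, 0, 0])  # [respondents, drinkers, binge drinkers]
    for year, drinks, binge in records:
        c = counts[year]
        c[0] += 1
        c[1] += int(drinks)
        c[2] += int(binge)
    return {year: (c[1] / c[0], c[2] / c[0]) for year, c in sorted(counts.items())}


# Hypothetical toy records, not BRFSS data.
sample = [
    (2002, True, False), (2002, False, False), (2002, True, True), (2002, False, False),
    (2012, True, True), (2012, True, False),
]
print(yearly_proportions(sample))  # {2002: (0.5, 0.25), 2012: (1.0, 0.5)}
```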
We are also interested in alcohol consumption across counties in the United States, shown in Graph 2. Red indicates high consumption and green indicates low consumption. Areas in the northern Midwest (for example, Wisconsin) have relatively high alcohol consumption.
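A color-coded map like Graph 2 places each value on a gradient. The Python sketch below shows a simple linear green-to-red interpolation. The endpoint colors and the function name are assumptions for illustration, not Tableau's actual palette.

```python
def red_green_color(value: float, vmin: float, vmax: float) -> str:
    """Map value onto a linear green (low) to red (high) scale; returns '#rrggbb'."""
    if vmax <= vmin:
        raise ValueError("vmax must be greater than vmin")
    t = (value - vmin) / (vmax - vmin)
    t = min(max(t, 0.0), 1.0)  # clamp values outside [vmin, vmax]
    green, red = (0, 170, 0), (220, 0, 0)
    # Interpolate each RGB channel between the green and red endpoints.
    rgb = tuple(round(g + t * (r - g)) for g, r in zip(green, red))
    return "#{:02x}{:02x}{:02x}".format(*rgb)


print(red_green_color(0.0, 0.0, 1.0))  # #00aa00 (lowest value: green)
print(red_green_color(1.0, 0.0, 1.0))  # #dc0000 (highest value: red)
```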
Similarly, Graph 3 plots the rate of alcohol-related deaths for each state. Again, a color closer to red indicates a higher rate. Some states, such as Wisconsin, North Dakota and Montana, have relatively high death rates. Stricter laws against drunk driving and other dangerous behaviors related to binge drinking may therefore be worth considering in those areas.
2.3. Data Analysis with R
In this part, we demonstrate data analysis techniques in R. First, we create a faceted plot (Graph 4) to show the relationships among several variables: sleptim1 (sleep time), marital (marital status), avedrnk2 (average drinks per day), x.age80 (age), x.rfsmok3 (smoker or not), and x.rfbing5 (binge drinker or not).
In this graph, red represents men and blue represents women. Interestingly, smokers and binge drinkers make up a larger proportion of single people than of married people. We can also see that about 50% of smokers binge drink, compared with less than 30% of nonsmokers. This indicates a clear association between smoking and binge drinking.
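The smoker/nonsmoker comparison amounts to conditional rates from a 2x2 table. The following Python sketch computes those rates, their ratio, and the phi coefficient as one measure of association. Python is used only for illustration. The counts are hypothetical, chosen only to reproduce rates of about 50% and 30%.

```python
import math


def smoking_binge_association(a, b, c, d):
    """2x2 counts:            binge   no binge
                    smoker       a        b
                    nonsmoker    c        d
    Returns (P(binge | smoker), P(binge | nonsmoker), risk ratio, phi)."""
    p_smoker = a / (a + b)
    p_nonsmoker = c / (c + d)
    phi = (a * d - b * c) / math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return p_smoker, p_nonsmoker, p_smoker / p_nonsmoker, phi


# Hypothetical counts: 50 of 100 smokers and 30 of 100 nonsmokers binge drink.
p_s, p_n, ratio, phi = smoking_binge_association(50, 50, 30, 70)
print(f"{p_s:.2f} {p_n:.2f} {ratio:.2f} {phi:.2f}")  # 0.50 0.30 1.67 0.20
```

With these counts, smokers are about 1.67 times as likely to binge drink, while phi is only about 0.20. A sizeable gap in rates therefore does not by itself imply a large correlation coefficient.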
Using the R package “party”, we build a conditional inference tree with sex (gender), x.rfsmok3 (smoking status), x.rfbmi5 (obesity) and marital (marital status) as predictors. The tree estimates the conditional probability of binge drinking within each resulting group. Readers interested in conditional inference trees can refer to this link for details.
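The party package's ctree() chooses each split within a permutation-test framework. The Python sketch below only imitates that idea and is not party's algorithm. It rests on several assumptions:

- Every predictor is binary, so marital status would first need a hypothetical recoding such as single vs. married.
- A Pearson chi-square test with a Bonferroni adjustment replaces ctree's own test.
- The defaults (alpha=0.05, min_size=20) and the function names are illustrative.

Python is used only for illustration.

```python
import math
from collections import Counter


def chi2_pvalue(rows, predictor, target="binge"):
    """Pearson chi-square test of independence between a binary predictor and
    the binary target. Returns the p-value (1 degree of freedom)."""
    x_levels = {r[predictor] for r in rows}
    y_levels = {r[target] for r in rows}
    if len(x_levels) < 2 or len(y_levels) < 2:
        return 1.0  # no variation in this node, so nothing to test
    if len(x_levels) > 2 or len(y_levels) > 2:
        raise ValueError("this sketch only handles binary variables")
    n = len(rows)
    counts = Counter((r[predictor], r[target]) for r in rows)
    stat = 0.0
    for x in x_levels:
        row_total = sum(counts[(x, y)] for y in y_levels)
        for y in y_levels:
            col_total = sum(counts[(xx, y)] for xx in x_levels)
            expected = row_total * col_total / n
            stat += (counts[(x, y)] - expected) ** 2 / expected
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    return math.erfc(math.sqrt(stat / 2))


def grow(rows, predictors, alpha=0.05, min_size=20, path=()):
    """Grow a tree on binary predictors. At each node, test every predictor
    against the target, Bonferroni-adjust the p-values, and split on the
    smallest one if it is below alpha. Returns leaves as (path, size, P(binge))."""
    p_binge = sum(r["binge"] for r in rows) / len(rows)
    if len(rows) < min_size or not predictors:
        return [(path, len(rows), p_binge)]
    adjusted = {p: min(1.0, chi2_pvalue(rows, p) * len(predictors)) for p in predictors}
    best = min(adjusted, key=adjusted.get)
    if adjusted[best] > alpha:
        return [(path, len(rows), p_binge)]  # no significant association: stop
    remaining = [p for p in predictors if p != best]
    leaves = []
    for level in sorted({r[best] for r in rows}):
        child = [r for r in rows if r[best] == level]
        leaves += grow(child, remaining, alpha, min_size, path + ((best, level),))
    return leaves


# Hypothetical toy data: smoking is associated with binge drinking, sex is not.
rows = []
for smoker, binge_per_sex in ((True, 30), (False, 10)):
    for sex in ("female", "male"):
        for i in range(50):
            rows.append({"smoker": smoker, "sex": sex, "binge": i < binge_per_sex})

for leaf in grow(rows, ["sex", "smoker"]):
    print(leaf)
```

On this toy data the sketch splits once on smoking and then stops, because sex shows no association with binge drinking within either branch.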
For example, the figure shows that the estimated probability that a single male smoker binge drinks is 52.9%.
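A leaf estimate such as 52.9% is the share of binge drinkers among the respondents who fall into that group. The Python sketch below computes such a share for any subgroup. The field names and toy rows are hypothetical, and Python is used only for illustration. In a fitted tree, the group is defined by the splits on the path to the leaf.

```python
def subgroup_binge_rate(rows, **conditions):
    """Return (share of binge drinkers, group size) among rows matching every
    key=value condition, e.g. sex="male", smoker=True, marital="single"."""
    group = [r for r in rows if all(r[k] == v for k, v in conditions.items())]
    if not group:
        raise ValueError("no respondents match these conditions")
    return sum(r["binge"] for r in group) / len(group), len(group)


# Hypothetical toy rows, not BRFSS data.
rows = [
    {"sex": "male", "smoker": True, "marital": "single", "binge": True},
    {"sex": "male", "smoker": True, "marital": "single", "binge": False},
    {"sex": "male", "smoker": True, "marital": "married", "binge": False},
    {"sex": "female", "smoker": True, "marital": "single", "binge": True},
]
print(subgroup_binge_rate(rows, sex="male", smoker=True, marital="single"))  # (0.5, 2)
```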
2.4. Results and Summary
In this article, we demonstrate the use of Tableau and R to visualize and analyze data on binge drinking.
This article is based on a course project for Industrial Data Analytics, taught by Prof. Kaibo Liu at the University of Wisconsin-Madison in Spring 2015. We thank Prof. Liu for his instruction, and Criss Ross, Corey Lester and Wyatt Suprise for their initial work.