All posts by Richard Holm

About Richard Holm

University of Washington PhD. General Electric Six Sigma Black Belt. Cofounded two successful companies. Statistical consultant at a major research center. Love working with meaningful data.


Review – Green Living

Introduction.

I started working with Tableau Desktop just after Stephen and Eileen McDaniel gave a presentation to the Seattle Tableau User’s Group (March 2013). On Tableau’s community site they named Eileen’s Green Living Dashboard as a good example of visual data presentation. This sounded like a good initial visualization for me to work with. I started by doing a quick review. It turned up several problems, which triggered a more detailed review. Eventually, the quantity and severity of problems suggested that the visualization should never have been created. The dashboard did turn out to be a great example: one which offers reviewers many opportunities to discover problems. The dashboard is available at the McDaniels’ site.

Summary of Review

Story

The dashboard presents a model for improving programs which promote green activities. It suggests working with experts to:

  • list important green activities which are appropriate for the locality,
  • develop a survey which measures the frequency of those activities,
  • classify the activities into meaningful categories,
  • administer the survey,
  • and suggest best practices for local sustainability programs based on the results.

The model presents 21 green activities which are pooled into these three categories: save money, easy to do, and beneficial to the environment. The results found that “easy to do” activities were done most often, those that “save money” next, and those “most beneficial to the environment” least often. These stated outcomes are no surprise and of little interest. However, the methodology could be quite interesting. How were the 21 items ranked and placed into the three categories? Did each item have a save money score in dollars and cents, an easy to do score in minutes per day, and an environmental score in yearly pounds of carbon saved? Unfortunately none of this information is available on the site or from the author. We are left with “So what?”

Survey

The survey method (phone, personal interview, mail, or on-line) and the response rate are not reported or available.

The respondents’ education levels are significantly higher than state norms.

The respondents’ distribution by state is significantly different from census data.

Measurement – Likert scale

The five point scale for the Likert items is poorly constructed.

There is no information on what Likert items were assigned to each Likert scale.

Measurement – Clarity of Questions

The wording of the questions is often ambiguous.

Several of the questions do not apply to all respondents.

Missing data

Analysis of missing data suggests that some of the questions were not applicable for some respondents.   There were numerous questions which were left blank (i.e. missing data) but the three pooled categories: save money, easy to do, and beneficial to the environment were calculated for each respondent.  The process used to handle missing values for pooled categories is not specified.

Transformation

The study is based on aggregated data. Unfortunately the aggregation loses all individual data and renders the dataset useless. The original data are not available.

Transparency

The original data and key information about the survey (method, response rate), the questions (what items went to which scales) and how missing values contributed to the three summary categories are not presented on the site.  Attempts to get additional information resulted in an email stating that they were too busy to respond to questions.

Communicate

There are several minor issues on the dashboard.

  • The average response is displayed with different resolution in each segment (the map shows integer values, the chart one decimal place, and table two decimal places).
  • The map and table show “Count of Persons” while the chart shows “Count of Responses”.
  • The dashboard header uses “Green Lives” while the blog and link use “Green Living”.

These issues foreshadow the lack of rigor apparent in all categories reviewed. The review found enough serious problems to invalidate the data and the methods. Therefore an extended review of Communication is not required.

Categories used in this review (see General topics and Special topics in Tools>Checklists) : 
Key: Red=Summarized here and detailed at link, Black=This page only, Strike through=Not reviewed
General: Argument, Communicate, Comparisons, Measurement, Research, Review, Statistics, Story, Transparency 
Special: Causality, Missing, Survey, Time-series, Transformations 


Story – Green Living

All text presented with the dashboard is duplicated below. It is paraphrased and clarified in the Story section of the review summary.

Green living dashboard — green activities in the daily lives of Americans

GreenLiveDashboard

Which “green” activities are consumers performing in their everyday lives?

Activities that-

  • Save them money?
  • Are easy to do?
  • Are the most beneficial to the environment?
  • Some combination of the three?

Developing an innovative approach, we combined insights from a recent survey of consumer behavior with expert opinion on green activities and found that:

  • Activities that experts rank as the most beneficial for the environment are not always performed frequently by consumers.
  • Economic benefit to the consumer is a stronger predictor of frequently-performed activities than environmental benefit.
  • However, convenience to the consumer is the best predictor of green behavior!
  • Decision-makers for sustainability programs can tailor this method to their particular location by:
    • Compiling a list of green activities specific to their region.
    • Surveying local consumers and experts.
    • Altering which dimensions are included in assessing the importance of various green activities.
  • “Newcomer” communities can maximize the impact of launching their green programs by:
    • Prioritizing activities that are convenient and economical for the consumer.
    • Motivating consumers with educational programs and incentives.
    • Waiting until the environmental program has gotten off the ground before encouraging activities that are low in convenience and economic benefit- unless they can be financially subsidized.
  • “Veteran” communities can prioritize the activities by environmental benefit:
    • Activities that are most convenient can be financially penalized for non-compliance.
    • Less convenient activities can have incentives for performance.

Measurement – Likert scale

Data were collected by survey with 21 Likert-type items. Each item allowed respondents to rate the frequency of performing specified “green” activities. Environmental experts combined the items into 3 scales (Most Convenient, Most Economical, and Most Environmentally Beneficial). All items use a 5 point scale which has the ordered options: Always, Regularly, Sometimes, Rarely, and Never.

We don’t have access to the survey instrument but I assume each item looked something like:

How often do you remove Roof Racks when not needed?

Always Regularly Sometimes Rarely Never

 

A model for a good 5 point survey scale is a 6 inch ruler. Scale words should be carefully selected to represent the positions of the numbers 1-5.
SixInchRule

Dictionary definitions of the survey options suggest problems with the rating scale. Regularly and Sometimes are out of order. Rarely seems closer to Never than Regularly.

Value | Survey option | Definition (American Heritage® Dictionary) | Value (based on definition)
1 | Always | At all times, invariably | 1
2 | Regularly | Customary, usual, or normal | 3
3 | Sometimes | Now and then; from time to time; occasionally | 2
4 | Rarely | Not often; infrequently | 4.5
5 | Never | Not ever; on no occasion; at no time | 5

The wording violates three Likert item best practices.

Best practice | Violation
Symmetric | Is Sometimes the midpoint between Always and Never?
Equidistant | Is the difference between Always and Regularly the same as the difference between Rarely and Never?
Extremes | Some responders shy away from selecting absolutes (Always, Never). Using extremes tends to make the 5 point scale more like a 3 point scale.

Based on the wording, the 6 inch rule used in the survey has:

  • the ends chopped off,
  • 2 and 3 swapped, and
  • 4 close to 5.

It may look something like this:

RuleAsWorded

While these problems may not invalidate the data, the poor choice of scale words will add noise to the measurements. This is unfortunate, especially in studies like this with small sample sizes.
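
The ordering and spacing checks behind the ruler analogy can be sketched in a few lines of Python. The definition-based values are the ones listed in the dictionary table above; the function names are made up for illustration and are not part of the original review.

```python
# Definition-based positions of the five survey options (from the dictionary table).
options = ["Always", "Regularly", "Sometimes", "Rarely", "Never"]
definition_values = [1, 3, 2, 4.5, 5]

def is_ordered(values):
    """True when each option sits strictly further along the ruler than the previous one."""
    return all(a < b for a, b in zip(values, values[1:]))

def gaps(values):
    """Distances between neighbouring options; equal gaps would mean an equidistant scale."""
    return [b - a for a, b in zip(values, values[1:])]

print(is_ordered(definition_values))  # False: Regularly and Sometimes are swapped
print(gaps(definition_values))        # [2, -1, 2.5, 0.5]
```

A well-built scale would pass the first check and show equal gaps in the second.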

 

Measurement – Clarity of Questions

Survey questions must be unambiguous. Poorly worded questions require the respondents to interpret the meaning. In essence, this has respondents answering different questions. Questions which have high responses, at either end of the scale, should be reviewed. If the extremes indicate problems the rest of the questions should also be reviewed.

HighAlwaysNever
The table below presents three questions with multiple interpretations. Each interpretation would lead to a different response.

Question | Presumed intention | Alternative reading
Run Appliances When Full | Wait until full before starting | Once it’s full I always start it
Keep Tires Inflated | Maintain recommended psi ± 3 lbs. | I don’t drive on flat tires
Refillable Coffee Cups | Don’t use disposable cups | All cups are refillable

Missing Values – Green Living

Missing values

Always look for and analyze causes of missing data. Causes can be related to respondents’ attributes (if you don’t have a child it is hard to answer a question on baby food) or to a poorly designed measurement tool (an unintelligible question is difficult to answer).

The five questions with the highest number of missing values are charted below. Read the questions and see if you come up with a potential problem. Then think up a way to test that assumption.

HighMissing

People may not have a roof rack or a garden (for drought-resistant plants and compost questions) and they may abstain from coffee. If a question does not apply to a respondent the only options are to not answer (i.e. a missing value) or to select Never. The following chart indicates that questions with high counts of missing values also have high Never rankings, suggesting that they may be poor questions.
NeverHighForMissing

This effect is much stronger than the figure indicates. Bad data transformations, discussed below, significantly reduced the number of Never responses.

Missing values affect subsequent analyses. The Likert scales are composites of several Likert items. How are the scales calculated when some items are missing? Two common options are to calculate scales only for respondents who answered all items in that scale, or to fill in the missing values with a representative value (e.g. the average for all folks who did respond, the average of the items in that scale which the respondent answered, the grand average of all responses, or…). Many of the 21 questions have missing values. The three scales based on those questions (Most Convenient, Most Economical, and Most Environmentally Beneficial) have no missing values. The missing item values were plugged with a representative value when pooled into the three scales. There is no information available on how the plugged values were calculated.
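
To make the two common options concrete, here is a minimal Python sketch. The respondents, item names, and scale membership are invented for illustration; nothing here reflects how the dashboard's authors actually handled missing items, which is not documented.

```python
from statistics import mean

# Hypothetical responses (1 = Always ... 5 = Never); None marks a skipped item.
# Item names and scale membership are assumptions made for this example.
respondents = {
    "r1": {"roof_rack": 2, "compost": 4, "coffee_cup": 1},
    "r2": {"roof_rack": None, "compost": 5, "coffee_cup": 3},
    "r3": {"roof_rack": None, "compost": None, "coffee_cup": 2},
}
scale_items = ["roof_rack", "compost", "coffee_cup"]

def complete_case_score(answers, items):
    """Option 1: score the scale only when every item in it was answered."""
    values = [answers[i] for i in items]
    return None if None in values else mean(values)

def person_mean_score(answers, items):
    """Option 2: average only the items this respondent actually answered."""
    values = [answers[i] for i in items if answers[i] is not None]
    return mean(values) if values else None

for rid, answers in respondents.items():
    print(rid,
          complete_case_score(answers, scale_items),
          person_mean_score(answers, scale_items))
```

The first option leaves r2 and r3 without a scale score, while the second gives every respondent a score. A dataset in which every scale is fully populated despite missing items points to something like the second option, and the choice changes the results, which is why it needs to be reported.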

Transformations – Green Living

Data are often transformed before analysis. This can be as simple as normalizing or reshaping the data, or more pervasive, as was done here. Each transformation needs validation. The data in this dashboard were collapsed into a single summary row for each unique combination of State, Gender, Education, and Question. Each new row summarizes the original data by saving the number of responses which were pooled together and the mean of the pooled responses.

ExtractMess
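
The collapse described here can be sketched as follows. The individual records are hypothetical (the original data are not available); they were chosen only so that the pooled rows match two of the Minnesota summary rows discussed below.

```python
from collections import defaultdict

# Hypothetical individual records: (state, gender, education, question, response).
# The real individual responses are unknown; these are one possibility.
records = [
    ("MN", "Female", "Bachelors degree", "Cloth Napkins/Towels", 2),
    ("MN", "Female", "Bachelors degree", "Cloth Napkins/Towels", 2),
    ("MN", "Female", "Bachelors degree", "Cloth Napkins/Towels", 3),
    ("MN", "Female", "Bachelors degree", "Cloth Napkins/Towels", 4),
    ("MN", "Male", "Bachelors degree", "Cloth Napkins/Towels", 3),
    ("MN", "Male", "Bachelors degree", "Cloth Napkins/Towels", 4),
]

def collapse(rows):
    """Pool responses by (state, gender, education, question); keep only count and mean."""
    groups = defaultdict(list)
    for state, gender, education, question, response in rows:
        groups[(state, gender, education, question)].append(response)
    return {key: (len(vals), sum(vals) / len(vals)) for key, vals in groups.items()}

for key, (count, avg) in collapse(records).items():
    print(key, count, avg)
```

After the collapse, six individual answers have become two rows, and the individual answers cannot be recovered from the count and mean alone.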

As shown in the chart at right, this process loses individual data. The distortion of the data becomes more pronounced as the group size increases. The first table below presents the aggregated data from Minnesota for the question “Do you use Cloth Napkins/Towels”. Row 7 is a single row representing the 4 Females with a Bachelors degree in Minnesota. The position of this group on the chart is marked with a pointer. All we know is that there were four of them and that their answers on the cloth napkin question summed to 11, which gives a mean response of 2.75. The second table below shows the seven possible combinations of the responses (numbers 1-5) which sum to 11. Any one of these could be the original data for record 7. Each of the Minnesota Female BAs is now reported as having a response of 2.75, which is not a valid response on the survey form. If the individual data were available and plotted on the chart above, every point would be on one of the integer y-axis lines.

Record | Gender | Education | Group size | Mean
1 | Male | High school graduate | 3 | 4.00
2 | Male | Some college, no degree | 1 | 3.00
3 | Male | Bachelors degree | 2 | 3.50
4 | Male | Masters, JD, MD or PhD | 1 | 1.00
5 | Female | Some college, no degree | 1 | 2.00
6 | Female | Associates degree | 1 | 1.00
7 | Female | Bachelors degree | 4 | 2.75
8 | Female | Masters, JD, MD or PhD | 1 | 2.00

Which row is the original? | Valid responses for record 7 in the table above
1 | 1 1 4 5
2 | 1 2 3 5
3 | 1 2 4 4
4 | 1 3 3 4
5 | 2 2 2 5
6 | 2 2 3 4
7 | 2 3 3 3

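
The list of candidate rows for record 7 can be regenerated by brute force. This is a small sketch; the function name is an assumption for illustration, and it relies only on the group size (4), the sum (11), and the 1-5 response scale stated above.

```python
from itertools import combinations_with_replacement

def possible_responses(group_size, total, scale=range(1, 6)):
    """All unordered sets of valid responses of the given size that sum to the total."""
    return [combo
            for combo in combinations_with_replacement(scale, group_size)
            if sum(combo) == total]

candidates = possible_responses(group_size=4, total=11)
for combo in candidates:
    print(combo)
print(len(candidates))  # 7
```

Every one of these seven candidates produces the same summary row (group size 4, mean 2.75), so the aggregated file cannot tell them apart.
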
The transformation clearly invalidates the entire study.

Bill Gates: Too many kids are dying, but we have the solutions

Original treemap

I love this graph because it shows that while the number of people dying from communicable diseases is still far too high, those numbers continue to come down. In fact, fewer kids are dying, more kids are going to school and more diseases are on their way to being eliminated. But there remains much to do to cut down the deaths in that yellow block even more dramatically. We have the solutions. But we need to keep up the support where they’re being deployed, and pressure to get them into places where they’re desperately needed.

- Bill Gates is Co-Chair of the Bill and Melinda Gates Foundation.

There is an interesting discussion about Thomas Porostocky’s infographic at Stephen Few’s site Perceptual Edge.  This is my contribution to that discussion.

Related information

Washington Post, 27 Dec 2013: Source of Bill Gates’s quote and a copy of the Wired graphic.
Wired, Lee Simmons, 15 Nov 2013: Source of the infographic and introduction to GBD Compare.
@BillGates, 18 Nov 2013: Tweeted the graphic to a large following.
GBD Compare: Source of the data; site supports interactive data exploration.
Perceptual Edge, 10 Jan 2014: Bryan Pierce and Stephen Few redesign the graph.

Key elements of Gates’s story

  1. The number of kids dying from preventable diseases… continue[s] to decline.
  2. Those numbers are still far too high.
  3. Fewer kids are dying.
  4. We have the solutions.
  5. But we need to keep up the support where they’re being deployed, and pressure to get them into places where they’re desperately needed.
  6. More diseases are on their way to being eliminated.
  7. More kids are going to school.

Reviewing the key statements and related information generated the following.

  1. Continue[s] to decline:  Suggests a time-series: line plot deaths x year.
  2. Still far too high:  Needs a comparison group.  Try Developed vs Developing countries.
  3. Fewer kids are dying:  Filter to show only kids data.  Age < 15.
  4. We have the solutions:  Low death rates for Developed countries supports this.  They also provide an achievable target.
  5. Solutions deployed vs. desperately needed:  Countries differ in commitment.  Map showing countries color coded based on outcomes.  Variation in adjacent regions suggests governance issues.  Immunizations might add support to governance being an important factor.
  6. Diseases being eliminated:  Time-series of WHO diseases targeted for elimination.  Line per disease by developed vs. developing.  This is a component of 1 and may add more noise than signal.
  7. More kids are going to school:  This is an important but unrelated story.  Drop it.

Points 1-4 tell the main story; focus on them. Then see if 5 & 6 add to or distract from the main story.

The original Wired article by Lee Simmons has a secondary focus: introducing the IHME site and its data exploration capabilities.  The availability of a good interactive data site allows the redesign to focus on cleanly telling the core story.  The redesign can direct readers to the IHME GBD Compare tool to dig into the details. Treemaps are the primary visualization tool in GBD Compare which suggests that Porostocky may have used a treemap to provide a clean transition to the sub story.

What I see when I read the key story elements.

Gates in two lines with text

While this is how I conceptualize the graph it is not how I would present it.  The following graph is my publishable version paired with a loose restatement of the Gates quote.

Gates in two lines no text

I love this bittersweet graph. It plots the Years of Life Lost (YLL) due to kids dying from preventable diseases. The lower rates in developed countries are a testimony to the effectiveness of basic public health practices: immunizations, medications, clean water, and neonatal care. While the number of kids dying from these diseases in developing countries is still far too high, those numbers continue to come down. There remains much to do to cut down the deaths in developing countries and speed the descent of the red line.

Africa YLLs by country

The map at left color codes countries based on level of YLL (red = high, blue = low). The variation amongst adjacent countries indicates that governance matters. We have the solutions. We need to keep up the support where they’re being deployed, and pressure to get them into places where they’re desperately needed.

See Wired for an introduction to these data and GBD Compare for detailed data exploration.

Related quote

An early graph showing the prevalence of deaths from preventable diseases in 1858.  Think of all the preventable Years of Life Lost in the ensuing 150 years, the places and the causes.  Those charted below occurred in hospitals.

1024px-Nightingale-mortality
The blue…represent… deaths from preventable diseases
…the red…deaths from wounds
…the black…deaths from all other causes
— Florence Nightingale 1858

 


Control charts: a lesson in variation

A manufacturing plant has two machine operators with different styles. At shift start, both would carefully set up the machine and begin making parts. Operator A would measure a sample of parts each half hour and tune the machine accordingly. If the mean diameter was .002” oversize, Operator A would adjust the machine to cut .002” smaller. Operator B measured and adjusted the machine only when it was restarted (e.g. after maintenance, breaks, and lunch). The plant manager noticed the different approaches and measured a sample of parts made by each operator. Operator A’s parts showed more variation. Yes, methodical Operator A was making poorer parts.
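
The plant manager's finding can be reproduced with a small simulation. This is a sketch under stated assumptions, not a model of the actual plant: the target diameter, the machine's inherent spread (`sigma`), the number of parts per half hour, and the number of periods are illustrative values, and the machine is assumed to stay in control throughout. Many more periods than one shift are simulated so the comparison is stable.

```python
import math
import random

def run_shift(adjust, periods=2000, parts_per_period=50, sample_size=5,
              target=1.000, sigma=0.004, seed=1):
    """Simulate part diameters; optionally re-tune after each sample (Operator A)."""
    rng = random.Random(seed)
    offset = 0.0  # how far the machine setting sits from target
    diameters = []
    for _ in range(periods):
        parts = [target + offset + rng.gauss(0, sigma)
                 for _ in range(parts_per_period)]
        diameters.extend(parts)
        if adjust:
            # Operator A: measure the last few consecutive parts and move the
            # setting by the full deviation of their mean from target.
            sample_mean = sum(parts[-sample_size:]) / sample_size
            offset -= sample_mean - target
    center = sum(diameters) / len(diameters)
    return math.sqrt(sum((d - center) ** 2 for d in diameters) / len(diameters))

sd_a = run_shift(adjust=True)   # adjusts after every sample
sd_b = run_shift(adjust=False)  # leaves an in-control machine alone
print(f"Operator A sd: {sd_a:.5f}")
print(f"Operator B sd: {sd_b:.5f}")
```

Under these assumptions each adjustment chases the noise in a five-part sample, so the setting itself starts to wander and Operator A's parts end up with the larger standard deviation.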

Understanding the sources of variation explains this surprising outcome. The Xbar-s chart plots the average and standard deviation of the diameters of five consecutive parts taken each half hour.

Xbar-s chart

These charts represent a simple and elegant use of statistics and logic. Manufacturing processes have two primary sources of variation, common and special causes. Common variation is inherent in the system; the only way to improve it is to get a new system (e.g. buy a better machine). The rest of the variation is due to special causes, which can be controlled.

The insight at the heart of control charts is that variation within subgroups is due to common causes while variation between subgroups is special cause variation. Consecutive parts minimize special cause variation. They are made from adjacent sections of raw material, by the same operator in a similar frame of mind, at similar ambient and coolant temperatures and with the machine near the same maintenance level.

The control chart software plots the mean of each sample and uses the within subgroup variation as the estimate of common variation. These estimates are used to calculate control limits such that points outside the control limits are a reliable indication that the process should be studied and adjusted. If a sample mean falls outside the control limits, adjust the machine. If it falls inside the limits, it is in the range of normal machine variation; leave the process alone. The chart indicates that the machine was “in control” the entire 10 hour run. No adjustments were needed, yet Operator A adjusted the machine 20 times. Common cause variation is not reduced by adjusting the machine. In fact, unnecessary machine adjustments are another source of special cause variation. Operators need to know when a sample indicates that the machine is no longer running well. The control limits (0.995 and 1.005 for mean, 0.0078 for standard deviation) provide the needed screening.
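
As a rough sketch of how such limits are computed, the following Python uses the standard three-sigma Xbar-s chart constants for subgroups of five (A3 = 1.427, B3 = 0, B4 = 2.089). The sample diameters are hypothetical, and the function names are made up for this example; the software behind the chart above may differ in detail.

```python
from statistics import mean, stdev

# Standard Xbar-s chart constants for subgroups of five parts.
A3, B3, B4 = 1.427, 0.0, 2.089

def xbar_s_limits(subgroups):
    """Return (LCL, UCL) for subgroup means and (LCL, UCL) for subgroup standard deviations."""
    grand_mean = mean(mean(g) for g in subgroups)
    s_bar = mean(stdev(g) for g in subgroups)  # average within-subgroup variation
    mean_limits = (grand_mean - A3 * s_bar, grand_mean + A3 * s_bar)
    sd_limits = (B3 * s_bar, B4 * s_bar)
    return mean_limits, sd_limits

def needs_attention(subgroup, mean_limits, sd_limits):
    """True when a sample's mean or standard deviation falls outside its limits."""
    m, s = mean(subgroup), stdev(subgroup)
    mean_ok = mean_limits[0] <= m <= mean_limits[1]
    sd_ok = sd_limits[0] <= s <= sd_limits[1]
    return not (mean_ok and sd_ok)

# Hypothetical diameters (inches) for three half-hour samples of five parts each.
samples = [
    [1.002, 0.998, 1.001, 0.996, 1.003],
    [0.999, 1.004, 0.997, 1.000, 1.001],
    [1.001, 0.995, 1.002, 0.999, 1.003],
]
mean_limits, sd_limits = xbar_s_limits(samples)
print(mean_limits, sd_limits)
print(needs_attention([1.010, 1.012, 1.009, 1.011, 1.013], mean_limits, sd_limits))
```

If the chart uses these constants, the quoted limits are mutually consistent: an upper limit of 0.0078 for the standard deviation implies an average subgroup standard deviation of about 0.0037, and 1.427 × 0.0037 is about 0.005, which matches mean limits of 0.995 and 1.005 around a 1.000 target.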

It is not enough to do your best; you must know what to do, and then do your best. — W. Edwards Deming

 

Photo of two rocks, larger 25 times heavier than smaller.

Drop two rocks

On your next walk pick up two rocks which vary in size; drop them simultaneously. Aristotle, in 350 BCE, stated that the rocks would fall with a speed directly proportional to their weight.

TwoRocks

The large rock on the dinner plate is 25 times heavier than the small one. I dropped them from eye level. If Aristotle was correct, the large one should have hit the ground about the time the small one passed my lips. Instead, they fell at the same speed and hit the ground at the same time. Apparently Aristotle never did this simple test; nor did anyone else until Galileo presented a thought experiment in 1638. For almost 2000 years the entire community of natural philosophers accepted and propagated this delusion, and for good reason: this is how pre-scientific natural philosophy worked. Starting with a given set of truths, the philosopher used deductive reasoning to arrive at new truths. Aristotle’s reasoning was sound; the starting truths were wrong. Once a truth, it stayed a truth.
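
The "passed my lips" estimate is simple arithmetic, sketched below. The drop height of 1.6 m is an assumed eye level (the text does not give one), and air resistance is ignored in the second calculation.

```python
import math

eye_height_m = 1.6   # assumed drop height (eye level); not stated in the text
weight_ratio = 25    # the large rock is 25 times heavier than the small one

# Aristotle: speed is proportional to weight, so in the time the large rock
# covers the full drop, the small rock covers only 1/25 of it.
small_rock_fall_m = eye_height_m / weight_ratio

# What actually happens (ignoring air resistance): both rocks take the same time.
g = 9.81  # m/s^2
fall_time_s = math.sqrt(2 * eye_height_m / g)

print(f"Aristotle: small rock has dropped {small_rock_fall_m * 100:.1f} cm")  # 6.4 cm
print(f"Observed: both land after about {fall_time_s:.2f} s")                 # 0.57 s
```

A drop of roughly 6 cm from eye level is about the distance from the eyes to the lips, which is where Aristotle's rule would put the small rock when the large one lands.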

Bias: we all have it.
In 11 of the past 20 years the Gallup poll has asked the following question:
Which of the following statements comes closest to your views on the origin and development of human beings –

  • human beings have developed over millions of years from less advanced forms of life, but God guided this process (blue line, mean = 37%),
  • human beings have developed over millions of years from less advanced forms of life, but God had no part in this process (orange line, mean = 12%),
  • God created human beings pretty much in their present form at one time within the last 10,000 years or so? (grey line, mean = 45%)
  • No opinion (yellow line, mean = 6%)

GallopHumanOrigin

The groups have maintained the same rank order in every poll for 20 years. A Newsweek poll in 2007 had similar values and the same rank order.
The rocks I dropped as part of the Galileo story are millions of years old. This information should eliminate the top line in the chart and raise questions about the second line, but it hasn’t and it won’t for a very long time. The 80% of the population in the top two lines includes teachers, doctors, sales people, business owners, and executives who routinely make valid data-based decisions on other issues. They are friends and neighbors. It is critical to understand that we each have beliefs which bias our data decisions.

If an analyst has made a choice, he has also made a value judgment – Jonathan Koomey