Always look for and analyze the causes of missing data. Causes can be related to respondents' attributes (if you don’t have a child it is hard to answer a question on baby food) or to a poorly designed measurement tool (an unintelligible question is difficult to answer).

The five questions with the highest number of missing values are charted below. Read the questions and see if you can spot a potential problem. Then think up a way to test that assumption.

People may not have a roof rack or a garden (for the drought-resistant plants and compost questions) and they may abstain from coffee. If a question does not apply to a respondent, the only options are to not answer (i.e. a missing value) or to select Never. The following chart indicates that questions with high counts of missing values also have high counts of Never responses, suggesting that they may be poor questions.
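One way to test that assumption is to tally, for each question, how many answers are missing and how many are Never, then see whether the two counts rise together. The sketch below is only an illustration: the question names and responses are made up, and it assumes missing answers are stored as `None`.

```python
# Hypothetical responses; None marks a missing answer.
# Question names and values are invented for illustration only.
responses = {
    "roof_rack": ["Never", None, "Never", None, "Often", "Never"],
    "compost": ["Never", None, "Sometimes", "Never", None, "Often"],
    "recycle_paper": ["Often", "Always", "Often", "Sometimes", None, "Always"],
    "turn_off_lights": ["Always", "Often", "Always", "Always", "Often", "Sometimes"],
}


def summarize(answers):
    """Return (missing count, Never count) for one question."""
    missing = sum(1 for a in answers if a is None)
    never = sum(1 for a in answers if a == "Never")
    return missing, never


summary = {question: summarize(answers) for question, answers in responses.items()}

# List questions from most to fewest missing values.
for question, (missing, never) in sorted(summary.items(), key=lambda kv: -kv[1][0]):
    print(f"{question:16s} missing={missing} never={never}")
```

If the questions at the top of that list also carry the most Never responses, that is consistent with the idea that the question simply does not apply to many respondents.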

This effect is much stronger than the figure indicates. Bad data transformations, discussed below, significantly reduced the number of Never responses.

Missing values affect subsequent analyses. The Likert scales are composites of several Likert items. How are the scales calculated when some items are missing? Two common options are to only calculate scales for respondents who answered all items in that scale or to fill in the missing values with a representative value (e.g. the average for all folks who did respond or the average of the items in that scale which the respondent answered, or the grand average of all responses, or…). Many of the 21 questions have missing values, yet the three scales based on those questions (Most Convenient, Most Economical, and Most Environmentally Beneficial) have none. The missing item values must therefore have been plugged with a representative value when pooled into the three scales. There is no information available on how the plugged values were calculated.
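To make the options concrete, here is a minimal sketch of three of them. It is not the method used for this survey (that is unknown); the item values, the 1 to 5 coding, the use of `None` for missing answers, the choice to score a scale as the mean of its items, and the function names are all assumptions for illustration.

```python
from statistics import mean

# Hypothetical Likert items (coded 1-5) for one three-item scale.
# Each inner list is one respondent; None marks a missing answer.
respondents = [
    [4, 5, 3],
    [2, None, 4],
    [None, None, 1],
]


def complete_case_score(items):
    """Score only respondents who answered every item; otherwise None."""
    return mean(items) if all(v is not None for v in items) else None


def person_mean_score(items):
    """Average the items this respondent did answer."""
    answered = [v for v in items if v is not None]
    return mean(answered) if answered else None


def item_mean_scores(rows):
    """Fill each gap with that item's mean across respondents, then average.

    Assumes every item was answered by at least one respondent.
    """
    columns = list(zip(*rows))
    item_means = [mean(v for v in col if v is not None) for col in columns]
    return [
        mean(m if v is None else v for v, m in zip(row, item_means))
        for row in rows
    ]


complete = [complete_case_score(r) for r in respondents]
person = [person_mean_score(r) for r in respondents]
by_item = item_mean_scores(respondents)
```

With this toy data the second respondent gets no score under the first option, a score of 3 under the second, and roughly 3.67 under the third. The choice of method changes the scale values, which is why it matters that the method here is undocumented.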

