Data are often transformed before analysis. The transformation can be as simple as normalizing or reshaping the data, or as far-reaching as the one applied here. Each transformation needs validation. The data in this dashboard were collapsed into a single summary row for each unique combination of State, Gender, Education, and Question. Each summary row records the number of responses that were pooled together and the mean of those pooled responses.
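
The collapse described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code used to build the dashboard: the function name `summarize` and the individual responses in `records` are hypothetical, chosen so that the pooled results agree with two rows of the Minnesota table.

```python
from collections import defaultdict

# Hypothetical individual-level records:
# (state, gender, education, question, response).
# The responses are invented for illustration; they are NOT the survey's data.
records = [
    ("Minnesota", "Female", "Bachelors degree", "Cloth napkins/towels", 2),
    ("Minnesota", "Female", "Bachelors degree", "Cloth napkins/towels", 3),
    ("Minnesota", "Female", "Bachelors degree", "Cloth napkins/towels", 3),
    ("Minnesota", "Female", "Bachelors degree", "Cloth napkins/towels", 3),
    ("Minnesota", "Male", "Bachelors degree", "Cloth napkins/towels", 3),
    ("Minnesota", "Male", "Bachelors degree", "Cloth napkins/towels", 4),
]


def summarize(rows):
    """Collapse rows into one (group size, mean) pair per unique key."""
    pooled = defaultdict(list)
    for state, gender, education, question, response in rows:
        pooled[(state, gender, education, question)].append(response)
    # Only the count and the mean survive; the individual responses are dropped.
    return {
        key: (len(values), sum(values) / len(values))
        for key, values in pooled.items()
    }


summary = summarize(records)
for key, (size, mean) in summary.items():
    print(*key, size, f"{mean:.2f}", sep=" | ")
```

Once `summarize` has run, nothing in `summary` distinguishes these invented responses from any other set of four answers that sums to 11.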

As shown in the chart at right, this process loses the individual data. The distortion becomes more pronounced as the group size increases. The left-hand table below presents the aggregated data from Minnesota for the question “Do you use Cloth Napkins/Towels”. Record 7 is a single row representing the 4 Females with a Bachelors degree in Minnesota. The position of this group on the chart is marked with a pointer. All we know is that there were four of them and that their answers on the cloth napkin question summed to 11, which gives a mean response of 2.75. The right-hand table below shows the seven possible combinations of responses (integers from 1 to 5) that sum to 11. Any one of these could be the original data for record 7. Each of the Minnesota Female BAs is now reported as having a response of 2.75, which is not a valid response on the survey form. If the individual data were available and plotted on the chart, every point would lie on one of the integer y-axis lines.

Record  Gender  Education                Group size  Mean
1       Male    High school graduate     3           4.00
2       Male    Some college, no degree  1           3.00
3       Male    Bachelors degree         2           3.50
4       Male    Masters, JD, MD or PhD   1           1.00
5       Female  Some college, no degree  1           2.00
6       Female  Associates degree        1           1.00
7       Female  Bachelors degree         4           2.75
8       Female  Masters, JD, MD or PhD   1           2.00

Which row is the original? Possible responses for record 7 in the left-hand table

Row  Responses
1    1  1  4  5
2    1  2  3  5
3    1  2  4  4
4    1  3  3  4
5    2  2  2  5
6    2  2  3  4
7    2  3  3  3
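
The seven candidate rows for record 7 can be checked by brute force. The sketch below assumes only what the text states (four respondents, integer responses from 1 to 5, a total of 11) and treats the order of respondents as irrelevant. The function name `candidate_responses` is hypothetical.

```python
from itertools import combinations_with_replacement


def candidate_responses(group_size, total, scale=range(1, 6)):
    """Return every unordered set of responses on the scale that sums to total."""
    return [
        combo
        for combo in combinations_with_replacement(scale, group_size)
        if sum(combo) == total
    ]


# Record 7: four respondents whose answers sum to 11 (mean 2.75).
candidates = candidate_responses(group_size=4, total=11)
for number, combo in enumerate(candidates, start=1):
    print(number, *combo)
```

The same function can be called with other group sizes and totals to see how many different sets of original responses would produce a given summary row.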
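The transformation therefore limits what the data can support: group sizes and group means survive, but the individual responses and their spread within each group cannot be recovered from the summary rows.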