What is a sensible group to describe?

Context note: this is a sub-part of the fundamental concepts of statistics section of the computational literacy for humanities and social sciences course. You can use this to teach yourself some fundamental concepts of statistics. However, if you want to understand more broadly when you might want to use them, you're better off going through the whole course.

In the previous section, when talking about representativeness, I noted that if one wants to get an idea of the height distribution of Finns, one might want to sample from both men and women, as well as from people of different backgrounds. In the assignment section, I questioned whether it is even possible or sensible to create a representative sample of a "language as a whole". All of this leads to one question: when is it meaningful to describe a group as a single entity?

Usually, the answer is that we'd like to describe groups that are as homogeneous as possible. For example, if sex, country of origin and economic status all affect adult height, we'd ideally like to calculate separate descriptions for every combination of these. This way, we'd know not only that the average height of a Finn is 173 cm (which, to be properly interpreted, requires at least the knowledge that 50.68% of those included in the calculation are female), but also that the average height of Finnish females is 166 cm, while the average height of Finnish males is 180 cm.
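To see why the overall average cannot be interpreted without knowing the group shares, here is a minimal sketch that recombines the two group averages quoted above. The numbers come from the text; the variable and function names, and the 70% share used at the end, are my own illustrative assumptions.

```python
# Group averages and the share of females, as quoted in the text.
MEAN_FEMALE_CM = 166.0
MEAN_MALE_CM = 180.0
SHARE_FEMALE = 0.5068


def pooled_mean(share_female: float) -> float:
    """Average height of a mixed group: each group mean weighted by its share."""
    share_male = 1.0 - share_female
    return share_female * MEAN_FEMALE_CM + share_male * MEAN_MALE_CM


# With the share from the text, the pooled mean comes out at about 173 cm.
overall_mean_cm = pooled_mean(SHARE_FEMALE)
print(f"Pooled mean with {SHARE_FEMALE:.2%} females: {overall_mean_cm:.1f} cm")

# Hypothetical sample that is 70% female (an assumption, not from the text):
# the same two group means now give a noticeably lower "average height".
skewed_mean_cm = pooled_mean(0.70)
print(f"Pooled mean with 70% females: {skewed_mean_cm:.1f} cm")
```

The two group averages stay the same in both cases. Only the mix changes, which is why the single pooled figure says little until we know who was included in it.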

This actually holds for our age at death data as well. If we split the data by sex, we can see that the distributions for males and females are decidedly different:
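The split itself can be sketched in a few lines. The records below are invented purely to show the mechanics and are not the course's age at death data, so no conclusions should be drawn from the values; the names `records`, `ages_by_sex` and `summary` are likewise hypothetical.

```python
from collections import defaultdict
from statistics import mean, median

# Made-up (sex, age at death) pairs, for illustration only.
records = [
    ("female", 82),
    ("male", 71),
    ("female", 90),
    ("male", 65),
    ("female", 77),
    ("male", 80),
]

# Collect the ages separately for each sex.
ages_by_sex = defaultdict(list)
for sex, age in records:
    ages_by_sex[sex].append(age)

# Describe each group on its own instead of pooling everything together.
summary = {
    sex: {"n": len(ages), "mean": mean(ages), "median": median(ages)}
    for sex, ages in ages_by_sex.items()
}

for sex, stats in summary.items():
    print(sex, stats)
```

With real data, the same grouping step would come before plotting one distribution per group.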

Looking at these distributions, one is immediately tempted to compare them. And indeed, that is the subject of the next section.
