What is a sensible group to describe?

Last updated 3 years ago

Context note: this is a sub-part of the fundamental concepts of statistics section of the computational literacy for humanities and social sciences course. You can use it to teach yourself some fundamental concepts of statistics. However, if you want to understand more broadly when you might want to use them, you're better off going through the whole course.

In the previous section, when talking about representativeness, I noted that if one wants to get an idea of the height distribution of Finns, one might want to sample both men and women, as well as people from different backgrounds. In the assignment section, I questioned whether it is even possible or sensible to create a representative sample of a "language as a whole". What all of this is leading up to is the question: when is it meaningful to describe a group as a single entity?

Usually, the answer is that we would prefer to describe groups that are as homogeneous as possible. For example, if sex, country of origin and economic status all affect adult height, we'd ideally like to calculate separate descriptions for every combination of these. This way, we'd know not only that the average height of a Finn is 173cm (which, to be properly interpreted, requires at least knowing that 50.68% of the people included in this calculation are female), but also that the average height of Finnish females is 166cm, while the average height of Finnish males is 180cm.
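The link between the overall figure and the subgroup figures is simple arithmetic: the overall mean is the average of the subgroup means, weighted by each group's share of the population. The following is a minimal Python sketch using the rounded numbers quoted above. The helper name `mixture_mean` is purely illustrative, and the 60% female share in the second call is a made-up what-if, not real data.

```python
def mixture_mean(share_female: float, mean_female: float, mean_male: float) -> float:
    """Overall mean of a two-group population: each group mean weighted by its share."""
    return share_female * mean_female + (1 - share_female) * mean_male


# Numbers quoted in the text: 50.68% female, means of 166cm and 180cm.
actual = mixture_mean(0.5068, 166.0, 180.0)
print(f"overall mean with 50.68% women: {actual:.1f} cm")

# Hypothetical what-if: the same subgroup means, but a different mix of people.
what_if = mixture_mean(0.60, 166.0, 180.0)
print(f"overall mean with 60% women (hypothetical): {what_if:.1f} cm")
```

Because the subgroup means are rounded, the first result (about 172.9cm) matches the quoted 173cm only approximately. The second call illustrates why the single overall figure cannot be interpreted without knowing the composition of the group: the same two subgroup means produce a different overall average as soon as the mix changes.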

This also holds for our age at death data. If we split the data by sex, we can see that the distributions for males and females are decidedly different:

[Figure: Age at death distributions for Finnish males and females]

Looking at these distributions, one is immediately tempted to compare them. And indeed, that is the subject of the next section.
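Mechanically, splitting data into groups and describing each group separately is straightforward. The following is a minimal Python sketch that uses only the standard library. The records are invented for illustration and are not the age at death data discussed above, and `describe_by_group` is a hypothetical helper name.

```python
from collections import defaultdict
from statistics import mean, median

# Hypothetical example records: (sex, age at death in years).
# These numbers are invented for illustration; they are NOT the course data.
records = [
    ("female", 84), ("female", 91), ("female", 79), ("female", 88),
    ("male", 72), ("male", 80), ("male", 68), ("male", 77),
]


def describe_by_group(rows):
    """Split (group, value) pairs by group and describe each group separately."""
    groups = defaultdict(list)
    for group, value in rows:
        groups[group].append(value)
    return {
        group: {"n": len(values), "mean": mean(values), "median": median(values)}
        for group, values in groups.items()
    }


# Description of everyone pooled together, for comparison.
everyone = [age for _, age in records]
print("all:", {"n": len(everyone), "mean": mean(everyone), "median": median(everyone)})

# Separate description for each group.
for group, summary in describe_by_group(records).items():
    print(f"{group}:", summary)
```

In this toy data, the pooled mean (79.875) lies between the female mean (85.5) and the male mean (74.25) and describes neither group particularly well. To describe every combination of several attributes, such as sex, country of origin and economic status, the group key could be a tuple like `("female", "Finland", "high income")`. The rest of the function stays unchanged.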