Month: March 19, 2018

Interpreting uninterpretable P-values.

Lately, I’ve been trying to learn more about open science and how it relates to research I’ve done, research I’d like to do, and sociolinguistics in general. One topic that comes up regularly when talking about open science is pre-registration. For those who aren’t familiar with this process, pre-registration refers to publishing a detailed, time-stamped description of your research methods and analyses in a repository before ever actually looking at your data. Doing so increases the transparency of the research and helps the researcher avoid P-hacking, aka data fishing1. There are apparently some arguments against pre-registering research, but I’ve yet to see any that don’t mischaracterize what pre-registration actually is, so doing it seems like a no-brainer.

But in looking into the actual mechanics behind producing a pre-registration, I ended up watching the following webinar from the Center for Open Science (COS) about using their Open Science Framework (OSF) to publish pre-registrations, which included this curious description of how to interpret P-values in different kinds of research2:

Basically, the claim is that pre-registration makes it clear which analyses are confirmatory3 and which are exploratory, which is great, but the other part of the claim is that P-values are uninterpretable in exploratory research. In other words, any P-values that are generated through analyses that weren’t pre-registered, i.e. through data fishing, are meaningless.
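To make the worry behind that claim concrete, here is a small simulation. It is a toy sketch of my own, not anything from the webinar, and it rests on hypothetical assumptions: 100 variables, each compared between two groups of 50 values drawn from the same normal distribution (so no real relationship exists anywhere), tested with a two-sided z-test that assumes a known standard deviation of 1. The function names `z_test_p_value` and `fish_for_effects` are made up for illustration.

```python
import math
import random


def z_test_p_value(a, b):
    """Two-sided p-value for a difference in means, assuming both samples
    come from normal distributions with a known standard deviation of 1."""
    diff = sum(a) / len(a) - sum(b) / len(b)
    z = diff / math.sqrt(1 / len(a) + 1 / len(b))
    # Two-sided tail probability of the standard normal distribution.
    return math.erfc(abs(z) / math.sqrt(2))


def fish_for_effects(n_variables=100, n_per_group=50, alpha=0.05, seed=2018):
    """Test many variables on data where no real relationship exists and
    return the indices of the variables that come out 'significant'."""
    rng = random.Random(seed)
    hits = []
    for i in range(n_variables):
        # Both groups come from the same distribution: the null is true.
        a = [rng.gauss(0, 1) for _ in range(n_per_group)]
        b = [rng.gauss(0, 1) for _ in range(n_per_group)]
        if z_test_p_value(a, b) < alpha:
            hits.append(i)
    return hits


hits = fish_for_effects()
print(f"{len(hits)} of 100 null variables had p < 0.05")
```

Typically a handful of the 100 variables come out "significant" even though none of them has any real effect, roughly the 5% per test that α = 0.05 allows. That is why a P-value turned up by fishing through many analyses carries less weight than one from a single pre-registered test.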

I can understand why this point is made, but I think it’s a bad point. Pre-registration does seem to create another level in the hierarchy of types of research, running from exploratory (observational, not pre-registered) to confirmatory (observational, pre-registered) to causal (experimental), but I see no reason why P-values would be uninterpretable at the exploratory level. P-values seem perfectly valid at all levels; what changes is how they should be interpreted, not whether they can be interpreted at all. To me, in experimental research, a P-value helps one argue for a causal relationship. In confirmatory observational studies, a P-value helps one argue that some relationship exists, though not necessarily a causal one. And in exploratory observational research, a P-value simply suggests that there might be a relationship, one that should be explored further in future research.

In the case of my thesis, I did use P-values via Fisher’s exact test of independence, but I didn’t pre-register my analyses. That’s not to say that all my analyses were exploratory, just that I have no proof that I wasn’t data fishing. Indeed, I included variables that didn’t make any sense to include at all4, such as the person who coded each token of my linguistic variable, (lol), and yet the relationship between that coder variable and how (lol) was realized somehow still turned out to be statistically significant. The webinar initially made me panic a bit, and I asked myself whether it had been irresponsible to include P-values in my analyses, but after further reflection, I think it was completely justified. Most of my analyses were confirmatory anyway, even though I can’t prove it, and the arguably exploratory ones were still more useful to report with P-values, as long as I also explained how to interpret those P-values, which is perhaps the one place where I could’ve done better.
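For readers who haven’t met Fisher’s exact test, here is a minimal pure-Python sketch of the two-sided test for a 2×2 table. It is an illustration, not the code behind the thesis, and the function name `fisher_exact_2x2` and the counts in the example are hypothetical. Tables with more rows or columns (say, several coders crossed with several variants of (lol)) need the generalized Freeman-Halton version, which this sketch does not cover.

```python
from math import comb


def fisher_exact_2x2(table):
    """Two-sided Fisher's exact test for a 2x2 table [[a, b], [c, d]]."""
    (a, b), (c, d) = table
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, given that
        # the row and column totals are fixed at their observed values.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_observed = prob(a)
    # Range of values the top-left cell can take under the fixed totals.
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Sum over every table no more likely than the observed one; the small
    # relative tolerance keeps exact ties from being lost to rounding error.
    p_value = sum(
        p for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-7)
    )
    return min(1.0, p_value)


# Hypothetical counts: rows = coder A, coder B; columns = lol, mdr.
print(fisher_exact_2x2([[3, 1], [1, 3]]))  # ≈ 0.486
```

Summing every table that is no more likely than the observed one is a common way of defining the two-sided P-value for this test; it makes no large-sample approximation, which is why the test suits small counts.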

Ultimately, while I can understand why there’s so much focus on data fishing as a negative thing, I think it’s important to not overshoot the mark. P-values can certainly be misused, but that misuse seems to come down to not providing enough information to allow the reader to properly interpret them, not to whether they were included when they shouldn’t have been.


1. I prefer the term data fishing, which can be more easily taken in both a negative and a positive way, whereas P-hacking sounds like it’s always negative to me. The Wikipedia article on data fishing gives a pretty clear explanation of what it is, for those who are unaware.
2. The webinar is really good, actually. I would suggest that anyone who’s new to open science watch the whole thing.
3. In this case, the speaker seems to be using the term “confirmatory research” to mean something different from “causal research”; otherwise, their description doesn’t make any sense.
4. In fact, my thesis advisor didn’t see the point in me including these variables at all.

The importance of anonymizing groups under study.

It’s been a long time since I’ve written a post here, but I promise, there’s a good reason: I was finishing up my master’s thesis. However, now that it’s submitted, I can talk a bit about what I did.1

Because I used social network analysis to detect communities in the study, there was little motivation to classify subjects by social variables like ethnic group, race, religion, etc. In fact, I couldn’t have done so even if I had wanted to, because I assembled the corpus from tweets sent by some 200k people. Ultimately, the only social variable I used was the ID number of the community to which each subject belonged.
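The post doesn’t say which community-detection algorithm was used, so the sketch below swaps in one of the simplest, label propagation, purely as an illustration. It assumes a hypothetical undirected graph given as a list of (user, user) edges, for example users who mention or reply to each other; the function name and the edge list are made up. Its output is the kind of variable described above: each user ends up with nothing but an integer community ID.

```python
import random
from collections import Counter


def label_propagation(edges, seed=0, max_rounds=100):
    """Detect communities in an undirected graph by label propagation.

    Returns a dict mapping each node to an integer community ID.
    """
    rng = random.Random(seed)
    neighbours = {}
    for u, v in edges:
        neighbours.setdefault(u, set()).add(v)
        neighbours.setdefault(v, set()).add(u)

    nodes = list(neighbours)
    # Every node starts out in a community of its own.
    labels = {node: i for i, node in enumerate(nodes)}
    for _ in range(max_rounds):
        rng.shuffle(nodes)
        changed = False
        for node in nodes:
            # Adopt whichever label is most common among the node's neighbours.
            counts = Counter(labels[n] for n in neighbours[node])
            best = max(counts.values())
            candidates = sorted(lab for lab, c in counts.items() if c == best)
            if labels[node] in candidates:
                continue  # already holds a most common label; leave it alone
            labels[node] = rng.choice(candidates)
            changed = True
        if not changed:
            break

    # Renumber communities 0, 1, 2, ... so the IDs carry no other meaning.
    ids = {}
    return {node: ids.setdefault(labels[node], len(ids)) for node in neighbours}


# Hypothetical mention network: two separate triangles of users.
edges = [("u1", "u2"), ("u2", "u3"), ("u1", "u3"),
         ("u4", "u5"), ("u5", "u6"), ("u4", "u6")]
print(label_propagation(edges))
# {'u1': 0, 'u2': 0, 'u3': 0, 'u4': 1, 'u5': 1, 'u6': 1}
```

Label propagation is fast enough for large graphs but can give different partitions on different runs, which is why the sketch fixes a random seed; modularity-based methods such as Louvain are a common alternative.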

The advantage of this situation is that I completely avoided imposing stereotypes on the subjects, or minimizing the differences between their identities, because I never classified them with people from elsewhere. A typical example of the problem in sociolinguistics is the variable of race. Some celebrated studies, like Labov’s (1966) and Wolfram’s (1969), classified their subjects by race, so that some of them end up identified as African-American, for example. Even if these subjects neither live together nor interact, they inevitably end up being viewed as constituting a single group, and from there, the diverse identities within that group are minimized.

This problem has already been recognized in sociolinguistics, and several solutions have been proposed, mainly the concept of communities of practice and greater reliance on self-identification. For example, Bucholtz (1999) studied a group whose members she identified by an activity: being a member of a club. Unfortunately, she applied a label to the members of this club, calling them “nerds”. This name links them to nerds elsewhere, regardless of the differences between this group and other groups of nerds, so she wasn’t able to avoid minimizing the identity of the group she studied simply by using the concept of communities of practice. Likewise, Eckert (2000) relied on her subjects’ self-identification as either “jocks” or “burnouts”, but one ends up with the same problem: even if the subjects self-identify, they can choose labels that link them to distant groups. Jocks surely exist elsewhere, but those other jocks can be very different from the jocks in Eckert’s study. So one cannot avoid minimizing identities simply by relying on self-identification, either.

In my thesis, I identified communities simply with ID numbers, so I never classified the subjects with other groups to which they didn’t belong. Using social network analysis to detect these communities automatically made it easier to avoid applying labels that could minimize the subjects’ identities, but this is possible in any study, even one that employs classic social variables. In the same way that one anonymizes the identities of individuals, one can anonymize the identities of the groups under study. Why is it necessary to know that the races in a study are “black” and “white” or that the religions are “Jewish” and “Catholic”? If a researcher is interested in how their subjects navigate stereotypes that are relevant to their lives, that’s one thing, but most variationist studies don’t take up this question, so most studies could do more to protect marginalized people.
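Anonymizing groups can be as mechanical as anonymizing individuals. As a minimal sketch, assuming group labels arrive as a plain list of strings (the `anonymize_groups` name and the labels below are hypothetical), each distinct label can be swapped for an opaque code, with the key kept apart from the data, just as one would keep the key linking pseudonyms to real participants.

```python
def anonymize_groups(labels):
    """Replace each distinct group label with an opaque code ("G1", "G2", ...).

    Codes are assigned in order of first appearance. Returns the coded labels
    and the key mapping original labels to codes, which should be stored
    separately from the data, like the key for participant pseudonyms.
    """
    key = {}
    coded = []
    for label in labels:
        if label not in key:
            key[label] = f"G{len(key) + 1}"
        coded.append(key[label])
    return coded, key


# Hypothetical group labels for four subjects.
coded, key = anonymize_groups(["club A", "club B", "club A", "club C"])
print(coded)  # ['G1', 'G2', 'G1', 'G3']
```

If the order of the data itself could reveal which group is which, shuffling the distinct labels before coding them would be a sensible extra step.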


1. For those who don’t know the topic of my thesis, I analyzed the use of the linguistic variable (lol), made up of lol, mdr, etc., on Twitter.


Bucholtz, M. (1999). “Why Be Normal?”: Language and Identity Practices in a Community of Nerd Girls. Language in Society, 28(2), 203–223. https://doi.org/10.1017/s0047404599002043

Eckert, P. (2000). Linguistic Variation as Social Practice: The Linguistic Construction of Identity in Belten High. Malden, MA: Blackwell Publishers, Inc.

Labov, W. (2006). The Social Stratification of English in New York City (2nd ed.). Cambridge, England: Cambridge University Press. (Originally published in 1966)

Wolfram, W. (1969). A sociolinguistic description of Detroit Negro speech. Washington, D.C.: Center for Applied Linguistics.

© 2025 Josh McNeill
