

Excellent point.
For a long time I have thought vocabulary alone would be enough of a fingerprint to ID someone, if you had a large enough sample of their writing ofc. It’s like browser fingerprints. The words you use, and how often you use them, are a fingerprint. As UnknowableNight points out, some patterns are nearly unique, almost enough on their own. Yet even without those, you have enough signals. Sentence length. Whether you spell colour or color. Regional expressions. Word use frequency. Whether you bring in vocabulary used mostly in a certain profession, like medicine or law. Whether you write in paragraphs or in one-liners. None alone is enough. All together, with the 100 other ones smart people can figure out? Probably enough.
Years ago it would have been too much effort, only worth it for targeted cases. Today? Maybe you can do it dragnet-style, trying to ID every person who writes online.
I do not know if that happens today. Yet I do not see anything to stop it.
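To make the fingerprint idea concrete, here is a rough sketch of how a few of those signals (word frequency, sentence length, colour vs. color) could be turned into a vector and compared between two writing samples. Everything in it is an assumption for illustration: the word lists, the scaling constant, and the function names `features` and `cosine` are made up, and real stylometry would use far more features and far more text.

```python
import math
import re
from collections import Counter

# Hypothetical, tiny word lists chosen only for illustration.
FUNCTION_WORDS = ["the", "of", "and", "to", "a", "in", "that", "is", "it", "yet"]
BRITISH = {"colour", "favourite", "organise", "centre"}
AMERICAN = {"color", "favorite", "organize", "center"}


def features(text):
    """Turn a writing sample into a small vector of style signals."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = max(len(words), 1)

    # Relative frequency of each common function word.
    vec = [counts[w] / total for w in FUNCTION_WORDS]

    # Average sentence length in words, scaled down by an arbitrary constant.
    vec.append(len(words) / max(len(sentences), 1) / 50.0)

    # Spelling preference: +1 means all British, -1 all American, 0 unknown.
    brit = sum(counts[w] for w in BRITISH)
    amer = sum(counts[w] for w in AMERICAN)
    vec.append((brit - amer) / max(brit + amer, 1))
    return vec


def cosine(a, b):
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0
```

You would call `cosine(features(sample_a), features(sample_b))` and treat a higher score as "more likely the same writer". With samples this short and features this few, the score means very little. The point is only that each habit becomes one more number in the vector.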



They probably only need a reliable IRL ID for one of them. That’s a weaker requirement than posting under your name. Your name can be discovered other ways. For example, via browser fingerprinting, where the fingerprint is also associated with a “KYC” login elsewhere. There is a whole industry for using non-name signals to ID people. Big data is powerful.
Ofc there are ways to frustrate that. Yet the attacker only has to win once. The defender has to win every time.
But it will be statistical in nature. They’ll have some confidence attached to it. That could be very low, or quite high. Depends on how much you have disclosed online.
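One way to picture that confidence is to multiply up weak signals, naive-Bayes style. This is a sketch, not a claim about how anyone actually does it: it assumes the signals are independent (they usually are not), and the prior and the likelihood ratios below are invented numbers. The function name `posterior_probability` is hypothetical too.

```python
def posterior_probability(prior, likelihood_ratios):
    """Combine a prior with independent likelihood ratios into a probability.

    prior: probability that a given candidate is the author before any signal.
    likelihood_ratios: for each signal, how many times more likely it is to
    show up if the candidate is the author than if they are not.
    """
    odds = prior / (1 - prior)
    for lr in likelihood_ratios:
        odds *= lr  # each independent signal scales the odds
    return odds / (1 + odds)


# Invented example: one candidate out of a million possible writers.
prior = 1e-6
one_signal = posterior_probability(prior, [20])
many_signals = posterior_probability(prior, [20, 10, 50, 30, 15])
```

With these made-up numbers a single signal leaves the match wildly improbable, while five modest ones together push it to roughly four in five. More disclosed text means more signals, and so higher confidence.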