Why do you talk the way you do?
You’ve lived somewhere—perhaps many somewheres. Your friends or family have influenced you. You’ve probably even thought about how you like some linguistic features and want to avoid others. People have long been aware that these factors influence why different groups speak differently, but the systematic study of dialect began in the eighteenth and nineteenth centuries, as part of the same scientific movements that gave us the Linnaean catalogue of the living world and the periodic table of the elements. While some cataloguers set out with nets to study butterflies, or burned candles inside jars to distill gases, others pored over ancient scripts and compiled lists of verbs.
But what kind of net can you use to capture living language? A German dialectologist named Georg Wenker thought he had an answer: he sent out a postal survey to schoolteachers across German-speaking Europe and asked them to translate forty sentences (such as “I will slap your ears with the cooking spoon, you monkey!”) into the local vernacular. It was a wise enough idea: teachers would be guaranteed to be able to read and write, and even if Wenker didn’t know the name of every single village teacher, surely the village post office in, say, Quedlinburg, could pass on his letter to the Quedlinburg village school. But in order to make it easy for the schoolteachers to respond, Wenker didn’t provide them with any training in phonetic notation. This meant that if one teacher wrote “Affe” (monkey) and another teacher wrote “Afe” or “Aphe,” it was anyone’s guess whether they were trying to represent the same pronunciation.
A French linguist named Jules Gilliéron thought he had a better method. Rather than send out letters like Wenker, he’d send out a trained fieldworker to administer all the surveys. Back in Paris, Gilliéron could get a start on analyzing the results as they came in. The fieldworker he selected was a grocer named Edmond Edmont, who reportedly had a particularly astute ear (it’s not clear whether this referred to the acuity of his hearing or his attention to phonetic detail, but either way, it got him the job). Gilliéron trained Edmont in phonetic notation and sent him off on a bicycle with a list of 1,500 questions, such as, “What do you call a cup?” and “How do you say the number fifty?” Over the next four years, Edmont cycled to 639 French villages, sending results back to Gilliéron periodically. In each village, he interviewed an older person who had lived in the region for their entire life, counting them as representative of the history of the area.
Both Wenker’s and Gilliéron’s dialect maps are meticulous, fascinating, and complicated, but if you know how to read them, you can trace the line between the villages in the north where French people around 1900 called Wednesday mercredi and those in the south where they called it dimècres. Or you can read Wenker’s hand-drawn map of Germany showing which regions pronounced “old” as alt, al, or oll. If you studied French or German in school, it’s easy to think that they’re each a single, unitary language, but that’s just the formal version: the maps showcase how these languages are truly constellations of dialects, hundreds of varieties that differ slightly from village to village.
But these spectacular linguistic atlases are also limited. If Edmond Edmont, toward the end of his four-year odyssey, realized that different regions also had different words for bicycle, he either had to bike back through those same 639 French villages, or make a note of it and just hope that some future scholar would undertake a second linguistic Tour de France. Georg Wenker’s project was almost too successful: he ended up with more than 44,000 completed surveys between 1876 and 1926, more than he could possibly analyze by hand. (His colleagues continued analyzing his results for decades after his death.) These days, a reasonably competent data miner can code up a county-level map of where Americans tweet “pop” versus “soda,” where they switch from “y’all” to “you guys,” or which states prefer which swear words—all in less time than it took Edmond Edmont to bike from Paris to Marseille.
As technology has advanced, so has dialectology. In the 1960s, the Dictionary of American Regional English sent out fieldworkers in “Word Wagons” (green Dodge vans outfitted with a fold-out bed, an icebox, and a gas stovetop) to record locals in over a thousand communities on briefcase-sized reel-to-reel tape recorders. In the 1990s, the creators of The Atlas of North American English let their fingers do the biking and conducted telephone interviews with 762 random people, at least two from each major urban area. In 2002, the Harvard Dialect Survey produced a linguistic questionnaire that anyone could complete online: thanks to media coverage in the New York Times, USA Today, and many other outlets, over 30,000 people did.
But if you’ve ever hung up on a telemarketer or fudged your answers to a “Which Disney Princess Are You?” quiz, you know some of the potential problems with phone and internet surveys. On the phone, researchers could record audio, but they still had to have an individual conversation with each person they surveyed. While operating a Word Wagon or a linguistic phone bank is a fascinating job for the right type of language nerd (um, hi), such nerds still need to be paid for the massive amounts of time and labour they’re putting into the interviews. Internet surveys are faster and cheaper to conduct at a huge scale, but people still don’t always accurately report on their own language usage.
It’s not completely hopeless: linguists have devised several methods for getting at more natural-sounding speech. One is to ask open-ended questions (“Could you describe your family?” rather than, “How do you pronounce ‘aunt’?”). Another is to ask about an exciting or emotional event, to get people thinking about the content rather than the words (a popular, though perhaps rather morbid, question is, “Can you tell me about a time you thought you might die?”). A third is to work with a community as an insider: many a linguist has analyzed the speech of their own children, grandparents, or extended family, or else worked with a local collaborator to conduct interviews. The Word Wagon linguists would even carry small notebooks, in case they overheard any interesting language at the grocery store, so that they’d remember to follow up on it when they got the tape recorder out.
But one particularly effective way of getting at unselfconscious speech is to look for it on the internet. Not only can researchers look at countless examples of public, informal, unselfconscious language, from videos to blog posts, but in many cases, those examples are also searchable. No more hours of transcribing audio files, hoping for a few examples. Twitter is particularly valuable: even the most casual of searchers can look for a word or phrase and form an impression of how people are using it. They might notice that a lot of people who used “smol” in 2018 also appeared to be fans of anime or cute animals, or that “bae” was used primarily by African Americans until around 2014, when it started appearing in tweets by white people, only to get co-opted by brands shortly thereafter.
Dialect maps are just the beginning of our linguistic differences: every time we talk with some people more than others, we have the chance to develop a shared vocabulary, whether within families, friends, schools, workplaces, hobbies, or other organizations. Family dialects are often inspired by a cute word that comes out of a kid’s mouth (Queen Elizabeth II was apparently nicknamed “Gary” by a young Prince William, who was unable to say “Granny” yet), but the peak importance of in-group language happens at a later life stage: teenagehood.
Remember how you learned about swearing? It was probably from a kid around your age, maybe an older sibling, and not from an educator or authority figure. And you were probably in early adolescence: the stage when linguistic influence tends to shift from caregivers to peers. Linguistic innovation follows a similar pattern, and the linguist who first noticed it was Henrietta Cedergren. She was doing a study in Panama City, where younger people had begun pronouncing “ch” as “sh”—saying chica (girl) as shica. When she drew a graph of which ages were using the new “sh” pronunciation, Cedergren noticed that sixteen-year-olds were the most likely to use the new version—more likely than the twelve-year-olds were. So did that mean that “sh” wasn’t the trendy new linguistic innovation after all, since the youngest age group wasn’t really adopting it?
Cedergren returned to Panama a decade later to find out. The formerly un-trendy twelve-year-olds had grown up into hyperinnovative twenty-two-year-olds. They now had the new “sh” pronunciation at even higher levels than the original trendy cohort of sixteen-year-olds, now twenty-six-year-olds, who sounded the same as they had a decade earlier. What’s more, the new group of sixteen-year-olds was even further advanced, and the new twelve-year-olds still looked a bit behind. Cedergren figured out that twelve-year-olds still have some linguistic growth to do: they keep imitating and building on the linguistic habits of their slightly older, cooler peers as they go through their teens, and then plateau in their twenties.
In terms of swearing, that’s like saying some twelve-year-olds swear, but a lot more sixteen-year-olds do. But swearing is very socially salient (we have laws about it!) and not really changing that much. It’s been peaking in adolescence and declining through adulthood for decades. The other trendy linguistic features that we acquire in adolescence (new pronunciations like shica, and innovative uses of words like so and like) are a case of subtle social discernment rather than massive social taboo, so we tend to keep them as adults.
This age curve is important when we think about when young people start using social media: age thirteen, if you believe the terms of service of most sites and apps, or slightly younger, if you assume that some users lie about their ages. This is right at the beginning of the age range when the language of teens is tremendously influenced by the slang of their peers. Sure, little kids play games and watch videos and even ask questions of voice assistants, but their social lives are still mediated by their families and their reading level. This coincidence of peer influence and social-media access means that it’s easy to conflate how the youth are talking now with the tools that they’re using to do so. But every generation has talked slightly differently from its parents: otherwise, we’d all still be talking like Shakespeare. The question is, how much of that is influenced by technology, and how much is the linguistic evolution that would have happened regardless?
Researchers from Georgia Tech, Columbia, and Microsoft looked at how many times a person had to see a word in order to start using it, using a group of words that was distinctively popular among Twitter users in a particular city in 2013–2014. As we’d expect, they noticed that people who follow each other on Twitter are likely to pick up words from each other. But there was an important difference in how people learned different kinds of words. People sometimes picked up words that are also found in speech—like “cookout,” “hella,” “jawn,” and “phony”—from their internet friends, but it didn’t really matter how many times they saw them.
For rising words that are primarily written, not spoken—abbreviations like “tfti” (thanks for the information), “lls” (laughing like shit), and “ctfu” (cracking the fuck up) and phonetic spellings like “inna” (in a / in the) and “ard” (alright)—the number of times people saw them mattered a lot. Every additional exposure made someone twice as likely to start using them. The study pointed out that people encounter spoken slang both online and offline, so when we’re only measuring exposure via Twitter, we miss half or more of the exposures, and the trend looks murky. But people mostly encounter the written slang online, so pretty much all of those exposures become measurable for a Twitter study. The researchers also found that you’re more likely to start using a new word from Friendy McNetwork, who shares a lot of mutual friends with you, and less likely to pick it up from Rando McRandomFace, who doesn’t share any of your friends, even if you and Rando follow each other just like you and Friendy do.
But these networks aren’t formed in isolation: people tend to follow others with similar interests and demographics. One study demonstrating this looked at the geographic spread of a couple thousand words that became massively more popular on Twitter between 2009 and 2012. It found that terms tended to leapfrog from one city to another based on demographic similarity, not just geographic proximity. So slang would spread between Washington, D.C., and New Orleans (both have high proportions of black people), Los Angeles and Miami (high proportions of Hispanic people), or Boston and Seattle (high proportions of white people), but not necessarily the cities in between. For example, the abbreviation “af” for “as fuck” (as in, “word maps are cool af”) starts out at low levels in Los Angeles and Miami in 2009, then spreads elsewhere in California, the South, and around Chicago in 2011–2012, suggesting that it was spreading from Hispanic to African American populations. The study stops there, but we can continue: in 2014 and 2015, “af” started appearing in BuzzFeed headlines, a decent measure of when it came to be co-opted by mainstream brands capitalizing on its association with African American coolness.
Analyzing language based on social networks also complicates another traditional demographic check box: gender. The traditional finding for gender is shown in a study by the linguists Terttu Nevalainen and Helena Raumolin-Brunberg at the University of Helsinki, which looked at 6,000 personal letters written in English between 1417 and 1681. Personal letters make a great corpus because, like tweets, they don’t go through editorial standardization. Unfortunately, there are also a lot fewer of them, and they tend to overrepresent the leisured, educated classes. But they’re still the best record we have of what day-to-day English looked like back then.
The linguists examined fourteen language changes that occurred during this period, things like the eradication of “ye,” the switch from “mine eyes” to “my eyes,” and the replacement of -th with -s, making words like “hath,” “doth,” and “maketh” into “has,” “does,” and “makes.” (Pretty shocking stuff.) For eleven out of the fourteen changes, Nevalainen and Raumolin-Brunberg found that female letter-writers were changing the way they wrote faster than male letter-writers. In the three exceptional cases where the men were ahead of the women, those particular changes were linked to men’s greater access to education at the time. In other words, women are reliably ahead of the game when it comes to word-of-mouth linguistic changes.
Research in other centuries, languages, and regions continues to find that women lead linguistic change, in dozens of specific changes in specific cities and regions. Young women are also consistently on the bleeding edge of those linguistic changes that periodically sweep through media trend sections, from uptalk (the distinctive rising intonation at the end of sentences?) to the use of “like” to introduce a quotation (“And then I was like, ‘Innovation’”). The role that young women play as language disruptors is so clearly established at this point that it’s practically boring to linguists who study this topic: in a paper he wrote in 1990, well-known sociolinguist William Labov estimated that women lead 90 percent of linguistic change. (I’ve attended more than a few talks at sociolinguistics conferences about a particular change in vowels or vocabulary, and the gender pattern barely gets even a full sentence of explanation: “And here, as expected, we can see that the women are more advanced on this change than the men. Next slide.”) Men tend to follow a generation later: in other words, women tend to learn language from their peers; men learn it from their mothers.
What’s less clear is why. Lots of reasons have been proposed: that women still dominate the caregiving of children in the societies studied, that women may pay more attention to language to compensate for a relative lack of economic power or to facilitate social mobility, and that women tend to have more social ties. But in many cases, gender (like age) seems to be a proxy for other factors related to how we socialize with each other.
Several internet studies have highlighted the importance of differentiating between gender and social context. One study, looking at a corpus of 14,000 Twitter users and guessing their gender based on the skew of their first name in census data, appeared at first glance to show clear gender differences: people with predominantly female names were more likely to use emoticons, for example, while people with male-associated names were more likely to swear. But when the researchers looked one step further, they found that the words people most often tweeted formed natural clusters into over a dozen interest groups, such as sports fans, hip-hop fans, parents, politics buffs, TV and movie fans, techies, book fans, and so on.
True, many of the groups had a gender skew, but none of them were absolute, and they also had clear associations with other demographic factors like age and race. Sometimes whole groups defied gender norms—men overall tended to swear more, but techies, a cluster that was male-dominated, didn’t swear much at all, presumably because they were using Twitter as an extension of the workplace. At the individual level, people followed the norms of their clusters rather than their genders—a woman in the sports cluster or a man in the parenting cluster tweeted like their fellow sports fans or parents, rather than like an “average woman” or “average man.” Moreover, restricting the analysis to accounts with names that showed a clear gender skew in census data excludes precisely those users that would complicate a binary view of gender, such as nonbinary people and others who’ve deliberately chosen non-census-gendered usernames.
Offline, ethnographic research has also pointed to the importance of network factors. Linguist Lesley Milroy was doing a pretty standard study of language change in a couple working-class neighbourhoods of Belfast, Northern Ireland. As with many communities, the young women were leading a linguistic change—in this case, changing the vowel in “car” to sound more like “care.” This vowel is common elsewhere in Northern Ireland, but it was new to this particular community, and it was the young women who were bringing it in. What was mystifying was how they were getting it. When Milroy asked the women who they were close to, they named friends, family, and coworkers, all from their neighbourhood—the same neighbourhood where no one else yet had this vowel change.
In a later paper, she and James Milroy figured out why by linking linguistic change to another concept in social science: strong and weak ties. Strong ties are people you spend a lot of time with and feel close to, whom you share mutual friends with; weak ties are acquaintances whom you may or may not share mutual ties with. In the case of the Belfast study, the early-adopting young women all worked at the same store in the city centre, where people were already using the new vowel. Although they didn’t have close friends from the city centre, they did have weak-tie contact with customers, which would have often exposed them to the new vowel, more than it would have the young men of their neighbourhood, who weren’t employed outside it. Milroy and Milroy figured that, just as your weak ties are a greater source of new information like gossip and employment opportunities than your close friends who already know the same things you do, more weak ties also lead to more linguistic change. If everyone you know already knows one another, your only source of new linguistic forms is random variation: you don’t have any weak ties to borrow from.
But weak ties can’t be the only factor. After all, it’s also clear that we talk like people in our social circles. How can both strong and weak ties be responsible for how we speak? And how can we map out exactly who says what to whom over a large population for a couple centuries, long enough for several changes to run their courses? That’s not just bicycling—that’s time travel.
Linguist Zsuzsanna Fagyal and colleagues solved both problems using a computer simulation. They made a network of 900 hypothetical people and followed it over 40,000 rounds. Each person had a certain number of ties to other people in the network and started with a randomly assigned value for a hypothetical linguistic feature, like how you might call the thing you drink water from in a school a “water fountain,” but your neighbour might call it a “drinking fountain.”
Then, at each turn, each person looked to the other people they were connected to and had a certain probability of adopting their version of the feature, like how you might start saying “drinking fountain” if you have a friend who uses the term. If you do pick it up, that word now becomes yours as well, and the people you’re connected to might pick it up from you the next round. They repeated this process 40,000 times, with three different kinds of network. In one version, the entire network was made up of close ties: everyone was well connected to the rest of the network.
In this dense network, one linguistic option caught on very quickly and stayed completely dominant for the rest of the simulation. In another version, the entire network was made up of weak ties and no one was well connected. The loose network behaved like a world of tourists: all of the options stuck around, and none of them ever became dominant. But in the most interesting simulation, they made some of the nodes highly connected “leaders” and others less connected “loners.” In this mixed network, one option would catch on for a while, but the other options would never totally disappear, and eventually one of them would become popular instead—a cycle that repeated several times. The researchers concluded that both strong and weak ties have an important role to play in linguistic change: the weak ties introduce new forms in the first place, while the strong ties spread them once they’re introduced.
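The copy-your-neighbours process described above is simple enough to sketch in a few lines of Python. What follows is a toy illustration under loud assumptions, not the researchers' actual model: the function names `build_network` and `simulate`, the adoption probability, the way ties are drawn at random, and the much smaller population and round counts are all invented here to keep the example quick to run.

```python
import random
from collections import Counter


def build_network(n, ties_per_person, rng):
    """Return an undirected network as {person: set of neighbours}.

    ties_per_person[i] is how many ties person i reaches out to create
    (an assumption of this sketch; ties are drawn uniformly at random).
    """
    neighbours = {i: set() for i in range(n)}
    for person in range(n):
        others = [j for j in range(n) if j != person]
        wanted = min(ties_per_person[person], len(others))
        for friend in rng.sample(others, wanted):
            neighbours[person].add(friend)
            neighbours[friend].add(person)
    return neighbours


def simulate(neighbours, variants, rounds, adopt_prob, rng):
    """Give everyone a random variant, then let people copy their neighbours.

    Each round, each person with at least one tie adopts the variant of one
    randomly chosen neighbour with probability adopt_prob. Updates happen in
    place, so a change can be passed along within the same round.
    """
    friend_lists = {p: sorted(f) for p, f in neighbours.items()}
    speech = {p: rng.choice(variants) for p in neighbours}
    for _ in range(rounds):
        for person, friends in friend_lists.items():
            if friends and rng.random() < adopt_prob:
                speech[person] = speech[rng.choice(friends)]
    return speech


if __name__ == "__main__":
    rng = random.Random(0)
    n = 60  # hypothetical; far smaller than the 900 people in the study
    setups = {
        "dense": [20] * n,                 # everyone well connected
        "loose": [1] * n,                  # everyone barely connected
        "mixed": [20] * 6 + [1] * (n - 6)  # a few "leaders", many "loners"
    }
    for name, ties in setups.items():
        network = build_network(n, ties, rng)
        final = simulate(network, ["water fountain", "drinking fountain"],
                         rounds=200, adopt_prob=0.5, rng=rng)
        print(name, Counter(final.values()).most_common())
```

Running it prints how many people ended up with each term in each kind of network. A sketch this small will not necessarily reproduce the cycling pattern the researchers reported for the mixed network; it is only meant to make the mechanics of "look at your ties, maybe copy one" concrete.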
The internet, then, makes language change faster because it leads to more weak ties: you can remain aware of people you don’t see anymore, and you can get to know people you never would have met otherwise. The phenomenon of a hashtag or funny video going viral is an example of the power of weak ties; when the same thing is shared only through strong ties, it ends up merely as an inside joke. But the internet doesn’t lead to the collapse of strong ties, either: the average person has a small handful of people they message on a regular basis, between four and twenty-six, depending how you count. What’s more, social-networking sites that prompt you to interact with denser ties (people you already know and friends of friends) tend to be less linguistically innovative. It’s not an accident that Twitter, where you’re encouraged to follow people you don’t already know, has given rise to more linguistic innovation (not to mention memes and social movements) than Facebook, where you primarily friend people you already know offline.
Since long before Edmond Edmont hopped on a bicycle, people have been piecing together how various aspects of the human experience are reflected in how we communicate: our geography, our networks, our societies. There’s always more to be figured out, of course, but we have a pretty solid understanding of the basics of how we use language to show our identity when we’re having a conversation. But the youthful, the vernacular, and the digital sides of language are still too easily overlooked; let’s find out what we can learn when we take them seriously.
Excerpted from Because Internet: Understanding the New Rules of Language by Gretchen McCulloch. Copyright © 2019 Gretchen McCulloch. Published in the United States by Riverhead Books, an imprint of Penguin Random House LLC, New York. Reproduced by arrangement with the publisher. All rights reserved.