There's more in a tweet than 140 characters. Among the 500 million messages sent each day on Twitter, there's a tsunami of slang terms and textspeak. There are hashtags, emoticons and links. Many tweets contain geotags that identify where on earth a person stood when pressing SEND. That may sound like just a lot of noise, but for linguists making ever more sophisticated use of it all, Twitter is providing the largest stream of data they have ever had at their disposal.
Gone are the days when a language researcher had to interview subjects in a lab or go door to door in the hope of gaining a few insights about a limited sample of people. Academics in the U.S. and Europe are using the seven-year-old microblogging platform to put millions of examples under the microscope in an instant. "It's unprecedented," says Ben Zimmer, the executive producer at Vocabulary.com, "the sheer amount of text you can look at at one time and the number of people you can analyze at once." Hidden in tweets are insights about how we portray our identity in a few short sentences. There are clues to long-standing mysteries, like how slang spreads. And there is a new form of communication to study. If language is the archive of history, as Ralph Waldo Emerson once said, social media should get its own shelf.
Data extracted from Twitter come with caveats, such as the fact that some tweets are written by automated bots, and great minds are still in the early days of building computer programs that can understand tweets in all their humor and nuance. But what is being said on Twitter is invaluable to scholars as well as to researchers with agendas, like advertisers and campaign managers. "We can talk about culture and community, but language gives you a way to really observe those things," says Jacob Eisenstein, a computational linguist at Georgia Tech, "if you know how to pull that signal out of all that text." More people are trying to locate that signal every day: more than 150 Twitter-based studies have come out so far in 2013.
Language researchers have found that women are more likely to use first-person terms (like I and my) and exclamation points, especially repeated ones. Men typically share more links and use more technology-related words. But social networks matter too: a woman who follows and tweets to a largely male audience is more likely to use features, like numbers, that are associated with men. And vice versa for men.
A Stanford University linguist found that older tweeters tend to use emoticons with noses, typing :-) instead of :), a habit tied to their preference for conventional language. Youthful "no nose" tweeters tend to use more swear words. In a study released this June, Dutch researchers at the University of Twente found that young tweeters were more apt to type all-capital words and to use expressive lengthening, like writing "niiiiiiice" instead of "nice." The older crowd is more apt to tweet well-wishing phrases like good morning and take care, to send longer tweets and to use more prepositions.