Will the Internet Always Speak English?

By Geoffrey Nunberg

In 1898, when Otto von Bismarck was an old man, a journalist asked him what he took to be the decisive factor in modern history. He answered, "The fact that the North Americans speak English." In retrospect, he was right on the mark about the political and economic developments of the twentieth century, and up to now he seems to have been prescient about the development of the technologies that will shape the next one.

The Internet was basically an American development, and it naturally spread most rapidly among the other countries of the English-speaking world. Right now, for example, there are roughly as many Internet users in Australia as in either France or Italy, and the English-speaking world as a whole accounts for over 80 percent of top-level Internet hosts and generates close to 80 percent of Internet traffic.

It isn't surprising, then, that the Web is dominated by English. Two years ago my colleague Hinrich Schütze and I used an automatic language identification procedure to survey about 2.5 million Web pages and found that about 85 percent of the text was in English. The overall proportion of English may have diminished since then--a 1999 survey of several hundred million pages done at ExciteHome showed English with 72 percent, followed by Japanese with 7 percent and German with 5 percent, and then by French, Chinese, and Spanish, all with between 1 and 2 percent. Figures like these are invariably inexact. But there's no question that the proportion of English will remain disproportionately high for some time to come, if only because use of the Web is still growing faster in the English-speaking world than in most other language communities--in the past two years, the number of Internet hosts in English-speaking countries has increased by about 450 percent, against 420 percent in Japan, 375 percent in the French-speaking world, and 250 percent in the German-speaking world.

To a lot of observers, all of this suggests that the Internet is just one more route along which English will march on an ineluctable course of world conquest. The Sunday New York Times ran a story a while ago with the headline "World, Wide, Web: 3 English Words," and the editor of a magazine called The Futurist predicts that, thanks to new technologies, English will become the native language of a majority of the world by some time in the next century. Indeed, one linguist has suggested in all earnestness that the United Nations should simply declare English the official world language, but rename it Globalese, so as not to imply that it belongs to any one speech community anymore.

You may have the feeling that this maneuver would not allay the anxieties of speakers of other languages, who not surprisingly view the prospect of an English-dominated Web with a certain alarm. The director of a Russian Internet service provider recently described the Web as "the ultimate act of intellectual colonialism." And French President Jacques Chirac was even more apocalyptic, describing the prevalence of English on the Internet as a "major risk for humanity," which threatens to impose linguistic and cultural uniformity on the world--a perception that led the French government to mandate that all Web sites in France must provide their content in French.

On the face of things, the concern is understandable. It isn't just that English is statistically predominant on the Web. There is also the heightened impression of English dominance that's created by the ubiquitous accessibility of Web documents. If you do an AltaVista search on "Roland Barthes," for example, you'll find about nine times as many documents in English as in French. That may or may not be wildly disproportionate to the rate of print publication about Barthes, but it's bound to be disconcerting to a Parisian who is used to browsing the reassuringly Francophone shelves of bookstores and libraries.

Then too, it isn't just Anglophones who are using English on the Web. A lot of the English-language Web sites are based in non-English-speaking countries. Sometimes English is an obvious practical choice, for example in nations like Egypt, Latvia, and Turkey, where few speakers of the local language are online and the Internet is still thought of chiefly as a tool for international communication. But the tendency to use English doesn't disappear even when a lot of speakers of the local language have Internet access. Since the Web turns every document into a potentially "international" publication, there's often an incentive for publishing Web sites in English that wouldn't exist with print documents that don't ordinarily circulate outside national borders. And this in turn has made the use of English on the Web a status symbol in many nations, since it implies that you have something to say that might merit international attention.

It isn't easy to measure how many sites in nations like France, Germany, or Sweden are posting content in English, partly because it isn't clear what things to count and partly because large numbers of users in these places have accounts with addresses that can't be identified by nationality. (America Online alone has more than a million subscribers in Germany.) But the use of English is clearly extensive, if not quite as overwhelming as people sometimes believe. In our study, Schütze and I found that the proportion of English tends to be highest where the local language has a relatively small number of speakers and where competence in English is high. In Holland and Scandinavia, for example, English pages run as high as 30 percent of the total; in France and Germany, they account for around 15-20 percent; and in Latin America, they account for 10 percent or less.


Netting Cultural Diversity

Still, it's a mistake to assume that any gains English makes on the Internet will have to come at the expense of other languages. The Internet is not like print or other media, where languages are in competition for finite communicative resources. A French movie theater has to choose between showing Steven Spielberg or Eric Rohmer, and a print medical journal can't print multilingual versions without substantially increasing its costs. But on the Internet, the diffusion of information is not a zero-sum game. The economics of distribution make multilingual publication on the Web much more feasible than it is in print, which is why a large number of commercial and government sites in Europe and Asia (and even, increasingly, in the United States) are making their content available in two or more languages.

Then, too, there are strong forces militating for the use of local languages on the Web. An increasing proportion of new users who are coming online in places like France or Italy are individuals and small businesses who are chiefly interested in using the Net for local communication, unlike the large firms or public institutions who have made up the first wave of adopters. An airline company or research center in Germany may have an incentive to post its Web pages in English, but a singles club or apartment rental agency does not. And as more people in a language community come online, content and service providers have a strong interest in accommodating them in their own language. Yahoo! has put up localized versions in French, Spanish, German, Danish, Norwegian, Swedish, Italian, Chinese, Korean, and Japanese, and in all of these markets it is facing competition from other portals, both American and local. By now, the speakers of major languages don't have to leave their linguistic neighborhood to consult an online newspaper or encyclopedia; hunt for jobs or housing; participate in discussion about horticulture, stocks, or soccer; or buy air tickets, books, perfume, furniture, or software. By limiting search engines and portals to resources in their own language, users can choose to ignore the sea of English content on the Web--and they are not likely to miss it much. That AltaVista search for French-language sites on "Roland Barthes," for example, turns up 498 hits. That may be many fewer than the more than 4,200 English-language sites on Barthes, but it's a lot more than most people need or have time to sort through.

This is not to say that the Internet won't have important linguistic effects. Ultimately its importance could be comparable to that of print, which first created standardized national languages and then helped to create a sense of national community around them. The mistake is to assume that the effects will be measurable in raw percentages of global language use. The "how many speakers?" games that language chauvinists like to play have always been one of the sillier manifestations of cultural rivalries--like Olympic gold medal counts, only a lot more inexact. What matters is not simply how widely a language is used, but why and when people use it and how it figures into their sense of social identity.

This is where the distinctive properties of the Net come into play. Notably, electronic communication doesn't require large capital concentrations to produce and distribute content, so it needn't entail the centralization that print and broadcast do. And also unlike print, the cost of diffusion of electronic documents doesn't increase proportionately with the distance or dispersion of the audience. To be sure, these effects are only relative. Posting a Web site that is actually accessible to hundreds of thousands of users requires a large capital investment in both technology and publicity, and overall activity tends to center on a small number of sites. A recent study by Alexa Internet showed that the top 2 percent of Web sites account for 95 percent of the total number of clicks, and other surveys suggest that the concentration of activity is becoming more marked--the proportion of users' time given to the 50 most popular Web sites has gone from 27 percent to 35 percent over the past year.

Moreover, communication between historically marginal regions is still limited by the available infrastructure--at present, for example, Hong Kong and Tokyo have roughly 50 times as much bandwidth to the United States as either does to other Asian cities. But the Internet is still a much more decentralized medium than print, particularly if we include the use of e-mail, discussion lists, and other forms that have no real print equivalents. And it's far more efficient than print or broadcast in reaching small or geographically dispersed audiences, whether we're thinking of the markets for scholarly books or medieval music or of the Welsh-speaking community.


Triumph of the Vernacular

One important consequence of all this is to make the choice of language chiefly dependent on the purpose of communication rather than on economics or geography. But this can work to the advantage of different languages in different situations. For example, English has always been the dominant medium for international trade in books, records, software, and travel arrangements. But now those transactions aren't conducted just by distributors, retailers, and travel agents, but by individuals, who consequently find themselves using English to buy things that they used to buy using their local vernacular. And there is a similar effect in science, where English is increasingly being used not just in its traditional role as the language of published research but also in informal Internet discussions of methodology, professional gossip, or theoretical speculation--the sorts of topics that used to be reserved for face-to-face conversations in the lab or lunchroom, conducted in whatever the local language happened to be.

But for other purposes, the Internet strengthens the role of national and regional languages. Take the diffusion of news. In the worlds of print and broadcast, it's only the English-language media--more specifically, the American media--that have been able to achieve anything like genuine worldwide news distribution. You can sometimes find a French television news program on cable in big cities in the United States or a three-day-old copy of Le Figaro at an international news dealer, but they aren't available in every hotel room and at every street corner the way CNN and the International Herald Tribune are in France. And for languages like Greek, Catalan, or Hindi, the circulation of information pretty much stops at national or regional borders.

With the Web, this all changes. French speakers in non-Francophone regions have access to the online versions of 20 or 30 French-language newspapers and to as many direct radio transmissions, and Web transmission of TV programming will become routine as bandwidth increases. The speakers of less widely used languages are nearly as well served--Yahoo! lists electronic versions of newspapers from Malaysia, Indonesia, Colombia, Turkey, Qatar, and about 70 or 80 other nations. At the same time, it's becoming easier for the members of these language communities to get remote access to government information, educational materials, scientific journals, and, ultimately, the digitized collections of the major national libraries. And while the Web won't seriously undermine the global dominance of American movies or music, it makes it much easier to distribute cultural products from other nations--even if a new film by Eric Rohmer or a new album by Claude Nougaro doesn't get extensive foreign distribution, individuals will be able to order or download it directly.

No less important, the Net creates new forums for informal exchanges among the members of geographically dispersed communities. At present there are discussion groups in more than 100 languages, including not just major national languages but Basque, Breton, Cambodian, Catalan, Gaelic, Hmong, Macedonian, Navaho, Swahili, Welsh, and Yoruba, among others. (One Yiddish speaker I know who's in her 40s says that before the Internet she had never had a conversation in that language with anyone younger than her parents' generation.)

These efficiencies of distribution work to the particular advantage of dispersed language communities--whether linguistic diasporas like the Indonesians, Russians, or Greeks living abroad or postcolonial populations that have up to now existed in the linguistic penumbra of the metropolis. People in the Francophone Caribbean or the Maghreb, for example, can have much quicker and more extensive access to French-language content produced in other regions than with print or broadcast.

In many of these regions, it's true, Internet connections will chiefly benefit government agencies, universities, and major industries. But in other places, there are substantial populations in a position to take advantage of the more immediate ties to their larger linguistic community--the Hungarians in Slovakia, the Chinese in Southeast Asia, the Francophones in western Canada, or the Russians in many parts of Eastern Europe. In theory, this could lead to a closer sense of connection within language communities like these, not just between the cultural centers and the peripheries but also between distant communities that have never been in direct contact before. People in Angola can log onto the excellent Web site of the Portuguese Ministry of Culture, for example, but they can also establish easy connections with other Lusophones in Mozambique, Brazil, or Fall River, Massachusetts.


How Universal a Net?

All of this presupposes, of course, that sufficient numbers of people in the community will have Internet access, which will be a long time coming in many parts of the world. Right now, for example, China and India each have around two million Internet users, and there are between three and four million in Hispanophone Latin America. (One reason it's hard to estimate Net use in many of these nations is that large numbers of people make use of Internet cafes or office machines and have e-mail accounts with Web services like hotmail.com.) And while the Net is growing rapidly in most of these nations, severe barriers must be overcome. There are only 10 telephones per 100 people in Latin America, for example, and only 2 per 100 in India, and while there are ambitious plans for extending the Internet via wireless communication, these face daunting technical and economic difficulties. Even where service is available, moreover, it is often expensive--monthly Internet access costs three times as much in Argentina as in the United States, five times as much in Kenya, and six times as much in Armenia, disparities that are aggravated by differences in average incomes.

Even as a medium for elite communication, of course, the Internet can play an important role, particularly in language communities that are poorly served by traditional media, whether for geographic or political reasons. In the Chinese-speaking world, for example, the Net has become an important forum for political discussion, despite the efforts of the Chinese government to restrict access to unacceptable sites and the often intemperate tone of the discussions. (When you look at discussion groups carried out in languages from Chinese to Indonesian to Spanish, you are struck by how the flame has become a universal genre.)

But what of the developed world, where the Internet is accessible to a large part of the population? Will it ultimately reshape the sense of the language community the way print did? Some enthusiasts have suggested that the Net will wind up making languages rather than nations the primary social bond. As one international marketing firm puts it: "People speaking the same language form their own online community no matter what country they happen to live in." This is stretching a point, to put it mildly. Granted, the Internet makes it possible for French, German, or English speakers in different nations to engage in daily conversation with one another, which doubtless increases their sense of linguistic connection. But these people have already been exchanging books, movies, and TV shows for a number of years, and while purists have always complained that this sort of communication dilutes the national culture, the fact is that there hasn't been any real lessening of people's sense of distinct national identities--or, from the marketer's point of view, of their distinct patterns of consumption. Belgians persist in feeling Belgian; Australians persist in feeling Australian; Austrians persist in feeling Austrian. So it's hard to believe that the nation will start to wither away just because people from different parts of the language community are wired to the Net.

Yet the Internet may have important linguistic effects even on communities like these, by altering the kind of language that matters in public life. Since the eighteenth century, most developed societies have recognized a distinction between two varieties of language. The first is the informal, rapidly changing variety that you learn in the normal course of socialization, which is adapted to private communication between individuals who have a lot of background in common. The second is a conservative and relatively formal variety used in published writing and broadcasting--a variety that requires explicit instruction and that is designed to communicate to an anonymous audience who can't be presumed to know much about the writer's circumstances or background. This variety may be loosely based on middle-class speech, but it aims at being a neutral and universal medium, and it tends to be less susceptible to regional and national variation (The Economist is a lot easier for Americans to follow than a conversation in a London pub). Traditionally, this is the form of language that we look to dictionaries to record for us and that attracts most of our critical concerns about the state of the language and its consequences for public life.

But the Internet blurs this distinction, even as it blurs the distinction between "public" and "private" communication. The language of the innumerable discussion groups and bulletin boards of the Net has much of the tone of private communication--it's informal, elliptical, and allusive. But it is conversation filtered by a battery of conventions adapted to its new function. I'm thinking not just of the rich etiquette for responding, cc'ing participants, including quotes from other messages you've received, and so forth, but of the subtler ways that the informalities of private conversation are tailored for use in a semipublic forum. That's why these discussions can be so difficult for foreigners to participate in, even when they're entirely comfortable with formal written English. What's more worrisome still, they can also marginalize native speakers who aren't privy to the norms of middle-class speech, by which I mean not so much the forms and spellings of the standard language as the way people deploy it in the back-and-forth of ordinary conversation. It's one thing to know when to say, "I'm afraid I have to take issue with Ms. Price's conclusions" and another--much more difficult to get the hang of--to know when it's appropriate to say, "You've gotta be kidding."

There's a troubling paradox in all this. The forums of the Internet undoubtedly create the opportunity for a wider and more participatory public discourse than has ever before been possible. True, we may want to be a little skeptical of the visionaries' picture of these interactive forums as the nuclei of a new "electronic commons" that will wind up displacing traditional political institutions with a direct democracy--it's in their nature to be too chaotic, too fragmented, and too unreliable to bear all the burden. But they have already become important secondary media for transacting political life, both as places where the news is critically interpreted and as sources of information (sometimes correct) that the press has not adequately reported.

Yet even as they open up the discourse, these forums can also restrict and circumscribe participation in it, as the neutral language of the traditional op-ed page yields to something that has more of the tone of conversation in a Palo Alto coffee bar. This may ultimately be the most important linguistic issue raised by the technology. What does it matter how widely English or any other language is used on the Internet if the language used there has become less of a common medium for its speakers?