3ARABIZI - WHEN LOCAL ARABIC MEETS GLOBAL ENGLISH

Arabic is the official language of Jordan. Yet English is a language of prestige among many upwardly mobile Jordanians. Sakarna (2006) dubs a hybrid language composed of a mixture of these two languages "Englo-Arabic". In online contexts, a similar hybrid language has emerged. Often popularly labelled "3arabizi" or "Arabish", blends based on the words for "Arabic" and "English", this mixed code is the most commonly encountered form of language for composing forum messages on the popular Jordanian website Mahjoob.com (http://www.mahjoob.com). The most striking feature of 3arabizi is that it is written in Latin script and uses arithmographemics, i.e. numerals used as letters to represent Arabic sounds that do not occur in English. This article presents the key orthographic features of 3arabizi and discusses its topical occurrence compared with both Arabic and English in a purposive sample of web forum messages collected from Mahjoob.com.


Introduction
3arabizi is a mixed language, found in computer-mediated communication (CMC) contexts, that combines vernacular Arabic written in Latin script with English. Despite being a relatively new form of language, 3arabizi is actually used more commonly than either Arabic or English for composing forum main messages on certain websites such as mahjoob.com, especially for certain topics. This article provides a brief description of the unique orthographic features of 3arabizi, namely its use of arithmographemes (Bianchi, 2005). This is followed by a discussion of the topical use of 3arabizi vs. both Arabic and English within the 41 topical forums on the English website of mahjoob.com.

Auer (1998, 2008) deplores a monolingual bias in code-switching research, which makes a priori assumptions about the existence of distinct and discrete linguistic systems that are then mixed in the speech of bilinguals to produce code-switching. Instead, Auer (1998) posits the possible existence of mixed codes or "fused lects" as the normative code of interaction among certain groups. Such a categorization naturally blurs the lines between discrete linguistic varieties. Prior research into 3arabizi seems to point toward the existence of such a hybridized form of language which, while using the Latin script, incorporates lexicogrammatical elements from both Arabic and English (Abdallah, 2008; Al-Tamimi & Gorgis, 2007; Al Share, 2005; Palfreyman & Al Khalil, 2003; Sakarna, 2006; Warschauer, El Said, & Zohry, 2002). Further, most of these studies also point to a unique feature of this hybrid language, namely the use of numerals as graphemes to represent Arabic sounds that have no ready or widely agreed-upon equivalents in the Latin alphabet (Abdallah, 2008; Al-Tamimi & Gorgis, 2007; Al Share, 2005; Palfreyman, 2001a; Palfreyman & Al Khalil, 2003; Warschauer et al., 2002). The author labels such numerals used as graphemes "arithmographemes" (see Bianchi, 2005).
Table 1, adapted from Palfreyman and Al Khalil (2003), illustrates the use of arithmographemes in 3arabizi:

Table 1. Arithmographemes used in 3arabizi (adapted from Palfreyman & Al Khalil, 2003)

As can be seen above, a certain visual similarity exists between most of the Arabic letters and the numerals selected to represent them when Arabic is written in Latin script. For instance, the Arabic letter <ع> becomes <3> in 3arabizi-style Latinization, the numeral resembling a reversed form of the Arabic letter. The sound represented here is the voiced pharyngeal fricative [ʕ], which has no commonly agreed-upon graphic representation when Arabic is transliterated using Latin script. Now that the unique orthographic features of 3arabizi have been surveyed, its topic-related occurrence on the mahjoob.com website will be discussed. But first, a brief overview will be given of mahjoob.com, the website from which the data were collected.
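Because arithmographemes are digits embedded in otherwise Latin-letter words, they lend themselves to simple automatic detection. The following minimal Python sketch, which is an illustration and not a procedure used in this study, flags tokens that mix Latin letters with arithmographemes, such as "e7ke" in the forum menu phrase discussed below. Only the mapping of <3> to <ع> is taken from the text; the entries for <2> and <7> are commonly reported conventions included here as assumptions, and the names `ARITHMOGRAPHEMES`, `arithmographeme_tokens` and `sounds_represented` are hypothetical.

```python
import re

# Hypothetical subset of arithmographeme-to-letter mappings. Only 3 -> ع is
# stated in the text; 2 -> ء and 7 -> ح are assumed common conventions.
ARITHMOGRAPHEMES = {"2": "ء", "3": "ع", "7": "ح"}

# Split text into runs of Latin letters, digits and apostrophes.
TOKEN = re.compile(r"[A-Za-z0-9']+")

# A candidate 3arabizi token contains at least one Latin letter and at least
# one mapped numeral, and consists only of Latin letters, mapped numerals and
# apostrophes (so "2008" and "covid19" are rejected).
CANDIDATE = re.compile(r"(?=.*[A-Za-z])(?=.*[237])[A-Za-z237']+")


def arithmographeme_tokens(text: str) -> list[str]:
    """Return tokens that look like Latin-script Arabic using numerals as letters."""
    return [t for t in TOKEN.findall(text) if CANDIDATE.fullmatch(t)]


def sounds_represented(token: str) -> list[str]:
    """List, in order, the Arabic letters that the token's numerals stand in for."""
    return [ARITHMOGRAPHEMES[ch] for ch in token if ch in ARITHMOGRAPHEMES]
```

For instance, `arithmographeme_tokens("e7ke wa fadfed, posted in 2008")` keeps only `"e7ke"` and ignores the all-digit year. The heuristic would also flag ordinary Latin tokens such as "mp3", so in practice it would need to be combined with a wordlist, much as the annotation described in the Method section relies on wordlists.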

The Data: Mahjoob.com and its Web Forums
Mahjoob.com is a website owned by Mr. Emad Hajjaj, a popular political cartoonist from Jordan, currently living in London. The website itself is hosted in Jordan, and a significant number of its users appear to be Jordanian as well. However, as reflected by its advertising, the site also attracts posters from across the Arabic-speaking world and from the Arab diaspora. Mahjoob.com is organized into two parallel websites, an Arabic one and an English one. By November 2008, the Arabic site contained 35 forums, 1,330,999 posts, 58,855 threads, and 28,025 members, while the English site contained 41 forums and sub-forums, with 982,084 messages (or posts) and 13,724 members. The website is actually composed of several linked web pages, the largest of which are the forum web pages. The main portal to the English side of mahjoob.com has a menu that provides various links. For instance, visitors can select the Arabic link and be taken to the Arabic website of mahjoob.com, where they can then enter the Arabic-language forums. Alternatively, visitors can stay within the English website and visit its blogs, archives, and, of course, its forums.
Given this surface division of the website by language, and the pre-eminent status of English on the internet noted by researchers such as Crystal (2001), one might expect the English side of mahjoob.com to feature only English-language content and the Arabic side to feature only Arabic-language content. Accordingly, one would expect the English on the English website to be written in Latin script and the Arabic on the Arabic website to be written in Arabic script, as they are conventionally written in most offline domains [1]. However, even the most superficial browsing of the Arabic and English forums of mahjoob.com makes it clear that forum posters do not follow these well-established conventions. Consider, for instance, the following screenshot (Figure 1), taken from the main page of the English website:

Figure 1. Main page of the English website of mahjoob.com

[1] Holes (2004) does note the existence of a written "mixed" variety of Arabic-scripted Arabic, composed of standard and vernacular forms, in limited use among Egyptians in popular print media such as magazine editorials. He observes that this kind of colloquial written style helps authors get closer to their readership and appear more 'folksy' (2004, pp. 381-382).
The English featured here is ostensibly standard, with readily understood lexis and normative spelling conventions. However, on the left-hand side of the screenshot is a menu entitled "Discussions", under which there are several forums a website visitor can choose from. Already, from the red, bolded, asymmetrical font of the label "Discussions", we anticipate a more playful, less formal kind of discussion. The punctuation accompanying the subsequent discussion options, such as "Men's Corner", "Men Only!!" and "Girls Talk", "No MEN!!", marks them as flaming-type language (see Herring, 1996) and indexes informality. Upon deeper probing, it becomes clear that much of the English used in the forums is in fact of an informal, non-standard style. Several features of Netspeak (Crystal, 2001) are immediately observable: abbreviations such as "plz" and neography (cf. Anis, 2007) such as "u" for "you", "coz" for "because" and "r" for "are" are all commonplace in the forum texts. Thus, the picture of English is not altogether uniform: different levels of formality are indexed by distinct lexis and orthography.
Interestingly, other forms of language besides English, such as Arabic, are immediately discernible on this web page. For instance, the website's logo and name are given as Abu Mahjoob written in Arabic alongside Mahjoob.com in English. Also, the rightmost link on the top menu says "Arabic" written in Arabic and thus provides an option for visitors who want to go to the Arabic side of the website. Again, the right-side menu contains links to articles with Arabic names. The political cartoon of the website's mascot, Abu Mahjoob, is also written in Arabic-scripted Arabic, albeit in a Jordanian vernacular variety.
Similar to the English observable on the English website, the Arabic found on mahjoob.com cannot be said to be of one single variety or style. For instance, discussions are often carried out in written vernacular Arabic as opposed to Modern Standard Arabic, which is the normative code for virtually all traditional written discourse in offline contexts (Holes, 2004). In this connection, it is telling that in Figure 1 the cartoon character Abu Mahjoob "speaks" in Jordanian vernacular, just as one would expect in a "real" offline spoken context [2]. And yet, in order to create this "real life" feeling in the cartoon, the cartoonist was forced to break the conventions of Modern Standard Arabic writing by employing vernacular lexis and structure and non-standard orthography, such as doubled vowels that signal to the reader the oral nature of the interaction (cf. Holes, 2004, pp. 381-382).
3arabizi is also apparent on this web page, in the phrase "e7ke wa fadfed" (trans. "get it off your chest") found under the "Discussions" menu of forums. Although scarcely noticeable on the main page, 3arabizi is in fact far more prevalent within the web forum main messages [3], as will be seen below. Of peripheral interest, Arabic messages are also posted within the English website and, to a far lesser extent, English messages within the Arabic website. There are also several discussion threads on the website featuring code-mixed and script-mixed messages in which posters switch between Arabic, English, and 3arabizi.
Having provided an overview of the English website of mahjoob.com, it will now be useful to describe how the data were collected and analyzed.

Method
In order to determine how widespread the use of 3arabizi was within the forums of the English-language website of Mahjoob.com, a purposive sample of all messages posted between March 2007 and May 2008 was downloaded and compiled into a corpus. This resulted in a corpus of 460,220 messages, found within 21,626 discussion threads. The discussion threads, in turn, were found within 41 topical forums.
Using language-specific wordlists based on the Arabic Gigaword and British National Corpus (BNC) wordlists, each message was annotated to indicate whether it was composed in Arabic, English, or a mixture of the two. A third wordlist was developed to identify messages written in 3arabizi. In the process of annotating Arabic, BNC English, and 3arabizi messages, three further types of message were discovered: those containing a mixture of Latin script and Arabic script (Mixed Script messages), those containing, typically, a single Islam-related Arabic item transliterated without arithmographemics (Salafi English messages), and those containing English items mixed with items that could not be categorized (Non-BNC English messages) [4]. This resulted in the following six linguistic labels for messages in the corpus:

1. Arabic
2. BNC English
3. 3arabizi
4. Mixed Script
5. Salafi English
6. Non-BNC English

[3] Main messages refer to all messages except the initial or seed message in a discussion thread. Wodak and Wright's (2007) earlier research into multilingual threaded discussions discovered linguistic differences between seed messages and subsequent messages. Following their lead, the author has differentiated between seed messages and main messages, arguing that the potential effect of topic on code choice would be more apparent the further removed a message was from the initial message. Consequently, seed messages, which accounted for less than 5% of all messages in the corpus, were excluded from the chi-squared analyses of topic and code choice, since they were found to be linguistically distinct from main messages in the present data set as well.

[4] Such items were either neologisms such as "wiki" or "Obama" not contained in the original BNC wordlist, or items from languages other than vernacular Arabic, e.g. Turkish, Circassian, or transliterated Hebrew items such as those found in the World Talk forum, which is dedicated to learning languages other than Arabic and English.
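The annotation into six labels can be approximated in code. The sketch below is an illustration under stated assumptions rather than the study's actual procedure: it replaces the Arabic Gigaword wordlist with a simple check for characters in the Unicode Arabic block, takes the BNC, 3arabizi and Islam-related wordlists as plain Python sets, and applies the labels in a fixed order of precedence, which is itself an assumption. The function name `label_message` and the toy wordlists in the usage example are hypothetical.

```python
import re

ARABIC_CHAR = re.compile(r"[\u0600-\u06FF]")  # any character in the Unicode Arabic block
LATIN_CHAR = re.compile(r"[A-Za-z]")
LATIN_TOKEN = re.compile(r"[a-z0-9']+")


def label_message(text: str, bnc_words: set[str], arabizi_words: set[str],
                  salafi_words: set[str]) -> str:
    """Assign one of the six message labels, using an assumed precedence order."""
    has_arabic = ARABIC_CHAR.search(text) is not None
    has_latin = LATIN_CHAR.search(text) is not None
    # Script checks first: both scripts -> Mixed Script; Arabic only -> Arabic.
    if has_arabic and has_latin:
        return "Mixed Script"
    if has_arabic:
        return "Arabic"
    tokens = LATIN_TOKEN.findall(text.lower())
    if not tokens:
        raise ValueError("message contains no classifiable words")
    # Latin-script messages: any 3arabizi word outranks the English labels.
    if any(t in arabizi_words for t in tokens):
        return "3arabizi"
    if any(t in salafi_words for t in tokens):
        return "Salafi English"
    if all(t in bnc_words for t in tokens):
        return "BNC English"
    # English mixed with items absent from the BNC list (neologisms, other languages).
    return "Non-BNC English"
```

With toy sets such as `bnc = {"i", "like", "this"}` and `salafi = {"inshallah"}`, the message "I like this wiki" falls through to Non-BNC English because "wiki" is absent from the BNC set, echoing the neologisms mentioned in note [4]. The precedence order matters: a message containing both a 3arabizi word and an Islam-related item is labelled 3arabizi here, which may or may not match the original annotation decisions.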
Using these six labels, all messages were annotated for linguistic content. Next, in order to determine the frequencies of messages composed in each of these six groupings, the data were converted into SPSS [5] format, in which each corpus message became a unique case defined by several variables such as language of message, thread of message, topical forum of message, and author of message. These variables could then be used to carry out statistical procedures such as chi-squared tests of significance. Another advantage was that frequencies of messages in each language could be displayed in table format and analyzed for statistical significance.

Each message was also annotated to indicate which forum it belonged to. Since there was a relatively large number of forums (41 in total), it was desirable to code these into broader topical categories. Thus, a second step involved recoding all 41 forums into eight overarching topics adapted from those used by Bentahila (1983) in his study of topic-related code choice involving Arabic and French. The adapted topics in the present study were 1) Humour and jokes, 2) Poetry, 3) Work and study, 4) Friends and family, 5) Local/Regional culture, nationality, and politics, 6) Hobbies and pastimes, 7) Gender and age-related forums, and 8) General discussion/opinion (see Bentahila, 1983). Initially, forums were coded into these topics based on their titles. However, in order to verify that the forums were in fact connected to the topics into which they had been placed, threads from all 41 forums were randomly sampled and read for topical content. This process confirmed the original topical coding of most forums [6]. The resulting assignment of forums to the eight overarching topics is shown in Table 2:

Table 2. The 41 forums of the English website grouped into eight overarching topics

The messages were then recoded to reflect which of the eight overarching topics they belonged to.
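The recoding of forums into topics, followed by the tally of language labels per topic, can be sketched as below. This is a hypothetical illustration, not the SPSS procedure: messages are assumed to be `(forum, label, is_seed)` tuples, and seed messages are skipped, mirroring their exclusion from the analyses of topic and code choice. Only the assignment of the Joke Zone forum to Humour is stated in the text; the other entries in `FORUM_TO_TOPIC` are assumed for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical excerpt of the forum-to-topic recoding. Only "Joke Zone" ->
# Humour is stated in the text; the other two entries are assumptions.
FORUM_TO_TOPIC = {
    "Joke Zone": "Humour and jokes",
    "Men's Corner": "Gender and age-related forums",
    "Girls Talk": "Gender and age-related forums",
}


def crosstab(messages):
    """Tally main messages by topic and language label.

    Each message is a (forum, label, is_seed) tuple; seed messages are skipped.
    """
    table = defaultdict(Counter)
    for forum, label, is_seed in messages:
        if not is_seed:
            table[FORUM_TO_TOPIC[forum]][label] += 1
    return table


def within_topic_percentages(table):
    """Share (%) of each language label within each topic."""
    return {
        topic: {label: 100 * n / sum(counts.values()) for label, n in counts.items()}
        for topic, counts in table.items()
    }


def overall_shares(table):
    """Share (%) of each language label across all main messages."""
    totals = Counter()
    for counts in table.values():
        totals.update(counts)
    grand_total = sum(totals.values())
    return {label: 100 * n / grand_total for label, n in totals.items()}
```

`overall_shares` corresponds to the overall code percentages reported in the Findings, while `within_topic_percentages` gives the per-topic view used to compare codes across topics.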
Once the messages had been recoded by topic, frequencies of language of main message across overarching forum topics were compiled. These frequencies were then tested for statistical significance using the crosstabs function of SPSS, applying the chi-squared test at a significance level of p < 0.05.
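The chi-squared test applied to the crosstab can be reproduced outside SPSS. The following minimal sketch, in plain Python with no third-party libraries, computes Pearson's chi-squared statistic for an r x c table of observed counts, its degrees of freedom (r - 1)(c - 1), and the upper-tail p-value via the closed form of the regularized upper incomplete gamma function that exists for integer degrees of freedom. It applies no continuity correction, the function names are hypothetical, and it is an illustration rather than the SPSS implementation.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """P(X >= x) for a chi-squared variable with df (a positive integer) degrees of freedom.

    Evaluates the regularized upper incomplete gamma function Q(df/2, x/2),
    which has a finite closed form for integer df. Terms are computed in log
    space to avoid overflow for large statistics.
    """
    if x <= 0:
        return 1.0
    y = x / 2.0
    log_y = math.log(y)
    if df % 2 == 0:
        # Even df: Q(n, y) = exp(-y) * sum_{i=0}^{n-1} y**i / i!
        return sum(math.exp(i * log_y - y - math.lgamma(i + 1)) for i in range(df // 2))
    # Odd df: start from Q(1/2, y) = erfc(sqrt(y)) and add
    # y**(j - 1/2) * exp(-y) / Gamma(j + 1/2) for j = 1 .. (df - 1) / 2.
    total = math.erfc(math.sqrt(y))
    for j in range(1, (df - 1) // 2 + 1):
        total += math.exp((j - 0.5) * log_y - y - math.lgamma(j + 0.5))
    return total


def pearson_chi2(observed: list[list[float]]) -> tuple[float, int, float]:
    """Pearson chi-squared statistic, degrees of freedom and p-value for an r x c table.

    Rows and columns whose totals are zero must be removed beforehand,
    since they would produce expected counts of zero.
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(observed):
        for j, count in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            statistic += (count - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, df, chi2_sf(statistic, df)
```

For the toy table `[[10, 20], [20, 10]]`, `pearson_chi2` gives a statistic of 20/3 (about 6.67) with 1 degree of freedom and a p-value just under 0.01, so the association would be judged significant at p < 0.05. A full table crossing the six language labels with the eight topics would have (6 - 1)(8 - 1) = 35 degrees of freedom.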

Findings
Tabulating language type across overarching forum topic resulted in Table 3:

Table 3. Language of main messages by overarching forum topic

Examining this table, a number of observations can be made. First, in terms of overall code occurrence, 3arabizi is the most common form of language, representing 35.5% of all main messages, followed closely by Arabic at 32.3%. This is so despite the fact that these forums occur within the English website of mahjoob.com. BNC English (17.5%) and Non-BNC English (7.5%) combined account for only 25% of main messages. Salafi English comes in fifth place with only 4.1% of all main messages. In last place, Mixed Script main messages account for a mere 3.2% of main messages in the corpus.
In terms of language and topic, Arabic was found to predominate in forums related to the topics of Poetry, Humour and, to a much lesser extent, Local Culture. It had been anticipated that Humour-related main messages might be composed in 3arabizi, as it is often linked stylistically with Vernacular humour. However, the presence of Arabic with the topic of Humour was found to be accounted for primarily by the Joke Zone forum, where Arabic was in fact the dominant code. Indeed, Joke Zone main messages account for well over half of all Arabic main messages in the entire corpus. A frequency wordlist of the Joke Zone forum's Arabic lexis revealed that its most frequent items were actually Vernacular Arabic items written in Arabic script, as opposed to Modern Standard Arabic. The Local Culture topic, which included nationalistic and religious forums, was also largely written in Arabic; however, 3arabizi was relatively common here as well, which seems understandable given the cultural link of such topics to Vernacular Arabic. In contrast, Hobbies and Work/Study-related topics were least commonly expressed in Arabic. Instead, these two topics were most commonly discussed using BNC English and 3arabizi, followed by Non-BNC English. Work/Study-related forums had an especially high percentage of Non-BNC English main messages, suggesting that technical neologisms may have been prevalent in these forums. In contrast to Arabic, BNC English was least common with the Humour and Poetry topics. These observations reveal a pattern of Arabic and BNC English being diametrically opposed in terms of the topics with which they most frequently occur. 3arabizi was found to dominate the Family and Friends, Gender/Age-related, Hobbies, and General Discussion forums.
This finding was interesting in that these forums were predominantly composed of message texts that did not appear to be imported from other websites but instead seemed to be written by the forum posters themselves. Thus, a pattern emerged strongly suggesting that 3arabizi was used to write personal, intimate, and general texts, while Arabic and English occurred more often because they were part of imported texts, and perhaps also because Arabic was tied thematically to cultural and local topics while English was linked to professional and academic topics.
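A frequency wordlist like the one compiled for the Joke Zone forum's Arabic lexis can be produced with a few lines of code. The sketch below is hypothetical and assumes the forum's messages are available as plain strings; it counts runs of basic Arabic letters (Unicode U+0621 to U+064A), so words carrying short-vowel diacritics would be split and would need normalizing first. Deciding whether a frequent item is vernacular or Modern Standard Arabic remains a manual judgement, as in the analysis above.

```python
import re
from collections import Counter

# Runs of basic Arabic letters (hamza through yeh); diacritics, digits and
# Latin text are not matched.
ARABIC_WORD = re.compile(r"[\u0621-\u064A]+")


def arabic_frequency_list(messages: list[str], top_n: int = 20) -> list[tuple[str, int]]:
    """Return the top_n most frequent Arabic-script word forms across messages."""
    counts = Counter()
    for text in messages:
        counts.update(ARABIC_WORD.findall(text))
    return counts.most_common(top_n)
```

Latin-script material in a message, such as a hyperlink or an English aside, is simply ignored, so Mixed Script messages still contribute their Arabic-script words to the list.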
Interestingly, Mixed Script main messages were also prevalent in the Joke Zone forum. This suggests that material may have been imported from other Arabic websites where a certain amount of Latin script was present alongside Arabic script due to hyperlink-related strings. Meanwhile, Salafi English shows a relatively low frequency across virtually all topics, at 4.1% overall. Even so, this exceeds the share of Mixed Script main messages (3.2%), highlighting the paucity of biscriptal main messages in the corpus.

Summary of Findings
To sum up, there are several clear patterns of code choice related to topic. Topical forums that are both local and formal in content feature relatively high amounts of Arabic (Arabic-scripted Arabic). In this regard, Arabic-language poetry, Arabic politics and nationalism, and Islamic religion all favour the use of Arabic. The major exception to this formal Arabic usage trend is the preponderance of Vernacular Arabic jokes in the Joke Zone forum. Regardless, thematically, all of these forums could be described as connected to local Arabic culture. In contrast, topical forums dealing with more specialized technical content, especially related to fields of work and study, are dominated by BNC English main messages, perhaps because sources for such content are to be found primarily in English elsewhere on the web. 3arabizi, the most frequently encountered code in the corpus, was found to dominate topical forums that were less formal and more intimate in content, as well as forums that encouraged the sharing of general discussion and opinions. In this connection, Hobby-related forums were also dominated by 3arabizi main messages, though both BNC English and Non-BNC English were also frequently featured in such forums. Again, given that many of the hobbies are not culturally localized, this was not a surprising finding.

Conclusions
This report has described the key orthographic features of 3arabizi, notably its use of arithmographemes in place of letters to represent Arabic sounds not found in English. One of the principal domains for the use of 3arabizi is the internet. In particular, web forums such as those found on the Jordanian website mahjoob.com are prime loci for the use of 3arabizi. This study has also shown that 3arabizi occurs with different topics from those associated with Arabic or English. In this regard, 3arabizi, as the most used code on the mahjoob.com website, was found to occur mainly within forums associated with Family and Friends, General discussion, Gender and Age groups, and Hobbies. In contrast, Arabic was found with forums related to Humour, Poetry, and Local culture, all posting areas where copied material from other websites seemed to be quite prevalent. English, despite being the designated language of the English site and its forums, was the least prevalent of these three main codes, serving mainly for discussion of Work- and Study-related topics. The relative scarcity of English messages challenges the notion of the primacy of English on the internet. On the other hand, the abundance in the corpus of a mixed code composed of Vernacular Arabic written in Latin script along with elements from English suggests that the seemingly conflicting trends of globalization and localization of culture may, in fact, be working synergistically to produce new and fascinating hybrid forms of language, of which 3arabizi is a prime example.