Student’s Name: Kimiko Takase

Date: August 2001

Module: Research and Development in ICT in Education

Title: A Critical Review of the Object Text: Comparing Face-to-Face and Electronic Discussion in the Second Language Classroom

Object text:

Warschauer, M. (1996). Comparing Face-to-Face and Electronic Discussion in the Second Language Classroom. CALICO Journal, 13(2), pp. 7-26.

Words: 3992 words

Contents

1 Introduction

2 Evaluation

2.1 Empirical setting

2.1.1 Research design

2.1.2 Sampling

2.1.3 Data collection

2.1.4 Data analysis

2.2 Representation of data

2.2.1 Lack of information

2.2.2 Quasi-anthropological element

3 Conclusion

References

1. Introduction

Warschauer (1996) introduces key antecedent research on electronic communication in the second language classroom, which indicates that an electronic mode of communication can result in more equal participation among ESL students and that language can be more complex in electronic communication. He sets five research questions to interrogate student participation and language complexity in face-to-face (f2f) and electronic discussion in small groups. These are:

1. Do students participate more equally in electronic discussions than in f2f mode?

2. Who benefits from this more equal participation in electronic mode in terms of factors such as gender, nationality, age and language proficiency?

3. What are students’ attitudes toward these two modes?

4. Is the language which is used in electronic discussion more complex lexically and syntactically than the language used in f2f mode?

5. What other differences are observed in the language use and interaction style between the two modes?

In establishing his local empirical setting, he employed opportunity sampling. The subjects were 16 students (5 Filipinos, 5 Japanese, 4 Chinese and 2 Vietnamese) out of 20 enrolled in an advanced ESL composition class at a community college in Hawaii where he works. Four students were omitted because of their absence on the day of the study. He conducted the study during a normal 75-minute class. The students were randomly assigned to four groups of four students. They discussed two questions about “the family”, one f2f and one electronically using Daedalus Interchange, a real-time communication package.

In his collection of data, he used audio tape-recorders to record the f2f discussion. The transcripts of both the f2f and the electronic discussions were analysed by a computerized language analysis program. The type-token ratio (TTR), that is, the number of different words divided by the total number of words for each speaker, was used to analyse the transcripts lexically. Clause coordination was assessed using a Coordination Index (CI). Individual participation rates were compared between the two modes, and he calculated the Gini coefficient of participation inequality.
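To make the type-token ratio concrete, the following minimal sketch computes it per speaker in the way described above: the number of different words divided by the total number of words. The transcript lines, the speaker labels and the simple lowercase tokenizer are hypothetical illustrations only; the program Warschauer used and its tokenization rules are not described in this review.

```python
import re
from collections import defaultdict


def type_token_ratio(words):
    """Return (number of distinct words) / (total words); 0.0 if empty."""
    if not words:
        return 0.0
    return len(set(words)) / len(words)


def ttr_per_speaker(turns):
    """Map each speaker to the TTR of everything that speaker said.

    turns: iterable of (speaker, utterance) pairs.
    """
    words_by_speaker = defaultdict(list)
    for speaker, utterance in turns:
        # Assumed tokenizer: lowercase runs of letters and apostrophes.
        tokens = re.findall(r"[a-z']+", utterance.lower())
        words_by_speaker[speaker].extend(tokens)
    return {s: type_token_ratio(w) for s, w in words_by_speaker.items()}


# Hypothetical transcript fragment, not data from the study.
turns = [
    ("S2", "I make my own decision"),
    ("S4", "Yeah yeah I think so"),
]
print(ttr_per_speaker(turns))
```

In this invented fragment S2 uses five different words in five (a ratio of 1.0), while S4 repeats one word and so scores 0.8. The measure therefore rewards lexical variety only and says nothing about the content of what was said.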

Language proficiency was measured using a SLEP (Secondary Level English Proficiency) test. A questionnaire was also used. This included six personal questions (gender, age, native language, birthplace, length of residence in the US, number of years studying English) and 19 additional questions. The IPC (Increased Participation in Computer mode) score was used to measure the increase in participation rate, and he correlated it with students’ age, time in the US and SLEP scores. Qualitative analysis was used to interrogate language use and interaction style in the two modes.

2. Evaluation

2.1 Empirical setting

2.1.1 Research design

One of the major difficulties with Warschauer’s paper relates to his research design. Warschauer presents the research as an experiment. His main research question concerns the relationship between an independent variable and a dependent variable. The former is the medium for discussion, either electronic mode or f2f mode. The latter is equality of participation. He also tried to measure the correlations between the level and form of participation in the two modes and the attributes and characteristics of the individual students.

Bell points out that:

The experimental style does allow conclusions to be drawn about cause and effect, if the experimental design is sound, but . . . generally, large groups are needed if the many variations and ambiguities involved in human behaviour are to be controlled. (Bell, 1999; p.15)

Furthermore, in generalizing, Bell suggests that researchers need to take great care to ensure that they consider all possible causes in claiming a causal relationship. Warschauer’s research is weak here. His sample size is small and the research period is very short: 16 students in a single 75-minute class. In these circumstances, individual differences become significant. However, Warschauer does not take effective steps to minimize the problems in his research design. This is potentially fatal in terms of the validity of his paper.

One possible way in which these problems might be overcome is to control the variables properly. There are too many variables in his design and the sample is too small to do this. He could simply have compared one variable, participation rate, in the two modes. For example, he might have observed the students in f2f discussion and formed two groups according to their participation rate - high participation rate and low participation rate. Then he might have instructed the two groups to experience discussion in both f2f and electronic mode and measured participation rates. This would have enabled him to measure the change in participation rate between the two modes with a control for participation rate in the f2f mode. Another solution might have been to conduct his research on a case study basis over a longer period. This would reveal individual information about subjects in some depth, rather than presenting statistical data as group averages.

However, he tried to explore correlations between a large number of factors including nationality, language complexity, attitude, etc. He admits in his paper that the number of students involved in the study was so small that statistical results for group comparisons, IPC and students’ attitudes were not checked for significance. The sample size and the research period are inadequate to establish a causal relationship. Nevertheless, he treats the data as if it were the result of a large-scale experiment, as if his findings from just 16 subjects were representative, and therefore generalizable. He should either have avoided this research design or have represented his data in a different way. I shall discuss the problem of his representation of data and provide some suggestions in section 2.2.

2.1.2 Sampling

In this section, I shall point out the problem of Warschauer’s (1996) sampling in terms of representativeness, focusing on nationality, which was used as one of the variables in his study. The issue relates to the extent to which his sampling strategy enables him to take his sample as representative of a wider population. From the beginning of the study, there is a fundamental problem with sample size if Warschauer’s intention is to examine correlations with nationality. In addition to the sample being small, it is not possible to distribute equal numbers of students of each nationality across the groups (e.g. there were only two Vietnamese students, and two people cannot be divided among four groups). Nevertheless, the author states that the Chinese, Japanese and Vietnamese student groups increased their participation in the electronic mode. Table 3 is titled ‘Average Percentage of Participation by Nationality’. Warschauer points out that, in Table 3, the percentage of Vietnamese contributions moved from 28.4% in f2f mode to 30.2% in electronic mode. However, the data was collected from only two Vietnamese students. It is hardly acceptable to treat these two students at his community college as representative of Vietnamese students in similar ESL classrooms in the US. The same argument can be applied to the 4 Chinese, 5 Japanese and 5 Filipinos who are also taken to represent their respective national groups. The problem of representativeness leads us to doubt the generalizability of his data.

2.1.3 Data collection

There are some problems in Warschauer’s (1996) data collection and analysis in terms of reliability and validity. Firstly, there is a problem with the validity of the SLEP test that he used in measuring language ability. As he admits, the SLEP test can only measure listening and reading ability. It is not a communicatively oriented test of the students’ listening and speaking ability. Warschauer could have used other measurements whose results are more directly related to language proficiency, especially to speaking ability, which seems to correspond more closely to the language skills needed in the f2f mode.

Secondly, the use of a tape recorder in the classroom for transcription might have been an inhibitor for students’ participation. Warschauer might have minimized this effect by using the technique of habituation in which the researcher introduces the tape recorder into the setting before actual data collection starts, as Brown and Dowling (1998) suggest. It is not clear whether he did this.

Thirdly, the discussion topics do not seem comparable. Warschauer instructed students to discuss two topics. He states that these two are counter-balanced questions. However, the students’ involvement in discussion might have differed depending on their experiences in their personal lives. For example, the first question was about the relationship between parents and teenage children. If the students have children of their own, their responses are more likely to be grounded in experience than otherwise. Suppose, for instance, that all four subjects in Group 1 are parents and none of the four in Group 2 is. This supposition is not trivial, because the total sample size is so small that individual differences can affect the results. To minimize readers’ doubts he could have provided more information about the individuals. I shall discuss the problem of opacity in his data presentation further in the next section.

With regard to the reliability of Warschauer’s data collection, one of the other major problematic issues is his questionnaire. For example, to what extent can his questionnaire in Tables 4 and 5 be seen to be a reliable measure of attitude? There are six questions in Table 4, titled ‘Student Attitudes Towards Face-to-Face and Electronic Discussion’, using Likert-Scale responses. He concludes that on the whole the students reported that:

… they could express themselves freely, comfortably, and creatively during electronic discussion, . . . and that they did not feel stress during electronic discussion. (Warschauer, 1996; no page numbers)

Warschauer calculated correlations between the average attitude scores on each question and IPC. However, these six general questions are hardly adequate to measure the differences in attitudes towards the two modes. For example, Warschauer simply asked the students whether they feel stress in the f2f mode. Such a question is unlikely to reveal which features of the electronic mode encourage or discourage students from participating in discussion more than in f2f. His questions should have been more explicit in order to investigate the differences between the two modes. Furthermore, by using the average scores for each question for the whole group of students, Warschauer is misusing the Likert-Scale method, which is designed to discriminate between respondents (Brown & Dowling, 1998). Using items separately is a mistake that Brown and Dowling (1998) point out is common amongst beginning researchers.
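The distinction drawn here, between averaging each item across the whole class and building one score for each respondent, can be illustrated with a short sketch. The responses below are invented for illustration, the 1-5 coding and the assumption that all items are worded in the same direction are mine, and nothing here reproduces the figures in Warschauer’s Tables 4 and 5.

```python
from statistics import mean

# Hypothetical data: respondent -> answers to three items coded 1-5,
# all items assumed to be keyed in the same direction.
responses = {
    "A": [5, 5, 4],
    "B": [1, 2, 1],
    "C": [3, 2, 4],
}


def item_means(responses):
    """Average each item across respondents (the group-level view)."""
    return [mean(column) for column in zip(*responses.values())]


def scale_scores(responses):
    """Sum all items for each respondent (one summated score per person)."""
    return {person: sum(items) for person, items in responses.items()}


print(item_means(responses))    # every item mean equals 3
print(scale_scores(responses))  # A: 14, B: 4, C: 9
```

In this invented case every item mean is identical, so the group-level view shows nothing, whereas the summated scores separate the three respondents clearly. It is the per-respondent score that could then be set against an individual measure such as IPC.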

2.1.4 Data analysis

Warschauer used the sentence as the unit of analysis to analyse language lexically and syntactically. He claimed that on both counts the electronic discussions involved significantly more complex language than the f2f discussions. For lexical analysis, he used TTR, that is, the number of different words divided by the total number of words. He claims that the longer the sentence, the higher the students’ language ability. However, the quality of sentences is not always related to their length or to the number of different words they contain. The nature of an utterance is not validly measurable in this way.

Secondly, in relation to the interpretation of turn-taking/interaction, Warschauer seems to lack the understanding that f2f and electronic discussions are two different modes of communication which have to be measured in different ways. For example, if someone only says “Yeah” in f2f, he counts it as one interaction and points to short turn-taking in f2f mode in comparison with the electronic mode as in Table 6. The problem here is that he counts every utterance. This does not seem to be a valid indicator, because in f2f the quality of interaction/conversation is different from that in the electronic mode. He does not seem to consider the fact that the asynchronicity of the electronic mode reduces the need for utterances that demonstrate that one is paying attention to the speaker.

Another example is that Warschauer ignores the difference in the types of questions observed in the transcripts of the f2f and electronic modes. In f2f mode, there are local questions: someone asks one person and that person responds (e.g. S4: What about you? S2: Me? S4: Yeah. S2: I make my own decision.). Although this happens in a group discussion, the interaction seems to be one-to-one. In the f2f mode, the one asking the question can look at an individual in order to establish the one-to-one mode. This kind of question is less likely to occur in the electronic mode because body language is not available. In the electronic mode, there are general statements starting with ‘I think’, for example. These always look like one-to-many communication. This may be an important difference between the two modes. But by comparing one-to-one communication with one-to-many he is not comparing like with like. Warschauer also mentions:

Several features of electronic discussion - the longer turns involved, the more equal opportunity of all students to express their ideas. (Warschauer, 1996; no page numbers)

However, there is no evidence presented that attests to the relationship between long turns and high participation rate in his data.

Thirdly, Warschauer does not clarify the characteristics of language learning in the computer mode. These are not necessarily identical with those of written language in traditional environments (Pachler, 2001; Negretti, 1999). Nevertheless, the differences that Warschauer’s analysis measures between the f2f and electronic modes may be explained by the differences between speech and writing, which are generally different in terms of structure (Kress, 1994).

In addition, the move from the spoken to the written mode might be an influential factor relating to the increase in the participation rate for Japanese students. Warschauer claims that the Japanese greatly increased their participation rate in the computer mode. As he himself points out it might be because Japanese get very little oral communication practice. In fact, reading and writing are more focused on than speaking and listening when learning English in Japan (Kobayashi, 2001)[1]. Japanese students are generally better in these modes. The participation rate of Japanese students might have been as high in f2f mode as in the electronic mode if they had written out their answers first and read them aloud in the f2f situation. Or the participation rate might have been lower if they had been told to write spontaneously in the electronic mode (like a chat). What he measured might not be what he thought he measured. This characteristic of Japanese learners is important because the Japanese national language group consists of 5 subjects out of his total 16 subjects.

I have addressed the problems of reliability and validity in the operationalisation of Warschauer’s research question in a specific empirical setting, mainly in data collection and analysis. Brown and Dowling (1998) raise three criteria for the power of the analysis and one of them is the integrity of the concept-indicator links. They argue that this refers

… to the extent to which information is being read accurately and consistently, that is, to the validity and the reliability of the data. (Brown & Dowling, 1998; p.81)

This is what Warschauer lacked in his study at the stage of operationalizing the research question in his empirical setting.

2.2 Representation of data

2.2.1 Lack of information

Firstly, with regard to the problem of the representation of data, Warschauer does not provide readers with enough information to assess his research. In some tables, the opacity of his data makes it impossible to identify the composition of the groups in terms of any of the variables. He might have represented the data in a more transparent way. He could, for example, have provided readers with a single tabulation containing all of the information for each subject. For example, Table 3 could have provided individual information on participation rate (f2f and electronic mode), nationality, gender, SLEP score, time in the US, age, attitude, complexity of language, qualitative extracts and the order in which the two questions were presented.

One of the examples in which opacity creates a problem is when Warschauer claims:

… the study demonstrated a tendency toward more equal participation in the computer mode, with three of the four groups substantially more equal in electronic discussion and the overall participation rate twice as equal in electronic discussion as in face-to-face discussion. (Warschauer, 1996; no page numbers)

However, according to his own data, one group showed no trend toward greater equalization in the computer mode. As shown in his Table 1, Group 1 was the most equal group in f2f mode but became the least equal in electronic mode: its Gini coefficient of participation inequality[2] rose from 0.10 to 0.30. He suggests that this could be related to the fact that Group 1 did not contain any of the Japanese students, whose participation changed the results markedly in the electronic mode. However, readers cannot verify this from Table 1 or comment on his speculation, because there is no information about the distribution of nationalities in each group. Thus, Warschauer’s way of representing data creates a serious problem: readers cannot assess his argument. Furthermore, he offers no substantiated explanation for the increase in the Gini coefficient in Group 1, which directly contradicts his major claim. As Group 1 comprised 25% of his sample, this has serious implications.
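To show what a change from 0.10 to 0.30 means on this index, the sketch below implements the formula given in footnote 2 for a four-person group. It reads the formula as using absolute deviations from an equal share of 1/4; that reading is an assumption on my part, made because signed deviations of shares that sum to 1 would always cancel to zero. The word counts are hypothetical and are not Warschauer’s data.

```python
def gini_participation(word_counts):
    """Participation inequality for one four-person group.

    Implements G = (2/3) * sum(|X_i - 1/4|), where X_i is speaker i's
    share of the group's words. Absolute deviations are an assumed
    reading of footnote 2. Returns 0 for perfect equality and 1 when
    one speaker produces every word.
    """
    if len(word_counts) != 4:
        raise ValueError("this form of the index assumes groups of four")
    total = sum(word_counts)
    if total == 0:
        raise ValueError("no words were produced")
    shares = [count / total for count in word_counts]
    return (2 / 3) * sum(abs(share - 0.25) for share in shares)


# Hypothetical word counts per speaker.
print(round(gini_participation([100, 100, 100, 100]), 2))  # 0.0
print(round(gini_participation([40, 30, 20, 10]), 2))      # 0.27
print(round(gini_participation([400, 0, 0, 0]), 2))        # 1.0
```

On this reading, an index near 0.30 corresponds to a fairly uneven split such as 40/30/20/10, which makes clear why the exact composition of Group 1 matters for assessing the claim.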

The second example is observed in his description of his method. Warschauer (1996) states that he used a ‘counter-balanced repeated measures procedure’. He considers the order of modes to be counter-balanced. However, participation rate might have been influenced by a combination of mode and question order. The problem is that Warschauer uses an experimental design but cannot satisfy the criteria this sets for the establishment of causal forms of explanation. The ‘counter-balanced repeated measures procedure’ suggests that two of the groups discuss in f2f mode first and the other two in the electronic mode first. The two groups that begin in f2f mode discuss the two questions in opposite orders, as do the two groups that begin in the electronic mode. The experimental design aims to eliminate the influence of question order and of the order in which the two conditions are experienced. However, this does not work in his study because the four groups are not comparable. Warschauer seems to be unable to discuss this because he has shifted from an experimental form to a loosely structured observational form of investigation. The comparisons of the groups are meaningless, as the groups are not comparable in terms of either composition or experience within the setting being observed.

The third example of the opacity of the data representation is in Warschauer’s claim that, in the correlation between participation rate and students’ attitudes, the survey supports the hypothesis that lack of oral fluency and discomfort in speaking out are important factors in determining participation in both modes. On this account, students who think that they are not fluent tend to participate more equally in the computer mode. However, here again, Tables 4 and 5 do not present any evidence about which students have high scores in fluency and how much they participated.

The fourth example is observed in another of his arguments. He states that the Japanese speak much less in the f2f discussion and that four of the five Japanese students were virtually silent in f2f mode, whereas the fifth Japanese student made 48% of the comments in her f2f discussion. He points out that this student is married to an American, and offers this as an explanation for her high level of participation. However, from Table 1, it is not possible for readers to know which student in which group the author is talking about, which student comes from which country, who is married to an American, and so on.

Therefore, when one of the five Japanese students participated a great deal in f2f discussion, contrary to his assumption, and the author cannot ignore this exception, he resorts to an anthropological type of explanation. Nonetheless, he does not appear to have investigated the personal lives of the other Japanese students, who may have had even greater exposure to the English language and American culture, because the students’ personal information came only from their answers to his survey. The opacity of his data, combined with the author’s unprincipled manner of explanation, inevitably leads readers to doubt the reliability of the data. I shall discuss his quasi-anthropological approach to generalizing further in the next section.

2.2.2 Quasi-anthropological element

Warschauer wanted to examine which of four factors (SLEP listening score, SLEP reading score, time in U.S. and age) were correlated with IPC. He states that his results show a correlation between the increase in participation in the electronic mode and SLEP listening score. This contradicts his assumption that greater listening ability would correlate with increased participation in f2f mode. He suggests that other factors such as shyness, rather than failure to understand the discussion, might have caused some students to limit their contributions in the f2f mode but to participate more equally in the electronic mode. However, he has no basis upon which to make such a suggestion, since his research data contains no description of students’ shyness. Readers learn nothing about the correlation between students’ personalities and participation rate from Table 2. He should have examined why SLEP listening scores correlated with participation rate in the electronic mode and not in the f2f mode as he had speculated, or why the SLEP reading score did not correlate with participation rate in the electronic mode. Instead, he simply introduced the unsupported speculation that shyness interferes with participation in f2f.

In his interpretation of data in relation to nationality and participation rate, Warschauer again made fatal mistakes in generalizing without having any evidence. He explains the difference between Filipino and Japanese students from two points of view: learning style in the classroom and cultural factors. He states that the low Japanese participation rate in the f2f mode comes from the fact that Japanese schools socialize students to listen quietly, rather than to speak up. This seems plausible, but there are two main errors here. One is that he does not present any data or evidence about Japanese learners. In fact, Japanese students are said to be unfamiliar with discussion in general, whether it is in the f2f mode or in the electronic mode. Therefore, using this cultural factor to explain the difference between the electronic and f2f modes is not adequate. Moreover, there may be another factor. In the f2f mode, students can see the presence of authority, a teacher-like person even if he/she is an outside observer, whereas in electronic mode, they only face computers. The absence of authority in discussion might enhance Japanese students’ willingness to express their opinions freely. In my personal experience, this can be observed in active staff meetings or student meetings. Warschauer’s second error is that he speculated on a reason, apparently drawing on a stereotype, and generalized it without presenting any evidence from his data or antecedent work.

3. Conclusion

Warschauer’s research interest was in comparing the equality of ESL student participation in two modes, f2f and electronic discussion. His findings suggest that electronic discussions foster greater equality of participation than f2f discussions. However, in this paper, I have pointed out some important limitations relating to the empirical setting, representation of data and generalization.

Warschauer presents the research as an experiment. However, it was arguably inadequately conceived and operationalised. In other words, with its small sample size and short research period, his experimental design was not sound enough to support a claim of a causal relationship. His sampling strategy left problems of representativeness. He could have minimized the problems in his research design by taking some measures, such as controlling variables in his design or providing readers with individual information about subjects in some depth on a case study basis. However, he did not.

Instead, Warschauer shifted from experimental form to a loosely structured observational form of investigation, which caused another problem in the generalizing phase. He failed to establish the relationships between concept variables, which raised the issue of the validity of his inferences of implication/causality in his generalizations. I have also questioned his methods of data collection and analysis and the opacity of his presentation of his data.

References

Bell, J. (1999) Doing Your Research Project. A guide for first-time researchers in education and social science, Third Edition, Milton Keynes: Open University Press.

Brown, A. & Dowling, P. C. (1998) Doing Research/Reading Research: a Mode of Interrogation for Education. London: Falmer Press.

Kobayashi, Y. (2001) ‘The learning of English at academic high schools in Japan: students caught between exams and internationalisation’. In Language Learning Journal. Summer 2001, No. 23, pp 67-72.

Kress, G. (1994) Learning to Write. Second Edition. London: Routledge.

Negretti, R. (1999) ‘Web-based Activities and SLA: A Conversation Analysis Research Approach’. Language Learning & Technology, 3(1). July 1999, pp.75-87.

Pachler, N. (2001) ‘Connecting schools and pupils: to what end? Issues related to the use of ICT in school-based learning’. In Leask, M. (Ed) Issues in teaching using ICT, London: Routledge Falmer.

Warschauer, M. (1996) Comparing face-to-face and electronic discussion in the second language classroom. CALICO Journal, 13(2), pp.7-26.

www.III.hawaii.edu/web/faculty/markw/comparing.html

(Last accessed 12/12/2000)

[1] This was also observed in my own teaching experience in Japan.

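[2] The Gini coefficient of participation inequality for each group was computed using the number of words per speaker. It takes values between 0 and 1, where 0 means perfect equality. For a group of four it is calculated as $G = \frac{2}{3}\sum_{i=1}^{4}\left|X_i - \frac{1}{4}\right|$, where $X_i$ is the participation rate (share of the group’s words) observed for speaker $i$; the deviations must be taken as absolute values, since the signed deviations of shares that sum to 1 would always cancel to zero.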