Abstract
For over five years, student evaluation of teaching through end-of-semester questionnaires (SETs) has been mandatory in Japan. Evaluation has been conceived by a centralized bureaucracy and delivered to schools as an imperative, but often without clarification of aims or purposes. This paper, utilizing a case study methodology, examines through interviews the perspectives of six ELT teachers working within a Japanese university about the introduction of SETs. Trends emerged which suggested that teachers saw the threat of evaluation differently. Tenured teachers saw a potential threat in the future, but part-time and limited-term contracted teachers felt disadvantaged in terms of job conditions, and extremely vulnerable to retention decisions. As a by-product of status issues, an atmosphere of mistrust, data manipulation and an “us” versus “them” viewpoint emerged, surely at odds with an effective evaluation system which should lead to improvement and be beneficial to teachers and learners. Although the study is from a Japanese perspective, its findings are pertinent wherever generic, cross-curricular rating forms are administered.
Key words: teacher, perceptions, student evaluation, Japan
Introduction
This case study examines the reactions of six teachers each employed under different working conditions within a recently “corporatized” former national university in western Japan to the introduction of mandatory, end-of-semester student evaluation of teaching surveys (SETs). It has been suggested that Japan has entered an “epoch-making phase in the history of higher education” (Arimoto, 1997, p.206) whereby a “Big Bang” (Goodman, 2005, p. 2) has led to a Ministry of Education initiated rush for reform. The introduction of SETs is one part of sweeping changes in the ways universities are organized in response to market forces.
The expansion in the use of SETs in Japan came as a more economically centered, more market-sensitive decentralization movement emerged at the start of the new millennium, partly in response to the expected decline in the 18-year-old population (Yamada, 2001). As universities’ total capacity to accommodate new entrants will reach 100% in 2009 (Tsurata, 2003), universities are “now subject to buyer’s market where students are courted customers rather than supplicants for admission” (Kitamura, 1997, p. 145). In a time when the public has become more critical of government spending, a market agenda underpinning government strategy to determine accountability in education led to the introduction of SETs. The Ministry of Education (2003) claims that the disclosure of evaluation of individual faculty members reached 100% compliance at national universities from 2001, and while the Ministry of Education does not dictate the timing of evaluation, the university in this study requires SETs to be administered before summative testing in week fifteen, or the day when most students are likely to attend. This SET utilizes a Likert-type 1-5 scale and asks the students nine questions which can be found in Appendix One.
Yet Ministry of Education policy decisions to promote this form of student evaluation are made without clarification (Miyoshi, 2000), and are seemingly based on a “power over” (Stronge & Tucker, 1999, p. 340) prerogative whereby school administrators and teachers simply “implement the decisions being handed down to them” (Markee, 1997, p. 63). Thus, reform in Japanese education has been described as top-down (Gorsuch, 2000; Wada, 2002), but made opaque through the “extraordinary reluctance to clarify, define, and articulate policy” by the Ministry of Education (Miyoshi, 2000, p. 681). While evaluation should be seen as “an agent of supportive program enlightenment and change” (Norris, 2006, p. 578), the rhetoric of evaluation, with its numerous English terms such as “Faculty Development” and acronyms such as “FD,” is little understood by school administrators (Tsurata, 2003), leading to confusion as to the aims or purpose of evaluation. For example, the timing of evaluation, often in the final class, does not suggest a formative feedback loop. As Alderson (1992) notes, if evaluation is left to the end of a course it loses any opportunity to inform and influence teaching; therefore “externally imposed ideas and constraints are harnessed to managerial interests rather than the concerns of theories of professional development” (Johnson, 2000, p. 433).
We need, as Auerbach (1995) reminds us, “to look at evaluation through a new lens” (p. 11) to investigate the extent to which student ratings have actually been accepted and used, and university teachers’ understandings of the purposes of evaluation. While evaluation has seemingly been accepted by researchers in the field, in this time of educational reform in Japan no non-empirical study of the degree of receptivity to, and the consequential validity of, SETs has been undertaken to suggest how evaluation can be improved. The aim of this study is to gain insight from teachers in a context where implementation of SETs as an evaluation tool has been much slower than in the West. Most previous studies have been concerned with collecting “empirical evidence” away from a Japanese context, representing a positivist methodology of predicting teacher response, while this study, being interpretive, seeks to understand. Through a reflexive stance, learning about teacher views of evaluation, why teachers hold their views, and the sharing of views to expand knowledge of evaluation is the rationale underpinning this study.
What are the purposes of evaluation?
We need to consider what purposes SETs hold as the issue of credibility is crucial for receptivity of teachers to evaluation and its potential for growth. Purposes of evaluation tend to focus on the formative purpose as diagnostic feedback or the summative purpose for personnel management decisions such as for tenure, promotion, or retention. Centra (1993) contends that in America, SETs were devised initially for only formative purposes, but when such informative feedback was seen as non-threatening, administrators endorsed the use for summative purposes. This, however, alters the effects on the teacher and changes the role of the administrator. Ryan, Anderson and Birchler (1980) showed how there was considerable consensus about the use of information for formative purposes as feedback information was available only to teachers. Once a summative purpose was introduced, the incentive for high ratings led to negligible faculty improvement, a decline in faculty morale, distance between faculty and administration as well as a “decrease in faculty compliance with institutional regulations” (p. 329) to offset faculty unease over the summative use of evaluation. As evaluation forms are typically administered in the last weeks of the semester, the failure to close the feedback loop by handing evaluation results back to teachers so that they can institute improvement during the lifetime of the class means that this form of evaluation fails to provide improvement (Harvey, 1997), and would seem to suggest evaluation is for summative purposes.
Previous studies of teacher perceptions
While most previous studies suggest that faculty do find student evaluation generally worthwhile from a formative perspective (Ryan et al., 1980; Ory & Braskamp, 1981; Schmelkin et al., 1997; Simpson & Siguaw, 2000; Nasser & Fresco, 2002; Yao & Grady, 2005; Moore & Kuol, 2005), one study by Howell and Symbaluk (2001) cited numerous faculty doubts over whether accountability or effectiveness are promoted. In Ory and Braskamp (1981), faculty indicated that students’ written comments were potentially more accurate, trustworthy, useful, comprehensive, believable and valuable when used for self-improvement rather than promotion purposes, while Yao and Grady (2005) found significant differences in feedback depending on rank, field, class size and class level. The highest percentage of respondents reporting regular use of feedback came from instructors or lecturers, but overall there was resistance to changing aspects of teaching that instructors had valued over time. Ryan et al. (1980) found that three quarters of faculty collected information in a systematic manner before being required to do so, yet less than half indicated they looked at results casually, infrequently, or not at all after compulsory administration. Moore and Kuol’s (2005) small study of 18 faculty in Ireland similarly found that instructors were relatively positive, particularly with respect to open comments, but teachers who received largely positive feedback were less likely to use findings as a rationale for change. Schmelkin et al. (1997) investigated responses from 420 full time and adjunct faculty who showed no pattern of consensus for usefulness of ratings for either formative or summative decisions. Generally, teachers were not receptive to making changes in their classes from semester to semester and gave low ratings to the utility of mid-term evaluation, while the majority did not talk about evaluations before administering the questionnaires.
Most studies routinely suggest that SETs have both summative and formative purposes, but faculty are concerned about the former use. Simpson and Siguaw (2000) found that “objective” data results were used by chairpersons or other administrators in reappointment, promotion or tenure recommendations and due to validity and reliability concerns, close to 40% of faculty “selected” which student feedback to submit. In the same study, faculty were concerned over norm-referencing of data so that half of the faculty would be below average, which is consistent with Theall and Franklin’s (2001) conclusions that even if data from SETs are technically rigorous, student ratings are often “misinterpreted, misused and not accompanied by other information that allows users to make sound decisions” (p. 46).
Anxiety and resistance increase through poor use of SETs, which contradicts the validity and reliability of evaluation claimed by the literature in America. Resistance led faculty to decrease class workloads (Ryan et al., 1980; Nasser & Fresco, 2002), while Simpson and Siguaw (2000) found faculty lowered grades and course standards and offered inducements. Yao and Grady (2005) noted that junior faculty members had strong motivation for using feedback in order to get positive evaluation in annual review and promotion decisions, so the major change was towards easier assignments and easier course content.
The literature in Japan
While the focus in the above studies was centered on issues of validity arising from evaluation being used in personnel decisions such as the granting of tenure, the non-renewal of contracts for adjunct (part time) staff, and for salary adjustments, there have been very few studies discussing teachers’ viewpoints and their status in Japan. This is despite the precarious position many English teachers find themselves in. While teachers may optimistically see evaluation as a teacher development tool at best, or at worst a ritual universities go through in response to Ministry of Education dictates, there have been extremely few academic articles written, only occasional comments buried away in teachers’ own homepages, and brief sensational articles in the lay press.
Worth noting, however, are the comments of the Part Time Lecturers’ Union (Kansaien Daigaku Hijokin Kumiai, 2005) which deplored the perceived lack of student responsibility for their own learning, calling for an end to anonymity, and suggested that a greater emphasis is needed to ascertain students’ degree of attendance, their willingness to preview and review class material, and the degree of active participation. Implicit is a view that SETs encourage a passive student role, conveying a sense that all the responsibility for teaching and course management falls squarely and solely on the teacher. The Union later cites the double standard in summative uses whereby part time teachers are dismissed while for tenured (Japanese) faculty, it is used for “little more than external publicity” where perfunctory evaluation is “just following guidelines.” The Union calls for more systematized procedures including a more accurate compilation of data, for the data to be made externally public, and prior to teachers being dismissed due to poor evaluations, calls for warnings and remedial feedback.
The credibility of data was touched on in the homepage of Kogakuin University (2002) which reported teachers’ strong views about students’ ability to fairly fill in open-ended questions. Echoing Attribution Theory (Weiner, 1992), evaluation encourages acceptance of credit for feelings of success while denying responsibility for undesired outcomes. As evaluation is often administered extremely close to the final test, expected high grades are attributed to intelligence, while potential low grades are blamed on poor instruction overlooking students’ own poor learning strategies. Similar to comments by the part time teachers’ Union above, teachers in the homepage question learners’ ability to evaluate fairly when they frequently come to class without textbooks or notebooks, where willingness to learn was low, and where around “fifty percent of students” were so apathetic that it became impossible to teach according to the syllabus. Students who blamed teachers for their own “failings” in class evaluated harshly while deliberately choosing to sit at the back of large classrooms where it was difficult to hear and see class content.
Kinoshita (2005) writing in the Asahi Shimbun also suggests the practice of anonymity should be ended so that students fill in comments honestly, while Kawanari (May 2005) reports on evaluations filled out immediately after exams being often filled with “abusive invective.” In journals, Ryan’s (1998) article warns of students’ culturally determined expectations which may lead Japanese students to “judge their teachers against standards that are literally ‘foreign’ to their native-speaking teachers” (p. 9). Shimizu (1995) suggests that Japanese students evaluate Japanese and foreign instructors by different standards as Japanese teachers’ knowledge of the subject area is seen as important, and that Japanese teachers are valued more for scholarly skills such as intelligence, knowledge and accuracy, while foreign instructors are expected to be “entertaining” (p. 8). However, Shimizu (1995) concluded that “some students may not seriously participate in classes taught by foreigners because they feel classes are trivial” (p. 8), while Ryan (1998) warned that teachers need to be aware of students’ beliefs about ideal teachers especially since evaluation has been given impetus by Education Ministry recommendations (since implemented) that tenure be replaced by a contract system.
This small section has outlined some anxieties of teachers in Japan and how they are slowly voicing their concerns over evaluation. As Pennycook (1990) notes, the increasing importance placed on evaluation represents greater control over teachers leading to standardization of the curricula, accountability and educational responsiveness to market forces, deskilling teachers and leading to pre-specified teaching procedures.
Materials and methods
This study aims to make a specific contribution to the body of knowledge of teachers about their receptivity to SETs and how it can lead to improvement. To investigate faculty perceptions, a case study approach using in-depth interviews was used so that the “stories of those living the case” (Stake, 2000, p. 437) are a “step to action” (Bassey, 2000, p. 23). The focus is not on discovering universal laws but on interpreting and understanding experiences. Through understanding a single case in its natural setting, this study, being interpretive, seeks to understand through enquiry in a real-life context instead of “contrived contexts” of experiment or survey (Bassey, 2000, p. 26). Through a reflexive stance, the rationale underpinning this study is that through description (Stake, 2000), learning about teacher views, why teachers hold their views, and sharing views among participants are keys to expanding knowledge and to encouraging propositions (see Punch, 1998) in other similar situations.
Research questions
This study investigates six university teachers’ perceptions of, and reactions to, SETs in Japan, how useful the common questions found on such forms are, and whether the questions are useful for formative improvement. Although the design was emergent, the key research questions sought insight into:
- Teachers’ attitudes about the purpose of the evaluation process
- The usefulness of the questions for instructional improvement
- The usefulness of the feedback for instructional improvement
- The degree to which participants are affected in their daily teaching by the introduction of SETs
- Whether the use of SETs represents teachers’ conceptions of teaching
The setting
Perspectives were sought from ELT teachers from a number of categories working within one recently “incorporatized” former national university in Western Japan. The categories of teachers can be defined as:
- Full time “tenured” local and expatriate teachers
- “Limited (or “fixed”) term” contracted local and expatriate teachers
- “Part time” local and expatriate teachers
In this study, one local and one expatriate tenured faculty member, one contracted expatriate teacher, and three part-timers, one of whom is Japanese, took part. Participant details can be seen below:

Data collection and analysis
As evaluation is inherently political, anonymity and confidentiality procedures were outlined, and participants were informed that both one-hour interviews would be tape recorded to aid verbatim transcription. After the data from the first interviews in February 2005, which followed the administration of SETs in participants’ classes, had been transcribed, the transcriptions were sent back to each participant. The second interview took place after the summer, from the end of September. The time was chosen to encourage participants to reflect on the first interview, to give time to read, reflect on and comment on the transcriptions, and to hear whether teachers had changed their administrative procedures of the evaluation form at the end of the semester in July.
Lincoln and Guba’s (1985) “constant comparative method” (p.341) was helpful in guiding the analysis. Although each interview progressed down its own path according to the direction of the responses, as the interviews were semi-structured an idea of important themes and concepts was gained from the research questions. Following transcription, data were “unitized” (Lincoln & Guba, 1985) in terms of information that is the basis for defining categories. Through constantly comparing at this stage, there is, as Lincoln and Guba (1985) note, a shift from comparing incidents to whether incidents exhibit similar properties. Through constantly comparing categories for overlapping themes, subtle differences meant a new sub-category was needed or that the category needed to be redefined. Through comparing data, the relationships became more apparent and the categories more coherent.
The findings
Case 1: Koji
Koji was the only tenured local English teacher to participate. He is in his early 40s and has been evaluated “continuously” for five years, most recently in five classes: two “Writing” classes which are electives with around fifteen students, an elective “Poetry” class of fifteen students, and two compulsory “General English” classes of 40 students. In the latter classes, he tries to “act like a non-Japanese” using only English, but as students are poorly motivated, he tends to use more Japanese as the semester progresses. Koji said that he finds evaluation important to learn both the extent to which his teaching makes the students happy and his ability to convey to the students his passion for the subject. When he remembers, he explains a formative purpose for evaluation to make his classes better, saying that he will “ponder” on “bad points.” Evaluation is a way to be strict with himself, to make lessons enjoyable, and to see that class content is organized and systematic in accordance with the syllabus. At this former national university, SETs reflect teaching as one of four categories of self-evaluation and thus represent “institutional pressure.” However, he rarely gets anything beyond positive answers as students often lack motivation and can be lazy. He has observed students filling in responses “automatically” and he wished for more meaningful evaluation based on inspecting classes, which he suggested is too radical for traditional, conventional teachers; too many teachers just “do what they want” in class.
Although mid-term evaluation would encourage conscientious teachers to use feedback later in the term, he says it takes time for students to be accustomed to his teaching style. He sees evaluation as questionable, with the primary function as following the “American style” as American universities are perceived as “advanced” but without substantial consideration of evaluation rationale for how results are used. He also questions socio-cultural differences whereby Japanese “put a mask on” for evaluation as silence is a “virtue” and honesty is problematic. Lacking clarity, evaluation is written by “big cheeses” in power concerned with making the university more attractive and thus able to survive severe retrenchment. While he tries to do his best for the students and he cares passionately about both teaching and research, he wonders about the commitment of his colleagues, so he sees results utilized to “increase mobility” of teachers as “wonderful” in principle. He would accept “awful” evaluation if he is properly evaluated, but feels “theoretical anxiety” with questionnaire items and content. To encourage learner autonomy, and to help students understand both the joy of studying and the joy of developing their interests and discovering latent talent, he assigns out-of-class work; challenging students involves “dragging” and “persuading” them to change their preconceived high school attitudes to learning. This is not represented in evaluation and, seeing teaching as an art, or self-presentation or drama, he worries that functional teaching methods implicit in SETs are a “kind of restriction in a positivistic” way. Instead of teaching as a technical act following a formulaic teaching style, he sees his role either as an artist searching for interesting, unfamiliar styles or as a creator molding the sequence for students who are in unique classes.
Case 2: Ed
Ed is in his 50s and comes from America. He was employed as a “Gaikokujinkyoushi” or “Foreign teacher” on a limited-term contract, but became tenured four years ago. He has experienced evaluation twice a year for six years. Most recently, evaluation was carried out in six classes: four compulsory “Conversation” classes of between thirty-two and forty-four students, and two elective “Advanced” English classes, one of which had ten students and the other thirty-two. He says he is told to give the evaluation on the day most students will come, which is partly due to low student response rate in past years. He feels that as students do not see the results of the evaluation “in any shape or form,” evaluation is “not a big deal” and questionnaires remained “stuffed” in their bags. This ambivalence hastened a change in administration policy from individual student responsibility to one where faculty collect the SETs data at the end of class. On the day of evaluation, he administers two evaluation forms. He assumes that the official form is “part of the course and just something they do,” explaining that it is for “people above him” in the “office” and is completed in Japanese. His self-written form is for his own formative feedback, while official data go to a “computer statistical center” which “crunches numbers” that are later evaluated by an “unknown” committee. Feeling that the questions were “made a long time ago” by someone in “hard sciences,” he is apprehensive about how students interpret his class according to the criteria of the questions. The form emphasizes the transmission of knowledge through lecture-based teaching which, while not representing his style of teaching, reflects notions of “GP” or “good practice,” a buzzword he hears at the school. He sees minimum standards being maintained so if teachers are above this minimum then “you’re good enough.”
As his results are satisfactory, feedback only serves as a check to see if he’s in the right “ballpark,” but he is wary of the motivational effects of normatively ranking and “measuring” teachers in “league tables.” As there is an “illusion” of good practice, he just looks at the results visually, and as he thinks the free comments more accurately reflect what he is doing in class he is not wholly dismissive, but he feels his own formative feedback is more reliable and detailed. Like Koji, he feels mid-term administration is questionable as his class builds up to a climax, the big bang coming at the end with their presentations. While the feedback from SETs can take four months by which time he has forgotten the class, he still feels benefit from seeing the feedback displayed.
Case 3: Ayumi
Ayumi is in her early forties and works part time. She has experienced evaluation eight times and the last time evaluation was carried out in two classes, a TOEIC class which was a “required elective” of forty students, and an elective “Advanced” English class of fifteen students. She says she gives the evaluation after the summative, end-of-semester exam, which she finishes quickly. She does not give any explanation beyond asking the students to “spend five or ten minutes” filling out the questionnaire, but observed that students do not spend much time thinking over their answers so free comments are often perfunctory. She assumes that open-ended comments are “messages to the teacher,” but closed data, being used for her evaluation, are “nerve-wracking” as she does not know “the system.” If she sought formative information, she would, like Ed, administer her own evaluations, as results from the official evaluation are returned late in the subsequent semester, losing any opportunity to inform teaching. She assumes “they” look at the data to determine contract renewal to reduce labor costs and worries over her future, seeing her career resting on the results.
While it is important to know students’ needs and wants, she thinks that students need to evaluate their own performance, and feels she is unable to give an “accountable” lesson in classes of over 60 students, which she would like to bring to the attention of the administration. Ignoring class size is unfair as her teaching is affected, so classes should be compared across sizes, but she admits to feeling helpless. She recounted how in her previous job at a language school, evaluation purposes were clearly explained and although retention decisions were a factor, formative and summative evaluation utilized self-assessment, observation, and peer-evaluation to create a fair and constructive environment where the exchange of feedback was encouraged.
Case 4: Melvin
Melvin was employed as a “Gaikokujinkyoushi” or “Foreign teacher” on a “limited term” contract at the time of the second interview. He has experienced evaluation fourteen times, most recently in six classes. Five of the classes were compulsory “Oral Communication” classes of up to seventy students and the sixth was a compulsory “Writing” class. He gives evaluation in the final class, in which he does not give a test, and does not suggest any purpose when distributing the forms, just pointing to the Japanese and saying “read that.” He has heard he is supposed to leave the classroom when administering the forms, but rarely bothers. He has observed that the students do not expend much effort and feels that, at around 20%, the response rate is very low; the biggest response is the “lack of response.” He compares results to a “well-balanced” breakfast with students choosing scores of 4 or 5, some adding their names to comments in English. He believes that students expect him to know who comments favorably, highlighting grading influences and unclear evaluation purposes for students.
He suggests that mid-term evaluation would be beneficial if results came back in a timely fashion so he could gauge student progress and adjust the remaining classes. He worries about discriminant validity, which may confound evaluation as “the tradition that the students have had for six years being thrown on its ear is difficult,” so that after just 7 weeks students’ pedagogical perceptions of high school learning may be an influence. While his scores remain high he does not feel threatened, just a little disillusioned at the lack of response, but he consciously makes the final class more “fun and interesting” to “load” the questionnaire, believing that students would judge the whole semester on that one class. He sees the temptation for tenured professors to fall into ruts and once teaching becomes repetitive, it is devalued. Forgetting how to change is hazardous especially if teachers are not compelled to change. Therefore, although an evaluation process is necessary, the lack of immediate formative feedback encourages the belief in an “over-watching,” summative stance by the administration. While welcoming this, he adds that rigor is needed in both the administration procedure and in questionnaire design as he is more concerned with what the students think of their learning experience than what other faculty members may think, but remains unsure whether students are capable of accurate evaluation. However, student views need to be expressed in tenure decisions as universities are becoming businesses and so need to have satisfied customers in what has become a buyer’s market. The meaningful relations he tries to forge with students are not expressed in the questions, which reflected a limited view of education emphasizing “little aspects.” Implicitly, following the evaluation means teaching to a certain method or a set of competencies which neglects individual personalities and interpersonal ability in favor of recordable standardized teaching.
Teaching goes beyond an objective “science” to foster relationships which cannot be represented on a scale. The value of evaluation is concentrated on limited, tangible and non-intellectual aspects of the “job” such as punctuality which, while important, ignore the greater responsibilities on an interpersonal level. Melvin sees ‘starting and finishing time’ as an example of overly trivial questions focusing on manifestations of overt teacher behavior. Ratings do not address the quality of learning processes nor outcomes as he sees teaching in terms of a “facilitator” role, guiding students to gain knowledge heuristically, by themselves. SETs encourage an “administrator” viewpoint emphasizing classroom organization and the presentation of learning, instead of encouraging “practical knowledge,” observing and interacting with their peers and being receptive to learners. Therefore he encourages reflection through journal writing, noting when activities “worked” and changing his classes while reflecting during his teaching. He observes how students are responding in the class and so has “a pretty good idea” of student responses and so in evaluation “the information is something I was close to guessing” if evaluation is seen as an expression of satisfaction.
Case 5: Jack
Jack is in his mid 40s and comes from Australia. He works part time and has administered SETs “about fifty times,” giving out forms most recently in four classes. These were three compulsory “English Conversation” classes of between thirty-two and forty-four students, and one elective “Conversation” class of twenty-seven students. He gives out evaluations in the final class after the test, which finishes 15 minutes before the end of class. When the students are ready to go, he hands out the evaluations without giving any explanation as he assumes the students know the purpose. He says the response from students is very low, and he has a “vague suspicion” that results come back halfway through the following semester, which is far too late to apply. He considers evaluation to be a chore for the students as they are required to complete one in every class, so students do not “go out on a limb”: only motivated students write comments and others respond by giving “444”. Lazy students, he suggests, “throw the average,” but he adds that as overall student responses are similar across the curriculum, meaningful feedback is limited. The school offers little in terms of a developmental path, so it is “going through the motions” in terms of question content and timing of feedback; therefore, he would prefer to administer his own formative evaluation form. While the open comments, when satisfactory, give him a sense of validation, he feels that students are unused to his communicative teaching, which emphasizes process over final product, and questions students’ competence to distinguish between tasks that are “challenging” and tasks that are “too difficult.”
Having a pragmatic approach to the temporary nature of part time work, he can foresee a time when part time teachers may lose classes due to underperformance, but at present he imagines the administration probably scans the data, notices that “nothing jumps out,” and so “throw it in the bin and that is the end,” a practice he himself follows. As one improvement to the process would be to give evaluation a purpose, he would like peer observation but suggests only we (“native speakers” (sic)) would be competent to “judge” as Japanese (“they”) are unaware of “good” teaching. He thinks that evaluating aspects of each class would be of value to determine “satisfaction” but sees teaching as an art, in contrast to the craft implicit in evaluations. A craft can be learned like a set of techniques, but art is “inside” and can be developed. This inner, innate feeling tells him when students are preoccupied or tired so he can change approach to “catch the audience,” an ability which more mechanical teachers would be unable to develop.
Case 6: Eleanor
Eleanor is in her late 30s and comes from New Zealand. She works part time and has been evaluated three times in “conversation” classes. The students seem familiar with the purpose and are supplied with an instruction sheet in Japanese which she assumes explains the rationale. She is uncomfortable with the questionnaire being reliant on student interpretations of definitions and, as the content lacks specificity, she gives students her own evaluation for formative feedback. As the students are “besieged” with evaluations and repeatedly fill out the same forms, she questions whether students can cognitively understand either the cumulative effects of their English education or her objectives and priorities, so she perceives evaluation as a popularity contest based on a “myriad” of notions of liking and disliking. As an ELT teacher, she feels disadvantaged by cross-curricular evaluation across a multitude of classes and teaching styles while being unaware of the “standards” expected by the administration. She questions the discriminant validity as students are speaking in a language they are not “comfortable” with, in activities which do not represent the didactic teaching styles they are used to. She cites other teachers’ non-assignment of homework and lenient grading as causing tension, while large class size limits her ability to communicate with learners, which causes learner frustration. Upon receiving feedback, she first compared her ranking with others because, being employed on a yearly renewable contract, she is unaware of how data are interpreted and feels she needs to raise student awareness of potential consequences prior to evaluation. Her unease is expressed in her metaphors of “little men in darkened rooms” who compare her results and then “hatchet” those underperforming teachers. As the criteria are unclear, she would like, firstly, more objectives-based evaluation closer to her high school teaching experience in New Zealand, and secondly, an efficient feedback loop. 
Without this loop, students lack involvement and feedback arrives overdue, so losing its diagnostic potential.
While she thinks there are a lot of “bored, disinterested” teachers and wonders whether the lack of ongoing assessment would make peer evaluation viable, her personal growth occurs through dialogue with trusted colleagues and from day-to-day interaction with classes. She feels marginalized as a part time teacher because the time spent developing an interesting and worthwhile syllabus does not fit the narrow confines of “acceptable” teaching, which has a knock-on effect on morale and job satisfaction. She sees the looming “2009 problem” of universal access as the reason behind the legislative requirement for evaluation, so “they” need a way to “get rid” of teachers who are not a “student draw” or do not generate the positive word of mouth that universities need to attract potential students. She suggests that university administrators are concerned with the framework or the publishable, visible side of classroom evaluation, but her job involves a group of students for whom the syllabus has little relevance. She constantly considers student engagement, monitors responses, and fine-tunes her teaching, while the evaluation represents the teacher standing at the front telling students what to do.
Discussion
Each participant has their own unique picture of faculty evaluation depending on their status and job security but, because of the lack of clarity of mandatory goals, overall findings suggest that:
- Student evaluations are administered haphazardly with little information given to students. Many students develop a cynical attitude and, because students are not involved in the feedback loop and the use of results is often unclear to them, evaluation becomes a “perfunctory exercise of little impact” (Smith & Carney, 1990, p. 6), which jeopardizes reliability and consequential validity.
- Teachers would welcome meaningful evaluation focused on improvement. Without feedback, with only a little information of any worth and a negligible mechanism for remedial help, the potential for teacher growth is limited, which encourages complacency.
- However, despite this complacency, many ELT teachers face precarious working conditions and perceive the lack of diagnostic utility of evaluation as heightening its summative purpose and their feelings of threat.
- Threat encourages doubt about the “legitimacy of data” (Schmelkin et al., 1997) as personnel committees are believed to use ratings to judge teaching effectiveness by comparing individual faculty results with departmental norms. The theoretical utility is dependent on the extent to which ratings agree with standards of externally imposed notions of “good” teaching.
Teachers’ overall views on evaluation
Participants questioned the overall validity of evaluations as they are cross-curricular and used in all courses. The inappropriateness of questions for “the nature” of ELT is mentioned by Eleanor, who thinks that evaluation is based on a “random set” of teaching processes that supposedly represent good teaching, and it “infuriates” her that evaluation comes down to “trivial questions that the teacher has very little control over.” This has created a “negative synergistic system” (Theall & Franklin, 1990, p. 28) with teachers (us) fulminating against a recalcitrant administration (them) and against other faculty (another them), believing that results are used capriciously, or only for punitive purposes. For improvement, evaluation must develop a sense of ownership so that teachers accept the validity of evaluation and have understandings of both the “mechanics” of the evaluation system and the rationale for performance criteria. As the results below show, this understanding is largely absent.
Trustworthiness and believability
Suspicion among participants about how ratings are used is fed by the long delay in returning results, so that ratings are just “piled up in the office” (Jack) for some future time when a teacher becomes a liability to be disposed of (Eleanor). Responses are similar to those in Ryan et al. (1980), where teachers had an interest in student evaluation prior to being required to obtain it, and subsequent receptivity to “official” evaluations noticeably fell. As participants say they are vulnerable as ELT teachers in their workplace, with evaluation considered a factor in part time teacher retention or contract renewal discussions, evaluation heightens this vulnerability. This vulnerability is reiterated at various stages of the interview when discussing purpose, threat, and relationships with colleagues. Unlike earlier studies (for example, Schmelkin et al., 1997), participants say that the link between teacher evaluation and actual course improvement is at best tenuous and, with no explanation from the administration, the purpose is unclear.
Threat
Similar to earlier findings (Ryan et al., 1980; Schmelkin et al., 1997), participants found that evaluation reduced morale, job satisfaction and personal confidence in the institutional administration, which interferes with any potential formative use. Teachers who are on a contract or part time believe that ratings are linked to contract renewals and that forms are read by those who are not themselves professional educators and who tend to interpret and assess teaching in terms of values external to the educational process. Participants referred to nameless, often demonized, persons, echoing Gibbs’ (1996) view that “the people who sit on such panels did not gain that right by being excellent teachers or even by paying attention to teaching. They are seldom the right people to judge excellent teaching” (p. 47). The participants have little confidence in the ability of the administrators, whose views are not consonant with teachers’ educational goals and conceptions of teaching.
The culture of ranking
Ed is “especially wary” of ranking teachers in league tables which emphasize “winning and losing” although there is little difference in scores. The “unintended social consequences of evaluation” (Braskamp & Ory, 1994, p. 5), where hostility and suspicion of the administration and other faculty become apparent, have been caused by a culture of ranking instead of fostering development. This leads teachers to focus on short-term and measurable results and to teach to the standards which they think they can exceed. Koji heard many rumors, while Ayumi called the whole process “nerve-wracking” and suggests the present system “condemns” teachers instead of encouraging teachers to learn from each other to improve teaching. Evaluation fails to consider the commitment of teachers to the institution and fails to create a climate that is conducive to a high degree of teacher efficacy.
Limited improvement
While participants wished to get useful feedback from students as a dimension of good practice, many felt the potential threat of evaluation meant they were relieved to get good scores. While they continue to get high ratings, they see little need to change their practice. Bandura (1997) shows that teachers with low efficacy are pessimistic about improvement and often take a custodial view of their job.
Teacher learning
Participants make a distinction between technical top-down evaluation and the practical teaching which teachers display in their knowledge of what is educationally required in a particular situation. For many participants, evaluation questions are redundant or irrelevant to everyday practice, as they can learn more from the personal day-to-day interactions with students which they use to adjust aspects of the class. Participants are not convinced of the link between evaluation and instructional improvement and instead, through SETs, good teaching is reduced to a “generic set of skills and actions” (Pratt, 1997, p. 25).
New knowledge
Participants also state that the repetitive nature of evaluation diminishes the potential for new insights to be gained, while some suggest mid-semester evaluation should be introduced, which assumes a formative purpose of evaluation for teaching improvement during the lifetime of the course. The use of evaluation should not only lead to the improvement of teaching, but should improve the quality of teaching through preparing teachers to teach, providing an environment where they can teach and, most importantly, motivating them to teach.
Conclusion
All of the ELT teachers who participated suggest that using student ratings as the sole criterion for evaluating teachers is flawed. Teachers, and often administrators, are unaware of the purpose of the evaluation, which is not explained, and teachers are often just expected to administer it without any consultation or input into the questions. In agreement with Stronge and Tucker (1999), there needs to be a dynamic relationship which enhances teachers’ ability to grow under the belief that through teacher improvement the institution is likely to be a better place for learning. A system that aims to pinpoint poor teaching for remediation means that many teachers do not gain any new knowledge as they question the value of the information, and the lack of dialogue or feedback may be causing relationships to fracture. Participants believe evaluation is imposed so that individual teachers lose any sense of either responsibility for the effects of their work, or autonomy in its performance. Evaluation often does not match participants’ conceptions of teaching. There is an assumption that in teaching, knowledge is transmitted, which is reflected in the evaluation process. Evaluation did not address the metaphors that teachers use to describe their assertions about the nature of teaching, and teachers reject a top-down model of university evaluation which tries to get teachers to comply with a set of externally generated criteria. Instead of being conduits of administrators, teachers see the importance of articulation of ideas for improvement, putting forward a dialogical approach to evaluation which involves their peers. Coming almost full circle, change only occurs when teachers receive new knowledge, value that knowledge, know how or receive practical help to change, and have the motivation to change. If teachers disagree with or are threatened by the aims of the evaluation as imposed by administrators, they are likely to ignore any recommendations resulting from the evaluation.
References
Alderson, J. (1992). Guidelines for the evaluation of language education. In J. Alderson & A. Beretta (Eds.), Evaluating second language education (pp. 274-304). Cambridge: Cambridge University Press.
Arimoto, A. (1997). Market and higher education in Japan. Higher Education Policy, 10(3), 199-210.
Auerbach, E. (1995). The politics of the ESL classroom: Issues of power in pedagogical choices. In J. Tollefson (Ed.), Power and inequality in language education (pp. 9-34). Cambridge: Cambridge University Press.
Bandura, A. (1997). Self-efficacy: The exercise of control. New York: W.H. Freeman.
Bassey, M. (2000). Case study research in educational settings. Buckingham: Open University Press.
Braskamp, L., & Ory, J. (1994). Assessing faculty effectiveness. San Francisco: Jossey Bass.
Centra, J. (1993). Reflective faculty evaluation: enhancing teaching and determining faculty effectiveness. San Francisco: Jossey-Bass.
Gibbs, G. (1996). Promoting excellent teachers at Oxford Brookes University: From profiles to peer review in ten years. In R. Aylett & K. Gregory (Eds.), Evaluating teacher quality in higher education (pp. 42-55). London: The Falmer Press.
Goodman, R. (2005). W(h)ither the Japanese university? An introduction to the 2004 higher education reforms in Japan. In J. Eades, R. Goodman, & Y. Hada (Eds.), The “Big Bang” in Japanese higher education (pp. 1-31). Melbourne: Trans Pacific Press.
Gorsuch, G. (2000). EFL educational policies and educational cultures: influences on teachers’ approval of communicative activities. TESOL Quarterly, 34(4), 675-710.
Harvey, L. (1997). Student satisfaction manual. Buckingham: Society for Research into Higher Education/ Open University Press.
Howell, A., & Symbaluk, D. (2001). Published student ratings of instruction: revealing and reconciling the views of students and faculty. Journal of Educational Psychology, 93(4), 790-796.
Johnson, R. (2000). The authority of the student evaluation questionnaire. Teaching in Higher Education, 5(4), 419-434.
Kansaien Daigaku Hijokin Kumiai. (2005). Daigaku hijokin koushi jiinochousa anketto hokokusho (2002-2003). Available from http://www.hijokin.org/en2003 (accessed November 20, 2005).
Kawanari, Y. (2005, 24 May). Students gone amok with abusive critiques. The Asahi Shimbun. Available at www.asahi.com/english/Herald-asahi/TKY200505240099.html (accessed 24 May 2005).
Kinoshita, T. (2005). Gakusei shokun ni ‘seijitsusa’ o motomu. The Asahi Shimbun, 12 August 2005.
Kitamura, K. (1997). Policy issues in Japanese higher education. Higher Education, 34, 141-150.
Kogakuin University. (2002). Report on the implementation of student evaluation. Available at http://www.kogakuin.ac.jp/fd/questionnaire (accessed May 25 2005).
Lincoln, Y., & Guba, E. (1985). Naturalistic inquiry. Newbury Park: Sage.
Markee, N. (1997). Managing curricular innovation. Cambridge: Cambridge University Press.
Ministry of Education. (2003). Legislation of the national university corporation law. Available at www.mext.go.jp/english/news/2003/07/03120301.htm (accessed 24 May 2005).
Miyoshi, M. (2000). The university and the “global” economy: The cases of the United States and Japan. The South Atlantic Quarterly, 99(4), 668-697.
Moore, S., & Kuol, N. (2005). Students evaluating teachers: exploring the importance of faculty reaction to feedback on teaching. Teaching in Higher Education, 10(1), 57-73.
Murray, H. (1996). Does evaluation of teaching lead to an improvement in teaching? International Journal for Academic Development, 1(1), 8-23.
Nasser, F., & Fresko, B. (2002). Faculty views of student evaluation of college teaching. Assessment and Evaluation in Higher Education, 27(2), 187-196.
Norris, J. (2006). The why (and how) of assessing student learning outcomes in college foreign language programs. The Modern Language Journal, 90, 576-583.
Ory, J., & Braskamp, L. (1981). Faculty perceptions of the quality and usefulness of three types of faculty information. Research in Higher Education, 15(3), 271-282.
Pennycook, A. (1990). Comments on Martha C. Pennington and Aileen L. Young’s “Approaches to faculty evaluation for ESL”. TESOL Quarterly, 24(3), 555-559.
Pratt, D. (1997). Reconceptualizing the evaluation of teaching in higher education. Higher Education, 34, 23-44.
Punch, K. (1998). Introduction to social research. London: Sage.
Ryan, J., Anderson, J., & Birchler, A. (1980). Student evaluation: the faculty responds. Research in Higher Education, 12(4), 317-333.
Ryan, S. (1998). Student evaluation of teachers. The Language Teacher, 22(9), 9-11.
Schmelkin, S., Spencer, K., & Gellman, E. (1997). Faculty perspectives on course and teacher evaluations. Research in Higher Education, 38(5), 575-593.
Schön, D. (1983). The reflective practitioner: How professionals think in action. New York: Basic Books.
Shimizu, K. (1995). Japanese college student attitudes towards English teachers. The Language Teacher, 19(10), 5-8.
Simpson, P., & Siguaw, J. (2000). Student evaluation of teaching: An exploratory study of the faculty response. Journal of Marketing Education, 22(3), 199-213.
Smith, M., & Carney, R. (1990). Students’ perceptions of the teaching evaluation process. Paper presented at the annual meeting of the American Educational Research Association, Boston, April 1990.
Stake, R. (2000). Case studies. In N. Denzin & Y. Lincoln (Eds.), Handbook of qualitative research (pp. 435-454). Thousand Oaks: Sage.
Stronge, J., & Tucker, P. (1999). The politics of teacher evaluation: A case study of new system design and implementation. Journal of Personnel Evaluation in Education, 13(4), 339-359.
Theall, M., & Franklin, J. (1990). Student ratings in the context of complex evaluation systems. In M. Theall & J. Franklin (Eds.), Student ratings of instruction: issues for improving practice (pp. 17-34). San Francisco: Jossey Bass.
Theall, M., & Franklin, J. (2001). Looking for bias in all the wrong places: A search for truth or a witch hunt in student ratings of instruction. In M. Theall, P. Abrami, & L. Mets (Eds.), The student ratings debate: Are they valid? How can we best use them? (pp. 45-56). San Francisco: Jossey Bass.
Tsurata, Y. (2003). Globalisation and Japanese higher education. In R. Goodman & D. Phillips (Eds.), Can the Japanese change their education system? (pp. 119-151). Oxford: Symposium Books.
Wada, M. (2002). Teacher education for curricular innovation. In S. Savignon (Ed.), Interpreting communicative language teaching (pp. 31-41). New Haven: Yale University Press.
Weiner, B. (1992). Human motivation. Thousand Oaks: Sage.
Yamada, R. (2001). University reform in the post-massification era in Japan: analysis of government education policy for the 21st century. Higher Education Policy, 14, 277-291.
Yao, Y., & Grady, M. (2005). How do faculty make formative use of student evaluation feedback?: A multiple case study. Journal of Personnel Evaluation in Education, 18, 107-126.
Appendix 1: The SETs form administered at a former National University in western Japan
- What is your overall evaluation of the class?
- Did you feel the teacher’s enthusiasm towards the lesson?
- Was the textbook, suggested use of reference books, and the distribution of supplementary sources appropriate?
- Was the writing on the blackboard and the use of listening or learning devices appropriate?
- Was the lecture or the explanations easy to hear and understand?
- Was the schedule and the use of class time suitable?
- With regard to reviewing, preparation, homework and assignments, were the instructions appropriate?
- Did you fully involve yourself in reviewing, preparation, homework and assignments?
- After you have taken the class, are you aware of your understanding of the subject having been deepened?