Language Learning Software Evaluation: Top-down or Bottom-up?

July 5, 2006

Keywords: language learning and teaching, software program, software evaluation, technology, ESL/EFL, CALL (computer-assisted language learning)

Authors
Jinkyu Seam Park

Bio Data
Jinkyu Seam Park is a doctoral candidate in Language Education at Indiana University. He was formerly an EFL teacher in Seoul and obtained an MA degree in English Education from Korea University and another MA in TESOL & Applied Linguistics from Indiana University. His main areas of research include English-Korean bilingualism, second language acquisition and technology-assisted language learning.

Abstract
The dizzying speed of technological development has driven the educational market to pile up a huge number of software programs without serious methodological concern or consideration of how they apply to a variety of learners. This paper reviews current software evaluation tools and their problems, followed by further discussion of issues specific to language learning software programs. Based on a critique of the dominant bottom-up approaches of current software evaluation, the paper takes a holistic methodological framework into serious consideration and provides a tentative framework which addresses key areas in evaluating language learning software programs for educators and learners.

Introduction
These days, a huge amount of software gives students, teachers and parents a better chance of more effective language learning and teaching. At the same time, however, the ever-increasing quantity itself increases the burden on teachers and parents when making good decisions. According to the EPIE (Educational Products Information Exchange) website (1999), there are now over 19,000 educational software programs, from preschool through college, from over 1,300 different publishers; across 11 curriculum (subject) areas, 316 programs fall under the category of ESL (English as a Second Language) and 1,028 under the category of Foreign Language (which is also relevant to this discussion).

As the number of software programs increases, it becomes more and more difficult for educators and parents to make good decisions because they have few reliable evaluation tools. To make matters worse, the current evaluation tools can hardly be trusted because of a lack of empirical evidence from research on the effectiveness of software programs (Gill, Dick, Reiser, & Zahner, 1992; Heller, 1991; Jolicoeur & Berger, 1988), lack of triangulation (Smith & Keep, 1988), lack of validity and reliability, or subjectivity (Jolicoeur & Berger, 1986, 1988; Gonce-Winder & Walbesser, 1987; Heller, 1991; Burston, 2003, among others), lack of consideration of educational context (Cheah & Cheung, 1999; Murray & Barnes, 1998; Smith & Keep, 1988), and lack of consideration of learning theories and language teaching methodology (Hubbard, 1992; Burston, 2003).

This paper reviews current software evaluation tools and their problems, mostly from a bottom-up approach, followed by further discussion of issues specific to language learning software programs, with a focus on learning and teaching in ESL/EFL settings. It then proposes a tentative top-down framework which addresses key areas of language learning software evaluation.

Problems in Educational Software Evaluation
About two decades ago, Jolicoeur and Berger (1986) pointed out the lack of research on the effectiveness of software programs for microcomputers. They analyzed EPIE’s software evaluation protocol, which had been revised from EPIE’s textbook evaluation protocol, and the Microsift Courseware Evaluation report, which provided similar information and so was assumed to offer inter-rater reliability. However, Jolicoeur and Berger found a huge gap, indicating that the evaluations are “primarily based on the reviewers’ subjective opinions rather than on operationally defined variables” (p. 8). They also pointed out that these judgments rely on Likert-like scales that tend to be subjective, which are in turn the target of their criticism. A brief description of The Educational Software Selector (TESS) will be useful for understanding this problem.

Since 1983, TESS has provided “objective, high-quality information” and “the most comprehensive source of unbiased educational software information available for educators, teachers and parents” (EPIE, 1999), although it is hardly available to parents because the software is usually provided to schools and is too expensive for individual purchase. TESS has used the following criteria: “subject, learning approach, grade level, computer platform, pricing, contact information for publisher, and evaluation citations from leading educational journals, and state evaluation agencies”. According to Kenneth Komoski (EPIE, 1995), a director of the EPIE Institute, the TESS database contains “11,000 citations of product reviews and evaluations from over 50 review/evaluation sources”.

Despite all the information in the TESS database, the EPIE software evaluation protocol, along with Microsift’s ratings, has little reliability and validity, according to Jolicoeur and Berger (1986), because of unclear definitions of content, instructional, and technical characteristics. Jolicoeur and Berger express their concern that educators may interpret the evaluations differently from the raters’ intentions. They also raise the issue of construct validity, both convergent and discriminant, which is very weak in the evaluation protocols. Finally, they point out the poor inter-rater reliability: a “striking lack of agreement between EPIE and Microsift concerning the quality of specific educational software” (p. 9).
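
To illustrate why such weak inter-rater reliability matters, here is a minimal sketch, assuming two hypothetical reviewers who rate the same ten programs on a 1-5 Likert-type scale and using Cohen's kappa as the agreement measure. The ratings are invented for illustration and are not EPIE or Microsift data.

```python
# A minimal illustration of weak inter-rater reliability, using two hypothetical
# reviewers' 1-5 Likert ratings of the same ten programs. The ratings are
# invented for illustration; they are not EPIE or Microsift data.
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: chance-corrected agreement between two raters."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)

reviewer_1 = [5, 4, 4, 2, 3, 5, 1, 4, 2, 3]
reviewer_2 = [3, 5, 2, 4, 3, 2, 4, 3, 5, 1]

print(f"kappa = {cohens_kappa(reviewer_1, reviewer_2):.2f}")  # about -0.14: worse than chance
```

A kappa near or below zero, as in this invented example, means agreement no better than chance, mirroring the “striking lack of agreement” Jolicoeur and Berger describe.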

Many researchers discuss the lack of expertise in software evaluation (Heller, 1991; Burston, 2003, among others). Heller (1991) points out reviewers’ lack of expertise and experience, because evaluation is often conducted by a wide variety of individuals and groups, from students to experts. This suggests that a lack of expertise in evaluating software programs results in a lack of inter-rater reliability and subsequently leads evaluators to rely on their own subjective sets of personal rules. This is why the “wow” factor is so often dominant in evaluating software (Murray & Barnes, 1998). Heller classifies evaluation into three types, formal evaluation, informal evaluation and field trials, and suggests that field trials are very important and should be included in evaluation tools. As Heller implies, empirical research on the specific factors that make educational software effective (Jolicoeur & Berger, 1988) can be one of the solid guidelines for selecting a software program. She also notes that general evaluations differ in detail, focus and length, and that this wide variety is itself a source of problems.

Heller also raises issues related to the time and effort teachers spend on evaluation. The time issue is two-fold: one part concerns the time needed for evaluation itself, the other the time needed for actual testing. Teachers have little time to use even the available evaluation tools to select the programs they are going to use; they indicate that they would use the evaluation models currently available if they had the time to do so (Reiser & Dick, 1990). Although empirical evidence from field tests and students’ evaluations (Gill, Dick, Reiser, & Zahner, 1992) is essential for judging the effectiveness of a program, the large number of software programs, in this case initially unscreened ones, leaves teachers frustrated and, again, with little time for these kinds of software evaluations.

Context is another issue in the evaluation of software programs. Because formal education mostly takes place “with an extremely complex social environment” (Smith & Keep, 1988), context is one of the key elements in educational software evaluation. Murray and Barnes (1998) point out the importance of “target language context”, which is closely related to the authenticity of linguistic input, while Cheah and Cheung (1999) raise the issue of “cultural context”, or the language learning context, including the digital divide.

All of the problems mentioned above relate to the triangulation that Smith and Keep (1988) tried to achieve in their case studies. They view the evaluation process “in terms of the uses to which the resulting information will be put,” which implies “careful attention to the purposes of evaluation” (p. 152). Their point is that the effective use of a program depends not so much on the quality of the software itself as on the teachers who use it, as they say:

Programs which are, from the points of view of the computer scientists or the curriculum designer, quite poor, can still give rise to brilliantly productive classroom activities, whilst the very best materials can be subverted by inept or unimaginative use. (pp. 153-154)

Thus teachers’ responsibility, and also students’ autonomy, is part of what evaluators should keep in mind while evaluating software programs.

Language Learning Software
As mentioned earlier, out of over 19,000 educational software programs, 316 fall under the category of ESL (English as a Second Language) and 1,028 under the category of Foreign Language. Compared with the total number of educational software programs, the share of language learning software, both ESL and Foreign Language, is only 7.07%, and that of ESL programs only 1.67%. Although evaluation is concerned with quality rather than quantity, the picture is disappointing once we look at the details. The figure of 316 still means that there are a fair number of ESL programs, but how many are available to students at each grade, at each proficiency level, or for each skill? It is also disappointing when we consider the variety of programs “intended for standard usage, exploitation over the web, tutorial in nature, collaborative, facilitative, etc.” (Burston, 2003, p. 36).
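
As a quick check of the proportions cited above, the following sketch recomputes them, assuming the rounded totals reported by EPIE (19,000 programs overall, 316 ESL and 1,028 Foreign Language programs).

```python
# Recomputing the shares of language learning software, assuming the rounded
# totals cited in the text (19,000 programs overall; 316 ESL; 1,028 Foreign Language).
total_programs = 19_000
esl = 316
foreign_language = 1_028

language_share = (esl + foreign_language) / total_programs * 100
esl_share = esl / total_programs * 100

print(f"ESL + Foreign Language: {language_share:.2f}%")  # 7.07%
# 1.66% with these rounded totals; the text reports 1.67%, so the exact
# EPIE totals presumably differ slightly from the rounded figures used here.
print(f"ESL only: {esl_share:.2f}%")
```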

When language educators, especially ESL/EFL teachers, select language learning programs, they often simply follow subjective judgments based on their own teaching and learning experience, along with their computer literacy. More disappointingly, most EFL teachers have no choice but to rely on language learning software programs developed in foreign countries, especially English-speaking countries (Cheah & Cheung, 1999, p. 4). This reality leads EFL/ESL teachers to rely on both their own subjective judgments and ready-made evaluation tools, even though they “are thought to be the most appropriate software reviewers” (Heller, 1991, p. 286). This problem reflects a lack of consideration of context, which makes it hard for teachers to escape “the ‘wow’ factor” and adhere to “the fundamental rules” of evaluating software objectively (Murray & Barnes, 1998).

Burston (2003) discusses the problems related to this lack of expertise more comprehensively. Second language teachers need a certain level of technological competence, “in addition to content area and methodological expertise” (p. 35), in order to be ready to evaluate software programs. He adds teachers’ lack of critical perspectives and lack of formal training. Time lag, not to mention rapid technological change, is another problem teachers have to contend with.

Hubbard (1992) is one of the first researchers to raise methodological issues. He points out that most language learning software programs do not provide users or learners with their theoretical and methodological backgrounds, and that few researchers examine the methodology underlying the software. These backgrounds do not seem to be considered important by publishers, users, or even evaluators, but they matter greatly to parents and teachers who want their children to learn a language more effectively. Summarizing the research of the 1980s, Hubbard argues that drill-and-practice software, often based on behaviorist learning principles and the methodology of the 1960s, remained dominant at least until the end of that decade.

Suggestions for These Problems
Most evaluators have suggested specific solutions to these problems in software evaluation tools. Although still subjective in most cases, the tools try to include three major components: content, instructional characteristics, and technical characteristics (Heller, 1991). Some, for example Jolicoeur and Berger (1988), emphasize psychometric standards with “operationally defined variables”, while others put additional emphasis on field tests (Callison & Haycock, 1988; Reiser & Dick, 1990; Gill, Dick, Reiser, & Zahner, 1992) or in-service teacher education (Smith & Keep, 1988).

One of the most comprehensive lists, produced by the Educational Software Evaluation Consortium, shows this effort to include all three major components. It gathers the 22 most common criteria collected from 16 of the 28 members of the Consortium. Rather than the technological aspects of the software, the list focuses on the content and pedagogy of the material, although it still contains ambiguous concepts such as ‘user friendly’ and ‘feedback’ (for detailed discussion of ‘user-friendly,’ see Cheah & Cheung, 1999; for ‘feedback,’ Taylor, 1987), and hands-on features rather than effects on students’ learning. The rank order of the criteria is as follows:

1. Correctness of Content Presentation
2. Content Presentation
3. Use of Technology
4. Integration into Classroom Use
5. Ease of Use
6. Curriculum Congruence
7.5. Interaction
7.5. Content Sequence/Levels
9. Reliability
10. User Control of Program
11. Feedback (general)
12. Objectives
13. Motivation
14. Branching
15. Negative feedback/Help
16. Content Modification
17. Content Bias
18. Teacher Documentation
19. User Support Materials
20. Color, Sound, Graphics, Animation
21. Screen Displays
22. Management System

(Bitter & Wighton, 1987, pp. 8-9)
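
To make the idea of such a ranked list more concrete, here is a minimal sketch, in Python, of how the top Consortium criteria might be turned into a weighted scoring rubric. The weights and the 1-5 rating scale are hypothetical assumptions for illustration; Bitter and Wighton's list only ranks the criteria and prescribes no scoring scheme.

```python
# A hypothetical weighted rubric built from the top Consortium criteria.
# The weights and the 1-5 rating scale are illustrative assumptions; the
# published list only ranks the criteria.
CRITERIA_WEIGHTS = {
    "Correctness of Content Presentation": 5,
    "Content Presentation": 4,
    "Use of Technology": 3,
    "Integration into Classroom Use": 3,
    "Ease of Use": 2,
    "Curriculum Congruence": 2,
}

def weighted_score(ratings):
    """Combine a reviewer's 1-5 ratings into a single weighted average."""
    total_weight = sum(CRITERIA_WEIGHTS.values())
    return sum(CRITERIA_WEIGHTS[c] * ratings[c] for c in CRITERIA_WEIGHTS) / total_weight

# Example: one reviewer rates a single program on every criterion.
ratings = {
    "Correctness of Content Presentation": 5,
    "Content Presentation": 4,
    "Use of Technology": 3,
    "Integration into Classroom Use": 4,
    "Ease of Use": 5,
    "Curriculum Congruence": 3,
}
print(round(weighted_score(ratings), 2))  # -> 4.11
```

Even such a simple scheme makes the evaluator's priorities explicit, which is exactly what the subjective, ad hoc judgments criticized above fail to do.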


Recognizing the limits on time and effort, Taylor (1987) proposes shorter and easier guidelines, three main and two additional, focusing on objectives, feedback and learner control, and provides recommendations for applying them:

1. Are objectives provided and are they stated in terms of what the learner should be able to do following the instruction?
2. Is immediate, motivational feedback provided for correct responses? Is informative feedback that is provided for incorrect responses oriented toward the learner’s specific errors?
3. Is the method of control used appropriate to the type of learning task and student? For learner controlled sequence and strategy options, is advisement provided?
4. Is the content accurate?
5. Is the pedagogical approach in consonance with the one adopted by the school system?
(pp. 239-240)

Hubbard’s (1992) framework is very comprehensive and systematic, with a specific focus on language learning and teaching and a perspective different from that of other educational software evaluation tools. However, he does not provide specific guidelines for evaluating language learning software programs, which means the framework is ideal rather than practical. His framework for CALL (Computer-Assisted Language Learning) is intended as a general guideline for courseware development, and hopefully for publishers. He includes the fundamental components of language teaching methods, Approach, Design, and Procedure, based on Richards and Rodgers’s (1982, 1986) model, and the details of the Design part of his framework are mostly from Philips (1985). He further argues that these considerations are relevant to the developer and ought to be addressed regardless of the situation.

Although it does not include all the components of Hubbard’s framework and is also open to criticism for subjective judgments, the EPIE (Educational Products Information Exchange) Institute’s list is more general and systematic than the others:

Instructional Design
*Goals and objectives
*Content
*Methods/Approach
*Documentation/Support Materials
*Evaluation/Tests

Software Design
*Technical Quality
*Graphics/Audio Quality
*User Control/Interactivity
*Branching
*Management/Record keeping

(EPIE Institute, 1985, cited in Jolicoeur & Berger, 1986)

However, it is hard to find any evidence that they include methodological considerations in their actual reviews. For example, the 1993 edition of TESS includes nine to ten program reviews per page, and ESL program reviews occupy only two and a half pages out of 186 in total. The list of reviews reads more like a catalogue, which seems to serve as advertisement for the listed programs. One of the few examples of an ESL program review is as follows:

English Express
EPIE Id: 060004
Types: Tutorial, educational game
Grades: 5-12
Uses: School for regular curriculum, remedial, special education.
Grouping: Individual
Description: Combines interactive software with speech, visuals, sound, and text for ESL students. Offers students opportunities for listening, speaking, reading, and writing English. Includes 60 categories of words, 1,500 color images, and teacher training materials. Contact supplier for price. Needs sound digitizer.
Components: Ditto masters and more.
Configuration: Macintosh 512E, 1MB, hard disk.
Availability: E. David & Associates. (EPIE, 1993, p. 38)


Burston (2003) suggests a more comprehensive and clearer framework, based on Hubbard’s, with four generic parameters:

1. Technical features
2. Activities (procedure)
3. Teacher fit (approach)
4. Learner fit (design).

He implies that the CALICO Review is an ideal, unlike other “discursive descriptions of software with little serious evaluation” (p. 35). The CALICO Review does show evidence of the methodological consideration Burston mentions: all the reviews available on its website are written by experts in the field. However, this review is also intrinsically limited, Burston points out, because “no matter how comprehensive a review is, actual software selection can only be made on the basis of teachers’ own local assessment, relative to their own particular curricular needs” (p. 35), which in turn is closely related to all the problems above, especially the lack of expertise.

Burston (2003) also suggests that the easiest way to select software is to select bundled software, “an integrated software/textbook package”, because “[t]he content is age and language-learning level appropriate, the methodology is sound, and total curricular integration is assured” (p. 36). However, this suggestion has two limitations: first, bundled software cannot easily keep up with rapid change, because it is tied to current textbooks and is not easy to fully update to absorb newer technological innovations; second, it is limited in terms of authenticity, because most textbooks are adapted materials that hardly capture authentic, real-life situations.

Arguing that “any language learning software embodies basic principles of language teaching and learning” (p. 251), Murray and Barnes (1998) propose a checklist that can help teachers “decide if designers of the software have implemented sound language teaching and learning approaches” (p. 251):

*Does the software incorporate manageable and meaningful input?
*How is new language introduced? Is sufficient (optional) practice possible before learners produce language?
*How does the software use the writing medium?
*Does the software attempt to create a target language context?
*Does the software perpetuate cultural stereotypes?
*How authentic and accurate is the target language used?
*Does the software incorporate suitable language learning activities?
*How practical is integration of the software into classroom context?
*How well does the software match pupils’ expectation and the needs of the course?
*Does the software cater for all learners?
*What form of assessment, learner feedback or profiling is provided?
*Is the multimedia dimension exploited with regard to grammar and language patterns?
*How are language items presented on screen to the learner?
*How clear are the instructions for users?
*What support for teachers is provided? (pp. 251-252)


Another evaluation framework, used as the criteria for the final round of the 2002 University of Minnesota Learning Software Design Competition, offers a new set of focuses for software evaluation. Roblyer, Vye, and Wilson (2002) reviewed the software entries according to the following three criteria:

1. Does the project promote learning?
2. How well has the project been developed in terms of commonly recognized design techniques?
3. Is the product innovative enough to advance our understanding of how technology can be used effectively in education and training? (Roblyer, Vye, and Wilson, 2002, p. 7)

Although many of the trends they mention concern superficial aspects of the software, they also discuss the future of learning software:

Internet delivery or Internet enhancement
Emphasis on visual and three-dimensional problem-solving environment
Availability of visualization/modeling software
“Rich” learning environments
More apparent relative advantage (p. 9)

Among the important things they emphasize are the use of the Internet, three-dimensional problem-solving environments, scaffolding and “more interactive, hands-on, constructivist learning” (p. 9). These, along with voice recognition technology, also exemplify current trends in language learning software.

A New Consortium for Future Software Evaluation
Many researchers imply that the most appropriate software reviewers could be classroom teachers. This view is potentially dangerous, because few teachers have expertise in the evaluation of educational software. Before it is too late, we need a comprehensive set of guidelines that involves all the stakeholders in the field of education. We need organic cooperation between groups, and we should include observational perspectives in the evaluation from parents, teachers, students, administrators, software designers (or publishers), and educational researchers.

However, given the increasing burden of evaluation, we also need to think about a “division of work”, because the workload is overwhelming for any single group of evaluators, say, teachers. It is a matter of role division. We need a comprehensive framework for the actual groups participating in the evaluation: software developers, teachers, and experts. Both top-down and bottom-up approaches should also be included, because a top-down approach is often ignored by classroom teachers, especially when it is imposed on them, while a bottom-up approach sometimes lacks the more holistic point of view needed to guide educators toward a common goal. Although bottom-up evaluations are important and valuable, most current evaluations rely on them, as Hubbard (1992) points out. We need to adopt a comprehensive framework to guide us to a common goal: “a better way to teach” (Roblyer, Vye, & Wilson, 2002), and to learn as well.

With this perspective, a new framework is proposed here to cope with the current problems in language learning software evaluation, and it could, hopefully, be applied to the evaluation of other educational software programs as well. At least three important groups in the educational world should participate in the evaluation of language learning software programs: educators, publishers, and researchers. Students are at the very center of the discussion, and their work and responses should be included in the process; nevertheless, the framework does not include them as evaluators, because as a group they do not have the power to decide which software is adopted. The case of individual use of software is excluded here.

The first step in building a tentative framework is to identify the problems raised by researchers, because identifying problems often amounts to searching for solutions. As seen above, six initial problems have been identified:

1. Lack of validity and reliability or subjectivity
2. Lack of research support or triangulation
3. Lack of consideration of educational context
4. Lack of consideration of learning theories and language teaching methodology
5. Lack of critical perspectives
6. Time lag

All of these problems should be addressed before we start building an evaluation tool or framework. With these considerations in mind, one more can be added toward a better and more comprehensive framework:

7. Technological washback

A serious problem involves our everyday attitude toward technology. We often ask ourselves or others whether we should use technology or educational software for students, yet we rarely ask the question seriously, from a critical point of view. Rather, we tend to change our teaching methodology according to changes in educational technology, or according to the features of software, without serious consideration of their “real” effect, which I call here “technological washback” or “washback by technology”.

To sum up, in order to resist a hasty wave of optimism that pushes us to use technology, in this case educational software, the first and most important question every evaluator should ask is: do we have to use this software program, or even technology at all, for our students? Along with this question, it is important to remember that students are at the very center of the evaluation and that the most important component of evaluation is the evaluation of outcomes as a continuing, recurring and dynamic process (Castellan, 1993). In any case, “pedagogy must drive technology and not vice versa” (Burston, 2003, p. 35).

Based on the problems listed above, five questions can be asked initially:

*Who evaluates language learning software? (experts, publishers, teachers)
*What do we evaluate? (content including all methodological issues)
*How can we evaluate? (method or framework)
*When do we evaluate language learning software? (context and time: before or after using it)
*Why do we evaluate language learning software? (rationale)

Based on these questions and the bottom-up approaches mentioned above, a comprehensive top-down framework can be proposed to meet the new needs of software evaluation. Here is a tentative concept map of the evaluation consortium:

A New Consortium of Educational (Language Learning) Software Evaluation

A. Teacher’s roles (including parents and administrators)
1. checking necessity of software use
2. considering contexts and environments
3. evaluating software programs with updated tools
4. classroom teaching or field trials
5. reflection on interaction and effects
6. professional development

B. Researchers’ roles (teacher educators’)

1. preparing methodological guidelines
2. evaluating the outcomes of educational software
3. providing psychometric standards and content information
4. collecting data from field tests
5. providing updated comprehensive guidelines
6. providing the opportunities of professional development

C. Publishers’ roles (including software designers)

1. collecting the information of teaching methodology from researchers
2. collecting evaluation tools and research results
3. collecting all the content components of the software
4. frequent update and revision
5. searching for other supporting ways (e.g., Internet)
6. collecting feedback from teachers and researchers before and after publishing
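
As an illustrative sketch only, the division of roles above could be recorded in a simple shared structure that a consortium might use to track responsibilities across an evaluation cycle; the grouping and task wording follow the framework above, while the structure itself is an assumption.

```python
# An illustrative encoding of the proposed consortium: each stakeholder group
# is mapped to its responsibilities in one evaluation cycle. The structure is
# an assumption; the groups and tasks follow the framework above.
CONSORTIUM = {
    "teachers (incl. parents and administrators)": [
        "check the necessity of software use",
        "consider contexts and environments",
        "evaluate programs with updated tools",
        "run classroom teaching or field trials",
        "reflect on interaction and effects",
        "pursue professional development",
    ],
    "researchers (teacher educators)": [
        "prepare methodological guidelines",
        "evaluate the outcomes of educational software",
        "provide psychometric standards and content information",
        "collect data from field tests",
        "provide updated comprehensive guidelines",
        "offer professional development opportunities",
    ],
    "publishers (incl. software designers)": [
        "collect teaching-methodology information from researchers",
        "collect evaluation tools and research results",
        "assemble all content components of the software",
        "update and revise frequently",
        "search for other supporting channels (e.g., the Internet)",
        "gather feedback from teachers and researchers before and after publishing",
    ],
}

# A trivial check that no group is left without responsibilities.
for group, tasks in CONSORTIUM.items():
    print(f"{group}: {len(tasks)} responsibilities")
```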

Although the biggest burden of software selection falls on teachers, all three groups are required to work together to achieve “triangulation” of the evaluation. First of all, teachers are required to think about more effective ways of teaching and learning. What are the advantages or disadvantages of using language learning software programs? Authenticity (though in limited contexts) and a variety of input (though with limited quality of output) are among those advantages (and disadvantages). Are the programs more effective than other options in terms of the time and effort spent by teachers and students, and in terms of educational effect?

Teachers are also required to think about the disadvantages of the programs they use in teaching and learning. How can learner autonomy be achieved? Under the overly optimistic umbrella of current CALL, teachers often forget their own active role. What, then, is the teacher’s role, or the learner’s role, in using language learning software programs? These considerations should be addressed in advance and incorporated into syllabi or lesson plans. Finally, teachers are required to update their knowledge through professional development to keep up with the speed of technological change and to meet their students’ expectations and ever-changing needs.

Researchers are expected to participate actively in this recurring evaluation process. Although there is no agreement among educational researchers as to the best methodological framework for CALL, experts should be the source of such a framework, as Hubbard (1992) suggests. Collecting research data to inform better evaluation guidelines and teacher education is also essential. A comprehensive set of guidelines for both publishers and educators should be prepared by experts, based on psychometric standards (Reiser & Dick, 1990) and evidence from field tests.

Lastly, publishers are required to collect all kinds of data for developing better programs before they start work on the software. They should include all the important components listed in the evaluation tools, which are supposed to be based on a methodological framework and on field tests of existing software. They should also address educational requirements and students’ and teachers’ needs, along with teachers’ feedback on trial or evaluation versions of the software. Moreover, they are required to follow the guidelines provided by experts, which could take the form of requirements prescribed by the administrators or authorities concerned.

Conclusion
As technology has developed, much of the burden has fallen on language teachers, because many software developers assume that their programs can replace classroom teachers, especially in EFL settings. These days, however, people have become aware that teachers cannot be replaced by technology; rather, in the face of a huge wave of language learning software, their role in technology-supported education is considered more important than ever before.

In this new position as active participants, teachers are required to cooperate in software evaluation with other educational stakeholders, such as researchers, publishers and possibly students. They are also required to stand firm against the extreme assumption embedded in CALL that technology can solve educational problems, while doing their best to become competent evaluators, from both top-down and bottom-up perspectives of software evaluation.

References
Bitter, G. G., & Wighton, D. (1987). The most important criteria used by the Educational Software Evaluation Consortium. The Computing Teacher, 14(6), 7-9.

Burston, J. (2003). Software selection: A primer on sources and evaluation. CALICO Journal, 21(1), 29-40.

Callison, D., & Haycock, G. (1988). A methodology for student evaluation of educational microcomputer software. Educational Technology, 28(1), 25-32.

Castellan, N. J. (1993). Evaluating information technology in teaching and learning. Behavior Research Methods, Instruments, & Computers, 25(2), 25-32.

Cheah, Y. M., & Cheung, W. S. (1999). What do you mean by “user-friendly”?: Pre-service teachers evaluate language software. ERIC Number ED427538. http://www.eric.ed.gov/ERICDocs/data/ericdocs2/content_storage_01/0000000b/80/11/52/40.pdf

EPIE Institute. (1993, 1995). TESS: The Educational Software Selector. New York: EPIE Institute and Teachers College Press.

Gill, B., Dick, W. A., Reiser, R., & Zahner, J. E. (1992). A new model for evaluating instructional software. Educational Technology, 32(3), 39-44.

Gonce-Winder, C., & Walbesser, H. (1987). Toward quality software. Contemporary Educational Psychology, 12, 261-268.

Heller, R. S. (1991). Evaluating software: A review of the options. Computer and Education, 17(4), 285-291.

Hubbard, P. (1992). A methodological framework for CALL courseware development. In M. C. Pennington & V. Stevens (Eds.), Computers in Applied Linguistics (pp. 39-65). Multilingual Matters.

Jolicoeur, K., & Berger, D. E. (1986). Do we really know what makes educational software effective? A call for empirical research on effectiveness. Educational Technology, 26(12), 7-11.

Jolicoeur, K., & Berger, D. E. (1988). Implementing educational software and evaluating its academic effectiveness: Part I. Educational Technology, 28(9), 7-13.

Murray, L., & Barnes, A. (1998). Beyond the “wow” factor – evaluating multimedia language learning software from a pedagogical viewpoint. System, 26, 249-259.

Philips, M. (1985). Logical possibilities and classroom scenarios for the development of CALL. In C. Brumfit, M. Philips, & P. Skehan (Eds.), Computers in English language teaching. New York: Pergamon.

Reiser, R. A., & Dick, W. (1990). Evaluating instructional software. Educational Technology Research and Development, 38(3), 43-50.

Richards, J., & Rodgers, T. (1982). Method: approach, design, procedure. TESOL Quarterly, 16, 153-68.

Richards, J., & Rodgers, T. (1986). Approaches and Methods in Language Teaching. Cambridge: Cambridge University Press.

Roblyer, M. D., Vye, N., & Wilson, B. G. (2002). Effective educational software, now and in the future. Educational Technology, 42(5), 7-10.

Smith, D., & Keep, R. (1988). Eternal triangulation: Case studies in the evaluation of educational software by classroom-based teacher groups. Computer Education, 12(1), 151-156.

Taylor, R. (1987). Selecting effective courseware: Three fundamental instructional factors. Contemporary Educational Psychology, 12, 231-243.

