Assessment and improvement of student learning
Author: Peggy Nightingale, University of New South Wales
Keywords: DEETYA, nursing, assessment, learning, teaching, theology, business, integrated content, authentic assessment, evaluation, criteria, reflection, critical thinking, quality
Article style and source: Original ultiBASE publication.
Assessment: two reports contrasted

'The reliance on tacit values - tacit notions of quality - is giving way to calls for transparency of process, for public access to the bases of judgment, for the articulation of criteria,' the [DEETYA] report says. This is the last sentence of a report in The Australian (Coorey, 1996). The report concentrates on the workload involved in assessing students' work and on academics' accounts of being unable to do the job well in the time available to them. The reporter emphasises the lack of briefing and training for examiners, the use of junior staff to cut costs, the use of short-answer and multiple-choice questions to accommodate larger student numbers, and wide variations in 'standards'.
The DEETYA Evaluations and Investigations Program (EIP) report which prompted this newspaper account (Warren Piper et al., 1996) is a thorough description of current practice in examining students' work. By surveying academic staff, the researchers sought to answer the following questions:
The answers to these questions did include the concerns raised in The Australian, but the report also demonstrates a high level of commitment to 'getting it right'. Nevertheless, the responses of academics show some naivety about assessing and examining. For instance, examiners report that they use criterion-referenced rather than norm-referenced procedures, but also that they seldom produce a written statement of criteria. More than one marker is involved in two-thirds of the subjects covered by the survey. Under these circumstances:
Examination Practices and Procedures in Australian Universities reports on courses (programs of study) as well as on subjects. Another concern it raises is the lack of coherence within courses. Subjects seem to be isolated from one another and to adopt different philosophies of assessment (see pp151-2 in particular).
Assessment in universities is changing - in its intent and in its methods. Traditional forms of assessment have usually focussed on ranking students according to the knowledge that they gained in a subject or course. Assessment methods were designed to let students demonstrate their knowledge in easily measurable ways so that comparisons between them were facilitated. Students' achievements were viewed in quantitative terms - 'How much do they know?' - and judgments made by assessors were often assumed to be a definitive statement of the student's ability. (Nightingale et al., 1996, p6)

These traditional attitudes underpin the calls for 'transparency of process' and 'public access to the bases of judgment, for the articulation of criteria'. Transparency and clearly articulated criteria are indeed desirable. What needs to be challenged is the simplistic view of higher education, and particularly of assessment, that lies behind published reports such as the one in The Australian. The optimism of the authors of Assessing Learning in Universities is apparent in the next paragraph of their introduction:

While much assessment still displays these characteristics, pressure for change has come in at least three areas. The first is a growing desire to broaden university education, to develop - and consequently to assess - a much broader range of student abilities. The second is the desire to harness the full power of assessment and feedback in support of learning.
The third arises from the belief that education should lead to a capacity for independent judgment and an ability to evaluate one's own performance - and that these abilities can only be developed through involvement in the assessment process. (Nightingale et al., 1996, p6)

Objectivist and constructivist theories of teaching and learning

The contrast is, in essence, between an objectivist theory of teaching and learning and a constructivist perspective. John Biggs explains this contrast concisely and elegantly in a recent paper in Higher Education (1996). An objectivist theory of teaching and learning rests on an assumed dualism between the knower and the known. Knowledge therefore exists somewhere independent of the knower, and understanding is coming to know what already exists. Teaching is the transmission of this knowledge, which is then decontextualised by being tested independently of any particular setting. In contrast, a constructivist view of learning emphasises the learner's role as a maker of meanings. The knowledge or meaning is not 'out there' to be imposed by reality and transmitted by the teacher; 'learners arrive at meaning by actively selecting, and cumulatively constructing, their own knowledge, through both individual and social activity' (Biggs, 1996, p348). Working from such a perspective, an academic must construct assessment tasks with three features. They require meaning-making. They set real problems rather than mere exercises. And criteria objectively specifying a single correct answer may be impossible to write for them.

Thus, the goal of highly reliable assessment of student achievement may need to be partly sacrificed. Highly reliable tasks are those which yield the same results when repeated, or on which different markers will make the same judgment. The trade-off is made in favour of attempting to judge whether students have achieved higher order learning. The kind of learning that would allow students to perform tasks they might have to do in the 'real world' requires assessment via 'authentic' tasks.
Some of these may also be 'integrated'; that is, they involve demonstrating a number of skills over a series of steps or activities (an example from nursing appears below). The difficulty, of course, is that with many variables and a range of possible approaches or solutions, it may be hard to guarantee complete reliability. Another goal of assessment is validity; that is, the task should actually test the abilities it is intended to test. For instance, doctors in general practice need high-level oral communication skills. It would seem logical for a professional society for general practitioners to test applicants for membership in a particular way. Applicants would conduct a simulated consultation with a patient, so that the society could assess whether they have the appropriate skills. The difficulty is that research has found no strong correlation between performance in different types of simulated consultation, for instance giving bad news as opposed to managing a confrontational situation (Thomson, 1992). Nor does acceptable performance in a simulated situation guarantee performance in actual practice.

Authentic or integrated assessment

Despite these flaws, it is certainly better to attempt to judge higher order skills through authentic or integrated assessment than through 'objective' testing, for instance of the multiple-choice type. The following example is an edited version of Case Study 1 in Assessing Learning in Universities, contributed by Ina Te Wiata and Phil Ker, Auckland Institute of Technology (see Te Wiata, 1996, pp16-19 for a detailed description of this assessment strategy).
Establishing criteria for assessment

Leaving authentic assessment tasks aside for a moment, let us consider how to establish criteria for assessment that involves some element of subjectivity in judging students' achievement. Whether as assessors or as assessees, most of us will be familiar with essay writing. Mass testing, whether for university admission, scholarships, aptitude or similar purposes, requires a high degree of inter-marker reliability. This reliability is achieved through several steps. Examiners carefully describe what will earn students each grade, brief the markers and test-mark some sample papers. Marking then proceeds on the basis of holistic judgments. Holistic scoring judges the overall quality of the work. It rests on the argument that the quality of a piece of writing is more than the sum of its parts, so the judgment will be, to some extent, subjective. However, it need not be unjustifiable. Holistic criteria establish a basis for those judgments; for example:

Score of 6: Superior
Addresses the question fully and explores the issues thoughtfully
Shows substantial depth, fullness, and complexity of thought
Demonstrates clear, focused, unified and coherent organisation
Is fully developed and detailed
Evidences superior control of diction, syntactic variety, and transition; may have a few minor flaws
...
Score of 3: Weak
May distort or neglect parts of the question
May be simplistic or stereotyped in thought
May demonstrate problems in organisation
May have generalisations without supporting detail or detail without generalisations; may be undeveloped
May show patterns of flaws in language, syntax or mechanics
(White, 1994, quoted by Nightingale, 1996, p216)

Of course, such criteria are further clarified when markers discuss the particular context and task in their meetings and when they mark sample essays.
As one might have predicted, neither of the DEETYA projects mentioned above found many examples in higher education of the practices used in the mass testing situations described by White (1994). These practices are criteria-setting, marker briefing, trial marking and double marking. However, the case studies collected for Nightingale et al. (1996) offer many examples of carefully developed criteria, together with thoughtful commentary on them. Case Study 52, 'Using Checklists', was contributed by John Ozolins, Department of Theology and Philosophy, Australian Catholic University. His checklist gives students a rating from 1 to 5, and a comment, on each of 15 items. The items are grouped under the broad headings of Argument, Structure, and Style. On his actual practice, Ozolins writes:

The assignment is assessed by giving each criterion a rating and then adding these numerical ratings to arrive at a rough grade for the assignment. Generally, this method works quite well, though sometimes the result is too high or too low when the assignment is considered globally. It is true to say that it is not the only means of determining the grade; sometimes, usually when a grade is borderline, some adjustment of the grade is made. For example, if a student is on the border between a 'D' and a 'C', other factors may come into play, such as comparison with other similar standard pieces of work by other students. In coming to write some general comments on the essay a global assessment is made and a judgment made whether the numbers accurately reflect the grade the assignment should receive. Mostly I would say that the assessment seems to fit closely the judgment I would give if global assessment only were used, but this may mean that I make the judgment first and allocate the ratings accordingly. I am sure the global assessment and the ratings are linked and it is not easy to say definitively how much they influence each other. (Ozolins, in Nightingale, 1996, p211)

This case shows a marker combining analytical judgments (adding scores on separate criteria) with holistic judgments (see above). These judgments are both criterion-referenced and norm-referenced. They are criterion-referenced because criteria are established as above and work is rewarded on the basis of those criteria only. They are norm-referenced because they are determined partly by comparing a student's performance with that of others in the group. To a purist this may sound too messy to be acceptable, but it reflects reality, and it also reflects carefully considered professional practice. Another contributor to the project which resulted in Assessing Learning in Universities, Iain Hay, described:

a retreat from a commitment to attempting totally criterion-referenced assessment. The reasons are varied and include its inflexibility and the inability to reward effort and progress rather than outright achievement; the possible stifling of creativity and experimentation; difficulties in attaining inter-marker reliability; and losing sight of the forest in trying to assess each of the trees.

Thus, calls for transparency of process and accountability can be met, as long as it is accepted that so-called subjectivity does not necessarily mean that people are making unreasoned, unjustifiable or illogical judgments.

Biggs and Collis (1982) developed the SOLO (Structure of the Observed Learning Outcome) taxonomy, which may be used to define both course objectives and assessment criteria. The taxonomy

provides a systematic way of describing how a learner's performance grows in complexity when mastering many academic tasks. Five levels may be distinguished:
Prestructural. The task is not attacked appropriately; the student hasn't understood the point.
Unistructural. One or a few aspects of the task are picked up and used (understanding as nominal).
Multistructural. Several aspects of the task are learned but are treated separately (understanding as knowing about).
Relational. The components are integrated into a coherent whole, with each part contributing to the overall meaning (understanding as appreciating relationships).
Extended abstract. The integrated whole at the relational level is reconceptualised at a higher level of abstraction, which enables generalisation to a new topic or area, or is turned reflexively on oneself (understanding as far transfer, and as involving metacognition).
(Biggs, 1996, pp351-2)

Biggs applied the SOLO taxonomy to a psychology unit offered as part of an in-service degree program for school teachers in Hong Kong. For instance, he describes the 'performance of understanding' that would demonstrate the highest level of the taxonomy as follows:

Most desirable (extended abstract): metacognitive understanding, students able to use the taught content in order to reflect on their own teaching, evaluate their decisions made in the classroom in terms of theory, and thereby improve their decision-making and practice.
Other outcomes: formulating personal theory of teaching that demonstrably drives decision-making and practice, generating new approaches to teaching on the basis of taught principles and content.
(Biggs, 1996, p352)

Return to authentic assessment

The phrase 'performance of understanding' returns us to the notion of authentic assessment. An authentic assessment asks a student to do something he or she might have to do in the 'real world' after graduation. The task requires students to use the subject's content in a way that lets them demonstrate learning at the extended abstract level. That is, they must perform their understanding. For example, consider one subject within a postgraduate program for university educators. Its major assessment task has several stages. Each educator identifies a problem in teacher and/or student communication within his or her own class and studies the relevant literature. The educator then designs and executes an intervention, evaluates its success, and reports on it with reference to the literature. In evaluating the project report, the teacher looks for the integration of theory and practice and for the abilities to reflect and to act. The criteria used resemble Biggs's description of 'most desirable' performance above. Another example of integrated and authentic assessment is the OSCA or OSCE (Objective Structured Clinical Assessment or Examination), commonly used in health care disciplines. Case Study 9 in Assessing Learning in Universities (Bourgeois and McFarland, University of Western Sydney - Macarthur) describes a five-station OSCA (Ryan, 1996).
This assessment exercise involved over 250 students and 12 staff members. It was conducted at the end of the first year of a nursing degree program, and it was designed to assess a variety of subject areas. Obviously, it is not the kind of activity which can happen frequently. Nor does it allow teachers and students to keep track of progress in acquiring the knowledge, skills and attitudes that are tested simultaneously in such a complex sequence of tasks. However, it is through integrated and authentic assessments that we can determine whether the higher order objectives of university education are being achieved.

This last statement implies another assessment-related task for teachers, one at the other end of the process: setting educational goals for subjects and programs of study. In the case of the OSCA, the teachers wished students to develop the following abilities:
They designed an assessment which addressed all of the abilities they had set as goals for this part of their program.

Appropriate assessment

Only when teachers are clear about what they want students to achieve can they design appropriate assessment activities. The same clarity lets them evaluate existing activities to see whether an element is missing. This evaluation is very important, because no single factor is more likely to undermine the achievement of learning objectives than inappropriate assessment. One example comes from Friedlan (1995, cited by Mladenovic, work-in-progress), who reported on attempts to change first-year accounting students' perceptions of accounting through various teaching innovations. Friedlan found changes in the desired direction for a number of perceptions. However, the perceived importance of oral communication skills decreased, despite substantial emphasis on classroom discussion. Friedlan hypothesised that students failed to see these skills as important because they were neither rewarded for class participation nor assessed in any other way on oral communication.

Student learning research has repeatedly demonstrated the impact of assessment on students' approaches to learning (see Gibbs 1992a, 1992b; Biggs 1989; Ramsden 1988, 1992).

Ask them to understand the physics and chemistry of muscle contraction, but test them on the names of the muscles, and they will 'learn' the names but not be able to explain how contraction happens. Ask students to understand narrative perspective in the novel but test them on the author's background and they will know a lot about the author and little about narrative perspective. (Nightingale and O'Neil, 1994, pp149-50)

Describing the desired learning outcomes (the abilities students are to develop) and assessing them are the two ends of the teaching process. Between them lies a large middle comprising all of the teaching strategies and resources that support students' learning.
Mladenovic's point is that piecemeal innovations, such as introducing case-based or cooperative learning, will not change students' negative perceptions of accounting (e.g. that accounting is simply book-keeping). If those perceptions are to change, the whole system - teaching methods, assessment and curriculum - must work to develop an appropriate understanding of the nature of learning and practice in accounting. A case study in Nightingale and O'Neil (1994) further illustrates the point:
In his recent article, Biggs (1996) uses the term 'constructive alignment' for congruence between all the elements of course or subject design. Ramsden (1992) offers a model of student learning in context. Both emphasise that the whole system must function coherently. If one element is incongruent, as in the case above, desirable outcomes may be undermined.

Self-examination re student assessment

In trying to evaluate their practice, teachers might ask themselves a series of questions, such as:
Conclusion

The DEETYA report cited by The Australian and quoted at the beginning of this article reveals assessment to be a significant problem area for university teachers. The goal of good assessment practice far transcends simple accountability to students and the community. The University of New South Wales project team that produced Assessing Learning in Universities drew on its members' years as staff developers as well as teachers. That experience convinced the team that trying to improve assessment practice would lead university teachers to consider the alignment of curriculum, syllabus, teaching methods, student learning approaches and assessment. Teachers would thus develop a consistent, coherent and congruent approach to their work as educators.
References

Biggs, J. 1989, 'Does Learning about Learning Help Teachers with Teaching? Psychology and the Tertiary Teacher.' Inaugural Lecture, 8 Dec 1988. Supplement to the Gazette, University of Hong Kong, Vol. XXXVI, No. 1, 20 March 1989.
Biggs, J. 1996, 'Enhancing teaching through constructive alignment', Higher Education, 32, 347-364.
Biggs, J. and Collis, K. 1982, Evaluating the Quality of Learning: The SOLO Taxonomy. New York: Academic Press.
Coorey, M. 1996, 'Not enough hours: examiners', The Australian, 13 November, p58.
Dunlap, L. 1990, 'Language and power: teaching writing to third world graduate students', in Sanyal, B. (ed), Breaking the Boundaries. New York: Plenum Press.
Friedlan, J. 1995, 'The effects of different teaching approaches on students' perceptions of the skills needed for success in accounting courses and by practicing accountants', Issues in Accounting Education, 10 (1), 47-63.
Gibbs, G. 1992a, Improving Student Learning through Excellence in Teaching and Course Design. Bristol: Technical and Education Services.
Gibbs, G. 1992b, 'Improving the quality of student learning through course design', in Barnett, R. (ed), Learning to Effect. Buckingham: Society for Research into Higher Education and Open University Press.
Mladenovic, R. (work-in-progress), An investigation into ways of challenging introductory accounting students' negative perceptions of accounting. University of New South Wales, Faculty of Commerce and Economics.
Nightingale, P. 1996, 'Module 8: Communicating', in Nightingale, P. et al., Assessing Learning in Universities. Sydney: University of New South Wales Press. pp203-266.
Nightingale, P. and O'Neil, M. 1994, Achieving Quality Learning. London: Kogan Page.
Nightingale, P., Te Wiata, I., Toohey, S., Ryan, G., Hughes, C., Magin, D. 1996, Assessing Learning in Universities. Sydney: University of New South Wales Press.
Ramsden, P. (ed) 1988, Improving Learning: New Perspectives. London: Kogan Page.
Ramsden, P. 1992, Learning to Teach in Higher Education. London: Routledge.
Ryan, G. 1996, 'Module 2: Solving problems and developing plans', in Nightingale, P. et al., Assessing Learning in Universities. Sydney: University of New South Wales Press. pp39-62.
Te Wiata, I. 1996, 'Module 1: Thinking critically and making judgments', in Nightingale, P. et al., Assessing Learning in Universities. Sydney: University of New South Wales Press. pp13-38.
Thomson, A.N. 1992, 'Can communication skills be assessed independently of their context?', Medical Education, 26, 364-7.
Warren Piper, D., Nulty, D., and O'Grady, G. 1996, Examination Practices and Procedures in Australian Universities. DEETYA, Evaluations and Investigations Program report 96/5. Canberra: AGPS.
White, E.M. 1994, Teaching and Assessing Writing (2nd edition, revised and expanded). San Francisco: Jossey-Bass.
Additional resource

American Association for Higher Education (AAHE) Assessment Forum's 9 Principles of Good Practice for Assessing Student Learning, presented as an ultiBASE Workshop.
Peggy Nightingale

Copyright © Peggy Nightingale, 1996. For uses other than personal research or study, as permitted under the Copyright Laws of your country, permission must be negotiated with the author. Any further publication permitted by the author must include full acknowledgement of first publication in ultiBASE (http://ultibase.rmit.edu.au). Please contact the Editor of ultiBASE for assistance with acknowledgement of subsequent publication.
Send feedback to manager@ultibase.rmit.edu.au
Copyright © 2001 Faculty of Education Language and Community Services
Document URL: http://ultibase.rmit.edu.au/Articles/june97/night1.htm
Last Updated: 06-June-1996 by Marita Mueller