International Education Journal

The Course Experience Questionnaire as an Institutional Performance Indicator

David D Curtis and John P Keeves
Flinders University of South Australia
david.curtis@flinders.edu.au

Abstract

Data from the 1996 Course Experience Questionnaire (CEQ) were analysed using the Rasch measurement model. This analysis indicates that 17 of the 25 CEQ items fit a unitary scale that measures course quality as perceived by graduates. Graduates are located on the interval measurement scale produced in the Rasch analysis. The interval nature of the scale renders the graduates’ scores amenable to analyses that are not wisely employed using ordered raw CEQ scores. Analysis of variance indicates that variations in graduates’ responses are attributable to field of study and institutional factors. In order to compare universities, corrections are made for the course mix of each institution to produce expected institutional scores. These are compared with observed institutional scores to determine those universities that have performed above, at, or below expectation. (Individual institutions are not identified in this analysis).

Important issues relating to the educational and statistical significance of the findings have emerged. The data collected through the CEQ do not represent a simple random sample of all graduates. Instead, the data model is a hierarchical one, with individual graduates nested within courses, which are nested within institutions. This requires analysis using multilevel analytical tools. Conventional analyses substantially underestimate the standard errors of aggregated measures (such as institutional means) and therefore report institutional differences as significant when they are not. The implications of the measurement and analytical problems for policy decisions over the distribution of funding among institutions and among courses within institutions are discussed.

Abstract

The Course Experience Questionnaire

Approaches to measurement

Analyses of the 1996 CEQ

Institutional comparisons

The hierarchical nature of the population

Summary and conclusion

References

Appendix 1: Items used in the 1996 CEQ

 
The Course Experience Questionnaire

The Course Experience Questionnaire (CEQ) is a survey instrument of 25 items that is posted to recent graduates of all Australian universities. It seeks to establish graduates’ perceptions of the quality of the courses that they have completed and its results are used to compare courses and institutions. Twenty-four items are statements representing views about five main aspects (clear goals, good teaching, appropriate assessment, generic skills, and appropriate workload) of the courses that graduates have recently completed. There is also a single summary item. Graduates are asked to express their opinions about their courses by indicating the extent of their agreement with these propositions by selecting one of the five response options from strongly disagree to strongly agree.

In 1996, the CEQ was distributed to 137,603 graduates, of whom 93,967 replied. For the purposes of calibration, we used only the responses from graduates of undergraduate programs, and of those only the graduates who had completed all 25 CEQ items. This left us with the returns of 51,631 individuals.

We are concerned that the methods commonly employed for the analysis of CEQ data are less than optimal. In those methods, each item response option is coded as: strongly disagree, -100; disagree, -50; neutral, 0; agree, +50; and strongly agree, +100. These coded responses are subjected to conventional statistical analyses, with means and standard deviations computed for items and for sub-sets of items, and these statistics are the basis of comparisons between courses. An assumption implicit in these forms of analysis is that the data are interval and that a particular response option indicates the same level on the underlying trait for all items. Our concern is that graduates’ responses are ordinal and therefore should be analysed differently. The current analyses produce useful information, and the large numbers of cases involved in the survey probably mean that alternative analyses would not produce a substantially different picture. However, if policy decisions are to be based on the results of the survey, or if the survey instrument is to be modified over time while still permitting comparisons, we contend that the alternative and superior analytical techniques that are available should be used.

The shortcomings of present analytical methods have been identified. They employed an unnecessarily complex method of analysis to generate indicators of institutional performance, but they were obliged to use the raw ordinal data of the CEQ. Below, we demonstrate that the Rasch measurement model produces an interval measure that can be manipulated much more readily. Locating items on an interval scale permits items to be substituted over time and thus permits comparisons over time. We go on to show that CEQ data are only adequately represented using a hierarchical data model and that this should be acknowledged in analyses of the data.

Approaches to measurement

Measurement is conducted to describe phenomena with a greater degree of precision than mere indicators can provide and to facilitate comparisons over time or between cases. If the CEQ is to be used as a measure of perceived course quality, and if that measure of quality is to be compared over time or across institutions or courses, then it must acknowledge the assumptions that underpin measurement and conform to accepted criteria for measurement.

Weiss and Yoes identify four requirements made of measures. They are:

  • if a respondent holds a certain attitude, s/he will respond honestly to an item which taps that attitude;
  • the choices that respondents make among response options indicate the strength of the underlying attitude trait that they hold;
  • the responses that participants make to particular items are not influenced by the presence of other items in the instrument (local independence);
  • the pattern of responses to items will conform to a probability function.

The first three of these requirements have long been characteristic of measurement. However, the fourth requirement reflects the need to understand the basis of survey measurement and provides a foundation for estimating the precision and reliability of the measures that are derived from such observation.

Wright and Masters relate seven criteria for true measurement. They are:

  • each item should function as intended;
  • each item can be positioned on a common scale;
  • the scale should be an interval one;
  • each person can be located along the same common scale used for items;
  • the responses should form a valid response pattern for each item;
  • estimates of precision must be available for all scale measures; and
  • each item should retain its meaning and function across individuals and groups.

While these requirements and criteria apply to all genuine measurement, in the Rasch measurement model, these requirements have particular salience. In most forms of measurement it is difficult to be sure that all items contribute to a common scale. Some of the methods for ensuring this lack sensitivity, and this criterion has not been as rigorously enforced as it might have been. The use of a common scale is important since it may be necessary to use alternative items in parallel versions of an instrument, especially if it must be administered to a group on several occasions, and different items must be comparable on that measurement scale. It is also desirable that the scale be at least an interval one so that differences among subjects can be compared meaningfully and change over time validly assessed.

The Rasch measurement model meets these criteria for measurement. It requires a particular response probability function form that for each item depends upon only two variables: the strength of the individual’s affect on the trait being measured and the trait threshold required to accept that item. This gives the Rasch model some unique measurement characteristics. Items that are influenced significantly by external factors will not fit the function. Those that do fit reveal a threshold (or series of thresholds for ordered response items) and they can be used to place individuals on the scale for the underlying trait.
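As an illustration of the kind of response probability function involved, the sketch below computes category probabilities for one five-option item under the rating scale form of the Rasch model. The choice of this particular form is an assumption made for illustration, since the text does not state which polytomous Rasch model was fitted, and the person location, item location, and thresholds are hypothetical values in logits.

```python
import math


def category_probabilities(theta, delta, taus):
    """Probability of each ordered response category under a rating scale model.

    theta: person location in logits.
    delta: item location in logits.
    taus:  step thresholds relative to the item location, one per step
           between adjacent categories (four steps for a five-option item).
    """
    # Each category's exponent is the cumulative sum of (theta - delta - tau_k)
    # over the steps passed; the lowest category has an empty sum (zero).
    exponents = [0.0]
    for tau in taus:
        exponents.append(exponents[-1] + (theta - delta - tau))
    weights = [math.exp(e) for e in exponents]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical thresholds, symmetric about the item location.
taus = [-1.5, -0.5, 0.5, 1.5]
probs = category_probabilities(theta=0.0, delta=0.0, taus=taus)
print([round(p, 3) for p in probs])
```

With the person located exactly at the item location and symmetric thresholds, the middle category is the most probable; moving the person up the scale shifts probability towards the higher categories.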

Analyses of the 1996 CEQ

Both exploratory and confirmatory factor analyses have been undertaken to examine the factor structure of the CEQ items. Data have also been analysed using the Rasch measurement model. This has enabled us to identify items that fit a common measurement scale, compute item thresholds, and compute scale scores for individuals on the underlying trait.

We do not report detailed results of these analyses here, since they are reported in detail in Curtis and are similar to those reported in Waugh. In summary, from the exploratory factor analysis, we found that there is evidence for five factors that do correspond with the five sub-scales identified in the CEQ. The confirmatory factor analysis indicates that there is a single underlying factor and that the five separate factors are nested within it. We take this to indicate that there is an underlying ‘perception of course quality’ factor and that it is expressed to varying extents in the five components that have been identified as elements of course quality.

Rasch analysis indicates that eight of the 25 items misfit the measurement model, but that the 17 remaining items do form a coherent measure. The items that were used in the 1996 CEQ are shown in Appendix 1. Table 1 shows fitting and non-fitting items by their CEQ sub-scale. This suggests that the overall perception of quality is most strongly influenced by three of the five factors: good teaching, generic skills, and clear goals.

 

Table 1: The CEQ sub-scale structure and Rasch item fits

                   GTS                GSS               CGS           AAS     AWS        OA
Fitting items      7, 15, 17, 18, 20  2, 5, 10, 11, 22  1, 6, 13, 24  12, 19  14         -
Non-fitting items  3                  9                 -             8, 16   4, 21, 23  25

From the Rasch calibration, individuals were assigned a score on the scale formed for the underlying trait from the 17 fitting items. The interval scale that results from Rasch scaling is measured in logits. These units result from the probability function used to model responses. As an interval scale it has an arbitrary origin (like commonly used temperature scales) but, since it has units that arise from the logistic model, it can be linearly transformed into a scale with a more obvious meaning. We have transformed the scale to a mean of 500 with each logit being re-scaled to 100 units. This scale has been tentatively called the Graduate Satisfaction Index (GSI) and we have used this as the basis of further analyses.
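The rescaling described above is a simple linear transformation, sketched below. The sketch assumes that the mean of 500 refers to the mean of the person measures being rescaled; the logit values are hypothetical.

```python
def to_gsi(logits, units_per_logit=100.0, target_mean=500.0):
    """Linearly rescale person measures in logits to GSI units.

    The mean of the supplied measures maps to target_mean and a difference
    of one logit maps to units_per_logit GSI units.
    """
    mean_logit = sum(logits) / len(logits)
    return [target_mean + units_per_logit * (x - mean_logit) for x in logits]


# Hypothetical person measures in logits.
person_logits = [-1.0, 0.0, 0.5, 1.5]
gsi = to_gsi(person_logits)
print(gsi)
```

Because the transformation is linear, differences between graduates keep their relative sizes: two graduates who are one logit apart are 100 GSI units apart.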

Institutional comparisons

Since the GSI is an interval scale, we could compare institutions by simply taking the mean GSI of all graduates for each institution. However, institutions are far from homogeneous and vary on several dimensions: some have a high proportion of graduates from technology and applied science courses, others have a greater proportion of graduates from the humanities and social sciences. In addition, it seems that institutions differ on individual characteristics such as age and gender of graduates. If, as it seems from existing analyses of CEQ data, there are differences on CEQ scores among course types, such differences should be taken into account before any attempt is made to compare institutions on CEQ scores. This was the primary purpose of the analyses undertaken by Karmel et al.

Influence of course type

In order to ascertain whether there are differences among course types, we undertook a two-way analysis of variance using individual GSI scores as the criterion measure and both course type and institution as categorical variables. We classified courses into nine broad fields of study: Agricultural Sciences; Architecture; Humanities and Social Sciences; Business; Education; Engineering; Medical Sciences; Law; and Mathematics and Science. Karmel et al. used ten categories in their analysis. They separated Veterinary Sciences, which we included under Agricultural Sciences. We are quite sure that within these categories there are substantial differences between individual courses. For example, within the Medical Sciences we expect differences to be apparent between MB, BS awards, the various nursing awards, and other courses such as medical radiations, physiotherapy, and speech pathology. Our purpose was to test whether there were differences among broad course types. For other purposes a finer grained analysis would be warranted.

The analysis of variance indicated strong main effects for broad field of study and institution and some interactions. Table 2 shows the national mean GSI score for each of the nine broad fields of study.

 

Table 2: National GSI means by Broad Field of Study

        AgSci   Arch    HSS     Bus     Educ    Eng     Med     Law     MaSci
N       766     1004    12196   10332   6172    3257    8145    1894    7725
Mean    511.50  480.90  527.06  485.71  507.08  476.14  483.09  493.80  502.30
St Dev  89.26   82.34   104.02  75.11   92.87   73.21   81.28   89.84   83.70

Institutional GSI scores

Since there are differences among broad fields of study, it is apparent that simply taking the mean GSI for all graduates of an institution would bias the institutional score in favour of those universities with high proportions of humanities graduates and against those with high proportions of engineering graduates. Thus we employed a method of correcting for the course mix of institutions in developing an institutional GSI score. We took the national average GSI score for each broad field of study and the proportion of graduates in that broad field of study to produce a weighted expected mean GSI for each institution. This is the expected institutional mean GSI score, assuming that its courses are performing at the national average. We then computed the actual GSI mean for each institution as the simple mean GSI for all graduates and tabulated the difference between actual and expected GSI means. These data are shown in rank order in Table 3. Note that we have used the 57 separate institutional codes from the raw CEQ data and that individual institutions are not identified. The data are also shown graphically in Figure 1.
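The correction for course mix can be sketched as a weighted mean. The national field means are taken from Table 2; the institution's course mix and its actual mean are hypothetical values chosen only to show the arithmetic, and the function names are illustrative.

```python
# National mean GSI by broad field of study (Table 2).
NATIONAL_MEAN_GSI = {
    "AgSci": 511.50, "Arch": 480.90, "HSS": 527.06,
    "Bus": 485.71, "Educ": 507.08, "Eng": 476.14,
    "Med": 483.09, "Law": 493.80, "MaSci": 502.30,
}


def expected_gsi(graduates_by_field):
    """Expected institutional mean GSI if every field performed at the national mean.

    graduates_by_field maps a broad field of study to the number of
    responding graduates from that field at the institution.
    """
    total = sum(graduates_by_field.values())
    weighted = sum(NATIONAL_MEAN_GSI[f] * n for f, n in graduates_by_field.items())
    return weighted / total


def performance_gap(actual_mean, graduates_by_field):
    """Actual minus expected mean GSI for one institution."""
    return actual_mean - expected_gsi(graduates_by_field)


# Hypothetical institution: 60% HSS, 20% Engineering, 20% Business graduates.
hypothetical_mix = {"HSS": 600, "Eng": 200, "Bus": 200}
expected = expected_gsi(hypothetical_mix)
gap = performance_gap(515.0, hypothetical_mix)  # 515.0 is a made-up actual mean
print(round(expected, 3), round(gap, 3))
```

An institution dominated by humanities graduates has a high expected score, so a raw mean that looks strong may still fall short of expectation once course mix is considered.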

 

Table 3: Actual minus Expected mean GSI scores for institutions in rank order

Rank  Actual-Expected  Rank  Actual-Expected  Rank  Actual-Expected  Rank  Actual-Expected
1     124.13           16    12.67            31    0.14             46    -9.17
2     74.73            17    9.43             32    0.02             47    -9.86
3     54.77            18    8.44             33    -0.62            48    -10.46
4     50.08            19    8.20             34    -1.13            49    -13.26
5     30.05            20    7.83             35    -3.48            50    -14.09
6     24.16            21    7.71             36    -3.91            51    -14.91
7     23.67            22    6.55             37    -4.19            52    -15.09
8     23.25            23    6.06             38    -4.47            53    -16.43
9     22.19            24    5.88             39    -5.34            54    -17.43
10    16.33            25    4.28             40    -5.54            55    -17.62
11    15.61            26    4.11             41    -6.61            56    -32.95
12    15.12            27    2.91             42    -6.68            57    -46.53
13    14.48            28    1.56             43    -7.09
14    14.21            29    0.94             44    -7.29
15    12.96            30    0.78             45    -7.63
 
 

 

Figure 1: Differences between Actual and Expected mean GSI scores for institutions

 

Categories of institutions

On the basis of the differences between expected and actual mean GSI scores, we could classify institutions as performing above, at, or below expectation. A question arises about the points at which it is possible to discriminate the top and bottom groups from the middle one. An obvious technique is to calculate a confidence interval about the mean using its standard error. This would yield a band about 5 units on either side of the mean with 14 institutions in that band, leaving 24 in the top group and 19 in the bottom group. However, we have two concerns about this approach. The first is that the sample that we have is about 50 per cent of the 1996 population, and under these circumstances we should apply a finite population correction to the estimate of the precision. In this case, the confidence interval would be reduced to 2.5 units either side of zero. There is, however, scope for debate about whether the 1996 graduate cohort is a unique population or is a subset of the population of all graduates from Australian universities over time. To some extent this matter is only resolved by establishing the purposes of the analyses. If comparisons are to be made over time to ascertain whether there is an improvement in graduates’ perceptions of the quality of university teaching, there is a case for regarding each annual cohort as a discrete population. If we are interested only in validating a measure of graduates’ perceptions of quality, then there is a case for regarding the 1996 group as one intact convenience sample of a much larger population.

Of greater concern are sampling errors. We do not have a genuine random sample from a homogeneous population: instead we have an intact sample from a stratified heterogeneous population of graduates from different courses and from different institutions. We calculated the design effect at 3.6, which results in a confidence interval of between 10 and 20 units either side of zero, depending upon whether one chooses to apply a finite population correction. With a confidence interval of 20 units, nine institutions would lie in the above expectation group and two below. The two low performing institutions are small specialised institutions whose graduates are placed within the Humanities and Social Sciences group where the expected GSI score is very high. Their specialised courses may well be ones that are rated at the low end of the range, but are disadvantaged by being compared with all other Humanities and Social Sciences courses.
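The effect of the band width on the classification can be checked directly against the differences listed in Table 3. The sketch below counts institutions above, within, and below a band around zero. The half-widths of 5 and 20 units are the ones discussed in the text; treating a value exactly on the boundary as inside the band is an assumption, although no Table 3 value falls on either boundary.

```python
# Actual minus expected mean GSI for the 57 institutional codes (Table 3).
DIFFERENCES = [
    124.13, 74.73, 54.77, 50.08, 30.05, 24.16, 23.67, 23.25, 22.19, 16.33,
    15.61, 15.12, 14.48, 14.21, 12.96, 12.67, 9.43, 8.44, 8.20, 7.83,
    7.71, 6.55, 6.06, 5.88, 4.28, 4.11, 2.91, 1.56, 0.94, 0.78,
    0.14, 0.02, -0.62, -1.13, -3.48, -3.91, -4.19, -4.47, -5.34, -5.54,
    -6.61, -6.68, -7.09, -7.29, -7.63, -9.17, -9.86, -10.46, -13.26, -14.09,
    -14.91, -15.09, -16.43, -17.43, -17.62, -32.95, -46.53,
]


def classify(differences, half_width):
    """Count institutions above, within, and below a band of +/- half_width."""
    above = sum(1 for d in differences if d > half_width)
    below = sum(1 for d in differences if d < -half_width)
    within = len(differences) - above - below
    return above, within, below


print(classify(DIFFERENCES, 5))   # (24, 14, 19)
print(classify(DIFFERENCES, 20))  # (9, 46, 2)
```

Widening the band from 5 to 20 units moves most institutions into the middle group, which is why the choice of standard error matters so much for any ranking that is drawn from these data.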

The sampling problems referred to above present a problem that must be addressed. The stratified nature of the sample however suggests a solution to the problem of comparing unlike institutions. We now turn to that matter.

The hierarchical nature of the population

The graduates who completed the CEQ differ on individual characteristics such as age and sex, have undertaken courses which also differ in their characteristics, and have graduated from institutions which have distinct histories and missions. It is worth noting, as Meyler did, that students’ judgements of subjects are influenced by the size of the subject, whether the subject is compulsory, and whether the subject is quantitative. Since the current CEQ items invite aggregated judgements about subjects, rather than the totality of graduates’ course experiences, we might expect to see substantial differences in the ratings of different types of courses. The stratified nature of the population suggests that we have a sample of graduates that should be considered to have three levels: the individual, the course, and the institution. Given that this is the case, the forms of analysis that are reported above, and that have been used by most others who have researched this area, are not appropriate, and analytical tools that do recognise the hierarchical nature of the sample should be used. One such tool is HLM (Hierarchical Linear Modelling).

We undertook a series of analyses to see if we could identify differences between institutions when individual and course level variables are separated. In order to make these analyses tractable, we confined our attention to the three South Australian universities and used a subset of data of graduates from those institutions. Under this three level model, it is argued that the score of any individual is the result of an institutional component, a course component, an individual component, and error terms that account for unexplained variation at each level.

At the level of individuals we used sex, age, non-English speaking background (NESB) status, employment status, and mode of study as explanatory variables. The regression equation for this relationship is shown below.

GSI = P0 + P1.Sex + P2.Age + P3.Nesb + P4.Emp + P5.Mode + E

That is, an individual’s GSI score can be understood as depending upon his or her own characteristics of sex, age, NESB status, employment status, and mode of study, and that there is an intercept term (P0) that is a result of variables which operate at the course and institutional levels.

At course and institutional levels we did not use explanatory variables. We included only proxy categorical (dummy) variables to separate the influences of the different broad fields of study and the different universities. In estimating the parameters in the following equations, with data from N sources, N-1 parameters can be estimated so one parameter (for a course type or institution) must be omitted from the estimation. (In the equations below, the omitted parameters are shown in brackets). However, an individual must be a member of one, and only one, category so the omitted parameter must be the complement of the sum of the estimated ones. The course level regression equation was therefore:

P0 = B00 + B01.AgSci + B02.Arch + B03.HSS + B04.Bus + B05.Educ + B06.Eng + B07.Med + B08.Law [ + B09.MaSci] + R0

Thus the intercept term used in the first level equation (P0) is the result of the particular course type that the individual completed, an error term to represent unexplained variance (R0), and an intercept term (B00) that reflects variation at the third or institutional level.
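The dummy coding used at the course level can be sketched as follows. The field labels follow the course level equation, and the omitted (reference) category is the bracketed one, MaSci; the function name and the 0/1 coding are illustrative assumptions.

```python
FIELDS = ["AgSci", "Arch", "HSS", "Bus", "Educ", "Eng", "Med", "Law", "MaSci"]


def dummy_code(field, reference="MaSci"):
    """Return N-1 indicator variables for one graduate's broad field of study.

    The reference category has no indicator of its own: a graduate from that
    field is identified by all of the other indicators being zero.
    """
    if field not in FIELDS:
        raise ValueError(f"unknown field of study: {field}")
    return {f: int(f == field) for f in FIELDS if f != reference}


hss = dummy_code("HSS")
masci = dummy_code("MaSci")
print(hss)
print(masci)
```

Each graduate switches on at most one indicator, so with nine fields only eight parameters are estimated directly and the ninth is implied by the others.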

For the institutional level, the regression equation was:

B00 = G000 + G001.Flin + G002.Adel [ + G003.UniSA] + U0

It should be noted that the three levels of the model are related through intercept terms. At the individual level, there is an intercept P0, and it is the criterion variable of the course level equation. Its intercept term, B00, is the criterion variable in the third level equation. In that equation, the parameters of interest to us are the coefficients of the categorical variables for each of the institutions. Those parameters tell us about the relative standings of the three institutions when the hierarchical nature of the sample is modelled and when variables at the individual and course levels are taken into account. Table 4 shows a summary of the results of the hierarchical analyses completed using HLM.

 

Table 4: Summary results of the hierarchical analysis of data from the three South Australian universities

Fixed Effect     Coefficient  Standard Error  T-ratio  Sig
Level 3 effects
FLIN             484.25       13.04           4.58     **
ADEL             472.72       5.56            8.66     **
UNISA            461.14       5.12            7.15     **
Level 2 effects
AGSCI, B01       -0.50        14.58           -0.03
ARCH, B02        -3.11        15.89           -0.20
HSS, B03         26.60        11.84           2.25     **
BUS, B04         -14.32       12.17           -1.18
EDUC, B05        25.68        12.82           2.00     *
ENG, B06         -18.53       13.50           -1.37
MED, B07         -9.79        12.17           -0.80
LAW, B08         0.50         14.58           0.03
MASCI, B09       6.88         12.04           0.57
Level 1 effects
SEX, P1          4.77         3.29            1.45
AGE, P2          1.15         0.18            6.35     **
NESB, P3         -8.04        3.99            -2.02    **
UNEMP, P4        -10.91       5.35            -2.04    **

An ‘*’ in the Sig column indicates p<0.10 while ‘**’ indicates p<0.05.
 

From the hierarchical analyses, we found that mode of study was not significant, and it has been dropped from the model. Sex was only marginally significant, but we have chosen to leave it in the model as it assists in explaining some of the features that emerge from the analyses. To estimate the GSI of any individual, the separate regression equations with their estimated parameters can be combined. For any individual, the institutional score is taken, the broad field of study score is added to it, and then the scores for the individual characteristic variables are added.

GSI = Inst + BFStud + 3.81 Sex + 0.92 Age - 6.43 Nesb - 8.73 Emp + E + R0 + U0

 It is instructive to compare course and institutional means found from multilevel analysis with those found from earlier methods.

 Course type performances

When raw means of graduates from each of the nine broad fields of study are computed, no allowance is made for the characteristics of graduates from those courses. For example, graduates of engineering courses are younger than those of education awards and more of them are males. In the multilevel analysis, it is apparent that younger graduates make harsher judgements of course quality than do older ones, and males tend to make harsher judgements than do females. Table 5 shows the deviation of the raw mean from 500 (the overall mean of all graduates) and the course intercept from the multilevel analysis. While for some course types there is very little difference between the two measures of perceived course quality, for others, for example Architecture, there are substantial differences. Architecture graduates are predominantly male, younger than other graduates, and experience greater difficulty in finding employment. Each of these factors is associated with significantly lower judgements of course quality. By not separating these factors, Architecture courses are perceived to rate poorly by comparison with others. For Education graduates, there is a substantial difference between the raw score deviation and the HLM intercept. This is attributed to the low representation of NESB persons and the difficulty graduates experience in finding satisfactory employment, as many find only part time and short term contract work. However, when the influence of individuals’ characteristics is removed, the influence of the type of course on graduates’ perceptions of course quality is shown to be little different from the overall mean. We argue that when institutions are comparing course types with each other, it would be more sensible to use a measure that has greater meaning and that has extracted from it influences other than those due to the course itself.

 

Table 5: A comparison of Broad Field of Study means (expressed as deviations from the overall mean) with intercepts from multilevel analysis

                       AgSci   Arch    HSS     Bus     Educ    Eng     Med     Law     MaSci
Dev from overall mean  11.50   -19.10  27.06   -14.29  7.08    -23.86  -16.91  -6.20   2.30
HLM dev                -0.50   -3.11   26.60   -14.32  25.68   -18.53  -9.79   0.50    6.88

Institutional performance measures

Earlier in this paper we noted that institutions can be compared using graduates’ raw GSI scores, but argued that failing to correct for course mix biases the measure. In the multilevel analysis, we have found that there are also individual graduate characteristics that influence judgements about courses. In order to unpack the results of the multilevel analysis, it is instructive to compare the measures that are available. These measures are presented in Table 6.

 

Table 6: A comparison of alternative measures of institutional performance derived from the GSI

Institution       Raw mean GSI  Expected GSI  Difference (raw-expected)  Multilevel intercept
Flinders          526.86        503.61        23.25                      484.25
Adelaide          507.91        503.63        4.28                       472.72
University of SA  502.10        496.04        6.06                       461.14

First, it should be noted that under the analyses described above, the three South Australian universities perform at or above the national average. Indeed, Flinders University performs well above it. Both the University of Adelaide and the University of South Australia perform slightly better than expected, but not significantly so. On the measure corrected for course mix, Flinders performs 19 points ahead of Adelaide. However, when the measure is also corrected for individual graduate characteristics, its lead over Adelaide is reduced to about 12 points. This is because Flinders graduates are, on average, almost six years older than Adelaide graduates, and because Flinders has a greater proportion of women graduates and a lower proportion of NESB graduates. When graduates’ characteristics are taken into account, Adelaide has a lead of about 12 points over the University of South Australia.
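The leads quoted above follow from simple differences between the columns of Table 6, as the sketch below shows. The values are copied from Table 6; the dictionary keys and the helper name are illustrative.

```python
# Values from Table 6: difference (raw minus expected) and multilevel intercept.
TABLE_6 = {
    "Flinders": {"difference": 23.25, "intercept": 484.25},
    "Adelaide": {"difference": 4.28, "intercept": 472.72},
    "UniSA": {"difference": 6.06, "intercept": 461.14},
}


def lead(first, second, measure):
    """Difference between two institutions on one Table 6 measure."""
    return TABLE_6[first][measure] - TABLE_6[second][measure]


course_mix_lead = lead("Flinders", "Adelaide", "difference")
multilevel_lead = lead("Flinders", "Adelaide", "intercept")
adelaide_unisa_lead = lead("Adelaide", "UniSA", "intercept")

print(f"{course_mix_lead:.2f} {multilevel_lead:.2f} {adelaide_unisa_lead:.2f}")
```

The first figure uses only the correction for course mix, while the other two use the multilevel intercepts, which also allow for individual graduate characteristics.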

Summary and conclusion

We have shown that, using the Rasch measurement model, it is possible to identify items that fit a coherent scale and to convert the ordinal ratings of graduates on CEQ items to an interval measure of perceived course quality. This measure has been re-scaled to produce what we have called the Graduate Satisfaction Index (GSI). We have shown that, while it is possible to use raw GSI means to compare institutions, this produces biased ratings because of differences in the ratings of different types of courses and because of differences in institutional course profiles. We have also shown that it is possible to correct for the influence of course type to generate a more satisfactory measure of institutional performance. However, it is clear that the problem of measuring course quality is a multilevel one and that it is necessary to examine factors in a multilevel model. In doing this, we have found that there are individual graduate characteristics that influence the judgements made about the courses that graduates have just completed. These characteristics vary among courses as well as institutions. Following multilevel analysis, we have shown that the influence of individual characteristics can be separated to develop better comparative measures of both courses and institutions, and we have done this for the three South Australian universities.

Multilevel analysis has permitted influences of variables that were previously confounded to be disaggregated. For example, in earlier studies, it was reported that employment status at the time of completing the CEQ did not influence graduates’ perceptions of their courses. By separating effects at individual and course levels, we have been able to show that employment status is significant. In the past, its influence has been masked by course type because of different rates of graduate employment from different courses.

Multilevel analysis has also enabled reliable estimates of institutional effects to be established. It is desirable that institutions now consider their relative positions and begin to explore factors that may explain these estimates. It is quite possible that decisions made within institutions on the allocation of funds to libraries and other student services, or on expenditures on teaching and research activities, influence graduates’ judgements of their courses. Johnson and Keeves have begun to undertake just these forms of analysis and are able to show that these and other decisions do influence graduates’ perceptions.

We do not suggest that such detailed analyses should be routinely undertaken. However, if analyses like these are done, we could identify salient factors (and their parameters) at each of the three levels of the model and they could be used to correct ‘raw’ course and institutional measures of course quality as perceived by graduates. If significant policy decisions such as resource allocations are to be based upon instruments like the CEQ, we suggest that better analytical techniques such as those that have been reported in this study should be employed so that those policy decisions are more soundly based.

References 

 Bryk, A., Raudenbush, S., & Congdon, R. (1996). HLM for Windows. Hierarchical linear and nonlinear modeling with HLM/2L and HLM/3L. (Version 4) [Multilevel analysis software]. Chicago: Scientific Software International.

Curtis, D. D. (1999). The 1996 Course Experience Questionnaire: A Re-Analysis. Unpublished Ed. D. dissertation, The Flinders University of South Australia, Adelaide.

Johnson, T. (1997). The 1996 Course Experience Questionnaire: a report prepared for the Graduate Careers Council of Australia. Parkville: Graduate Careers Council of Australia.

Johnson, T. G., & Keeves, J. P. (2000). Spending on the selling of wisdom. Issues in Educational Research, 10(1). (In press)

Karmel, T., Aungles, P., & Andrews, L. (1998, 30 September). Presentation of the Course Experience Questionnaire (CEQ). Paper presented at the Course Experience Questionnaire Symposium 1998, University of New South Wales.

Meyler, M. (1997, 8-11 July). What do SET surveys really measure? And why does it matter? Paper presented at the HERDSA'97 Advancing International Perspectives conference, Adelaide.

Waugh, R. F. (1998). The Course Experience Questionnaire: a Rasch measurement model analysis. Higher Education Research and Development, 17(1), 45-64.

Weiss, D. J., & Yoes, M. E. (1991). Item response theory. In R. K. Hambleton & J. N. Zaal (Eds.), Advances in educational and psychological testing: theory and applications (pp. 69-95). Boston: Kluwer Academic Publishers.

Wright, B. D., & Masters, G. (1981). The measurement of knowledge and attitude (Research Memorandum 30). Chicago: University of Chicago, Department of Education, Statistical Laboratory.

Wright, B. D., & Masters, G. N. (1982). Rating scale analysis. Chicago: MESA Press.

 

    
Appendix 1: Items used in the 1996 CEQ

Item  Sub-scale  Item statement
1     CGS        It was always easy to know the standard of work expected.
2     GSS        The course developed my problem solving skills.
3     GTS        The teaching staff of this course motivated me to do my best work.
4 *   AWS        The workload was too heavy.
5     GSS        The course sharpened my analytic skills.
6     CGS        I usually had a clear idea of where I was going and what was expected of me in this course.
7     GTS        The staff put a lot of time into commenting on my work.
8 *   AAS        To do well in this course all you really needed was a good memory.
9     GSS        The course helped me develop my ability to work as a team member.
10    GSS        As a result of my course, I feel confident about tackling unfamiliar problems.
11    GSS        The course improved my skills in written communication.
12 *  AAS        The staff seemed more interested in testing what I had memorised than what I had understood.
13    CGS        It was often hard to discover what was expected of me in this course.
14    AWS        I was generally given enough time to understand the things I had to learn.
15    GTS        The staff made a real effort to understand difficulties I might be having with my work.
16 *  AAS        Feedback on my work was usually provided only as marks or grades.
17    GTS        The teaching staff normally gave me helpful feedback on how I was going.
18    GTS        My lecturers were extremely good at explaining things.
19 *  AAS        Too many staff asked me questions just about facts.
20    GTS        The teaching staff worked hard to make their subjects interesting.
21 *  AWS        There was a lot of pressure on me to do well in this course.
22    GSS        My course helped me to develop the ability to plan my own work.
23 *  AWS        The sheer volume of work to be got through in this course meant it couldn’t all be thoroughly comprehended.
24    CGS        The staff made it clear right from the start what they expected from students.
25    OA         Overall, I was satisfied with the quality of this course.

* indicates a reversed item.

International Education Journal, 1 (2) 2000
http://iej.cjb.net



All text and graphics © 1999-2000 Shannon Research Press