Test performance 2008

Each year, multiple versions of each of the six IELTS modules (Listening, Academic Reading, General Training Reading, Academic Writing, General Training Writing, and Speaking) are released for use by centres testing IELTS candidates. Reliability estimates for the objectively and subjectively scored modules used in 2008 are reported below.


Reliability of objectively-scored modules (Reading and Listening)

The reliability of the Listening and Reading tests is reported using Cronbach's alpha, a reliability estimate that measures the internal consistency of the 40-item test. Reliability values are reported for the Listening and Reading material released in 2008 that had sufficient candidate responses to yield meaningful estimates:

 

 

The figures reported for Listening and Reading modules indicate the expected levels of reliability for tests containing 40 items. On the basis of these reliability figures, an estimate of the standard error of measurement (SEM) may be calculated for these modules using the following formula:


 

SEM = St × √(1 − rxx')

where St is the standard deviation of the test and rxx' is the reliability of the test.

 


Table 1 Mean, standard deviation and standard error of measurement of Listening and Reading

 

 

The SEM should be interpreted in terms of the final band scores reported for Listening and Reading modules (which are reported in half-bands).
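As a rough illustration of the two statistics described in this section, the sketch below computes Cronbach's alpha from a candidates-by-items score matrix and then derives the SEM from a standard deviation and a reliability value, using the standard relationship SEM = St × √(1 − rxx'). The function names and the tiny response matrix are hypothetical, chosen only to make the example runnable; they are not IELTS data, and a real 40-item test would simply have 40 columns.

```python
import math
from statistics import pstdev, pvariance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for a list of candidates, each a list of per-item scores."""
    k = len(item_scores[0])  # number of items in the test
    # Variance of each item across candidates
    item_vars = [pvariance([row[i] for row in item_scores]) for i in range(k)]
    # Variance of candidates' total scores
    total_var = pvariance([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


def standard_error_of_measurement(sd, reliability):
    """SEM = St * sqrt(1 - rxx')."""
    return sd * math.sqrt(1 - reliability)


# Hypothetical responses: 5 candidates x 4 items, scored 1 (correct) or 0
responses = [
    [1, 1, 1, 0],
    [1, 0, 1, 1],
    [0, 0, 1, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
]
alpha = cronbach_alpha(responses)
sd_total = pstdev([sum(row) for row in responses])
print("alpha:", round(alpha, 3))
print("SEM (raw score units):", round(standard_error_of_measurement(sd_total, alpha), 3))
```

The SEM here is in raw-score units; relating it to half-band reporting would require the raw-score-to-band conversion, which is not given in this report.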

 

Reliability of subjectively-scored modules (Writing and Speaking)

The reliability of the Writing and Speaking modules cannot be reported in the same manner as for Reading and Listening because they are not item-based; candidates' writing and speaking performances are rated by trained and standardised examiners according to detailed descriptive criteria and rating scales. The assessment criteria used for rating Writing and Speaking performance are described in the IELTS 2006 Handbook. Benchmarked example writing performances and CD-based speaking performances at different levels can be found, along with examiner comments, in the IELTS official practice materials, which can be ordered from the IELTS website. User-oriented band descriptors describing levels of Writing and Speaking performance are also available on the website. In addition, a new DVD, “IELTS Scores Explained”, provides information specifically tailored to organizations wanting a detailed description of IELTS scores. This information helps in setting appropriate standards of English proficiency.

 

Reliability of rating is assured through the face-to-face training and certification of examiners, all of whom must undergo retraining and recertification every two years. A Professional Support Network (PSN) manages and standardizes the examiner cadre, including face-to-face examiner monitoring as well as distance monitoring (using recordings of the Speaking tests). A ‘jagged profile’ system maintains a further check on the global reliability of IELTS performance assessment. Routine targeted double marking identifies the level of divergence (i.e., jagged profile) between Writing and/or Speaking scores and Reading and Listening scores. This process allows possibly misclassified candidates to be identified. The jagged profile system is also combined with ‘Targeted sample monitoring’ to further identify possibly faulty ratings by examiners. Selected centres worldwide are required to provide a sample of examiners' marked tapes and scripts. Tapes and scripts are then second-marked by a team of IELTS Principal Examiners and assistant Principal Examiners. Principal Examiners monitor the quality of both test conduct and rating, and feedback is returned to each test centre. The outcomes of these reliability measures feed back into examiner retraining and continually build on the quality management and assurance systems for IELTS.
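The idea of a jagged profile can be illustrated with a minimal sketch. The report does not state how divergence is measured or what level triggers double marking, so the rule below (comparing each subjectively scored module with the mean of Listening and Reading) and the 2.0-band threshold are assumptions made purely for illustration, as are the class and function names.

```python
from dataclasses import dataclass


@dataclass
class BandScores:
    listening: float
    reading: float
    writing: float
    speaking: float


def is_jagged(scores, threshold=2.0):
    """Flag a profile whose Writing or Speaking band diverges from the
    objectively scored modules by at least `threshold` bands.

    Both the comparison rule and the default threshold are hypothetical.
    """
    objective_mean = (scores.listening + scores.reading) / 2
    return (
        abs(scores.writing - objective_mean) >= threshold
        or abs(scores.speaking - objective_mean) >= threshold
    )


# Hypothetical candidates
candidates = {
    "A": BandScores(listening=7.0, reading=7.0, writing=4.5, speaking=7.0),
    "B": BandScores(listening=6.0, reading=6.5, writing=6.0, speaking=6.5),
}
flagged = [name for name, s in candidates.items() if is_jagged(s)]
print("Flagged for second marking:", flagged)
```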

 

Experimental generalisability studies were also carried out as part of the IELTS Speaking Revision Project (1998-2001) and the IELTS Writing Revision Project (2001-2005). The study conducted for the Speaking Revision produced an inter-rater correlation of 0.77, and a g-coefficient of 0.86 for the operational single-rater condition (see article in Research Notes 4); the Writing Revision study produced an inter-rater correlation of 0.77 and g-coefficients of 0.85-0.93 for the operational single-rater condition (see article in Research Notes 16). 
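For readers unfamiliar with the inter-rater correlation quoted above, the following sketch computes a Pearson correlation between the bands awarded by two raters to the same performances. The ratings are invented for illustration, and the sketch covers only the simple correlation; estimating a g-coefficient requires a variance-components analysis that is not shown here.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical bands awarded by two examiners to the same eight performances
rater_1 = [5.0, 6.0, 6.5, 7.0, 5.5, 8.0, 6.0, 4.5]
rater_2 = [5.5, 6.0, 6.0, 7.5, 5.0, 7.5, 6.5, 5.0]
print("Inter-rater correlation:", round(pearson_r(rater_1, rater_2), 2))
```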

 

The IELTS exam contains four components upon which an overall band score is awarded. An estimate of composite reliability therefore offers a useful measure of overall test reliability. Approaches to estimating the reliability of a composite test are discussed in Feldt & Brennan (1989: 117) and Crocker & Algina (1986: 119-121). The method used here is taken from Feldt & Brennan (1989).
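The report does not reproduce the formula it used. As a sketch only, the code below applies a textbook expression for the reliability of a weighted sum of components, which assumes measurement errors are uncorrelated across components: the composite reliability is one minus the ratio of summed component error variance to composite variance. Whether this is exactly the form applied to IELTS is an assumption, and all standard deviations, reliabilities and correlations in the example are hypothetical.

```python
def composite_reliability(sds, reliabilities, correlations, weights=None):
    """Reliability of a weighted composite, assuming uncorrelated errors.

    sds           -- standard deviation of each component
    reliabilities -- reliability estimate of each component
    correlations  -- square matrix of observed inter-component correlations
    weights       -- component weights (defaults to equal weighting)
    """
    n = len(sds)
    if weights is None:
        weights = [1.0] * n
    # Variance of the composite: sum of all weighted covariances
    composite_var = sum(
        weights[i] * weights[j] * sds[i] * sds[j] * correlations[i][j]
        for i in range(n)
        for j in range(n)
    )
    # Error variance contributed by each component
    error_var = sum(
        (weights[i] * sds[i]) ** 2 * (1 - reliabilities[i]) for i in range(n)
    )
    return 1 - error_var / composite_var


# Hypothetical four-component test (not IELTS figures)
sds = [1.2, 1.1, 0.9, 1.0]
rels = [0.89, 0.88, 0.85, 0.86]
corr = [
    [1.0, 0.7, 0.6, 0.6],
    [0.7, 1.0, 0.6, 0.5],
    [0.6, 0.6, 1.0, 0.6],
    [0.6, 0.5, 0.6, 1.0],
]
print("Composite reliability:", round(composite_reliability(sds, rels, corr), 3))
```

Because positively correlated components add shared variance to the composite but not shared error, the composite is typically more reliable than any single component.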

 

Composite reliability estimates were calculated for the period 1 January to 20 December 2004. To generate an appropriately cautious estimate, minimum alpha values were used for the objectively marked papers, and g-coefficients for the single-rater condition were used for the subjectively marked scores. The composite reliability estimate for the Academic module was 0.95, giving a composite SEM of 0.21. This means there is a 95% probability that a candidate’s true score falls within less than half a band (0.41) of the observed score. For General Training the composite reliability was 0.95 with an SEM of 0.23. If average, rather than minimum, values are used for the objective paper alphas, the reliability for both Academic and GT versions improves slightly to 0.96.
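The 0.41 figure follows from multiplying the composite SEM by the two-sided 95% normal quantile (about 1.96). The short sketch below reproduces that arithmetic for the two SEM values quoted above; the function name is hypothetical, and the calculation assumes normally distributed measurement error.

```python
from statistics import NormalDist


def confidence_half_width(sem, level=0.95):
    """Half-width of a symmetric confidence band around an observed score."""
    # Two-sided normal quantile, e.g. about 1.96 for a 95% band
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return z * sem


for module, sem in [("Academic", 0.21), ("General Training", 0.23)]:
    print(module, round(confidence_half_width(sem), 2))
```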

Test taker performance 2008

Band score information

IELTS is assessed on a 9-band scale and reports scores both overall and by individual skill. Overall band scores for Academic and General Training candidates in 2008 are shown here together with scores for individual skills according to a variety of classifications. These figures are broadly in line with statistics for previous years.

N.B. For place of origin and first language, the tables show the top 40 places and languages, listed alphabetically rather than by size of candidature.

Academic and General Training candidates

The following table shows the split between the Academic and General Training candidature in 2008.



 





Place of Origin

These figures show the mean overall and individual band scores achieved by 2008 Academic and General Training candidates according to their place of origin.

Mean band score for the most frequent countries or regions of origin (Academic)

 

Mean band score by most frequent countries or regions of origin (General Training)

 

First language

These figures show the mean overall and individual band scores achieved by 2008 Academic and General Training candidates from the top 40 first language backgrounds.

Mean band scores for most common first languages (Academic)

Mean band scores for most common first languages (General Training)


 

Test performance 2007

Each year, multiple versions of each of the six IELTS modules (Listening, Academic Reading, General Training Reading, Academic Writing, General Training Writing, and Speaking) are released for use by centres testing IELTS candidates. Reliability estimates for the objectively and subjectively scored modules used in 2007 are reported here.

Reliability of objectively-scored modules (Reading and Listening)

The reliability of the Listening and Reading tests is reported using Cronbach's alpha, a reliability estimate that measures the internal consistency of the 40-item test. Reliability values are reported for the Listening and Reading material released in 2007 that had sufficient candidate responses to yield meaningful estimates:

 




 

The figures reported for Listening and Reading modules indicate the expected levels of reliability for tests containing 40 items. On the basis of these reliability figures, an estimate of the standard error of measurement (SEM) may be calculated for these modules using the following formula:



SEM = St × √(1 − rxx')

where St is the standard deviation of the test and rxx' is the reliability of the test.

Table 1 Mean, standard deviation and standard error of measurement of Listening and Reading



The SEM should be interpreted in terms of the final band scores reported for Listening and Reading modules (which are reported in half-bands).

Reliability of subjectively-scored modules (Writing and Speaking)

The reliability of the Writing and Speaking modules cannot be reported in the same manner as for Reading and Listening because they are not item-based; candidates' writing and speaking performances are rated by trained and standardised examiners according to detailed descriptive criteria and rating scales. The assessment criteria used for rating Writing and Speaking performance are described in the IELTS 2006 Handbook. Benchmarked example writing performances and CD-based speaking performances at different levels can be found, along with examiner comments, in the IELTS official practice materials, which can be ordered from the IELTS website. User-oriented band descriptors describing levels of Writing and Speaking performance are also available on the website. In addition, a new DVD, “IELTS Scores Explained”, provides information specifically tailored to organizations wanting a detailed description of IELTS scores. This information helps in setting appropriate standards of English proficiency.

Reliability of rating is assured through the face-to-face training and certification of examiners, all of whom must undergo retraining and recertification every two years. A Professional Support Network (PSN) manages and standardizes the examiner cadre, including face-to-face examiner monitoring as well as distance monitoring (using recordings of the Speaking tests). A ‘jagged profile’ system maintains a further check on the global reliability of IELTS performance assessment. Routine targeted double marking identifies the level of divergence (i.e., jagged profile) between Writing and/or Speaking scores and Reading and Listening scores. This process allows possibly misclassified candidates to be identified. The jagged profile system is also combined with ‘Targeted sample monitoring’ to further identify possibly faulty ratings by examiners. Selected centres worldwide are required to provide a sample of examiners' marked tapes and scripts. Tapes and scripts are then second-marked by a team of IELTS Principal Examiners and assistant Principal Examiners. Principal Examiners monitor the quality of both test conduct and rating, and feedback is returned to each test centre. The outcomes of these reliability measures feed back into examiner retraining and continually build on the quality management and assurance systems for IELTS.

 

Experimental generalisability studies were also carried out as part of the IELTS Speaking Revision Project (1998-2001) and the IELTS Writing Revision Project (2001-2005). The study conducted for the Speaking Revision produced an inter-rater correlation of 0.77, and a g-coefficient of 0.86 for the operational single-rater condition (see article in Research Notes 4); the Writing Revision study produced an inter-rater correlation of 0.77 and g-coefficients of 0.85-0.93 for the operational single-rater condition (see Research Notes 16: IELTS writing: Revising assessment criteria and scales, Phase 3).

From 2008 it is expected that Speaking tests will be digitally recorded by IELTS centres worldwide. Cambridge ESOL has been undertaking research into the use of digital audio technology in speaking assessment for several years, including the feasibility of such technology for double marking of speaking tests. A recent study (2006) from the Digital Audio Project investigated partial double-marking of IELTS Speaking tests in live conditions. Partial rating presupposes that candidate performance in one or more parts of the Speaking test correlates adequately with performance in the Speaking test overall. The results indicated that Part 3 of the test provided the best correlation between marks on the full test and marks on a test part. Further empirical studies from the Digital Audio Project are currently examining the potential for partial double marking to provide a reliable indicator of fairness and quality assurance of the IELTS Speaking test.

Performance of test materials in the Writing and Speaking modules is routinely analysed to check on the comparability of different test versions and to ensure any variation is within the acceptable limit. Mean bandscores for the Academic Writing versions released in 2006, and for which a sufficient sample size has been obtained, ranged from 5.31 to 6.07. Mean bandscores for the General Training Writing versions released in 2006 ranged from 5.53 to 5.85. Mean bandscores for Speaking versions released in 2006 ranged from 5.55 to 6.30.

Reporting IELTS Composite Reliability

The IELTS exam contains four components upon which an overall band score is awarded. An estimate of composite reliability therefore offers a useful measure of overall test reliability. Approaches to estimating the reliability of a composite test are discussed in Feldt & Brennan (1989: 117) and Crocker & Algina (1986: 119-121). The method used here is taken from Feldt & Brennan (1989).

Composite reliability estimates were calculated for the period 1 January to 20 December 2004. To generate an appropriately cautious estimate, minimum alpha values were used for the objectively marked papers, and g-coefficients for the single-rater condition were used for the subjectively marked scores. The composite reliability estimate for the Academic module was 0.95, giving a composite SEM of 0.21. This means there is a 95% probability that a candidate’s true score falls within less than half a band (0.41) of the observed score. For General Training the composite reliability was 0.95 with an SEM of 0.23. If average, rather than minimum, values are used for the objective paper alphas, the reliability for both Academic and GT versions improves slightly to 0.96.

1 Feldt, L. S. & Brennan, R. L. (1989) Reliability. In Linn (Ed.), Educational Measurement, 3rd edition. American Council on Education/Macmillan.
2 Crocker, L. & Algina, J. (1986) Introduction to Classical and Modern Test Theory. Orlando, FL: Harcourt Brace Jovanovich.

Test taker performance 2007

IELTS is assessed on a nine-band scale and reports scores both overall and by individual skill. Overall Band Scores for Academic and General Training candidates in 2007 are reported here together with scores for the individual skills.
General Training candidates achieved higher scores in Listening and Speaking relative to their performance in Reading and Writing. On average, mean scores for Academic candidates showed less variation across the skills, but the Writing module was the most challenging.

Just over three-quarters of candidates (75.8%) took the Academic Reading and Writing modules, with just under a quarter (24.2%) taking the General Training Reading and Writing modules. Both Academic and General Training candidates take the same Listening and Speaking modules. Overall, the IELTS candidature during the year was 46.1% female and 53.9% male. Of candidates taking the Academic modules, 49.1% were female and 50.9% male; of those taking the General Training modules, 37% were female and 63% male.

 



 





 

 

Percentile ranks 2007

Frequency distributions by percentage

The following tables show the distribution of scores achieved by various groups of candidates, which may be of interest as an indication of how an individual candidate has performed relative to other members of a grouping to which he or she belongs, though the categories reported here are necessarily very broad.

Frequency distribution by reason for taking IELTS



 

Academic candidates: Top 20 First Languages
Band score by %



GT candidates: Top 20 First Languages
Band score by %

 

Academic candidates: Top 20 - Country of origin
Band score by %



GT candidates: Top 20 - Country of origin
Band score by %