The Programme for International Student Assessment (PISA) is a worldwide study by the Organisation for Economic Co-operation and Development (OECD) in member and non-member nations of 15-year-old school pupils' scholastic performance on mathematics, science, and reading. It was first performed in 2000 and then repeated every three years. Its aim is to provide comparable data with a view to enabling countries to improve their education policies and outcomes. It measures problem solving and cognition in daily life.
The results of the 2015 assessment were published on 6 December 2016.
PISA and similar international standardised assessments of educational attainment are increasingly used in the process of education policymaking at both national and international levels.
PISA was conceived to set in a wider context the information provided by national monitoring of education system performance through regular assessments within a common, internationally agreed framework; by investigating relationships between student learning and other factors, such assessments can “offer insights into sources of variation in performances within and between countries”.
Until the 1990s, few European countries used national tests. In the 1990s, ten countries / regions introduced standardised assessment, and since the early 2000s, ten more followed suit. By 2009, only five European education systems had no national student assessments.
The impact of these international standardised assessments in the field of educational policy has been significant, in terms of the creation of new knowledge, changes in assessment policy, and external influence over national educational policy more broadly.
Data from international standardised assessments can be useful in research on causal factors within or across education systems. Mons notes that the databases generated by large-scale international assessments have made possible the carrying out, on an unprecedented scale, of inventories and comparisons of education systems in more than 40 countries and on themes ranging from the conditions for learning in mathematics and reading, to institutional autonomy and admissions policies. They allow typologies to be developed that can be used for comparative statistical analyses of education performance indicators, thereby identifying the consequences of different policy choices. They have generated new knowledge about education: PISA findings have challenged deeply embedded educational practices, such as the early tracking of students into vocational or academic pathways.
Barroso and de Carvalho find that PISA provides a common reference connecting academic research in education and the political realm of public policy, operating as a mediator between different strands of knowledge from the realm of education and public policy. However, although the key findings from comparative assessments are widely shared in the research community, the knowledge they create does not necessarily fit with government reform agendas; this leads to some inappropriate uses of assessment data.
Emerging research suggests that international standardised assessments are impacting upon national assessment policy and practice. PISA is being integrated in national policies and practices on assessment, evaluation, curriculum standards and performance targets; its assessment frameworks and instruments are being used as best-practice models for improving national assessments; many countries have explicitly incorporated and emphasise PISA-like competencies in revised national standards and curricula; others use PISA data to complement national data and validate national results against an international benchmark.
More important than its influence on countries' student assessment policy is the range of ways in which PISA is influencing countries' education policy choices.
Policy-makers in most participating countries see PISA as an important indicator of system performance; PISA reports can define policy problems and set the agenda for national policy debate; policymakers seem to accept PISA as a valid and reliable instrument for internationally benchmarking system performance and changes over time; most countries - irrespective of whether they performed above, at, or below the average PISA score - have begun policy reforms in response to PISA reports.
Against this, it should be noted that impact on national education systems varies markedly. For example, in Germany, the results of the first PISA assessment caused the so-called ‘PISA shock’: a questioning of previously accepted educational policies; in a state marked by jealously guarded regional policy differences, it led ultimately to an agreement by all Länder to introduce common national standards and even an institutionalised structure to ensure that they were observed. In Hungary, by comparison, which shared similar conditions to Germany, PISA results have not led to significant changes in educational policy.
Because many countries have set national performance targets based on their relative rank or absolute PISA score, PISA assessments have increased the influence of their (non-elected) commissioning body, the OECD, as an international education monitor and policy actor, which implies an important degree of ‘policy transfer’ from the international to the national level; PISA in particular is having “an influential normative effect on the direction of national education policies”. Thus, it is argued that the use of international standardised assessments has led to a shift towards international, external accountability for national system performance; Rey contends that PISA surveys, portrayed as objective, third-party diagnoses of education systems, actually serve to promote specific orientations on educational issues.
National policy actors refer to high-performing PISA countries to “help legitimise and justify their intended reform agenda within contested national policy debates”. PISA data can be “used to fuel long-standing debates around pre-existing conflicts or rivalries between different policy options, such as in the French Community of Belgium”. In such instances, PISA assessment data are used selectively: in public discourse, governments often use only superficial features of PISA surveys, such as country rankings, and not the more detailed analyses. Rey (2010:145, citing Greger, 2008) notes that often the real results of PISA assessments are ignored as policymakers selectively refer to data in order to legitimise policies introduced for other reasons.
In addition, PISA's international comparisons can be used to justify reforms with which the data themselves have no connection; in Portugal, for example, PISA data were used to justify new arrangements for teacher assessment (based on inferences that were not justified by the assessments and data themselves); they also fed the government’s discourse about the issue of pupils repeating a year (which, according to research, fails to improve student results). In Finland, the country's PISA results (which other countries deem excellent) were used by Ministers to promote new policies for ‘gifted’ students. Such uses and interpretations often assume causal relationships that cannot legitimately be based upon PISA data; establishing them would normally require fuller investigation through qualitative in-depth studies and longitudinal surveys based on mixed quantitative and qualitative methods, which politicians are often reluctant to fund.
Recent decades have witnessed an expansion in the uses to which PISA and similar assessments are put, from assessing students’ learning, to connecting “the educational realm (their traditional remit) with the political realm”. This raises the question whether PISA data are sufficiently robust to bear the weight of the major policy decisions that are being based upon them, for, according to Breakspear, PISA data have “come to increasingly shape, define and evaluate the key goals of the national / federal education system”. This implies that those who set the PISA tests – e.g. in choosing the content to be assessed and not assessed – are in a position of considerable power to set the terms of the education debate, and to orient educational reform in many countries around the globe.
PISA stands in a tradition of international school studies, undertaken since the late 1950s by the International Association for the Evaluation of Educational Achievement (IEA). Much of PISA's methodology follows the example of the Trends in International Mathematics and Science Study (TIMSS, started in 1995), which in turn was much influenced by the U.S. National Assessment of Educational Progress (NAEP). The reading component of PISA is inspired by the IEA's Progress in International Reading Literacy Study (PIRLS).
The PISA mathematics literacy test asks students to apply their mathematical knowledge to solve problems set in real-world contexts. To solve the problems students must activate a number of mathematical competencies as well as a broad range of mathematical content knowledge. TIMSS, on the other hand, measures more traditional classroom content such as an understanding of fractions and decimals and the relationship between them (curriculum attainment). PISA claims to measure education's application to real-life problems and lifelong learning (workforce knowledge).
In the reading test, "OECD/PISA does not measure the extent to which 15-year-old students are fluent readers or how competent they are at word recognition tasks or spelling." Instead, they should be able to "construct, extend and reflect on the meaning of what they have read across a wide range of continuous and non-continuous texts."
PISA is sponsored, governed, and coordinated by the OECD, but paid for by participating countries.
The students tested by PISA are aged between 15 years and 3 months and 16 years and 2 months at the beginning of the assessment period. The school year pupils are in is not taken into consideration. Only students at school are tested, not home-schoolers. In PISA 2006, however, several countries also used a grade-based sample of students. This made it possible to study how age and school year interact.
To fulfill OECD requirements, each country must draw a sample of at least 5,000 students. In small countries like Iceland and Luxembourg, where there are fewer than 5,000 students per year, an entire age cohort is tested. Some countries used much larger samples than required to allow comparisons between regions.
Each student takes a two-hour handwritten test. Part of the test is multiple-choice and part involves fuller answers. There are six and a half hours of assessment material, but no student is tested on all of it. Following the cognitive test, participating students spend nearly one more hour answering a questionnaire on their background, including learning habits, motivation, and family. School directors fill in a questionnaire describing school demographics, funding, etc. In 2012 the participants were, for the first time in the history of large-scale testing and assessments, offered a new type of problem: interactive (complex) problems requiring exploration of a novel virtual device.
In selected countries, PISA started experimentation with computer adaptive testing.
Countries are allowed to combine PISA with complementary national tests.
Germany does this in a very extensive way: On the day following the international test, students take a national test called PISA-E (E=Ergänzung=complement). Test items of PISA-E are closer to TIMSS than to PISA. While only about 5,000 German students participate in the international and the national test, another 45,000 take only the latter. This large sample is needed to allow an analysis by federal states. Following a clash about the interpretation of 2006 results, the OECD warned Germany that it might withdraw the right to use the "PISA" label for national tests.
From the beginning, PISA has been designed with one particular method of data analysis in mind. Since students work on different test booklets, raw scores must be 'scaled' to allow meaningful comparisons. Scores are thus scaled so that the OECD average in each domain (mathematics, reading and science) is 500 and the standard deviation is 100. This holds only for the PISA cycle in which a scale was first introduced, though; subsequent cycles are linked to the previous cycles through IRT scale linking methods.
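The rescaling step can be illustrated with a minimal sketch. It assumes unweighted proficiency estimates on an arbitrary (logit-like) scale and applies a plain linear transformation; the function name `scale_to_reporting_metric` and the sample values are hypothetical, and the operational procedure (student weights, pooled OECD sample, linking across cycles) is more involved than this.

```python
import statistics


def scale_to_reporting_metric(estimates, target_mean=500.0, target_sd=100.0):
    """Linearly rescale estimates so the reference group has the target mean and SD.

    Assumption: `estimates` is the whole reference group, unweighted.
    """
    mean = statistics.fmean(estimates)
    sd = statistics.pstdev(estimates)  # population SD of the reference group
    if sd == 0:
        raise ValueError("cannot rescale estimates that do not vary")
    return [target_mean + target_sd * (x - mean) / sd for x in estimates]


# Hypothetical proficiency estimates on a logit-like scale.
logits = [-1.2, -0.4, 0.0, 0.3, 1.3]
scaled = scale_to_reporting_metric(logits)
print([round(s) for s in scaled])
```

Because the transformation is linear, it changes only the unit and origin of the scale; the ordering of students and the relative distances between them are unchanged.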
This generation of proficiency estimates is done using a latent regression extension of the Rasch model, a model of item response theory (IRT), also known as the conditioning model or population model. The proficiency estimates are provided in the form of so-called plausible values, which allow unbiased estimates of differences between groups. The latent regression, together with the use of a Gaussian prior probability distribution of student competencies, allows estimation of the proficiency distributions of groups of participating students. The scaling and conditioning procedures are described in nearly identical terms in the Technical Reports of PISA 2000, 2003 and 2006. NAEP and TIMSS use similar scaling methods.
All PISA results are tabulated by country; recent PISA cycles have separate provincial or regional results for some countries. Most public attention concentrates on just one outcome: the mean scores of countries and their rankings against one another. In the official reports, however, country-by-country rankings are given not as simple league tables but as cross tables indicating for each pair of countries whether or not mean score differences are statistically significant (unlikely to be due to random fluctuations in student sampling or in item functioning). In favorable cases, a difference of 9 points is sufficient to be considered significant.
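To make the idea of plausible values concrete, the following sketch combines the Rasch response probability with a Gaussian prior on a grid of proficiency values and draws random values from the resulting posterior. It is a simplified illustration, not the operational procedure: the prior mean is fixed here, whereas in a latent regression it would depend on student background variables, and the item difficulties, function names and grid are all assumptions made for the example.

```python
import math
import random


def rasch_probability(theta, difficulty):
    """Probability of a correct response under the Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - difficulty)))


def draw_plausible_values(responses, difficulties, prior_mean=0.0,
                          prior_sd=1.0, n_draws=5, rng=None):
    """Draw proficiency values from a grid approximation of the posterior.

    responses: list of 0/1 item scores; difficulties: matching item difficulties.
    Assumption: a fixed Gaussian prior replaces the latent regression.
    """
    rng = rng or random.Random(0)
    grid = [-6.0 + 0.05 * i for i in range(241)]  # proficiency values from -6 to 6
    weights = []
    for theta in grid:
        # Log of the Gaussian prior density, up to an additive constant.
        log_post = -0.5 * ((theta - prior_mean) / prior_sd) ** 2
        # Add the Rasch log-likelihood of the observed responses.
        for score, difficulty in zip(responses, difficulties):
            p = rasch_probability(theta, difficulty)
            log_post += math.log(p if score == 1 else 1.0 - p)
        weights.append(math.exp(log_post))
    # Sample grid points in proportion to their posterior weight.
    return rng.choices(grid, weights=weights, k=n_draws)


# Hypothetical item difficulties and one student's scored responses.
item_difficulties = [-1.0, -0.5, 0.0, 0.5, 1.0]
plausible_values = draw_plausible_values([1, 1, 1, 0, 0], item_difficulties)
print(plausible_values)
```

Each draw is one possible proficiency for the student given the responses and the prior. Analyses are typically run once per set of draws and the results combined, so that the uncertainty about individual proficiencies carries through to group-level estimates.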
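A pairwise comparison of this kind can be sketched as a simple two-sided z-test on the difference between two country means. The means and standard errors below are hypothetical, the function name is an assumption, and the test treats the two samples as independent; official comparisons may additionally account for linking error and multiple comparisons, which this sketch ignores.

```python
import math


def mean_difference_is_significant(mean_a, se_a, mean_b, se_b, critical_z=1.96):
    """Two-sided z-test for the difference between two independent country means."""
    standard_error = math.sqrt(se_a ** 2 + se_b ** 2)
    z = (mean_a - mean_b) / standard_error
    return abs(z) > critical_z


# Hypothetical example: a 9-point gap with small versus large standard errors.
print(mean_difference_is_significant(509.0, 3.0, 500.0, 3.0))
print(mean_difference_is_significant(509.0, 5.0, 500.0, 5.0))
```

The same 9-point gap can be significant or not depending on the standard errors, which is why the smallest significant difference is reached only when both means are estimated precisely.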
PISA never combines mathematics, science and reading domain scores into an overall score. However, commentators have sometimes combined test results from all three domains into an overall country ranking. Such meta-analysis is not endorsed by the OECD, although official summaries sometimes use scores from a testing cycle's principal domain as a proxy for overall student ability.
PISA 2015 was presented on 6 December 2016, with results for around 540,000 participating students in 72 countries; Singapore emerged as the top performer in all categories.
|Period||Focus||OECD countries||Partner countries||Participating students||Notes|
|2000||Reading||28||4 + 11||265,000||The Netherlands disqualified from data analysis. 11 additional non-OECD countries took the test in 2002.|
|2003||Mathematics||30||11||275,000||UK disqualified from data analysis. Also included test in problem solving.|
|2006||Science||30||27||400,000||Reading scores for US disqualified from analysis due to misprint in testing materials.|
|2009||Reading||34||41 + 10||470,000||10 additional non-OECD countries took the test in 2010.|
China's participation in the 2012 test was limited to Shanghai, Hong Kong, and Macao as separate entities. In 2012, Shanghai participated for the second time, again topping the rankings in all three subjects, as well as improving scores in the subjects compared to the 2009 tests. Shanghai's score of 613 in mathematics was 113 points above the average score, putting the performance of Shanghai pupils about 3 school years ahead of pupils in average countries. Educational experts debated to what degree this result reflected the quality of the general educational system in China, pointing out that Shanghai has greater wealth and better-paid teachers than the rest of China. Hong Kong placed second in reading and science and third in maths.
China is expected to participate in 2018 as an entire unit. In 2015, four provinces (Jiangsu, Guangdong, Beijing, and Shanghai), with a total population of over 230 million, participated as a single entity. The 2015 Beijing-Shanghai-Jiangsu-Guangdong cohort scored a median of 518 in science, while the 2012 Shanghai cohort scored a median of 580.
Critics of PISA counter that in Shanghai and other Chinese cities, most children of migrant workers can only attend city schools up to the ninth grade, and must return to their parents' hometowns for high school due to hukou restrictions, thus skewing the composition of the city's high school students in favor of wealthier local families. A population chart of Shanghai reproduced in The New York Times shows a steep drop off in the number of 15-year-olds residing there. According to Schleicher, 27% of Shanghai's 15-year-olds are excluded from its school system (and hence from testing). As a result, the percentage of Shanghai's 15-year-olds tested by PISA was 73%, lower than the 89% tested in the US. Following the 2015 testing, the OECD published in-depth studies on the education systems of a select few countries, including China.
Finland, which received several top positions in the first tests, fell in all three subjects in 2012, but remained the best performing country overall in Europe, achieving its best result in science with 545 points (5th) and its worst in mathematics with 519 (12th), in which the country was outperformed by four other European countries. The drop in mathematics was 25 points since 2003, the last time mathematics was the focus of the tests. For the first time Finnish girls outperformed boys in the subject, but only narrowly. It was also the first time pupils in Finnish-speaking schools did not perform better than pupils in Swedish-speaking schools. Minister of Education and Science Krista Kiuru expressed concern for the overall drop, as well as the fact that the number of low-performers had increased from 7% to 12%.
India participated in the 2009 round of testing but pulled out of the 2012 round in August 2012, with the Indian government attributing its action to the unfairness of PISA testing to Indian students. The Indian Express reported on 3 September 2012 that "The ministry (of education) has concluded that there was a socio-cultural disconnect between the questions and Indian students. The ministry will write to the OECD and drive home the need to factor in India's 'socio-cultural milieu'. India's participation in the next PISA cycle will hinge on this". The Indian Express also noted that "Considering that over 70 nations participate in PISA, it is uncertain whether an exception would be made for India".
In June 2013, the Indian government, still concerned about the fairness of PISA testing for Indian students, withdrew India again, this time from the 2015 round of PISA testing.
Sweden's result dropped in all three subjects in the 2012 test, which was a continuation of a trend from 2006 and 2009. In mathematics, the nation had the sharpest fall in performance over 10 years among the countries that have participated in all tests, with a drop in score from 509 in 2003 to 478 in 2012. The score in reading showed a drop from 516 in 2000 to 483 in 2012. The country performed below the OECD average in all three subjects. The leader of the opposition, Social Democrat Stefan Löfven, described the situation as a national crisis. Along with the party's spokesperson on education, Ibrahim Baylan, he pointed to the downward trend in reading as most severe.
In the 2012 test, as in 2009, the result was slightly above average for the United Kingdom, with the science ranking being highest (20). England, Wales, Scotland and Northern Ireland also participated as separate entities, with Wales showing the worst result: in mathematics it ranked 43rd of the 65 countries and economies. Minister of Education in Wales Huw Lewis expressed disappointment in the results and said that there were no "quick fixes", but hoped that several educational reforms implemented in the last few years would give better results in the next round of tests. The United Kingdom had a greater gap between high- and low-scoring students than the average. There was little difference between public and private schools when adjusted for socio-economic background of students. The gender difference in favour of girls was less than in most other countries, as was the difference between natives and immigrants.
Writing in the Daily Telegraph, Ambrose Evans-Pritchard warned against putting too much emphasis on the UK's international ranking, arguing that an overfocus on scholarly performance in East Asia might have contributed to the area's low birthrate, which he argued could harm future economic performance more than a good PISA score could compensate for.
In 2013, the Times Educational Supplement (TES) published an article, "Is PISA Fundamentally Flawed?" by William Stewart, detailing serious critiques of PISA's conceptual foundations and methods advanced by statisticians at major universities.
In the article, Professor Harvey Goldstein of the University of Bristol was quoted as saying that when the OECD tries to rule out questions suspected of bias, it can have the effect of "smoothing out" key differences between countries. "That is leaving out many of the important things," he warned. "They simply don't get commented on. What you are looking at is something that happens to be common. But (is it) worth looking at? PISA results are taken at face value as providing some sort of common standard across countries. But as soon as you begin to unpick it, I think that all falls apart."
Queen's University Belfast mathematician Dr. Hugh Morrison stated that he found the statistical model underlying PISA to contain a fundamental, insoluble mathematical error that renders PISA rankings "valueless". Goldstein remarked that Dr. Morrison's objection highlights “an important technical issue” if not a “profound conceptual error”. However, Goldstein cautioned that PISA has been "used inappropriately", contending that some of the blame for this "lies with PISA itself. I think it tends to say too much for what it can do and it tends not to publicise the negative or the weaker aspects." Professors Morrison and Goldstein expressed dismay at the OECD's response to criticism. Morrison said that when he first published his criticisms of PISA in 2004 and also personally queried several of the OECD's "senior people" about them, his points were met with “absolute silence” and have yet to be addressed. “I was amazed at how unforthcoming they were,” he told TES. “That makes me suspicious.” “Pisa steadfastly ignored many of these issues,” he said. “I am still concerned.”
Professor Kreiner agreed: “One of the problems that everybody has with PISA is that they don’t want to discuss things with people criticising or asking questions concerning the results. They didn’t want to talk to me at all. I am sure it is because they can’t defend themselves.”
The American result of 2012 was average in science and reading, but lagged behind in mathematics compared to other developed nations. There was little change from the previous test in 2009. The result was described as “a picture of educational stagnation” by Education Secretary Arne Duncan, who said the result was not compatible with the American goal of having the world's best educated workers. Randi Weingarten of the American Federation of Teachers stated that an overemphasis on standardised tests contributed to the lack of improvement in education performance. Dennis Van Roekel of the National Education Association said a failure to address poverty among students had hampered progress.
About 9% of the U.S. students scored in the top two mathematics levels compared to 13% in all countries and economies.
For the first time, three U.S. states participated in the tests as separate entities, with Massachusetts scoring well above both the American and international averages, particularly in reading. An approximate corresponding OECD ranking is shown along with the United States average.
In 2015, the results from Malaysia were found by the OECD to have not met the minimum response rate. Opposition politician Ong Kian Ming said the education ministry tried to oversample high-performing students in rich schools.
Although PISA and TIMSS officials and researchers themselves generally refrain from hypothesizing about the large and stable differences in student achievement between countries, since 2000, literature on the differences in PISA and TIMSS results and their possible causes has emerged. Data from PISA have furnished several economists, notably Eric Hanushek, Ludger Woessmann, Heiner Rindermann, and Stephen J. Ceci, with material for books and articles about the relationship between student achievement and economic development, democratization, and health; as well as the roles of such single educational factors as high-stakes exams, the presence or absence of private schools, and the effects and timing of ability tracking.
PISA 2006 reading literacy results are not reported for the United States because of an error in printing the test booklets. Furthermore, as a result of the printing error, the mean performance in mathematics and science may be misestimated by approximately 1 score point. The impact is below one standard error.