April 4, 2023

What to Do Now ChatGPT Can Ace Exams

Assessing students has always been a key aspect of education, but the recent release of Large Language Models has caused concern amongst educators that existing approaches will no longer work. One such model, ChatGPT, is able to pass standardised tests and perform well on continuous assessment tasks like essays. This raises the question of how we should adapt our approach to assessment.

The first thing to recognize is that ChatGPT is not a replacement for traditional assessments but rather a tool that can raise the baseline standard of attainment. Like a calculator or a Google search, it can help students access information more efficiently and accurately, but it does not replace the need for critical thinking, analysis, and creativity. As with the arrival of prior new tools, we need to change how we assess. This is a challenge the education sector has faced many times: skills we used to credit, like memorising lists of capital cities or performing certain types of numerical sums, are no longer assessed, or even noteworthy. The key difference this time is the pace of change.

We need to be explicit about why we are examining

There are several good reasons for making students sit a test, but key amongst them are: 

  1. To check fluency and assess a student's ability to access information in their working memory 
  2. To credit performance and rank students based on the outcome for some other purpose, like securing jobs 

Assessing Fluency 

Testing can be a helpful tool for directing learning. By checking what a student knows, and whether they can apply that knowledge, we can focus additional attention on areas of weakness. 

If students fake the answers on tests like these, then their use as a knowledge check is destroyed. In contrast, if it is clear that these tests are solely to direct learning, with no value placed on the results, then students' incentives should be aligned with their teachers'. Both can view these check-ins as stepping stones towards a final goal, be that a higher final grade or a better standard of learning - or hopefully both. 

A great example is discussed by Ben Golub in this tweet thread.

It is crucial that students do not feel judged for submitting wrong answers. We need to be explicit that we are testing for the purpose of directing learning, in order to remove the incentive for students to use tools like ChatGPT to obscure their true level of knowledge. Where teachers are not able to build that expectation, tests and environments can be designed to minimise the risk. This could be done by testing skills which are harder for current language models, like applying knowledge or formulating problems, or by testing in closed environments where students are not permitted to use phones or the internet. Other changes, like making these tests no-notice, would require students to maintain the necessary knowledge in working memory, rather than cramming for tests or searching for information online when needed. 

Ranking and crediting performance 

Sometimes, in contrast, we test in order to rank students’ abilities and to sort them: into streams, between universities, or within the job market. Sometimes we award prizes. It is this type of assessment that, on the surface, feels most threatened by Generative AI. However, we will see that it need not be. 

Again we must be explicit about why we are testing. In this case, we assess how good a student is at a task, say writing a critical essay, in order to establish how good they will be at similar challenges in the future. It follows that we should make the test as similar to those future tasks as possible; anything else will produce a false measure and be less informative. 

Some people may view exams more as measures of general aptitude, or intelligence. If this is really what we want to take from exam results, then surely it would be better to test for this directly, rather than via a proxy? If ChatGPT is now able to outperform students on essay-based assessments, they probably weren’t much use as measures of general intelligence or aptitude in the first place. 

Assessing students whilst denying them access to the tools that are out there feels like a distortion, unrepresentative of the results they can produce in the world. It would be like assessing engineers but forcing them to do calculations with pen and paper: not predictive of future results, and therefore not useful. Instead, we should simulate real-world situations and assess students' ability to perform, whilst using all available tools, on skills such as forming and defending opinions, analysing novel sources, being concise, and identifying the key points in an argument. We could even go further and assess their capacity to be interesting, or to show good taste and judgement. All of these would be useful predictors of success in employment or at advanced levels of education, where crucially they will have access to current tools like ChatGPT and those which emerge in the coming years. 

Assessing these skills would require assessors to go beyond a mark scheme and use their judgement to evaluate students' work. In the past, producing a 2000-word essay with a clear structure and a coherent argument was considered creditworthy. However, with the availability of ChatGPT and other tools, this is now something that - given the right training - anyone should be able to do. As such, we need to raise our baseline of performance and credit only work above this level. 

Being much more demanding of students and examiners may feel daunting. However, viewed differently, we are on the cusp of the greatest jump in the quality of written work the education system has ever seen. Adapting will take time, but will eventually allow students - and the rest of us - to focus on higher-order skills, like synthesis and adapting an argument for a specific audience. 

Conclusion 

Adapting to change caused by technology is a crucial skill for the next generation. As educators and assessors, we need to model this adaptability for current students. We cannot resist the change brought about by technology but rather need to find ways to incorporate it into our approach to education and assessment, and quickly. Failure to do so would be a disservice to our students, as they need to develop the skills necessary to navigate a rapidly changing world.

The impact of ChatGPT on assessment should not be viewed as a threat but rather as an opportunity to improve our approach to education. We should adjust our expectations of what constitutes creditworthy work and test skills that require critical thinking, creativity, and problem-solving. By doing so, we can ensure that our evaluations are not only fair and accurate but also prepare students for the challenges of the future.
