THANH HOA DEPARTMENT OF EDUCATION AND TRAININGNGUYEN TRAI HIGH SCHOOL INITIATIVE IN TEACHING INNOVATION SOME EXPERIENCE IN UTILIZING AI TO ENHANCE THE QUALITY OF TESTS AND ASSESSMENTS AT
Trang 1THANH HOA DEPARTMENT OF EDUCATION AND TRAINING
NGUYEN TRAI HIGH SCHOOL
INITIATIVE IN TEACHING INNOVATION
SOME EXPERIENCE IN UTILIZING AI
TO ENHANCE THE QUALITY OF TESTS AND ASSESSMENTS
AT NGUYEN TRAI HIGH SCHOOL
Implemented by: Cao Xuân Hoàng Position: Head of the English Group Institution: Nguyen Trai High School – Thanh Hoa City
Field: English Language Teaching
THANH HÓA, NĂM 2025
Trang 21 INTRODUCTION 1
1.1 Rationale: 1
The importance of fair, accurate, and innovative assessments in English language teaching 1
1.2 Research objectives: 2
1.3 Research Subjects 3
1.4 Research methods -Scope of the study 3
1.5 Innovations of the Initiative 3
2 CONTENT 4
2.1 Theoretical background 4
2.2 Current situation 4
2.3 Solutions and implementation 6
2.3.1 Basic prompt engineering 6
2.3.1.1 Assign a role 6
2.3.1.2 Few-shot prompting: Leveraging exemplars for contextual alignment 6 2.3.1.3 Applying iterative prompting and task decomposition in AI-assisted teaching and content design 7
2.3.1.4 Enhancing clarity and usability through output formatting prompts 7
2.3.1.5 Enhancing English test design through step-by-step prompting 8
2.3.2 Practical application 9
2.4 Results and discussion 17
2.4.1 Observed outcomes 17
2.4.6 Challenges and limitations 18
3 CONCLUSION AND RECOMMENDATIONS 20
3.1 Conclusion 20
3.2 Recommendations 20
DECLARATION OF ORIGINALITY AND RESPONSIBILITY 22
REFERENCES 23
LIST OF RECOGNIZED EXPERIENCE INITIATIVES 24
APPENDIX 1 25
Test Matrix for English Graduation Exam 25
APPENDIX 2 1
OUTCOME TEST 1
APPENDIX 3 9
Technical specification for high school graduation english exam 9
Trang 31 INTRODUCTION 1.1 Rationale:
The importance of fair, accurate, and innovative assessments in English
Traditional assessment methods, however, often face limitations such assubjectivity in grading, time-consuming test construction, and a lack of adaptivedifficulty to cater to diverse learner needs Moreover, conventional paper-basedtests may not fully capture students’ communicative competence, particularly inareas such as critical thinking, creativity, and interactive skills (Hamp-Lyons,2017) This gap highlights the necessity for innovation in assessment design,wheretechnology, particularly Artificial Intelligence (AI), can play a transformative role
Innovative assessments powered by artificial intelligence (AI) presentsignificant advantages in modern educational contexts, particularly throughenhanced fairness, greater accuracy, and adaptive capabilities By leveraging AI togenerate randomized, unbiased questions and automate scoring processes, thesesystems minimize human subjectivity, fostering equitable evaluation standards.Additionally, machine learning algorithms enhance accuracy by analyzingextensive datasets to refine question difficulty levels and ensure alignment withpredefined learning outcomes, thereby improving the validity of assessments.Furthermore, AI-driven adaptive testing dynamically adjusts content based on real-time student responses, enabling personalized evaluations that more preciselymeasure individual competencies (Zhai et al., 2021) Collectively, these featuresposition AI-enhanced assessments as transformative tools for advancingeducational measurement practices
At Nguyen Trai High School, where large class sizes and varying studentproficiency levels present challenges, integrating AI into test design and evaluation
Trang 4could significantly improve assessment quality while optimizing teachers’workload Thus, exploring AI applications in ELT assessments is not merely atechnological advancement but a pedagogical necessity to foster more valid,reliable, and equitable evaluation practices.
1.2 Research objectives:
This study aims to explore the potential of advanced AI tools, specificallyDeepSeek, ChatGPT, and Gemini 2.5 Flash,to enhance the quality, validity, andreliability of English language assessments at Nguyen Trai High School Theresearch objectives are threefold:
First, to investigate how DeepSeek, a domain-specific AI model optimizedfor educational data analysis, can streamline test creation by automating questiongeneration, and detecting biases or inconsistencies in existing assessments Byleveraging its deep learning algorithms, the study will evaluate its capacity toproduce contextually relevant reading passages, grammar exercises, andvocabulary tasks tailored to the school’s curriculum
Second, to assess the utility of ChatGPT in diversifying test formats whilemaintaining pedagogical rigor This objective focuses on the tool’s ability togenerate open-ended questions, situational dialogues, and critical thinking promptsthat reflect real-world language use Additionally, the study will examineChatGPT’s role in refining teacher-generated questions by rephrasing ambiguousitems, and adjusting difficulty levels
Third, to analyze the efficiency of Gemini 2.5 Flash The research will testits capacity to rapidly generate multiple unique test variants, and adapt questiondifficulty dynamically This objective also evaluates Gemini’s potential to reducecheating risks through algorithmic randomization and real-time plagiarismdetection
Fourth, to develop and share practical strategies for integrating AI into testcreation and evaluation, with a focus on prompt engineering techniques that enableteachers to generate exam-aligned content efficiently This includes designingstandardized command templates for AI tools (DeepSeek, ChatGPT, Gemini 2.5Flash) to produce specific question types required in Nguyen Trai High School’scentralized exams
Trang 51.3 Research Subjects
This study focuses on Grade 10 to Grade 12 students at Nguyen Trai HighSchool and the English teachers responsible for designing and administeringassessments It also includes trainee teachers from Hồng Đức Universityparticipating in AI-assisted test creation under supervision The research examineshow these stakeholders engage with AI tools to enhance the quality, fairness, andefficiency of English language testing practices
1.4 Research methods -Scope of the study
This research employs a combination of qualitative and applied researchmethods Data were collected through classroom observations, teacher interviews,AI-generated test analyses, and pilot implementations across multiple classes.Feedback from teachers and students was also gathered to assess practicaleffectiveness and pedagogical relevance
This research focuses exclusively on English language assessments atNguyen Trai High School, targeting the design and evaluation of tests for Periodiccentralized exams (Midterm I, Final I, Midterm II, Final II), 12th-grade benchmarktests, English Olympiads for gifted students
The study examines the application of AI tools (DeepSeek, ChatGPT,Gemini 2.5 Flash) to generate, refine, and validate assessment materials for Grades10–12, adhering to the school’s curriculum framework and Vietnam’s nationalEnglish education standards It does not extend to other subjects or non-academicevaluations
1.5 Innovations of the Initiative
This initiative introduces a systematic approach to AI-assisted test design byapplying advanced prompt engineering techniques across multiple AI platforms(DeepSeek, ChatGPT, Gemini 2.5 Flash) Unlike previous practices that relied onstatic test banks or ad hoc content generation, the initiative develops standardized
AI training protocols aligned with national exam formats and Bloom’s Taxonomy
It is also among the first at the high school level in Vietnam to document areplicable process for generating full-length, curriculum-aligned English testsusing free AI tools, while integrating real-world classroom feedback and teachertraining This dual emphasis on innovation and scalability sets the initiative apartfrom traditional test design models
Trang 6could significantly improve assessment quality while optimizing teachers’workload Thus, exploring AI applications in ELT assessments is not merely atechnological advancement but a pedagogical necessity to foster more valid,reliable, and equitable evaluation practices.
1.2 Research objectives:
This study aims to explore the potential of advanced AI tools, specificallyDeepSeek, ChatGPT, and Gemini 2.5 Flash,to enhance the quality, validity, andreliability of English language assessments at Nguyen Trai High School Theresearch objectives are threefold:
First, to investigate how DeepSeek, a domain-specific AI model optimizedfor educational data analysis, can streamline test creation by automating questiongeneration, and detecting biases or inconsistencies in existing assessments Byleveraging its deep learning algorithms, the study will evaluate its capacity toproduce contextually relevant reading passages, grammar exercises, andvocabulary tasks tailored to the school’s curriculum
Second, to assess the utility of ChatGPT in diversifying test formats whilemaintaining pedagogical rigor This objective focuses on the tool’s ability togenerate open-ended questions, situational dialogues, and critical thinking promptsthat reflect real-world language use Additionally, the study will examineChatGPT’s role in refining teacher-generated questions by rephrasing ambiguousitems, and adjusting difficulty levels
Third, to analyze the efficiency of Gemini 2.5 Flash The research will testits capacity to rapidly generate multiple unique test variants, and adapt questiondifficulty dynamically This objective also evaluates Gemini’s potential to reducecheating risks through algorithmic randomization and real-time plagiarismdetection
Fourth, to develop and share practical strategies for integrating AI into testcreation and evaluation, with a focus on prompt engineering techniques that enableteachers to generate exam-aligned content efficiently This includes designingstandardized command templates for AI tools (DeepSeek, ChatGPT, Gemini 2.5Flash) to produce specific question types required in Nguyen Trai High School’scentralized exams
Trang 72 CONTENT 2.1 Theoretical background
The integration of Artificial Intelligence (AI) into educational assessment isgrounded in constructivist and socio-cognitive theories, which emphasize learner-centered approaches and the alignment of evaluation with cognitive development.Bloom’s Taxonomy (1956) provides a hierarchical framework for categorizingeducational objectives—from knowledge recall to higher-order thinking skills such
as application, analysis, and evaluation AI-enhanced assessment tools cansystematically generate and align test items with these cognitive domains,promoting both validity and differentiation Furthermore, the application of promptengineering is rooted in human-computer interaction (HCI) and natural languageprocessing (NLP) theory, enabling educators to effectively guide AI systems toproduce contextually relevant and pedagogically sound outputs National policies,such as Vietnam’s Resolution No 57-NQ/TW (2024) and MOET’s Directive4324/BGDĐT-CNTT (2024), reaffirm the strategic role of AI and digitaltransformation in education, urging schools to harness technology as a means ofinnovation, quality improvement, and sustainable development
2.2 Current situation
At Nguyen Trai High School, with its 32 classes and over 1,400 students, theEnglish Group faces significant challenges in designing and administeringassessments The current testing framework includes Periodic centralizedexaminations (Midterm I, Final I, Midterm II, and Final II); three benchmark testsfor 12th-grade students; three annual English Olympiads for gifted students
Given the scale of assessment demands, the Group of English (comprisingeight teachers) primarily relies on traditional test construction methods, whichpresent several critical limitations as follows:
Heavy dependence on external sources
Test materials are often compiled by copying and pasting sections fromexisting exams sourced from other schools or online teaching repositories Whilethis approach reduces preparation time, it introduces multiple issues:
Copyright infringement risks: Unauthorized use of published materials mayviolate intellectual property laws
Trang 8for a systematic, AI-enhanced approach to test design, one that balances efficiencywith precision while adhering to curricular and ethical standards.
2.3 Solutions and implementation
2.3.1 Basic prompt engineering
Through practical experience in developing exam materials with AIassistance, I have identified several effective techniques that can be universallyapplied across DeepSeek, ChatGPT, and Gemini to produce consistent, high-quality results
2.3.1.1 Assign a role.
Assigning a role to an AI language model such as Deepseek, ChatGPT is aprompt engineering technique that enhances the relevance, tone, and contextualappropriateness of its responses By explicitly specifying a role,such as "a highschool English teacher" or "an academic writing tutor", users can guide the model
to adopt the expected voice, knowledge scope, and communicative register alignedwith that persona This approach mirrors principles in discourse analysis andpragmatics, wherein the assumed identity of a speaker influences language choices,formality, and information structure In educational contexts, assigning roles isparticularly effective for generating content that matches the cognitive andlinguistic level of target learners It also reduces ambiguity, enabling the AI toalign its output with the goals and expectations of the user, thereby improving bothaccuracy and pedagogical utility
You are an expert from the Ministry of Education and Training, specializing
in designing high-stakes national English exams for upper secondary students in Vietnam ); "You are an English exam designer working for the Ministry of Education and Training….”; "You are an experienced English teacher who has been preparing students for the national high school graduation exam for over 20 years….
2.3.1.2 Few-shot prompting: Leveraging exemplars for contextual alignment
Few-shot prompting is a critical technique in refining AI-generatedassessment content, particularly when precision in format, difficulty, andalignment with curricular standards is required This approach involves providing
AI models with concrete examples (e.g., past exam papers, sample questions, orannotated answers) to establish a clear reference framework For instance, when
Trang 9could significantly improve assessment quality while optimizing teachers’workload Thus, exploring AI applications in ELT assessments is not merely atechnological advancement but a pedagogical necessity to foster more valid,reliable, and equitable evaluation practices.
1.2 Research objectives:
This study aims to explore the potential of advanced AI tools, specificallyDeepSeek, ChatGPT, and Gemini 2.5 Flash,to enhance the quality, validity, andreliability of English language assessments at Nguyen Trai High School Theresearch objectives are threefold:
First, to investigate how DeepSeek, a domain-specific AI model optimizedfor educational data analysis, can streamline test creation by automating questiongeneration, and detecting biases or inconsistencies in existing assessments Byleveraging its deep learning algorithms, the study will evaluate its capacity toproduce contextually relevant reading passages, grammar exercises, andvocabulary tasks tailored to the school’s curriculum
Second, to assess the utility of ChatGPT in diversifying test formats whilemaintaining pedagogical rigor This objective focuses on the tool’s ability togenerate open-ended questions, situational dialogues, and critical thinking promptsthat reflect real-world language use Additionally, the study will examineChatGPT’s role in refining teacher-generated questions by rephrasing ambiguousitems, and adjusting difficulty levels
Third, to analyze the efficiency of Gemini 2.5 Flash The research will testits capacity to rapidly generate multiple unique test variants, and adapt questiondifficulty dynamically This objective also evaluates Gemini’s potential to reducecheating risks through algorithmic randomization and real-time plagiarismdetection
Fourth, to develop and share practical strategies for integrating AI into testcreation and evaluation, with a focus on prompt engineering techniques that enableteachers to generate exam-aligned content efficiently This includes designingstandardized command templates for AI tools (DeepSeek, ChatGPT, Gemini 2.5Flash) to produce specific question types required in Nguyen Trai High School’scentralized exams
Trang 10could significantly improve assessment quality while optimizing teachers’workload Thus, exploring AI applications in ELT assessments is not merely atechnological advancement but a pedagogical necessity to foster more valid,reliable, and equitable evaluation practices.
1.2 Research objectives:
This study aims to explore the potential of advanced AI tools, specificallyDeepSeek, ChatGPT, and Gemini 2.5 Flash,to enhance the quality, validity, andreliability of English language assessments at Nguyen Trai High School Theresearch objectives are threefold:
First, to investigate how DeepSeek, a domain-specific AI model optimizedfor educational data analysis, can streamline test creation by automating questiongeneration, and detecting biases or inconsistencies in existing assessments Byleveraging its deep learning algorithms, the study will evaluate its capacity toproduce contextually relevant reading passages, grammar exercises, andvocabulary tasks tailored to the school’s curriculum
Second, to assess the utility of ChatGPT in diversifying test formats whilemaintaining pedagogical rigor This objective focuses on the tool’s ability togenerate open-ended questions, situational dialogues, and critical thinking promptsthat reflect real-world language use Additionally, the study will examineChatGPT’s role in refining teacher-generated questions by rephrasing ambiguousitems, and adjusting difficulty levels
Third, to analyze the efficiency of Gemini 2.5 Flash The research will testits capacity to rapidly generate multiple unique test variants, and adapt questiondifficulty dynamically This objective also evaluates Gemini’s potential to reducecheating risks through algorithmic randomization and real-time plagiarismdetection
Fourth, to develop and share practical strategies for integrating AI into testcreation and evaluation, with a focus on prompt engineering techniques that enableteachers to generate exam-aligned content efficiently This includes designingstandardized command templates for AI tools (DeepSeek, ChatGPT, Gemini 2.5Flash) to produce specific question types required in Nguyen Trai High School’scentralized exams
Trang 11generating a reading comprehension test, teachers can input a scanned image ortext-based example of a previous exam that includes a passage, five multiple-choice questions, and an answer key formatted according to Nguyen Trai HighSchool’s standards The AI then analyzes these exemplars to infer structuralpatterns, lexical complexity, and question types, ensuring generated contentadheres to institutional expectations.
2.3.1.3 Applying iterative prompting and task decomposition in assisted teaching and content design
AI-In the process of effectively utilizing large language models such asChatGPT for teaching and educational content development, the technique ofiterative prompting plays a critical role in optimizing output quality This methodinvolves providing an initial prompt, then continuously refining, clarifying, orexpanding the prompt based on the model’s responses, gradually guiding it towardthe desired result Iterative prompting allows educators to better control thecontent, tone, and accuracy of the AI-generated output, while also leveraging themodel’s capacity for reasoning and creativity
Moreover, an important principle when working with AI is thedecomposition of complex tasks into simpler, more manageable steps Languagemodels tend to produce more accurate and coherent responses when instructionsare clear, specific, and broken down into smaller components, rather than beingpresented with a broad or overly complex task This approach not only minimizeserrors but also enables teachers to monitor and adjust the AI’s performance at eachstage When combined, task decomposition and iterative prompting create aneffective interaction loop that enhances the practical use of AI in education,particularly in lesson planning, exam preparation, and individualized studentsupport
2.3.1.4 Enhancing clarity and usability through output formatting prompts
One of the most effective techniques for improving the clarity, consistency,and usability of AI-generated content is the use of output formatting prompts Thisapproach involves explicitly instructing the AI model to present its response in aparticular structure or layout, such as a list, table, bullet points, or standardizedparagraph format, depending on the intended educational use By specifying thedesired format in the prompt, educators can guide the model to produce responses
Trang 121.3 Research Subjects
This study focuses on Grade 10 to Grade 12 students at Nguyen Trai HighSchool and the English teachers responsible for designing and administeringassessments It also includes trainee teachers from Hồng Đức Universityparticipating in AI-assisted test creation under supervision The research examineshow these stakeholders engage with AI tools to enhance the quality, fairness, andefficiency of English language testing practices
1.4 Research methods -Scope of the study
This research employs a combination of qualitative and applied researchmethods Data were collected through classroom observations, teacher interviews,AI-generated test analyses, and pilot implementations across multiple classes.Feedback from teachers and students was also gathered to assess practicaleffectiveness and pedagogical relevance
This research focuses exclusively on English language assessments atNguyen Trai High School, targeting the design and evaluation of tests for Periodiccentralized exams (Midterm I, Final I, Midterm II, Final II), 12th-grade benchmarktests, English Olympiads for gifted students
The study examines the application of AI tools (DeepSeek, ChatGPT,Gemini 2.5 Flash) to generate, refine, and validate assessment materials for Grades10–12, adhering to the school’s curriculum framework and Vietnam’s nationalEnglish education standards It does not extend to other subjects or non-academicevaluations
1.5 Innovations of the Initiative
This initiative introduces a systematic approach to AI-assisted test design byapplying advanced prompt engineering techniques across multiple AI platforms(DeepSeek, ChatGPT, Gemini 2.5 Flash) Unlike previous practices that relied onstatic test banks or ad hoc content generation, the initiative develops standardized
AI training protocols aligned with national exam formats and Bloom’s Taxonomy
It is also among the first at the high school level in Vietnam to document areplicable process for generating full-length, curriculum-aligned English testsusing free AI tools, while integrating real-world classroom feedback and teachertraining This dual emphasis on innovation and scalability sets the initiative apartfrom traditional test design models
Trang 132 CONTENT 2.1 Theoretical background
The integration of Artificial Intelligence (AI) into educational assessment isgrounded in constructivist and socio-cognitive theories, which emphasize learner-centered approaches and the alignment of evaluation with cognitive development.Bloom’s Taxonomy (1956) provides a hierarchical framework for categorizingeducational objectives—from knowledge recall to higher-order thinking skills such
as application, analysis, and evaluation AI-enhanced assessment tools cansystematically generate and align test items with these cognitive domains,promoting both validity and differentiation Furthermore, the application of promptengineering is rooted in human-computer interaction (HCI) and natural languageprocessing (NLP) theory, enabling educators to effectively guide AI systems toproduce contextually relevant and pedagogically sound outputs National policies,such as Vietnam’s Resolution No 57-NQ/TW (2024) and MOET’s Directive4324/BGDĐT-CNTT (2024), reaffirm the strategic role of AI and digitaltransformation in education, urging schools to harness technology as a means ofinnovation, quality improvement, and sustainable development
2.2 Current situation
At Nguyen Trai High School, with its 32 classes and over 1,400 students, theEnglish Group faces significant challenges in designing and administeringassessments The current testing framework includes Periodic centralizedexaminations (Midterm I, Final I, Midterm II, and Final II); three benchmark testsfor 12th-grade students; three annual English Olympiads for gifted students
Given the scale of assessment demands, the Group of English (comprisingeight teachers) primarily relies on traditional test construction methods, whichpresent several critical limitations as follows:
Heavy dependence on external sources
Test materials are often compiled by copying and pasting sections fromexisting exams sourced from other schools or online teaching repositories Whilethis approach reduces preparation time, it introduces multiple issues:
Copyright infringement risks: Unauthorized use of published materials mayviolate intellectual property laws
Trang 14generating a reading comprehension test, teachers can input a scanned image ortext-based example of a previous exam that includes a passage, five multiple-choice questions, and an answer key formatted according to Nguyen Trai HighSchool’s standards The AI then analyzes these exemplars to infer structuralpatterns, lexical complexity, and question types, ensuring generated contentadheres to institutional expectations.
2.3.1.3 Applying iterative prompting and task decomposition in assisted teaching and content design
AI-In the process of effectively utilizing large language models such asChatGPT for teaching and educational content development, the technique ofiterative prompting plays a critical role in optimizing output quality This methodinvolves providing an initial prompt, then continuously refining, clarifying, orexpanding the prompt based on the model’s responses, gradually guiding it towardthe desired result Iterative prompting allows educators to better control thecontent, tone, and accuracy of the AI-generated output, while also leveraging themodel’s capacity for reasoning and creativity
Moreover, an important principle when working with AI is thedecomposition of complex tasks into simpler, more manageable steps Languagemodels tend to produce more accurate and coherent responses when instructionsare clear, specific, and broken down into smaller components, rather than beingpresented with a broad or overly complex task This approach not only minimizeserrors but also enables teachers to monitor and adjust the AI’s performance at eachstage When combined, task decomposition and iterative prompting create aneffective interaction loop that enhances the practical use of AI in education,particularly in lesson planning, exam preparation, and individualized studentsupport
2.3.1.4 Enhancing clarity and usability through output formatting prompts
One of the most effective techniques for improving the clarity, consistency,and usability of AI-generated content is the use of output formatting prompts Thisapproach involves explicitly instructing the AI model to present its response in aparticular structure or layout, such as a list, table, bullet points, or standardizedparagraph format, depending on the intended educational use By specifying thedesired format in the prompt, educators can guide the model to produce responses
Trang 15Academic inconsistencies: Borrowed content may not align with theschool’s curriculum, leading to mismatches in difficulty levels, lexical scope, orgrammatical focus.
Security vulnerabilities: Students can exploit online availability of reusedquestions, searching for answers in advance or sharing them across peer networks
Limited customization and quality control
Even when teachers modify sourced materials, the process remains consuming and prone to errors, such as unintentional duplication of flawed oroutdated questions, inadequate differentiation between proficiency levels (e.g., A2
time-vs B1 CEFR benchmarks), contextual misalignment with Nguyen Trai HighSchool’s learning objectives
Ad Hoc experimentation with AI tools
A small number of teachers have begun experimenting with AI-generatedquestions using tools such as ChatGPT or Quizizz AI However, initialimplementations have exposed several notable limitations First, the accuracy of AIoutputs remains inconsistent, with some questions exhibiting linguisticallyunnatural phrasing or grammatical errors Second, there is a misalignment betweenthe generated content and the actual curriculum, as AI tools often fail to accountfor the school’s syllabus or localized pedagogical objectives Finally, an over-reliance on AI-generated material without proper human verification can lead toquestionable validity, particularly in the context of high-stakes assessments
Consequences of current practices
The reliance on outdated and inefficient assessment methods has led to threecritical repercussions First, reduced assessment validity persists because testsfrequently fail to align with intended learning outcomes, resulting in evaluationsthat inaccurately measure student proficiency Second, increased workload burdensteachers, who must dedicate excessive time to manually identify and correct errors
in externally sourced or AI-generated materials, diverting energy frominstructional innovation Finally, academic integrity is compromised, as studentsexploit recycled or easily accessible online content, undermining the credibility ofassessments and fostering a culture of shortcuts over genuine learning.Collectively, these issues highlight systemic flaws that hinder educational quality,teacher efficacy, and student accountability, necessitating urgent reforms to restorerigor and fairness in evaluation practices.This context underscores the urgent need
Trang 16generating a reading comprehension test, teachers can input a scanned image ortext-based example of a previous exam that includes a passage, five multiple-choice questions, and an answer key formatted according to Nguyen Trai HighSchool’s standards The AI then analyzes these exemplars to infer structuralpatterns, lexical complexity, and question types, ensuring generated contentadheres to institutional expectations.
2.3.1.3 Applying iterative prompting and task decomposition in assisted teaching and content design
AI-In the process of effectively utilizing large language models such asChatGPT for teaching and educational content development, the technique ofiterative prompting plays a critical role in optimizing output quality This methodinvolves providing an initial prompt, then continuously refining, clarifying, orexpanding the prompt based on the model’s responses, gradually guiding it towardthe desired result Iterative prompting allows educators to better control thecontent, tone, and accuracy of the AI-generated output, while also leveraging themodel’s capacity for reasoning and creativity
Moreover, an important principle when working with AI is thedecomposition of complex tasks into simpler, more manageable steps Languagemodels tend to produce more accurate and coherent responses when instructionsare clear, specific, and broken down into smaller components, rather than beingpresented with a broad or overly complex task This approach not only minimizeserrors but also enables teachers to monitor and adjust the AI’s performance at eachstage When combined, task decomposition and iterative prompting create aneffective interaction loop that enhances the practical use of AI in education,particularly in lesson planning, exam preparation, and individualized studentsupport
2.3.1.4 Enhancing clarity and usability through output formatting prompts
One of the most effective techniques for improving the clarity, consistency,and usability of AI-generated content is the use of output formatting prompts Thisapproach involves explicitly instructing the AI model to present its response in aparticular structure or layout, such as a list, table, bullet points, or standardizedparagraph format, depending on the intended educational use By specifying thedesired format in the prompt, educators can guide the model to produce responses
Trang 17Academic inconsistencies: Borrowed content may not align with theschool’s curriculum, leading to mismatches in difficulty levels, lexical scope, orgrammatical focus.
Security vulnerabilities: Students can exploit online availability of reusedquestions, searching for answers in advance or sharing them across peer networks
Limited customization and quality control
Even when teachers modify sourced materials, the process remains consuming and prone to errors, such as unintentional duplication of flawed oroutdated questions, inadequate differentiation between proficiency levels (e.g., A2
time-vs B1 CEFR benchmarks), contextual misalignment with Nguyen Trai HighSchool’s learning objectives
Ad Hoc experimentation with AI tools
A small number of teachers have begun experimenting with AI-generatedquestions using tools such as ChatGPT or Quizizz AI However, initialimplementations have exposed several notable limitations First, the accuracy of AIoutputs remains inconsistent, with some questions exhibiting linguisticallyunnatural phrasing or grammatical errors Second, there is a misalignment betweenthe generated content and the actual curriculum, as AI tools often fail to accountfor the school’s syllabus or localized pedagogical objectives Finally, an over-reliance on AI-generated material without proper human verification can lead toquestionable validity, particularly in the context of high-stakes assessments
Consequences of current practices
The reliance on outdated and inefficient assessment methods has led to threecritical repercussions First, reduced assessment validity persists because testsfrequently fail to align with intended learning outcomes, resulting in evaluationsthat inaccurately measure student proficiency Second, increased workload burdensteachers, who must dedicate excessive time to manually identify and correct errors
in externally sourced or AI-generated materials, diverting energy frominstructional innovation Finally, academic integrity is compromised, as studentsexploit recycled or easily accessible online content, undermining the credibility ofassessments and fostering a culture of shortcuts over genuine learning.Collectively, these issues highlight systemic flaws that hinder educational quality,teacher efficacy, and student accountability, necessitating urgent reforms to restorerigor and fairness in evaluation practices.This context underscores the urgent need
Trang 18for a systematic, AI-enhanced approach to test design, one that balances efficiencywith precision while adhering to curricular and ethical standards.
2.3 Solutions and implementation
2.3.1 Basic prompt engineering
Through practical experience in developing exam materials with AIassistance, I have identified several effective techniques that can be universallyapplied across DeepSeek, ChatGPT, and Gemini to produce consistent, high-quality results
2.3.1.1 Assign a role.
Assigning a role to an AI language model such as Deepseek, ChatGPT is aprompt engineering technique that enhances the relevance, tone, and contextualappropriateness of its responses By explicitly specifying a role,such as "a highschool English teacher" or "an academic writing tutor", users can guide the model
to adopt the expected voice, knowledge scope, and communicative register alignedwith that persona This approach mirrors principles in discourse analysis andpragmatics, wherein the assumed identity of a speaker influences language choices,formality, and information structure In educational contexts, assigning roles isparticularly effective for generating content that matches the cognitive andlinguistic level of target learners It also reduces ambiguity, enabling the AI toalign its output with the goals and expectations of the user, thereby improving bothaccuracy and pedagogical utility
You are an expert from the Ministry of Education and Training, specializing
in designing high-stakes national English exams for upper secondary students in Vietnam ); "You are an English exam designer working for the Ministry of Education and Training….”; "You are an experienced English teacher who has been preparing students for the national high school graduation exam for over 20 years….
2.3.1.2 Few-shot prompting: Leveraging exemplars for contextual alignment
Few-shot prompting is a critical technique in refining AI-generatedassessment content, particularly when precision in format, difficulty, andalignment with curricular standards is required This approach involves providing
AI models with concrete examples (e.g., past exam papers, sample questions, orannotated answers) to establish a clear reference framework For instance, when
Trang 19Academic inconsistencies: Borrowed content may not align with theschool’s curriculum, leading to mismatches in difficulty levels, lexical scope, orgrammatical focus.
Security vulnerabilities: Students can exploit online availability of reusedquestions, searching for answers in advance or sharing them across peer networks
Limited customization and quality control
Even when teachers modify sourced materials, the process remains consuming and prone to errors, such as unintentional duplication of flawed oroutdated questions, inadequate differentiation between proficiency levels (e.g., A2
time-vs B1 CEFR benchmarks), contextual misalignment with Nguyen Trai HighSchool’s learning objectives
Ad Hoc experimentation with AI tools
A small number of teachers have begun experimenting with AI-generatedquestions using tools such as ChatGPT or Quizizz AI However, initialimplementations have exposed several notable limitations First, the accuracy of AIoutputs remains inconsistent, with some questions exhibiting linguisticallyunnatural phrasing or grammatical errors Second, there is a misalignment betweenthe generated content and the actual curriculum, as AI tools often fail to accountfor the school’s syllabus or localized pedagogical objectives Finally, an over-reliance on AI-generated material without proper human verification can lead toquestionable validity, particularly in the context of high-stakes assessments
Consequences of current practices
The reliance on outdated and inefficient assessment methods has led to threecritical repercussions First, reduced assessment validity persists because testsfrequently fail to align with intended learning outcomes, resulting in evaluationsthat inaccurately measure student proficiency Second, increased workload burdensteachers, who must dedicate excessive time to manually identify and correct errors
in externally sourced or AI-generated materials, diverting energy frominstructional innovation Finally, academic integrity is compromised, as studentsexploit recycled or easily accessible online content, undermining the credibility ofassessments and fostering a culture of shortcuts over genuine learning.Collectively, these issues highlight systemic flaws that hinder educational quality,teacher efficacy, and student accountability, necessitating urgent reforms to restorerigor and fairness in evaluation practices.This context underscores the urgent need
Trang 201.3 Research Subjects
This study focuses on Grade 10 to Grade 12 students at Nguyen Trai HighSchool and the English teachers responsible for designing and administeringassessments It also includes trainee teachers from Hồng Đức Universityparticipating in AI-assisted test creation under supervision The research examineshow these stakeholders engage with AI tools to enhance the quality, fairness, andefficiency of English language testing practices
1.4 Research methods -Scope of the study
This research employs a combination of qualitative and applied researchmethods Data were collected through classroom observations, teacher interviews,AI-generated test analyses, and pilot implementations across multiple classes.Feedback from teachers and students was also gathered to assess practicaleffectiveness and pedagogical relevance
This research focuses exclusively on English language assessments atNguyen Trai High School, targeting the design and evaluation of tests for Periodiccentralized exams (Midterm I, Final I, Midterm II, Final II), 12th-grade benchmarktests, English Olympiads for gifted students
The study examines the application of AI tools (DeepSeek, ChatGPT,Gemini 2.5 Flash) to generate, refine, and validate assessment materials for Grades10–12, adhering to the school’s curriculum framework and Vietnam’s nationalEnglish education standards It does not extend to other subjects or non-academicevaluations
1.5 Innovations of the Initiative
This initiative introduces a systematic approach to AI-assisted test design byapplying advanced prompt engineering techniques across multiple AI platforms(DeepSeek, ChatGPT, Gemini 2.5 Flash) Unlike previous practices that relied onstatic test banks or ad hoc content generation, the initiative develops standardized
AI training protocols aligned with national exam formats and Bloom’s Taxonomy
It is also among the first at the high school level in Vietnam to document areplicable process for generating full-length, curriculum-aligned English testsusing free AI tools, while integrating real-world classroom feedback and teachertraining This dual emphasis on innovation and scalability sets the initiative apartfrom traditional test design models
Trang 21could significantly improve assessment quality while optimizing teachers’workload Thus, exploring AI applications in ELT assessments is not merely atechnological advancement but a pedagogical necessity to foster more valid,reliable, and equitable evaluation practices.
1.2 Research objectives:
This study aims to explore the potential of advanced AI tools, specificallyDeepSeek, ChatGPT, and Gemini 2.5 Flash,to enhance the quality, validity, andreliability of English language assessments at Nguyen Trai High School Theresearch objectives are threefold:
First, to investigate how DeepSeek, a domain-specific AI model optimizedfor educational data analysis, can streamline test creation by automating questiongeneration, and detecting biases or inconsistencies in existing assessments Byleveraging its deep learning algorithms, the study will evaluate its capacity toproduce contextually relevant reading passages, grammar exercises, andvocabulary tasks tailored to the school’s curriculum
Second, to assess the utility of ChatGPT in diversifying test formats whilemaintaining pedagogical rigor This objective focuses on the tool’s ability togenerate open-ended questions, situational dialogues, and critical thinking promptsthat reflect real-world language use Additionally, the study will examineChatGPT’s role in refining teacher-generated questions by rephrasing ambiguousitems, and adjusting difficulty levels
Third, to analyze the efficiency of Gemini 2.5 Flash The research will testits capacity to rapidly generate multiple unique test variants, and adapt questiondifficulty dynamically This objective also evaluates Gemini’s potential to reducecheating risks through algorithmic randomization and real-time plagiarismdetection
Fourth, to develop and share practical strategies for integrating AI into testcreation and evaluation, with a focus on prompt engineering techniques that enableteachers to generate exam-aligned content efficiently This includes designingstandardized command templates for AI tools (DeepSeek, ChatGPT, Gemini 2.5Flash) to produce specific question types required in Nguyen Trai High School’scentralized exams
Trang 22Academic inconsistencies: Borrowed content may not align with theschool’s curriculum, leading to mismatches in difficulty levels, lexical scope, orgrammatical focus.
Security vulnerabilities: Students can exploit online availability of reusedquestions, searching for answers in advance or sharing them across peer networks
Limited customization and quality control
Even when teachers modify sourced materials, the process remains consuming and prone to errors, such as unintentional duplication of flawed oroutdated questions, inadequate differentiation between proficiency levels (e.g., A2
time-vs B1 CEFR benchmarks), contextual misalignment with Nguyen Trai HighSchool’s learning objectives
Ad Hoc experimentation with AI tools
A small number of teachers have begun experimenting with AI-generatedquestions using tools such as ChatGPT or Quizizz AI However, initialimplementations have exposed several notable limitations First, the accuracy of AIoutputs remains inconsistent, with some questions exhibiting linguisticallyunnatural phrasing or grammatical errors Second, there is a misalignment betweenthe generated content and the actual curriculum, as AI tools often fail to accountfor the school’s syllabus or localized pedagogical objectives Finally, an over-reliance on AI-generated material without proper human verification can lead toquestionable validity, particularly in the context of high-stakes assessments
Consequences of current practices
The reliance on outdated and inefficient assessment methods has led to threecritical repercussions First, reduced assessment validity persists because testsfrequently fail to align with intended learning outcomes, resulting in evaluationsthat inaccurately measure student proficiency Second, increased workload burdensteachers, who must dedicate excessive time to manually identify and correct errors
in externally sourced or AI-generated materials, diverting energy frominstructional innovation Finally, academic integrity is compromised, as studentsexploit recycled or easily accessible online content, undermining the credibility ofassessments and fostering a culture of shortcuts over genuine learning.Collectively, these issues highlight systemic flaws that hinder educational quality,teacher efficacy, and student accountability, necessitating urgent reforms to restorerigor and fairness in evaluation practices.This context underscores the urgent need
Trang 232 CONTENT 2.1 Theoretical background
The integration of Artificial Intelligence (AI) into educational assessment isgrounded in constructivist and socio-cognitive theories, which emphasize learner-centered approaches and the alignment of evaluation with cognitive development.Bloom’s Taxonomy (1956) provides a hierarchical framework for categorizingeducational objectives—from knowledge recall to higher-order thinking skills such
as application, analysis, and evaluation AI-enhanced assessment tools cansystematically generate and align test items with these cognitive domains,promoting both validity and differentiation Furthermore, the application of promptengineering is rooted in human-computer interaction (HCI) and natural languageprocessing (NLP) theory, enabling educators to effectively guide AI systems toproduce contextually relevant and pedagogically sound outputs National policies,such as Vietnam’s Resolution No 57-NQ/TW (2024) and MOET’s Directive4324/BGDĐT-CNTT (2024), reaffirm the strategic role of AI and digitaltransformation in education, urging schools to harness technology as a means ofinnovation, quality improvement, and sustainable development
2.2 Current situation
At Nguyen Trai High School, with its 32 classes and over 1,400 students, theEnglish Group faces significant challenges in designing and administeringassessments The current testing framework includes Periodic centralizedexaminations (Midterm I, Final I, Midterm II, and Final II); three benchmark testsfor 12th-grade students; three annual English Olympiads for gifted students
Given the scale of assessment demands, the Group of English (comprisingeight teachers) primarily relies on traditional test construction methods, whichpresent several critical limitations as follows:
Heavy dependence on external sources
Test materials are often compiled by copying and pasting sections fromexisting exams sourced from other schools or online teaching repositories Whilethis approach reduces preparation time, it introduces multiple issues:
Copyright infringement risks: Unauthorized use of published materials mayviolate intellectual property laws
Trang 24for a systematic, AI-enhanced approach to test design, one that balances efficiencywith precision while adhering to curricular and ethical standards.
2.3 Solutions and implementation
2.3.1 Basic prompt engineering
Through practical experience in developing exam materials with AIassistance, I have identified several effective techniques that can be universallyapplied across DeepSeek, ChatGPT, and Gemini to produce consistent, high-quality results
2.3.1.1 Assign a role.
Assigning a role to an AI language model such as Deepseek, ChatGPT is aprompt engineering technique that enhances the relevance, tone, and contextualappropriateness of its responses By explicitly specifying a role,such as "a highschool English teacher" or "an academic writing tutor", users can guide the model
to adopt the expected voice, knowledge scope, and communicative register alignedwith that persona This approach mirrors principles in discourse analysis andpragmatics, wherein the assumed identity of a speaker influences language choices,formality, and information structure In educational contexts, assigning roles isparticularly effective for generating content that matches the cognitive andlinguistic level of target learners It also reduces ambiguity, enabling the AI toalign its output with the goals and expectations of the user, thereby improving bothaccuracy and pedagogical utility
You are an expert from the Ministry of Education and Training, specializing
in designing high-stakes national English exams for upper secondary students in Vietnam ); "You are an English exam designer working for the Ministry of Education and Training….”; "You are an experienced English teacher who has been preparing students for the national high school graduation exam for over 20 years….
2.3.1.2 Few-shot prompting: Leveraging exemplars for contextual alignment
Few-shot prompting is a critical technique in refining AI-generatedassessment content, particularly when precision in format, difficulty, andalignment with curricular standards is required This approach involves providing
AI models with concrete examples (e.g., past exam papers, sample questions, orannotated answers) to establish a clear reference framework For instance, when
Trang 25Academic inconsistencies: Borrowed content may not align with theschool’s curriculum, leading to mismatches in difficulty levels, lexical scope, orgrammatical focus.
Security vulnerabilities: Students can exploit online availability of reusedquestions, searching for answers in advance or sharing them across peer networks
Limited customization and quality control
Even when teachers modify sourced materials, the process remains consuming and prone to errors, such as unintentional duplication of flawed oroutdated questions, inadequate differentiation between proficiency levels (e.g., A2
time-vs B1 CEFR benchmarks), contextual misalignment with Nguyen Trai HighSchool’s learning objectives
Ad Hoc experimentation with AI tools
A small number of teachers have begun experimenting with AI-generatedquestions using tools such as ChatGPT or Quizizz AI However, initialimplementations have exposed several notable limitations First, the accuracy of AIoutputs remains inconsistent, with some questions exhibiting linguisticallyunnatural phrasing or grammatical errors Second, there is a misalignment betweenthe generated content and the actual curriculum, as AI tools often fail to accountfor the school’s syllabus or localized pedagogical objectives Finally, an over-reliance on AI-generated material without proper human verification can lead toquestionable validity, particularly in the context of high-stakes assessments
Consequences of current practices
The reliance on outdated and inefficient assessment methods has led to threecritical repercussions First, reduced assessment validity persists because testsfrequently fail to align with intended learning outcomes, resulting in evaluationsthat inaccurately measure student proficiency Second, increased workload burdensteachers, who must dedicate excessive time to manually identify and correct errors
in externally sourced or AI-generated materials, diverting energy frominstructional innovation Finally, academic integrity is compromised, as studentsexploit recycled or easily accessible online content, undermining the credibility ofassessments and fostering a culture of shortcuts over genuine learning.Collectively, these issues highlight systemic flaws that hinder educational quality,teacher efficacy, and student accountability, necessitating urgent reforms to restorerigor and fairness in evaluation practices.This context underscores the urgent need
Trang 262 CONTENT 2.1 Theoretical background
The integration of Artificial Intelligence (AI) into educational assessment isgrounded in constructivist and socio-cognitive theories, which emphasize learner-centered approaches and the alignment of evaluation with cognitive development.Bloom’s Taxonomy (1956) provides a hierarchical framework for categorizingeducational objectives—from knowledge recall to higher-order thinking skills such
as application, analysis, and evaluation AI-enhanced assessment tools cansystematically generate and align test items with these cognitive domains,promoting both validity and differentiation Furthermore, the application of promptengineering is rooted in human-computer interaction (HCI) and natural languageprocessing (NLP) theory, enabling educators to effectively guide AI systems toproduce contextually relevant and pedagogically sound outputs National policies,such as Vietnam’s Resolution No 57-NQ/TW (2024) and MOET’s Directive4324/BGDĐT-CNTT (2024), reaffirm the strategic role of AI and digitaltransformation in education, urging schools to harness technology as a means ofinnovation, quality improvement, and sustainable development
2.2 Current situation
At Nguyen Trai High School, with its 32 classes and over 1,400 students, theEnglish Group faces significant challenges in designing and administeringassessments The current testing framework includes Periodic centralizedexaminations (Midterm I, Final I, Midterm II, and Final II); three benchmark testsfor 12th-grade students; three annual English Olympiads for gifted students
Given the scale of assessment demands, the Group of English (comprisingeight teachers) primarily relies on traditional test construction methods, whichpresent several critical limitations as follows:
Heavy dependence on external sources
Test materials are often compiled by copying and pasting sections fromexisting exams sourced from other schools or online teaching repositories Whilethis approach reduces preparation time, it introduces multiple issues:
Copyright infringement risks: Unauthorized use of published materials mayviolate intellectual property laws
Trang 27that are not only relevant in content but also organized in a way that aligns withclassroom needs, lesson objectives, or assessment criteria.
The benefits of this technique are particularly notable in educationalcontexts where structured information enhances student comprehension, supportscomparative analysis, or facilitates easier integration into teaching materials Forexample, requesting vocabulary to be listed in a table with columns for word,meaning, and example sentence helps students visualize and internalizeinformation more effectively Moreover, output formatting promotes consistencyacross different prompts, making it easier for teachers to review and adapt AI-generated content Overall, this technique increases the practicality of AI tools ineducation by improving the readability, accessibility, and pedagogical value of thegenerated outputs
2.3.1.5 Enhancing English test design through step-by-step prompting
In the context of improving English test design with the support of artificialintelligence, the use of step-by-step prompting offers significant advantages interms of clarity, precision, and pedagogical value This technique involvesinstructing the AI to generate responses or perform tasks by breaking them downinto clear, logical steps rather than providing a complete answer all at once Whenapplied to the process of creating English test questions, such as erroridentification, sentence transformation, or reading comprehension, this methodhelps ensure that each component of the task is addressed systematically andaccurately
By guiding the AI to explain its reasoning or construction process step bystep, educators can more easily monitor the appropriateness of the language, thealignment with curriculum standards, and the cognitive level of each question Forexample, when designing a sentence transformation item, prompting the AI to firstanalyze the grammatical structure, then identify the transformation rule, and finallyconstruct the answer ensures a more valid and instructional output Thisincremental approach not only minimizes errors but also makes the test generationprocess more transparent and customizable Ultimately, step-by-step promptingempowers teachers to harness AI effectively in designing high-quality Englishassessments that better support student learning and exam readiness
Trang 28generating a reading comprehension test, teachers can input a scanned image ortext-based example of a previous exam that includes a passage, five multiple-choice questions, and an answer key formatted according to Nguyen Trai HighSchool’s standards The AI then analyzes these exemplars to infer structuralpatterns, lexical complexity, and question types, ensuring generated contentadheres to institutional expectations.
2.3.1.3 Applying iterative prompting and task decomposition in assisted teaching and content design
AI-In the process of effectively utilizing large language models such asChatGPT for teaching and educational content development, the technique ofiterative prompting plays a critical role in optimizing output quality This methodinvolves providing an initial prompt, then continuously refining, clarifying, orexpanding the prompt based on the model’s responses, gradually guiding it towardthe desired result Iterative prompting allows educators to better control thecontent, tone, and accuracy of the AI-generated output, while also leveraging themodel’s capacity for reasoning and creativity
Moreover, an important principle when working with AI is thedecomposition of complex tasks into simpler, more manageable steps Languagemodels tend to produce more accurate and coherent responses when instructionsare clear, specific, and broken down into smaller components, rather than beingpresented with a broad or overly complex task This approach not only minimizeserrors but also enables teachers to monitor and adjust the AI’s performance at eachstage When combined, task decomposition and iterative prompting create aneffective interaction loop that enhances the practical use of AI in education,particularly in lesson planning, exam preparation, and individualized studentsupport
2.3.1.4 Enhancing clarity and usability through output formatting prompts
One of the most effective techniques for improving the clarity, consistency,and usability of AI-generated content is the use of output formatting prompts Thisapproach involves explicitly instructing the AI model to present its response in aparticular structure or layout, such as a list, table, bullet points, or standardizedparagraph format, depending on the intended educational use By specifying thedesired format in the prompt, educators can guide the model to produce responses
Trang 292 CONTENT 2.1 Theoretical background
The integration of Artificial Intelligence (AI) into educational assessment isgrounded in constructivist and socio-cognitive theories, which emphasize learner-centered approaches and the alignment of evaluation with cognitive development.Bloom’s Taxonomy (1956) provides a hierarchical framework for categorizingeducational objectives—from knowledge recall to higher-order thinking skills such
as application, analysis, and evaluation AI-enhanced assessment tools cansystematically generate and align test items with these cognitive domains,promoting both validity and differentiation Furthermore, the application of promptengineering is rooted in human-computer interaction (HCI) and natural languageprocessing (NLP) theory, enabling educators to effectively guide AI systems toproduce contextually relevant and pedagogically sound outputs National policies,such as Vietnam’s Resolution No 57-NQ/TW (2024) and MOET’s Directive4324/BGDĐT-CNTT (2024), reaffirm the strategic role of AI and digitaltransformation in education, urging schools to harness technology as a means ofinnovation, quality improvement, and sustainable development
2.2 Current situation
At Nguyen Trai High School, with its 32 classes and over 1,400 students, theEnglish Group faces significant challenges in designing and administeringassessments The current testing framework includes Periodic centralizedexaminations (Midterm I, Final I, Midterm II, and Final II); three benchmark testsfor 12th-grade students; three annual English Olympiads for gifted students
Given the scale of assessment demands, the Group of English (comprisingeight teachers) primarily relies on traditional test construction methods, whichpresent several critical limitations as follows:
Heavy dependence on external sources
Test materials are often compiled by copying and pasting sections fromexisting exams sourced from other schools or online teaching repositories Whilethis approach reduces preparation time, it introduces multiple issues:
Copyright infringement risks: Unauthorized use of published materials mayviolate intellectual property laws
Trang 30could significantly improve assessment quality while optimizing teachers’workload Thus, exploring AI applications in ELT assessments is not merely atechnological advancement but a pedagogical necessity to foster more valid,reliable, and equitable evaluation practices.
1.2 Research objectives:
This study aims to explore the potential of advanced AI tools, specificallyDeepSeek, ChatGPT, and Gemini 2.5 Flash,to enhance the quality, validity, andreliability of English language assessments at Nguyen Trai High School Theresearch objectives are threefold:
First, to investigate how DeepSeek, a domain-specific AI model optimizedfor educational data analysis, can streamline test creation by automating questiongeneration, and detecting biases or inconsistencies in existing assessments Byleveraging its deep learning algorithms, the study will evaluate its capacity toproduce contextually relevant reading passages, grammar exercises, andvocabulary tasks tailored to the school’s curriculum
Second, to assess the utility of ChatGPT in diversifying test formats whilemaintaining pedagogical rigor This objective focuses on the tool’s ability togenerate open-ended questions, situational dialogues, and critical thinking promptsthat reflect real-world language use Additionally, the study will examineChatGPT’s role in refining teacher-generated questions by rephrasing ambiguousitems, and adjusting difficulty levels
Third, to analyze the efficiency of Gemini 2.5 Flash The research will testits capacity to rapidly generate multiple unique test variants, and adapt questiondifficulty dynamically This objective also evaluates Gemini’s potential to reducecheating risks through algorithmic randomization and real-time plagiarismdetection
Fourth, to develop and share practical strategies for integrating AI into testcreation and evaluation, with a focus on prompt engineering techniques that enableteachers to generate exam-aligned content efficiently This includes designingstandardized command templates for AI tools (DeepSeek, ChatGPT, Gemini 2.5Flash) to produce specific question types required in Nguyen Trai High School’scentralized exams
Trang 31for a systematic, AI-enhanced approach to test design, one that balances efficiencywith precision while adhering to curricular and ethical standards.
2.3 Solutions and implementation
2.3.1 Basic prompt engineering
Through practical experience in developing exam materials with AIassistance, I have identified several effective techniques that can be universallyapplied across DeepSeek, ChatGPT, and Gemini to produce consistent, high-quality results
2.3.1.1 Assign a role.
Assigning a role to an AI language model such as Deepseek, ChatGPT is aprompt engineering technique that enhances the relevance, tone, and contextualappropriateness of its responses By explicitly specifying a role,such as "a highschool English teacher" or "an academic writing tutor", users can guide the model
to adopt the expected voice, knowledge scope, and communicative register alignedwith that persona This approach mirrors principles in discourse analysis andpragmatics, wherein the assumed identity of a speaker influences language choices,formality, and information structure In educational contexts, assigning roles isparticularly effective for generating content that matches the cognitive andlinguistic level of target learners It also reduces ambiguity, enabling the AI toalign its output with the goals and expectations of the user, thereby improving bothaccuracy and pedagogical utility
You are an expert from the Ministry of Education and Training, specializing
in designing high-stakes national English exams for upper secondary students in Vietnam ); "You are an English exam designer working for the Ministry of Education and Training….”; "You are an experienced English teacher who has been preparing students for the national high school graduation exam for over 20 years….
2.3.1.2 Few-shot prompting: Leveraging exemplars for contextual alignment
Few-shot prompting is a critical technique in refining AI-generatedassessment content, particularly when precision in format, difficulty, andalignment with curricular standards is required This approach involves providing
AI models with concrete examples (e.g., past exam papers, sample questions, orannotated answers) to establish a clear reference framework For instance, when
Trang 32that are not only relevant in content but also organized in a way that aligns withclassroom needs, lesson objectives, or assessment criteria.
The benefits of this technique are particularly notable in educationalcontexts where structured information enhances student comprehension, supportscomparative analysis, or facilitates easier integration into teaching materials Forexample, requesting vocabulary to be listed in a table with columns for word,meaning, and example sentence helps students visualize and internalizeinformation more effectively Moreover, output formatting promotes consistencyacross different prompts, making it easier for teachers to review and adapt AI-generated content Overall, this technique increases the practicality of AI tools ineducation by improving the readability, accessibility, and pedagogical value of thegenerated outputs
2.3.1.5 Enhancing English test design through step-by-step prompting
In the context of improving English test design with the support of artificialintelligence, the use of step-by-step prompting offers significant advantages interms of clarity, precision, and pedagogical value This technique involvesinstructing the AI to generate responses or perform tasks by breaking them downinto clear, logical steps rather than providing a complete answer all at once Whenapplied to the process of creating English test questions, such as erroridentification, sentence transformation, or reading comprehension, this methodhelps ensure that each component of the task is addressed systematically andaccurately
By guiding the AI to explain its reasoning or construction process step bystep, educators can more easily monitor the appropriateness of the language, thealignment with curriculum standards, and the cognitive level of each question Forexample, when designing a sentence transformation item, prompting the AI to firstanalyze the grammatical structure, then identify the transformation rule, and finallyconstruct the answer ensures a more valid and instructional output Thisincremental approach not only minimizes errors but also makes the test generationprocess more transparent and customizable Ultimately, step-by-step promptingempowers teachers to harness AI effectively in designing high-quality Englishassessments that better support student learning and exam readiness
Trang 33generating a reading comprehension test, teachers can input a scanned image ortext-based example of a previous exam that includes a passage, five multiple-choice questions, and an answer key formatted according to Nguyen Trai HighSchool’s standards The AI then analyzes these exemplars to infer structuralpatterns, lexical complexity, and question types, ensuring generated contentadheres to institutional expectations.
2.3.1.3 Applying iterative prompting and task decomposition in assisted teaching and content design
AI-In the process of effectively utilizing large language models such asChatGPT for teaching and educational content development, the technique ofiterative prompting plays a critical role in optimizing output quality This methodinvolves providing an initial prompt, then continuously refining, clarifying, orexpanding the prompt based on the model’s responses, gradually guiding it towardthe desired result Iterative prompting allows educators to better control thecontent, tone, and accuracy of the AI-generated output, while also leveraging themodel’s capacity for reasoning and creativity
Moreover, an important principle when working with AI is thedecomposition of complex tasks into simpler, more manageable steps Languagemodels tend to produce more accurate and coherent responses when instructionsare clear, specific, and broken down into smaller components, rather than beingpresented with a broad or overly complex task This approach not only minimizeserrors but also enables teachers to monitor and adjust the AI’s performance at eachstage When combined, task decomposition and iterative prompting create aneffective interaction loop that enhances the practical use of AI in education,particularly in lesson planning, exam preparation, and individualized studentsupport
2.3.1.4 Enhancing clarity and usability through output formatting prompts
One of the most effective techniques for improving the clarity, consistency,and usability of AI-generated content is the use of output formatting prompts Thisapproach involves explicitly instructing the AI model to present its response in aparticular structure or layout, such as a list, table, bullet points, or standardizedparagraph format, depending on the intended educational use By specifying thedesired format in the prompt, educators can guide the model to produce responses
Trang 34could significantly improve assessment quality while optimizing teachers’workload Thus, exploring AI applications in ELT assessments is not merely atechnological advancement but a pedagogical necessity to foster more valid,reliable, and equitable evaluation practices.
1.2 Research objectives:
This study aims to explore the potential of advanced AI tools, specificallyDeepSeek, ChatGPT, and Gemini 2.5 Flash,to enhance the quality, validity, andreliability of English language assessments at Nguyen Trai High School Theresearch objectives are threefold:
First, to investigate how DeepSeek, a domain-specific AI model optimizedfor educational data analysis, can streamline test creation by automating questiongeneration, and detecting biases or inconsistencies in existing assessments Byleveraging its deep learning algorithms, the study will evaluate its capacity toproduce contextually relevant reading passages, grammar exercises, andvocabulary tasks tailored to the school’s curriculum
Second, to assess the utility of ChatGPT in diversifying test formats whilemaintaining pedagogical rigor This objective focuses on the tool’s ability togenerate open-ended questions, situational dialogues, and critical thinking promptsthat reflect real-world language use Additionally, the study will examineChatGPT’s role in refining teacher-generated questions by rephrasing ambiguousitems, and adjusting difficulty levels
Third, to analyze the efficiency of Gemini 2.5 Flash The research will testits capacity to rapidly generate multiple unique test variants, and adapt questiondifficulty dynamically This objective also evaluates Gemini’s potential to reducecheating risks through algorithmic randomization and real-time plagiarismdetection
Fourth, to develop and share practical strategies for integrating AI into testcreation and evaluation, with a focus on prompt engineering techniques that enableteachers to generate exam-aligned content efficiently This includes designingstandardized command templates for AI tools (DeepSeek, ChatGPT, Gemini 2.5Flash) to produce specific question types required in Nguyen Trai High School’scentralized exams
Trang 351.3 Research Subjects
This study focuses on Grade 10 to Grade 12 students at Nguyen Trai HighSchool and the English teachers responsible for designing and administeringassessments It also includes trainee teachers from Hồng Đức Universityparticipating in AI-assisted test creation under supervision The research examineshow these stakeholders engage with AI tools to enhance the quality, fairness, andefficiency of English language testing practices
1.4 Research methods -Scope of the study
This research employs a combination of qualitative and applied researchmethods Data were collected through classroom observations, teacher interviews,AI-generated test analyses, and pilot implementations across multiple classes.Feedback from teachers and students was also gathered to assess practicaleffectiveness and pedagogical relevance
This research focuses exclusively on English language assessments atNguyen Trai High School, targeting the design and evaluation of tests for Periodiccentralized exams (Midterm I, Final I, Midterm II, Final II), 12th-grade benchmarktests, English Olympiads for gifted students
The study examines the application of AI tools (DeepSeek, ChatGPT,Gemini 2.5 Flash) to generate, refine, and validate assessment materials for Grades10–12, adhering to the school’s curriculum framework and Vietnam’s nationalEnglish education standards It does not extend to other subjects or non-academicevaluations
1.5 Innovations of the Initiative
This initiative introduces a systematic approach to AI-assisted test design byapplying advanced prompt engineering techniques across multiple AI platforms(DeepSeek, ChatGPT, Gemini 2.5 Flash) Unlike previous practices that relied onstatic test banks or ad hoc content generation, the initiative develops standardized
AI training protocols aligned with national exam formats and Bloom’s Taxonomy
It is also among the first at the high school level in Vietnam to document areplicable process for generating full-length, curriculum-aligned English testsusing free AI tools, while integrating real-world classroom feedback and teachertraining This dual emphasis on innovation and scalability sets the initiative apartfrom traditional test design models
Trang 36for a systematic, AI-enhanced approach to test design, one that balances efficiencywith precision while adhering to curricular and ethical standards.
2.3 Solutions and implementation
2.3.1 Basic prompt engineering
Through practical experience in developing exam materials with AIassistance, I have identified several effective techniques that can be universallyapplied across DeepSeek, ChatGPT, and Gemini to produce consistent, high-quality results
2.3.1.1 Assign a role.
Assigning a role to an AI language model such as Deepseek, ChatGPT is aprompt engineering technique that enhances the relevance, tone, and contextualappropriateness of its responses By explicitly specifying a role,such as "a highschool English teacher" or "an academic writing tutor", users can guide the model
to adopt the expected voice, knowledge scope, and communicative register alignedwith that persona This approach mirrors principles in discourse analysis andpragmatics, wherein the assumed identity of a speaker influences language choices,formality, and information structure In educational contexts, assigning roles isparticularly effective for generating content that matches the cognitive andlinguistic level of target learners It also reduces ambiguity, enabling the AI toalign its output with the goals and expectations of the user, thereby improving bothaccuracy and pedagogical utility
You are an expert from the Ministry of Education and Training, specializing
in designing high-stakes national English exams for upper secondary students in Vietnam ); "You are an English exam designer working for the Ministry of Education and Training….”; "You are an experienced English teacher who has been preparing students for the national high school graduation exam for over 20 years….
2.3.1.2 Few-shot prompting: Leveraging exemplars for contextual alignment
Few-shot prompting is a critical technique in refining AI-generatedassessment content, particularly when precision in format, difficulty, andalignment with curricular standards is required This approach involves providing
AI models with concrete examples (e.g., past exam papers, sample questions, orannotated answers) to establish a clear reference framework For instance, when
Trang 372 CONTENT 2.1 Theoretical background
The integration of Artificial Intelligence (AI) into educational assessment isgrounded in constructivist and socio-cognitive theories, which emphasize learner-centered approaches and the alignment of evaluation with cognitive development.Bloom’s Taxonomy (1956) provides a hierarchical framework for categorizingeducational objectives—from knowledge recall to higher-order thinking skills such
as application, analysis, and evaluation AI-enhanced assessment tools cansystematically generate and align test items with these cognitive domains,promoting both validity and differentiation Furthermore, the application of promptengineering is rooted in human-computer interaction (HCI) and natural languageprocessing (NLP) theory, enabling educators to effectively guide AI systems toproduce contextually relevant and pedagogically sound outputs National policies,such as Vietnam’s Resolution No 57-NQ/TW (2024) and MOET’s Directive4324/BGDĐT-CNTT (2024), reaffirm the strategic role of AI and digitaltransformation in education, urging schools to harness technology as a means ofinnovation, quality improvement, and sustainable development
2.2 Current situation
At Nguyen Trai High School, with its 32 classes and over 1,400 students, theEnglish Group faces significant challenges in designing and administeringassessments The current testing framework includes Periodic centralizedexaminations (Midterm I, Final I, Midterm II, and Final II); three benchmark testsfor 12th-grade students; three annual English Olympiads for gifted students
Given the scale of assessment demands, the Group of English (comprisingeight teachers) primarily relies on traditional test construction methods, whichpresent several critical limitations as follows:
Heavy dependence on external sources
Test materials are often compiled by copying and pasting sections fromexisting exams sourced from other schools or online teaching repositories Whilethis approach reduces preparation time, it introduces multiple issues:
Copyright infringement risks: Unauthorized use of published materials mayviolate intellectual property laws