1. Trang chủ
  2. » Luận Văn - Báo Cáo

Skkn cấp tỉnh some experience in utilizing ai to enhance the quality of tests and assessments at nguyen trai high school

75 1 0
Tài liệu đã được kiểm tra trùng lặp

Đang tải... (xem toàn văn)

Tài liệu hạn chế xem trước, để xem đầy đủ mời bạn chọn Tải xuống

THÔNG TIN TÀI LIỆU

Thông tin cơ bản

Tiêu đề Some experience in utilizing ai to enhance the quality of tests and assessments at nguyen trai high school
Tác giả Cao Xuân Hoàng
Trường học Nguyen Trai High School
Chuyên ngành English Language Teaching
Thể loại Sáng kiến kinh nghiệm
Năm xuất bản 2025
Thành phố Thanh Hoa
Định dạng
Số trang 75
Dung lượng 2,54 MB

Các công cụ chuyển đổi và chỉnh sửa cho tài liệu này

Nội dung

THANH HOA DEPARTMENT OF EDUCATION AND TRAININGNGUYEN TRAI HIGH SCHOOL INITIATIVE IN TEACHING INNOVATION SOME EXPERIENCE IN UTILIZING AI TO ENHANCE THE QUALITY OF TESTS AND ASSESSMENTS AT

Trang 1

THANH HOA DEPARTMENT OF EDUCATION AND TRAINING

NGUYEN TRAI HIGH SCHOOL

INITIATIVE IN TEACHING INNOVATION

SOME EXPERIENCE IN UTILIZING AI

TO ENHANCE THE QUALITY OF TESTS AND ASSESSMENTS

AT NGUYEN TRAI HIGH SCHOOL

Implemented by: Cao Xuân Hoàng Position: Head of the English Group Institution: Nguyen Trai High School – Thanh Hoa City

Field: English Language Teaching

THANH HÓA, NĂM 2025

Trang 2

1 INTRODUCTION 1

1.1 Rationale: 1

The importance of fair, accurate, and innovative assessments in English language teaching 1

1.2 Research objectives: 2

1.3 Research Subjects 3

1.4 Research methods -Scope of the study 3

1.5 Innovations of the Initiative 3

2 CONTENT 4

2.1 Theoretical background 4

2.2 Current situation 4

2.3 Solutions and implementation 6

2.3.1 Basic prompt engineering 6

2.3.1.1 Assign a role 6

2.3.1.2 Few-shot prompting: Leveraging exemplars for contextual alignment 6 2.3.1.3 Applying iterative prompting and task decomposition in AI-assisted teaching and content design 7

2.3.1.4 Enhancing clarity and usability through output formatting prompts 7

2.3.1.5 Enhancing English test design through step-by-step prompting 8

2.3.2 Practical application 9

2.4 Results and discussion 17

2.4.1 Observed outcomes 17

2.4.6 Challenges and limitations 18

3 CONCLUSION AND RECOMMENDATIONS 20

3.1 Conclusion 20

3.2 Recommendations 20

DECLARATION OF ORIGINALITY AND RESPONSIBILITY 22

REFERENCES 23

LIST OF RECOGNIZED EXPERIENCE INITIATIVES 24

APPENDIX 1 25

Test Matrix for English Graduation Exam 25

APPENDIX 2 1

OUTCOME TEST 1

APPENDIX 3 9

Technical specification for high school graduation english exam 9

Trang 3

1 INTRODUCTION 1.1 Rationale:

The importance of fair, accurate, and innovative assessments in English

Traditional assessment methods, however, often face limitations such assubjectivity in grading, time-consuming test construction, and a lack of adaptivedifficulty to cater to diverse learner needs Moreover, conventional paper-basedtests may not fully capture students’ communicative competence, particularly inareas such as critical thinking, creativity, and interactive skills (Hamp-Lyons,2017) This gap highlights the necessity for innovation in assessment design,wheretechnology, particularly Artificial Intelligence (AI), can play a transformative role

Innovative assessments powered by artificial intelligence (AI) presentsignificant advantages in modern educational contexts, particularly throughenhanced fairness, greater accuracy, and adaptive capabilities By leveraging AI togenerate randomized, unbiased questions and automate scoring processes, thesesystems minimize human subjectivity, fostering equitable evaluation standards.Additionally, machine learning algorithms enhance accuracy by analyzingextensive datasets to refine question difficulty levels and ensure alignment withpredefined learning outcomes, thereby improving the validity of assessments.Furthermore, AI-driven adaptive testing dynamically adjusts content based on real-time student responses, enabling personalized evaluations that more preciselymeasure individual competencies (Zhai et al., 2021) Collectively, these featuresposition AI-enhanced assessments as transformative tools for advancingeducational measurement practices

At Nguyen Trai High School, where large class sizes and varying studentproficiency levels present challenges, integrating AI into test design and evaluation

Trang 4

could significantly improve assessment quality while optimizing teachers’workload Thus, exploring AI applications in ELT assessments is not merely atechnological advancement but a pedagogical necessity to foster more valid,reliable, and equitable evaluation practices.

1.2 Research objectives:

This study aims to explore the potential of advanced AI tools, specificallyDeepSeek, ChatGPT, and Gemini 2.5 Flash,to enhance the quality, validity, andreliability of English language assessments at Nguyen Trai High School Theresearch objectives are threefold:

First, to investigate how DeepSeek, a domain-specific AI model optimizedfor educational data analysis, can streamline test creation by automating questiongeneration, and detecting biases or inconsistencies in existing assessments Byleveraging its deep learning algorithms, the study will evaluate its capacity toproduce contextually relevant reading passages, grammar exercises, andvocabulary tasks tailored to the school’s curriculum

Second, to assess the utility of ChatGPT in diversifying test formats whilemaintaining pedagogical rigor This objective focuses on the tool’s ability togenerate open-ended questions, situational dialogues, and critical thinking promptsthat reflect real-world language use Additionally, the study will examineChatGPT’s role in refining teacher-generated questions by rephrasing ambiguousitems, and adjusting difficulty levels

Third, to analyze the efficiency of Gemini 2.5 Flash The research will testits capacity to rapidly generate multiple unique test variants, and adapt questiondifficulty dynamically This objective also evaluates Gemini’s potential to reducecheating risks through algorithmic randomization and real-time plagiarismdetection

Fourth, to develop and share practical strategies for integrating AI into testcreation and evaluation, with a focus on prompt engineering techniques that enableteachers to generate exam-aligned content efficiently This includes designingstandardized command templates for AI tools (DeepSeek, ChatGPT, Gemini 2.5Flash) to produce specific question types required in Nguyen Trai High School’scentralized exams

Trang 5

1.3 Research Subjects

This study focuses on Grade 10 to Grade 12 students at Nguyen Trai HighSchool and the English teachers responsible for designing and administeringassessments It also includes trainee teachers from Hồng Đức Universityparticipating in AI-assisted test creation under supervision The research examineshow these stakeholders engage with AI tools to enhance the quality, fairness, andefficiency of English language testing practices

1.4 Research methods -Scope of the study

This research employs a combination of qualitative and applied researchmethods Data were collected through classroom observations, teacher interviews,AI-generated test analyses, and pilot implementations across multiple classes.Feedback from teachers and students was also gathered to assess practicaleffectiveness and pedagogical relevance

This research focuses exclusively on English language assessments atNguyen Trai High School, targeting the design and evaluation of tests for Periodiccentralized exams (Midterm I, Final I, Midterm II, Final II), 12th-grade benchmarktests, English Olympiads for gifted students

The study examines the application of AI tools (DeepSeek, ChatGPT,Gemini 2.5 Flash) to generate, refine, and validate assessment materials for Grades10–12, adhering to the school’s curriculum framework and Vietnam’s nationalEnglish education standards It does not extend to other subjects or non-academicevaluations

1.5 Innovations of the Initiative

This initiative introduces a systematic approach to AI-assisted test design byapplying advanced prompt engineering techniques across multiple AI platforms(DeepSeek, ChatGPT, Gemini 2.5 Flash) Unlike previous practices that relied onstatic test banks or ad hoc content generation, the initiative develops standardized

AI training protocols aligned with national exam formats and Bloom’s Taxonomy

It is also among the first at the high school level in Vietnam to document areplicable process for generating full-length, curriculum-aligned English testsusing free AI tools, while integrating real-world classroom feedback and teachertraining This dual emphasis on innovation and scalability sets the initiative apartfrom traditional test design models

Trang 6

could significantly improve assessment quality while optimizing teachers’workload Thus, exploring AI applications in ELT assessments is not merely atechnological advancement but a pedagogical necessity to foster more valid,reliable, and equitable evaluation practices.

1.2 Research objectives:

This study aims to explore the potential of advanced AI tools, specificallyDeepSeek, ChatGPT, and Gemini 2.5 Flash,to enhance the quality, validity, andreliability of English language assessments at Nguyen Trai High School Theresearch objectives are threefold:

First, to investigate how DeepSeek, a domain-specific AI model optimizedfor educational data analysis, can streamline test creation by automating questiongeneration, and detecting biases or inconsistencies in existing assessments Byleveraging its deep learning algorithms, the study will evaluate its capacity toproduce contextually relevant reading passages, grammar exercises, andvocabulary tasks tailored to the school’s curriculum

Second, to assess the utility of ChatGPT in diversifying test formats whilemaintaining pedagogical rigor This objective focuses on the tool’s ability togenerate open-ended questions, situational dialogues, and critical thinking promptsthat reflect real-world language use Additionally, the study will examineChatGPT’s role in refining teacher-generated questions by rephrasing ambiguousitems, and adjusting difficulty levels

Third, to analyze the efficiency of Gemini 2.5 Flash The research will testits capacity to rapidly generate multiple unique test variants, and adapt questiondifficulty dynamically This objective also evaluates Gemini’s potential to reducecheating risks through algorithmic randomization and real-time plagiarismdetection

Fourth, to develop and share practical strategies for integrating AI into testcreation and evaluation, with a focus on prompt engineering techniques that enableteachers to generate exam-aligned content efficiently This includes designingstandardized command templates for AI tools (DeepSeek, ChatGPT, Gemini 2.5Flash) to produce specific question types required in Nguyen Trai High School’scentralized exams

Trang 7

2 CONTENT 2.1 Theoretical background

The integration of Artificial Intelligence (AI) into educational assessment isgrounded in constructivist and socio-cognitive theories, which emphasize learner-centered approaches and the alignment of evaluation with cognitive development.Bloom’s Taxonomy (1956) provides a hierarchical framework for categorizingeducational objectives—from knowledge recall to higher-order thinking skills such

as application, analysis, and evaluation AI-enhanced assessment tools cansystematically generate and align test items with these cognitive domains,promoting both validity and differentiation Furthermore, the application of promptengineering is rooted in human-computer interaction (HCI) and natural languageprocessing (NLP) theory, enabling educators to effectively guide AI systems toproduce contextually relevant and pedagogically sound outputs National policies,such as Vietnam’s Resolution No 57-NQ/TW (2024) and MOET’s Directive4324/BGDĐT-CNTT (2024), reaffirm the strategic role of AI and digitaltransformation in education, urging schools to harness technology as a means ofinnovation, quality improvement, and sustainable development

2.2 Current situation

At Nguyen Trai High School, with its 32 classes and over 1,400 students, theEnglish Group faces significant challenges in designing and administeringassessments The current testing framework includes Periodic centralizedexaminations (Midterm I, Final I, Midterm II, and Final II); three benchmark testsfor 12th-grade students; three annual English Olympiads for gifted students

Given the scale of assessment demands, the Group of English (comprisingeight teachers) primarily relies on traditional test construction methods, whichpresent several critical limitations as follows:

Heavy dependence on external sources

Test materials are often compiled by copying and pasting sections fromexisting exams sourced from other schools or online teaching repositories Whilethis approach reduces preparation time, it introduces multiple issues:

Copyright infringement risks: Unauthorized use of published materials mayviolate intellectual property laws

Trang 8

for a systematic, AI-enhanced approach to test design, one that balances efficiencywith precision while adhering to curricular and ethical standards.

2.3 Solutions and implementation

2.3.1 Basic prompt engineering

Through practical experience in developing exam materials with AIassistance, I have identified several effective techniques that can be universallyapplied across DeepSeek, ChatGPT, and Gemini to produce consistent, high-quality results

2.3.1.1 Assign a role.

Assigning a role to an AI language model such as Deepseek, ChatGPT is aprompt engineering technique that enhances the relevance, tone, and contextualappropriateness of its responses By explicitly specifying a role,such as "a highschool English teacher" or "an academic writing tutor", users can guide the model

to adopt the expected voice, knowledge scope, and communicative register alignedwith that persona This approach mirrors principles in discourse analysis andpragmatics, wherein the assumed identity of a speaker influences language choices,formality, and information structure In educational contexts, assigning roles isparticularly effective for generating content that matches the cognitive andlinguistic level of target learners It also reduces ambiguity, enabling the AI toalign its output with the goals and expectations of the user, thereby improving bothaccuracy and pedagogical utility

You are an expert from the Ministry of Education and Training, specializing

in designing high-stakes national English exams for upper secondary students in Vietnam ); "You are an English exam designer working for the Ministry of Education and Training….”; "You are an experienced English teacher who has been preparing students for the national high school graduation exam for over 20 years….

2.3.1.2 Few-shot prompting: Leveraging exemplars for contextual alignment

Few-shot prompting is a critical technique in refining AI-generatedassessment content, particularly when precision in format, difficulty, andalignment with curricular standards is required This approach involves providing

AI models with concrete examples (e.g., past exam papers, sample questions, orannotated answers) to establish a clear reference framework For instance, when

Trang 9

could significantly improve assessment quality while optimizing teachers’workload Thus, exploring AI applications in ELT assessments is not merely atechnological advancement but a pedagogical necessity to foster more valid,reliable, and equitable evaluation practices.

1.2 Research objectives:

This study aims to explore the potential of advanced AI tools, specificallyDeepSeek, ChatGPT, and Gemini 2.5 Flash,to enhance the quality, validity, andreliability of English language assessments at Nguyen Trai High School Theresearch objectives are threefold:

First, to investigate how DeepSeek, a domain-specific AI model optimizedfor educational data analysis, can streamline test creation by automating questiongeneration, and detecting biases or inconsistencies in existing assessments Byleveraging its deep learning algorithms, the study will evaluate its capacity toproduce contextually relevant reading passages, grammar exercises, andvocabulary tasks tailored to the school’s curriculum

Second, to assess the utility of ChatGPT in diversifying test formats whilemaintaining pedagogical rigor This objective focuses on the tool’s ability togenerate open-ended questions, situational dialogues, and critical thinking promptsthat reflect real-world language use Additionally, the study will examineChatGPT’s role in refining teacher-generated questions by rephrasing ambiguousitems, and adjusting difficulty levels

Third, to analyze the efficiency of Gemini 2.5 Flash The research will testits capacity to rapidly generate multiple unique test variants, and adapt questiondifficulty dynamically This objective also evaluates Gemini’s potential to reducecheating risks through algorithmic randomization and real-time plagiarismdetection

Fourth, to develop and share practical strategies for integrating AI into testcreation and evaluation, with a focus on prompt engineering techniques that enableteachers to generate exam-aligned content efficiently This includes designingstandardized command templates for AI tools (DeepSeek, ChatGPT, Gemini 2.5Flash) to produce specific question types required in Nguyen Trai High School’scentralized exams

Trang 10

could significantly improve assessment quality while optimizing teachers’workload Thus, exploring AI applications in ELT assessments is not merely atechnological advancement but a pedagogical necessity to foster more valid,reliable, and equitable evaluation practices.

1.2 Research objectives:

This study aims to explore the potential of advanced AI tools, specificallyDeepSeek, ChatGPT, and Gemini 2.5 Flash,to enhance the quality, validity, andreliability of English language assessments at Nguyen Trai High School Theresearch objectives are threefold:

First, to investigate how DeepSeek, a domain-specific AI model optimizedfor educational data analysis, can streamline test creation by automating questiongeneration, and detecting biases or inconsistencies in existing assessments Byleveraging its deep learning algorithms, the study will evaluate its capacity toproduce contextually relevant reading passages, grammar exercises, andvocabulary tasks tailored to the school’s curriculum

Second, to assess the utility of ChatGPT in diversifying test formats whilemaintaining pedagogical rigor This objective focuses on the tool’s ability togenerate open-ended questions, situational dialogues, and critical thinking promptsthat reflect real-world language use Additionally, the study will examineChatGPT’s role in refining teacher-generated questions by rephrasing ambiguousitems, and adjusting difficulty levels

Third, to analyze the efficiency of Gemini 2.5 Flash The research will testits capacity to rapidly generate multiple unique test variants, and adapt questiondifficulty dynamically This objective also evaluates Gemini’s potential to reducecheating risks through algorithmic randomization and real-time plagiarismdetection

Fourth, to develop and share practical strategies for integrating AI into testcreation and evaluation, with a focus on prompt engineering techniques that enableteachers to generate exam-aligned content efficiently This includes designingstandardized command templates for AI tools (DeepSeek, ChatGPT, Gemini 2.5Flash) to produce specific question types required in Nguyen Trai High School’scentralized exams

Trang 11

generating a reading comprehension test, teachers can input a scanned image ortext-based example of a previous exam that includes a passage, five multiple-choice questions, and an answer key formatted according to Nguyen Trai HighSchool’s standards The AI then analyzes these exemplars to infer structuralpatterns, lexical complexity, and question types, ensuring generated contentadheres to institutional expectations.

2.3.1.3 Applying iterative prompting and task decomposition in assisted teaching and content design

AI-In the process of effectively utilizing large language models such asChatGPT for teaching and educational content development, the technique ofiterative prompting plays a critical role in optimizing output quality This methodinvolves providing an initial prompt, then continuously refining, clarifying, orexpanding the prompt based on the model’s responses, gradually guiding it towardthe desired result Iterative prompting allows educators to better control thecontent, tone, and accuracy of the AI-generated output, while also leveraging themodel’s capacity for reasoning and creativity

Moreover, an important principle when working with AI is thedecomposition of complex tasks into simpler, more manageable steps Languagemodels tend to produce more accurate and coherent responses when instructionsare clear, specific, and broken down into smaller components, rather than beingpresented with a broad or overly complex task This approach not only minimizeserrors but also enables teachers to monitor and adjust the AI’s performance at eachstage When combined, task decomposition and iterative prompting create aneffective interaction loop that enhances the practical use of AI in education,particularly in lesson planning, exam preparation, and individualized studentsupport

2.3.1.4 Enhancing clarity and usability through output formatting prompts

One of the most effective techniques for improving the clarity, consistency,and usability of AI-generated content is the use of output formatting prompts Thisapproach involves explicitly instructing the AI model to present its response in aparticular structure or layout, such as a list, table, bullet points, or standardizedparagraph format, depending on the intended educational use By specifying thedesired format in the prompt, educators can guide the model to produce responses

Trang 12

1.3 Research Subjects

This study focuses on Grade 10 to Grade 12 students at Nguyen Trai HighSchool and the English teachers responsible for designing and administeringassessments It also includes trainee teachers from Hồng Đức Universityparticipating in AI-assisted test creation under supervision The research examineshow these stakeholders engage with AI tools to enhance the quality, fairness, andefficiency of English language testing practices

1.4 Research methods -Scope of the study

This research employs a combination of qualitative and applied researchmethods Data were collected through classroom observations, teacher interviews,AI-generated test analyses, and pilot implementations across multiple classes.Feedback from teachers and students was also gathered to assess practicaleffectiveness and pedagogical relevance

This research focuses exclusively on English language assessments atNguyen Trai High School, targeting the design and evaluation of tests for Periodiccentralized exams (Midterm I, Final I, Midterm II, Final II), 12th-grade benchmarktests, English Olympiads for gifted students

The study examines the application of AI tools (DeepSeek, ChatGPT,Gemini 2.5 Flash) to generate, refine, and validate assessment materials for Grades10–12, adhering to the school’s curriculum framework and Vietnam’s nationalEnglish education standards It does not extend to other subjects or non-academicevaluations

1.5 Innovations of the Initiative

This initiative introduces a systematic approach to AI-assisted test design byapplying advanced prompt engineering techniques across multiple AI platforms(DeepSeek, ChatGPT, Gemini 2.5 Flash) Unlike previous practices that relied onstatic test banks or ad hoc content generation, the initiative develops standardized

AI training protocols aligned with national exam formats and Bloom’s Taxonomy

It is also among the first at the high school level in Vietnam to document areplicable process for generating full-length, curriculum-aligned English testsusing free AI tools, while integrating real-world classroom feedback and teachertraining This dual emphasis on innovation and scalability sets the initiative apartfrom traditional test design models

Trang 13

2 CONTENT 2.1 Theoretical background

The integration of Artificial Intelligence (AI) into educational assessment isgrounded in constructivist and socio-cognitive theories, which emphasize learner-centered approaches and the alignment of evaluation with cognitive development.Bloom’s Taxonomy (1956) provides a hierarchical framework for categorizingeducational objectives—from knowledge recall to higher-order thinking skills such

as application, analysis, and evaluation AI-enhanced assessment tools cansystematically generate and align test items with these cognitive domains,promoting both validity and differentiation Furthermore, the application of promptengineering is rooted in human-computer interaction (HCI) and natural languageprocessing (NLP) theory, enabling educators to effectively guide AI systems toproduce contextually relevant and pedagogically sound outputs National policies,such as Vietnam’s Resolution No 57-NQ/TW (2024) and MOET’s Directive4324/BGDĐT-CNTT (2024), reaffirm the strategic role of AI and digitaltransformation in education, urging schools to harness technology as a means ofinnovation, quality improvement, and sustainable development

2.2 Current situation

At Nguyen Trai High School, with its 32 classes and over 1,400 students, theEnglish Group faces significant challenges in designing and administeringassessments The current testing framework includes Periodic centralizedexaminations (Midterm I, Final I, Midterm II, and Final II); three benchmark testsfor 12th-grade students; three annual English Olympiads for gifted students

Given the scale of assessment demands, the Group of English (comprisingeight teachers) primarily relies on traditional test construction methods, whichpresent several critical limitations as follows:

Heavy dependence on external sources

Test materials are often compiled by copying and pasting sections fromexisting exams sourced from other schools or online teaching repositories Whilethis approach reduces preparation time, it introduces multiple issues:

Copyright infringement risks: Unauthorized use of published materials mayviolate intellectual property laws

Trang 14

generating a reading comprehension test, teachers can input a scanned image ortext-based example of a previous exam that includes a passage, five multiple-choice questions, and an answer key formatted according to Nguyen Trai HighSchool’s standards The AI then analyzes these exemplars to infer structuralpatterns, lexical complexity, and question types, ensuring generated contentadheres to institutional expectations.

2.3.1.3 Applying iterative prompting and task decomposition in assisted teaching and content design

AI-In the process of effectively utilizing large language models such asChatGPT for teaching and educational content development, the technique ofiterative prompting plays a critical role in optimizing output quality This methodinvolves providing an initial prompt, then continuously refining, clarifying, orexpanding the prompt based on the model’s responses, gradually guiding it towardthe desired result Iterative prompting allows educators to better control thecontent, tone, and accuracy of the AI-generated output, while also leveraging themodel’s capacity for reasoning and creativity

Moreover, an important principle when working with AI is thedecomposition of complex tasks into simpler, more manageable steps Languagemodels tend to produce more accurate and coherent responses when instructionsare clear, specific, and broken down into smaller components, rather than beingpresented with a broad or overly complex task This approach not only minimizeserrors but also enables teachers to monitor and adjust the AI’s performance at eachstage When combined, task decomposition and iterative prompting create aneffective interaction loop that enhances the practical use of AI in education,particularly in lesson planning, exam preparation, and individualized studentsupport

2.3.1.4 Enhancing clarity and usability through output formatting prompts

One of the most effective techniques for improving the clarity, consistency,and usability of AI-generated content is the use of output formatting prompts Thisapproach involves explicitly instructing the AI model to present its response in aparticular structure or layout, such as a list, table, bullet points, or standardizedparagraph format, depending on the intended educational use By specifying thedesired format in the prompt, educators can guide the model to produce responses

Trang 15

Academic inconsistencies: Borrowed content may not align with theschool’s curriculum, leading to mismatches in difficulty levels, lexical scope, orgrammatical focus.

Security vulnerabilities: Students can exploit online availability of reusedquestions, searching for answers in advance or sharing them across peer networks

Limited customization and quality control

Even when teachers modify sourced materials, the process remains consuming and prone to errors, such as unintentional duplication of flawed oroutdated questions, inadequate differentiation between proficiency levels (e.g., A2

time-vs B1 CEFR benchmarks), contextual misalignment with Nguyen Trai HighSchool’s learning objectives

Ad Hoc experimentation with AI tools

A small number of teachers have begun experimenting with AI-generatedquestions using tools such as ChatGPT or Quizizz AI However, initialimplementations have exposed several notable limitations First, the accuracy of AIoutputs remains inconsistent, with some questions exhibiting linguisticallyunnatural phrasing or grammatical errors Second, there is a misalignment betweenthe generated content and the actual curriculum, as AI tools often fail to accountfor the school’s syllabus or localized pedagogical objectives Finally, an over-reliance on AI-generated material without proper human verification can lead toquestionable validity, particularly in the context of high-stakes assessments

Consequences of current practices

The reliance on outdated and inefficient assessment methods has led to threecritical repercussions First, reduced assessment validity persists because testsfrequently fail to align with intended learning outcomes, resulting in evaluationsthat inaccurately measure student proficiency Second, increased workload burdensteachers, who must dedicate excessive time to manually identify and correct errors

in externally sourced or AI-generated materials, diverting energy frominstructional innovation Finally, academic integrity is compromised, as studentsexploit recycled or easily accessible online content, undermining the credibility ofassessments and fostering a culture of shortcuts over genuine learning.Collectively, these issues highlight systemic flaws that hinder educational quality,teacher efficacy, and student accountability, necessitating urgent reforms to restorerigor and fairness in evaluation practices.This context underscores the urgent need

Trang 16

generating a reading comprehension test, teachers can input a scanned image ortext-based example of a previous exam that includes a passage, five multiple-choice questions, and an answer key formatted according to Nguyen Trai HighSchool’s standards The AI then analyzes these exemplars to infer structuralpatterns, lexical complexity, and question types, ensuring generated contentadheres to institutional expectations.

2.3.1.3 Applying iterative prompting and task decomposition in assisted teaching and content design

AI-In the process of effectively utilizing large language models such asChatGPT for teaching and educational content development, the technique ofiterative prompting plays a critical role in optimizing output quality This methodinvolves providing an initial prompt, then continuously refining, clarifying, orexpanding the prompt based on the model’s responses, gradually guiding it towardthe desired result Iterative prompting allows educators to better control thecontent, tone, and accuracy of the AI-generated output, while also leveraging themodel’s capacity for reasoning and creativity

Moreover, an important principle when working with AI is thedecomposition of complex tasks into simpler, more manageable steps Languagemodels tend to produce more accurate and coherent responses when instructionsare clear, specific, and broken down into smaller components, rather than beingpresented with a broad or overly complex task This approach not only minimizeserrors but also enables teachers to monitor and adjust the AI’s performance at eachstage When combined, task decomposition and iterative prompting create aneffective interaction loop that enhances the practical use of AI in education,particularly in lesson planning, exam preparation, and individualized studentsupport

2.3.1.4 Enhancing clarity and usability through output formatting prompts

One of the most effective techniques for improving the clarity, consistency,and usability of AI-generated content is the use of output formatting prompts Thisapproach involves explicitly instructing the AI model to present its response in aparticular structure or layout, such as a list, table, bullet points, or standardizedparagraph format, depending on the intended educational use By specifying thedesired format in the prompt, educators can guide the model to produce responses

Trang 17

Academic inconsistencies: Borrowed content may not align with theschool’s curriculum, leading to mismatches in difficulty levels, lexical scope, orgrammatical focus.

Security vulnerabilities: Students can exploit online availability of reusedquestions, searching for answers in advance or sharing them across peer networks

Limited customization and quality control

Even when teachers modify sourced materials, the process remains consuming and prone to errors, such as unintentional duplication of flawed oroutdated questions, inadequate differentiation between proficiency levels (e.g., A2

time-vs B1 CEFR benchmarks), contextual misalignment with Nguyen Trai HighSchool’s learning objectives

Ad Hoc experimentation with AI tools

A small number of teachers have begun experimenting with AI-generatedquestions using tools such as ChatGPT or Quizizz AI However, initialimplementations have exposed several notable limitations First, the accuracy of AIoutputs remains inconsistent, with some questions exhibiting linguisticallyunnatural phrasing or grammatical errors Second, there is a misalignment betweenthe generated content and the actual curriculum, as AI tools often fail to accountfor the school’s syllabus or localized pedagogical objectives Finally, an over-reliance on AI-generated material without proper human verification can lead toquestionable validity, particularly in the context of high-stakes assessments

Consequences of current practices

The reliance on outdated and inefficient assessment methods has led to threecritical repercussions First, reduced assessment validity persists because testsfrequently fail to align with intended learning outcomes, resulting in evaluationsthat inaccurately measure student proficiency Second, increased workload burdensteachers, who must dedicate excessive time to manually identify and correct errors

in externally sourced or AI-generated materials, diverting energy frominstructional innovation Finally, academic integrity is compromised, as studentsexploit recycled or easily accessible online content, undermining the credibility ofassessments and fostering a culture of shortcuts over genuine learning.Collectively, these issues highlight systemic flaws that hinder educational quality,teacher efficacy, and student accountability, necessitating urgent reforms to restorerigor and fairness in evaluation practices.This context underscores the urgent need

Trang 18

for a systematic, AI-enhanced approach to test design, one that balances efficiencywith precision while adhering to curricular and ethical standards.

2.3 Solutions and implementation

2.3.1 Basic prompt engineering

Through practical experience in developing exam materials with AIassistance, I have identified several effective techniques that can be universallyapplied across DeepSeek, ChatGPT, and Gemini to produce consistent, high-quality results

2.3.1.1 Assign a role.

Assigning a role to an AI language model such as Deepseek, ChatGPT is aprompt engineering technique that enhances the relevance, tone, and contextualappropriateness of its responses By explicitly specifying a role,such as "a highschool English teacher" or "an academic writing tutor", users can guide the model

to adopt the expected voice, knowledge scope, and communicative register alignedwith that persona This approach mirrors principles in discourse analysis andpragmatics, wherein the assumed identity of a speaker influences language choices,formality, and information structure In educational contexts, assigning roles isparticularly effective for generating content that matches the cognitive andlinguistic level of target learners It also reduces ambiguity, enabling the AI toalign its output with the goals and expectations of the user, thereby improving bothaccuracy and pedagogical utility

You are an expert from the Ministry of Education and Training, specializing

in designing high-stakes national English exams for upper secondary students in Vietnam ); "You are an English exam designer working for the Ministry of Education and Training….”; "You are an experienced English teacher who has been preparing students for the national high school graduation exam for over 20 years….

2.3.1.2 Few-shot prompting: Leveraging exemplars for contextual alignment

Few-shot prompting is a critical technique in refining AI-generatedassessment content, particularly when precision in format, difficulty, andalignment with curricular standards is required This approach involves providing

AI models with concrete examples (e.g., past exam papers, sample questions, orannotated answers) to establish a clear reference framework For instance, when

Trang 19

Academic inconsistencies: Borrowed content may not align with theschool’s curriculum, leading to mismatches in difficulty levels, lexical scope, orgrammatical focus.

Security vulnerabilities: Students can exploit online availability of reusedquestions, searching for answers in advance or sharing them across peer networks

Limited customization and quality control

Even when teachers modify sourced materials, the process remains consuming and prone to errors, such as unintentional duplication of flawed oroutdated questions, inadequate differentiation between proficiency levels (e.g., A2

time-vs B1 CEFR benchmarks), contextual misalignment with Nguyen Trai HighSchool’s learning objectives

Ad Hoc experimentation with AI tools

A small number of teachers have begun experimenting with AI-generatedquestions using tools such as ChatGPT or Quizizz AI However, initialimplementations have exposed several notable limitations First, the accuracy of AIoutputs remains inconsistent, with some questions exhibiting linguisticallyunnatural phrasing or grammatical errors Second, there is a misalignment betweenthe generated content and the actual curriculum, as AI tools often fail to accountfor the school’s syllabus or localized pedagogical objectives Finally, an over-reliance on AI-generated material without proper human verification can lead toquestionable validity, particularly in the context of high-stakes assessments

Consequences of current practices

The reliance on outdated and inefficient assessment methods has led to threecritical repercussions First, reduced assessment validity persists because testsfrequently fail to align with intended learning outcomes, resulting in evaluationsthat inaccurately measure student proficiency Second, increased workload burdensteachers, who must dedicate excessive time to manually identify and correct errors

in externally sourced or AI-generated materials, diverting energy frominstructional innovation Finally, academic integrity is compromised, as studentsexploit recycled or easily accessible online content, undermining the credibility ofassessments and fostering a culture of shortcuts over genuine learning.Collectively, these issues highlight systemic flaws that hinder educational quality,teacher efficacy, and student accountability, necessitating urgent reforms to restorerigor and fairness in evaluation practices.This context underscores the urgent need

Trang 20

1.3 Research Subjects

This study focuses on Grade 10 to Grade 12 students at Nguyen Trai HighSchool and the English teachers responsible for designing and administeringassessments It also includes trainee teachers from Hồng Đức Universityparticipating in AI-assisted test creation under supervision The research examineshow these stakeholders engage with AI tools to enhance the quality, fairness, andefficiency of English language testing practices

1.4 Research methods -Scope of the study

This research employs a combination of qualitative and applied researchmethods Data were collected through classroom observations, teacher interviews,AI-generated test analyses, and pilot implementations across multiple classes.Feedback from teachers and students was also gathered to assess practicaleffectiveness and pedagogical relevance

This research focuses exclusively on English language assessments atNguyen Trai High School, targeting the design and evaluation of tests for Periodiccentralized exams (Midterm I, Final I, Midterm II, Final II), 12th-grade benchmarktests, English Olympiads for gifted students

The study examines the application of AI tools (DeepSeek, ChatGPT,Gemini 2.5 Flash) to generate, refine, and validate assessment materials for Grades10–12, adhering to the school’s curriculum framework and Vietnam’s nationalEnglish education standards It does not extend to other subjects or non-academicevaluations

1.5 Innovations of the Initiative

This initiative introduces a systematic approach to AI-assisted test design byapplying advanced prompt engineering techniques across multiple AI platforms(DeepSeek, ChatGPT, Gemini 2.5 Flash) Unlike previous practices that relied onstatic test banks or ad hoc content generation, the initiative develops standardized

AI training protocols aligned with national exam formats and Bloom’s Taxonomy

It is also among the first at the high school level in Vietnam to document areplicable process for generating full-length, curriculum-aligned English testsusing free AI tools, while integrating real-world classroom feedback and teachertraining This dual emphasis on innovation and scalability sets the initiative apartfrom traditional test design models

Trang 21

could significantly improve assessment quality while optimizing teachers’workload Thus, exploring AI applications in ELT assessments is not merely atechnological advancement but a pedagogical necessity to foster more valid,reliable, and equitable evaluation practices.

1.2 Research objectives:

This study aims to explore the potential of advanced AI tools, specificallyDeepSeek, ChatGPT, and Gemini 2.5 Flash,to enhance the quality, validity, andreliability of English language assessments at Nguyen Trai High School Theresearch objectives are threefold:

First, to investigate how DeepSeek, a domain-specific AI model optimizedfor educational data analysis, can streamline test creation by automating questiongeneration, and detecting biases or inconsistencies in existing assessments Byleveraging its deep learning algorithms, the study will evaluate its capacity toproduce contextually relevant reading passages, grammar exercises, andvocabulary tasks tailored to the school’s curriculum

Second, to assess the utility of ChatGPT in diversifying test formats whilemaintaining pedagogical rigor This objective focuses on the tool’s ability togenerate open-ended questions, situational dialogues, and critical thinking promptsthat reflect real-world language use Additionally, the study will examineChatGPT’s role in refining teacher-generated questions by rephrasing ambiguousitems, and adjusting difficulty levels

Third, to analyze the efficiency of Gemini 2.5 Flash The research will testits capacity to rapidly generate multiple unique test variants, and adapt questiondifficulty dynamically This objective also evaluates Gemini’s potential to reducecheating risks through algorithmic randomization and real-time plagiarismdetection

Fourth, to develop and share practical strategies for integrating AI into testcreation and evaluation, with a focus on prompt engineering techniques that enableteachers to generate exam-aligned content efficiently This includes designingstandardized command templates for AI tools (DeepSeek, ChatGPT, Gemini 2.5Flash) to produce specific question types required in Nguyen Trai High School’scentralized exams

Trang 22

Academic inconsistencies: Borrowed content may not align with theschool’s curriculum, leading to mismatches in difficulty levels, lexical scope, orgrammatical focus.

Security vulnerabilities: Students can exploit online availability of reusedquestions, searching for answers in advance or sharing them across peer networks

Limited customization and quality control

Even when teachers modify sourced materials, the process remains consuming and prone to errors, such as unintentional duplication of flawed oroutdated questions, inadequate differentiation between proficiency levels (e.g., A2

time-vs B1 CEFR benchmarks), contextual misalignment with Nguyen Trai HighSchool’s learning objectives

Ad Hoc experimentation with AI tools

A small number of teachers have begun experimenting with AI-generatedquestions using tools such as ChatGPT or Quizizz AI However, initialimplementations have exposed several notable limitations First, the accuracy of AIoutputs remains inconsistent, with some questions exhibiting linguisticallyunnatural phrasing or grammatical errors Second, there is a misalignment betweenthe generated content and the actual curriculum, as AI tools often fail to accountfor the school’s syllabus or localized pedagogical objectives Finally, an over-reliance on AI-generated material without proper human verification can lead toquestionable validity, particularly in the context of high-stakes assessments

Consequences of current practices

The reliance on outdated and inefficient assessment methods has led to threecritical repercussions First, reduced assessment validity persists because testsfrequently fail to align with intended learning outcomes, resulting in evaluationsthat inaccurately measure student proficiency Second, increased workload burdensteachers, who must dedicate excessive time to manually identify and correct errors

in externally sourced or AI-generated materials, diverting energy frominstructional innovation Finally, academic integrity is compromised, as studentsexploit recycled or easily accessible online content, undermining the credibility ofassessments and fostering a culture of shortcuts over genuine learning.Collectively, these issues highlight systemic flaws that hinder educational quality,teacher efficacy, and student accountability, necessitating urgent reforms to restorerigor and fairness in evaluation practices.This context underscores the urgent need

Trang 23

2 CONTENT 2.1 Theoretical background

The integration of Artificial Intelligence (AI) into educational assessment isgrounded in constructivist and socio-cognitive theories, which emphasize learner-centered approaches and the alignment of evaluation with cognitive development.Bloom’s Taxonomy (1956) provides a hierarchical framework for categorizingeducational objectives—from knowledge recall to higher-order thinking skills such

as application, analysis, and evaluation AI-enhanced assessment tools cansystematically generate and align test items with these cognitive domains,promoting both validity and differentiation Furthermore, the application of promptengineering is rooted in human-computer interaction (HCI) and natural languageprocessing (NLP) theory, enabling educators to effectively guide AI systems toproduce contextually relevant and pedagogically sound outputs National policies,such as Vietnam’s Resolution No 57-NQ/TW (2024) and MOET’s Directive4324/BGDĐT-CNTT (2024), reaffirm the strategic role of AI and digitaltransformation in education, urging schools to harness technology as a means ofinnovation, quality improvement, and sustainable development

2.2 Current situation

At Nguyen Trai High School, with its 32 classes and over 1,400 students, theEnglish Group faces significant challenges in designing and administeringassessments The current testing framework includes Periodic centralizedexaminations (Midterm I, Final I, Midterm II, and Final II); three benchmark testsfor 12th-grade students; three annual English Olympiads for gifted students

Given the scale of assessment demands, the Group of English (comprisingeight teachers) primarily relies on traditional test construction methods, whichpresent several critical limitations as follows:

Heavy dependence on external sources

Test materials are often compiled by copying and pasting sections fromexisting exams sourced from other schools or online teaching repositories Whilethis approach reduces preparation time, it introduces multiple issues:

Copyright infringement risks: Unauthorized use of published materials mayviolate intellectual property laws

Trang 24

for a systematic, AI-enhanced approach to test design, one that balances efficiencywith precision while adhering to curricular and ethical standards.

2.3 Solutions and implementation

2.3.1 Basic prompt engineering

Through practical experience in developing exam materials with AIassistance, I have identified several effective techniques that can be universallyapplied across DeepSeek, ChatGPT, and Gemini to produce consistent, high-quality results

2.3.1.1 Assign a role.

Assigning a role to an AI language model such as Deepseek, ChatGPT is aprompt engineering technique that enhances the relevance, tone, and contextualappropriateness of its responses By explicitly specifying a role,such as "a highschool English teacher" or "an academic writing tutor", users can guide the model

to adopt the expected voice, knowledge scope, and communicative register alignedwith that persona This approach mirrors principles in discourse analysis andpragmatics, wherein the assumed identity of a speaker influences language choices,formality, and information structure In educational contexts, assigning roles isparticularly effective for generating content that matches the cognitive andlinguistic level of target learners It also reduces ambiguity, enabling the AI toalign its output with the goals and expectations of the user, thereby improving bothaccuracy and pedagogical utility

You are an expert from the Ministry of Education and Training, specializing

in designing high-stakes national English exams for upper secondary students in Vietnam ); "You are an English exam designer working for the Ministry of Education and Training….”; "You are an experienced English teacher who has been preparing students for the national high school graduation exam for over 20 years….

2.3.1.2 Few-shot prompting: Leveraging exemplars for contextual alignment

Few-shot prompting is a critical technique in refining AI-generatedassessment content, particularly when precision in format, difficulty, andalignment with curricular standards is required This approach involves providing

AI models with concrete examples (e.g., past exam papers, sample questions, orannotated answers) to establish a clear reference framework For instance, when

Trang 25

Academic inconsistencies: Borrowed content may not align with theschool’s curriculum, leading to mismatches in difficulty levels, lexical scope, orgrammatical focus.

Security vulnerabilities: Students can exploit online availability of reusedquestions, searching for answers in advance or sharing them across peer networks

Limited customization and quality control

Even when teachers modify sourced materials, the process remains consuming and prone to errors, such as unintentional duplication of flawed oroutdated questions, inadequate differentiation between proficiency levels (e.g., A2

time-vs B1 CEFR benchmarks), contextual misalignment with Nguyen Trai HighSchool’s learning objectives

Ad Hoc experimentation with AI tools

A small number of teachers have begun experimenting with AI-generatedquestions using tools such as ChatGPT or Quizizz AI However, initialimplementations have exposed several notable limitations First, the accuracy of AIoutputs remains inconsistent, with some questions exhibiting linguisticallyunnatural phrasing or grammatical errors Second, there is a misalignment betweenthe generated content and the actual curriculum, as AI tools often fail to accountfor the school’s syllabus or localized pedagogical objectives Finally, an over-reliance on AI-generated material without proper human verification can lead toquestionable validity, particularly in the context of high-stakes assessments

Consequences of current practices

The reliance on outdated and inefficient assessment methods has led to threecritical repercussions First, reduced assessment validity persists because testsfrequently fail to align with intended learning outcomes, resulting in evaluationsthat inaccurately measure student proficiency Second, increased workload burdensteachers, who must dedicate excessive time to manually identify and correct errors

in externally sourced or AI-generated materials, diverting energy frominstructional innovation Finally, academic integrity is compromised, as studentsexploit recycled or easily accessible online content, undermining the credibility ofassessments and fostering a culture of shortcuts over genuine learning.Collectively, these issues highlight systemic flaws that hinder educational quality,teacher efficacy, and student accountability, necessitating urgent reforms to restorerigor and fairness in evaluation practices.This context underscores the urgent need

Trang 26

2 CONTENT 2.1 Theoretical background

The integration of Artificial Intelligence (AI) into educational assessment isgrounded in constructivist and socio-cognitive theories, which emphasize learner-centered approaches and the alignment of evaluation with cognitive development.Bloom’s Taxonomy (1956) provides a hierarchical framework for categorizingeducational objectives—from knowledge recall to higher-order thinking skills such

as application, analysis, and evaluation AI-enhanced assessment tools cansystematically generate and align test items with these cognitive domains,promoting both validity and differentiation Furthermore, the application of promptengineering is rooted in human-computer interaction (HCI) and natural languageprocessing (NLP) theory, enabling educators to effectively guide AI systems toproduce contextually relevant and pedagogically sound outputs National policies,such as Vietnam’s Resolution No 57-NQ/TW (2024) and MOET’s Directive4324/BGDĐT-CNTT (2024), reaffirm the strategic role of AI and digitaltransformation in education, urging schools to harness technology as a means ofinnovation, quality improvement, and sustainable development

2.2 Current situation

At Nguyen Trai High School, with its 32 classes and over 1,400 students, theEnglish Group faces significant challenges in designing and administeringassessments The current testing framework includes Periodic centralizedexaminations (Midterm I, Final I, Midterm II, and Final II); three benchmark testsfor 12th-grade students; three annual English Olympiads for gifted students

Given the scale of assessment demands, the Group of English (comprisingeight teachers) primarily relies on traditional test construction methods, whichpresent several critical limitations as follows:

Heavy dependence on external sources

Test materials are often compiled by copying and pasting sections fromexisting exams sourced from other schools or online teaching repositories Whilethis approach reduces preparation time, it introduces multiple issues:

Copyright infringement risks: Unauthorized use of published materials mayviolate intellectual property laws

Trang 27

that are not only relevant in content but also organized in a way that aligns withclassroom needs, lesson objectives, or assessment criteria.

The benefits of this technique are particularly notable in educationalcontexts where structured information enhances student comprehension, supportscomparative analysis, or facilitates easier integration into teaching materials Forexample, requesting vocabulary to be listed in a table with columns for word,meaning, and example sentence helps students visualize and internalizeinformation more effectively Moreover, output formatting promotes consistencyacross different prompts, making it easier for teachers to review and adapt AI-generated content Overall, this technique increases the practicality of AI tools ineducation by improving the readability, accessibility, and pedagogical value of thegenerated outputs

2.3.1.5 Enhancing English test design through step-by-step prompting

In the context of improving English test design with the support of artificialintelligence, the use of step-by-step prompting offers significant advantages interms of clarity, precision, and pedagogical value This technique involvesinstructing the AI to generate responses or perform tasks by breaking them downinto clear, logical steps rather than providing a complete answer all at once Whenapplied to the process of creating English test questions, such as erroridentification, sentence transformation, or reading comprehension, this methodhelps ensure that each component of the task is addressed systematically andaccurately

By guiding the AI to explain its reasoning or construction process step bystep, educators can more easily monitor the appropriateness of the language, thealignment with curriculum standards, and the cognitive level of each question Forexample, when designing a sentence transformation item, prompting the AI to firstanalyze the grammatical structure, then identify the transformation rule, and finallyconstruct the answer ensures a more valid and instructional output Thisincremental approach not only minimizes errors but also makes the test generationprocess more transparent and customizable Ultimately, step-by-step promptingempowers teachers to harness AI effectively in designing high-quality Englishassessments that better support student learning and exam readiness

Trang 28

generating a reading comprehension test, teachers can input a scanned image ortext-based example of a previous exam that includes a passage, five multiple-choice questions, and an answer key formatted according to Nguyen Trai HighSchool’s standards The AI then analyzes these exemplars to infer structuralpatterns, lexical complexity, and question types, ensuring generated contentadheres to institutional expectations.

2.3.1.3 Applying iterative prompting and task decomposition in assisted teaching and content design

AI-In the process of effectively utilizing large language models such asChatGPT for teaching and educational content development, the technique ofiterative prompting plays a critical role in optimizing output quality This methodinvolves providing an initial prompt, then continuously refining, clarifying, orexpanding the prompt based on the model’s responses, gradually guiding it towardthe desired result Iterative prompting allows educators to better control thecontent, tone, and accuracy of the AI-generated output, while also leveraging themodel’s capacity for reasoning and creativity

Moreover, an important principle when working with AI is thedecomposition of complex tasks into simpler, more manageable steps Languagemodels tend to produce more accurate and coherent responses when instructionsare clear, specific, and broken down into smaller components, rather than beingpresented with a broad or overly complex task This approach not only minimizeserrors but also enables teachers to monitor and adjust the AI’s performance at eachstage When combined, task decomposition and iterative prompting create aneffective interaction loop that enhances the practical use of AI in education,particularly in lesson planning, exam preparation, and individualized studentsupport

2.3.1.4 Enhancing clarity and usability through output formatting prompts

One of the most effective techniques for improving the clarity, consistency,and usability of AI-generated content is the use of output formatting prompts Thisapproach involves explicitly instructing the AI model to present its response in aparticular structure or layout, such as a list, table, bullet points, or standardizedparagraph format, depending on the intended educational use By specifying thedesired format in the prompt, educators can guide the model to produce responses

Trang 29

2 CONTENT 2.1 Theoretical background

The integration of Artificial Intelligence (AI) into educational assessment isgrounded in constructivist and socio-cognitive theories, which emphasize learner-centered approaches and the alignment of evaluation with cognitive development.Bloom’s Taxonomy (1956) provides a hierarchical framework for categorizingeducational objectives—from knowledge recall to higher-order thinking skills such

as application, analysis, and evaluation AI-enhanced assessment tools cansystematically generate and align test items with these cognitive domains,promoting both validity and differentiation Furthermore, the application of promptengineering is rooted in human-computer interaction (HCI) and natural languageprocessing (NLP) theory, enabling educators to effectively guide AI systems toproduce contextually relevant and pedagogically sound outputs National policies,such as Vietnam’s Resolution No 57-NQ/TW (2024) and MOET’s Directive4324/BGDĐT-CNTT (2024), reaffirm the strategic role of AI and digitaltransformation in education, urging schools to harness technology as a means ofinnovation, quality improvement, and sustainable development

2.2 Current situation

At Nguyen Trai High School, with its 32 classes and over 1,400 students, theEnglish Group faces significant challenges in designing and administeringassessments The current testing framework includes Periodic centralizedexaminations (Midterm I, Final I, Midterm II, and Final II); three benchmark testsfor 12th-grade students; three annual English Olympiads for gifted students

Given the scale of assessment demands, the Group of English (comprisingeight teachers) primarily relies on traditional test construction methods, whichpresent several critical limitations as follows:

Heavy dependence on external sources

Test materials are often compiled by copying and pasting sections fromexisting exams sourced from other schools or online teaching repositories Whilethis approach reduces preparation time, it introduces multiple issues:

Copyright infringement risks: Unauthorized use of published materials mayviolate intellectual property laws

Trang 30

could significantly improve assessment quality while optimizing teachers’workload Thus, exploring AI applications in ELT assessments is not merely atechnological advancement but a pedagogical necessity to foster more valid,reliable, and equitable evaluation practices.

1.2 Research objectives:

This study aims to explore the potential of advanced AI tools, specificallyDeepSeek, ChatGPT, and Gemini 2.5 Flash,to enhance the quality, validity, andreliability of English language assessments at Nguyen Trai High School Theresearch objectives are threefold:

First, to investigate how DeepSeek, a domain-specific AI model optimizedfor educational data analysis, can streamline test creation by automating questiongeneration, and detecting biases or inconsistencies in existing assessments Byleveraging its deep learning algorithms, the study will evaluate its capacity toproduce contextually relevant reading passages, grammar exercises, andvocabulary tasks tailored to the school’s curriculum

Second, to assess the utility of ChatGPT in diversifying test formats whilemaintaining pedagogical rigor This objective focuses on the tool’s ability togenerate open-ended questions, situational dialogues, and critical thinking promptsthat reflect real-world language use Additionally, the study will examineChatGPT’s role in refining teacher-generated questions by rephrasing ambiguousitems, and adjusting difficulty levels

Third, to analyze the efficiency of Gemini 2.5 Flash The research will testits capacity to rapidly generate multiple unique test variants, and adapt questiondifficulty dynamically This objective also evaluates Gemini’s potential to reducecheating risks through algorithmic randomization and real-time plagiarismdetection

Fourth, to develop and share practical strategies for integrating AI into testcreation and evaluation, with a focus on prompt engineering techniques that enableteachers to generate exam-aligned content efficiently This includes designingstandardized command templates for AI tools (DeepSeek, ChatGPT, Gemini 2.5Flash) to produce specific question types required in Nguyen Trai High School’scentralized exams

Trang 31

for a systematic, AI-enhanced approach to test design, one that balances efficiencywith precision while adhering to curricular and ethical standards.

2.3 Solutions and implementation

2.3.1 Basic prompt engineering

Through practical experience in developing exam materials with AIassistance, I have identified several effective techniques that can be universallyapplied across DeepSeek, ChatGPT, and Gemini to produce consistent, high-quality results

2.3.1.1 Assign a role.

Assigning a role to an AI language model such as Deepseek, ChatGPT is aprompt engineering technique that enhances the relevance, tone, and contextualappropriateness of its responses By explicitly specifying a role,such as "a highschool English teacher" or "an academic writing tutor", users can guide the model

to adopt the expected voice, knowledge scope, and communicative register alignedwith that persona This approach mirrors principles in discourse analysis andpragmatics, wherein the assumed identity of a speaker influences language choices,formality, and information structure In educational contexts, assigning roles isparticularly effective for generating content that matches the cognitive andlinguistic level of target learners It also reduces ambiguity, enabling the AI toalign its output with the goals and expectations of the user, thereby improving bothaccuracy and pedagogical utility

You are an expert from the Ministry of Education and Training, specializing

in designing high-stakes national English exams for upper secondary students in Vietnam ); "You are an English exam designer working for the Ministry of Education and Training….”; "You are an experienced English teacher who has been preparing students for the national high school graduation exam for over 20 years….

2.3.1.2 Few-shot prompting: Leveraging exemplars for contextual alignment

Few-shot prompting is a critical technique in refining AI-generatedassessment content, particularly when precision in format, difficulty, andalignment with curricular standards is required This approach involves providing

AI models with concrete examples (e.g., past exam papers, sample questions, orannotated answers) to establish a clear reference framework For instance, when

Trang 32

that are not only relevant in content but also organized in a way that aligns withclassroom needs, lesson objectives, or assessment criteria.

The benefits of this technique are particularly notable in educationalcontexts where structured information enhances student comprehension, supportscomparative analysis, or facilitates easier integration into teaching materials Forexample, requesting vocabulary to be listed in a table with columns for word,meaning, and example sentence helps students visualize and internalizeinformation more effectively Moreover, output formatting promotes consistencyacross different prompts, making it easier for teachers to review and adapt AI-generated content Overall, this technique increases the practicality of AI tools ineducation by improving the readability, accessibility, and pedagogical value of thegenerated outputs

2.3.1.5 Enhancing English test design through step-by-step prompting

In the context of improving English test design with the support of artificialintelligence, the use of step-by-step prompting offers significant advantages interms of clarity, precision, and pedagogical value This technique involvesinstructing the AI to generate responses or perform tasks by breaking them downinto clear, logical steps rather than providing a complete answer all at once Whenapplied to the process of creating English test questions, such as erroridentification, sentence transformation, or reading comprehension, this methodhelps ensure that each component of the task is addressed systematically andaccurately

By guiding the AI to explain its reasoning or construction process step bystep, educators can more easily monitor the appropriateness of the language, thealignment with curriculum standards, and the cognitive level of each question Forexample, when designing a sentence transformation item, prompting the AI to firstanalyze the grammatical structure, then identify the transformation rule, and finallyconstruct the answer ensures a more valid and instructional output Thisincremental approach not only minimizes errors but also makes the test generationprocess more transparent and customizable Ultimately, step-by-step promptingempowers teachers to harness AI effectively in designing high-quality Englishassessments that better support student learning and exam readiness

Trang 33

generating a reading comprehension test, teachers can input a scanned image ortext-based example of a previous exam that includes a passage, five multiple-choice questions, and an answer key formatted according to Nguyen Trai HighSchool’s standards The AI then analyzes these exemplars to infer structuralpatterns, lexical complexity, and question types, ensuring generated contentadheres to institutional expectations.

2.3.1.3 Applying iterative prompting and task decomposition in assisted teaching and content design

AI-In the process of effectively utilizing large language models such asChatGPT for teaching and educational content development, the technique ofiterative prompting plays a critical role in optimizing output quality This methodinvolves providing an initial prompt, then continuously refining, clarifying, orexpanding the prompt based on the model’s responses, gradually guiding it towardthe desired result Iterative prompting allows educators to better control thecontent, tone, and accuracy of the AI-generated output, while also leveraging themodel’s capacity for reasoning and creativity

Moreover, an important principle when working with AI is thedecomposition of complex tasks into simpler, more manageable steps Languagemodels tend to produce more accurate and coherent responses when instructionsare clear, specific, and broken down into smaller components, rather than beingpresented with a broad or overly complex task This approach not only minimizeserrors but also enables teachers to monitor and adjust the AI’s performance at eachstage When combined, task decomposition and iterative prompting create aneffective interaction loop that enhances the practical use of AI in education,particularly in lesson planning, exam preparation, and individualized studentsupport

2.3.1.4 Enhancing clarity and usability through output formatting prompts

One of the most effective techniques for improving the clarity, consistency,and usability of AI-generated content is the use of output formatting prompts Thisapproach involves explicitly instructing the AI model to present its response in aparticular structure or layout, such as a list, table, bullet points, or standardizedparagraph format, depending on the intended educational use By specifying thedesired format in the prompt, educators can guide the model to produce responses

Trang 34

could significantly improve assessment quality while optimizing teachers’workload Thus, exploring AI applications in ELT assessments is not merely atechnological advancement but a pedagogical necessity to foster more valid,reliable, and equitable evaluation practices.

1.2 Research objectives:

This study aims to explore the potential of advanced AI tools, specificallyDeepSeek, ChatGPT, and Gemini 2.5 Flash,to enhance the quality, validity, andreliability of English language assessments at Nguyen Trai High School Theresearch objectives are threefold:

First, to investigate how DeepSeek, a domain-specific AI model optimizedfor educational data analysis, can streamline test creation by automating questiongeneration, and detecting biases or inconsistencies in existing assessments Byleveraging its deep learning algorithms, the study will evaluate its capacity toproduce contextually relevant reading passages, grammar exercises, andvocabulary tasks tailored to the school’s curriculum

Second, to assess the utility of ChatGPT in diversifying test formats whilemaintaining pedagogical rigor This objective focuses on the tool’s ability togenerate open-ended questions, situational dialogues, and critical thinking promptsthat reflect real-world language use Additionally, the study will examineChatGPT’s role in refining teacher-generated questions by rephrasing ambiguousitems, and adjusting difficulty levels

Third, to analyze the efficiency of Gemini 2.5 Flash The research will testits capacity to rapidly generate multiple unique test variants, and adapt questiondifficulty dynamically This objective also evaluates Gemini’s potential to reducecheating risks through algorithmic randomization and real-time plagiarismdetection

Fourth, to develop and share practical strategies for integrating AI into testcreation and evaluation, with a focus on prompt engineering techniques that enableteachers to generate exam-aligned content efficiently This includes designingstandardized command templates for AI tools (DeepSeek, ChatGPT, Gemini 2.5Flash) to produce specific question types required in Nguyen Trai High School’scentralized exams

Trang 35

1.3 Research Subjects

This study focuses on Grade 10 to Grade 12 students at Nguyen Trai HighSchool and the English teachers responsible for designing and administeringassessments It also includes trainee teachers from Hồng Đức Universityparticipating in AI-assisted test creation under supervision The research examineshow these stakeholders engage with AI tools to enhance the quality, fairness, andefficiency of English language testing practices

1.4 Research methods -Scope of the study

This research employs a combination of qualitative and applied researchmethods Data were collected through classroom observations, teacher interviews,AI-generated test analyses, and pilot implementations across multiple classes.Feedback from teachers and students was also gathered to assess practicaleffectiveness and pedagogical relevance

This research focuses exclusively on English language assessments atNguyen Trai High School, targeting the design and evaluation of tests for Periodiccentralized exams (Midterm I, Final I, Midterm II, Final II), 12th-grade benchmarktests, English Olympiads for gifted students

The study examines the application of AI tools (DeepSeek, ChatGPT,Gemini 2.5 Flash) to generate, refine, and validate assessment materials for Grades10–12, adhering to the school’s curriculum framework and Vietnam’s nationalEnglish education standards It does not extend to other subjects or non-academicevaluations

1.5 Innovations of the Initiative

This initiative introduces a systematic approach to AI-assisted test design byapplying advanced prompt engineering techniques across multiple AI platforms(DeepSeek, ChatGPT, Gemini 2.5 Flash) Unlike previous practices that relied onstatic test banks or ad hoc content generation, the initiative develops standardized

AI training protocols aligned with national exam formats and Bloom’s Taxonomy

It is also among the first at the high school level in Vietnam to document areplicable process for generating full-length, curriculum-aligned English testsusing free AI tools, while integrating real-world classroom feedback and teachertraining This dual emphasis on innovation and scalability sets the initiative apartfrom traditional test design models

Trang 36

for a systematic, AI-enhanced approach to test design, one that balances efficiencywith precision while adhering to curricular and ethical standards.

2.3 Solutions and implementation

2.3.1 Basic prompt engineering

Through practical experience in developing exam materials with AIassistance, I have identified several effective techniques that can be universallyapplied across DeepSeek, ChatGPT, and Gemini to produce consistent, high-quality results

2.3.1.1 Assign a role.

Assigning a role to an AI language model such as Deepseek, ChatGPT is aprompt engineering technique that enhances the relevance, tone, and contextualappropriateness of its responses By explicitly specifying a role,such as "a highschool English teacher" or "an academic writing tutor", users can guide the model

to adopt the expected voice, knowledge scope, and communicative register alignedwith that persona This approach mirrors principles in discourse analysis andpragmatics, wherein the assumed identity of a speaker influences language choices,formality, and information structure In educational contexts, assigning roles isparticularly effective for generating content that matches the cognitive andlinguistic level of target learners It also reduces ambiguity, enabling the AI toalign its output with the goals and expectations of the user, thereby improving bothaccuracy and pedagogical utility

You are an expert from the Ministry of Education and Training, specializing

in designing high-stakes national English exams for upper secondary students in Vietnam ); "You are an English exam designer working for the Ministry of Education and Training….”; "You are an experienced English teacher who has been preparing students for the national high school graduation exam for over 20 years….

2.3.1.2 Few-shot prompting: Leveraging exemplars for contextual alignment

Few-shot prompting is a critical technique in refining AI-generatedassessment content, particularly when precision in format, difficulty, andalignment with curricular standards is required This approach involves providing

AI models with concrete examples (e.g., past exam papers, sample questions, orannotated answers) to establish a clear reference framework For instance, when

Trang 37

2 CONTENT 2.1 Theoretical background

The integration of Artificial Intelligence (AI) into educational assessment isgrounded in constructivist and socio-cognitive theories, which emphasize learner-centered approaches and the alignment of evaluation with cognitive development.Bloom’s Taxonomy (1956) provides a hierarchical framework for categorizingeducational objectives—from knowledge recall to higher-order thinking skills such

as application, analysis, and evaluation AI-enhanced assessment tools cansystematically generate and align test items with these cognitive domains,promoting both validity and differentiation Furthermore, the application of promptengineering is rooted in human-computer interaction (HCI) and natural languageprocessing (NLP) theory, enabling educators to effectively guide AI systems toproduce contextually relevant and pedagogically sound outputs National policies,such as Vietnam’s Resolution No 57-NQ/TW (2024) and MOET’s Directive4324/BGDĐT-CNTT (2024), reaffirm the strategic role of AI and digitaltransformation in education, urging schools to harness technology as a means ofinnovation, quality improvement, and sustainable development

2.2 Current situation

At Nguyen Trai High School, with its 32 classes and over 1,400 students, theEnglish Group faces significant challenges in designing and administeringassessments The current testing framework includes Periodic centralizedexaminations (Midterm I, Final I, Midterm II, and Final II); three benchmark testsfor 12th-grade students; three annual English Olympiads for gifted students

Given the scale of assessment demands, the Group of English (comprisingeight teachers) primarily relies on traditional test construction methods, whichpresent several critical limitations as follows:

Heavy dependence on external sources

Test materials are often compiled by copying and pasting sections fromexisting exams sourced from other schools or online teaching repositories Whilethis approach reduces preparation time, it introduces multiple issues:

Copyright infringement risks: Unauthorized use of published materials mayviolate intellectual property laws

Ngày đăng: 20/06/2025, 13:25

TÀI LIỆU CÙNG NGƯỜI DÙNG

TÀI LIỆU LIÊN QUAN

🧩 Sản phẩm bạn có thể quan tâm

w