
Augmenting the computational and reasoning proficiencies of large language models for tackling Vietnamese high school mathematical problems


DOCUMENT INFORMATION

Basic information

Title: Augmenting the computational and reasoning proficiencies of large language models for tackling Vietnamese high school mathematical problems
Author: Nguyen Thao Vi
Supervisor: Dr. Nguyen Doan Dong
Institution: Vietnam National University, Hanoi International School
Major: Management Information Systems
Document type: Graduation project
Year: 2024
City: Hanoi
Format
Pages: 64
File size: 1.38 MB


Structure

  • CHAPTER 1 INTRODUCTION
  • CHAPTER 2 LITERATURE REVIEW
    • 2.1. Overview of large language models
    • 2.2. LLMs performance assessment on math
    • 2.3. Challenges of solving Vietnamese high school mathematics with LLM
    • 2.4. Previous research on computational problem-solving and reasoning in mathematics
  • CHAPTER 3 METHODOLOGY
    • 3.1. Reasoning ability improvement via retrieval of similar math problems
      • 3.1.1. Rationale for topic classification
      • 3.1.2. Sample problems dataset preparation
      • 3.1.3. Model selection for topic classification
      • 3.1.4. Similar math problem retrieval
    • 3.2. Implementation for enhanced computation
  • CHAPTER 4 EXPERIMENT ANALYSIS
    • 4.1. Experiment Setup
    • 4.2. Evaluation Result
  • CHAPTER 5 DISCUSSION
    • 5.1. Implementation of the study
    • 5.2. Limitations and Future directions
  • CHAPTER 6 CONCLUSION

Contents


INTRODUCTION

Large language models (LLMs) have shown impressive abilities in text processing and generation, paving the way for their use in mathematical problem-solving. Recent research highlights their potential in addressing various mathematical tasks, such as basic arithmetic, geometric problems, generating proofs for simple theorems, and translating natural language into symbolic expressions. Nonetheless, LLMs face challenges when tackling complex mathematical problems, especially those found in higher education, due to several limiting factors.

One significant challenge in high school mathematics education in Vietnam is its inherent complexity, covering a wide array of topics such as Arithmetic, Algebra, Trigonometry, Statistics, Probability, Calculus, and Geometry, each with varying difficulty levels. This diversity demands that large language models (LLMs) possess a profound understanding of each mathematical concept and the capability to apply this knowledge to solve specific problems. Furthermore, many Vietnamese math problems are intricate and require a series of problem-solving steps along with clear explanations, often building on knowledge from previous classes, which can make it difficult for LLMs to generate accurate and systematic responses.

Previous research has identified significant limitations in the effectiveness of existing large language models (LLMs) when applied to Vietnamese high school mathematics problems. A comparative study evaluated various LLMs on questions from the Vietnamese National High School Graduation Exam, revealing that, despite some promising results, the overall accuracy and explainability of the LLM solutions were inadequate. This underscores the necessity for innovative strategies tailored to the unique challenges of integrating LLMs into the specific context of Vietnamese high school mathematics.

The main goal of this research is to enhance the computational and reasoning abilities of large language models (LLMs) for solving Vietnamese high school mathematics problems. This involves using topic classification methods to identify the mathematical domain of questions, retrieving relevant problems from a database, and extracting the appropriate answers. The study also aims to implement in-context learning to improve model performance and integrate code-generation techniques for accurate and efficient problem-solving, while offering detailed explanations for the results obtained.

This research highlights the potential of Large Language Models (LLMs) to transform education by facilitating personalized and interactive learning experiences. By integrating LLMs into educational systems, intelligent tutoring solutions can be created that cater to individual student needs, offering customized guidance and detailed explanations for complex problems. This tailored approach not only boosts student engagement and understanding but also streamlines educators' workloads, ultimately enhancing the quality of instruction.

This study will focus on Vietnamese high school mathematics for grades 10, 11, and 12.

The Vietnamese high school curriculum includes a wide array of subjects, with a focus on mathematics. To enhance the proposed system, a carefully selected dataset of math problems and solutions from Vietnamese high schools will be employed for training and evaluation purposes.

This research presents a groundbreaking approach by integrating reasoning enhancement techniques with case retrieval and code-based computational power enhancement. While previous studies, including Meta-CoT for reasoning enhancement and MathPrompter for computational power enhancement, have explored similar methods, this study is the first to combine both techniques specifically for Vietnamese high school mathematics. This innovative solution addresses the unique challenges faced in this educational context.

The data necessary for the study was collected as part of this research, and the proposed methodologies were tested using this data.

The research utilized a meticulously curated dataset of Vietnamese high school math problems and solutions, providing a robust foundation for training and evaluating the proposed system. This dataset serves as a crucial resource for future studies in the field and establishes a benchmark for assessing the performance of large language models in Vietnamese high school mathematics.

This research successfully tackles the challenges faced by large language models in addressing complex mathematical problems. Key contributions include the integration of reasoning enhancement and improved code-based computational power, along with a customized solution for Vietnamese high school mathematics. The commitment to data collection and testing highlights a dedicated effort to cater to the specific needs of Vietnamese students and educators.

LITERATURE REVIEW

Overview of large language models

Large Language Models (LLMs) represent a significant breakthrough in artificial intelligence, fundamentally changing how we process and generate human language. These advanced models are designed to replicate natural human communication patterns using neural networks, which are complex systems inspired by the human brain's structure. With billions of parameters, LLMs can manage extensive information efficiently. The key to their effectiveness lies in the Transformer architecture, which allows LLMs to understand contextual relationships and dependencies in text. By utilizing self-attention mechanisms, this architecture enables LLMs to process language effectively, model intricate patterns, and produce coherent, contextually relevant responses, thereby transforming language understanding and generation.

In recent years, large language models (LLMs) like GPT-4, GPT-3.5, Bard, and LLaMA have garnered significant attention for their impressive capabilities in natural language processing (NLP). These models excel in various language-related tasks, particularly in question answering, where they provide accurate and insightful responses based on contextual understanding. Additionally, they demonstrate remarkable skills in text summarization, effectively condensing lengthy content into concise summaries while retaining essential details. Their performance across diverse NLP challenges highlights their sophisticated abilities and transformative potential in the field.

Large Language Models (LLMs) excel in understanding relationships and logical connections between sentences, making them invaluable for tasks like sentiment analysis and information retrieval, where precise textual entailment is essential. Their ability to generate human-like text results in coherent and contextually appropriate narratives, often indistinguishable from those crafted by human authors, which enhances the authenticity and fluency of their outputs. The rise of these advanced LLMs marks a transformative era in language processing, setting new benchmarks for understanding, generating, and manipulating human language. As researchers and practitioners leverage these models, it is clear that we can anticipate even more remarkable advancements in language intelligence in the future.

LLMs possess capabilities that extend beyond natural language processing, enabling them to handle various forms of structured and symbolic data across multiple domains. In mathematics, these models excel at solving complex problems and intricate equations. They also demonstrate impressive deductive reasoning skills in logic, allowing for logical conclusions based on given premises, which enhances automated reasoning. Moreover, LLMs have a strong understanding of programming code, making them essential tools in software development and automation.

The versatility and adaptability of large language models (LLMs) make them invaluable tools with significant potential across diverse applications beyond natural language processing. These models empower researchers and practitioners in various fields to leverage their capabilities for improved problem-solving and intelligent data management.

LLMs performance assessment on math

Mathematics presents a significant challenge for large language models (LLMs) due to the necessity of comprehending natural language and executing precise calculations based on mathematical principles. Research has assessed the mathematical capabilities of various LLMs across tasks such as arithmetic, math word problems, and theorem proving. Among these models, GPT-4 stands out for its exceptional proficiency, outperforming others like ChatGPT and Minerva on datasets including GSM8K, MATH, and MMLU-STEM. Additionally, GPT-4 has shown superior performance against InstructGPT, Galactica, and LLaMA on the MATH 401 arithmetic dataset and has excelled in solving the CMATH dataset, surpassing models like Chinese-Alpaca and Moss. Furthermore, it has achieved higher scores than Meta AI's Llama 2 in multiple evaluations, including MMLU and HellaSwag, and outperformed Google's Bard in a math dataset study. GPT-3.5-turbo serves as an intermediary between GPT-3 and GPT-4, designed to enhance speed and reduce operational costs. A study comparing GPT-3.5-turbo and GPT-4 on Brazilian University Admission Exams revealed that GPT-3.5-turbo can achieve results comparable to zero-shot GPT-4 using specific techniques. Consequently, this report focuses on the capabilities of GPT-4 and GPT-3.5-turbo.

Recent studies indicate that large language models (LLMs) can deliver remarkable performance on certain tasks when given suitable prompts or instructions. However, they also encounter notable limitations and challenges in other areas, particularly in complex scenarios.

Despite its impressive capabilities, GPT-4 faces challenges when tackling complex, multi-step, or abstract problems, particularly in areas like Intermediate Algebra and Precalculus. It often struggles to devise or select appropriate plans, leading to minor errors during execution. Additionally, problems involving special symbols or notations, such as fractions, radicals, exponents, or LaTeX expressions, can be difficult for the model due to the need for careful formatting and parsing. Moreover, GPT-4's performance is hindered by ambiguous or incomplete problems, as it relies heavily on the explicit information provided in the problem statement and supporting context. Consequently, vague or unclear problems may result in incorrect or meaningless answers, underscoring the model's dependence on explicit information.

Understanding the specific contexts in which large language models (LLMs) operate optimally is crucial due to their inherent limitations. Although LLMs exhibit impressive capabilities, challenges arise when addressing complex reasoning tasks, specialized symbols, or ambiguous and incomplete information. By recognizing these factors, users can effectively harness the strengths of LLMs while mitigating their limitations, ensuring accurate and reliable outcomes.

Challenges of solving Vietnamese high school mathematics with LLM

Previous studies have identified several challenges and difficulties that LLMs encounter when solving mathematics problems, which can be categorized into two main areas: mathematical calculation and mathematical reasoning.

Mathematical calculation and reasoning are crucial components of effective problem-solving, encompassing basic operations like addition and multiplication, as well as the application of logical principles to tackle complex issues in algebra, geometry, and calculus. While these skills are interconnected, large language models (LLMs) often struggle with intricate mathematical structures, leading to inefficient solutions and errors such as inconsistent embeddings and inaccurate computations. Furthermore, research indicates that LLM performance can vary significantly across different languages, highlighting the need for improvements in their mathematical capabilities.

The Vietnamese high school math curriculum is known for its complexity, emphasizing theoretical concepts that require students to engage in abstract reasoning and problem-solving. This educational approach fosters a problem-solving mindset rather than rote memorization, demanding strong logical reasoning and the practical application of math in real-world situations. Students are encouraged to develop critical thinking skills, analyze problems, and create innovative solutions. These elements present significant challenges for LLMs when attempting to solve Vietnamese high school math problems.

This research aims to improve the performance of Large Language Models (LLMs) by utilizing innovative methods and techniques. Key strategies include fine-tuning models on relevant datasets, integrating domain-specific knowledge, and developing novel algorithms designed to tackle unique challenges in specific areas.

The research addresses these challenges in Vietnamese mathematics to enhance the proficiency of large language models (LLMs). By focusing on improving these models, the aim is to enable them to solve Vietnamese high school math problems with greater accuracy, efficiency, and understanding. The findings will offer valuable insights and support for students, educators, and researchers navigating this complex area.

Previous research on computational problem-solving and reasoning in mathematics

To tackle the complexities of computational problem-solving and reasoning in mathematics, various methods have been proposed, including pre-training, fine-tuning, instruction learning, tool-based methods, and chain-of-thought techniques. Pre-training involves training large language models (LLMs) on extensive mathematical datasets, such as MathBERT and MathGLM, to bolster their mathematical capabilities. Fine-tuning is aimed at customizing LLMs for specific mathematical tasks, improving their performance in targeted areas like MetaMath and Goat. Instruction learning provides LLMs with natural language prompts to guide output generation, exemplified by tools like MathPrompter and WizardMath. This approach often employs In-context Learning (ICL) and advanced Sample Retrieval (SR) methods, enabling LLMs to tackle new tasks with example-driven input, as demonstrated in Meta-CoT and CoT-Influx. Tool-based methods enhance LLMs by integrating them with external programs for mathematical operations, with examples including SymbLLM and Codex-math. Finally, chain-of-thought methods allow LLMs to articulate intermediate steps in problem-solving, improving their reasoning strategies.

CoT, ICL, and Tool Employing (TE) methods offer significant advantages over traditional fine-tuning and pre-training approaches in enhancing the performance of large language models (LLMs) for mathematical tasks. While fine-tuning can improve performance, it is limited by its dependence on task-specific data and the risk of catastrophic forgetting. Pre-training, although beneficial, demands substantial data and computational resources, often failing to address the complexities of specific mathematical problem-solving. In contrast, CoT, ICL, and TE require fewer resources, provide greater flexibility across various tasks, and enhance interpretability and user control. Given the complexity of high school mathematics, the integration of CoT, ICL, and TE represents a promising strategy for improving LLM performance in this field.

METHODOLOGY

Reasoning ability improvement via retrieval of similar math problems

The proposed method initiates with a critical classification step that predicts the relevant curriculum topic for each math problem. This classification considers the student's grade level, including 10th, 11th, or 12th grade, ensuring precise alignment with educational standards.

The proposed methodology enhances computational capabilities through topic classification, similar problem retrieval, and code generation. By extracting answers from similar problems, ICL integrates this information into prompts. The code generation process encourages the LLM to create executable code, which is run via an external interpreter. Subsequently, the LLM offers an explanation of the results, thereby improving both interpretability and explanatory power.
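The stages above can be sketched as a single driver function. This is a minimal, hedged sketch of the control flow only: every component name, the prompt wording, and the toy stand-ins are illustrative assumptions, not the thesis's actual implementation.

```python
# Illustrative sketch of the pipeline: classify topic -> retrieve a similar
# problem -> build an in-context prompt -> generate code -> execute -> explain.
def solve(question, grade, classifier, retriever, llm, interpreter):
    topic = classifier(grade, question)                      # 1. topic classification
    sample_q, sample_a = retriever(grade, topic, question)   # 2. similar problem retrieval
    prompt = (f"Topic: {topic}\n"
              f"Example problem: {sample_q}\n"
              f"Example solution: {sample_a}\n"
              f"Write Python code that solves: {question}")  # 3. one-shot ICL prompt
    code = llm(prompt)                                       # 4. code generation
    result = interpreter(code)                               # 5. external execution
    explanation = llm(f"Explain why the answer to '{question}' is {result}.")
    return result, explanation

# Toy stand-ins to show the control flow (a real system would call the LLM
# and the remote code-execution API here):
answer, _ = solve(
    "Solve x^2 - 4 = 0", 10,
    classifier=lambda g, q: "Equations",
    retriever=lambda g, t, q: ("Solve x^2 - 1 = 0", "x = 1 or x = -1"),
    llm=lambda p: "import sympy; ...",
    interpreter=lambda c: "x = 2 or x = -2",
)
print(answer)  # x = 2 or x = -2
```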

The first step is to categorize the problem. The primary objective of this categorization is to narrow down the scope and improve the accuracy of the search results.

Categorizing math problems by grade levels acknowledges the unique knowledge and skills required at each educational stage. For example, 10th-grade math problems involve different concepts than those for 12th graders. This approach is essential for ensuring that searches for similar examples are relevant and reliable within the appropriate grade context.

Systematically organizing math problems by grade levels and topics enhances efficiency and effectiveness in problem-solving. This method simplifies the identification process, significantly decreasing the time and resources needed to locate relevant samples that support reasoning. By focusing on specific grade levels, it reduces the likelihood of encountering unrelated examples, thereby optimizing the overall problem-solving experience.

A systematic organization of problem-solving approaches enhances efficiency and effectiveness by aligning sample problems with the appropriate grade level and topic. This alignment ensures that the reasoning and solution strategies are tailored to each student's knowledge and understanding, optimizing the learning experience. As a result, students benefit from increased accuracy and a more focused, impactful educational journey.

To ensure the provision of the most suitable sample demonstrations for guiding the LLM, it is essential to begin by gathering a high-quality dataset. For this purpose, a web crawl of timdapan.com, a reputable online platform for math problem solutions, was conducted; to enhance reliability, the collected data follows the Ministry of Education's textbooks. Three separate datasets were compiled for Grade 10, Grade 11, and Grade 12 math questions and answers. Each dataset includes three fields: Topic, Question, and Answer, with questions featuring words, mathematical expressions, and formulas in LaTeX format.

Table 3-1: Math QA datasets overview

Dataset name | Dataset size | Number of topics

Grade 10 topics:

1. Inequalities and Inequations ("Bất đẳng thức, bất phương trình")
2. Trigonometric Ratios and Trigonometric Formulas ("Cung và góc lượng giác, công thức lượng giác")
3. Linear and Quadratic Functions ("Hàm số bậc nhất và bậc hai")
4. Set Theory ("Mệnh đề tập hợp")
5. Coordinate Method in the Plane ("Phương pháp tọa độ trong mặt phẳng")
6. Equations and Systems of Equations ("Phương trình, hệ phương trình")
8. Dot Product of Two Vectors and its Applications ("Tích vô hướng của hai vectơ và ứng dụng")

Grade 11 topics:

1. Sequences, Arithmetic Progressions, and Geometric Progressions ("Dãy số, cấp số cộng và cấp số nhân")
3. Trigonometric Functions and Trigonometric Equations ("Hàm số lượng giác và phương trình lượng giác")
4. Transformations and Similarity in the Plane ("Phép dời hình và phép đồng dạng trong mặt phẳng")
5. Permutations, Combinations, and Probability ("Tổ hợp xác suất")
6. Vectors in Space, Orthogonal Relationships in Space ("Vectơ trong không gian, quan hệ vuông góc trong không gian")
7. Lines and Planes in Space, Parallel Relationships ("Đường thẳng và mặt phẳng trong không gian quan hệ song song")

Grade 12 topics:

1. Exponential Functions, Exponential Equations, and Logarithmic Functions ("Hàm số lũy thừa hàm số mũ và hàm số lôgarit")
3. Cone, Cylinder, and Sphere ("Mặt nón mặt trụ mặt cầu")
4. Indefinite Integral, Definite Integral, and Applications ("Nguyên hàm, tích phân và ứng dụng")
5. Coordinate Method in Space ("Phương pháp tọa độ trong không gian")
7. Applications of Derivatives in Function Analysis and Graphing ("Ứng dụng đạo hàm để khảo sát và vẽ đồ thị của hàm số")

During the data preprocessing phase, essential operations were conducted to enhance dataset quality and consistency. Initially, records with null or duplicate values were eliminated to maintain data integrity. Furthermore, records belonging to the "Ôn tập chương" ("Chapter Review") topics were identified and re-categorized into the corresponding main topic. This step was crucial for organizing the dataset and facilitating effective training and evaluation.
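The preprocessing steps described above can be sketched as follows. This is a hedged illustration, not the thesis's code: the record field names (Topic, Question, Answer) come from the dataset description, while the `review_map` lookup that folds review topics into their main topics is a hypothetical argument the caller would supply.

```python
def preprocess(records, review_map=None):
    """Drop records with null/empty fields, drop exact duplicates, and
    optionally re-map review topics onto their corresponding main topic."""
    review_map = review_map or {}
    seen, clean = set(), []
    for rec in records:
        if any(v is None or v == "" for v in rec.values()):
            continue  # null removal
        rec = dict(rec, Topic=review_map.get(rec["Topic"], rec["Topic"]))
        key = (rec["Topic"], rec["Question"], rec["Answer"])
        if key in seen:
            continue  # duplicate removal
        seen.add(key)
        clean.append(rec)
    return clean

rows = [
    {"Topic": "Mệnh đề tập hợp", "Question": "Q1", "Answer": "A1"},
    {"Topic": "Mệnh đề tập hợp", "Question": "Q1", "Answer": "A1"},  # duplicate
    {"Topic": "Thống kê", "Question": None, "Answer": "A2"},          # null field
]
print(len(preprocess(rows)))  # 1
```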

Figure 3-3: Topic Distribution for Grade10_QA. Most 10th-grade problems are from the topic "Bất đẳng thức, bất phương trình" ("Inequalities and Inequations"), while "Thống kê" ("Statistics") has the fewest problems.

Figure 3-4: Topic Distribution for Grade11_QA. The 11th-grade problem set does not suffer from severe imbalance.

Tokenizing and vectorizing text is essential prior to training, as it transforms text into numerical representations and extracts significant features. This process allows machine learning models to effectively process and learn from textual data.

Tokenization is a crucial process that involves breaking down text into smaller units, such as words or subwords, to enable effective analysis. In this study, the ViTokenizer library from pyvi, a specialized Python library for processing Vietnamese text, was employed. ViTokenizer utilizes a Conditional Random Field (CRF) model for tokenization, extracting various features for each word in the input sentence, including the word itself, its case information, and characteristics of surrounding words. The CRF model, a probabilistic graphical model, estimates the conditional probability P(y|x), where x represents the sequence of feature vectors and y denotes the corresponding label sequence. This model is trained to optimize the conditional probability for accurate label sequences based on the training data, allowing ViTokenizer to effectively process new sentences.

In Grade 12, the distribution of problems shows that topics related to "Exponential Functions, Logarithmic Functions, and Power Functions" are twice as prevalent as those concerning "Polyhedra."

For a new sentence, ViTokenizer uses the trained CRF model to predict the most probable label sequence y given the feature vectors x of the sentence, and then groups the characters into words based on these labels.

Vectorization is the process of converting tokenized text into numerical vectors for easier processing by machine learning algorithms. The TfidfVectorizer from the scikit-learn library is utilized to train each math QA dataset individually. TF-IDF (Term Frequency-Inverse Document Frequency) is a popular method that assesses the significance of words in a document relative to a broader collection. The Term Frequency (TF) measures how often a term appears in a document, while the Inverse Document Frequency (IDF) evaluates the rarity of a term across all documents, giving more weight to less frequent terms. The TF-IDF score for each term is calculated by multiplying its TF value by its IDF value, providing a numerical representation that reflects both local and global importance. The TfidfVectorizer transforms a set of raw documents into a matrix of TF-IDF features, where each document is represented by its word frequency within the collection.

In this formulation:

• TF(t, d) represents the term frequency of term t in document d
• IDF(t) represents the inverse document frequency of term t

The TF(t, d) value is calculated using the formula:

TF(t, d) = (count of t in d) / (number of words in d)

The IDF(t) value is calculated using the formula:

IDF(t) = log(N / df(t))

where:

• N is the total number of documents in the collection
• df(t) is the number of documents that contain the term t
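The two formulas above can be implemented directly; a minimal sketch follows, with toy tokenized documents as input. Note that scikit-learn's TfidfVectorizer uses a smoothed IDF variant by default, so its exact scores differ from this plain textbook definition.

```python
import math
from collections import Counter

def tf(term, doc):
    # TF(t, d) = (count of t in d) / (number of words in d)
    return Counter(doc)[term] / len(doc)

def idf(term, docs):
    # IDF(t) = log(N / df(t)); unseen terms get 0 instead of a division error
    df = sum(1 for d in docs if term in d)
    return math.log(len(docs) / df) if df else 0.0

def tfidf(term, doc, docs):
    return tf(term, doc) * idf(term, docs)

docs = [["giải", "phương_trình", "bậc", "hai"],
        ["tính", "tích_phân", "xác_định"],
        ["giải", "bất_phương_trình"]]
# "phương_trình" is 1 of 4 words in docs[0] and appears in 1 of 3 documents:
print(round(tfidf("phương_trình", docs[0], docs), 4))  # 0.2747
```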

Training the TfidfVectorizer on each dataset is essential for capturing the unique linguistic features and vocabulary specific to different grade levels. This method effectively models the distinct patterns and relationships within math questions and answers, facilitating accurate data analysis and interpretation.

By integrating the tokenization abilities of ViTokenizer with the vectorization strengths of TfidfVectorizer, we aim to accurately capture the semantic meaning and contextual information of mathematical questions. The resulting tokenized and vectorized questions will be utilized as input features for training a classification model, enhancing its ability to classify and reason through high school math problems with greater precision.

3.1.3 Model selection for topic classification

To effectively classify topics in Vietnamese high school mathematics problems, selecting the right models is crucial. This section outlines the training process for datasets from grades 10, 11, and 12, utilizing various machine learning models tailored for this purpose.

This section discusses the classification models considered for topic classification, including the Linear Support Vector Classifier, Random Forest Classifier, Gradient Boosting Classifier, Multinomial Naive Bayes, K-Nearest Neighbors Classifier, and Multi-layer Perceptron Classifier. Each of these models possesses unique characteristics, making them suitable for addressing the complexities involved in classifying topics effectively.

Implementation for enhanced computation

To optimize the computational process, the system first identifies the user input's most relevant question and retrieves its corresponding answer. This information is then combined to create a CoT prompt that enhances in-context learning (ICL), enabling the LLM to understand how to resolve the problem by following a structured example. By integrating both the question and answer, the LLM gains valuable context, allowing it to produce a more accurate Python code snippet tailored to the user's needs. The CoT methodology facilitates a comprehensive understanding of the problem and its solution, empowering the LLM to generate code step-by-step effectively.
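The "most relevant question" lookup can be sketched as a nearest-neighbor search over TF-IDF vectors using cosine similarity. This is a hedged illustration under the assumption that the candidate bank has already been narrowed to the predicted grade and topic; the data layout (question, answer, vector triples) is illustrative, not the thesis's storage format.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def retrieve(query_vec, bank):
    """Return the (question, answer, vector) triple most similar to the query."""
    return max(bank, key=lambda item: cosine(query_vec, item[2]))

bank = [("Q1", "A1", [1.0, 0.0, 0.0]),
        ("Q2", "A2", [0.0, 0.9, 0.1])]
q, a, _ = retrieve([0.0, 1.0, 0.0], bank)
print(q, a)  # Q2 A2
```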

The prompt template for code generation is designed using OpenAI's framework, integrating a system prompt that includes the relevant topic, a sample problem, and its solution to facilitate one-shot learning. This setup requests the language model to generate code for solving the specified problem. In addition, the user prompt clearly defines the input math question that needs to be addressed.
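A minimal sketch of such a template follows, using the OpenAI chat-messages structure (a list of role/content dictionaries). The exact instruction wording is an illustrative assumption, not the thesis's template.

```python
def build_codegen_prompt(topic, sample_q, sample_a, user_question):
    """One-shot code-generation prompt: the system message carries the topic
    plus a worked example; the user message carries the new question."""
    system = (
        f"You solve Vietnamese high school math problems on the topic '{topic}'.\n"
        f"Example problem: {sample_q}\n"
        f"Example solution: {sample_a}\n"
        "Think step by step, then return Python code that computes the answer."
    )
    return [{"role": "system", "content": system},
            {"role": "user", "content": user_question}]

messages = build_codegen_prompt(
    "Phương trình, hệ phương trình",
    "Solve x^2 - 1 = 0", "x = 1 or x = -1",
    "Solve 2x^2 - 8 = 0")
print(messages[1])  # {'role': 'user', 'content': 'Solve 2x^2 - 8 = 0'}
```

The returned list could then be passed as the `messages` argument of a chat-completion request.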

After receiving a response from the Large Language Model (LLM), the extracted code snippet is processed and sent as a request to an Application Programming Interface (API), which acts as a crucial intermediary for data transfer and communication between various systems. In this setup, the API is closely linked to a Docker container, a lightweight, self-contained environment tailored for application deployment and execution.

The Docker container is set up to deliver a Python 3.9 environment on a remote server, featuring a well-optimized setup that includes essential libraries for mathematical processing, such as NumPy, Pandas, SciPy, and SymPy. These libraries are crucial for performing efficient mathematical computations and handling a variety of mathematical operations effectively.

When a code snippet is submitted as a string input to the API, it triggers the execution process within a Docker container. The API manages the execution of the code, facilitating the interaction between the code and its designated environment. After execution, the API quickly captures the output and returns it as a response.

When errors occur during code execution, the API generates and sends an error message to the user, offering essential feedback. This proactive approach to error handling keeps users informed of any issues, enabling effective troubleshooting and resolution.
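The execute-and-report behavior can be sketched locally. This stand-in runs the snippet in a separate interpreter process via `subprocess` instead of posting it to a remote Docker-backed API, but it mirrors the same contract: return captured stdout on success, or an error message on failure.

```python
import subprocess
import sys

def execute(code: str, timeout: float = 10.0) -> dict:
    """Local stand-in for the API-plus-Docker executor: run the snippet in a
    fresh interpreter process, capturing stdout on success, stderr on error."""
    proc = subprocess.run([sys.executable, "-c", code],
                          capture_output=True, text=True, timeout=timeout)
    if proc.returncode != 0:
        # Report only the final traceback line, e.g. "ZeroDivisionError: ..."
        return {"ok": False, "error": proc.stderr.strip().splitlines()[-1]}
    return {"ok": True, "output": proc.stdout.strip()}

print(execute("print(2**10)"))   # {'ok': True, 'output': '1024'}
print(execute("1/0")["ok"])      # False
```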

Utilizing Docker enhances code execution by providing a secure and reliable environment, ensuring the integrity of the process. The isolation offered by Docker containers effectively separates the code execution from other system components, enhancing security and performance. This isolation prevents any potential interference or conflicts that may arise, contributing to enhanced code execution reliability.

Docker's scalability is crucial for managing diverse workloads, as it allows for the rapid deployment of multiple containers. This capability enables the simultaneous execution of code snippets, ensuring the system can accommodate high volumes of execution requests. As a result, users benefit from a responsive and reliable service experience.

The code execution process via API involves sending a code snippet from the LLM's response over the Internet to an API, which employs a Docker container on a remote server for secure and dependable execution. This method ensures that code runs reliably within an isolated environment, safeguarding against interference with other system components.

EXPERIMENT ANALYSIS

Experiment Setup

To evaluate the effectiveness and accuracy of the proposed method, an experiment was conducted involving the collection of data from various online sources. This process focused on three sets of final exam multiple-choice questions for grades 10, 11, and 12, which were randomly selected to ensure a representative sample. Each set contained 50 questions, covering a wide range of knowledge from the academic year. While the evaluation was limited to this dataset due to budget constraints, it offered a solid basis for analyzing the method's performance.

To conduct the experiment, a series of functions were developed based on the outlined processes. The experiment then proceeded by systematically testing four scenarios on both GPT-4 and GPT-3.5-turbo to assess the capabilities and limitations of the methods. The scenarios encompassed the following:

The zero-shot approach assesses a method's effectiveness without any prior training or fine-tuning, relying solely on the models' inherent capabilities to process and answer questions.

The computation-focused approach emphasizes the exclusive use of a code interpreter for computational processing, where models are required to leverage these tools to execute essential calculations and produce precise results.

− Reasoning-focused approach: This scenario explored the application of topic classification and similar problem retrieval techniques to enhance the

The reasoning process involves categorizing questions by topic and utilizing insights from similar problems to enhance reasoning abilities, ultimately leading to more accurate responses.

− Combined approach: This scenario integrates both the computational and reasoning improvement techniques. The synergistic strategy leverages the strengths of the computation-focused and reasoning-focused approaches, aiming to optimize calculation accuracy while improving the quality of reasoning.

To assess the models' performance, a binary grading system was utilized: each model response was compared against a pre-established "ground truth" answer for the question and categorized as either accurate or inaccurate. This binary grading facilitated an objective and transparent assessment of the system's ability to deliver correct answers.
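A minimal sketch of that grading loop, assuming the ground truth is a multiple-choice letter; `extract_choice` is an illustrative helper, not the thesis's exact code:

```python
import re

def extract_choice(response):
    """Pull the last standalone answer letter A-D out of a model response."""
    matches = re.findall(r"\b([A-D])\b", response.upper())
    return matches[-1] if matches else None

def grade(responses, ground_truth):
    """Binary grading: a response counts only if its letter matches exactly."""
    correct = sum(1 for resp, truth in zip(responses, ground_truth)
                  if extract_choice(resp) == truth)
    return correct / len(ground_truth)

answers = ["The result is x = 2, so the answer is B", "Answer: D", "I pick A."]
print(grade(answers, ["B", "C", "A"]))  # 2 of 3 match -> 0.666...
```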

Google Colab was selected as the primary environment for consistent and standardized experimentation due to its Linux-based operating system and access to GPU resources. This GPU support is especially advantageous for computationally intensive tasks, significantly improving the performance and speed of the models used in the experiments.

Evaluation Result

The evaluation results for the GPT-3.5-turbo and GPT-4 models are detailed in the table below, showcasing their performance across grades 10, 11, and 12 using four distinct approaches: zero-shot, computation-focused, reasoning-focused, and combined.

Accuracy (%)           GPT-3.5-turbo            GPT-4
Approach               G10    G11    G12        G10    G11    G12
Zero-shot              44     52     48         52     78     64
Computation-focused    54     58     56         74     84     76
Reasoning-focused      48     54     50         66     82     68
Combined               60     62     64         80     88     82

In the zero-shot approach, where the models are utilized without specific training, GPT-3.5-turbo achieved accuracies of 44%, 52%, and 48% for grades 10, 11, and 12, respectively. In contrast, GPT-4 showed higher accuracy rates of 52%, 78%, and 64% for the same grades. These findings indicate that both models have a fundamental level of understanding, but GPT-4 significantly outperformed GPT-3.5-turbo.

When employing the computation-focused approach, which relied on code-based tools for computational processing, the models' accuracy rates notably increased. GPT-3.5-turbo achieved accuracies of 54%, 58%, and 56% for grades 10, 11, and 12, respectively. GPT-4 exhibited even higher accuracy rates, with 74%, 84%, and 76% for the respective grades. These findings suggest that the models' utilization of computational tools positively impacted their performance.

The reasoning-focused approach, utilizing techniques like topic classification and similar problem retrieval, demonstrated notable accuracy improvements over the zero-shot method. GPT-3.5-turbo achieved accuracies of 48%, 54%, and 50% for grades 10, 11, and 12, while GPT-4 outperformed it with accuracy rates of 66%, 82%, and 68% for the same grades. These findings indicate that the integration of reasoning techniques significantly enhances performance.

The combined approach, integrating both the computational and reasoning improvement methods, led to the highest accuracy rates of all scenarios tested. Specifically, GPT-3.5-turbo achieved accuracies of 60%, 62%, and 64% for grades 10, 11, and 12, respectively. GPT-4 showed even greater enhancements, reaching accuracies of 80%, 88%, and 82% for the same grades. This demonstrates that the synergistic combination of the computation-focused and reasoning-focused methods yields the most accurate responses, with GPT-4 again outperforming GPT-3.5-turbo.

In addition to the aforementioned points, there are other observations and evaluations made during the running and assessment process that I would like to further elaborate on:

The analysis of running time reveals a notable advantage of the GPT-3.5-turbo zero-shot approach, averaging 7.03 seconds, roughly 1.6 times faster than the GPT-4 model's average of 11.47 seconds. This difference underscores the efficiency of GPT-3.5-turbo in generating prompt responses. While varying the methods used with GPT-3.5-turbo did not produce considerable differences in running time compared to the zero-shot approach, GPT-4 exhibited running times 3 to 4 times longer for the computation-focused and reasoning-focused methods. This indicates that GPT-4 may demand more computational resources and time for processing and response generation in these contexts.

The integration of supplementary methods to enhance the models also significantly increases token usage. Compared with the zero-shot technique on both GPT-3.5-turbo and GPT-4, incorporating the additional techniques and approaches can lead to a token count that is 2 to 3 times higher. This increase reflects the broader linguistic context the methods require, which can significantly impact the cost and efficiency of the models.

The ability of large language models (LLMs) to generate effective code is heavily reliant on their understanding of the task at hand; if they lack the proper approach from the outset, the resulting code may be ineffective. Notably, GPT-4 demonstrates superior code generation capabilities compared to GPT-3.5-turbo, producing code that aligns more closely with intended concepts and operates with fewer errors, thanks to advancements in its architecture and training. However, despite these improvements, LLMs continue to struggle with problems requiring graphical representation or spatial reasoning, highlighting the inherent limitations of text-based models in addressing visually complex challenges.

DISCUSSION

Implementation of the study

A web application was developed using Flask, a lightweight and open-source web framework known for its simplicity and flexibility, to demonstrate the results of the proposed method and support its practical implementation. Flask's numerous advantages made it the ideal choice for this project, ensuring an optimal development experience.

In developing the web application with Flask, we effectively utilized HTML, CSS, and JavaScript to enhance the user interface and provide a smooth interactive experience This combination of technologies not only created a visually appealing design but also improved the overall usability of the application.

The web application features a user-friendly interface that allows users to easily input their questions, select their grade level (10th, 11th, or 12th), and choose between models such as GPT-4 or GPT-3.5-turbo. Users can also set a similarity threshold for retrieving the most relevant sample problems. If the retrieved problems do not meet the threshold, the application automatically switches to the GPT-4 model combined with the CoT prompt, ensuring users receive satisfactory answers.

After users submit their queries and preferences, their inputs are processed, triggering an interactive process within the application. A window then appears, presenting a detailed comparison between the results produced by the combined method and the traditional zero-shot approach. This side-by-side analysis offers crucial insights into the effectiveness and advantages of the combined method.

The combined method section offers users extensive supplementary information to improve understanding and support informed decision-making. This includes details like the anticipated mathematical topic, the nearest matching question to the user's query, an automatically generated code snippet, and the results from executing the code. By providing this comprehensive information, users gain a holistic view of the problem-solving process and can better understand the underlying mechanisms involved.

The web application efficiently manages errors that occur during code execution by automatically switching to the advanced GPT-4 model, ensuring seamless performance and enhanced reliability.

The architecture of my Math Solver web app consists of a frontend built with HTML, CSS, and JavaScript, while the backend is powered by Python Flask. Users input their math questions through the frontend; the backend processes the queries and returns the responses of both methods for display. If an error occurs during code execution, the backend falls back to GPT-4 with a CoT prompt, thus ensuring that users receive a prompt and accurate response despite any challenges that may occur during code execution.
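The dispatch just described can be sketched in plain Python. `solve_combined` and `solve_zero_shot` are stand-ins for the real LLM pipeline (the similarity score is hard-coded for illustration), and in the actual app `handle_request` would be the body of a Flask route handler:

```python
def solve_combined(question, grade_level, model, threshold):
    # Placeholder for the real pipeline: classify topic, retrieve the most
    # similar sample problem, then generate and execute code.
    similarity = 0.42  # illustrative retrieval score
    if similarity < threshold:
        # Fallback described in the text: GPT-4 with a CoT prompt.
        model = "gpt-4 + CoT prompt"
    return {"method": "combined", "grade": grade_level,
            "model": model, "similarity": similarity}

def solve_zero_shot(question, model):
    return {"method": "zero-shot", "model": model}

def handle_request(form):
    """Parse the submitted form, run both methods, and return the
    side-by-side comparison the frontend displays."""
    question = form["question"]
    grade_level = form.get("grade", "12")
    model = form.get("model", "gpt-3.5-turbo")
    threshold = float(form.get("threshold", "0.5"))
    return {"combined": solve_combined(question, grade_level, model, threshold),
            "zero_shot": solve_zero_shot(question, model)}

result = handle_request({"question": "Solve x^2 - 5x + 6 = 0",
                         "model": "gpt-4", "threshold": "0.5"})
print(result["combined"]["model"])  # 0.42 < 0.5, so the CoT fallback fires
```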

The web application, built using Flask, HTML, CSS, and JavaScript, offers a user-friendly and visually appealing platform for engaging with the proposed methodology. It allows for precise user inputs and provides in-depth insights into the problem-solving process, making it an invaluable resource for those seeking efficient and reliable assistance in resolving mathematical challenges.

The web app interface prompts users to input a properly formatted math question, select the corresponding grade level, and adjust the similarity threshold for problem retrieval as necessary.

The web application interface presents the user input alongside the solutions generated by two distinct methods: the combined approach and the zero-shot method, once the response is returned.

Limitations and Future directions

To further improve the ability of LLMs to solve Vietnamese high school mathematics problems, it is crucial to identify and address the method's specific inherent limitations. By understanding these challenges, we can aim to develop stronger methodologies that enhance outcomes in the future.

Improving reasoning abilities through question referencing is limited by several factors, including the quality of the questions and sample answers used as references. Identifying the type of problem accurately and locating similar input questions are essential for effective reasoning enhancement. Therefore, having a comprehensive reference dataset that covers a diverse array of mathematical topics is crucial for success.

The current reference corpus exhibits an uneven distribution of mathematical topics, which may compromise the accuracy and reliability of classification results. To enhance reasoning through question referencing, it is essential to create a comprehensive and diverse dataset that encompasses a wide range of mathematical domains. By addressing these limitations, we can significantly improve reasoning abilities and achieve more precise and effective classifications across areas of mathematics.

Currently, each evaluation set consists of only 50 comprehensive math questions. While these questions are valuable for evaluating reasoning performance, their limited number does not adequately capture the diverse mathematical concepts and problem types present at each academic level. Additionally, there is an uneven distribution of questions across different problem types, leaving some types overrepresented while others are underrepresented. This imbalance restricts the ability to effectively evaluate the model's reasoning skills across the wide range of challenges that students face in their academic journey.

To overcome these limitations, it is essential to broaden the range of evaluation questions significantly. By increasing the number of evaluation items, we can incorporate a diverse array of problem types, ensuring a balanced representation of the challenges students encounter. This expansion should include various problem-solving approaches and cover all mathematical concepts taught at each academic level. Furthermore, it is important to maintain an equitable distribution of evaluation questions across different problem types, facilitating a fair and comprehensive assessment of the model's reasoning capabilities and providing a more accurate understanding of its performance across mathematical domains.

A significant limitation of the current evaluation process is its reliance on a binary grading system that focuses solely on the final answer. This method fails to assess the effectiveness and accuracy of the reasoning improvement technique, as it does not capture the detailed steps taken to arrive at the solution.

The binary grading system evaluates answers solely as correct or incorrect, neglecting the reasoning and problem-solving methods behind them. This oversimplified approach overlooks the complexities involved in understanding and evaluating responses.

The problem-solving process is complex, and an exclusive focus on the final answer overlooks the critical intermediate steps and logic involved. This narrow perspective limits our ability to evaluate the model's reasoning capabilities comprehensively. By merely examining the output without exploring the reasoning process, we hinder our ability to measure the effectiveness and accuracy of methods aimed at improving reasoning in problem-solving.

To enhance assessment accuracy, it is essential to create an evaluation framework that transcends binary grading. Implementing a nuanced grading approach that evaluates the quality of reasoning, problem-solving strategies, and intermediate steps will yield a more thorough understanding of a model's reasoning capabilities. By focusing on the reasoning process instead of just the final outcome, we can more effectively assess the effectiveness and precision of methods aimed at improving reasoning and elucidating problem-solving steps.
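One possible shape for such a framework, sketched with hypothetical step strings and weights (the thesis does not define a partial-credit rubric):

```python
def step_score(model_steps, reference_steps):
    """Fraction of reference solution steps that appear, case-insensitively,
    somewhere in the model's worked solution."""
    model_text = " ".join(model_steps).lower()
    hits = sum(1 for step in reference_steps if step.lower() in model_text)
    return hits / len(reference_steps)

def graded_score(model_steps, model_answer, reference_steps, reference_answer,
                 w_steps=0.5, w_answer=0.5):
    """Blend reasoning quality and final-answer correctness instead of
    grading the final answer alone."""
    answer_ok = 1.0 if model_answer == reference_answer else 0.0
    return w_steps * step_score(model_steps, reference_steps) + w_answer * answer_ok

reference = ["compute the discriminant", "apply the quadratic formula"]
steps = ["First compute the discriminant: 25 - 24 = 1", "so x = (5 ± 1)/2"]
print(graded_score(steps, "B", reference, "B"))  # 0.5 * 0.5 + 0.5 * 1 = 0.75
```

A production rubric would need fuzzier step matching (paraphrase-tolerant) than this literal substring test, but the weighted blend illustrates the idea of crediting intermediate work.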

The research faces significant limitations in the tokenization, vectorization, and retrieval processes for mathematical formulas. The use of ViTokenizer from Pyvi and TfidfVectorizer from scikit-learn, while foundational, may not adequately reflect the unique characteristics and semantic relationships inherent in mathematical expressions. ViTokenizer, tailored for Vietnamese natural language processing, is not fully optimized for the specific structures of mathematical notation, potentially resulting in less effective outcomes. Additionally, TfidfVectorizer's assumption of independent word frequencies and cosine similarity's disregard for semantic context further hinder the accurate capture of nuanced relationships within mathematical formulas.
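For reference, the pipeline being criticized can be reproduced in plain Python (whitespace splitting stands in for ViTokenizer, and the smoothed idf mirrors TfidfVectorizer's default). The sketch also makes the limitation concrete: similarity here is purely lexical overlap.

```python
import math
from collections import Counter

def fit_idf(tokenized_docs):
    """Smoothed idf, as in scikit-learn: log((1 + n) / (1 + df)) + 1."""
    n = len(tokenized_docs)
    df = Counter(t for toks in tokenized_docs for t in set(toks))
    return {t: math.log((1 + n) / (1 + df[t])) + 1 for t in df}

def vectorize(tokens, idf):
    tf = Counter(tokens)
    return {t: tf[t] * idf.get(t, 0.0) for t in tf}

def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def retrieve_most_similar(question, sample_problems):
    docs = [d.lower().split() for d in sample_problems + [question]]
    idf = fit_idf(docs)
    query = vectorize(docs[-1], idf)
    scores = [cosine(vectorize(toks, idf), query) for toks in docs[:-1]]
    best = max(range(len(scores)), key=scores.__getitem__)
    return sample_problems[best], scores[best]

samples = ["tính đạo hàm của hàm số y = x^3 + 2x",
           "giải phương trình bậc hai x^2 - 5x + 6 = 0",
           "tính xác suất khi gieo hai con xúc xắc"]
best, score = retrieve_most_similar("tính đạo hàm của y = sin(x)", samples)
print(best)  # the derivative problem shares the most weighted terms
```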

To address the challenges of tokenization and vectorization in mathematical problems that require context and formulas, it is crucial to investigate advanced techniques tailored for these situations.

Mandlecha et al. [32] propose a hybrid tokenization technique that combines word-level and character-level tokenization based on token types. This approach treats linguistic elements, such as words and phrases, as single tokens, while mathematical elements, like numbers and symbols, are treated as separate tokens. The method effectively captures both linguistic and mathematical components while minimizing the total number of tokens needed to represent a problem and its answer. Additionally, a more sophisticated tokenization method for math formulas involves deconstructing them into individual symbols or tokens using tree representations, such as Symbol Layout Trees or Operator Trees. Each node in these trees corresponds to a token, allowing the model to capture the visual layout and semantic meaning of the formula, thus effectively representing the complex structure of mathematical expressions.
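A minimal sketch of such a hybrid split, using a simple regex (the cited work's token-type rules are more elaborate): letter runs stay whole words, while digits and operators become individual tokens.

```python
import re

def hybrid_tokenize(problem):
    """Word-level tokens for linguistic runs, symbol-level tokens for math:
    match a run of letters, a run of digits, or any single non-space char."""
    return re.findall(r"[^\W\d_]+|\d+|\S", problem)

print(hybrid_tokenize("Solve x^2 + 3x = 10"))
# ['Solve', 'x', '^', '2', '+', '3', 'x', '=', '10']
```

Because the letter class is Unicode-aware, Vietnamese words such as "phương trình" also survive as whole tokens.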

To effectively preserve and capture the semantic relationships among various mathematical entities, advanced embedding methods are essential. Unlike traditional vectorization techniques that rely on statistical or frequency-based approaches, word embedding techniques provide a more nuanced representation by utilizing dense vectors derived from extensive text datasets. For instance, the Tangent Combined FastText (Tangent-CFT) method converts mathematical formulas into hierarchical representations, subsequently transforming tokens into vectors via the fastText n-gram model. The overall embedding of the formula is achieved by averaging these tuple embeddings. When ample data is available, developing a model that reflects structural features and semantic connections between formulas and their context (similar to MathBERT, but tailored for Vietnamese) should be considered.
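The averaging step can be illustrated with hashed pseudo-embeddings standing in for learned fastText n-gram vectors; this is a structural sketch of the Tangent-CFT idea only, not its trained model:

```python
import hashlib
import math

DIM = 16  # toy embedding width

def ngram_vector(ngram, dim=DIM):
    """Deterministic pseudo-embedding for one character n-gram (a stand-in
    for a learned fastText n-gram vector)."""
    digest = hashlib.md5(ngram.encode("utf-8")).digest()
    return [(b - 127.5) / 127.5 for b in digest[:dim]]

def embed_formula(tokens, n=3):
    """Embed a tokenized formula by averaging the vectors of all character
    n-grams of its tokens, as Tangent-CFT averages tuple embeddings."""
    vecs = []
    for tok in tokens:
        padded = f"<{tok}>"
        for i in range(max(1, len(padded) - n + 1)):
            vecs.append(ngram_vector(padded[i:i + n]))
    return [sum(col) / len(vecs) for col in zip(*vecs)]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) *
                  math.sqrt(sum(b * b for b in v)))

v1 = embed_formula(["x", "^", "2", "+", "1"])
v2 = embed_formula(["x", "^", "2", "+", "2"])  # differs in one token
v3 = embed_formula(["sin", "(", "y", ")"])     # shares no tokens
print(round(cosine(v1, v2), 3), round(cosine(v1, v3), 3))
```

Because v1 and v2 share four of five token n-grams, their averaged vectors stay close in expectation, while v3's is unrelated; a real deployment would learn the n-gram vectors from a formula corpus rather than hash them.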

The rising costs associated with advanced language models, particularly proprietary ones like OpenAI's GPT-4, pose significant challenges for research and adoption. As these models become more sophisticated, their usage expenses are expected to increase, and GPT-4 is already costly to access. This high price creates barriers to entry, limiting widespread adoption, and the financial burden may restrict opportunities for research and development in the field.

To alleviate the financial strain of proprietary models, open-source large language models offer a viable alternative. The open-source community has created various models that deliver performance comparable to their proprietary counterparts while remaining freely accessible, enabling a more cost-effective approach to research. Notable models like BERT (Bidirectional Encoder Representations from Transformers) and RoBERTa (Robustly Optimized BERT approach) are widely used and enjoy strong community support. These models provide essential tools for tokenization, vectorization, and mathematical formula retrieval, aligning well with the objectives of this research.

CONCLUSION

This research presents a novel method aimed at improving the computational and reasoning abilities of large language models (LLMs) for tackling Vietnamese high school mathematics problems. By integrating techniques such as topic classification, similar problem retrieval, code generation, and code execution, this approach significantly boosts the accuracy and effectiveness of LLM-based solutions.

The evaluation of the proposed method was performed on two advanced language models, GPT-4 and GPT-3.5-turbo, utilizing a variety of multiple-choice questions from Vietnamese high school final exams. The method demonstrated a significant enhancement in the performance of both models, greatly exceeding the baseline zero-shot approach.

A web application has been created to offer users direct access to enhanced problem-solving capabilities of LLMs, allowing them to utilize this advanced technology conveniently and efficiently.

This research advances educational technology by improving the computational and reasoning skills of large language models (LLMs) and making this technology accessible through a user-friendly web application. Its main objective is to enable students to effectively utilize large language tools to address high school mathematics challenges, potentially enhancing the learning experiences of students in Vietnam.

[6] Bộ Giáo dục và Đào tạo Việt Nam. (2018). Chương trình sách giáo khoa toán học THPT cơ bản và nâng cao.

[1] Ahuja, K., Diddee, H., Hada, R., Ochieng, M., Ramesh, K., Jain, P., Nambi, A., Ganu, T., Segal, S., Axmed, M., Bali, K., Sitaram, S. (2023). MEGA: Multilingual Evaluation of Generative AI. DOI: 10.48550/arXiv.2303.12528

[2] Anil, R., et al. (2023). PaLM 2 Technical Report. DOI: 10.48550/arXiv.2305.10403

[3] Azerbayev, Z., Schoelkopf, H., Paster, K., Santos, M.D., McAleer, S., Jiang, A.Q., Deng, J., Biderman, S., Welleck, S. (2023). Llemma: An Open Language Model For Mathematics. arXiv. Available at: https://arxiv.org/abs/2310.10631v2

[4] Bang, Y., Cahyawijaya, S., Lee, N., Dai, W., Su, D., Wilie, B., Lovenia, H., Ji, Z., Yu, T., Chung, W., Do, Q.V., Xu, Y., Fung, P. (2023). A Multitask, Multilingual, Multimodal Evaluation of ChatGPT on Reasoning, Hallucination, and Interactivity. DOI: 10.48550/arXiv.2302.04023

[5] Bhattacharya, A. (2017). A Survey of Question Answering for Math and Science Problem. DOI: 10.48550/arXiv.1705.04530

[7] Breiman, L. (2001). Random Forests. Mach. Learn., 45(1), pp. 5–32. DOI: 10.1023/A:1010933404324

[8] Bubeck, S., Chandrasekaran, V., Eldan, R., Gehrke, J., Horvitz, E., Kamar, E., Lee, P., Lee, Y.T., Li, Y., Lundberg, S., Nori, H., Palangi, H., Ribeiro, M.T., Zhang, Y. (2023). Sparks of Artificial General Intelligence: Early experiments with GPT-4. DOI: 10.48550/arXiv.2303.12712

[9] Chen, M., et al. (2021). Evaluating Large Language Models Trained on Code. DOI: 10.48550/arXiv.2107.03374

[10] Chen, W., Ma, X., Wang, X., Cohen, W.W. (2022). Program of Thoughts Prompting: Disentangling Computation from Reasoning for Numerical Reasoning Tasks. arXiv. Available at: https://arxiv.org/abs/2211.12588v4

[11] Chen, Z., Mao, H., Li, H., Jin, W., Wen, H., Wei, X., Wang, S., Yin, D., Fan, W., Liu, H., Tang, J. (2023). Exploring the Potential of Large Language Models (LLMs) in Learning on Graphs. arXiv. Available at: https://arxiv.org/abs/2307.03393v3

[12] Cunningham, P., Delany, S. (2007). k-Nearest neighbour classifiers. Mult. Classif. Syst., 54. DOI: 10.1145/3459665

[13] Dong, Q., Li, L., Dai, D., Zheng, C., Wu, Z., Chang, B., Sun, X., Xu, J., Li, L., Sui, Z. (2023). A Survey on In-context Learning. DOI: 10.48550/arXiv.2301.00234

[14] Drori, I., et al. (2022). A neural network solves, explains, and generates university math problems by program synthesis and few-shot learning at human level. Proceedings of the National Academy of Sciences.

[15] Gaur, V., Saunshi, N. (2023). Reasoning in Large Language Models Through Symbolic Math Word Problems. arXiv. Available at: https://arxiv.org/abs/2308.01906v1

[16] Geva, M., Gupta, A., Berant, J. (2020). Injecting Numerical Reasoning Skills into Language Models. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Online, Association for Computational Linguistics, pp. 946–958.

[17] He-Yueya, J., Poesia, G., Wang, R.E., Goodman, N.D. (2023). Solving Math Word Problems by Combining Language Models With Symbolic Solvers. DOI: 10.48550/arXiv.2304.09102

[18] Huang, H., Tang, T., Zhang, D., Zhao, W.X., Song, T., Xia, Y., Wei, F. (2023). Not All Languages Are Created Equal in LLMs: Improving Multilingual Capability by Cross-Lingual-Thought Prompting.

[19] Huang, X., Zhang, L.L., Cheng, K.-T., Yang, M. (2023). Boosting LLM Reasoning: Push the Limits of Few-shot Learning with Reinforced In-Context Pruning. DOI: 10.48550/arXiv.2312.08901

[20] Ilany, B.-S. (2010). Language and Mathematics: Bridging between Natural Language and Mathematical Language in Solving Problems in Mathematics. Creat. Educ., 01, pp. 138–148. DOI: 10.4236/ce.2010.13022

[21] Imani, S., Du, L., Shrivastava, H. (2023). MathPrompter: Mathematical Reasoning using Large Language Models. Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics, Toronto.

[22] Johansson, M. (2023). What Can Large Language Models Do for Theorem Proving and Formal Methods?, pp. 391–394.

[23] Lewkowycz, A., Andreassen, A., Dohan, D., Dyer, E., Michalewski, H., Ramasesh, V., Slone, A., Anil, C., Schlag, I., Gutman-Solo, T., Wu, Y., Neyshabur, B., Gur-Ari, G., Misra, V. (2022). Solving Quantitative Reasoning Problems with Language Models. DOI: 10.48550/arXiv.2206.14858

[24] Linear Support Vector Machine - an overview | ScienceDirect Topics. (n.d.). Available at: https://www.sciencedirect.com/topics/engineering/linear-support-vector-machine

[25] Liu, B., Udell, M. (2020). Impact of Accuracy on Model Interpretations. DOI: 10.48550/arXiv.2011.09903

[26] Liu, T., Low, B.K.H. (2023). Goat: Fine-tuned LLaMA Outperforms GPT-4 on Arithmetic Tasks.

[27] Liu, W., Hu, H., Zhou, J., Ding, Y., Li, J., Zeng, J., He, M., Chen, Q., Jiang, B., Zhou, A., He, L. (2023). Mathematical Language Models: A Survey.

[28] López Espejel, J., Ettifouri, E.H., Yahaya Alassan, M.S., Chouham, E.M., Dahhane, W. (2023). GPT-3.5, GPT-4, or BARD? Evaluating LLMs reasoning ability in zero-shot setting and performance boosting through prompts. Nat. Lang. Process. J., 5, p. 100032. DOI: 10.1016/j.nlp.2023.100032

[29] Luo, H., et al. (2023). WizardMath: Empowering Mathematical Reasoning for Large Language Models via Reinforced Evol-Instruct. arXiv.

[30] Luo, Y., Yang, Z., Meng, F., Li, Y., Zhou, J., Zhang, Y. (2023). An Empirical Study of Catastrophic Forgetting in Large Language Models During Continual Fine-tuning. DOI: 10.48550/arXiv.2308.08747

[31] Maltoni, D., Ferrara, M. (2023). Arithmetic with Language Models: from Memorization to Computation. DOI: 10.48550/arXiv.2308.01154

[32] Mandlecha, P., et al. (2022). Hybrid tokenization and datasets for solving mathematics and science problems using transformers. Proceedings of the 2022 SIAM International Conference on Data Mining.

[33] Mansouri, B., Rohatgi, S., Oard, D.W., Wu, J., Giles, C.L., Zanibbi, R. (2019). Tangent-CFT: An Embedding Model for Mathematical Formulas. Proceedings of the 2019 ACM SIGIR International Conference on Theory of Information Retrieval, Santa Clara CA USA, ACM, pp. 11–18.

[34] Manyika, J., Hsiao, S. (n.d.). An overview of Bard: an early experiment with generative AI.

[35] Mosbach, M., Pimentel, T., Ravfogel, S., Klakow, D., Elazar, Y. (2023). Few-shot Fine-tuning vs. In-context Learning: A Fair Comparison and Evaluation. DOI: 10.48550/arXiv.2305.16938

[36] Muffo, M., Cocco, A., Bertino, E. (2023). Evaluating Transformer Language Models on Arithmetic Operations Using Number Decomposition. DOI: 10.48550/arXiv.2304.10977

[37] Multinomial Naive Bayes for Text Categorization Revisited | SpringerLink. (n.d.). Available at: https://link.springer.com/chapter/10.1007/978-3-540-30549-1_43

[38] Murtagh, F. (1991). Multilayer perceptrons for classification and regression. Neurocomputing, 2(5), pp. 183–197. DOI: 10.1016/0925-2312(91)90023-5

[39] Naveed, H., Khan, A.U., Qiu, S., Saqib, M., Anwar, S., Usman, M., Akhtar, N., Barnes, N., Mian, A. (2023). A Comprehensive Overview of Large Language Models. arXiv. Available at: https://arxiv.org/abs/2307.06435v7

[40] Nguyen, T.-T., Trinh, T., Hằng, N., Hoang, N.-A., Tran, T., Pham, H.-H., Bui, V.-N. (2020). Realistic Mathematics Education in Vietnam: Recent Policies and Practices. Int. J. Educ. Pract., 8, pp. 57–71. DOI: 10.18488/journal.61.2020.81.57.71

[41] Niyarepola, K., et al. (2022). Math Word Problem Generation with Multilingual Language Models. Proceedings of the 15th International Conference on Natural Language Generation, Association for Computational Linguistics.

[42] Norberg, K., Almoubayyed, H., Fancsali, S.E., Ley, L.D., Weldon, K., Murphy, A., Ritter, S. (n.d.). Rewriting Math Word Problems with Large Language Models.

[43] Nunes, D., Primi, R., Pires, R., Lotufo, R., Nogueira, R. (2023). Evaluating GPT-3.5 and GPT-4 Models on Brazilian University Admission Exams. DOI: 10.48550/arXiv.2303.17003

[44] Ofoeda, J., Boateng, R., Effah, J. (2019). Application Programming Interface (API) Research: A Review of the Past to Inform the Future. Int. J. Enterp. Inf. Syst.,

[45] OpenAI (2023). GPT-4 Technical Report. arXiv. Available at: https://arxiv.org/abs/2303.08774v4

[46] OpenAI Platform. (n.d.). Available at: https://platform.openai.com

[47] (PDF) Gradient Boosting Machines, A Tutorial. (n.d.). Available at: https://www.researchgate.net/publication/259653472_Gradient_Boosting_Machines_A_Tutorial

[48] (PDF) Tokenization as the initial phase in NLP. (n.d.). Available at: https://www.researchgate.net/publication/221102283_Tokenization_as_the_initial_phase_in_NLP

[49] Peng, S., Yuan, K., Gao, L., Tang, Z. (2021). MathBERT: A Pre-Trained Model for Mathematical Formula Understanding. arXiv. Available at: https://arxiv.org/abs/2105.00377v1

Ngày đăng: 28/02/2025, 22:52

Nguồn tham khảo

Tài liệu tham khảo Loại Chi tiết
[1] Ahuja, K., Diddee, H., Hada, R., Ochieng, M., Ramesh, K., Jain, P., Nambi, A., Ganu, T., Segal, S., Axmed, M., Bali, K., Sitaram, S. (2023). MEGA: Multilingual Evaluation of Generative AI. DOI: 10.48550/arXiv.2303.12528

[3] Azerbayev, Z., Schoelkopf, H., Paster, K., Santos, M.D., McAleer, S., Jiang, A.Q., Deng, J., Biderman, S., Welleck, S. (2023). Llemma: An Open Language Model For Mathematics. arXiv.Org. Available at: https://arxiv.org/abs/2310.10631v2

[4] Bang, Y., Cahyawijaya, S., Lee, N., Dai, W., Su, D., Wilie, B., Lovenia, H., Ji, Z., Yu, T., Chung, W., Do, Q.V., Xu, Y., Fung, P. (2023). A Multitask, Multilingual, Multimodal Evaluation of ChatGPT on Reasoning, Hallucination, and Interactivity.

[5] Bhattacharya, A. (2017). A Survey of Question Answering for Math and Science Problem. DOI: 10.48550/arXiv.1705.04530

[6] Bộ Giáo dục và Đào tạo Việt Nam. (2018). Chương trình sách giáo khoa toán học THPT cơ bản và nâng cao [Basic and advanced high school mathematics textbook curriculum].

[10] Chen, W., Ma, X., Wang, X., Cohen, W.W. (2022). Program of Thoughts Prompting: Disentangling Computation from Reasoning for Numerical Reasoning Tasks. arXiv.Org. Available at: https://arxiv.org/abs/2211.12588v4

[11] Chen, Z., Mao, H., Li, H., Jin, W., Wen, H., Wei, X., Wang, S., Yin, D., Fan, W., Liu, H., Tang, J. (2023). Exploring the Potential of Large Language Models (LLMs) in Learning on Graphs. arXiv.Org. Available at: https://arxiv.org/abs/2307.03393v3

[12] Cunningham, P., Delany, S. (2007). k-Nearest Neighbour Classifiers, Mult. Classif. Syst., 54. DOI: 10.1145/3459665

[14] Drori, I., Zhang, S., Shuttleworth, R., Tang, L., Lu, A., Ke, E., Liu, K., Chen, L., Tran, S., Cheng, N., Wang, R., Singh, N., Patti, T.L., Lynch, J., Shporer, A., Verma, N., Wu, E., Strang, G. (2022). A Neural Network Solves, Explains, and Generates University Math Problems by Program Synthesis and Few-Shot Learning at Human Level, Proc. Natl. Acad. Sci., 119(32), p. e2123433119. DOI: 10.1073/pnas.2123433119

[15] Gaur, V., Saunshi, N. (2023). Reasoning in Large Language Models Through Symbolic Math Word Problems. arXiv.Org. Available at: https://arxiv.org/abs/2308.01906v1

[18] Huang, H., Tang, T., Zhang, D., Zhao, W.X., Song, T., Xia, Y., Wei, F. (2023). Not All Languages Are Created Equal in LLMs: Improving Multilingual Capability by Cross-Lingual-Thought Prompting.

[19] Huang, X., Zhang, L.L., Cheng, K.-T., Yang, M. (2023). Boosting LLM Reasoning: Push the Limits of Few-shot Learning with Reinforced In-Context Pruning. DOI: 10.48550/arXiv.2312.08901

[20] Ilany, B.-S. (2010). Language and Mathematics: Bridging between Natural Language and Mathematical Language in Solving Problems in Mathematics, Creat. Educ., 01, pp. 138–148. DOI: 10.4236/ce.2010.13022

[21] Imani, S., Du, L., Shrivastava, H. (2023). MathPrompter: Mathematical Reasoning using Large Language Models. In: Sitaram, S., Beigman Klebanov, B., Williams, J.D. (eds.), Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 5: Industry Track), Toronto, Canada, Association for Computational Linguistics, pp. 37–42.

[22] Johansson, M. (2023). What Can Large Language Models Do for Theorem Proving and Formal Methods?, pp. 391–394.

[23] Lewkowycz, A., Andreassen, A., Dohan, D., Dyer, E., Michalewski, H., Ramasesh, V., Slone, A., Anil, C., Schlag, I., Gutman-Solo, T., Wu, Y., Neyshabur, B., Gur-Ari, G., Misra, V. (2022). Solving Quantitative Reasoning Problems with Language Models. DOI: 10.48550/arXiv.2206.14858

[24] Linear Support Vector Machine - an overview | ScienceDirect Topics. (n.d.). Available at: https://www.sciencedirect.com/topics/engineering/linear-support-vector-machine

[26] Liu, T., Low, B.K.H. (2023). Goat: Fine-tuned LLaMA Outperforms GPT-4 on Arithmetic Tasks.

[27] Liu, W., Hu, H., Zhou, J., Ding, Y., Li, J., Zeng, J., He, M., Chen, Q., Jiang, B., Zhou, A., He, L. (2023). Mathematical Language Models: A Survey.

[62] What are large language models? | IBM. (n.d.). Available at: https://www.ibm.com/topics/large-language-models
