Predict the likelihood of customers making online purchases based on information about customer behavior
Introduction
The rapid rise of e-commerce has opened up significant market opportunities, yet conversion rates have not kept pace with this growth, indicating a need for innovative solutions that provide tailored promotions to online shoppers. Unlike physical retail, where experienced salespeople offer personalized recommendations that enhance purchasing decisions and boost sales, e-commerce struggles to replicate this personal touch. Addressing this challenge is crucial for e-commerce businesses aiming to improve conversion rates and maximize sales.
To tackle the challenge of enhancing online shopping experiences, e-commerce and IT companies are investing in advanced early detection and behavioral prediction systems designed to mimic the personalized service of human salespeople. Concurrently, academic research is delving into machine learning techniques to forecast user behavior and boost conversion rates. Some studies categorize user visits by analyzing navigational patterns, while others focus on real-time behavior prediction, allowing for prompt interventions that improve shopping experiences and minimize cart abandonment.
This thesis presents a predictive system designed to assess the likelihood of online purchases based on customer behavioral data. It employs eight machine learning models, including Multilayer Perceptron (MLP), Support Vector Machine (SVM), k-Nearest Neighbors (k-NN), Decision Tree, Bagging, Random Forest, XGBoost, and AdaBoost. The effectiveness of these models is evaluated in predicting purchasing intentions and site abandonment using data from an online retailer. The goal of this comparison is to identify the most effective strategies for enhancing conversion rates and improving the overall customer experience.
Previous research has demonstrated the potential of various machine learning techniques in predicting online shopping behaviors. For instance, Moe (2003) used K-means clustering to categorize user visits, revealing distinct purchasing intentions through navigational patterns. Research by Mobasher et al. (2000) demonstrated that clustering models can enhance recommender systems, highlighting how user profiles derived from clickstream data significantly boost personalization during initial user interactions. These findings emphasize the critical role of understanding user behavior in delivering customized experiences that foster higher conversion rates.
Leveraging foundational studies, this approach employs a robust suite of machine learning models to forecast purchasing behavior, addressing the urgent need for real-time insights that can minimize shopping cart abandonment and enhance conversion rates. For instance, Suchacka and Chodak (2018) analyzed e-customer behaviors through session data, using association rule mining to assess purchasing probabilities. Additionally, Yeung (2020) showcased the effectiveness of navigation paths in predicting user actions, underscoring the critical role of real-time behavioral predictions in the e-commerce landscape.
Predicting the likelihood of customers making online purchases using behavioral data is crucial for the e-commerce industry, as it can enhance marketing strategies, optimize resource allocation, and increase revenue for online retailers. The capacity to anticipate and influence customer behavior in real time marks a significant advancement in the evolution of e-commerce technology.
Utilizing advanced models such as Random Forest and XGBoost, renowned for their exceptional predictive accuracy and robustness, this system aims to forecast purchasing intentions while offering actionable insights for e-commerce platforms By personalizing user experiences and enabling timely interventions to retain potential customers, this approach is set to significantly improve the efficiency and effectiveness of online retail operations.
Problem statement and data
This thesis aims to analyze customer behavioral data on the website to predict the probability of online purchases. By examining customer interactions, preferences for specific products, and the factors influencing their buying decisions, we seek to understand what drives customers to complete a purchase or abandon their shopping experience.
In our analysis, we utilized data from 12,330 online shopping sessions, revealing that 84.5% (10,422) of these sessions were classified as negative, indicating no purchase was made, while 15.5% (1,908) were classified as positive, resulting in a purchase. This dataset includes various metrics such as the time customers spend on different page types (administrative, informational, product-related), bounce rates, average pages viewed, purchase timing (weekday vs. weekend, month), geographic location, device and browser type, and customer status (new vs. returning). A key challenge in this analysis is the imbalance in the data, as the number of non-buying customers significantly outweighs the number of buying customers.
To address this issue, we will first preprocess the data by managing missing values, converting categorical variables into numeric formats, and normalizing the dataset. Following this, we will analyze the data through graphical representations and statistical methods to gain insights into its characteristics. Finally, we will employ techniques to generate synthetic data samples, which will assist in balancing the ratio between the two customer groups.
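As a concrete illustration of these preprocessing steps, the following sketch uses pandas and scikit-learn; the file name, the choice of one-hot encoding, and min-max scaling are illustrative assumptions rather than the exact pipeline of the thesis.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Load the session data (file name assumed for illustration)
df = pd.read_csv("online_shoppers_intention.csv")

# Handle missing values, if any, by dropping incomplete rows
df = df.dropna()

# Convert categorical variables into numeric formats
df = pd.get_dummies(df, columns=["Month", "VisitorType"], drop_first=True)
df["Weekend"] = df["Weekend"].astype(int)
df["Revenue"] = df["Revenue"].astype(int)  # target: 1 = purchase, 0 = no purchase

# Normalize all feature columns to the [0, 1] range
feature_cols = df.columns.drop("Revenue")
df[feature_cols] = MinMaxScaler().fit_transform(df[feature_cols])
```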
Machine learning models, including Random Forest, MLP, XGBoost, KNN, Decision Tree, SVM, AdaBoost, and Bagging, are employed to forecast customer purchasing behavior. The effectiveness of these models will be assessed using both the original dataset and the processed data to identify the most accurate predictive model.
Table 2-The description of data
Variable name Role Type Description
Administrative Feature Integer Number of admin-related pages visited
Administrative_Duration Feature Integer Time spent on admin-related pages
Informational Feature Integer Number of informational pages visited
Informational_Duration Feature Integer Time spent on informational pages
ProductRelated Feature Integer Number of product-related pages visited
ProductRelated_Duration Feature Continuous Time spent on product-related pages
BounceRates Feature Continuous Rate of visitors bouncing from the first page
ExitRates Feature Continuous Rate of visitors exiting from a specific page
PageValues Feature Integer Average value of pages visited
SpecialDay Feature Integer Proximity to a special day/event
OperatingSystems Feature Integer Operating system of the user
Browser Feature Integer Browser used by the user
Region Feature Integer Region from which the user is visiting
TrafficType Feature Integer Type of traffic source
VisitorType Feature Categorical Type of visitor: new, returning, or other
Weekend Feature Binary Representing whether the session is on a weekend
Revenue Target Binary Representing whether or not the user completed the purchase
Methods
1 An overview of the method
Data imbalance is a significant challenge in predicting online purchasing behavior, as the number of non-purchasing customers typically exceeds that of purchasing customers. This disparity complicates the development of accurate prediction models. To address this issue, this thesis employs two techniques aimed at mitigating the effects of data imbalance.
SMOTE (Synthetic Minority Oversampling Technique) addresses the issue of class imbalance by generating synthetic data samples for the minority class, specifically customer purchases. It achieves this by interpolating between randomly selected samples from the minority class and their nearest neighbors, repeating the process until the minority class is balanced with the majority class of non-purchasing customers. Implementing SMOTE enhances the model's predictive performance by enriching its understanding of the minority class, thereby enabling it to learn critical features and mitigating classification bias towards the majority class.
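A minimal sketch of how SMOTE could be applied with the imbalanced-learn library, reusing the preprocessed DataFrame from the earlier sketch; the split ratio and random seeds are illustrative, and resampling is applied only to the training split so that the test set stays untouched.

```python
from collections import Counter

from imblearn.over_sampling import SMOTE
from sklearn.model_selection import train_test_split

# Features and target from the preprocessed DataFrame
X = df.drop(columns=["Revenue"])
y = df["Revenue"]

# Split before resampling so synthetic samples never leak into the test set
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Interpolate new minority-class samples until both classes are balanced
smote = SMOTE(random_state=42)
X_train_bal, y_train_bal = smote.fit_resample(X_train, y_train)
print(Counter(y_train), Counter(y_train_bal))
```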
Random Oversampling (RO) is an effective method for addressing data imbalance by randomly sampling with replacement from the minority class. This process continues until the number of minority class samples matches that of the majority class. While RO is straightforward to implement, it carries the risk of overfitting, particularly when the minority class has a limited sample size.
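For comparison, the corresponding Random Oversampling sketch with imbalanced-learn, reusing the same training split as above, would look roughly as follows.

```python
from imblearn.over_sampling import RandomOverSampler

# Duplicate existing minority-class rows (sampling with replacement)
# until both classes have the same number of training samples
ros = RandomOverSampler(random_state=42)
X_train_ro, y_train_ro = ros.fit_resample(X_train, y_train)
```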
SMOTE and Random Oversampling are both effective methods for addressing data imbalance, each employing a distinct strategy: while SMOTE generates new synthetic samples, Random Oversampling duplicates existing samples from the minority class. The selection of the appropriate technique is contingent upon the characteristics of the dataset and the model being utilized. This thesis explores the application of both SMOTE and Random Oversampling to mitigate data imbalance, thereby enhancing the performance of the subsequent forecasting models: Random Forest, MLPClassifier, XGBoost, KNN, Decision Tree, SVM, AdaBoost, and Bagging. Integrating SMOTE and Random Oversampling with these models significantly improves their accuracy and reliability in predicting online purchasing behavior.
Figure 3-Flowchart of Problem-solving Steps
Random Forest is an advanced machine learning model that enhances prediction accuracy by aggregating the outputs of many decision trees built from diverse data subsets. This ensemble approach effectively reduces overfitting and enhances the stability of the model. By training on SMOTE-balanced data, Random Forest significantly boosts forecast reliability and accuracy.
The MLPClassifier, a type of artificial neural network, excels in learning intricate non-linear patterns within data due to its multi-hidden-layer structure. This architecture enables the model to effectively identify complex relationships among features, leading to improved prediction accuracy. In this thesis, the MLPClassifier was employed to boost forecasting performance on a dataset that was balanced using SMOTE.
XGBoost (Extreme Gradient Boosting) is a high-performance boosting algorithm known for its impressive computational speed. It constructs a series of sequential decision tree models, each aimed at correcting the errors made by its predecessor. Particularly effective on SMOTE-balanced data, XGBoost enhances forecasting accuracy by adeptly managing imbalanced datasets and reducing the risk of overfitting.
KNN (K-Nearest Neighbors) is an effective classification algorithm that utilizes the distance principle to classify new data points. By balancing the data, KNN can more accurately predict outcomes by leveraging information from neighboring data points, making it a powerful tool for distance-based classification.
Decision Trees are a user-friendly prediction model that utilizes branching decision rules for data classification. While they can be prone to overfitting, they serve as a critical foundation for various ensemble learning techniques. This thesis explores the application of Decision Trees to enhance forecasting accuracy on imbalanced datasets, highlighting their essential role in ensemble models like Random Forest and AdaBoost.
Support Vector Machine (SVM) is a robust classification model particularly suited for binary classification tasks. The algorithm identifies the optimal hyperplane that separates the data classes, thereby maximizing prediction accuracy. In this thesis, SVM is applied to the balanced datasets to leverage its strong classification capabilities and improve forecasting performance.
AdaBoost is a powerful boosting algorithm that enhances the performance of weak models by combining them into a single strong model. It achieves this by adjusting the weights of data samples in each iteration, thereby improving the predictive accuracy of these weak models.
Bagging (Bootstrap Aggregating) is an Ensemble Learning method used in machine learning to enhance the accuracy of prediction models by training independent sub-models on specific sub-datasets, with final predictions derived from their combined results. This thesis employs Bagging to boost both the stability and accuracy of the prediction model, utilizing data balanced through SMOTE. By aggregating predictions from various sub-models, Bagging minimizes reliance on any single model, thereby improving the overall generalizability of the predictive framework.
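To make the overall setup concrete, the sketch below instantiates the eight classifiers discussed above and evaluates each on the SMOTE-balanced training data and the untouched test set from the earlier sketches; all hyperparameters are illustrative defaults, not the tuned settings of the thesis.

```python
from sklearn.ensemble import (
    AdaBoostClassifier, BaggingClassifier, RandomForestClassifier
)
from sklearn.metrics import classification_report, roc_auc_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from xgboost import XGBClassifier

# The eight models compared in this thesis (hyperparameters are illustrative)
models = {
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
    "MLP": MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=500, random_state=42),
    "XGBoost": XGBClassifier(eval_metric="logloss", random_state=42),
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "SVM": SVC(probability=True, random_state=42),
    "AdaBoost": AdaBoostClassifier(random_state=42),
    "Bagging": BaggingClassifier(random_state=42),
}

for name, model in models.items():
    model.fit(X_train_bal, y_train_bal)        # SMOTE-balanced training data
    y_pred = model.predict(X_test)             # original, imbalanced test set
    y_prob = model.predict_proba(X_test)[:, 1]
    print(name)
    print(classification_report(y_test, y_pred, digits=3))
    print("AUC:", round(roc_auc_score(y_test, y_prob), 3))
```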
To effectively prepare data for training a machine learning model, key libraries such as imbalanced-learn, ydata_profiling, and scikit-learn are imported for data analysis and processing. The data is read from a CSV file into a DataFrame for thorough preprocessing and analysis. The dataset comprises 12,330 samples.
The dataset comprises 18 columns featuring various data types, including integers, floats, objects, and logical types. Integer columns capture metrics like page visit counts, while float columns represent key performance indicators such as bounce rates and page values. Additionally, categorical columns provide essential details regarding the month and visitor types, which require transformation for effective utilization in machine learning models.
The dataset is complete with no missing values, which reduces the need for extensive preprocessing. Analysis of the distribution reveals notable fluctuations in key variables, particularly those associated with visits and page metrics, highlighting the necessity of normalizing the data before it is fed to the machine learning models to ensure the models' effectiveness and reliability.
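A brief sketch of how this initial inspection could be carried out with pandas and ydata_profiling, assuming the raw DataFrame as read from the CSV file (before encoding); the report file name is an illustrative choice.

```python
from ydata_profiling import ProfileReport

# Quick structural overview: column dtypes, non-null counts, basic statistics
df.info()
print(df.describe().T)
print(df["Revenue"].value_counts(normalize=True))  # class balance check

# Generate an interactive profiling report covering all 18 columns
profile = ProfileReport(df, title="Online Shopping Sessions - EDA")
profile.to_file("eda_report.html")
```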
Results
EDA (Exploratory Data Analysis)
To further explore the data, a variety of graphs and statistical analyses were used to provide a detailed overview of the data set's characteristics.
Figure 4.1-Overview of Features in the Dataset
To analyze the distribution of data in the DataFrame, histograms are created for each variable. This crucial step in exploratory data analysis (EDA) enhances our understanding of the characteristics and nature of each variable's distribution.
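A minimal sketch of generating these histograms with pandas and matplotlib; figure size and bin count are arbitrary choices.

```python
import matplotlib.pyplot as plt

# Histogram of every numeric column to inspect each variable's distribution
df.hist(figsize=(14, 10), bins=30)
plt.tight_layout()
plt.show()
```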
The provided graphs offer valuable insights into business operations and visitor behavior on websites or applications, enabling the analysis of user interactions and the assessment of business performance. By examining key metrics such as bounce rate, page value, and revenue distribution, managers can enhance their understanding of user experiences and streamline workflows. Additionally, the data from these charts aids in evaluating seasonal trends, identifying customer sources, and optimizing user experiences across various platforms.
Next, `sns.countplot` is used to count the number of samples of categorical variables such as `Month`, `Weekend`, and `VisitorType`, and to analyze their relationship with the target variable `Revenue`, allowing for a comprehensive evaluation of the categorical data distribution. This approach is essential for identifying significant trends and patterns, which can inform further analysis and enhance predictive accuracy.
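A short sketch of these count plots with seaborn, assuming the raw (un-encoded) categorical columns are still present in the DataFrame.

```python
import matplotlib.pyplot as plt
import seaborn as sns

# Count plots of the categorical variables, split by the Revenue target
fig, axes = plt.subplots(1, 3, figsize=(16, 4))
for ax, col in zip(axes, ["Month", "Weekend", "VisitorType"]):
    sns.countplot(data=df, x=col, hue="Revenue", ax=ax)
plt.tight_layout()
plt.show()
```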
Calculating and visualizing the correlation matrix between variables using `df.corr()` and `sns.heatmap` enhances the understanding of variable correlations and reveals data patterns that influence the target variable. Heatmaps offer a comprehensive view of relationships among variables, aiding in the identification of the key variables crucial for the machine learning models.
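A minimal sketch of the correlation heatmap described above; the color map and figure size are illustrative.

```python
import matplotlib.pyplot as plt
import seaborn as sns

# Correlation matrix over the numeric columns, visualized as a heatmap
corr = df.corr(numeric_only=True)
plt.figure(figsize=(12, 9))
sns.heatmap(corr, annot=True, fmt=".2f", cmap="coolwarm")
plt.show()
```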
Figure 4.6-Dashboard overview of customer behavior and conversions
The customer behavior analysis dashboard highlights that users spend considerable time on product pages, reflecting strong interest in the website's offerings. In contrast, low engagement on informational and administrative pages indicates a need for improved content and design. Notably, there is a consistent increase in time spent on product pages, particularly in December, likely due to holiday promotions. However, bounce and exit rates peak in February, suggesting potential content or user experience issues during that month. Furthermore, the data reveals that most visitors are returning customers, showcasing the website's effectiveness in retaining customers. However, there is room to improve strategies for attracting new customers and understanding why some visitors do not return after their initial visit.
Figure 4.7-Dashboard analyzing conversion rate and revenue
The second dashboard reveals monthly insights into conversion rates and revenue, highlighting March and November as peak months, likely influenced by seasonal factors or effective marketing campaigns. Despite high revenues during these months, elevated bounce and exit rates indicate the necessity of improving user engagement alongside traffic acquisition. The analysis points to a heavy dependence on a single revenue stream, which could be risky if disrupted. It also shows that new visitors have the highest conversion rates, suggesting an effective onboarding process, while strategies for converting returning visitors need enhancement. Furthermore, variations in average metrics for administrative, informational, and product-related page views across months suggest areas ripe for optimization. Overall, the dashboard underscores the importance of understanding customer behavior nuances and tailoring strategies for different visitor segments to optimize conversions and revenue.
Results of models with SMOTE applied
The RandomForest Classifier, after applying SMOTE, shows good overall classification performance with an accuracy of 0.90. However, there is a significant difference in F1-score between the two classes. For class 0 (no purchase), the F1-score reached 0.94, showing that the model predicts these cases very well with a good balance of precision and recall. For class 1 (purchase), however, the F1-score is only 0.70, significantly lower than that of class 0, indicating challenges in accurately predicting purchase cases despite the high overall accuracy; the precision for class 1 remains satisfactory at 0.64. The application of SMOTE has enhanced the RandomForest Classifier's predictive capability compared to the model without SMOTE, but further refinement and optimization are essential to improve its effectiveness in predicting class 1 purchase cases and thus its practical efficiency.
The MLP Classifier (Neural Network) model, enhanced by SMOTE, demonstrates strong performance with an accuracy of 88%, accurately predicting class labels for 88% of the dataset. However, the F1-score reveals a disparity between the two classes: for class 0 (no purchase), the F1-score is 0.93, indicating excellent prediction capability with a balanced precision and recall. In contrast, class 1 (purchases) has a significantly lower F1-score of 0.67, highlighting the model's challenges in predicting purchase cases accurately, despite achieving a reasonable precision of 0.59 for this class.
After applying SMOTE, the XGBoost Classifier achieved an accuracy of 0.89, correctly predicting the class label for 89% of the samples in the dataset. A closer examination of the F1-score reveals further insights into the model's classification performance. The analysis shows a significant disparity between the two classes: for class 0 (no purchase), the F1-score reached an impressive 0.93, demonstrating the model's effectiveness in balancing precision and recall for these cases, whereas class 1 (purchase) recorded a notably lower F1-score of 0.67, highlighting the challenges the model faces in accurately predicting purchase instances. Despite this, the precision for class 1 remains relatively strong at 0.62.
The K-Nearest Neighbors (KNN) model, after the application of SMOTE, achieved an accuracy of 0.71. Notably, the F1-score indicates a substantial disparity between the two classes; class 0 (no purchase) has a strong F1-score of 0.81, reflecting the model's proficiency in predicting these instances with a good balance of precision and recall. In contrast, class 1 (purchase) has a much lower F1-score of 0.38, highlighting the model's challenges in accurately predicting purchase cases, despite a decent recall of 0.59. Overall, while the KNN model demonstrates improved predictive capabilities compared to its non-SMOTE counterpart, the enhancement is not markedly significant.
The Decision Tree Classifier model, enhanced by SMOTE, demonstrates commendable overall classification performance with an accuracy of 0.86. Notably, there is a marked disparity between the classes; for class 0 (non-purchase), the F1-score is an impressive 0.91, indicating strong predictive capability with a good balance of precision and recall. In contrast, class 1 (purchase) has a significantly lower F1-score of 0.61, reflecting the model's challenges in accurately predicting purchase cases, despite a reasonable recall of 0.69 for this class.
2.6 Support Vector Machine (SVM) Classifier
The SVM model's overall classification performance after SMOTE is limited, achieving an accuracy of just 0.54, and it shows a significant disparity in F1-scores between the two classes. For class 0 (no purchase), the F1-score is 0.64, indicating that the model predicts these instances with a reasonable balance between precision and recall.
The F1-score for class 1 (purchase) is only 0.36, indicating significant challenges in accurately predicting purchase cases, despite a high recall of 0.82. The SVM model, even after applying SMOTE, remains ineffective, likely due to suboptimal class separation or insufficient data for the model to learn the characteristics of class 1. Therefore, further adjustments and optimizations are essential to enhance the model's ability to predict purchase scenarios effectively.
The AdaBoost Classifier model, enhanced by SMOTE, demonstrates strong overall classification performance with an accuracy of 0.89, accurately predicting 89% of the dataset's samples. However, a deeper analysis of the F1-score reveals a notable disparity between the two classes: class 0 (no purchase) achieves an impressive F1-score of 0.93, indicating effective prediction capabilities, while class 1 (purchase) lags behind with an F1-score of only 0.68, highlighting challenges in accurately predicting purchase cases. Despite this, class 1 maintains a respectable recall of 0.77, suggesting reasonable performance in identifying purchase instances.
The Bagging Classifier, after applying SMOTE, achieves an impressive accuracy of 0.89, indicating that it correctly predicts the class label for 89% of the dataset samples. Notably, the F1-score reveals a marked disparity between the two classes; class 0 (no purchase) boasts an F1-score of 0.93, reflecting the model's strong predictive capability in these cases with a good balance of precision and recall. Conversely, class 1 (purchase) has a lower F1-score of 0.69, highlighting the model's challenges in accurately predicting purchase instances, despite a respectable recall of 0.77. While the model demonstrates significant improvement in predictive performance compared to its SMOTE-free counterpart, further refinements and optimizations are necessary to enhance its overall accuracy and effectiveness.
Figure 4.8-ROC Curves of Classification Models using SMOTE
The ROC (Receiver Operating Characteristic) chart illustrates the classification effectiveness of the machine learning models enhanced by the SMOTE technique to address data imbalance issues. A key metric for assessing model performance in this graph is the area under the curve (AUC).
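A sketch of how such an ROC comparison could be produced with scikit-learn, reusing the fitted models and test split from the earlier sketches; this is an illustrative reconstruction, not the exact plotting code of the thesis.

```python
import matplotlib.pyplot as plt
from sklearn.metrics import RocCurveDisplay

# Overlay the ROC curve (and AUC) of every fitted model on one set of axes
fig, ax = plt.subplots(figsize=(8, 8))
for name, model in models.items():
    RocCurveDisplay.from_estimator(model, X_test, y_test, name=name, ax=ax)
ax.plot([0, 1], [0, 1], linestyle="--", color="grey")  # random-classifier baseline
ax.set_title("ROC curves of the classification models")
plt.show()
```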
The analysis of the chart reveals that the Random Forest, XGBoost, and Bagging models exhibit exceptional classification performance, achieving an AUC of 0.93, which indicates their strong capability to differentiate between purchase and non-purchase cases. Additionally, the MLP and AdaBoost models also produced commendable results, with AUC values of 0.91 and 0.92, respectively, demonstrating their effectiveness, although slightly lower than the top-performing models.
The Decision Tree and KNN models demonstrated lower performance compared to the other models, achieving AUC scores of 0.79 and 0.71, respectively. This indicates that both models exhibit weaker classification capabilities, particularly the KNN model, whose AUC is the closest to that of the random classification model (the diagonal). The SVM model also did not achieve satisfactory results, with an AUC of only 0.71.
The application of the SMOTE technique has notably enhanced the classification capabilities of the machine learning models, particularly for the minority class. Despite this improvement, there remains significant variation in performance among the different models and specific problems. In this setting, Random Forest, XGBoost, and Bagging emerge as the most effective choices when data balancing is applied.
Results of models without SMOTE applied
The RandomForest model's performance prior to applying SMOTE reveals a notable disparity in prediction capability between the two classes. Class 0 achieved an impressive F1-score of 0.95, indicating strong effectiveness in predicting non-purchase cases with a solid balance of precision and recall. In contrast, class 1's F1-score was significantly lower at 0.66, highlighting the model's challenges in accurately predicting purchase cases. Despite this, the precision for class 1 remains acceptable at 0.76.
The F1-score difference indicates that the RandomForest model, prior to applying SMOTE, favored predicting non-purchases (class 0) over purchases (class 1), likely due to an imbalance in the original dataset where class 0 samples outnumbered class 1. Implementing the SMOTE technique can enhance the model's predictive performance for class 1, as demonstrated in the earlier findings.
The MLP model's performance prior to applying SMOTE reveals a significant imbalance in its predictive capability for the two classes. Class 0 achieves a high F1-score of 0.94, indicating strong effectiveness in predicting non-purchase cases with a good balance of precision and recall. In contrast, class 1 has a much lower F1-score of 0.63, highlighting the model's challenges in accurately predicting purchase cases, despite maintaining a reasonable precision of 0.71 for this class. This disparity in F1-scores suggests that the MLP model is not well optimized for purchase predictions, likely due to the imbalance in the original dataset, where non-purchase samples significantly outnumber purchase samples.
Before applying SMOTE, the XGBoost Classifier model demonstrates a notable imbalance in its predictive capability across the two classes. With an impressive F1-score of 0.94 for class 0, the model excels at predicting non-purchases (negative cases), reflecting a strong balance between precision and recall. In contrast, the F1-score for class 1 is significantly lower at 0.65, indicating the model's difficulty in effectively predicting purchases (positive cases).
The XGBoost Classifier nevertheless predicts positive purchase cases with reasonable precision, achieving a precision of 0.71 for class 1. However, the notable difference in F1-scores between the two classes indicates that the model is not fully optimized for predicting purchase cases. This limitation is likely attributable to the imbalance in the original dataset, where samples from class 0 significantly outnumber those from class 1.
The K-Nearest Neighbors (KNN) model exhibits a significant imbalance in prediction performance prior to the application of SMOTE, with an F1-score of 0.90 for class 0, demonstrating strong effectiveness in predicting non-purchases with a favorable balance between precision and recall. In contrast, the F1-score for class 1 reveals notable deficiencies in the model's predictive capability: it reached only 0.17, indicating that the model struggles to accurately forecast purchase cases, even though its precision for this class remains acceptable at 0.36. This discrepancy suggests that the model is not well optimized for predicting purchases, likely due to an imbalance in the dataset where class 0 significantly outnumbers class 1. Implementing data-balancing techniques such as SMOTE can enhance the model's predictive performance for class 1, as evidenced by the improvements seen in the previous models.
The Decision Tree Classifier model exhibited an imbalance in prediction ability between the two classes before applying SMOTE. The F1-score for class 0 was 0.92, indicating strong performance in predicting non-purchase cases, while the F1-score for class 1 was only 0.55, reflecting significant difficulty in accurately predicting purchase cases. This disparity suggests that the model is not well optimized for identifying purchase instances, likely due to the original data's imbalance, where class 0 samples outnumber class 1.
3.6 Support Vector Machine (SVM) Classifier
The SVM model's initial results indicate a significant imbalance in classification performance between the two classes. While the F1-score for class 0 is an impressive 0.91, demonstrating strong precision and recall in predicting non-purchase cases, the F1-score for class 1 is 0, revealing the model's complete inability to predict purchase scenarios. This stark contrast in F1-scores highlights that the SVM model is not optimized for predicting purchase cases, possibly due to the extreme imbalance in the original data, where the number of class 0 samples far exceeds that of class 1.
Before applying SMOTE, the AdaBoost Classifier exhibited a notable imbalance in its predictive performance across the two classes. The F1-score for class 0 was 0.94, indicating strong accuracy in predicting non-purchase cases, while the F1-score for class 1 was only 0.64, revealing significant challenges in accurately predicting purchase cases. This disparity in F1-scores highlights that the AdaBoost Classifier alone does not effectively address the data imbalance.
Prior to implementing SMOTE, the Bagging Classifier revealed a significant imbalance in predictive ability between the two classes. The F1-score for class 0 was 0.94, indicating that the model effectively predicts non-purchase cases with a good balance of precision and recall. The F1-score for class 1, however, is only 0.64, significantly lower than that of class 0, indicating that the model struggles to accurately predict purchase cases. This disparity in F1-scores suggests that the Bagging Classifier is not well optimized for predicting purchases, likely due to the imbalance in the original dataset, where samples of class 0 outnumber those of class 1.
Figure 4.9-ROC Curves of Classification Models without using SMOTE
The ROC (Receiver Operating Characteristic) curve illustrates the performance of the classification models prior to implementing the SMOTE technique to address data imbalance issues. A key metric in this chart is the area under the curve (AUC), which serves as a critical indicator of the model's classification effectiveness; a higher AUC signifies superior model performance.
The analysis reveals that the Random Forest and Bagging models excel in classification, achieving an impressive AUC of 0.93, indicating their effectiveness in differentiating between purchase and non-purchase cases across various decision thresholds. This is further illustrated by their ROC curves, which are positioned near the upper left corner. Additionally, the AdaBoost model demonstrates strong performance with an AUC of 0.92, closely trailing the top-performing models.
The Decision Tree and KNN models demonstrated inferior performance compared to the other models, achieving AUCs of 0.73 and 0.69, respectively, indicating their limited classification ability. Notably, the KNN model's AUC is the closest to that of a random classification model. Additionally, the SVM model fell short of expectations with an AUC of just 0.78.
Compare each model with SMOTE and without SMOTE
4.1 Compare using Random Forest with SMOTE and without SMOTE
The results indicate that applying SMOTE leads to notable changes in the F1-score and Accuracy metrics of the Random Forest model. Specifically, the F1-score for class 1 improves from 0.656 to 0.698, demonstrating an enhanced ability of the model to balance precision and recall for the minority class. However, this improvement is accompanied by a minor decrease in the F1-score for the majority class.
The model's accuracy shows a slight decline when using SMOTE, decreasing from 0.905 to 0.895, as the model prioritizes enhancing the performance of the minority class. While SMOTE effectively improves the model's predictive capability for the minority class, it results in a minor reduction in overall accuracy and in the performance of the majority class, whose F1-score drops from 0.945 to 0.936.
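The per-model comparisons in this section could be reproduced with a small helper along the following lines, shown here for Random Forest; the metrics mirror the Accuracy and per-class F1-scores reported above, and the variables come from the earlier sketches.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score

def evaluate(model, X_tr, y_tr):
    """Fit on the given training data and report accuracy plus per-class F1 on the test set."""
    model.fit(X_tr, y_tr)
    y_pred = model.predict(X_test)
    return {
        "accuracy": round(accuracy_score(y_test, y_pred), 3),
        "f1_class_0": round(f1_score(y_test, y_pred, pos_label=0), 3),
        "f1_class_1": round(f1_score(y_test, y_pred, pos_label=1), 3),
    }

rf = RandomForestClassifier(n_estimators=100, random_state=42)
print("without SMOTE:", evaluate(rf, X_train, y_train))

rf = RandomForestClassifier(n_estimators=100, random_state=42)
print("with SMOTE:   ", evaluate(rf, X_train_bal, y_train_bal))
```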
4.2 Compare SMOTE and without SMOTE with MLP
The application of SMOTE to the MLPClassifier model has resulted in notable changes in the F1-score and accuracy metrics. Specifically, the F1-score for the minority class (class 1) improved from 0.627 to 0.667, indicating enhanced model performance. Conversely, the F1-score for the majority class (class 0) experienced a slight decrease from 0.939 to 0.927. Additionally, the overall accuracy of the model decreased from 0.896 to 0.880, likely because SMOTE emphasizes the classification capability of the minority class, which entails a minor trade-off in accuracy. Overall, SMOTE has demonstrated effectiveness in improving the classification ability of the MLPClassifier model for the minority class, but this is accompanied by a slight degradation in the overall accuracy and in the classification performance of the majority class.
4.3 Compare SMOTE and without SMOTE with XGBoost
After applying SMOTE to the XGBoostClassifier model, the F1-score for the minority class (class 1) improved from 0.652 to 0.675, indicating enhanced classification ability and better balance. However, the F1-score for the majority class (class 0) slightly decreased from 0.940 to 0.933. Additionally, the overall accuracy of the model experienced a minor decline from 0.899 to 0.889, likely due to the emphasis on improving the minority class's classification performance.
4.4 Compare SMOTE and without SMOTE with KNN
After applying SMOTE to the KNN model, there were notable changes in the F1-score and accuracy metrics. The F1-score for the minority class (class 1) improved from 0.174 to 0.385, enhancing the model's classification ability, while the performance for the majority class (class 0) decreased, with the F1-score dropping from 0.905 to 0.806. Consequently, the overall accuracy of the model declined from 0.829 to 0.705, highlighting SMOTE's focus on boosting the minority class's classification ability at the expense of overall accuracy and the majority class's performance.
4.5 Compare SMOTE and without SMOTE with Decision Tree
After implementing SMOTE, the F1-score for the minority class (class 1) improved from 0.551905 to 0.6062, indicating enhanced classification ability. Although the F1-score for the majority class (class 0) slightly decreased from 0.917253 to 0.914485, the overall model accuracy showed only a minor decline from 0.860303 to 0.859484. These results suggest that SMOTE effectively balanced the model's performance across classes, particularly benefiting the minority class.
4.6 Compare SMOTE and without SMOTE with SVM
The application of SMOTE significantly improved the F1-score for the minority class (class 1), increasing it from 0 to 0.357919, indicating a better ability to predict this class. However, the performance for the majority class (class 0) declined, with the F1-score dropping from 0.91487 to 0.641558. Consequently, the overall model accuracy also suffered, decreasing from 0.843097 to 0.539943, as the model prioritized enhancing the classification of the minority class at the expense of the majority class's performance.
4.7 Compare SMOTE and without SMOTE with AdaBoost
The AdaBoostClassifier showed changes in the F1-score and Accuracy indices before and after applying SMOTE. The F1-score for class 1 (minority) grew from 0.637143 to 0.683662, showing that the model after applying SMOTE was better balanced, but the performance for the majority class (class 0) decreased, with its F1-score dropping from 0.939264 to 0.932073. Additionally, the accuracy of the model declined from 0.895944 to 0.888161 after implementing SMOTE, which aimed to enhance the classification capability of the minority class; this adjustment led to a minor reduction in overall model accuracy.
4.8 Compare SMOTE and without SMOTE with Bagging
After applying SMOTE to the BaggingClassifier model, there was a notable change in the F1-score and accuracy metrics. The F1-score for the minority class (class 1) improved from 0.63893 to 0.688372, indicating enhanced classification performance, while the F1-score for the majority class (class 0) slightly decreased from 0.942267 to 0.933366. Additionally, the overall model accuracy declined from 0.900451 to 0.890209, suggesting a trade-off between improved minority-class classification and a slight reduction in overall accuracy. Therefore, the effectiveness of SMOTE should be evaluated based on the specific context of the problem.
Results of models with Random Oversampling applied
The Random Oversampling method addresses the issue of imbalanced training data by creating additional samples from the minority class. This approach enhances the balance between class samples, ultimately improving the model's learning and predictive capabilities for the minority class.
After applying Random Oversampling, looking at the F1-score, we see that the model achieved good results for class 0 (the majority), reaching 0.94. However, for class 1 (the minority) the model achieved an F1-score of only 0.69, indicating challenges in accurately classifying these cases. While the overall accuracy stands at a high 0.90, this figure may be skewed by the imbalance in the original dataset, so it is essential to consider additional metrics like the F1-score for a more comprehensive evaluation of the model's performance.
The application of Random Oversampling has yielded notable improvements in the model's performance, particularly in terms of F1-score and accuracy. While the model achieved a satisfactory F1-score of 0.87 for the majority class, it struggled to accurately classify the minority class, with a lower F1-score of 0.58. Nonetheless, Random Oversampling has demonstrated its effectiveness in enhancing the model's ability to predict the minority class, albeit with remaining performance imbalances between the two classes.
The model achieved an accuracy of 0.80, indicating a strong performance; however, this figure is influenced by the imbalance present in the original dataset. Therefore, it is essential to also assess the F1-score to gain a comprehensive understanding of the model's effectiveness.
Applying Random Oversampling highlights significant differences in the F1-score and Accuracy metrics. The model achieved an impressive F1-score of 0.93 for class 0 (majority), but struggled with class 1 (minority), obtaining only 0.66, indicating challenges in accurately classifying instances of this class. Although the overall accuracy of the model reached 0.89, which is quite high, it may be influenced by the original data imbalance. Therefore, it is essential to consider additional metrics like the F1-score for a comprehensive evaluation of the model's performance.
Looking at the F1-score after applying Random Oversampling, we see that the model achieved good results for class 0 (the majority), reaching 0.80. However, for class 1 (the minority) the F1-score was only 0.39, indicating that the model struggles to accurately classify these cases. While Random Oversampling has slightly enhanced the prediction capability for the minority class, its effectiveness remains limited. The overall accuracy of the model stands at 0.70, which, although not excessively low, suggests that the imbalance in the original dataset may cause the model to favor predictions for the majority class.
After implementing Random Oversampling, the model demonstrated notable improvements in F1-score and Accuracy. The F1-score for the majority class (class 0) achieved an impressive 0.92, while the minority class (class 1) only reached 0.53, indicating that although enhancements were made, the model's effectiveness for the minority class remains limited. The overall accuracy of the model reached 0.86, reflecting a strong performance. While Random Oversampling has enhanced the model's classification ability for the minority class, its performance in this area still requires further improvement.
Applying Random Oversampling has highlighted significant insights into model performance, particularly concerning the F1-score and Accuracy metrics. The model achieved an F1-score of 0.66 for class 0 (majority), which is relatively low compared to the other models, while the F1-score for class 1 (minority) was only 0.37, indicating ongoing challenges in accurately classifying minority cases. Although Random Oversampling has enhanced the model's ability to predict the minority class, its effectiveness remains limited. The overall accuracy of the model stands at 0.56, a modest figure likely influenced by the original data's imbalance, which skews predictions towards the majority class.
The application of Random Oversampling has enhanced the classification results, particularly benefiting the minority class. The model achieved an impressive F1-score of 0.92 for the majority class, demonstrating strong predictive capability. However, the F1-score for the minority class remains at 0.66, indicating that the model continues to face challenges in accurately classifying these cases. The overall accuracy of the model is 0.87, which is quite high, but imbalances in the original data can confound this result.
After implementing Random Oversampling, the model's performance revealed significant insights, particularly in the F1-score and Accuracy metrics. The F1-score for the majority class (class 0) was impressive at 0.93, while the minority class (class 1) lagged at only 0.65, indicating challenges in accurately classifying this group. Although Random Oversampling improved predictions for the minority class, its effectiveness remains limited. The overall model accuracy reached 0.89, which is relatively high; however, this figure may be influenced by the initial data imbalance, highlighting the importance of considering additional metrics like the F1-score for a thorough assessment of the model's effectiveness.
Compare SMOTE, Random Oversampling and without SMOTE of 8 models
Table 4.1- Compare SMOTE, RO and without SMOTE of 8 models
[Table columns: Accuracy under SMOTE, RO, and without SMOTE; F1-score of class 1 under SMOTE, RO, and without SMOTE]
The table presents a comparison of the performance of the eight machine learning models under different imbalanced-data handling techniques, including SMOTE (Synthetic Minority Over-sampling Technique), Random Oversampling (RO), and a baseline without SMOTE. The analysis focuses on two key metrics: Accuracy and the F1-score for class 1.
The models all achieved quite high and similar accuracy when applying both methods, ranging from 80% to 90%, showing that these models all have good classification ability on the given data set, regardless of the imbalance treatment method.
The F1-score for class 1, by contrast, highlights distinct differences among the methods and models. This metric combines precision and recall into a single index and is specifically suited to assessing model performance on the minority class.
SMOTE significantly enhances the F1-score for class 1 across various models, particularly benefiting Random Forest, MLP, and XGBoost. This improvement indicates that SMOTE effectively boosts the predictive performance of these models for the minority class.
RO also improved the class 1 F1-score in some models compared to using no resampling, but not as effectively as SMOTE.
The class 1 F1-score of the SVM model equals 0 when no imbalance treatment is used, indicating that the model cannot predict the minority class in this case. However, when SMOTE or RO is applied, the F1-score of SVM increases significantly, demonstrating the effectiveness of these methods in improving the prediction of the minority class.
SMOTE has proven to be the most effective technique for enhancing the class 1 F1-score across various models. Nevertheless, the choice of method should be tailored to the specific characteristics of each dataset and model.
Compare SMOTE and Random Oversampling of 8 models with results of article [2]
Table 4.2-Compare SMOTE and RO of 8 models with article [2]
The table presented compares the performance of the eight models utilizing the two data imbalance handling techniques, SMOTE and Random Oversampling (RO). Additionally, it contrasts these results with the findings of the research article [2], focusing specifically on the class 1 F1-score and the accuracy index.
All models demonstrated impressive accuracy exceeding 80%, indicating strong classification performance on the dataset, irrespective of the imbalance treatment methods applied. However, notable variations were observed in the F1-score, particularly in comparison to the findings of the previous research.
Compared with the results reported in [2], SMOTE often leads to a decrease in the F1-score for class 1, with the exception of the SVM model. This reduction may be attributed to the synthetic samples generated by SMOTE, which can blur important information and diminish the model's generalization capability.
On the contrary, Random Oversampling (RO) does not significantly change the class 1 F1-score compared to the results reported in [2]; in some cases, such as the KNN model, RO even slightly improves the class 1 F1-score.
The SVM model demonstrates a significant increase in the F1-score for class 1 when either SMOTE or RO is used, highlighting the effectiveness of these methods in enhancing prediction of the minority class. Overall in this comparison, RO emerges as the superior option, while SMOTE proves beneficial primarily with certain models such as SVM.
Conclusion
Conclusion
This thesis addresses the challenge of predicting customers' online purchasing likelihood using their behavioral data from e-commerce platforms. It employs eight distinct machine learning models (Random Forest, MLP, XGBoost, KNN, Decision Tree, SVM, AdaBoost, and Bagging) alongside two techniques for managing data imbalance, SMOTE and Random Oversampling. The study evaluates the performance of these models on real-world datasets, providing valuable insights into their effectiveness.
Analysis reveals that while all models demonstrate high accuracy exceeding 80%, there are notable disparities in their ability to predict the minority class of customers who make purchases, influenced by the methods used to address data imbalance. Random Oversampling emerges as a superior technique for enhancing model prediction capabilities, whereas SMOTE shows effectiveness only with specific models and may hinder the predictive performance of others.
This research enhances our understanding of machine learning models and data imbalance techniques in predicting online purchasing behavior. Additionally, the thesis offers valuable recommendations for selecting suitable models and data processing methods, which can effectively improve the business efficiency of e-commerce enterprises.
Recommendation
The thesis recommends integrating diverse data sources, such as behavioral insights, demographics, purchase history, and social media data, to enhance the prediction of customers' online purchasing likelihood. Developing real-time prediction models is essential for delivering personalized recommendations and offers during customer interactions on websites. Additionally, a detailed analysis of customer behavior can significantly improve the accuracy of these predictive models, and their performance should be evaluated on multiple data sets to ensure generalizability and applicability in practice. Finally, the thesis recommends cooperating with e-commerce businesses to deploy and evaluate the model, while protecting customer privacy in accordance with information security regulations.
References
1. Islam, Md.I.U. (2023). Complete analytics of growth in eCommerce. WEBAPPICK. Available at: https://webappick.com/growth-in-ecommerce-complete-analytics/
2. Sakar, C.O., Polat, S.O., Katircioglu, M. et al. (2019). Real-time prediction of online shoppers' purchasing intention using multilayer perceptron and LSTM recurrent neural networks. Neural Computing and Applications, 31, 6893-6908. https://doi.org/10.1007/s00521-018-3523-0
3. Li, L. (2023). Analysis of e-commerce customers' shopping behavior based on data mining and machine learning. Soft Computing. https://doi.org/10.1007/s00500-023-08903-5
4. Mootha, S., Sridhar, S. and Devi, M.S.K. (2020). A Stacking Ensemble of Multi Layer Perceptrons to Predict Online Shoppers' Purchasing Intention. 2020 3rd International Seminar on Research of Information Technology and Intelligent Systems (ISRITI), Yogyakarta, Indonesia, pp. 721-726. doi: 10.1109/ISRITI51436.2020.9315447
5. Liu, Z., Li, J., Huang, J. and Li, Y. (2019). A survey on predicting online shopping behavior via user reviews. Knowledge and Information Systems, 61(3), 825-852.
6. Mokryn, O., Bogina, V. and Kuflik, T. (2019). Will this session end with a purchase? Inferring current purchase intent of anonymous visitors. Electronic Commerce Research and Applications, 34, 100836, ISSN 1567-
7. Rausch, T.M., Derra, N.D. and Wolf, L. (2022). Predicting online shopping cart abandonment with machine learning approaches. International Journal of Market Research, 64(1), 89-112. https://doi.org/10.1177/1470785320972526
8. Arora, A., Sakshi and Gupta, U. (2024). Ensemble Learning for Enhanced Prediction of Online Shoppers' Intention on Oversampling-Based Reconstructed Data. In: Hassanien, A.E., Castillo, O., Anand, S., Jaiswal, A. (eds) International