JOOMLA-BASED AUTOMATIC NEWS UPDATING SITE IN COMBINATION WITH VIETSPIDER
I.C Methodologies
To achieve the stated objective, I will conduct thorough background research by locating and reading as many relevant materials as possible. I will build hands-on expertise with VietSpider and Joomla through direct use and experimentation. By synthesizing the insights gathered, I will present a novel, efficient approach to news gathering that leverages these tools.
I.D An Overview of the Rest of the Document
This document presents a structured, step-by-step overview of the ideas in my graduation thesis. It starts with background information on data mining, web mining, and data clustering algorithms, and then introduces VietSpider and its features. The discussion then moves to Chapter IV, where the proposed system is described in detail, including design decisions, architecture, and implementation considerations. Finally, the thesis concludes with a summary of research findings and a plan for future work to extend and validate the study.
Faculty of Information Technology Graduation thesis
Supervisor: Dr Hoang Tuan Hao student: Vu Van Do
CHAPTER II BACKGROUND ON DATA MINING AND WEB MINING
Data mining is a crucial aspect of information technology, enabling organizations to uncover meaningful patterns from large data sets. This chapter provides the background concepts and presents the topics in a clear sequence: what data mining is, what it can do, how it works, and the techniques used in data mining. In short, data mining involves extracting hidden relationships, trends, and insights that support decision making, predictive analytics, and strategic planning. It works by collecting, cleaning, and preparing data, applying algorithms to discover patterns, and validating results through evaluation and interpretation. Techniques used in data mining include classification, clustering, association rules, and anomaly detection, along with advanced methods such as neural networks and ensemble learning, chosen based on the data characteristics and business goals. Understanding these elements helps readers see how data mining transforms raw information into actionable knowledge within the broader information technology landscape.
Data mining is a core discipline within computer science that involves extracting patterns from large data sets by combining statistical techniques with artificial intelligence and robust database management. As a powerful tool for modern business, data mining transforms raw data into actionable business intelligence, providing a competitive informational advantage. It supports a wide range of applications, including marketing, customer profiling, surveillance, fraud detection, and scientific discovery, helping organizations uncover insights that drive smarter decision-making.
Data mining, also known as data discovery or knowledge discovery, is the process of analyzing data from multiple perspectives and turning it into useful information that can help increase revenue, reduce costs, or both. Data mining software is one of several analytical tools used to analyze data from many dimensions, categorize it, and summarize the relationships uncovered to reveal actionable insights for better business decisions.
Technically, data mining is the process of finding correlations or patterns among dozens of fields in large relational databases.
Data mining is the process of extracting knowledge from large data sets. Jiawei Han and Micheline Kamber suggest that the more accurate phrasing is “knowledge discovery from data,” emphasizing the goal of uncovering meaningful information from data. They also note that a variety of terms, such as knowledge mining, knowledge extraction, data/pattern analysis, data archaeology, and data dredging, carry similar meanings or only slight differences in emphasis.
II.A.2 What can data mining do?
Data mining is now a core tool for companies with a strong consumer focus, including retailers, financial services, telecommunications, and marketing organizations. It reveals how internal factors such as price, product positioning, and staff skills relate to external factors like economic indicators, competition, and customer demographics. By linking these factors to outcomes, data mining helps assess impacts on sales, customer satisfaction, and profits. It also enables organizations to drill down from high-level summaries into detailed transactional data to uncover actionable insights.
Data mining lets retailers analyze point-of-sale records to send targeted promotions based on each customer's purchase history. By mining demographic data from comment or warranty cards, retailers can develop richer customer profiles and more precise segmentation for personalized marketing campaigns. This approach enables more relevant offers, improves response rates, and supports cross-selling and loyalty programs, ultimately driving sales and customer engagement.
To be specific, data mining has been used to:
• Identify unexpected shopping patterns in supermarkets.
• Optimize website profitability by making appropriate offers to each visitor.
• Predict customer response rates in marketing campaigns.
• Define new customer groups for marketing purposes.
• Predict customer defections: which customers are likely to switch to an alternative supplier in the near future.
• Distinguish between profitable and unprofitable customers.
• Improve yields in complex production processes by finding unexpected relationships between process parameters and defect rates.
• Identify suspicious (unusual) behavior as part of a fraud detection process.
In short, data mining can be applied anywhere in your business or organization where you are interested in identifying and exploiting predictable outcomes.
II.A.3 How does data mining work?
Data mining bridges the gap between transaction systems and analytical systems in large-scale information technology by analyzing relationships and patterns in stored transaction data through open-ended user queries. Data mining software leverages statistical methods, machine learning, and neural networks to uncover these insights. Generally, four types of relationships are sought:
• Classes: Stored data is organized into predefined groups, or classes, to help locate and analyze patterns more efficiently. For example, a restaurant chain can mine customer purchase data to determine when customers visit and what they typically order, uncovering insights that can be used to increase traffic by offering targeted daily specials.
• Clusters: Data items are grouped according to logical relationships or consumer preferences. For example, data can be mined to identify market segments or consumer affinities.
• Associations: Data can be mined to identify associations. The beer-diaper example is an example of associative mining.
• Sequential patterns: Data is mined to anticipate behavior patterns and trends. For example, an outdoor equipment retailer could predict the likelihood of a backpack being purchased based on a consumer's purchase of sleeping bags and hiking shoes.
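As a rough illustration of the association relationship described above, the following sketch counts how often pairs of items appear together and derives the two measures commonly used to rank such rules, support and confidence. The transaction data, the function name `pair_rules`, and the `min_support` threshold are all hypothetical and chosen only for illustration; they do not come from any system described in this thesis.

```python
from collections import Counter
from itertools import combinations

# Hypothetical market-basket data: each set is one customer transaction.
transactions = [
    {"beer", "diapers", "chips"},
    {"beer", "diapers"},
    {"milk", "diapers"},
    {"beer", "chips"},
    {"milk", "bread"},
]


def pair_rules(transactions, min_support=0.4):
    """Return (antecedent, consequent, support, confidence) for item pairs.

    support    = share of all transactions containing both items
    confidence = share of transactions with the antecedent that also
                 contain the consequent
    """
    n = len(transactions)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in transactions:
        item_counts.update(basket)
        # Sort so that each pair is always counted under the same key.
        pair_counts.update(combinations(sorted(basket), 2))

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue  # pair is too rare to be of interest
        rules.append((a, b, support, count / item_counts[a]))
        rules.append((b, a, support, count / item_counts[b]))
    return rules


rules = pair_rules(transactions)
for antecedent, consequent, support, confidence in rules:
    print(f"{antecedent} -> {consequent}: "
          f"support={support:.2f}, confidence={confidence:.2f}")
```

Only pairs of items are considered here to keep the sketch short; practical association mining also handles larger item sets.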
Data mining algorithms in use today are commonly broken up into two categories:
• Classical Techniques: Statistics, Neighborhoods and Clustering
• Next Generation Techniques: Trees, Networks and Rules
I will now define the techniques mentioned in the two categories above, so that we can understand them more specifically.
This overview is organized into two sections based on when data mining techniques were developed and when they matured enough to be used in business, with a focus on optimizing customer relationship management (CRM) systems. The first section describes techniques that have been used for decades and are well established in practice. The second section covers techniques that have seen wide adoption only since the early 1980s.
II.A.4.i.a Statistics
By strict definition, statistics or statistical techniques are not data mining. They were in use long before the term data mining was coined to apply to business applications. However, statistical techniques are data-driven and are used to discover patterns and to build predictive models. From the user’s perspective, you will face a conscious choice when tackling a data mining problem about whether to apply statistical methods or other data mining techniques. For this reason, it is helpful to understand how statistical techniques work and how they can be applied within the data mining process.
Clustering and the nearest-neighbor prediction technique are among the oldest methods used in data mining. Most people have an intuition about clustering, understanding that similar records tend to be grouped together to form clusters. These core techniques help reveal structure in data and enable straightforward, interpretable predictions, making them foundational to data mining practice.
Nearest neighbor is a prediction technique closely related to clustering, where the predicted value for a new record is determined by identifying historical records with similar predictor values and using the outcome from the closest match. The method relies on measuring similarity among predictor features and selecting the nearest neighbor to infer the target value for the unlabeled record. By leveraging past observations, it provides an estimate based on what is observed in nearby cases. While it shares ideas with clustering, its main aim is to predict individual data points rather than group them. See also Section II.A.4.i.c, Clustering.
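The nearest-neighbor idea can be sketched in a few lines. The example below is a minimal sketch under stated assumptions: the historical records (age, income, and a yes/no outcome), the min-max scaling step, and the use of Euclidean distance as the similarity measure are all illustrative choices, not details taken from the thesis.

```python
import math

# Hypothetical historical records: (predictor values, known outcome).
# Predictors are (age, yearly income); the outcome is whether the
# customer responded to an offer.
history = [
    ((25, 30_000), "no"),
    ((40, 80_000), "yes"),
    ((35, 60_000), "yes"),
    ((22, 20_000), "no"),
]


def scale(records):
    """Return per-feature (minimum, range) so that no single feature
    dominates the distance just because of its units."""
    columns = list(zip(*(features for features, _ in records)))
    return [(min(c), (max(c) - min(c)) or 1.0) for c in columns]


def normalize(features, scaling):
    """Map each feature onto a comparable scale using the given scaling."""
    return [(x - low) / span for x, (low, span) in zip(features, scaling)]


def predict_nearest(history, new_features):
    """Predict the outcome of a new record from its single closest match."""
    scaling = scale(history)
    target = normalize(new_features, scaling)

    def distance(record):
        # Euclidean distance is assumed here as the similarity measure.
        return math.dist(normalize(record[0], scaling), target)

    closest = min(history, key=distance)
    return closest[1]


prediction = predict_nearest(history, (38, 70_000))
print(prediction)
```

Using only the single closest record keeps the sketch faithful to the description above; a common variation takes a vote among the k closest records instead.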
Clustering is the process of grouping similar records together, turning raw data into organized clusters. This technique helps analysts and end users see patterns and relationships in the data, providing a high-level view of what is happening in the database. By aggregating like records, clustering simplifies complex datasets and reveals the overall structure and activity within the database.
Clustering is sometimes used to mean segmentation, which most marketing people will tell you is useful for coming up with a bird's-eye view of the business.
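To make the idea of grouping like records concrete, the sketch below uses k-means, one widely known clustering algorithm. The text above does not name a specific algorithm, so k-means is an assumption made here for illustration, as are the customer data, the function name `k_means`, and the naive choice of the first k points as starting centroids.

```python
import math


def k_means(points, k, iterations=10):
    """Group points into k clusters; returns (centroids, assignments)."""
    # Naive initialization (an assumption): use the first k points.
    centroids = [points[i] for i in range(k)]
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its closest centroid.
        assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = tuple(
                    sum(values) / len(members) for values in zip(*members)
                )
    return centroids, assignments


# Hypothetical customers: (visits per month, average spend per visit).
customers = [(1, 10), (2, 12), (1, 11), (9, 95), (10, 100), (8, 90)]
centroids, labels = k_means(customers, k=2)
print(labels)
print(centroids)
```

Each resulting cluster can be read as a customer segment: here, occasional low spenders and frequent high spenders.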