Organizational Data Mining
Hamid R. Nemati1 and Christopher D. Barko2
1 Information Systems and Operations Management Department
Bryan School of Business and Economics
The University of North Carolina at Greensboro
nemati@uncg.edu
2 Customer Analytics, Inc.
7009 Austin Creek Drive
Summerfield, NC 27358
chris.barko@customer-analytics.com
Summary. Many organizations today possess substantial quantities of business information but have very little real business knowledge. A recent survey of 450 business executives reported that managerial intuition and instinct are more prevalent than hard facts in driving organizational decisions. To reverse this trend, businesses of all sizes would be well advised to adopt Organizational Data Mining (ODM). ODM is defined as leveraging Data Mining tools and technologies to enhance the decision-making process by transforming data into valuable and actionable knowledge to gain a competitive advantage. ODM has helped many organizations optimize internal resource allocations while better understanding and responding to the needs of their customers. The fundamental aspects of ODM can be categorized into Artificial Intelligence (AI), Information Technology (IT), and Organizational Theory (OT), with OT being the key distinction between ODM and Data Mining. In this chapter, we introduce ODM, explain its unique characteristics, and report on the current status of ODM research. Next we illustrate how several leading organizations have adopted ODM and are benefiting from it. Then we examine the evolution of ODM to the present day and conclude our chapter by contemplating ODM’s challenging yet opportunity-rich future.
Key words: Organizational Data Mining, Customer Relationship Management
55.1 Introduction
Data experts estimate that in 2002 the world generated 5 exabytes of information. This amount of data is more than all the words ever spoken by human beings. And the rate of growth is just as staggering: the amount of data produced in 2002 was up 68% from just two years earlier. The size of the typical business database has grown a hundred-fold during the past five years as a result of Internet commerce, ever-expanding computer systems and record keeping mandated by government regulations. To better grasp how much data this is, consider the following: if one byte of data is the equivalent of this dot (•), the amount of data produced globally in 2002 would equal the diameter of 4,000 suns. And that amount has probably doubled since then (Hardy, 2004).
O. Maimon, L. Rokach (eds.), Data Mining and Knowledge Discovery Handbook, 2nd ed.,
DOI 10.1007/978-0-387-09823-4_55, © Springer Science+Business Media, LLC 2010
In spite of this enormous growth in enterprise databases, research from IBM reveals that organizations use less than 1 percent of their data for analysis (Brown, 2002). This is the fundamental irony of the Information Age we live in: organizations possess enormous amounts of business information, yet have so little real business knowledge. To magnify the problem further, a leading business intelligence firm recently surveyed executives at 450 companies and discovered that 90 percent of these organizations rely on gut instinct rather than hard facts for most of their decisions because they lack the necessary information when they need it (Brown, 2002). And in cases where sufficient business information is available, those organizations are able to utilize less than 7 percent of it (The Economist, 2001).
This proclamation about data volume growth is no longer surprising, but it continues to amaze even the experts. For businesses, however, more data isn’t always better. Organizations must assess what data they need to collect and how best to leverage it. Collecting, storing and managing business data and associated databases can be costly, and expending scarce resources to acquire and manage extraneous data fuels inefficiency and hinders optimal performance. The generation and management of business data also loses much of its potential organizational value unless important conclusions can be extracted from it quickly enough to influence decision making while the business opportunity is still present. Managers must rapidly and thoroughly understand the factors driving their business in order to sustain a competitive advantage. Organizational speed and agility supported by fact-based decision making are critical to ensure an organization remains at least one step ahead of its competitors.
In the past, companies have struggled to make decisions because of the lack of data. But in the current environment, more and more organizations are struggling to overcome “information paralysis”: there is so much data available that it is difficult to determine what is relevant and how to extract meaningful knowledge. Organizations today routinely collect and manage terabytes of data in their databases, thereby making information paralysis a key challenge in enterprise decision-making. Once the essential data elements are identified, the data must be reformatted, pre-processed and analyzed to generate knowledge. The resulting knowledge is then delivered to the decision-makers for collaboration, review and action. Once decided upon, the final decision must be communicated to the appropriate parties in a rapid, efficient and cost-effective manner.
55.2 Organizational Data Mining
The manner in which organizations execute this intricate decision-making process is critical to their well-being and industry competitiveness. Those organizations making swift, fact-based decisions by optimally leveraging their data resources will outperform those organizations that do not. A robust technology that facilitates this process of optimal decision-making is known as Organizational Data Mining (ODM). ODM is defined as leveraging Data Mining tools and technologies to enhance the decision-making process by transforming data into valuable and actionable knowledge to gain a competitive advantage (Nemati and Barko, 2001). ODM eliminates the guesswork that permeates so much of corporate decision making. By adopting ODM, an organization’s managers and employees are able to act sooner rather than later, be proactive rather than reactive and know rather than guess. ODM technology has helped many organizations optimize internal resource allocations while better understanding and responding to the needs of their customers.
ODM spans a wide array of technologies, including, but not limited to, e-business intelligence, data analysis, online analytical processing (OLAP), customer relationship management (CRM), electronic CRM (e-CRM), executive information systems (EIS), digital dashboards and information portals. ODM enables organizations to answer questions about the past (what has happened?), the present (what is happening?), and the future (what might happen?). Armed with this capability, organizations can generate valuable knowledge from their data, which in turn enhances enterprise decisions. This decision-enhancing technology enables many advantages in operations (faster product development, increased market share with quicker time to market, optimal supply chain management), marketing (higher profitability and increased customer loyalty through more effective marketing campaigns and customer profitability analyses), finance (improved performance through financial analytics and economic evaluation of business units and products) and strategy implementation (business performance management (BPM), the Balanced Scorecard, and related strategy alignment and measurement systems). The result of this enhanced decision making at all levels of the organization is optimal resource allocation and improved business performance.
Profitability in business today relies on speed, agility and efficiency at quality levels thought unobtainable just a few years ago. The slightest imbalance along the supply chain can increase costs, lengthen internal cycle times and delay new product introductions. These imbalances can eventually lead to a loss in both market share and competitive advantage. Meanwhile, organizations are also forging closer relationships with their customers and suppliers by defining tighter agreements in terms of shared processes and risks. As a result, many businesses are deeply immersed in continuously reengineering their processes to improve quality. Six Sigma and Balanced Scorecard type efforts are increasingly prevalent. ODM enables organizations to remove supply chain imbalances while improving the speed, flexibility and efficiency of their business processes. This leads to stronger customer and partner relationships and a sustainable competitive advantage.
55.3 ODM versus Data Mining
Data Mining is the process of discovering and interpreting previously unknown patterns in databases. It is a powerful technology that converts data into information and potentially actionable knowledge. However, obtaining new knowledge in an organizational vacuum does not facilitate optimal decision making in a business setting. The unique organizational challenge of understanding and leveraging ODM to engineer actionable knowledge requires assimilating insights from a variety of organizational and technical fields and developing a comprehensive framework that supports an organization’s quest for a sustainable competitive advantage. These multidisciplinary fields include Data Mining, business strategy, organizational learning and behavior, organizational culture, organizational politics, business ethics and privacy, knowledge management, information sciences and decision support systems. These fundamental elements of ODM can be summarized into three main groups: Artificial Intelligence (AI), Information Technology (IT), and Organizational Theory (OT). Our research and industry experience suggest that successfully leveraging ODM requires integrating insights from all three categories in an organizational setting typically characterized by complexity and uncertainty. This is the essence and uniqueness of ODM. Obtaining maximum value from ODM involves a cross-department team effort that includes statisticians/data miners, software engineers, business analysts, line-of-business managers, subject matter experts, and upper management support.
55.3.1 Organizational Theory and ODM
Organizations are primarily concerned with studying how operating efficiencies and profitability can be achieved through the effective management of customers, suppliers, partners, and employees. To achieve these goals, research in Organizational Theory (OT) suggests that organizations use data in three vital knowledge creation activities. This organizational knowledge creation and management is a learned ability that can only be achieved via an organized and deliberate methodology. This methodology is a foundation for successfully leveraging ODM within the organization. The three knowledge creation activities (Choo, 1997) are:
• Sense making is the ability to interpret and understand information about the environment and events happening both inside and outside the organization.
• Knowledge making is the ability to create new knowledge by combining the expertise of members to learn and innovate.
• Decision making is the ability to process and analyze information and knowledge in order to select and implement the appropriate course of action.
First, organizations use data to make sense of changes and developments in the external environment, a process called sense making. This is a vital activity wherein managers discern the most significant changes, interpret their meaning, and develop appropriate responses. Second, organizations create, organize, and process data to generate new knowledge through organizational learning. This knowledge creation activity enables the organization to develop new capabilities, design new products and services, enhance existing offerings, and improve organizational processes. Third, organizations search for and evaluate data in order to make decisions. This data is critical since all organizational actions are initiated by decisions and all decisions are commitments to actions, the consequences of which will, in turn, lead to the creation of new data. Adopting an OT methodology enables an enterprise to enhance the knowledge engineering and management process.
In another OT study, researchers and academic scholars have observed that there is no direct correlation between information technology (IT) investments and organizational performance. Research has confirmed that identical IT investments in two different companies may give a competitive advantage to one company but not the other. Therefore, a key factor for the competitive advantage in an organization is not the IT investment but the effective utilization of information as it relates to organizational performance (Brynjolfsson and Hitt, 1996). This finding emphasizes the necessity of integrating OT practices with robust information technology and artificial intelligence techniques in successfully leveraging ODM.
55.4 Ongoing ODM Research
Given the scarcity of past research in ODM along with its growing acceptance and importance in organizations, we conducted empirical research during the past several years that explored the utilization of ODM in organizations along with project implementation factors critical for success. We surveyed ODM professionals from multiple industries in both domestic and international organizations. Our initial research examined the ODM industry status and best practices, identified both technical and business issues related to ODM projects, and elaborated on how organizations are benefiting through enhanced enterprise decision-making (Nemati and Barko, 2001). The results of our research suggest that ODM can improve the quality and accuracy of decisions for any organization willing to make the investment.
After exploring the status and utilization of ODM in organizations, we decided to focus subsequent research on how organizations implement ODM projects and the factors critical for their success. Similar to our initial research, this was pursued in response to the scarcity of empirical research investigating the implementation of ODM projects. To that end, we developed a new ODM Implementation Framework based on data, technology, organizations, and the Iron Triangle (Nemati and Barko, 2003). Our research demonstrated that selected organizational Data Mining project factors, when modeled under this new framework, have a significant influence on the successful implementation of ODM projects.
Our latest research has focused on a specific ODM technology known as Electronic Customer Relationship Management (e-CRM) and its data integration role within organizations. We developed a new e-CRM Value Framework to better examine the significance of integrating data from all customer touch-points with the goal of improving customer relationships and creating additional value for the firm. Our research findings suggest that despite the cost and complexity, data integration for e-CRM projects contributes to a better understanding of the customer and leads to higher return on investment (ROI), a greater number of benefits, improved user satisfaction and a higher probability of attaining a competitive advantage (Nemati et al., 2003).
55.5 ODM Advantages
A 2002 Strategic Decision Making study conducted by Hackett Best Practices determined that “world-class” companies have adopted ODM technologies at more than twice the rate of “average” companies (Hoblitzell, 2002). ODM technologies provide these world-class organizations greater opportunities to understand their business and make informed decisions. ODM also enables world-class organizations to leverage their internal resources more efficiently and effectively than their “average” counterparts who have not fully embraced ODM.
Many of today’s leading organizations credit their success to the development of an integrated, enterprise-level ODM system. For example, Harrah’s Entertainment has saved over $20 million per year since implementing its Total Rewards CRM program. This ODM system has given Harrah’s a better understanding of its customers and enabled the company to create targeted marketing campaigns that almost doubled the profit per customer and delivered same-store sales growth of 14 percent after only the first year. In another notable case, Travelocity.com, an Internet-based travel agency, implemented an ODM system and improved total bookings and earnings by 100 percent in 2000. Gross profit margins improved 150 percent, and booker conversion rates rose to 8.9 percent, the highest in the online travel services industry.
In another significant study, executives from twenty-four leading companies in customer-knowledge management, including FedEx, Frito-Lay, Harley-Davidson, Procter & Gamble and 3M, all realized that in order to succeed, they must go beyond simply collecting customer data and translate it into meaningful knowledge about existing and potential customers (Davenport et al., 2001). This study revealed that several objectives were common to all of the leading companies, and these objectives can be facilitated by ODM. A few of these objectives are segmenting the customer base, prioritizing customers, understanding online customer behavior, engendering customer loyalty, and increasing cross-selling opportunities.
55.6 ODM Evolution
55.6.1 Past
Initially, IT systems were developed to automate expensive manual systems. This automation provided cost savings through labor reductions and more accurate, faster processes. Over the last three decades, the organizational role of information technology has evolved from efficiently processing large amounts of batch transactions to providing information in support of tactical and strategic decision-making activities. This evolution from automating expensive manual systems to providing strategic organizational value led to the birth of Decision Support Systems (DSS) such as data warehousing and Data Mining. Operational and decision support systems are now a vital part of many organizations. The organizational need to combine data from multiple stand-alone systems (e.g., financial, manufacturing and distribution) grew as corporations began to acknowledge the power of combining these data sources for reporting. This spurred the growth of data warehousing, where multiple data sources were stored in a format that supported advanced data analysis.
The slow adoption of ODM techniques in the 1990s was partly due to organizational and cultural resistance. Business management has always been reluctant to trust something it doesn’t fully understand. Until recently, most businesses were managed by instinct, intuition and “gut feel”. The transition over the past twenty years to a method of managing by the numbers is the result of both technology advances and a generational shift in the business world as younger managers arrive with information technology training and experience.
55.6.2 Present
Many current ODM techniques trace their origins to traditional statistics and artificial intelligence research from the 1980s. Today, there are extensive vertical Data Mining applications providing analysis in the domains of banking and credit, bioinformatics, CRM, e-CRM, healthcare, human resources, e-commerce, insurance, investment, manufacturing, marketing, retail, entertainment, and telecommunications. Our latest survey findings indicate that the banking, accounting/financial, e-commerce, and retail industries display the highest ODM maturity level to date. The need for service organizations (banking, financial, healthcare and insurance) to build a holistic view of their customers through a mass customization marketing strategy is critical to remaining competitive. And organizations in the e-commerce industry are continuing to improve online customer relationships and overall profitability via e-CRM technologies (Nemati and Barko, 2001). Continuous technological innovations now enable the affordable exploration of enormous volumes of data. It is the combination of technological innovation, creation of new advanced pattern-recognition and data-analysis techniques, ongoing research in organizational theory, and the availability of large quantities of data that has guided ODM to where it is today.
55.6.3 Future
The number of ODM projects is projected to grow more than 300 percent in the next decade (Linden, 1999). As the collection, organization and storage of data rapidly increase, ODM will be the only means of extracting timely and relevant knowledge from large corporate databases. The growing mountains of business data coupled with recent advances in Organizational Theory and technological innovations provide organizations with a framework to effectively use their data to gain a competitive advantage. An organization’s future success will depend largely on whether or not it adopts and leverages this ODM framework. ODM will continue to expand and mature as the corporate demand for one-to-one marketing, CRM, e-CRM, Web personalization, and related interactive media increases.
As information technology advances, organizations are able to collect, store, process, analyze and distribute an ever-increasing amount of data. Data and information are rampant, but knowledge is scarce. As a result, most organizations today are governed by managerial intuition and historical reporting. This is the byproduct of years of system automation. However, we believe organizations are slowly moving from the Information Age to the Knowledge Age, where decision-makers will leverage ODM and Internet technologies to augment intuition in order to allocate scarce enterprise resources for optimal performance.
As organizations set a strategic course into the Knowledge Age, there are a number of difficulties awaiting them. As its name suggests, ODM is part technological and part organizational. Organizations are composed of individuals, management, politics, culture, hierarchies, teams, processes, customers, partners, suppliers, and shareholders. The never-ending challenge is to successfully integrate Data Mining technologies with organizations to enhance decision-making with the objective of optimally allocating scarce enterprise resources. As many consultants, professionals, industry leaders and the authors of this chapter can attest, this is not an easy task. The media can oversimplify the effort, but successfully implementing ODM is not accomplished without political battles, project management struggles, cultural shocks, business process reengineering, personnel changes, short-term financial and budgetary shortages, and overall disarray. ODM is a journey, not a destination, so there must be a continual effort in revising existing knowledge bases and generating new ones. But the benefits far outweigh both the technical and organizational costs, and the enhanced decision-making capabilities can lead to a sustainable competitive advantage.
Recent ODM research has revealed a number of industry predictions that are expected to be key ODM issues in the future (Nemati and Barko, 2001). About 80 percent of survey respondents expect web farming/mining and consumer privacy to be significant issues, while over 90 percent predict ODM integration with external data sources to be important. We also foresee the development of widely accepted standards for ODM processes and techniques to be an influential factor for knowledge seekers in the 21st century. One attempt at ODM standardization is the Cross Industry Standard Process for Data Mining (CRISP-DM) project, which developed an industry- and tool-neutral data-mining process model to solve business problems. Another attempt at industry standardization is the work of the Data Mining Group in developing and advocating the Predictive Model Markup Language (PMML), which is an XML-based language that provides a quick and easy way for companies to define predictive models and share models between compliant vendors’ applications. Lastly, Microsoft’s OLE DB for Data Mining is a further attempt at industry standardization and integration. This specification offers a common interface for Data Mining that will enable developers to embed data-mining capabilities into their existing applications. One only has to consider Microsoft’s industry-wide dominance of the office productivity (Microsoft Office), software development (Visual Basic and .NET) and database (SQL Server) markets to envision the potential impact this could have on the ODM market and its future direction.
55.7 Summary
Although many improvements have materialized over the last decade, the knowledge gap in many organizations is still prevalent. Industry professionals have suggested that many corporations could maintain current revenues at half the current costs if they optimized their use of corporate data. Whether or not this claim is true, it sheds light on an important issue. Leading corporations in the next decade will adopt and weave these ODM technologies into the fabric of their organizations at all levels, from upper management all the way down to the lowest organizational level. Those enterprises that see the strategic value of evolving into knowledge organizations by leveraging ODM will benefit directly in the form of improved profitability, increased efficiency, and a sustainable competitive advantage. Once the first organization within an industry realizes a competitive advantage through ODM, it is only a matter of time before one of three events transpires: its industry competitors adopt ODM, change industries, or vanish. By adopting ODM, an organization’s managers and employees are able to act sooner rather than later, anticipate rather than react, know rather than guess, and ultimately, succeed rather than fail.
References
Anonymous (2001), “The slow progress of fast wires”, The Economist, London, Vol. 358, No. 8209, February 17.
Brown, E. (2002), “Analyze This”, Forbes, Vol. 169, No. 8, April 1, pp. 96-98.
Brynjolfsson, E. and Hitt, L. (1996), “The Customer Counts”, InformationWeek, September 9, www.informationweek.com/596/96mit.htm
Choo, C. W. (1997), The Knowing Organization: How Organizations Use Information to Construct Meaning, Create Knowledge, and Make Decisions, Oxford University Press, www.choo.fis.utoronto.ca/fis/ko/default.html
Davenport, T. H., Harris, J. G. and Kohli, A. K. (2001), “How Do They Know Their Customers So Well?”, Sloan Management Review, Vol. 42, No. 2, Winter, pp. 63-73.
Hardy, Q. (2004), “Data of Reckoning”, Forbes, Vol. 173, No. 10, May 10, pp. 151-154.
Hoblitzell, T. (2002), “Disconnects in Today’s BI Systems”, DM Review, Vol. 12, No. 6, July, pp. 56-59.
Linden, A. (1999), CIO Update: Data Mining Applications of the Next Decade, Inside Gartner Group, Gartner Inc., July 7.
Nemati, H. R. and Barko, C. D. (2001), “Issues in Organizational Data Mining: A Survey of Current Practices”, Journal of Data Warehousing, Vol. 6, No. 1, Winter, pp. 25-36.
Nemati, H. R. and Barko, C. D. (2003), “Key Factors for Achieving Organizational Data Mining Success”, Industrial Management and Data Systems, Vol. 103, No. 4, pp. 282-292.
Nemati, H. R., Barko, C. D. and Moosa, A. (2003), “E-CRM Analytics: The Role of Data Integration”, Journal of Electronic Commerce in Organizations, Vol. 1, No. 3, July-Sept, pp. 73-89.
Mining Time Series Data
Chotirat Ann Ratanamahatana1, Jessica Lin1, Dimitrios Gunopulos1, Eamonn Keogh1, Michail Vlachos2, and Gautam Das3
1 University of California, Riverside
2 IBM T.J. Watson Research Center
3 University of Texas, Arlington
Summary. Much of the world’s supply of data is in the form of time series. In the last decade, there has been an explosion of interest in mining time series data. A number of new algorithms have been introduced to classify, cluster, segment, index, discover rules, and detect anomalies/novelties in time series. While the approaches used to solve these problems draw on a multitude of different techniques, they all have one common factor: they require some high-level representation of the data, rather than the original raw data. These high-level representations are necessary as a feature extraction step, or simply to make the storage, transmission, and computation of massive datasets feasible. A multitude of representations have been proposed in the literature, including spectral transforms, wavelet transforms, piecewise polynomials, eigenfunctions, and symbolic mappings. This chapter gives a high-level survey of time series Data Mining tasks, with an emphasis on time series representations.
Key words: Data Mining, Time Series, Representations, Classification, Clustering, Time Series Similarity Measures
56.1 Introduction
Time series data accounts for an increasingly large fraction of the world’s supply of data. A random sample of 4,000 graphics from 15 of the world’s newspapers published from 1974 to 1989 found that more than 75% of all graphics were time series (Tufte, 1983). Given the ubiquity of time series data, and the exponentially growing sizes of databases, there has recently been an explosion of interest in time series Data Mining. In the medical domain alone, large volumes of data as diverse as gene expression data (Aach and Church, 2001), electrocardiograms, electroencephalograms, gait analysis and growth development charts are routinely created. Similar remarks apply to industry, entertainment, finance, meteorology and virtually every other field of human endeavour. Although statisticians have worked with time series for more than a century, many of their techniques hold little utility for researchers working with massive time series databases (for reasons discussed below).
Below are the major tasks considered by the time series Data Mining community.
O Maimon, L Rokach (eds.), Data Mining and Knowledge Discovery Handbook, 2nd ed.,
DOI 10.1007/978-0-387-09823-4_56, © Springer Science+Business Media, LLC 2010