


Pages: 29
File size: 6.56 MB


Search-Driven Business Analytics

Designing a New Search Engine for Data

Andy Oram



Search-Driven Business Analytics

by Andy Oram

Copyright © 2015 O’Reilly Media, Inc. All rights reserved.

Printed in the United States of America.

Published by O’Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472.

O’Reilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles (http://safaribooksonline.com). For more information, contact our corporate/institutional sales department: 800-998-9938 or corporate@oreilly.com.

Editor: Shannon Cutt

Interior Designer: David Futato

Cover Designer: Randy Comer

Illustrator: Rebecca Demarest

August 2015: First Edition

Revision History for the First Edition

2015-09-02: First Release

2015-10-20: Second Release

The O’Reilly logo is a registered trademark of O’Reilly Media, Inc. Search-Driven Business Analytics, the cover image, and related trade dress are trademarks of O’Reilly Media, Inc.

While the publisher and the authors have used good faith efforts to ensure that the information and instructions contained in this work are accurate, the publisher and the authors disclaim all responsibility for errors or omissions, including without limitation responsibility for damages resulting from the use of or reliance on this work. Use of the information and instructions contained in this work is at your own risk. If any code samples or other technology this work contains or describes is subject to open source licenses or the intellectual property rights of others, it is your responsibility to ensure that your use thereof complies with such licenses and/or rights.

Table of Contents

Search-Driven Business Analytics

A New Generation of Vendors Offering Interactive Visualizations

Data Access Methods Are Being Transformed by Search

Getting Insights from Diverse Data

Interpreting User Input

Translating Queries into Answers

Validating Answers

Creating the Simplicity of a Search-Like Query

Creating Instant Visualizations

Sharing Answers and Visualizations

Bringing Search-Driven Analytics to the Masses

Search-Driven Business Analytics

We are all accustomed to instant results with the use of major web search engines. However, when we pull up a business intelligence (BI) product at work, the situation is quite different. In comparison to Internet services that we use every day, these products seem stiff and unresponsive. Business leaders are served with pre-built reports and dashboards put together by their BI teams, and they wait days or weeks to get reports on new inquiries about customers, products, or markets. Thus, when a business manager moves from Facebook, Amazon.com, or Google to her BI tool, it feels like time travel back to a different century.

This report examines what it takes to make business intelligence as simple and responsive as today’s consumer search engines, where the user gets answers and visualizations as quickly as questions come to mind.

We’ll look at:

• The convergence of BI and search

• What a search-driven user experience looks like

• The intelligence required for analytical search

• Data sources and their associated data modeling requirements

• Turning on-the-fly calculations into visualizations

• Applying enterprise scale and security to search

The techniques described here are general and draw on well-established practices in the field. The main reference platform for this report is the ThoughtSpot Analytical Search Appliance. The author will also incorporate information gleaned from discussions with technical staff from Microsoft’s Power BI service and from Adatao, a firm that offers collaborative and predictive analytics.

A New Generation of Vendors Offering Interactive Visualizations

ThoughtSpot’s Analytical Search engine allows the user to ask ad hoc questions of their data through a search interface. The engine computes results on-the-fly based on the search query, and offers visualizations of interest to the user. It features an interactive interface that allows you to search through billions of rows and compute results on-the-fly from any data source.

Figure 1. Data display in ThoughtSpot

Microsoft’s Power BI service lets you quickly create dashboards, share reports, and directly connect to (and incorporate) all the data available within the organization, through partners, or publicly posted to the Internet. Power BI Desktop enables you to transform data and create reports and visualizations. Figure 2 shows a typical dashboard created in the Desktop.

Figure 2. Dashboard produced by Microsoft Power BI

Adatao takes a problem-solving approach to all data, big and small, where the user starts with a hypothesis and pulls answers out of data sources to validate or invalidate the hypothesis. Figure 3 shows typical output from Adatao, known as a narrative, which enables data discovery and presentation in the form of attractive visualizations.

Figure 3. Narrative produced by Adatao

Data Access Methods Are Being Transformed by Search

So how have these new-generation technologies transformed data interaction for the business user? An enlightening analogy can be drawn between the way managers use BI today and how information access on the Internet has evolved.

Typically, a manager at a data-rich company has access to certain canned business reports. The managers have generated a list of business questions such as “a chart showing the product revenue from each store, to compare same-store sales year-by-year” and a programmer has dutifully coded up an analytics application to provide those answers. If the business managers want a different report containing metrics and relationships not provided ahead of time, a recoding effort is involved. This severely limits the data analysis systems, leaving them unresponsive to intuitive questioning by the business managers. The systems and humans are operating at very different paces in this world of old-generation BI software.

Drawing an analogy to the evolution of the Internet, this is similar to the sites that curated content for users more than a decade ago. Users would subscribe to forums to find out what was new. Hot products like Encarta (introduced by Microsoft in the early 1990s when the Web was quite young) provided predetermined sets of information in an encyclopedia format. Getting access to these resources was much easier than pacing through the card catalog of one’s local library, but they opened access only to a limited set of information chosen by the site. Existing BI reports are similar to these offerings in their inelasticity and lack of real-time interactivity to serve the needs of the business user.

The advent of the AltaVista search engine, and subsequently Google, transformed information access. The search engines didn’t add a jot to the information already available. But they radically broadened the sites to which we had access, and put us only a few seconds and a few clicks away from the wealth of information and opinions on the Web. Immediate options are now taken for granted as we search an online bookseller for books, a travel site for hotels and airline tickets, etc. Within minutes we sample a mind-boggling range of opinions from around the world, whether the subject is the best data store for fast-moving input or the latest sports news.

What does it take to bring the same kind of instant feedback and broad searchability to business intelligence? Some requirements include:

Real-time interactivity

When you start typing “flowers” into a modern search engine such as Google or Bing, it anticipates what you want and suggests popular completions, such as “flowers online” and “flowers for algernon” (a popular book and movie title). Typing “restaurants” will probably offer you local results. Similarly, a BI solution should instantly fashion charts or other answers while you are typing, predicting what you want based on its knowledge of previous queries and the data sets themselves. It should get better over time as it learns more about what each user wants and offer more relevant suggestions.

A single, accurate answer

Unlike web search engines that can return multiple results in relevance-ranked order, the BI interface should return just what the user asked for, leaving out extraneous results. Ideally, when the user wants a simple answer such as “revenue for California last year” the interface should return a single figure instead of a table of values the user has to interpret, or a list of links to past reports or dashboards for the user to sift through to find the answer.

Diverse data sets

The BI solution should be able to use structured data throughout the organization, from many different databases and even more informal sources such as spreadsheets. All these sources should be combined smoothly, and the solution should recognize relationships among the columns of databases so that it can combine this data in visualizations and other results.


many columns of many tables and still return results in real time.

in using their corporate credentials

Administrators should be able to set up security for individual users or for groups, controlling access at the level of a saved dashboard or chart, a column (such as a column in an HR table that has compensation data), or a row (customer information for the West Coast might be hidden from a sales rep on the East Coast, for example).
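The row-level and column-level controls described above can be illustrated with a small sketch. It does not reflect how any of the products in this report implement security; the `Policy` class, the `apply_policy` function, and the sample table are hypothetical names invented for this example. The idea is simply to drop the rows a user may not see, then strip hidden columns from the rows that remain.

```python
from dataclasses import dataclass, field


@dataclass
class Policy:
    """Hypothetical access policy for one user or group (not a real product API)."""
    # Columns that must never be shown, e.g. compensation data
    hidden_columns: set = field(default_factory=set)
    # Column name -> values the user may see; unlisted columns are unrestricted
    row_filters: dict = field(default_factory=dict)


def apply_policy(rows, policy):
    """Return only the rows and columns that the policy allows."""
    visible = []
    for row in rows:
        allowed = all(
            row.get(column) in values
            for column, values in policy.row_filters.items()
        )
        if allowed:
            visible.append(
                {k: v for k, v in row.items() if k not in policy.hidden_columns}
            )
    return visible


# Made-up sample data for illustration only
ACCOUNTS = [
    {"customer": "Acme", "coast": "West", "rep_compensation": 9000},
    {"customer": "Birch", "coast": "East", "rep_compensation": 7000},
]

# An East Coast sales rep: sees East Coast rows only, never compensation
east_rep = Policy(hidden_columns={"rep_compensation"},
                  row_filters={"coast": {"East"}})

print(apply_policy(ACCOUNTS, east_rep))
```

Applying the policy before any aggregation is one plausible design choice: totals and charts are then computed only over data the user is entitled to see.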

How does a BI solution like this change the way we do business? How does the reduction in response time for a query, from days to seconds, lead to a higher top line and lower costs?

Instead of waiting to see past performance of sales, the general manager of a business unit can see real-time sales performance and make inventory allocation decisions based on real-time demand. Business processes are being thoroughly disrupted now that transformations that once had to be pre-calculated are possible on demand.

The impact becomes even greater as interfaces are able to anticipate what a user wants and bring into sharp focus ideas that are just emerging. This anticipation can be based on previous queries. For instance, if someone searches for information on California, the interface would check its cached queries and notice similar searches for information on New York, then suggest a related result. Everyone has a unique approach to asking questions, so personalizing the suggestions makes the experience a lot more relevant and user-friendly. The interface can also look at the data itself: for instance, in each column the interface anticipates that the user is likely to request values that are more commonly found there.
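The last point, favoring values that occur more often in a column, can be sketched as a frequency-ranked prefix completion. This is only an illustration under simple assumptions (an in-memory list of rows and case-insensitive prefix matching); `build_value_counts` and `suggest` are hypothetical helper names, not functions from any product discussed here.

```python
from collections import Counter


def build_value_counts(rows, column):
    """Count how often each value appears in one column."""
    return Counter(row[column] for row in rows)


def suggest(prefix, value_counts, limit=3):
    """Return up to `limit` values starting with `prefix`, most frequent first."""
    prefix = prefix.lower()
    matches = [
        (value, count)
        for value, count in value_counts.items()
        if value.lower().startswith(prefix)
    ]
    # Sort by descending frequency, then alphabetically to break ties
    matches.sort(key=lambda item: (-item[1], item[0]))
    return [value for value, _ in matches[:limit]]


# Made-up sample rows for illustration only
SALES = [
    {"state": "California"}, {"state": "California"}, {"state": "California"},
    {"state": "Connecticut"}, {"state": "Connecticut"},
    {"state": "Colorado"},
    {"state": "New York"}, {"state": "New York"},
]

STATE_COUNTS = build_value_counts(SALES, "state")
print(suggest("c", STATE_COUNTS))
print(suggest("co", STATE_COUNTS))
```

A fuller system would presumably blend these column statistics with each user's query history, as the text describes; the sketch covers only the data-driven half.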


Getting Insights from Diverse Data

Enterprises’ data sources come in several flavors:

• Data warehouses often store tens or hundreds of terabytes of historical data in relational tables accessed through SQL.

• Applications, both on-premise and in the cloud, produce results that can be input into BI. Recent years have seen a notable increase in cloud enterprise applications offered by vendors such as Salesforce and NetSuite.

• Spreadsheets are ubiquitous across desktops and laptops in the enterprise, where individuals use them to analyze subsets of data.

• With the increasing spread of Hadoop, Spark, and other “big data” technologies within the enterprise, data sources with relatively loose document formats are becoming an important category as well.

The more sources of data a search engine can handle, the more useful it becomes, not only because more of the organization’s data is searchable, but because the different sources can work together and add extra meaning. However, one of the most time-consuming problems faced by BI analysts is the integration of multiple data sources, especially non-relational data. A search-driven interface can help with this, by offering a visual and easy way for analysts to discover bad or stale data, and exclude it from the scope of data that’s visible to business users.

Therefore, integrating sources and indexing their content for quick retrieval is the key initial task for interactive BI and analytics. The ThoughtSpot Analytical Search Appliance uses a variety of interfaces to integrate data from various sources:

• Data is loaded from data marts or data warehouses through the enterprise’s chosen ETL tools, and through a JDBC/ODBC interface that can be used to connect data sources directly to ThoughtSpot. Data can also be directly loaded into ThoughtSpot through bulk data load scripts. These are highly efficient, loading the data at multi-terabyte-per-hour speeds in a scale-out fashion across all the nodes.

• For cloud data sources, in addition to the above options, ThoughtSpot has partnered with vendors to use their individual products, such as Informatica’s Cloud Connector, to load data.

• Spreadsheets can be uploaded by individual users through an interface in the product that guides the user through the process. As part of that workflow, the user can also specify whether she wishes to link a column from this spreadsheet to any other column present in the system so that she can analyze local data present on her computer against company-wide data from the data warehouse.
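Linking a spreadsheet column to a column already in the system amounts to a join on the linked values. The sketch below shows the idea with Python's standard `csv` module; it is not the product's upload workflow, and `load_spreadsheet`, `link`, and the sample data are hypothetical names made up for illustration.

```python
import csv
import io


def load_spreadsheet(text):
    """Parse CSV text; the first row supplies the column names."""
    return list(csv.DictReader(io.StringIO(text)))


def link(local_rows, local_col, warehouse_rows, warehouse_col):
    """Join local rows to warehouse rows where the linked columns match."""
    # Index the warehouse side once so each local row is a dictionary lookup
    index = {}
    for row in warehouse_rows:
        index.setdefault(row[warehouse_col], []).append(row)
    joined = []
    for local in local_rows:
        for match in index.get(local[local_col], []):
            joined.append({**match, **local})
    return joined


# Made-up local spreadsheet: per-store targets kept on an analyst's laptop
SHEET = "store_id,target\nS1,100\nS2,250\n"

# Made-up company-wide data, standing in for a warehouse table
WAREHOUSE = [
    {"store": "S1", "revenue": 120},
    {"store": "S2", "revenue": 200},
    {"store": "S3", "revenue": 90},
]

for row in link(load_spreadsheet(SHEET), "store_id", WAREHOUSE, "store"):
    print(row["store"], row["revenue"], row["target"])
```

Note that `csv.DictReader` returns every cell as a string, so a real loader would also need to infer column types before the data could be charted.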

ThoughtSpot understands the underlying schema and relationships between your data when you load it, so as soon as it is loaded, it is ready to be searched without any additional modeling work. The system also works across any time granularity (weekly, quarterly, yearly) without requiring the BI team to build new aggregate tables, OLAP cubes, or materialized views. This helps business users to start using the system as soon as the IT/BI team has loaded data into it. And as the user types queries that connect multiple tables together, the multiple join path choices are all handled under the hood so the user does not have to know any SQL terminology to connect diverse data sets together and complete her query. ThoughtSpot is able to provide sub-second response times for searches over billions of rows of data because of its purpose-built, in-memory relational cache. This cache understands search semantics and security rules, as well as query plans, and is able to scale out across hundreds of nodes.
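One common way to choose a join path automatically is to treat tables as nodes and relationships as edges, then search for the shortest chain between the tables a query mentions. The sketch below uses breadth-first search for this; the report does not say which algorithm ThoughtSpot uses, so this is a stand-in technique, and the schema and function names are hypothetical.

```python
from collections import deque

# Hypothetical schema: each pair is a foreign-key relationship between two tables
RELATIONSHIPS = [
    ("sales", "stores"),
    ("sales", "products"),
    ("stores", "regions"),
]


def build_graph(relationships):
    """Build an undirected adjacency map from table-to-table relationships."""
    graph = {}
    for left, right in relationships:
        graph.setdefault(left, set()).add(right)
        graph.setdefault(right, set()).add(left)
    return graph


def join_path(graph, start, goal):
    """Breadth-first search for the shortest chain of tables from start to goal."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        # Sorted for a deterministic result when several paths tie
        for neighbor in sorted(graph.get(path[-1], ())):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(path + [neighbor])
    return None  # the two tables are not connected


SCHEMA = build_graph(RELATIONSHIPS)
print(join_path(SCHEMA, "products", "regions"))
```

A query such as "revenue by region for each product" would then be joined along the returned chain without the user naming any of the intermediate tables.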

Once the data is loaded, ThoughtSpot creates an index to maximize the speed of queries. For data volumes in terabytes, the index needs to be efficiently sharded and distributed across multiple nodes without compromising on search latency. The creation of the index itself must be distributed so that there is minimal delay between when new data shows up in the system and when it is ready to be searched.
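The idea of a sharded index can be shown in miniature. In the sketch below each shard is just a dictionary in one process, whereas a real deployment would place shards on separate nodes; the `ShardedIndex` class and the choice of CRC32 hashing are assumptions made for illustration, not a description of ThoughtSpot's index.

```python
import zlib


class ShardedIndex:
    """Toy inverted index whose tokens are spread across shards by hash."""

    def __init__(self, num_shards):
        # Each shard maps token -> set of row ids; stands in for one node
        self.shards = [dict() for _ in range(num_shards)]

    def _shard_for(self, token):
        # crc32 is stable across runs, unlike Python's built-in hash() for str
        return zlib.crc32(token.encode("utf-8")) % len(self.shards)

    def add(self, row_id, text):
        """Index every whitespace-separated token of `text` under `row_id`."""
        for token in text.lower().split():
            shard = self.shards[self._shard_for(token)]
            shard.setdefault(token, set()).add(row_id)

    def lookup(self, token):
        """Return the row ids containing `token`; touches only one shard."""
        token = token.lower()
        return set(self.shards[self._shard_for(token)].get(token, ()))


index = ShardedIndex(num_shards=4)
index.add(1, "California revenue")
index.add(2, "New York revenue")
print(sorted(index.lookup("revenue")))
```

Because a token always hashes to the same shard, a lookup consults a single shard, and new rows can be indexed shard by shard in parallel, which is the property the paragraph above asks of a distributed index.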

Microsoft’s Power BI features integration with external tools, both from Microsoft and from partners such as Salesforce and Zendesk. The Power BI interface helps the user find these resources (databases, spreadsheets, Hadoop data stores, even social media sites) and connect to them. A relational database provides its own schema, whereas Power BI creates the schema for a spreadsheet, normally using the first row as column names. Figure 4 shows an entity-

Posted: 12/11/2019, 22:29