Beijing Boston Farnham Sebastopol Tokyo
The Big Data Transformation
by Alice LaPlante
Copyright © 2017 O’Reilly Media, Inc. All rights reserved.
Printed in the United States of America.
Published by O’Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472.
O’Reilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles (http://safaribooksonline.com). For more information, contact our corporate/institutional sales department: 800-998-9938 or corporate@oreilly.com.
Editors: Tim McGovern and Debbie Hardin
Production Editor: Colleen Lobner
Copyeditor: Octal Publishing Inc.
Interior Designer: David Futato
Cover Designer: Randy Comer
Illustrator: Rebecca Demarest
November 2016: First Edition
Revision History for the First Edition
2016-11-03: First Release
The O’Reilly logo is a registered trademark of O’Reilly Media, Inc. The Big Data Transformation, the cover image, and related trade dress are trademarks of O’Reilly Media, Inc.
While the publisher and the author have used good faith efforts to ensure that the information and instructions contained in this work are accurate, the publisher and the author disclaim all responsibility for errors or omissions, including without limitation responsibility for damages resulting from the use of or reliance on this work. Use of the information and instructions contained in this work is at your own risk. If any code samples or other technology this work contains or describes is subject to open source licenses or the intellectual property rights of others, it is your responsibility to ensure that your use thereof complies with such licenses and/or rights.
Table of Contents
1 Introduction
  Big Data: A Brief Primer
  A Crowded Marketplace for Big Data Analytical Databases
  Yes, You Need Another Database: Finding the Right Tool for the Job
  Sorting Through the Hype
2 Where Do You Start? Follow the Example of This Data-Storage Company
  Aligning Technologists and Business Stakeholders
  Achieving the “Outrageous” with Big Data
  Monetizing Big Data
  Why Vertica?
  Choosing the Right Analytical Database
  Look for the Hot Buttons
3 The Center of Excellence Model: Advice from Criteo
  Keeping the Business on the Right Big-Data Path
  The Risks of Not Having a CoE
  The Best Candidates for a Big Data CoE
4 Is Hadoop a Panacea for All Things Big Data? YPSM Says No
  YP Transforms Itself Through Big Data
5 Cerner Scales for Success
  A Mammoth Proof of Concept
  Providing Better Patient Outcomes
  Vertica: Helping to Keep the Lights On
  Crunching the Numbers
6 Whatever You Do, Don’t Do This, Warns Etsy
  Don’t Forget to Consider Your End User When Designing Your Analytics System
  Don’t Underestimate Demand for Big-Data Analytics
  Don’t Be Naïve About How Fast Big-Data Grows
  Don’t Discard Data
  Don’t Get Burdened with Too Much “Technical Debt”
  Don’t Forget to Consider How You’re Going to Get Data into Your New Database
  Don’t Build the Great Wall of China Between Your Data Engineering Department and the Rest of the Company
  Don’t Go Big Before You’ve Tried It Small
  Don’t Think Big Data Is Simply a Technical Shift
CHAPTER 1
Introduction
We are in the age of data. Recorded data is doubling in size every two years, and by 2020 we will have captured as many digital bits as there are stars in the universe, reaching a staggering 44 zettabytes, or 44 trillion gigabytes. Included in these figures is the business data generated by enterprise applications as well as the human data generated by social media sites like Facebook, LinkedIn, Twitter, and YouTube.
Big Data: A Brief Primer
Gartner’s description of big data, which focuses on the “three Vs” (volume, velocity, and variety), has become commonplace. Big data has all of these characteristics. There’s a lot of it, it moves swiftly, and it comes from a diverse range of sources.
A more pragmatic definition is this: you know you have big data when you possess diverse datasets from multiple sources that are too large to cost-effectively manage and analyze within a reasonable timeframe when using your traditional IT infrastructures. This data can include structured data as found in relational databases as well as unstructured data such as documents, audio, and video.
IDG estimates that big data will drive the transformation of IT through 2025. Key decision-makers at enterprises understand this. Eighty percent of enterprises have initiated big-data-driven projects as top strategic priorities. And these projects are happening across virtually all industries. Table 1-1 lists just a few examples.
Table 1-1 Transforming business processes across industries
Industry            Big data use cases
Automotive          Auto sensors reporting vehicle location problems
Financial services  Risk, fraud detection, portfolio analysis, new product development
Manufacturing       Quality assurance, warranty analyses
Healthcare          Patient sensors, monitoring, electronic health records, quality of care
Oil and gas         Drilling exploration sensor analyses
Retail              Consumer sentiment analyses, optimized marketing, personalized targeting, market basket analysis, intelligent forecasting, inventory management
Utilities           Smart meter analyses for network capacity, smart grid
Law enforcement     Threat analysis, social media monitoring, photo analysis, traffic optimization
Advertising         Customer targeting, location-based advertising, personalized retargeting, churn
to have the right tool for the job. Gartner calls this “best-fit engineering.”
This is especially true when it comes to databases. Databases form the heart of big data. They’ve been around for a half century. But they have evolved almost beyond recognition during that time. Today’s databases for big data analytics are completely different animals than the mainframe databases from the 1960s and 1970s, although SQL has been a constant for the last 20 to 30 years.
There have been four primary waves in this database evolution:
Mainframe databases
The first databases were fairly simple and used by government, financial services, and telecommunications organizations to process what (at the time) they thought were large volumes of transactions. But there was no attempt to optimize either putting the data into the databases or getting it out again. And they were expensive; not every business could afford one.
Online transactional processing (OLTP) databases
The birth of the relational database using the client/server model finally brought affordable computing to all businesses. These databases became even more widely accessible through the Internet in the form of dynamic web applications and customer relationship management (CRM), enterprise resource planning (ERP), and ecommerce systems.
Data warehouses
The next wave enabled businesses to combine transactional data (for example, from human resources, sales, and finance) together with operational software to gain analytical insight into their customers, employees, and operations. Several database vendors seized leadership roles during this time. Some were new and some were extensions of traditional OLTP databases. In addition, an entire industry that brought forth business intelligence (BI) as well as extract, transform, and load (ETL) tools was born.
Big data analytics platforms
During the fourth wave, leading businesses began recognizing that data is their most important asset. But handling the volume, variety, and velocity of big data far outstripped the capabilities of traditional data warehouses. In particular, previous waves of databases had focused on optimizing how to get data into the databases. These new databases were centered on getting actionable insight out of them. The result: today’s analytical databases can analyze massive volumes of data, both structured and unstructured, at unprecedented speeds. Users can easily query the data, extract reports, and otherwise access the data to make better business decisions much faster than was possible previously. (Think hours instead of days and seconds/minutes instead of hours.)
One example of an analytical database, and the one we’ll explore in this document, is Vertica from Hewlett Packard Enterprise (HPE). Vertica is a massively parallel processing (MPP) database, which means it spreads the data across a cluster of servers, making it possible for systems to share the query-processing workload. Created by legendary database guru and Turing Award winner Michael Stonebraker, and then acquired by HP, the Vertica Analytics Platform was purpose-built from its very first line of code to optimize big-data analytics.
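To make the MPP idea concrete, here is a minimal sketch in Python. It is an illustration only and not a description of how Vertica is implemented: the four-node cluster, the hash-based placement, and the names `node_for`, `distribute`, and `parallel_sum` are all assumptions introduced for this example.

```python
from collections import defaultdict
from zlib import crc32


def node_for(key: str, num_nodes: int) -> int:
    """Pick a node for a row by hashing its key (stable across runs)."""
    return crc32(key.encode("utf-8")) % num_nodes


def distribute(rows, num_nodes):
    """Spread (key, value) rows across nodes so each node holds only a share."""
    nodes = defaultdict(list)
    for key, value in rows:
        nodes[node_for(key, num_nodes)].append((key, value))
    return nodes


def parallel_sum(nodes):
    """Each node sums its own rows; a coordinator adds the partial results.

    In a real cluster the per-node sums would run concurrently. Here they
    run one after another, which is enough to show the split of work.
    """
    partials = [sum(value for _, value in rows) for rows in nodes.values()]
    return sum(partials)


# Hypothetical data: 100 customers with values 1 through 100.
rows = [(f"customer-{i}", i) for i in range(1, 101)]
cluster = distribute(rows, num_nodes=4)
total = parallel_sum(cluster)
```

The point of the sketch is that no single server has to scan every row: each one answers for its own share, and only the small partial results travel to the coordinator.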
Three things in particular set Vertica apart, according to Colin Mahony, senior vice president and general manager for HPE Software Big Data:
• Its creators saw how rapidly the volume of data was growing, and designed a system capable of scaling to handle it from the ground up
• They also understood all the different analytical workloads that businesses would want to run against their data
• They realized that getting superb performance from the database in a cost-effective way was a top priority for businesses
Yes, You Need Another Database: Finding the Right Tool for the Job
According to Gartner, data volumes are growing 30 percent to 40 percent annually, whereas IT budgets are only increasing by 4 percent. Businesses have more data to deal with than they have money. They probably have a traditional data warehouse, but the sheer size of the data coming in is overwhelming it. They can go the data lake route, and set it up on Hadoop, which will save money while capturing all the data coming in, but it won’t help them much with the analytics that started off the entire cycle. This is why these businesses are turning to analytical databases.
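The gap between those growth rates widens quickly once it compounds. The short calculation below is a sketch of that arithmetic; the five-year horizon and the starting value of 1.0 for both data and budget are assumptions chosen for illustration, not figures from Gartner.

```python
def compound(start: float, annual_rate: float, years: int) -> float:
    """Value after compounding `start` at `annual_rate` for `years` years."""
    return start * (1 + annual_rate) ** years


YEARS = 5  # hypothetical planning horizon, not from the source

data_low = compound(1.0, 0.30, YEARS)   # data growing 30 percent a year
data_high = compound(1.0, 0.40, YEARS)  # data growing 40 percent a year
budget = compound(1.0, 0.04, YEARS)     # IT budget growing 4 percent a year

print(f"data volume after {YEARS} years: {data_low:.2f}x to {data_high:.2f}x")
print(f"IT budget after {YEARS} years:   {budget:.2f}x")
```

Under these assumptions, data grows to roughly 3.7 to 5.4 times its starting size while the budget grows to about 1.2 times, so the same money has to cover three to four times as much data per dollar.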
Analytical databases typically sit next to the system of record (whether that’s Hadoop, Oracle, or Microsoft) to perform speedy analytics of big data.
In short: people assume a database is a database, but that’s not true. Here’s a metaphor created by Steve Sarsfield, a product-marketing manager at HPE, to articulate the situation (illustrated in Figure 1-1):
If you say “I need a hammer,” the correct tool you need is determined by what you’re going to do with it.
Figure 1-1 Different hammers are good for different things
The same scenario is true for databases. Depending on what you want to do, you would choose a different database, whether an MPP analytical database like Vertica, an XML database, or a NoSQL database. You must choose the right tool for the job you need to do. You should choose based upon three factors: structure, size, and analytics. Let’s look a little more closely at each:
Still, though, the three main considerations remain structure, size, and analytics. Vertica’s sweet spot, for example, is performing long, deep queries of structured data at rest that have fixed schemas. But even then there are ways to stretch the spectrum of what Vertica can do by using technologies such as Kafka and Flex Tables, as demonstrated in Figure 1-2.
Figure 1-2 Stretching the spectrum of what Vertica can do
In the end, the factors that drive your database decision are the same forces that drive IT decisions in general. You want to:
Increase revenues
You do this by investing in big-data analytics solutions that allow you to reach more customers, develop new product offerings, focus on customer satisfaction, and understand your customers’ buying patterns.
Enhance efficiency
You need to choose big data analytics solutions that reduce software-licensing costs, enable you to perform processes more efficiently, take advantage of new data sources effectively, and accelerate the speed at which that information is turned into knowledge.
Improve compliance
Finally, your analytics database must help you to comply with local, state, federal, and industry regulations and ensure that your reporting passes the robust tests that regulatory mandates place on it. Plus, your database must be secure to protect the privacy of the information it contains, so that it’s not stolen or exposed to the world.
Sorting Through the Hype
There’s so much hype about big data that it can be difficult to know what to believe. We maintain that one size doesn’t fit all when it comes to big-data analytical databases. The top-performing organizations are those that have figured out how to optimize each part of their data pipelines and workloads with the right technologies.
The job of vendors in this market: to keep up with standards so that businesses don’t need to rip and replace their data schemas, queries, or frontend tools as their needs evolve.
In this document, we show the real-world ways that leading businesses are using Vertica in combination with other best-in-class big-data solutions to solve real business challenges.
When selling big data to your company, you need to know your audience. Big data can deliver massive benefits to the business, but you must know your audience’s interests.
For example, you might know that big data gets you the following:
• 360-degree customer view (improving customer “stickiness”) via cloud services
• Rapid iteration (improving product innovation) via engineering informatics
• Force multipliers (reducing support costs) via support automation
But if others within the business don’t realize what these benefits mean to them, that’s when you need to begin evangelizing:
• Envision the big-picture business value you could be getting from big data
• Communicate that vision to the business and then explain what’s required from them to make it succeed
• Think in terms of revenues, costs, competitiveness, and stickiness, among other benefits
Table 2-1 shows what the various stakeholders you need to convince want to hear.
Table 2-1 Know your audience
Analysts want: SQL and ODBC; ACID for consistency; the ability to integrate big-data solutions into current BI and reporting tools
Business owners want: New revenue; sheer speed for critical answers; increased operational efficiency
IT professionals want: MPP shared-nothing architecture; lower TCO from a reduced footprint
Data scientists want: R for in-database analytics; tools to creatively explore the big data
Aligning Technologists and Business Stakeholders
Larry Lancaster, a former chief data scientist at a company offering hardware and software solutions for data storage and backup, thinks that getting business strategists in line with what technologists know is right is a universal challenge in IT. “Tech people talk in a language that the business people don’t understand,” says Lancaster. “You need someone to bridge the gap. Someone who understands from both sides what’s needed, and what will eventually be delivered,” he says.
The best way to win the hearts and minds of business stakeholders: show them what’s possible. “The answer is to find a problem, and make an example of fixing it,” says Lancaster.
The good news is that today’s business executives are well aware of the power of data. But the bad news is that there’s been a certain amount of disappointment in the marketplace. “We hear stories about companies that threw millions into Hadoop, but got nothing out of it,” laments Lancaster. These disappointments make executives reluctant to invest large sums.
Lancaster’s advice is to pick one of two strategies: either start small and slowly build success over time, or make an outrageous claim to get people’s attention. Here’s his advice on the gradual tactic:
The first approach is to find one use case, and work it up yourself, in a day or two. Don’t bother with complicated technology; use Excel. When you get results, work to gain visibility. Talk to people above you. Tell them you were able to analyze this data and that Bob in marketing got an extra 5 percent response rate, or that your support team closed cases 10 times faster.
Typically, all it takes is one or two people doing what Lancaster calls “a little big-data magic” to convince others of the value of the technology.
The other approach is to pick something that is incredibly aggressive and make an outrageous statement. Says Lancaster:
Intrigue people. Bring out amazing facts of what other people are doing with data, and persuade the powers that be that you can do it, too.
Achieving the “Outrageous” with Big Data
Lancaster knows about taking the second route. As chief data scientist, he built an analytics environment from the ground up that completely eliminated Level 1 and Level 2 support tickets.
Imagine telling a business that it could almost completely make routine support calls disappear. No one would pass up that opportunity. “You absolutely have their attention,” said Lancaster.
This company offered businesses a unique storage value proposition in what it calls predictive flash storage. Rather than forcing businesses to choose between hard drives (cheap but slow) and solid-state drives (SSDs, which are fast but expensive) for storage, it offered the best of both worlds. By using predictive analytics, it built systems that were very smart about what data went onto the different types of storage. For example, data that businesses were going to read randomly went onto the SSDs. Data for sequential reads, or perhaps no reads at all, was put on the hard drives.
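The placement rule in that example can be sketched in a few lines of Python. This is a toy heuristic under stated assumptions, not the company’s actual predictive model, which the source does not describe: the function names, the use of block offsets as the access history, and the 0.8 threshold are all hypothetical.

```python
def sequential_fraction(offsets):
    """Fraction of reads that land directly after the previous read.

    With fewer than two reads there is no pattern to measure, so the
    history is treated as fully sequential.
    """
    if len(offsets) < 2:
        return 1.0
    steps = sum(1 for prev, cur in zip(offsets, offsets[1:]) if cur == prev + 1)
    return steps / (len(offsets) - 1)


def choose_tier(offsets, threshold=0.8):
    """Return "hdd" for unread or mostly sequential data, "ssd" for random reads."""
    if not offsets:
        return "hdd"  # never read: keep it on the cheaper tier
    return "hdd" if sequential_fraction(offsets) >= threshold else "ssd"


# Hypothetical read histories, given as block offsets in the order they were read.
backup_stream = [100, 101, 102, 103, 104, 105]
database_index = [9, 2, 40, 7, 13, 28]
```

Here `choose_tier(backup_stream)` sends the streaming workload to hard drives, and `choose_tier(database_index)` sends the scattered reads to SSDs. A production system would predict future access from telemetry rather than classify past reads, but the decision it feeds is the same.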
How did the company accomplish all this? By collecting massive amounts of data from all the devices in the field through telemetry, and sending it back to its analytics database, Vertica, for analysis.
Lancaster said it would be very difficult, if not impossible, to size deployments or use the correct algorithms to make predictive storage products work without a tight feedback loop to engineering.
We delivered a successful product only because we collected enough information, which went straight to the engineers, who kept iterating and optimizing the product. No other storage vendor understands workloads better than us. They just don’t have the telemetry out there.
And the data generated by the telemetry was huge. The company was taking in 10,000 to 100,000 data points per minute from each array in the field. And when you have that much data and begin running analytics on it, you realize you could do a lot more, according to Lancaster.
We wanted to increase how much it was paying off for us, but we needed to do bigger queries faster. We had a team of data scientists and didn’t want them twiddling their thumbs. That’s what brought us to Vertica.
Without Vertica helping to analyze the telemetry data, the company would have had a traditional support team, opening cases on problems in the field, and escalating harder issues to engineers, who would then need to simulate processes in the lab.
“We’re talking about a very labor-intensive, slow process,” said Lancaster, who believes that the entire company has a better understanding of the way storage works in the real world than any other storage vendor, simply because it has the data.
As a result of the Vertica deployment, this business opens and closes 80 percent of its support cases automatically. Ninety percent are automatically opened. There’s no need to call customers up and ask them to gather data or send log posts. Cases that would ordinarily take days to resolve get closed in an hour.
The company also uses Vertica to audit all of the storage that its customers have deployed to understand how much of it is protected. “We know with local snapshots, how much of it is replicated for disaster recovery, how much incremental space is required to increase retention time, and so on,” said Lancaster. This allows the company to go to customers with proactive service recommendations for protecting their data in the most cost-effective manner.
Monetizing Big Data
Lancaster believes that any company could find aspects of support, marketing, or product engineering that could improve by at least two orders of magnitude in terms of efficiency, cost, and performance if it utilized data as much as his organization did.
More than that, businesses should be figuring out ways to monetize the data.
For example, Lancaster’s company built a professional services offering that included dedicating an engineer to a customer account, not just for the storage but also for the host side of the environment, to optimize reliability and performance. This offering was fairly expensive for customers to purchase. In the end, because of analyses performed in Vertica, the organization was able to automate nearly all of the service’s functions. Yet customers were still willing to pay top dollar for it. Says Lancaster:
Enterprises would all sign up for it, so we were able to add 10 percent to our revenues simply by better leveraging the data we were already collecting. Anyone could take their data and discover a similar revenue windfall.
Already, in most industries, there are wars as businesses race for a competitive edge based on data.
For example, look at Tesla, which brings back telemetry from every car it sells, every second, and is constantly working on optimizing designs based on what customers are actually doing with their vehicles. “That’s the way to do it,” says Lancaster.
But as he began to use Vertica more and more, he realized that the performance benefits achievable were another order of magnitude beyond what you would expect with just the column-store efficiency.
It’s because Vertica allows you to do some very efficient types of encoding on your data. So all of the low cardinality columns that would have been wasting space in a row store end up taking almost no space at all.
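One way to see why a low-cardinality column can shrink to almost nothing is run-length encoding, sketched below in Python. The source does not say which encodings Lancaster’s team used, so run-length encoding is offered here only as a common column-store technique; the `region` column and its values are hypothetical.

```python
from itertools import groupby


def rle_encode(column):
    """Collapse runs of equal adjacent values into (value, run_length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(column)]


def rle_decode(pairs):
    """Expand (value, run_length) pairs back into the original column."""
    return [value for value, count in pairs for _ in range(count)]


# A sorted, low-cardinality column: 1,000 rows but only three distinct values.
region = ["EMEA"] * 400 + ["APAC"] * 350 + ["AMER"] * 250
encoded = rle_encode(region)
```

In this sketch the 1,000 stored values collapse to three pairs, and `rle_decode(encoded)` restores the column exactly. A row store cannot do the same thing as easily, because each row interleaves this column with all the others and breaks up the runs.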
According to Lancaster, Vertica is the data warehouse the market needed for 20 years, but didn’t have. “Aggressive encoding coming together with late materialization in a column store, I have to say, was a pivotal technological accomplishment that’s changed the database landscape dramatically,” he says.
On smaller Vertica queries, his team of data scientists was experiencing only subsecond latencies. On the large ones, it was getting sub-10-second latencies.
It’s absolutely amazing. It’s game changing. People can sit at their desktops now, manipulate data, come up with new ideas and iterate without having to run a batch and go home. It’s a dramatic increase in productivity.
What else did they do with the data? Says Lancaster, “It was more like, ‘what didn’t we do with the data?’ By the time we hired BI people everything we wanted was uploaded into Vertica, not just telemetry, but also Salesforce, and a lot of other business systems, and we had this data warehouse dream in place.”
Choosing the Right Analytical Database
As you do your research, you’ll find that big data platforms are often suited for special purposes. But you want a general solution with lots of features, such as the following:
Even before being acquired by what was at that point HP, Vertica was the biggest big data pure-play analytical database. A feature-rich general solution, it had everything that Lancaster’s organization needed:
• Scale-out MPP architecture
• SQL database with ACID compliance
• R-integrated window functions, distributed R
Vertica’s performance-first design makes big data smaller in motion with the following design features:
Even when it didn’t use in-line compression, the company still achieved an approximately 25-fold reduction in storage footprint with Vertica post compression. This resulted in radically lower TCO for the same performance and significantly better performance for the same TCO.
Look for the Hot Buttons
So, how do you get your company started on a big-data project?
“Just find a problem your business is having,” advised Lancaster. “Look for a hot button. And instead of hiring a new executive to solve that problem, hire a data scientist.”
Say your product is falling behind in the market. That means your feedback to engineering or product development isn’t fast enough. And if you’re bleeding too much in support, that’s because you don’t have sufficient information about what’s happening in the field. “Bring in a data scientist,” advises Lancaster. “Solve the problem with data.”
Of course, showing an initial ROI is essential, as is having a vision and a champion. “You have to demonstrate value,” says Lancaster. “Once you do that, things will grow from there.”
Could you benefit from a big-data Center of Excellence (CoE)? Criteo has, and it has some advice for those who would like to create one for their business. According to Justin Coffey, a senior staff development lead at the performance marketing technology company, whether you formally call it a CoE or not, your big-data analytics initiatives should be led by a team that promotes collaboration with and between users and technologists throughout your organization. This team should also identify and spread best practices around big-data analytics to drive business- or customer-valued results. HPE uses the term “data democratization” to describe organizations that increase access to data from a variety of internal groups in this way.
That being said, even though the model tends to be variable across companies, the work of the CoE tends to be quite similar, including (but not limited to) the following:
• Defining a common set of best practices and work standards around big data
• Assessing (or helping others to assess) whether they are utilizing big data and analytics to best advantage, using the aforementioned best practices
• Providing guidance and support to assist engineers, programmers, end users, data scientists, and other stakeholders in implementing these best practices
Coffey is fond of introducing Criteo as “the largest tech company you’ve never heard of.” The business drives conversions for advertisers across multiple online channels: mobile, banner ads, and email. Criteo pays for the display ads, charges for traffic to its advertisers, and optimizes for conversions. Based in Paris, it has 2,200 employees in more than 30 offices worldwide, with more than 400 engineers and more than 100 data analysts.
Criteo enables ecommerce companies to effectively engage and convert their customers by using large volumes of granular data. It has established one of the biggest European R&D centers dedicated to performance marketing technology in Paris and an international R&D hub in Palo Alto. By choosing Vertica, Criteo gets deep insights across tremendous data loads, enabling it to optimize the performance of its display ads delivered in real time for each individual consumer across mobile, apps, and desktop.
The breadth and scale of Criteo’s analytics stack is breathtaking. Fifty billion total events are logged per day. Three billion banners are served per day. More than one billion unique users per month visit its advertisers’ websites. Its Hadoop cluster ingests more than 25 TB a day. The system makes 15 million predictions per second out of seven datacenters running more than 15,000 servers, with more than five petabytes under management.
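Daily totals of this size are easier to reason about as sustained per-second rates. The conversion below is a sketch of that arithmetic; it assumes load is spread evenly across the day and that a terabyte means 10^12 bytes, neither of which the source states. Because the ingest figure is given as "more than 25 TB," the resulting rate is a lower bound.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

events_per_day = 50_000_000_000
banners_per_day = 3_000_000_000
hadoop_bytes_per_day = 25 * 10**12  # 25 TB, assuming decimal terabytes

# Average rates, assuming traffic is spread evenly over the day.
events_per_second = events_per_day / SECONDS_PER_DAY
banners_per_second = banners_per_day / SECONDS_PER_DAY
ingest_mb_per_second = hadoop_bytes_per_day / SECONDS_PER_DAY / 10**6

print(f"{events_per_second:,.0f} events per second")
print(f"{banners_per_second:,.0f} banners per second")
print(f"{ingest_mb_per_second:,.0f} MB per second ingested")
```

Real traffic peaks well above the daily average, so a system sized from these averages alone would fall short at busy times.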
Overall, however, it’s a fairly simple stack, as Figure 3-1 illustrates. Criteo decided to use:
• Hadoop to store raw data
• HPE Vertica database for data warehousing
• Tableau as the frontend data analysis and reporting tool
With a thousand users (up to 300 simultaneously during peak periods), the right setup and optimization of the Tableau server was critical to ensure the best possible performance.
Figure 3-1 The performance marketing technology company’s big-data analytics stack
Criteo started by using Hadoop for internal analytics, but soon found that its users were unhappy with query performance, and that direct reporting on top of Hadoop was unrealistic. “We have petabytes available for querying and add 20 TB to it every day,” says Coffey.
Using a Hadoop framework as calculation engine and HPE Vertica to analyze structured and unstructured data, Criteo generates intelligence and profit from big data. The company has experienced double-digit growth since its inception, and Vertica allows it to keep up with the ever-growing volume of data. Criteo uses Vertica to distribute and order data to optimize for specific query scenarios. Its Vertica cluster is 75 TB on 50 CPU-heavy nodes and growing.
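Why ordering data helps a specific query can be shown with a small Python sketch. It illustrates the general principle only and is not Vertica’s implementation: the `days` column, the query range, and the function name `range_scan_sorted` are hypothetical.

```python
from bisect import bisect_left, bisect_right


def range_scan_sorted(sorted_days, start, end):
    """Return slice bounds for rows with start <= day <= end.

    Because the column is stored in sorted order, two binary searches
    find the matching range without examining every row.
    """
    return bisect_left(sorted_days, start), bisect_right(sorted_days, end)


# Hypothetical table: one row per event, stored in day order.
days = sorted([1, 1, 2, 3, 3, 3, 4, 5, 5, 6])
lo, hi = range_scan_sorted(days, 3, 5)
matching = days[lo:hi]  # the six rows for days 3 through 5
```

If most queries filter on a different column, a different sort order would serve them better, which is why the choice of ordering is tied to the query scenarios the data has to serve.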
Observed Coffey, “Vertica can do many things, but is best at accelerating ad hoc queries.” He made a decision to load the business-critical subset of the firm’s Hive data warehouse into Vertica, and to not allow data to be built or loaded from anywhere else.
The result: with a modicum of tuning, and nearly no day-to-day maintenance, analytic query throughput skyrocketed. Criteo loads about 2 TB of data per day into Vertica. It arrives mostly in daily batches and takes about an hour to load via Hadoop streaming jobs that use the Vertica command-line tool (vsql) to bulk insert.
Here are the recommended best practices from Criteo:
Without question, the most important thing is to simplify
For example: sole-sourcing data for Vertica from Hadoop provides an implicit backup. It also allows for easy replication to multiple clusters. Because you can’t be an expert in everything, focus is key. Plus, it’s easier to train colleagues to contribute to a simple architecture.
Optimizations tend to make systems complex
If your system is already distributed (for example, in Hadoop, Vertica), scale out (or perhaps up) until that no longer works. In Coffey’s opinion, it’s okay to waste some CPU cycles. “Hadoop was practically designed for it,” states Coffey. “Vertica lets us do things we were otherwise incapable of doing and with very little DBA overhead (we actually don’t have a Vertica database administrator), and our users consistently tell us it’s their favorite tool we provide.”
Coffey estimates that thanks to its flexible projections, performance with Vertica can be orders of magnitude better than Hadoop solutions with very little effort.
Keeping the Business on the Right Big-Data Path
Although Criteo doesn’t formally call it a “Center of Excellence,” it does have a central team dedicated to making sure that all activities around big-data analytics follow best practices. Says Coffey:
It fits the definition of a Center of Excellence because we have a mix
of professionals who understand how databases work at the inner‐