© 2009 IBM Corporation
What is Big Data?
Paul Zikopoulos
Director - IM WW Technical Professionals, WW Competitive Database,
WW Big Data Tiger Team
Agenda
What is Big Data?
What makes Big Data different?
What can you do with Big Data?
Big Data use cases
The IBM Big Data platform
Getting started
New IM Technology Trends
Information Integration and Governance & Big Data
[Diagram: Analyze, Integrate, Manage, and Govern (Quality, Security & Privacy, Lifecycle) across Business Analytics Applications, Data Warehouses, Data, Content, and Streaming Information, producing Trusted, Relevant, Governed information]
An increasingly sensor-enabled and instrumented
business environment generates HUGE volumes of
data with MACHINE SPEED characteristics…
1 BILLION lines of code
EACH engine generating 10 TB every 30 minutes!
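To put the engine figure in perspective, the rate can be scaled to a full day. This is a back-of-the-envelope sketch only: the function name `tb_per_day` is hypothetical, and it assumes the engine produces data at a constant rate around the clock, which the slide does not state.

```python
def tb_per_day(tb_per_interval: float, interval_minutes: float) -> float:
    """Scale a data volume per interval to 24 hours, assuming a constant rate."""
    intervals_per_day = 24 * 60 / interval_minutes
    return tb_per_interval * intervals_per_day


# 10 TB every 30 minutes, the figure quoted on the slide.
engine_daily_tb = tb_per_day(10, 30)
print(f"{engine_daily_tb:.0f} TB per engine per day")
```

Under that assumption, a single engine running continuously would produce 480 TB per day.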
350B
Transactions/Year
Meter Reads every 15 min
3.65B meter reads/day
120M meter reads/month
In August of 2010, Adam Savage, of “MythBusters,” took a photo of his vehicle using his smartphone. He then posted the photo to his Twitter account, including the phrase “Off to work.”
Since the photo was taken by his smartphone, the image contained metadata revealing the exact geographical location where the photo was taken.
By simply taking and posting a photo, Savage revealed the exact location of his home, the vehicle he drives, and the time he leaves for work.
The Social Layer in an Instrumented, Interconnected World
12+ TBs of tweet data every day
25+ TBs of log data every day
people on the Web by end 2011
30 billion RFID tags today (1.3B in 2005)
4.6 billion camera phones worldwide
100s of millions of GPS-enabled devices sold annually
76 million smart meters in 2009… 200M by 2014
Twitter Tweets per Second Record Breakers of 2011
Can a Social Media Persona be Monetized?
Extract Intent, Life Events, Micro Segmentation Attributes
Jo Jobs, Tina Mu, Tom Sit
4 trillion 8 GB iPods
1.8 ZB
1 ZB
1 ZB = 1 trillion GB
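The unit relation on this slide can be checked with a few lines of arithmetic. This is a sketch assuming decimal (SI) units, where 1 GB is 10^9 bytes and 1 ZB is 10^21 bytes; the helper names are hypothetical. The slide's labels are flattened here, so which data volume the iPod comparison refers to is not recoverable from the text, and the code only converts the figures as given.

```python
# Decimal (SI) storage units expressed in bytes.
GB = 10**9
ZB = 10**21


def zb_to_gb(zettabytes: float) -> float:
    """Convert zettabytes to gigabytes."""
    return zettabytes * ZB / GB


def devices_to_zb(device_count: int, device_gb: int) -> float:
    """Total capacity, in zettabytes, of a number of identical devices."""
    return device_count * device_gb * GB / ZB


print(zb_to_gb(1))                 # gigabytes in one zettabyte
print(devices_to_zb(4 * 10**12, 8))  # capacity of 4 trillion 8 GB devices, in ZB
```

With these units, 1 ZB is one trillion GB, and 4 trillion 8 GB devices hold 32 ZB in total.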
What is “BIG DATA”?
All kinds of data
Large volumes
Valuable insight, but difficult to extract
Often extremely time sensitive
What makes big data technology different?
Jobs distributed across affordable hardware.
Manages and analyzes all kinds of data.
Analyzes data in native format.
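The first point, jobs distributed across affordable hardware, is commonly realized as a map step that runs in parallel over chunks of raw data, followed by a reduce step that merges the partial results (the deck later mentions MapReduce jobs). The following is a minimal sketch of that pattern, not any IBM product API. It assumes threads on one machine stand in for cluster nodes, and the names `map_chunk`, `reduce_counts`, `word_count`, and the sample log lines are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def map_chunk(lines):
    """Map step: count words in one chunk of raw text lines, read as-is."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts


def reduce_counts(partials):
    """Reduce step: merge the per-chunk counts into one total."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


def word_count(lines, workers=4):
    """Split the input into one chunk per worker, map in parallel, then reduce."""
    chunks = [lines[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(map_chunk, chunks))
    return reduce_counts(partials)


logs = ["error disk full", "login ok", "error timeout", "login ok"]
totals = word_count(logs, workers=2)
print(totals["error"], totals["login"])
```

Each worker only sees its own chunk, so adding workers (or, on a real cluster, machines) spreads the map work without changing the reduce logic.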
Big Data Includes Any of the Following Characteristics
Extracting insight from an immense volume, variety and velocity of data, in context, beyond what was previously possible
Variety: Manage the complexity of data in many different structures, ranging from relational, to logs,
Velocity:
Volume:
What can you do with big data?
Analyze a Variety of Information
RFID tracking & analysis
Analyze Extreme Volumes of Information
Transaction analysis to create insight-based product/service offerings
Fraud modeling & detection
Risk modeling & management
Social media/sentiment analysis
Manage and Plan
Operational analytics – BI reporting
Planning and forecasting analysis
Predictive analysis
…
Applications for Big Data Analytics
systems
Cybersecurity
Health & Life Sciences
Epidemic early warning
The Big Data Conundrum
[Chart: data AVAILABLE to an organization vs. data an organization can PROCESS]
The economies of deletion have changed…
– Leading us into new opportunities and challenges
The percentage of available data an enterprise can analyze is decreasing in proportion to the data available to that enterprise
Quite simply, this means that, as enterprises, we are getting “more naive” about our business over time
Public wind data is available on 284 km x 284 km grids (2.5° LAT/LONG)
More data means more accurate and richer models (adding hundreds of variables)
- Vestas wind library at 2.5 PB: to grow to over 6 PB in the near-term
- Granularity 27km x 27km grids: driving to 9x9, 3x3 to 10m x 10m simulations
Reduced turbine placement identification from weeks to hours
Perspective: the Vestas wind library, viewed as HD TV, would take 70 years to watch
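The jump in granularity explains why the library grows so quickly: each time the grid spacing shrinks, the number of cells covering the same area grows with the square of the ratio. This is a rough sketch under stated assumptions: the helper name is hypothetical, the cells are treated as squares, and "9x9" and "3x3" on the slide are read as kilometers.

```python
def cells_per_coarse_cell(coarse_m: float, fine_m: float) -> float:
    """Number of square fine-grid cells needed to cover one square coarse-grid cell."""
    return (coarse_m / fine_m) ** 2


# Grid spacings quoted on the slide, expressed in meters.
steps = [
    ("public 284 km -> 27 km", 284_000, 27_000),
    ("27 km -> 9 km", 27_000, 9_000),
    ("9 km -> 3 km", 9_000, 3_000),
    ("3 km -> 10 m", 3_000, 10),
]
for label, coarse, fine in steps:
    print(f"{label}: about {cells_per_coarse_cell(coarse, fine):,.0f}x more cells")
```

Going from 27 km to 9 km cells multiplies the cell count by 9, while going from 3 km cells to 10 m simulations multiplies it by 90,000, before counting any additional variables per cell.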
Optimizes building energy consumption with centralized monitoring and control of building monitoring systems
Automates preventive and corrective maintenance of building systems
Uses Streams, InfoSphere BigInsights and Cognos
- Log Analytics
- Energy Bill Forecasting
- Energy consumption optimization
- Detection of anomalous usage
- Presence-aware energy mgt.
- Policy enforcement
Supply Chain Recommendation for Natural Disasters
Capture weather sensor data and analyze the predicted hurricane path
Estimate impact on inventories
Compute shipping and logistics costs
Make recommendations and notify
Capture market data to calculate the cost of stock-outs (high volume)
DHTML result rendering
Real-time projections of hurricane path
Dynamically updated risk assessment for assets in projected path
Correlate combined risk and impending weather threats to optimize inventory and determine supply chain recommendations
Bigger and Bigger Volumes of Data
Retailers collect click-stream data from Web site interactions and loyalty-card-driven transaction data
– This traditional POS information is used by retailers for shopping basket analysis, inventory replenishment, +++
– But data is being provided to suppliers for customer buying analysis
Healthcare has traditionally been dominated by paper-based systems, but this information is getting digitized
Science is increasingly dominated by big science initiatives
– Large-scale experiments generate over 15 PB of data a year, which can’t be stored within the data center and is then sent to laboratories
Financial services are seeing larger volumes through smaller trading sizes, increased market volatility, and technological improvements in automated and algorithmic trading
Improved instrument and sensory technology
– Large Synoptic Survey Telescope’s GPixel camera generates 6 PB+ of image data per year; or consider the oil and gas industry
Monetizing Relationships, Not Just Transactions
How valuable is Amy to my retail sales? Who does she influence? What do they spend?
Telco Score: 91
CPG Score: 76
Fashion Score: 88
[Diagram: Social Network, Public Database, Calling Network, Merged Network]
Watson’s advanced analytic capabilities can sort through the equivalent of
200 MILLION pages of data to uncover an answer in 3 SECONDS.
Why Didn’t We Use All of the Big Data Before?
The IBM Big Data Platform
IBM Netezza 1000: BI + Ad Hoc Analytics on Structured Data
IBM Smart Analytics System: Operational Analytics on Structured Data
IBM Informix Timeseries
What Does a Big Data Platform Do?
Analyze Information in Motion
– Streaming data analysis
– Large volume data bursts and ad-hoc analysis
Analyze a Variety of Information
– Novel analytics on a broad set of mixed information that could not be analyzed before
Discover and Experiment
– Ad-hoc analytics, data discovery and experimentation
Analyze Extreme Volumes of Information
– Cost-efficiently process and analyze PBs of information
– Manage & analyze high volumes of structured, relational data
Manage and Plan
– Enforce data structure, integrity and control to ensure consistency for repeatable queries
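As an illustration of what analyzing information in motion can mean in practice, the sketch below processes readings one at a time and keeps only a small sliding window in memory instead of storing the full stream first. It is a generic example, not the API of any IBM product. The function name, window size, threshold, and sample readings are all hypothetical.

```python
from collections import deque


def rolling_mean_alerts(readings, window=4, threshold=1.5):
    """Yield (value, mean_of_previous_window, is_anomaly) for each reading as it arrives.

    A reading is flagged when it exceeds `threshold` times the mean of the
    previous `window` readings; the first reading has no history and is never flagged.
    """
    recent = deque(maxlen=window)  # only the last `window` readings are kept
    for value in readings:
        if recent:
            mean = sum(recent) / len(recent)
            yield value, mean, value > threshold * mean
        else:
            yield value, None, False
        recent.append(value)


stream = [10, 11, 9, 10, 30, 10]
flags = [is_anomaly for _, _, is_anomaly in rolling_mean_alerts(stream)]
print(flags)
```

Because the generator holds only the last few readings, memory use stays constant however long the stream runs, which is the property that distinguishes this style from loading the data into a warehouse before querying it.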
Big Data Enriches the Information Management Ecosystem
Who Ran What, Where, and When?
Audit MapReduce Jobs and tasks
Managing a Governance Initiative
OLTP Optimization (SAP, checkout, +++)
Master Data Enrichment via Life Events, Hobbies, Roles, +++
Establishing Information as a Service
Active Archive Cost Optimization
– Business uses Big Data
– Roadmap and capabilities gaps
– Business value assessment
THINK