
What Leaders Must Know About Data for Machine Learning


DOCUMENT INFORMATION

Basic information

Title: What Leaders Must Know About Data for Machine Learning
School: Massachusetts Institute of Technology
Subject area: Management and Data Strategy
Type: Guide
Year of publication: 2020
City: Cambridge
Number of pages: 7
File size: 4.45 MB


MIT SMR Connections develops content in collaboration with our sponsors. It operates independently of the MIT Sloan Management Review editorial group.

Copyright © Massachusetts Institute of Technology, 2020. All rights reserved.

What Leaders Must Know About Data to Drive Success With Machine Learning
1 Align machine learning initiatives with business priorities
2 Create and maintain a comprehensive view of all data assets
3 Lay the groundwork for data governance
4 Identify the specific roles required to build a strong data foundation for machine learning
Data Management Strategy Checklist
Sponsor’s Viewpoint: Your Data Strategy Is Key to Machine Learning; a Data Lake Can Help

What Leaders Must Know About Data to Drive Success With Machine Learning

Machine learning is taking predictive analytics to the next level to drive tangible business value for a wide array of industries. Algorithms allow credit card companies to detect fraud in real time and help retailers direct offers to the customers most likely to respond. In health care, tools powered by machine learning help doctors transcribe notes more easily so they can focus on patient care. Manufacturers can take in data from sensors on plant equipment and recommend maintenance before malfunctions cause production delays.

But machine learning models are only as good as the data they ingest. “If data is not clean, if it’s not accessible, if it isn’t stitched together to form a strong foundation, the machine learning and artificial intelligence capabilities built on top of it will have problems,” warns Ashok Srivastava, senior vice president and chief data officer at financial software provider Intuit. This can lead to difficulties such as inaccurate insights or inherent bias, factors that can hamper intelligent business decision-making.

Fortunately, businesses can avoid these perils by designing a data management strategy that develops new capabilities, initiatives, and roles around machine learning. This guide aims to share lessons from business leaders and industry experts on how, with the right policies and frameworks in place, data can serve as a strategic corporate asset.

1 Align machine learning initiatives with business priorities

The first step in creating an enterprise data management strategy is understanding the business’s goal for machine learning. For example, Intuit’s machine learning initiatives aim to improve customer service by providing personalized recommendations to subscribers of its accounting and tax software programs. An online retailer may plan to use machine learning to create more-effective targeted marketing campaigns, while an automotive manufacturer may be building machine learning systems to predict equipment failures.

Establishing which of a business’s strategic priorities have the best potential to be advanced via machine learning provides clarity around which data sets are most important to collect, store, and prepare for analysis.

“Being focused on knowing what data is truly driving your business and matters most is the first piece to a data strategy,” says Juan Tello, chief data officer at Deloitte Consulting and principal in its Strategy & Analytics practice. “So, for example, if business priorities are to win more customers and provide more-competitive pricing based on the products a company sells, that requires three critical data domains: customer data, pricing data, and product data. Prioritizing the data strategy on those areas as a starting point will maximize business outcomes. Organizations should also reevaluate and adjust as their business priorities change.”

This focus is essential, given the vast volumes of data generated by enterprise applications, connected devices, and customer interactions via the web or social media platforms, to name just a few sources. However, by narrowing the scope for data management to three or four key sources, businesses can focus on those data sets that will deliver the most value.


2 Create and maintain a comprehensive view of all data assets

For data to be useful, a business must know it exists. Unfortunately, legacy systems, mergers and acquisitions, and poor data onboarding practices can create silos of unidentified and untagged information.

At Intuit, data management experts “meet with the teams that own data systems or data pipelines, and we start to build a catalog of that information. That means understanding what data they have and how it is stored.” The result, says Srivastava, is “a robust list of data assets that we have within the company.”
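
As a rough illustration of what such a catalog might record, the sketch below keeps one entry per data asset with its owning team, its storage location, and a set of tags. The class names, fields, and sample values are assumptions made for this example; the guide does not describe how Intuit's catalog is actually built.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    """One data asset as described by its owning team (hypothetical schema)."""
    name: str            # e.g. "customer_profiles"
    owner_team: str      # team that owns the system or pipeline
    storage: str         # how and where the data is stored
    description: str = ""
    tags: list[str] = field(default_factory=list)


class DataCatalog:
    """Minimal in-memory catalog keyed by asset name (illustrative only)."""

    def __init__(self) -> None:
        self._entries: dict[str, CatalogEntry] = {}

    def register(self, entry: CatalogEntry) -> None:
        # Re-registering a name overwrites the old record, so the catalog
        # reflects the latest description supplied by the owning team.
        self._entries[entry.name] = entry

    def find_by_tag(self, tag: str) -> list[CatalogEntry]:
        return [e for e in self._entries.values() if tag in e.tags]

    def untagged(self) -> list[CatalogEntry]:
        # Untagged assets are the ones most likely to become forgotten silos.
        return [e for e in self._entries.values() if not e.tags]
```

A periodic review of `untagged()` is one simple way to surface assets that were onboarded without being classified.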

But data troves are constantly evolving as businesses deploy new systems. GE Healthcare offers a perfect example of how to stay ahead of the curve. The manufacturer of diagnostic imaging equipment, which uses machine learning algorithms to improve traditional imaging technologies like CT scanning and X-ray, continuously works with collaborators and partners to inventory and onboard de-identified data. A dedicated team of data specialists receives, processes, and properly catalogs contractually de-identified data sets and then uploads them for use in AI development. This process leads to greater data transparency and availability.

Business leaders must also be held accountable for maintaining a comprehensive view of data assets. At GE Healthcare, chief data officer Derek Danois says, broad communication and transparency are key to building trust: Business units now collaborate so that the company knows the moment a new data set becomes available.

3 Lay the groundwork for data governance

At the core of every data management strategy is data governance: a set of rules and systems that ensures that data is secure, handled in compliance with applicable regulations, accessible, and usable.

Data security and compliance with privacy laws are table stakes and as such have been the primary drivers of data governance for most enterprises. In addition to guarding against intruders via cybersecurity measures that protect the IT perimeter, businesses must also establish controls that limit how data is accessed, used, and managed by employees. This typically means granting different access levels depending on variables such as role, tenure, and function. Compliance with regulations such as the European Union’s GDPR (General Data Protection Regulation) and similar requirements in other jurisdictions means that companies must also be prepared to explain to consumers how their data is being used to make decisions that affect them.
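
To make the idea of access levels that vary by role, tenure, and function more concrete, here is a minimal sketch of an access check. The sensitivity levels, the role-to-clearance table, and the six-month tenure rule are all invented for illustration; a real policy would come from the organization's own governance and legal teams.

```python
from dataclasses import dataclass

# Hypothetical sensitivity levels, ordered from least to most restricted.
LEVELS = {"public": 0, "internal": 1, "confidential": 2, "restricted": 3}

# Hypothetical policy: the highest level each role may read.
ROLE_CLEARANCE = {
    "analyst": "internal",
    "data_scientist": "confidential",
    "compliance_officer": "restricted",
}


@dataclass(frozen=True)
class Employee:
    name: str
    role: str
    function: str        # business function, e.g. "marketing"
    tenure_months: int


def can_read(employee: Employee, dataset_level: str, dataset_function: str) -> bool:
    """Return True if the employee may read a data set under this sketch policy."""
    # Roles missing from the table fall back to the lowest clearance.
    clearance = ROLE_CLEARANCE.get(employee.role, "public")
    if LEVELS[dataset_level] > LEVELS[clearance]:
        return False
    # Assumed rule: confidential-or-higher data is limited to the owning
    # function and to employees with at least six months of tenure.
    if LEVELS[dataset_level] >= LEVELS["confidential"]:
        return (employee.function == dataset_function
                and employee.tenure_months >= 6)
    return True
```

Keeping the policy in plain data tables, separate from the check itself, makes the rules easier to review and to explain when someone asks who can see what.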

Another key component of data governance is quality: A machine learning model’s output depends on the quality of its training data.

At Intuit, data management experts meet with the teams that own data to build a catalog of that information, resulting in a robust list of data assets within the company.

[...] metrics. A medical imaging study might be vetted for standard-of-care parameters (such as slice thickness or scan geometry), field of view (the area of a scanned object), and metadata content requirements. If quality standards are met, GE Healthcare de-identifies or anonymizes the data and establishes a chain of custody that chronicles the data’s control, transfer, and analysis, before it’s uploaded for use in AI development.

Maintaining consistently high levels of data quality calls for continuous monitoring of metrics and key performance indicators such as accuracy, timeliness, consistency, and integrity, a process that can become overwhelming, according to Tello. Using AI-powered data quality tools can accelerate the ability to manage and govern data, he says. Enterprise master data management software can also ease the burden by creating a single master reference source for all critical business data, thereby reducing redundancies and the likelihood of errors.
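
The kind of monitoring described above can start with a handful of simple ratios computed on a schedule. The sketch below shows three assumed proxies: completeness (as a stand-in for integrity), timeliness, and duplicate rate (the redundancy that master data management aims to reduce). The function names, the `updated_at` field, and the choice of proxies are illustrative assumptions, not metrics prescribed by the guide.

```python
from __future__ import annotations

from datetime import datetime, timedelta


def completeness(records: list[dict], required: list[str]) -> float:
    """Share of records in which every required field is present and non-empty."""
    if not records:
        return 0.0
    ok = sum(
        all(r.get(f) not in (None, "") for f in required) for r in records
    )
    return ok / len(records)


def timeliness(records: list[dict], now: datetime, max_age: timedelta,
               ts_field: str = "updated_at") -> float:
    """Share of records whose timestamp is no older than max_age."""
    if not records:
        return 0.0
    fresh = sum(
        1 for r in records if ts_field in r and now - r[ts_field] <= max_age
    )
    return fresh / len(records)


def duplicate_rate(records: list[dict], key: str) -> float:
    """Share of records whose key value repeats an earlier record."""
    if not records:
        return 0.0
    seen = set()
    dupes = 0
    for r in records:
        value = r.get(key)
        if value in seen:
            dupes += 1
        else:
            seen.add(value)
    return dupes / len(records)
```

Tracking these ratios over time, and alerting when one crosses an agreed threshold, is a lightweight starting point before investing in dedicated data quality tooling.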

4 Identify the specific roles required to build a strong data foundation for machine learning

An explosion of new data science job titles has raised questions regarding who is responsible for which tasks within a machine learning practice. A well-thought-out organizational structure can make sense of this landscape by clarifying roles and delineating responsibilities.

Some of the key roles required to execute a data management strategy include the following:

Chief digital/data officer: Oversees all digital functions, provides support and leadership, and articulates a strategy for data governance that’s consistent across the company.

Data scientist: Creates tools or processes based on machine learning and applies them to well-defined business problems.

Decision scientist: Uses expertise in technology, math, and statistics, along with business domain knowledge, to enable informed decision-making.

Compliance/legal team member: Handles privacy, compliance, data rights, and regulatory aspects impacting a business.

Ancillary positions include data management specialist, business intelligence specialist, and data architect.

But there’s also a place for sales executives, HR managers, and chief marketing officers in machine learning initiatives. “The business owners who are making decisions on a daily basis are some of the most important contributors to our overall data strategy,” says Intuit’s Srivastava.

That’s because business leaders possess domain knowledge: an in-depth understanding of the relevant data within the enterprise, the processes that generate useful data, what data might be useful for a model, and how different variables might impact a model’s output. Without this guidance, businesses risk creating machine learning applications that don’t deliver useful results.

Looking Forward

Machine learning has the potential to improve results in nearly every aspect of business. But to harness it, businesses need a data management strategy that will continuously improve the quality, integrity, access, and security of data.


DATA MANAGEMENT STRATEGY CHECKLIST

Keep the following practices in mind to successfully design and execute a data management strategy in support of machine learning:

[✓] Establish rules and processes around how data is sourced, managed, accessed, and used across the business.

[✓] Ascertain which data sets are driving the business and how they can be used to help solve problems, generate revenue, and deliver customer benefits.

[✓] Inventory known data assets, classify them, and organize them in a data catalog.

[✓] Meet with the teams that own and operate data systems to better understand what data they have and how it is stored.

[✓] Understand where your data comes from, who has access, and how it can be used.

[✓] Establish internal security precautions (such as provisioning user access), as well as external safeguards (such as anonymizing data), to protect sensitive data.

[✓] Create access controls that set limitations around how data is accessed and how it might be used.

[✓] Design processes and systems to ensure that data created is accurate and useful.

[✓] Identify specific roles required to build a strong data foundation, including chief digital officer, data scientist, decision scientist, and compliance team member.


Your Data Strategy Is Key to Machine Learning; a Data Lake Can Help

Machine learning success is highly dependent on having relevant and high-quality data. Without a proper data strategy in place, machine learning initiatives fail to scale. Worse yet, if the machine learning models are informed by bad data, the results they generate may be misleading or even incorrect.

The right data strategy for machine learning should aim to break down silos, enabling your IT teams to easily, quickly, and securely access and collect the data they need. While modern data strategies take many forms, data lakes are becoming an increasingly popular core component of the most efficient models. Data lakes offer more agility and flexibility than traditional data management systems, allowing organizations to manage multiple data types from a wide variety of sources and to store the data, whether structured or unstructured, in a centralized repository. Once stored, the data can be leveraged by many types of analytics and machine learning services faster and more efficiently than with traditional, siloed approaches.

Data lake architectures also enable multiple groups within the organization to benefit from analyzing a consistent pool of data that spans the entire business. For help developing a more holistic data strategy that includes data lakes, interact with the AWS Data Flywheel.

Amazon’s ML Solutions Lab program can also help you build the right data strategy. The Amazon ML Solutions Lab pairs your team with Amazon machine learning experts to prepare data, build and train models, and put models into production. It combines hands-on educational workshops with brainstorming sessions and advisory professional services to help you essentially work backward from business challenges and then go step-by-step through the process of developing solutions based on machine learning. Moreover, one of our machine learning partners can also help you build the right data strategy for your machine learning initiatives.

AWS Machine Learning Competency Partners have demonstrated relevant expertise and offer a range of services and technologies to help you create intelligent solutions for your business, from enabling data science workflows to enhancing applications with AI services. Learn more at aws.ai.

About Amazon Web Services

AWS offers the broadest and deepest set of machine learning and AI services. On behalf of our customers, we are focused on solving some of the toughest challenges that hold back machine learning from being in the hands of every developer. Tens of thousands of customers are already using AWS for their machine learning efforts. You can choose from fully managed AI services for computer vision, language, recommendations, forecasting, fraud detection, and search; or Amazon SageMaker to quickly build, train, and deploy machine learning models at scale. SageMaker Studio offers the first fully integrated development environment for machine learning. You can also build custom models with support for all of the popular open-source frameworks. Our capabilities are built on the most comprehensive cloud platform, optimized for machine learning with high-performance computing and no compromises on security and analytics. Learn more at aws.ai.

Date posted: 20/10/2022, 14:04