
DOCUMENT INFORMATION

Title: Big Data in Healthcare: Statistical Analysis of the Electronic Health Record
Author: Farrokh Alemi
Publisher: Health Administration Press
Field: Medical Statistics
Type: Book
Year of publication: 2019
City: Chicago
Pages: 575
Size: 9.16 MB

Contents

Stephen J. O'Connor, PhD, FACHE, Chairman
University of Alabama at Birmingham

West Virginia University

Lynn T. Downs, PhD, FACHE
University of the Incarnate Word

Association of University Programs in Health Administration, Washington, DC

This publication is intended to provide accurate and authoritative information in regard to the subject matter covered. It is sold, or otherwise provided, with the understanding that the publisher is not engaged in rendering professional services. If professional advice or other expert assistance is required, the services of a competent professional should be sought.

The statements and opinions contained in this book are strictly those of the authors and do not represent the official positions of the American College of Healthcare Executives, the Foundation of the American College of Healthcare Executives, or the Association of University Programs in Health Administration.

Copyright © 2020 by the Foundation of the American College of Healthcare Executives. Printed in the United States of America. All rights reserved. This book or parts thereof may not be reproduced in any form without written permission of the publisher.

Library of Congress Cataloging-in-Publication Data

Names: Alemi, Farrokh, author.
Title: Big data in healthcare : statistical analysis of the electronic health record / by Farrokh Alemi.
Description: Chicago, IL : Health Administration Press, [2019] | Includes bibliographical references and index. | Summary: “This book introduces health administrators, nurses, physician assistants, medical students, and data scientists to statistical analysis of electronic health records (EHRs). The future of medicine depends on understanding patterns in EHRs. This book shows how to use EHRs for precision and predictive medicine.” Provided by publisher.
Identifiers: LCCN 2019026815 (print) | LCCN 2019026816 (ebook) | ISBN 9781640550636 (hardcover) | ISBN 9781640550643 (ebook) | ISBN 9781640550650 | ISBN 9781640550667 (epub) | ISBN 9781640550674 (mobi)
Subjects: LCSH: Medical statistics. | Data mining.
Classification: LCC RA409 .A44 2019 (print) | LCC RA409 (ebook) | DDC 610.2/1 dc23

LC record available at https://lccn.loc.gov/2019026815

LC ebook record available at https://lccn.loc.gov/2019026816

The paper used in this publication meets the minimum requirements of American National Standard for Information Sciences—Permanence of Paper for Printed Library Materials, ANSI Z39.48-1984. ∞ ™

Acquisitions editor: Jennette McClain; Project manager: Theresa L. Rothschadl; Cover designer: James Slate; Layout: PerfecType

Found an error or a typo? We want to know! Please e-mail it to hapbooks@ache.org, mentioning the book’s title and putting “Book Error” in the subject line.

For photocopying and copyright information, please contact Copyright Clearance Center at www.copyright.com or at (978) 750-8400.

Health Administration Press
A division of the Foundation of the American College of Healthcare Executives
300 S. Riverside Plaza, Suite 1900

Association of University Programs in Health Administration
1730 M Street, NW, Suite 407

Acknowledgments xvii

Chapter 1 Introduction 1
Chapter 2 Preparing Data Using Structured Query Language (SQL) 11
Chapter 3 Introduction to Probability and Relationships 55
Chapter 4 Distributions and Univariate Analysis 77
Chapter 5 Risk Assessment: Prognosis of Patients with Multiple Morbidities 101
Chapter 6 Comparison of Means 135
Chapter 7 Comparison of Rates 173
Chapter 8 Time to Adverse Events 203
Chapter 9 Analysis of One Observation per Time Period: Tukey’s Chart 223
Chapter 10 Causal Control Charts 239
Chapter 11 Regression 255
Chapter 12 Logistic Regression 309
Chapter 13 Propensity Scoring 327
Chapter 14 Multilevel Modeling: Intercept Regression 345
Chapter 15 Matched Case Control Studies 361
Chapter 16 Stratified Covariate Balancing 383
Chapter 17 Application to Benchmarking Clinicians: Switching Distributions 409
Chapter 18 Stratified Regression: Rethinking Regression Coefficients 427
Chapter 19 Association Network 459
Chapter 20 Causal Networks 487

Index 527
About the Author 551
About the Contributors 553

Acknowledgments xvii

Chapter 1 Introduction 1
Why Management by Numbers? 1
Why a New Book on Statistics? 5
Digital Aids and Multimedia 8
Relationship to Existing Courses 8
Audience 8
Five Courses in One Book 8
Supplemental Resources 9
References 9

Chapter 2 Preparing Data Using Structured Query Language (SQL) 11
SQL Is a Necessary Skill 12
What Is SQL? 14
Learn by Searching 14
Common SQL Commands 14
Cleaning Data 38
Should Data Be Ignored? 45
Time Confusion: Landmark, Forward, and Backward Looks 47
Confusion in Unit of Analysis and Timing of Covariates 52
Summary 53
Supplemental Resources 53
References 53

Chapter 3 Introduction to Probability and Relationships 55
Probability 56
Probability Calculus 58
Conditional Probability 61
Odds 62
Bayes’s Formula 62
Independence Simplifies Bayes’s Formula 64
Contingency Tables in Excel 68
The Chi-Square Test 71
Relationship Among Continuous Variables 73
Correlation 74
Summary 76
Supplemental Resources 76
References 76

Chapter 4 Distributions and Univariate Analysis 77
Introduction 78
Variables 78
Sample 80
Average 82
Expected Values 83
Standard Deviation 85
Transformation of Variables 87
Transformation of Variables Using Excel 89
Distribution 90
Histogram 92
How to Make a Histogram in Excel 94
Transformation of Distributions 97
Observing Variables over Time 98
Minimum Observations for Control Charts 99
Summary 100
Supplemental Resources 100
Reference 100

Chapter 5 Risk Assessment: Prognosis of Patients with Multiple Morbidities 101
Introduction 102
Alternatives to the Multimorbidity Index 102
The Theory Behind Multimorbidity Index 105
Estimating Parameters of the MM Index 106
Calculation of Likelihood Ratios 106
Adjustment for Rare Diseases 108
Adjustment for Revision 10 108
Sample Size Needed to Construct the MM Index 109
Cross-Validation 109
Checking the Accuracy of Predictions 112
MM Index Compared to Physiological Markers 117
MM Indexes Compared to Other Diagnoses-Based Indexes
Example of the Use of the MM Index 119
Summary 119
Supplemental Resources 121
References 121
Note 124
Appendix 5.1 125
Appendix 5.2 133

Chapter 6 Comparison of Means 135
Normal Distribution 136
Hypothesis Testing 144
Comparison of Two-Sample Means 151
Control Chart with Normal Distribution 152
Summary 171
Supplemental Resources 172
References 172

Chapter 7 Comparison of Rates 173
Summarizing Discrete Variables 174
The Bernoulli Process and the Binomial Distribution 175
Normal Approximation 179
Inference for a Single Rate 181
Comparison of Two Rates 183
Confidence Interval for Odds Ratio 186
Probability Control Chart 189
Risk-Adjusted P-chart 194
Summary 199
Supplemental Resources 201
Reference 201

Chapter 8 Time to Adverse Events 203
Distribution of Sentinel Events 203
Days to Event 208
Time-Between Charts 209
Example: Analysis of Online Reviews 212
Example: Sticking to Exercise Resolutions 219
Are Insights into Data Worth the Effort? 220
Summary 221
Supplemental Resources 221
References 221

Chapter 9 Analysis of One Observation per Time Period: Tukey’s Chart 223
Tukey’s Chart 223
Example 1: Time to Pain Medication 224
Example 2: Exercise Time and Weight Loss 228
Example 3: Keeping Exercise Patterns 230
Example 4: Medication Errors 232
Example 5: Budget Variation 233
Comparison of Tukey’s and Other Charts 236
Summary 237
Supplemental Resources 237
References 237

Chapter 10 Causal Control Charts 239
Assumptions of Causal Claims 240
Attributable Risk 241
Example: Fires in the Operating Room 242
Causal Analysis in the Context of Control Charts 244
Methods 245
A Simulated Example of Emergency Department Delays 247
Application to Stock Market Prices 250
Summary 252
Supplemental Resources 252
References 253

Chapter 11 Regression 255
Regression Is Everywhere 256
Types of Regression 258
Introduction to Equations 259
Fitting Data to an Equation: Residuals 262
Example: Analysis of Costs 264
Example with a Single Predictor of Cost 266
Independent Variables 271
Main Effect and Interactions 274
Coefficient of Determination 277
Model Building 280
Regression Assumptions: Correctness of Model Form 282
Regression Assumptions: Independence of Error Terms 285
Regression Assumptions: Homoscedasticity 286
Regression Assumptions: Normally Distributed Errors 288
Transformation of Data to Remedy Model Violations 290
Effects of Collinearity 291
Importance of Cross-Validation 292
Weighted Regression 294
Shrinkage Methods and Ridge or LASSO Regression 294
Context-Specific Hypothesis Testing 295
Changing Units of Measurement 296
Interpretation of Regression as Cause and Effect 296
Summary 297
Supplemental Resources 297
References 298
Appendix 11.1 300

Chapter 12 Logistic Regression 309
Widespread Use 310
Case Study 312
Logistic Regression Model 313
Example of Ordinary Regression with Logit Transformation 315
Predictors of the Use of an MFH Using R 318
Interpretation of Coefficients 320
Context-Dependent Hypothesis Test 321
Measures of Goodness of Fit 322
Summary 324
Supplemental Resources 324
References 324

Chapter 13 Propensity Scoring 327
Widespread Use 328
Propensity Scoring Is a Simulation 329
Three Steps in Propensity Scoring 330
Balancing Through Propensity Scores 331
Propensity Score Quintile Matching 332
Propensity Score Weighting 337
Double Regression 338
Example for Weighted Propensity Scoring 338
Verification of Propensity Scores 342
Overlap and Related Concepts 343
Summary 343
Supplemental Resources 343
References 344

Chapter 14 Multilevel Modeling: Intercept Regression 345
Increasing Use 345
Ideas Behind Multilevel Modeling 346
Multilevel Modeling Using Standard Query Language 354
Application of Multilevel Modeling to Other Data Types 358
Measurement Issues 358
Summary 359
Supplemental Resources 359
References 359

Chapter 15 Matched Case Control Studies 361
Widespread Application 362
Representative Data Are Needed 364
Definition of Cases and Controls 364
Measurement of Exposure to Treatment 365
Enrollment and Observation Period 366
Matching Criteria 368
Measurement of Outcomes 371
Verification of Matching 373
Analysis of Outcomes 373
Analysis of Time to Event 377
Overlap 378
Summary 378
Supplemental Resources 379
References 379
Notes 381

Chapter 16 Stratified Covariate Balancing 383
Introduction 384
The History of Stratification 385
Combination of Covariates 385
Impact of Treatment on Binary Outcomes 386
Impact of Treatment on Continuous Outcomes: Difference Models 389
Impact of Treatment on Continuous Outcomes: Weighted Data 390
Comparison to Propensity Scoring 392
Overlap Problem and Solutions 398
Automated Removal of Confounding 403
R Package 406
Summary 406
Supplemental Resources 407
References 407

Chapter 17 Application to Benchmarking Clinicians: Switching Distributions 409
Introduction 410
Switching Probabilities 411
Example with Multiple Comorbidities 413
Overlap of Clinician’s and Peer Group’s Patients 416
Synthetic Controls 418
Limitations 420
Summary 421
Supplemental Resources 422
References 424

Chapter 18 Stratified Regression: Rethinking Regression Coefficients 427
Not in Widespread Use 428
Background 428
Multilinear Regression 429
Example: Predicting Cost of Insurance 430
Estimation of Impact of Independent Variables 433
Estimation of Correction Factors 434
Final Write-Up of the Equation 436
Replacing the Multilinear Model with a Multiplicative Model 436
Estimation of Parameters in a Multiplicative Model 437
Determination of Overall Constant k 439
Application to Prognosis of Lung Cancer 439
Structured Query Language Code for Stratified Regression 447
Summary 450
Supplemental Resources 451
References 451
Appendix 18.1 452

Chapter 19 Association Network 459
Not in Widespread Use 459
Shrinking Universe of Possibilities 461
Product of Marginal Probabilities 464
Chi-Square Test of Independence 467
Visual Display of Dependence 467
Independence for Three Variables 469
Chi-Square Testing for Three Variables 471
Spurious Correlation 475
Mutual Information 477
Poisson Regression and Tests of Dependence 478
Example Construction of Association Network 481
Summary 484
Supplemental Resources 485
References 485

Chapter 20 Causal Networks 487
Causal Thinking Is Fundamental 488
Use of Network Models 488
So What Is Causation? 489
Key Concepts in Causal Networks 491
Relationship Between Regression and Causal Networks 497
Predicting the Probability of an Event 501
A Numerical Example 503
Causal Impact 506
Back-Door Paths and Markov Blankets 507
Estimating Structure and Parameters of Causal Networks 510
Learning Associations Among Pairs of Variables 511
Directing the Arcs in the Network 514
Learning the Parameters of the Network 516
Verification of Blocked Back Doors 517
Calculating Causal Impact of Cancer on Survival 518
Calculating the Causal Impact of Eating Disability on Death 520
Summary 522
Supplemental Resources 524
References 524

Index 527
About the Author 551
About the Contributors 553

From conception through production, this book has been in preparation for more than a decade. During that time, numerous changes were made. The whole plan of the book changed. Entirely new chapters were introduced; in previously drafted chapters, the presentation changed radically and often. All of these changes followed feedback from students, colleagues, and editors, whom I now need to thank.

The book was first edited by Nancy Freeborne, PhD—mostly for grammar. She looked at the first ten chapters. Theresa Rothschadl brought consistency to the references and exhibits as well as the writing style for the entire book. She transformed the awkward language of an immigrant mathematician into normal English. Good editors make you put in the time to explain yourself more clearly, and I am grateful to both Theresa and Nancy for taking on this project. My colleague and friend, Harold Erdman, was kind enough to look through corrections of several chapters.

As you can see throughout this book, I am heavily influenced by Judea Pearl’s ideas on causal analysis. I like how he connects his work to sociologists and economists who were also working on causal analysis. At times, however, his writings can be hard to understand. Fortunately, when I queried Dr. Pearl for this book, he was gracious in answering my emails. This book also shows how regression applies to causal networks, which was also influenced by the scholarship of Ali Shojaie. I wrote to him and benefited from his timely responses as well. In addition, I appreciate the guidance of Kathryn Blackmond Laskey. I presented my half-baked ideas to her, and she gracefully corrected them. I was lucky to have these communications.

You cannot write a book without having time to do so. The chair of my department, Peggy Jo Maddox, was kind enough to provide me with sufficient time to do this. In academia, a good chair is rare. PJ was a godsend. Managing faculty like me is hard. A lot of ego is involved, and we don’t take direction kindly. She was gracious and effective. Without her advice and direction, this book would not have been possible. I should also mention Tracy Shevlin and Regina Young, both of whom radically reduced my administrative burdens. Part of faculty–student advising and research budget supervision is paperwork—not a little, but a lot of paperwork. Tracy and Regina made my life easier, which in turn allowed me more time to write this book.

In the decade the book was under preparation, clinicians Raya Kheirbek, Mark Schwartz, Allison Williams, and Cari Levy heavily influenced my thinking. When they complained about models with thousands of variables, they forced me to explain myself. Many examples in the book came from interactions with these clinicians. The questions they asked mattered. I changed design methods to be more relevant to their needs.

Sanja Avramovic is one of the closest colleagues I have had in the last decade. I would go to her with my standard query language problems. For that, I am very much in her debt. Janusz Wojtusiak also was a great help, and many of our exchanges appear in this book. For example, the work on synthetic controls came from our conversation at a seminar that he made me attend, in spite of my reluctance. He was the first person to show me how propensity scoring works. I am grateful to him.

If you want to understand what enables a book, follow the money. During this period, I was supported by grants from the Centers for Disease Control and Prevention (to Lorna Thorpe at New York University) and from the Veterans Administration (to Schwartz, Kheirbek, and Levy). These principal investigators actively supported me. In fact, many of the research projects they paid for finished as examples in this book.

The body of work presented here has been used as required reading in three courses that I regularly teach. The students in these courses played a large role in improving this book. They pointed out parts that were not clear. They created “each one teach one” videos to help their peers in answering problem sets (you can see many of them in the supplementary materials). I am grateful to all of my many students, but would like to highlight the contributions of Steve Brown, Amr ElRafey, and Timothy P. Coffin. When I first started teaching a course on causal analysis, I would start by saying that I did not know the topic well and that it was still changing, which it was. Sometimes, when students asked very relevant questions, they would be surprised to hear my answer: “I don’t know.” They put up with me while I learned, and now I am not only grateful but must also apologize for the pain I caused when I could not give them answers. They truly paved the way for later, more successful classes.

I am grateful to my daughter Roshan Badii Alemi. When she was working for the Advisory Board, I would pump her for information about her work. When I needed to provide examples here of analyses that would be useful to hospital and clinic administrators, she knew, firsthand, what they wanted. I benefited from her insights. Her work on benchmarking was also eye opening. It forced me to rethink how synthetic controls should be organized. It helped me explain data balancing in ways hospital administrators and clinicians can understand.

I am also grateful to my daughter Yara Badii Alemi. She helped prepare a number of videos for the book’s supplementary materials. As a theater student, she knew how to present complex issues. She forced me out of dull, monotone, repetitious, talking-head narrations. She showed me how to show my enthusiasm for the topic. She is also the person who took me to a remote island in Greece, where I thought through stratified covariate balancing while looking at the beautiful blue sea. That vacation proves that the best ideas come to you when you are having fun.

I am surprised at how much statistics has changed, even in a short decade. I once thought statistics was a stable science. I was wrong—it is in constant turmoil. I thought I knew how to do hypothesis testing. I was wrong. I thought I knew how to do statistical modeling. I was wrong. I am grateful that at the infancy of data science, when we had just begun to look at massive data sets, I had the opportunity to learn. I was there when the work of data scientists went from obscure, behind-the-scenes jobs done in basement offices to strategic, frontline positions of primary importance to their organizations. How cool is it to witness and chronicle radical change in statistics? When I was a student, there was no introductory book on statistics like this one. I am grateful for the opportunity to write it.

Farrokh Alemi

INTRODUCTION

Chapter at a Glance

This book introduces health administrators, nurses, physician assistants, medical students, and data scientists to statistical analysis of electronic health records (EHRs). The future of medicine depends on understanding patterns in EHRs. This book shows how to use EHRs for precision and predictive medicine. This chapter introduces why a new book on statistical analysis is needed and how healthcare managers, analysts, and practitioners can benefit from fresh educational tools in this area.

Why Management by Numbers?

This textbook provides a radically different alternative to books on statistical analysis. It de-emphasizes hypothesis testing. It focuses primarily on removing confounding in EHRs. It emphasizes data obtained from EHRs and thus, by necessity, involves a great deal of structured query language (SQL).

The management and practice of healthcare are undergoing revolutionary changes (McAfee and Brynjolfsson 2012). More information is available than ever before, both inside and outside of organizations. Massive databases, often referred to as big data, are available and accessible. These data can inform management and practitioners’ decisions. The growing use of EHRs has enabled healthcare organizations, especially hospitals and insurance companies, to access large data sets. Inside organizations, EHRs can measure countless operational and clinical metrics that enhance the organization’s productivity.

All sorts of data points are available for scrutiny. Analysts can track who is doing what and who is achieving which outcomes. Providers can be benchmarked; front desk staff efficiency can be monitored. Data are available on the true cost of operations, as nearly every activity is tracked. Contracts with health maintenance organizations can be negotiated with real data on cost of services. Data are available on profitability of different operations, so unprofitable care can be discontinued. Managers can detect unusual patterns in the data. For example, they can see that hospital occupancy affects emergency department backup.

In the healthcare field, data are available on pharmaceutical costs and their relationship to various outcomes. Many organizations have lists of medications on their formulary, and now such lists can be based on both cost and outcome data. Medications can be prescribed with more precision and less waste. Data can be used to predict future illnesses; diseases can be prevented before they occur. The wide availability of massive amounts of data has made managing with numbers easier and more insightful. The following are some examples of how healthcare organizations are gathering massive databases to enable insights into best practices (Jaret 2013):

1. The Personalized Medicine Institute at Moffitt Cancer Center tracks more than 90,000 patients at 18 different sites around the country.
2. In any given year, the Veterans Affairs Informatics and Computing Infrastructure (VINCI) collects data on more than 6 million veterans across 153 medical centers.
3. Kaiser Permanente has a database of 9 million patients.
4. Aurora Health Care system has 1.2 million patients in its data systems.
5. The University of California’s medical centers and hospitals have a database with more than 11 million patients.
6. The US Food and Drug Administration has the combined medical records of more than 100 million individuals to track the postlaunch effectiveness of medications.
7. The Agency for Healthcare Research and Quality has compiled claims data across 50 states.
8. The Centers for Medicare & Medicaid Services releases 5 percent samples of its massive data.

In addition to planned efforts to collect information, data gather on their own on the web. Patients’ preferences, organization market share, and competitive advantages can all be determined from analysis of internet comments (Alemi et al. 2012). The internet of things collects massive data on consumers’ behavior. Most web data are in text format. Analysis of these data requires text processing, a growing analytical field.

Big data is influencing which managers will succeed and which will not. “As the tools and philosophies of big data spread, they will change the long-lasting ideas about the practice of management” (Eshkenazi 2012). Companies that get insights through analysis of big data are expected to do better than those that do not, and therefore these managers will succeed more often. There are many examples of how data-driven companies succeed over counterparts that ignore data analysis. At Mercy Hospital in Iowa City, Iowa, managers who benchmark their clinicians and pay them for performance report 6.6 percent improvements in the quality of care (Izakovic 2007).

Many investigators point out that the Veterans Health Administration (VHA) was able to reinvent itself because it focused on measurement of performance (Longman 2010). The VHA healthcare system had poor quality of care—until the VHA became data driven. Then, over a short interval, VHA managers and clinicians were able to not only change the culture but also change patient outcomes. According to Longman (2010), the VHA system now reports some of the best outcomes for patients anywhere in the United States.

A recent study of 330 North American companies showed widespread positive attitudes toward data evaluation. The more companies characterized themselves as data driven, the more likely they were to outperform their competitors financially and operationally. Data-driven companies were 5 percent more productive and 6 percent more profitable than less data-driven companies (Brynjolfsson, Hitt, and Kim 2011).

In healthcare, companies that rely heavily on Lean (a process improvement tool) and other similar tools can be classified as data driven, even if they rely on small data sets. These companies use statistical process control to verify that changes have led to improvements. Many studies show that when organizations fully implement statistical process control tools, including an emphasis on measurement (Nelson et al. 2000), they deliver better care at lower cost (Shortell, Bennett, and Byck 1998). The use of these techniques is widespread, making it an essential capability of modern managers (Vest and Gamm 2009).

In healthcare, the use of EHRs has been associated with reductions in medication errors (Stürzlinger et al. 2009). Managers have used EHRs to maximize reimbursement in ways that have surprised insurers (Abelson, Creswell, and Palmer 2012). Other managers report analyzing data in EHRs to reduce “never events” (unreimbursable accidents) in their facilities and to measure quality of care (Glaser and Hess 2011). These efforts show that analysts are finding ways to use the data in EHRs to improve their organizations. Such efforts are expected to continue, creating an unprecedented shift toward the heavy use of data.

Big data has changed and continues to change health insurance. Insurance companies are trimming their networks using data on the performance of their doctors. New start-up insurance companies are competing more effectively with well-established insurance companies by situating their secondary providers near their target market. Insurance companies are deciding what to cover and what to discourage through data analysis. Risk assessment is changing, and more accurate models are reducing the risk of insurance. In risk rating, chronological age may not be as important as history of illness.

Value-based payment systems have transformed who assumes risk. Value-based reimbursement has changed how hospitals and clinics are paid. With this paradigm shift, insurers hold hospital managers accountable for quality of care inside and outside of hospitals. For example, a hospital that does a hip replacement is paid a fixed amount of money for expenses, including the cost of surgery and out-of-hospital costs 90 days after surgery. The hospital manager needs to make sure not only that the healthcare organization’s surgeons are effective and that its operation does not lead to unnecessarily long stays, but also that patients are discharged to nursing homes or other institutions that actively work on the patients’ recovery. Affiliation with a home health care organization or nursing home could help decrease readmission and could easily reduce the hospital’s payments. For 90 days, no matter where the patient is cared for, the hospital manager is at risk for cost overruns. Value-based reimbursements have increased the need to analyze data and affiliate with providers and institutions that are cost-effective.

Big data is changing clinical practice as well. The availability of data has enabled managers and insurers to go beyond traditional roles and address clinical questions. For the first time, analysts can measure the comparative effectiveness of different healthcare interventions. They can talk to physicians, nurse practitioners, and physician assistants about their clinical practices. They can discourage patients from undergoing unnecessary operations. For years, clinical decisions were made by clinicians, but the availability of data is beginning to change this. For example, the Centers for Disease Control and Prevention uses Data to Care (D2C) procedures to identify HIV patients who have stopped taking their medications. Careful communication with these patients can bring them back to care. In addition, payers such as Amazon are organizing population-level interventions to improve delivery of care. Analysts are alerting primary care providers about potential substance abuse and alerting patients about the need for flu shots. These efforts are giving extended clinical roles to data analysts.

Data are changing the healthcare equation. Today, managers have data on what is best for patients, and they can work with their clinicians to change practices. For example, analysts have been able to examine pairs of drugs that cause a side effect not associated with the use of either drug on its own. They found that Paxil, a widely used antidepressant, and Pravastatin, a cholesterol-lowering drug, raise patients’ blood sugar level when used together (Tatonetti et al. 2012). In this example, and other comparative effectiveness studies, we see an emerging new role for data scientists.

Why a New Book on Statistics?

Big Data in Healthcare differs from existing introductory statistics books in many ways. Exhibit 1.1 lists how this textbook’s emphasis differs from that of other managerial statistics books. First, it exclusively focuses on the application of statistics to EHRs. All examples in this book come from healthcare. They include use of statistics for healthcare marketing, cost accounting, strategic management, personnel selection, pay-for-performance, value-based payment systems, insurance contracting, and clinician benchmarking. These examples are given to illustrate the importance of quantitative analysis to management of healthcare.

Second, the book de-emphasizes traditional hypothesis testing and emphasizes statistical process control. For healthcare managers, hypothesis testing is of little use; such testing requires the use of static populations and context-free tests that simply do not exist in the real world. In contrast, healthcare managers have to examine their hypotheses over time and thus need to rely on statistical process control. Alternately, they need to test a hypothesis while controlling for other conditions and must therefore rely on multivariate analysis as opposed to univariate hypothesis tests.

Most existing books focus on hypothesis testing through confidence intervals and standardized normal distributions. Big Data in Healthcare introduces these concepts through statistical process control. Confidence intervals are discussed in terms of 95 percent upper and lower control limits in control charts. The use of geometric distributions in time-between control charts is discussed. This book covers the use of Bernoulli and binomial distributions in creating probability control charts. It discusses the use of normal distributions in creating X-bar control charts and provides students with knowledge of hypothesis testing in the context of observational data collected over time.

Third, this book differs from most other introductory statistics textbooks in that it mostly relies on EHR-based data. Healthcare is swimming in data. Data analysts need to structure and delete large amounts of data before they can address a specific problem. EHR data are observational, not experimental. Managers rarely have the option to run randomized experiments. Because data come from operational EHRs, where data are collected from patients who voluntarily participate in various treatments, a number of steps must be taken to remove confounding in data. In jest, analysts call these steps “torturing data until they confess.”

In EHRs, data are available in numerous small tables, and not in one large matrix, as most statistical books require. This book gives considerable attention to how data from different tables should be merged. Throughout the book, I have relied on SQL to make the manipulation of data easier. Because the data are inside EHRs, SQL is required to manage the data; other statistical packages are just not available for EHRs. Statistical analysis is really just the tip of the iceberg; much more work and time go into preparing the data than into analyzing them. Big Data in Healthcare: Statistical Analysis of the Electronic Health Record pays special attention to preparation of the data.

EXHIBIT 1.1

Topic: Distributions
• Emphasis of other books: Normal, uniform, and other continuous distributions, with little coverage of discrete probability theory
• Emphasis of this book: Probability distributions for discrete events, including Bernoulli, binomial, geometric, and Poisson distributions; normal distribution as an approximation

Topic: Data
• Emphasis of other books: Measures collected from independent samples; prospective data collection
• Emphasis of this book: Longitudinal, time-based, repeated measures; estimation of upper and lower control limits in process control charts; bootstrapped estimates of variability

Topic: Univariate methods of inference
• Emphasis of other books: Comparison of mean to population; comparison of two means; paired t-test and comparison of dependent means; analysis of variance
• Emphasis of this book: Statistical process control tools such as XmR charts, p-charts, time-between charts, and Tukey’s charts; risk-adjusted process control tools

Topic: Multivariate analysis

In comprehensive EHRs, data are available on patients from birth until death. To use these data, we need to understand their time frame. Several statistical methods have been designed based on the sequenced order of events. EHR data enable new methods of analysis not otherwise available.

Data are collected passively as events occur. Over time, more data are available, and one major task of the manager is to decide which data are relevant. The data themselves never stop flowing, and the manager must decide which period he would like to examine and why. EHRs are also full of surprises, and some data must be discarded because they are erroneous (e.g., male pregnancy, visits after death).

Perhaps most important, this book focuses on causal interpretation of statistics. In the past, statisticians have focused on association among variables. They have worked under the slogan that “correlation is not causation.” While that statement is valid, policymakers, managers, and other decision makers act on statistical findings as if correlation were causal. Any action assumes that the statistical findings are causal—that is, that changing one variable will lead to the desired impact. Statisticians who insist on avoiding causal interpretation of their findings are naive and are ignoring the obvious: their findings might be used differently than their planned precautions might have indicated. At the same time, they are also right to assert that causes are more than correlations. To interpret a variable as causing a change in another variable, we need to establish four principles:

1. Association. Causes have a statistically significant impact on effects.
2. Sequence. Causes occur before effects.
3. Mechanism. A third variable mediates the impact of the cause on the effect.
4. Counterfactual. In the absence of causes, effects are less likely to occur.

These four criteria allow us to discuss and vet causes rather than simply evaluating associations. In recent decades, statisticians have revisited their approach of avoiding causal interpretation and have introduced new techniques and methods that allow for evaluation of causality. For example, causal network models are an alternative to regression analysis. Network models allow the verification of the four assumptions of causality; regression models do not. Another example, propensity scoring, allows statisticians to remove confounding in multivariate analysis and provides a causal estimate of the impact of a variable. This book starts with associations and conditional probabilities, but it uses these concepts to move on to propensity-matched regression analysis or causal networks. Even in early chapters, where we discuss stratification and distributions, we lay the foundation for causal interpretations. In openly discussing causality, this book differs from many other introductory statistics textbooks.

Digital Aids and Multimedia

The book is accompanied by (1) slides to teach the course content, (2) video lectures, (3) video examples to illustrate the points made in the lectures, (4) extensive end-of-chapter exercises, (5) solutions to odd-numbered examples, and (6) a sample test set for midterms and finals. Topics in these supplements may be broader than the book, so take a look at them.

Relationship to Existing Courses

Students often do not understand the relationship between an introductory statistics course and other material they cover in health administration. Big Data in Healthcare: Statistical Analysis of the Electronic Health Record makes these linkages explicit. At the end of each chapter, the book directs you to the course website for problems to solve. Each problem is tied to a specific health administration or health informatics course. For example, problems in statistical process control are linked to courses in quality improvement. A problem in fraud detection is tied to the course in accounting. For still another example, comparative effectiveness analysis is linked to courses in strategy, informatics, and program evaluation. The expectation is that students will not only learn statistical concepts but also understand the connections between this course and various other courses in health administration programs.

Audience

The primary audience of this book is health administration and informatics students. In addition, nursing, physician assistant, and medical students may benefit. This book is not intended for a nonhealthcare audience.

Five Courses in One Book

This book can be used to teach many different courses:

1. The chapter on data preparation (chapter 2) and the chapter on risk assessment (chapter 5) can be used to teach an introductory course about SQL. These chapters present basic SQL commands and their use in constructing predictive models. Throughout the book, numerous examples of SQL code are provided that can further help students learning database design and analysis. The supplemental material of this chapter provides a syllabus for how to use this book to teach a course on SQL.
2. Chapters 3 through 7 can be used to replace an introductory course in statistics that focuses on hypothesis testing. These chapters introduce the concept of hypothesis testing and distributions. A syllabus is provided for courses that are exclusively focused on traditional hypothesis testing. The syllabus lists specific chapters and parts of chapters that may be helpful.
3. Chapters that focus on process control (chapters 5 through 10) can be used in a course on quality improvement. Many quality improvement courses discuss the general concepts but not the statistical tools, which is unfortunate. This book can improve the content of courses on quality improvement. A syllabus is provided for this type of course.
4. Chapters 11 and 12 can be used to teach a course on multivariate regression analysis. Chapters 13 (on propensity scoring), 14 (on hierarchical modeling), and 18 (on stratified regression) further show the value of ordinary regression. Again, a syllabus is provided for how to use this book to teach regression.
5. Chapters 13 through 20 can also be used to teach a course on causal analysis, especially in the context of comparative effectiveness analysis. These chapters enable students to remove confounding in EHR data. The supplemental material includes a syllabus for how to use this book to teach causal and comparative effectiveness courses.

Supplemental Resources

See tools for course design and syllabuses for various types of courses on the web.

References

Abelson, R., J. Creswell, and G. Palmer. 2012. “Medicare Bills Rise as Records Turn Electronic.” New York Times. Published September 21. www.nytimes.com/2012/09/22/business/medicare-billing-rises-at-hospitals-with-electronic-records.html.

Alemi, F., M. Torii, L. Clementz, and D. C. Aron. 2012. “Feasibility of Real-Time Satisfaction Surveys Through Automated Analysis of Patients’ Unstructured Comments and Sentiments.” Quality Management in Health Care 21 (1): 9–19.

Brynjolfsson, E., L. Hitt, and H. Kim. 2011. “Strength in Numbers: How Does Data-Driven Decisionmaking Affect Firm Performance?” Accessed October 15, 2018. www.a51.nl/storage/pdf/SSRN_id1819486.pdf.

Eshkenazi, A. 2012. “Joining the Big Data Revolution.” SCM NOW Magazine. Accessed April 10, 2019. www.apics.org/apics-for-individuals/apics-magazine-home/magazine-detail-page/2012/10/26/joining-the-big-data-revolution.

Glaser, J., and R. Hess. 2011. “Leveraging Healthcare IT to Improve Operational Performance.” Healthcare Financial Management 65 (2): 82–85.

Izakovic, M. 2007. “New Trends in the Management of Inpatients in U.S. Hospitals—Quality Measurements and Evidence-Based Medicine in Practice.” Bratislavské Lekárske Listy 108 (3): 117–21.

Jaret, P. 2013. “Mining Electronic Records for Revealing Health Data.” New York Times. Published January 14. www.nytimes.com/2013/01/15/health/mining-electronic-records-for-revealing-health-data.html.

Longman, P. 2010. Best Care Anywhere: Why VA Health Care Is Better Than Yours, 2nd ed. San Francisco: Berrett-Koehler Publishers.

McAfee, A., and E. Brynjolfsson. 2012. “Big Data: The Management Revolution.” Harvard Business Review 90 (10): 60–66.

Nelson, E. C., M. E. Splaine, M. M. Godfrey, V. Kahn, A. Hess, P. Batalden, and S. K. Plume. 2000. “Using Data to Improve Medical Practice by Measuring Processes and Outcomes of Care.” Joint Commission Journal on Quality Improvement 26 (12): 667–85.

Shortell, S. M., C. L. Bennett, and G. R. Byck. 1998. “Assessing the Impact of Continuous Quality Improvement on Clinical Practice: What It Will Take to Accelerate Progress.” Milbank Quarterly 76 (4): 593–624.

Stürzlinger, H., C. Hiebinger, D. Pertl, and P. Traurig. 2009. “Computerized Physician Order Entry: Effectiveness and Efficiency of Electronic Medication Ordering with Decision Support Systems.” GMS Health Technology Assessment 19 (5): Doc07.

Tatonetti, N. P., P. P. Ye, R. Daneshjou, and R. B. Altman. 2012. “Data-Driven Prediction of Drug Effects and Interactions.” Science Translational Medicine 4 (125): 125ra31.

Vest, J. R., and L. D. Gamm. 2009. “A Critical Review of the Research Literature on Six Sigma, Lean and StuderGroup’s Hardwiring Excellence in the United States: The Need to Demonstrate and Communicate the Effectiveness of Transformation Strategies in Healthcare.” Implementation Science 4: 35.

PREPARING DATA USING STRUCTURED QUERY LANGUAGE (SQL)

Learning Objectives

1. Use basic standard query language (SQL) commands to manipulate data.
2. Select an appropriate set of predictors, including predictors that are rare, obvious, and not in the causal path from treatment to outcome.
3. Identify and clean typical contradictory data in electronic health records.

Key Concepts

• Structured query language (SQL)
• Primary and foreign keys
• SELECT, FROM, CREATE, WHERE, HAVING, GROUP BY, ORDER BY, and other commands
• Inner, outer, left, right, full, and cross joins
• GETDATE, CONCAT, STUFF functions
• RANK, RAND functions
• Rare, obvious, causal pathways
• Comorbidity versus complications
• Landmark, forward, and backward looks

Chapter at a Glance

This chapter introduces standard query language (SQL) and how data can be prepared for analysis. Data preparation is fundamental to analysis. Without proper preparation of the data, the analysis can be misleading and erroneous. Details matter—the way each variable in the analysis is defined affects how predictive it will be. Nothing works better for data preparation than SQL. Therefore, this chapter spends a great deal of time on the use of SQL. It then shows how SQL can be used to avoid some common data errors (e.g., dead or unborn patients visiting the clinic).

SQL Is a Necessary Skill

Data in electronic health records (EHRs) are in multiple tables. Patient information is in one table. Prescription data are in another. Data on diagnoses are often in an outpatient encounter table. Hospital data are in still another table. An important first step in any data analysis is to pull various variables of interest into the same table. Combining data from multiple tables leads to a large—often sparse—new table, where all the variables are present but many have missing values. For example, patient X could have diagnosis and prescription data but no hospital data if she was never hospitalized. Patient Y could have diagnosis, prescription, and hospital data but be missing some other data (e.g., surgical procedure) if he did not have any surgery. The procedure to pull the data together requires the use of standard query language (SQL).

Before any analysis can be done, data must be merged into a single table, often called the matrix format, so that all relevant variables are present in the same place. Many statistical books do not show how this can be done and thus leave the analyst at a disadvantage in handling data from EHRs. These books do not teach the use of SQL. In contrast, I do. I take a different approach from most statistical books and believe that SQL and data preparation are essential components of data analysis. An analyst who wants to handle data in EHRs needs to know SQL; there are no ifs, ands, or buts about this. Accurate statistical analysis requires careful data preparation, and data preparation requires SQL. Statisticians who learn statistics without a deep understanding of data preparation may remain confused about their data, a situation akin to living your life not knowing your parents, where you came from, or, for that matter, who you are. You can live your life in a fog, but why do so? Knowing the source of the data and its unique features can give the analyst insight into anomalies in the data.

Statisticians spend most of their time preparing data—perhaps 80 percent, which is more than is spent actually conducting the analysis. Ignoring tools for better preparation of data would significantly handicap the statistician. Knowing SQL helps with the bulk of what statistical analysts do, which is why training in it is essential and fundamental.

Decisions made in preparing the data could radically change statistical findings. These decisions need to be made carefully and transparently; the analyst must make every attempt to communicate the details of these preparations to the manager. Decisions made in preparing the data should be well thought out—otherwise good data may be ruined with poor preprocessing. Some common errors in preparing data include the following:

• Visits and encounters reported for deceased patients. For example, when a patient’s date of visit or date of death is entered incorrectly, it may look like dead patients (zombies) are visiting the provider. Errors in entry of dates of events would skew results; thus, cleaning up these errors is crucial (a query for flagging such records is sketched after this list).
• Inconsistent data. Examples might be a pregnant male or negative cost values. Inconsistent data must be identified, and steps must be taken to resolve these inconsistencies.
• Incongruous data. After a medication error, one would expect to see long hospital stays rather than a short visit. If that is not the case, the statistician should review the details to see why not.
• Missing information. Sometimes, missing information could be replaced with the most likely response; other times, missing information could be used as a predictor. For example, if a diagnosis is not reported in the medical record, the most common explanation is that the patient did not suffer from the condition. Sometimes the reverse could be true. If a dead emergency room patient is missing a diagnosis of cardiac arrest, it is possible that there was no time to diagnose the patient but the patient had the diagnosis. For example, Alemi, Rice, and Hankins (1990) found that missing diagnoses in emergency department patients increase the risk of subsequent mortality. Before proceeding with the analysis, missing values must be imputed. One must check to see whether data are missing at random or associated with outcomes. There are many different strategies for dealing with missing values, and the rationale for each imputation should be examined.
• Double-counted information. When data are duplicated because analysts joined two tables using variables that have duplicate values, errors commonly occur.
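As a sketch of how the first check might be written in SQL Server, the following query lists encounters dated after a patient’s recorded death. The table and field names (#Patient, #Encounter, [Death Date], [Encounter Date]) are illustrative assumptions and would need to be adapted to the local schema:

-- Flag "zombie" records: encounters that occur after the recorded death date
-- (table and field names are hypothetical)
SELECT p.[ID], p.[Death Date], e.[Encounter Date]
FROM #Patient AS p
    INNER JOIN #Encounter AS e ON e.[Patient ID] = p.[ID]
WHERE p.[Death Date] IS NOT NULL
    AND e.[Encounter Date] > p.[Death Date];

Records returned by such a query should be reviewed and the offending dates corrected, not silently deleted.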

In short, a great deal must be done before any data analysis commences. The analyst needs a language and software that can assist in preparation of data. Of course, we do not need statisticians to become computer programmers. Thankfully, SQL programming is relatively easy (there are few commands) and can be picked up quickly. This chapter exposes the reader to the most important SQL commands. These include SELECT, GROUP BY, WHERE, JOIN, and some key text manipulation functions. These commands are for the most part sufficient for most data preparation tasks.

What Is SQL?

SQL is a language for accessing and manipulating relational databases. SQL was standardized by the American National Standards Institute, meaning that its core commands are the same across vendors. The current standard is from 1999, which is a long time for a standard to remain stable. This longevity is in part a result of the fact that SQL is well suited to the task of data manipulation. The data manipulation portion of SQL is designed to add, change, and remove data from a database. In this chapter, we primarily focus on data manipulation commands, which include commands to retrieve data from a database, insert data into a database, update data already in the database, and delete data from a database.

SQL also includes a data definition language. These commands are used to create a database, modify its structure, and destroy it when you no longer need it. There are also different types of tables—for example, temporary tables of data that are deleted when you close your SQL data management software. We will also discuss data definition commands later in this chapter.

Finally, SQL also includes a data control language. These commands protect the database from unauthorized access, from harmful interaction among multiple database users, and from power failures and equipment malfunctions. We will not cover these commands in this chapter.

Learn by Searching

Users usually learn the format for an SQL command through searches on the web. I assume that you can do so on your own. In fact, whenever you run into an error, you should always search for the error on the web. On the web, you will see many instances of others posting solutions to your problem. Do this first, because it is the best way to get your problems solved. Most students of SQL admit that they learned more from web searches than from any instruction or instructor. The beauty of such learning is that you learn just enough to solve the problem at hand.

Common SQL Commands

Different implementations of SQL exist. In this chapter, we use the Microsoft SQL Server version. Other versions of SQL, such as dynamic SQL or Microsoft Access, are also available. If the reader is familiar with the concept of code laid out here, she can also find on the web the equivalent version of the code in a different language. Learn one and you have learned almost all SQL languages.

Primary and Foreign Keys

In EHRs, data reside in multiple tables. One of the fields in the table is a primary key, a unique value for each row of data in the table. All of the fields in the table provide information about this primary key. For example, we may have a table about the patient, which would include gender, race, birthday, and contact information, and a separate table about the encounter. The primary key in the patient table is a patient identifier, such as medical record number. The primary key for the encounter table is a visit identification number.

The fields in the patient table (e.g., address) are all about the patient; the fields in the encounter table (e.g., diagnoses) are all about the encounter. The relationships among the tables are indicated through repeating the primary key of one table in another table. In these situations, the key is referred to as a foreign key. For example, in the encounter table, we indicate the patient by providing the field “patient ID.” To have efficient databases with no duplication, database designers do not provide any other information about the patient (e.g., his address) in the encounter table. They provide the address in the patient table, and if the user needs the address of the patient, then she looks up the address using the ID in the patient table. In other words, databases use as little information as they can to preserve space and to improve data analysis time. Kent (1983) described this by saying that all the other data “must provide a fact about the key, the whole key, and nothing but the key.” The FROM command specifies which tables should be used.
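To make the lookup concrete, here is a minimal sketch, assuming the patient and encounter tables defined later in this chapter: the encounter table carries the patient ID as a foreign key, so the patient’s address is retrieved by joining the two tables on that key.

-- Join the encounter table to the patient table through the foreign key
SELECT e.[ID] AS [Encounter ID], p.[Street Number], p.[Street], p.[City], p.[State]
FROM #Encounter AS e
    INNER JOIN #Patient AS p ON e.[Patient ID] = p.[ID];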

SELECT and FROM Command

SQL reserves some words to be used as its commands. These words cannot be used as names of fields or as input in other commands. They are generally referred to as reserved words, meaning these words are reserved to describe commands in SQL. The SELECT command is the most common reserved word in SQL. It is almost always used. Its purpose is to filter data. It focuses the analysis on columns of data (i.e., fields) from a table. Here is the general form of the command:

SELECT column1, column2, ...
FROM table_name;

SELECT is usually followed by one or more field names separated by commas. The FROM portion of the command specifies the table it should be read from. Here is an example of the SELECT command:

SELECT TOP 20 * FROM #temp

The above command tells the server to return the top 20 rows of data from the temporary file titled “#temp.” The TOP 20 modification of the SELECT command is used to restrict the display of large data and enable faster debugging.

The prefix to a table must include the name of the database and whether it is a temporary or permanent table. To avoid repeatedly including the name of the database in the table names, the name of the database is defined at the start of the code with the USE command:

USE Database1

The code is instructing the computer to use tables in database 1. Once the USE command has been specified, then the table paths that specify the database can be dropped.

In addition, the query must identify the type of table that is used. The place where a table is written is dictated by its prefix. A prefix of “dbo” indicates that the table should be permanently written to the computer data storage unit, essentially written as a permanent table inside the database. These tables do not disappear until they are deleted.

FROM dbo.data

This command says that the query is referencing the permanent table named “data.” One can also reference temporary tables, such as

FROM #data

The hash tag preceding the table name says that the query is referencing a temporary table. These types of tables disappear when the query that has created them is closed. These data are not written to the computer’s storage unit.

A prefix of double hash tags, ##, indicates that the table is temporary but should be available to all open windows of SQL code, not just the window for the session that created it. This is particularly helpful in transferring temporary data to procedures, which are parts of code that are in a different location. Thus, a single hash tag prefix indicates a temporary local file, a double hash tag prefix indicates a global temporary file, and the prefix dbo marks a permanent file.
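A minimal sketch of the three prefixes, copying a hypothetical dbo.data table with SELECT ... INTO, would be:

-- Local temporary table: visible only to the session that created it
SELECT * INTO #LocalCopy FROM dbo.data;

-- Global temporary table: visible to all open SQL windows
SELECT * INTO ##GlobalCopy FROM dbo.data;

-- Permanent table: written to storage and kept until explicitly dropped
SELECT * INTO dbo.PermanentCopy FROM dbo.data;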

Creating Tables and Inserting Values

In this section, we review how CREATE TABLE and INSERT VALUES can be used to create three tables and link them together using SQL. Assume that we need to prepare a database that contains three entities: patients, providers, and encounters. For each of these three entities, we need to create separate tables. Each table will describe the attributes of one of the three entities. Each attribute will be a separate field. Most of the time, there is no need to create a table or insert its values, as the data needed are imported. Imports often include the table definition and field names. Sometimes the tables are not imported and must be created using SQL. To create a table, we need to specify its name and its fields. The command syntax is the following:
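CREATE TABLE table_name (
    column1 datatype,
    column2 datatype,
    column3 datatype,
    ...
);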

The column parameters specify the names of the fields of the table. The “datatype” parameter specifies the type of data the column can hold. Data types are discussed on various online sites, but the most common are variable character, integer, float, date, and text. Always consult the web for the exact data types allowed in your implementation of SQL code, as there are variations in different implementations.

The patient attributes include first name, last name, date of birth, address (street name, street number, city, state, zip code), and e-mail. First name is a string of maximum size 20. Last name is a string of maximum size 50. These are not reasonable maximum lengths; many names and last names will exceed these sizes, but we are trying a simple example. Zip code is a string of five characters, all of which are digits. Date of birth is a date. The state field contains the state the patient lives in. The patient’s telephone number should be text. A patient ID (autonumber) should be used as the primary key for the table. When the ID is set to autonumber, the software assigns each record the last number plus one—each record has a unique ID, and the numbers are sequential and with no gap.

[Exhibit 2.1: example rows of the patient table, with columns for name, address, zip code, date of birth, email, and telephone]

Note that, in exhibit 2.1, two patients are shown to live in the same household and have the same last names. States are entered in different ways, sometimes referring to Virginia by its abbreviation and other times spelling it out. Note how the letter L in McLean is sometimes capitalized and other times not. Note that for some phone numbers, the area code is in parentheses and for others not. All of this variability in data entry can create errors in data processing, and these variations must be corrected before proceeding.

Here is code that can create the patient table. Field names are put in brackets because they contain spaces. As mentioned earlier, the # before the table name indicates that the table is a temporary table that will disappear once the SQL window is closed. The patient ID is generated automatically as an integer that is increased by 1 for each row of data:

CREATE TABLE #Patient (
    [ID] INT IDENTITY(1,1) PRIMARY KEY,
    [First Name] CHAR(20),
    [Last Name] CHAR(50),
    [Street Number] INT,
    [Street] TEXT,
    [City] TEXT,
    [State] TEXT,
    [Zip Code] CHAR(5),
    [Birth Date] DATE,
    -- The e-mail and phone fields below are completed from the attribute
    -- list above; their exact types in the original are an assumption
    [Email] CHAR(75),
    [Phone] TEXT
);

The provider attributes are assumed to be first name (size 20), last name (size 50), whether they are board certified (a yes/no value), date of hire, telephone entered as text, and e-mail entered as no longer than 75 characters. The employee’s ID number should be the primary key for the table. Exhibit 2.2 shows the first three rows of data for providers; note that one of the providers, Jill Smith, was previously described in exhibit 2.1 as a patient.

[Exhibit 2.2: three rows of data for an example providers table, with columns ID, First Name, Last Name, Board Certified?, Date of Hire, Email, Telephone]

In SQL Server, there is no “Yes/No” field. The closest data type is a bit type, which assigns it a value of 1, 0, or NULL. Also, note again that the provider ID is generated automatically. Here is the code that will create this table:

CREATE TABLE #Provider (
    [ID] INT IDENTITY(1,1) PRIMARY KEY,
    [First Name] CHAR(20),
    [Last Name] CHAR(50),
    [Board Certified] BIT,
    [Date of Hire] DATE,
    [Email] CHAR(75),
    [Phone] TEXT
);

The encounter entity is assumed to have the following attributes: patient ID, provider ID, diagnosis (size 50), treatment (size 50), and date of encounter, with encounter ID as a primary key. Each encounter should have its own ID number, which is generated automatically. Patient and provider IDs are also in the table, although now they are foreign keys and not primary keys. Exhibit 2.3 shows the first five rows of the encounter table. Following the pattern of the patient and provider tables, code such as the following creates it:
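CREATE TABLE #Encounter (
    [ID] INT IDENTITY(1,1) PRIMARY KEY,
    -- Patient and provider IDs are foreign keys into the two tables above
    [Patient ID] INT,
    [Provider ID] INT,
    [Diagnosis] CHAR(50),
    [Treatment] CHAR(50),
    [Date of Encounter] DATE
);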

