Stephen J. O'Connor, PhD, FACHE, Chairman
University of Alabama at Birmingham
West Virginia University
Lynn T. Downs, PhD, FACHE
University of the Incarnate Word
Association of University Programs in Health Administration, Washington, DC
This publication is intended to provide accurate and authoritative information in regard to the subject matter covered. It is sold, or otherwise provided, with the understanding that the publisher is not engaged in rendering professional services. If professional advice or other expert assistance is required, the services of a competent professional should be sought.
The statements and opinions contained in this book are strictly those of the authors and do not represent the official positions of the American College of Healthcare Executives, the Foundation
of the American College of Healthcare Executives, or the Association of University Programs in Health Administration.
Copyright © 2020 by the Foundation of the American College of Healthcare Executives. Printed in the United States of America. All rights reserved. This book or parts thereof may not be reproduced in any form without written permission of the publisher.
Library of Congress Cataloging-in-Publication Data
Names: Alemi, Farrokh, author.
Title: Big data in healthcare : statistical analysis of the electronic
health record / by Farrokh Alemi
Description: Chicago, IL : Health Administration Press, [2019] | Includes
bibliographical references and index. | Summary: "This book introduces
health administrators, nurses, physician assistants, medical students,
and data scientists to statistical analysis of electronic health records
(EHRs). The future of medicine depends on understanding patterns in
EHRs. This book shows how to use EHRs for precision and predictive
medicine." Provided by publisher.
Identifiers: LCCN 2019026815 (print) | LCCN 2019026816 (ebook) | ISBN
9781640550636 (hardcover) | ISBN 9781640550643 (ebook) | ISBN
9781640550650 | ISBN 9781640550667 (epub) | ISBN 9781640550674 (mobi)
Subjects: LCSH: Medical statistics. | Data mining.
Classification: LCC RA409 .A44 2019 (print) | LCC RA409 (ebook) | DDC
610.2/1 dc23
LC record available at https://lccn.loc.gov/2019026815
LC ebook record available at https://lccn.loc.gov/2019026816
The paper used in this publication meets the minimum requirements of American National Standard for Information Sciences—Permanence of Paper for Printed Library Materials, ANSI Z39.48-1984 ∞ ™
Acquisitions editor: Jennette McClain; Project manager: Theresa L. Rothschadl; Cover designer: James Slate; Layout: PerfecType
Found an error or a typo? We want to know! Please e-mail it to hapbooks@ache.org, mentioning the book's title and putting "Book Error" in the subject line.
For photocopying and copyright information, please contact Copyright Clearance Center at www.copyright.com or at (978) 750-8400.
Health Administration Press
A division of the Foundation of the American College of Healthcare Executives
300 S. Riverside Plaza, Suite 1900

Association of University Programs in Health Administration
1730 M Street, NW, Suite 407
Acknowledgments xvii
Chapter 1 Introduction 1
Chapter 2 Preparing Data Using Structured Query Language (SQL) 11
Chapter 3 Introduction to Probability and Relationships 55
Chapter 4 Distributions and Univariate Analysis 77
Chapter 5 Risk Assessment: Prognosis of Patients with Multiple Morbidities 101
Chapter 6 Comparison of Means 135
Chapter 7 Comparison of Rates 173
Chapter 8 Time to Adverse Events 203
Chapter 9 Analysis of One Observation per Time Period: Tukey’s Chart 223
Chapter 10 Causal Control Charts 239
Chapter 11 Regression 255
Chapter 12 Logistic Regression 309
Chapter 13 Propensity Scoring 327
Chapter 14 Multilevel Modeling: Intercept Regression 345
Chapter 15 Matched Case Control Studies 361
Chapter 16 Stratified Covariate Balancing 383
Chapter 17 Application to Benchmarking Clinicians: Switching Distributions 409
Chapter 18 Stratified Regression: Rethinking Regression Coefficients 427
Chapter 19 Association Network 459
Chapter 20 Causal Networks 487
Index 527
About the Author 551
About the Contributors 553
Acknowledgments xvii
Chapter 1 Introduction 1
Why Management by Numbers? 1
Why a New Book on Statistics? 5
Digital Aids and Multimedia 8
Relationship to Existing Courses 8
Audience 8
Five Courses in One Book 8
Supplemental Resources 9
References 9
Chapter 2 Preparing Data Using Structured Query Language (SQL) 11
SQL Is a Necessary Skill 12
What Is SQL? 14
Learn by Searching 14
Common SQL Commands 14
Cleaning Data 38
Should Data Be Ignored? 45
Time Confusion: Landmark, Forward, and Backward Looks 47
Confusion in Unit of Analysis and Timing of Covariates 52
Summary 53
Supplemental Resources 53
References 53
Chapter 3 Introduction to Probability and Relationships 55
Probability 56
Probability Calculus 58
Conditional Probability 61
Odds 62
Bayes's Formula 62
Independence Simplifies Bayes’s Formula 64
Contingency Tables in Excel 68
The Chi-Square Test 71
Relationship Among Continuous Variables 73
Correlation 74
Summary 76
Supplemental Resources 76
References 76
Chapter 4 Distributions and Univariate Analysis 77
Introduction 78
Variables 78
Sample 80
Average 82
Expected Values 83
Standard Deviation 85
Transformation of Variables 87
Transformation of Variables Using Excel 89
Distribution 90
Histogram 92
How to Make a Histogram in Excel 94
Transformation of Distributions 97
Observing Variables over Time 98
Minimum Observations for Control Charts 99
Summary 100
Supplemental Resources 100
Reference 100
Chapter 5 Risk Assessment: Prognosis of Patients with Multiple Morbidities 101
Introduction 102
Alternatives to the Multimorbidity Index 102
The Theory Behind Multimorbidity Index 105
Estimating Parameters of the MM Index 106
Calculation of Likelihood Ratios 106
Adjustment for Rare Diseases 108
Adjustment for Revision 10 108
Sample Size Needed to Construct the MM Index 109
Cross-Validation 109
Checking the Accuracy of Predictions 112
MM Index Compared to Physiological Markers 117
MM Indexes Compared to Other Diagnoses-Based
Example of the Use of the MM Index 119
Summary 119
Supplemental Resources 121
References 121
Note 124
Appendix 5.1 125
Appendix 5.2 133
Chapter 6 Comparison of Means 135
Normal Distribution 136
Hypothesis Testing 144
Comparison of Two-Sample Means 151
Control Chart with Normal Distribution 152
Summary 171
Supplemental Resources 172
References 172
Chapter 7 Comparison of Rates 173
Summarizing Discrete Variables 174
The Bernoulli Process and the Binomial Distribution 175
Normal Approximation 179
Inference for a Single Rate 181
Comparison of Two Rates 183
Confidence Interval for Odds Ratio 186
Probability Control Chart 189
Risk-Adjusted P-chart 194
Summary 199
Supplemental Resources 201
Reference 201
Chapter 8 Time to Adverse Events 203
Distribution of Sentinel Events 203
Days to Event 208
Time-Between Charts 209
Example: Analysis of Online Reviews 212
Example: Sticking to Exercise Resolutions 219
Are Insights into Data Worth the Effort? 220
Summary 221
Supplemental Resources 221
References 221
Chapter 9 Analysis of One Observation per Time Period: Tukey's Chart 223
Tukey’s Chart 223
Example 1: Time to Pain Medication 224
Example 2: Exercise Time and Weight Loss 228
Example 3: Keeping Exercise Patterns 230
Example 4: Medication Errors 232
Example 5: Budget Variation 233
Comparison of Tukey’s and Other Charts 236
Summary 237
Supplemental Resources 237
References 237
Chapter 10 Causal Control Charts 239
Assumptions of Causal Claims 240
Attributable Risk 241
Example: Fires in the Operating Room 242
Causal Analysis in the Context of Control Charts 244
Methods 245
A Simulated Example of Emergency Department Delays 247
Application to Stock Market Prices 250
Summary 252
Supplemental Resources 252
References 253
Chapter 11 Regression 255
Regression Is Everywhere 256
Types of Regression 258
Introduction to Equations 259
Fitting Data to an Equation: Residuals 262
Example: Analysis of Costs 264
Example with a Single Predictor of Cost 266
Independent Variables 271
Main Effect and Interactions 274
Coefficient of Determination 277
Model Building 280
Regression Assumptions: Correctness of Model Form 282
Regression Assumptions: Independence of Error Terms 285
Regression Assumptions: Homoscedasticity 286
Regression Assumptions: Normally Distributed Errors 288
Transformation of Data to Remedy Model Violations 290
Effects of Collinearity 291
Importance of Cross-Validation 292
Weighted Regression 294
Shrinkage Methods and Ridge or LASSO Regression 294
Context-Specific Hypothesis Testing 295
Changing Units of Measurement 296
Interpretation of Regression as Cause and Effect 296
Summary 297
Supplemental Resources 297
References 298
Appendix 11.1 300
Chapter 12 Logistic Regression 309
Widespread Use 310
Case Study 312
Logistic Regression Model 313
Example of Ordinary Regression with Logit Transformation 315
Predictors of the Use of an MFH Using R 318
Interpretation of Coefficients 320
Context Dependent Hypothesis Test 321
Measures of Goodness of Fit 322
Summary 324
Supplemental Resources 324
References 324
Chapter 13 Propensity Scoring 327
Widespread Use 328
Propensity Scoring Is a Simulation 329
Three Steps in Propensity Scoring 330
Balancing Through Propensity Scores 331
Propensity Score Quintile Matching 332
Propensity Score Weighting 337
Double Regression 338
Example for Weighted Propensity Scoring 338
Verification of Propensity Scores 342
Overlap and Related Concepts 343
Summary 343
Supplemental Resources 343
References 344
Chapter 14 Multilevel Modeling: Intercept Regression 345
Increasing Use 345
Ideas Behind Multilevel Modeling 346
Multilevel Modeling Using Structured Query Language 354
Application of Multilevel Modeling to Other Data Types 358
Measurement Issues 358
Summary 359
Supplemental Resources 359
References 359
Chapter 15 Matched Case Control Studies 361
Widespread Application 362
Representative Data Are Needed 364
Definition of Cases and Controls 364
Measurement of Exposure to Treatment 365
Enrollment and Observation Period 366
Matching Criteria 368
Measurement of Outcomes 371
Verification of Matching 373
Analysis of Outcomes 373
Analysis of Time to Event 377
Overlap 378
Summary 378
Supplemental Resources 379
References 379
Notes 381
Chapter 16 Stratified Covariate Balancing 383
Introduction 384
The History of Stratification 385
Combination of Covariates 385
Impact of Treatment on Binary Outcomes 386
Impact of Treatment on Continuous Outcomes: Difference Models 389
Impact of Treatment on Continuous Outcomes: Weighted Data 390
Comparison to Propensity Scoring 392
Overlap Problem and Solutions 398
Automated Removal of Confounding 403
R Package 406
Summary 406
Supplemental Resources 407
References 407
Chapter 17 Application to Benchmarking Clinicians: Switching Distributions 409
Introduction 410
Switching Probabilities 411
Example with Multiple Comorbidities 413
Overlap of Clinician’s and Peer Group’s Patients 416
Synthetic Controls 418
Limitations 420
Summary 421
Supplemental Resources 422
References 424
Chapter 18 Stratified Regression: Rethinking Regression Coefficients 427
Not in Widespread Use 428
Background 428
Multilinear Regression 429
Example: Predicting Cost of Insurance 430
Estimation of Impact of Independent Variables 433
Estimation of Correction Factors 434
Final Write-Up of the Equation 436
Replacing the Multilinear Model with a Multiplicative Model 436
Estimation of Parameters in a Multiplicative Model 437
Determination of Overall Constant k 439
Application to Prognosis of Lung Cancer 439
Structured Query Language Code for Stratified Regression 447
Summary 450
Supplemental Resources 451
References 451
Appendix 18.1 452
Chapter 19 Association Network 459
Not in Widespread Use 459
Shrinking Universe of Possibilities 461
Product of Marginal Probabilities 464
Chi-Square Test of Independence 467
Visual Display of Dependence 467
Independence for Three Variables 469
Chi-Square Testing for Three Variables 471
Spurious Correlation 475
Mutual Information 477
Poisson Regression and Tests of Dependence 478
Example Construction of Association Network 481
Summary 484
Supplemental Resources 485
References 485
Chapter 20 Causal Networks 487
Causal Thinking Is Fundamental 488
Use of Network Models 488
So What Is Causation? 489
Key Concepts in Causal Networks 491
Relationship Between Regression and Causal Networks 497
Predicting the Probability of an Event 501
A Numerical Example 503
Causal Impact 506
Back-Door Paths and Markov Blankets 507
Estimating Structure and Parameters of Causal Networks 510
Learning Associations Among Pairs of Variables 511
Directing the Arcs in the Network 514
Learning the Parameters of the Network 516
Verification of Blocked Back Doors 517
Calculating Causal Impact of Cancer on Survival 518
Calculating the Causal Impact of Eating Disability on Death 520
Summary 522
Supplemental Resources 524
References 524
Index 527
About the Author 551
About the Contributors 553
From conception through production, this book has been in preparation for more than a decade. During that time, numerous changes were made. The whole plan of the book changed. Entirely new chapters were introduced; in previously drafted chapters, the presentation changed radically and often. All of these changes followed feedback from students, colleagues, and editors, whom I now need to thank.
The book was first edited by Nancy Freeborne, PhD—mostly for grammar. She looked at the first ten chapters. Theresa Rothschadl brought consistency to the references and exhibits as well as the writing style for the entire book. She transformed the awkward language of an immigrant mathematician to normal English. Good editors make you put in the time to explain yourself more clearly, and I am grateful to both Theresa and Nancy for taking on this project. My colleague and friend, Harold Erdman, was kind enough to look through corrections of several chapters.
As you can see throughout this book, I am heavily influenced by Judea Pearl's ideas on causal analysis. I like how he connects his work to sociologists and economists who were also working on causal analysis. At times, however, his writings can be hard to understand. Fortunately, when I queried Dr. Pearl for this book, he was gracious in answering my emails. This book also shows how regression applies to causal networks, which was also influenced by the scholarship of Ali Shojaie. I wrote to him and benefited from his timely responses, as well. In addition, I appreciate the guidance of Kathryn Blackmond Laskey. I presented my half-baked ideas to her, and she gracefully corrected them. I was lucky to have these communications.
You cannot write a book without having time to do so. The chair of my department, Peggy Jo Maddox, was kind enough to provide me with sufficient time to do this. In academia, a good chair is rare. PJ was a godsend. Managing faculty like me is hard. A lot of ego is involved, and we don't take direction kindly. She was gracious and effective. Without her advice and direction, this book would not have been possible. I should also mention Tracy Shevlin and Regina Young, both of whom radically reduced my administrative burdens. Part of faculty–student advising and research budget supervision is paperwork—not a little, but a lot of paperwork. Tracy and Regina made my life easier, which in turn allowed me more time to write this book.
In the decade the book was under preparation, clinicians Raya Kheirbek, Mark Schwartz, Allison Williams, and Cari Levy heavily influenced my thinking. When they complained about models with thousands of variables, they forced me to explain myself. Many examples in the book came from interactions with these clinicians. The questions they asked mattered. I changed design methods to be more relevant to their needs.

Sanja Avramovic is one of the closest colleagues I have had in the last decade. I would go to her with my structured query language problems. For that, I am very much in her debt. Janusz Wojtusiak also was a great help, and many of our exchanges appear in this book. For example, the work on synthetic controls came from our conversation at a seminar that he made me attend, in spite of my reluctance. He was the first person to show me how propensity scoring works. I am grateful to him.
If you want to understand what enables a book, follow the money. During this period, I was supported by grants from the Centers for Disease Control and Prevention (to Lorna Thorpe at New York University) and from the Veterans Administration (to Schwartz, Kheirbek, and Levy). These principal investigators actively supported me. In fact, many of the research projects they paid for finished as examples in this book.
The body of work presented here has been used as required reading in three courses that I regularly teach. The students in these courses played a large role in improving this book. They pointed out parts that were not clear. They created "each one teach one" videos to help their peers in answering problem sets (you can see many of them in the supplementary materials). I am grateful to all of my many students, but would like to highlight the contributions of Steve Brown, Amr ElRafey, and Timothy P. Coffin. When I first started teaching a course on causal analysis, I would start by saying that I did not know the topic well and that it was still changing, which it was. Sometimes, when students asked very relevant questions, they would be surprised to hear my answer: "I don't know." They put up with me while I learned, and now I am not only grateful but must also apologize for the pain I caused when I could not give them answers. They truly paved the way for later, more successful classes.
I am grateful to my daughter Roshan Badii Alemi. When she was working for the Advisory Board, I would pump her for information about her work. When I needed to provide examples here of analyses that would be useful to hospital and clinic administrators, she knew, firsthand, what they wanted. I benefited from her insights. Her work on benchmarking was also eye opening. It forced me to rethink how synthetic controls should be organized. It helped me explain data balancing in ways hospital administrators and clinicians can understand.
I am also grateful to my daughter Yara Badii Alemi. She helped prepare a number of videos for the book's supplementary materials. As a theater student, she knew how to present complex issues. She forced me out of dull, monotone, repetitious, talking-head narrations. She showed me how to show my enthusiasm for the topic. She is also the person who took me to a remote island in Greece, where I thought through stratified covariate balancing while looking at the beautiful blue sea. That vacation proves that the best ideas come to you when you are having fun.
I am surprised at how much statistics has changed, even in a short decade. I once thought statistics was a stable science. I was wrong—it is in constant turmoil. I thought I knew how to do hypothesis testing. I was wrong. I thought I knew how to do statistical modeling. I was wrong. I am grateful that at the infancy of data science, when we had just begun to look at massive data sets, I had the opportunity to learn. I was there when the work of data scientists went from obscure, behind-the-scenes jobs done in basement offices to strategic, frontline positions of primary importance to their organizations. How cool is it to witness and chronicle radical change in statistics? When I was a student, there was no introductory book on statistics like this one. I am grateful for the opportunity to write it.
Farrokh Alemi
Chapter at a Glance
This book introduces health administrators, nurses, physician assistants, medical students, and data scientists to statistical analysis of electronic health records (EHRs). The future of medicine depends on understanding patterns in EHRs. This book shows how to use EHRs for precision and predictive medicine. This chapter introduces why a new book on statistical analysis is needed and how healthcare managers, analysts, and practitioners can benefit from fresh educational tools in this area.
Why Management by Numbers?
This textbook provides a radically different alternative to books on statistical analysis. It de-emphasizes hypothesis testing. It focuses primarily on removing confounding in EHRs. It emphasizes data obtained from EHRs and thus, by necessity, involves a great deal of structured query language (SQL).
The management and practice of healthcare are undergoing revolutionary changes (McAfee and Brynjolfsson 2012). More information is available than ever before, both inside and outside of organizations. Massive databases, often referred to as big data, are available and accessible. These data can inform management and practitioners' decisions. The growing use of EHRs has enabled healthcare organizations, especially hospitals and insurance companies, to access large data sets. Inside organizations, EHRs can measure countless operational and clinical metrics that enhance the organization's productivity.
All sorts of data points are available for scrutiny. Analysts can track who is doing what and who is achieving which outcomes. Providers can be benchmarked; front desk staff efficiency can be monitored. Data are available on the true cost of operations, as nearly every activity is tracked. Contracts with health maintenance organizations can be negotiated with real data on cost of services. Data are available on profitability of different operations, so unprofitable care can be discontinued. Managers can detect unusual patterns in the data. For example, they can see that hospital occupancy affects emergency department backup.
In the healthcare field, data are available on pharmaceutical costs and their relationship to various outcomes. Many organizations have lists of medications on their formulary, and now such lists can be based on both cost and outcome data. Medications can be prescribed with more precision and less waste. Data can be used to predict future illnesses; diseases can be prevented before they occur. The wide availability of massive amounts of data has made managing with numbers easier and more insightful. The following are some examples of how healthcare organizations are gathering massive databases to enable insights into best practices (Jaret 2013):
1. The Personalized Medicine Institute at Moffitt Cancer Center tracks more than 90,000 patients at 18 different sites around the country.
2. In any given year, the Veterans Affairs Informatics and Computing Infrastructure (VINCI) collects data on more than 6 million veterans across 153 medical centers.
3. Kaiser Permanente has a database of 9 million patients.
4. Aurora Health Care system has 1.2 million patients in its data systems.
5. The University of California's medical centers and hospitals have a database with more than 11 million patients.
6. The US Food and Drug Administration has the combined medical records of more than 100 million individuals to track the postlaunch effectiveness of medications.
7. The Agency for Healthcare Research and Quality has compiled claims data across 50 states.
8. The Centers for Medicare & Medicaid Services releases 5 percent samples of its massive data.
In addition to planned efforts to collect information, data gather on their own on the web. Patients' preferences, organization market share, and competitive advantages can all be determined from analysis of internet comments (Alemi et al. 2012). The internet of things collects massive data on consumers' behavior. Most web data are in text format. Analysis of these data requires text processing, a growing analytical field.
Big data is influencing which managers will succeed and which will not. "As the tools and philosophies of big data spread, they will change the long-lasting ideas about the practice of management" (Eshkenazi 2012). Companies that get insights through analysis of big data are expected to do better than those that do not, and therefore these managers will succeed more often. There are many examples of how data-driven companies succeed over counterparts that ignore data analysis. At Mercy Hospital in Iowa City, Iowa, managers who benchmark their clinicians and pay them for performance report 6.6 percent improvements in the quality of care (Izakovic 2007).
Many investigators point out that the Veterans Health Administration (VHA) was able to reinvent itself because it focused on measurement of performance (Longman 2010). The VHA healthcare system had poor quality of care—until the VHA became data driven. Then, over a short interval, VHA managers and clinicians were able to not only change the culture but also change patient outcomes. According to Longman (2010), the VHA system now reports some of the best outcomes for patients anywhere in the United States.
A recent study of 330 North American companies showed widespread positive attitudes toward data evaluation. The more companies characterized themselves as data driven, the more likely they were to outperform their competitors financially and operationally. Data-driven companies were 5 percent more productive and 6 percent more profitable than less data-driven companies (Brynjolfsson, Hitt, and Kim 2011).
In healthcare, companies that rely heavily on Lean (a process improvement tool) and other similar tools can be classified as data driven, even if they rely on small data sets. These companies use statistical process control to verify that changes have led to improvements. Many studies show that when organizations fully implement statistical process control tools, including an emphasis on measurement (Nelson et al. 2000), they deliver better care at lower cost (Shortell, Bennett, and Byck 1998). The use of these techniques is widespread, making it an essential capability of modern managers (Vest and Gamm 2009).
In healthcare, the use of EHRs has been associated with reductions in medication errors (Stürzlinger et al. 2009). Managers have used EHRs to maximize reimbursement in ways that have surprised insurers (Abelson, Creswell, and Palmer 2012). Other managers report analyzing data in EHRs to reduce "never events" (unreimbursable accidents) in their facilities and to measure quality of care (Glaser and Hess 2011). These efforts show that analysts are finding ways to use the data in EHRs to improve their organizations. Such efforts are expected to continue, creating an unprecedented shift toward the heavy use of data.
Big data has changed and continues to change health insurance. Insurance companies are trimming their networks using data on the performance of their doctors. New start-up insurance companies are competing more effectively with well-established insurance companies by situating their secondary providers near their target market. Insurance companies are deciding what to cover and what to discourage through data analysis. Risk assessment is changing, and more accurate models are reducing the risk of insurance. In risk rating, chronological age may not be as important as history of illness.
Value-based payment systems have transformed who assumes risk. Value-based reimbursement has changed how hospitals and clinics are paid. With this paradigm shift, insurers hold hospital managers accountable for quality of care inside and outside of hospitals. For example, a hospital that does a hip replacement is paid a fixed amount of money for expenses, including the cost of surgery and out-of-hospital costs 90 days after surgery. The hospital manager needs to make sure not only that the healthcare organization's surgeons are effective and that its operation does not lead to unnecessary long stays, but also that patients are discharged to nursing homes or other institutions that actively work on the patients' recovery. Affiliation with a home health care organization or nursing home could help decrease readmission and could easily reduce the hospital's payments. For 90 days, no matter where the patient is cared for, the hospital manager is at risk for cost overruns. Value-based reimbursements have increased the need to analyze data and affiliate with providers and institutions that are cost-effective.
Big data is changing clinical practice as well. The availability of data has enabled managers and insurers to go beyond traditional roles and address clinical questions. For the first time, analysts can measure the comparative effectiveness of different healthcare interventions. They can talk to physicians, nurse practitioners, and physician assistants about their clinical practices. They can discourage patients from undergoing unnecessary operations. For years, clinical decisions were made by clinicians, but the availability of data is beginning to change this. For example, the Centers for Disease Control and Prevention uses Data to Care (D2C) procedures to identify HIV patients who have stopped taking their medications. Careful communication with these patients can bring them back to care. In addition, payers such as Amazon are organizing population-level interventions to improve delivery of care. Analysts are alerting primary care providers about potential substance abuse and alerting patients about the need for flu shots. These efforts are giving extended clinical roles to data analysts.
Data are changing the healthcare equation. Today, managers have data on what is best for patients, and they can work with their clinicians to change practices. For example, analysts have been able to examine pairs of drugs that cause a side effect not associated with the use of either drug on its own. They found that Paxil, a widely used antidepressant, and Pravastatin, a cholesterol-lowering drug, raise patients' blood sugar level when used together (Tatonetti et al. 2012). In this example, and other comparative effectiveness studies, we see an emerging new role for data scientists.
Why a New Book on Statistics?
Big Data in Healthcare differs from existing introductory statistics books in many ways. Exhibit 1.1 lists how this textbook's emphasis differs from that of other managerial statistics books. First, it exclusively focuses on the application of statistics to EHRs. All examples in this book come from healthcare. They include use of statistics for healthcare marketing, cost accounting, strategic management, personnel selection, pay-for-performance, value-based payment systems, insurance contracting, and clinician benchmarking. These examples are given to illustrate the importance of quantitative analysis to the management of healthcare.
Second, the book de-emphasizes traditional hypothesis testing and emphasizes statistical process control. For healthcare managers, hypothesis testing is of little use; such testing requires the use of static populations and context-free tests that simply do not exist in the real world. In contrast, healthcare managers have to examine their hypotheses over time and thus need to rely on statistical process control. Alternately, they need to test a hypothesis while controlling for other conditions and must therefore rely on multivariate analysis as opposed to univariate hypothesis tests.
Most existing books focus on hypothesis testing through confidence intervals and standardized normal distributions. Big Data in Healthcare introduces these concepts through statistical process control. Confidence intervals are discussed in terms of 95 percent upper and lower control limits in control charts. The use of geometric distributions in time-between control charts is discussed. This book covers the use of Bernoulli and binomial distributions in creating probability control charts. It discusses the use of normal distributions in creating X-bar control charts and provides students with knowledge of hypothesis testing in the context of observational data collected over time.
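To make the idea of control limits concrete, the sketch below computes 95 percent upper and lower limits for a p-chart from an average event proportion and a per-period sample size. This is an illustration, not code from the book; the numbers (a 10 percent adverse-event rate across periods of 100 cases) are invented.

```python
# Illustrative 95% control limits for a p-chart, where p_bar is the
# average observed proportion and n is the number of cases per period.
import math

def p_chart_limits(p_bar, n, z=1.96):
    """Return (lower, upper) 95% control limits for a p-chart."""
    half_width = z * math.sqrt(p_bar * (1 - p_bar) / n)
    # Proportions cannot fall outside [0, 1], so clip the limits.
    return max(0.0, p_bar - half_width), min(1.0, p_bar + half_width)

lower, upper = p_chart_limits(p_bar=0.10, n=100)
print(round(lower, 3), round(upper, 3))  # 0.041 0.159
```

A period whose observed proportion falls outside these limits signals a change in the process rather than ordinary variation.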
Third, this book differs from most other introductory statistics textbooks in that it mostly relies on EHR-based data. Healthcare is swimming in data. Data analysts need to structure and delete large amounts of data before they can address a specific problem. EHR data are observational, not experimental. Managers rarely have the option to run randomized experiments. Because data come from operational EHRs, where data are collected from patients who voluntarily participate in various treatments, a number of steps must be taken to remove confounding in the data. In jest, analysts call these steps "torturing data until they confess."
In EHRs, data are available in numerous small tables, and not in one large matrix, as most statistical books require. This book gives considerable attention to how data from different tables should be merged. Throughout the book, I have relied on SQL to make the manipulation of data easier. Because the data are inside EHRs, SQL is required to manage the data—other statistical packages are just not available for EHRs. Statistical analysis is really just the tip of the iceberg; much more work and time go into preparing the data than into analyzing them. Big Data in Healthcare: Statistical Analysis of the Electronic Health Record pays special attention to preparation.
EXHIBIT 1.1
How This Textbook's Emphasis Differs from That of Other Books

Topic: Distributions
• Emphasis of other books: Normal, uniform, and other continuous distributions, with little coverage of discrete probability theory
• Emphasis of this book: Probability distributions for discrete events, including Bernoulli, binomial, geometric, and Poisson distributions; the normal distribution as an approximation

Topic: Data
• Emphasis of other books: Measures collected from independent samples; prospective data collection
• Emphasis of this book: Longitudinal, time-based, repeated measures; estimation of upper and lower control limits in process control charts; bootstrapped estimates of variability

Topic: Univariate methods of inference
• Emphasis of other books: Comparison of a mean to a population; comparison of two means; paired t-test and comparison of dependent means; analysis of variance
• Emphasis of this book: Statistical process control tools such as XmR charts, p-charts, time-between charts, and Tukey's charts; risk-adjusted process control tools

Topic: Multivariate analysis
In comprehensive EHRs, data are available on patients from birth until death. To use these data, we need to understand their time frame. Several statistical methods have been designed based on the sequenced order of events. EHR data enable new methods of analysis not otherwise available.

Data are collected passively as events occur. Over time, more data are available, and one major task of the manager is to decide which data are relevant. The data themselves never stop flowing, and the manager must decide which period he would like to examine and why. EHRs are also full of surprises, and some data must be discarded because they are erroneous (e.g., male pregnancy, visits after death).
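Errors like visits after death can be flagged with a simple query. The sketch below is a hypothetical illustration (run through SQLite for convenience; the table and column names are invented, and real EHR schemas differ): it finds visit records dated after a patient's recorded date of death.

```python
# Hypothetical sketch: flag "zombie" records, i.e., visits dated after
# the patient's recorded date of death. Schema is invented.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE patients (id INTEGER PRIMARY KEY, death_date TEXT);
    CREATE TABLE visits (patient_id INTEGER, visit_date TEXT);
    INSERT INTO patients VALUES (1, NULL), (2, '2019-03-01');
    INSERT INTO visits VALUES (1, '2019-05-20'), (2, '2019-06-15');
""")
bad = con.execute("""
    SELECT v.patient_id, v.visit_date
    FROM visits v
    JOIN patients p ON p.id = v.patient_id
    WHERE p.death_date IS NOT NULL AND v.visit_date > p.death_date
""").fetchall()
print(bad)  # only patient 2's post-death visit is flagged
```

Whether such rows reflect a mistyped visit date or a mistyped death date must then be resolved before analysis.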
Perhaps most important, this book focuses on causal interpretation of statistics. In the past, statisticians have focused on association among variables. They have worked under the slogan that "correlation is not causation." While that statement is valid, policymakers, managers, and other decision makers act on statistical findings as if correlation were causal. Any action assumes that the statistical findings are causal—that is, that changing one variable will lead to the desired impact. Statisticians who insist on avoiding causal interpretation of their findings are naive and are ignoring the obvious: their findings might be used differently than their planned precautions might have indicated. At the same time, they are also right to assert that causes are more than correlations. To interpret a variable as causing a change in another variable, we need to establish four principles:
1. Association. Causes have a statistically significant impact on effects.
2. Sequence. Causes occur before effects.
3. Mechanism. A third variable mediates the impact of the cause on the effect.
4. Counterfactual. In the absence of causes, effects are less likely to occur.
These four criteria allow us to discuss and vet causes rather than simply evaluating associations. In recent decades, statisticians have revisited their approach of avoiding causal interpretation and have introduced new techniques and methods that allow for evaluation of causality. For example, causal network models are an alternative to regression analysis. Network models allow the verification of the four assumptions of causality; regression models do not. Another example, propensity scoring, allows statisticians to remove confounding in multivariate analysis and provides a causal estimate of the impact of a variable. This book starts with associations and conditional probabilities, but it uses these concepts to move on to propensity-matched regression analysis or causal networks. Even in early chapters, where we discuss stratification and distributions, we lay the foundation for causal interpretations. In openly discussing causality, this book differs from many other statistics books.
Digital Aids and Multimedia
The book is accompanied by (1) slides to teach the course content, (2) video lectures, (3) video examples to illustrate the points made in the lectures, (4) extensive end-of-chapter exercises, (5) solutions to odd-numbered examples, and (6) a sample test set for midterms and finals. Topics in these supplements may be broader than the book, so take a look at them.
Relationship to Existing Courses
Students often do not understand the relationship between an introductory statistics course and other material they cover in health administration. Big Data in Healthcare: Statistical Analysis of the Electronic Health Record makes these linkages explicit. At the end of each chapter, the book directs you to the course website for problems to solve. Each problem is tied to a specific health administration or health informatics course. For example, problems in statistical process control are linked to courses in quality improvement. A problem in fraud detection is tied to the course in accounting. For still another example, comparative effectiveness analysis is linked to courses in strategy, informatics, and program evaluation. The expectation is that students will not only learn statistical concepts but also understand the connections between this course and various other courses in health administration programs.
Audience
The primary audience of this book is health administration and informatics students. In addition, nursing, physician assistant, and medical students may benefit. This book is not intended for a nonhealthcare audience.
Five Courses in One Book
This book can be used to teach many different courses:
1. The chapter on data preparation (chapter 2) and the chapter on risk assessment (chapter 5) can be used to teach an introductory course about SQL. These chapters present basic SQL commands and their use in constructing predictive models. Throughout the book, numerous examples of SQL code are provided that can further help students learning database design and analysis. The supplemental material of this chapter provides a syllabus for how to use this book to teach a course on SQL.
2. Chapters 3 through 7 can be used to replace an introductory course in statistics that focuses on hypothesis testing. These chapters introduce the concept of hypothesis testing and distributions. A syllabus is provided for courses that are exclusively focused on traditional hypothesis testing. The syllabus lists specific chapters and parts of chapters that may be helpful.
3. Chapters that focus on process control (chapters 5 through 10) can be used in a course on quality improvement. Many quality improvement courses discuss the general concepts but not the statistical tools, which is unfortunate. This book can improve the content of courses on quality improvement. A syllabus is provided for this type of course.
4. Chapters 11 and 12 can be used to teach a course on multivariate regression analysis. Chapters 13 (on propensity scoring), 14 (on hierarchical modeling), and 18 (on stratified regression) further show the value of ordinary regression. Again, a syllabus is provided for how to use this book to teach regression.
5. Chapters 13 through 20 can also be used to teach a course on causal analysis, especially in the context of comparative effectiveness analysis. These chapters enable students to remove confounding in EHR data. The supplemental material includes a syllabus for how to use this book to teach causal and comparative effectiveness courses.
Supplemental Resources
See tools for course design and syllabuses for various types of courses on the web.
References
Abelson, R., J. Creswell, and G. Palmer. 2012. "Medicare Bills Rise as Records Turn Electronic." New York Times. Published September 21. www.nytimes.com/2012/09/22/business/medicare-billing-rises-at-hospitals-with-electronic-records.html.

Alemi, F., M. Torii, L. Clementz, and D. C. Aron. 2012. "Feasibility of Real-Time Satisfaction Surveys Through Automated Analysis of Patients' Unstructured Comments and Sentiments." Quality Management in Health Care 21 (1): 9–19.

Brynjolfsson, E., L. Hitt, and H. Kim. 2011. "Strength in Numbers: How Does Data-Driven Decisionmaking Affect Firm Performance?" Accessed October 15, 2018. www.a51.nl/storage/pdf/SSRN_id1819486.pdf.

Eshkenazi, A. 2012. "Joining the Big Data Revolution." SCM NOW Magazine. Accessed April 10, 2019. www.apics.org/apics-for-individuals/apics-magazine-home/magazine-detail-page/2012/10/26/joining-the-big-data-revolution.

Glaser, J., and R. Hess. 2011. "Leveraging Healthcare IT to Improve Operational Performance." Healthcare Financial Management 65 (2): 82–85.

Izakovic, M. 2007. "New Trends in the Management of Inpatients in U.S. Hospitals—Quality Measurements and Evidence-Based Medicine in Practice." Bratislavské Lekárske Listy 108 (3): 117–21.

Jaret, P. 2013. "Mining Electronic Records for Revealing Health Data." New York Times. Published January 14. www.nytimes.com/2013/01/15/health/mining-electronic-records-for-revealing-health-data.html.

Longman, P. 2010. Best Care Anywhere: Why VA Health Care Is Better Than Yours, 2nd ed. San Francisco: Berrett-Koehler Publishers.

McAfee, A., and E. Brynjolfsson. 2012. "Big Data: The Management Revolution." Harvard Business Review 90 (10): 60–66.

Nelson, E. C., M. E. Splaine, M. M. Godfrey, V. Kahn, A. Hess, P. Batalden, and S. K. Plume. 2000. "Using Data to Improve Medical Practice by Measuring Processes and Outcomes of Care." Joint Commission Journal on Quality Improvement 26 (12): 667–85.

Shortell, S. M., C. L. Bennett, and G. R. Byck. 1998. "Assessing the Impact of Continuous Quality Improvement on Clinical Practice: What It Will Take to Accelerate Progress." Milbank Quarterly 76 (4): 593–624.

Stürzlinger, H., C. Hiebinger, D. Pertl, and P. Traurig. 2009. "Computerized Physician Order Entry: Effectiveness and Efficiency of Electronic Medication Ordering with Decision Support Systems." GMS Health Technology Assessment 19 (5): Doc07.

Tatonetti, N. P., P. P. Ye, R. Daneshjou, and R. B. Altman. 2012. "Data-Driven Prediction of Drug Effects and Interactions." Science Translational Medicine 4 (125): 125ra31.

Vest, J. R., and L. D. Gamm. 2009. "A Critical Review of the Research Literature on Six Sigma, Lean and StuderGroup's Hardwiring Excellence in the United States: The Need to Demonstrate and Communicate the Effectiveness of Transformation Strategies in Healthcare." Implementation Science: 35.
PREPARING DATA USING STRUCTURED QUERY LANGUAGE (SQL)
Learning Objectives
1. Use basic structured query language (SQL) commands to manipulate data.
2. Select an appropriate set of predictors, including predictors that are rare, obvious, and not in the causal path from treatment to outcome.
3. Identify and clean typical contradictory data in electronic health records.
Key Concepts
• Structured query language (SQL)
• Primary and foreign keys
• SELECT, FROM, CREATE, WHERE, HAVING, GROUP BY, ORDER BY, and other commands
• Inner, outer, left, right, full, and cross joins
• GETDATE, CONCAT, STUFF functions
• RANK, RAND functions
• Rare, obvious, causal pathways
• Comorbidity versus complications
• Landmark, forward, and backward looks
Chapter at a Glance
This chapter introduces structured query language (SQL) and shows how data can be prepared for analysis. Data preparation is fundamental to analysis. Without proper preparation of the data, the analysis can be misleading and erroneous. Details matter—the way each variable in the analysis is defined affects how predictive it will be. Nothing works better for data preparation than SQL. Therefore, this chapter spends a great deal of time on the use of SQL. It then shows how SQL can be used to avoid some common data errors (e.g., dead or unborn patients visiting the clinic).
SQL Is a Necessary Skill
Data in electronic health records (EHRs) are in multiple tables. Patient information is in one table. Prescription data are in another. Data on diagnoses are often in an outpatient encounter table. Hospital data are in still another table. An important first step in any data analysis is to pull various variables of interest into the same table. Combining data from multiple tables leads to a large—often sparse—new table, where all the variables are present but many have missing values. For example, patient X could have diagnosis and prescription data but no hospital data if she was never hospitalized. Patient Y could have diagnosis, prescription, and hospital data but be missing some other data (e.g., surgical procedure) if he did not have any surgery. The procedure to pull the data together requires the use of structured query language (SQL).
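The sparseness described above can be sketched with a LEFT JOIN, which keeps every patient and leaves NULL where a table has no matching row. This is an invented two-table example run through SQLite for illustration, not the book's own data.

```python
# Illustrative sketch: merging EHR tables into one sparse table. A LEFT
# JOIN keeps every patient and fills NULL where hospital data do not
# exist, e.g., because the patient was never hospitalized.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE diagnosis (patient_id INTEGER, dx TEXT);
    CREATE TABLE hospital (patient_id INTEGER, admit_date TEXT);
    INSERT INTO diagnosis VALUES (1, 'asthma'), (2, 'diabetes');
    INSERT INTO hospital VALUES (2, '2019-02-07');  -- patient 1 never admitted
""")
merged = con.execute("""
    SELECT d.patient_id, d.dx, h.admit_date
    FROM diagnosis d LEFT JOIN hospital h ON h.patient_id = d.patient_id
    ORDER BY d.patient_id
""").fetchall()
print(merged)  # patient 1's admit_date comes back as NULL (None)
```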
Before any analysis can be done, data must be merged into a single table, often called the matrix format, so that all relevant variables are present in the same place. Many statistical books do not show how this can be done and thus leave the analyst at a disadvantage in handling data from EHRs. These books do not teach the use of SQL. In contrast, I do. I take a different approach from most statistical books and believe that SQL and data preparation are essential components of data analysis. An analyst who wants to handle data in EHRs needs to know SQL; there are no ifs, ands, or buts about this. Accurate statistical analysis requires careful data preparation, and data preparation requires SQL. Statisticians who learn statistics without a deep understanding of data preparation may remain confused about their data, a situation akin to living your life not knowing your parents, where you came from, or, for that matter, who you are. You can live your life in a fog, but why do so? Knowing the source of the data and its unique features can give the analyst insight into anomalies in the data.
Statisticians spend most of their time preparing data—perhaps 80 percent, which is more than is spent actually conducting the analysis. Ignoring tools for better preparation of data would significantly handicap the statistician. Knowing SQL helps with the bulk of what statistical analysts do, which is why training in it is essential and fundamental.
Decisions made in preparing the data could radically change statistical findings. These decisions need to be made carefully and transparently; the analyst must make every attempt to communicate the details of these preparations to the manager. Decisions made in preparing the data should be well thought out—otherwise good data may be ruined with poor preprocessing. Some common errors in preparing data include the following:
• Visits and encounters reported for deceased patients. For example, when a patient's date of visit or date of death is entered incorrectly, it may look like dead patients (zombies) are visiting the provider. Errors in the entry of dates of events would skew results; thus, cleaning up these errors is crucial.
• Inconsistent data. Examples might be a pregnant male or negative cost values. Inconsistent data must be identified, and steps must be taken to resolve these inconsistencies.
• Incongruous data. After a medication error, one would expect to see long hospital stays rather than a short visit. If that is not the case, the statistician should review the details to see why not.
• Missing information. Sometimes missing information can be replaced with the most likely response; other times, missing information can be used as a predictor. For example, if a diagnosis is not reported in the medical record, the most common explanation is that the patient did not suffer from the condition. Sometimes the reverse could be true. If a dead emergency room patient is missing a diagnosis of cardiac arrest, it is possible that there was no time to diagnose the patient but the patient had the diagnosis. For example, Alemi, Rice, and Hankins (1990) found that missing diagnoses in emergency department patients increase the risk of subsequent mortality. Before proceeding with the analysis, missing values must be imputed. One must check to see whether data are missing at random or are associated with outcomes. There are many different strategies for dealing with missing values, and the rationale for each imputation should be examined.
• Double-counted information. When analysts join two tables on variables that have duplicate values, rows are multiplied and errors commonly occur.
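The double-counting hazard in the last bullet can be demonstrated directly. In this invented SQLite example, one patient appears twice in each of two tables; joining on the patient ID yields every pairing, so a single patient contributes four rows.

```python
# Illustrative sketch of double counting: joining two tables on a
# column whose values repeat in BOTH tables multiplies matching rows.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE diagnoses (patient_id INTEGER, dx TEXT);
    CREATE TABLE meds (patient_id INTEGER, drug TEXT);
    -- patient 1 appears twice in each table
    INSERT INTO diagnoses VALUES (1, 'diabetes'), (1, 'hypertension');
    INSERT INTO meds VALUES (1, 'metformin'), (1, 'lisinopril');
""")
joined = con.execute("""
    SELECT d.dx, m.drug
    FROM diagnoses d JOIN meds m ON d.patient_id = m.patient_id
""").fetchall()
print(len(joined))  # 2 x 2 = 4 rows for a single patient
```

Any sum or count computed from such a join silently double counts the patient, which is why join keys should be unique in at least one of the tables.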
In short, a great deal must be done before any data analysis commences. The analyst needs a language and software that can assist in the preparation of data. Of course, we do not need statisticians to become computer programmers. Thankfully, SQL programming is relatively easy (there are few commands) and can be picked up quickly. This chapter exposes the reader to the most important SQL commands. These include SELECT, GROUP BY, WHERE, JOIN, and some key text manipulation functions. These commands are for the most part sufficient for most data preparation tasks.
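As a small preview of those commands working together, the invented example below runs SELECT, WHERE, and GROUP BY through SQLite to count encounters per patient; the table and column names are illustrative only.

```python
# Illustrative sketch of SELECT, WHERE, and GROUP BY combined:
# count encounters per patient in an invented encounters table.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE encounters (patient_id INTEGER, diagnosis TEXT);
    INSERT INTO encounters VALUES
        (1, 'diabetes'), (1, 'hypertension'), (2, 'diabetes');
""")
rows = con.execute("""
    SELECT patient_id, COUNT(*) AS n_encounters
    FROM encounters
    WHERE diagnosis IS NOT NULL
    GROUP BY patient_id
    ORDER BY patient_id
""").fetchall()
print(rows)  # one row per patient with an encounter count
```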
What Is SQL?
SQL is a language for accessing and manipulating relational databases. SQL was standardized by the American National Standards Institute, meaning that its core commands are the same across vendors. The current standard is from 1999, which is a long time for a standard to remain stable. This longevity is in part a result of the fact that SQL is well suited to the task of data manipulation. The data manipulation portion of SQL is designed to add, change, and remove data from a database. In this chapter, we primarily focus on data manipulation commands, which include commands to retrieve data from a database, insert data into a database, update data already in the database, and delete data from a database.

SQL also includes a data definition language. These commands are used to create a database, modify its structure, and destroy it when you no longer need it. There are also different types of tables—for example, temporary tables of data that are deleted when you close your SQL data management software. We will discuss data definition commands later in this chapter.

Finally, SQL also includes a data control language. These commands protect the database from unauthorized access, from harmful interaction among multiple database users, and from power failures and equipment malfunctions. We will not cover these commands in this chapter.
Learn by Searching
Users usually learn the format for an SQL command through searches on the web. I assume that you can do so on your own. In fact, whenever you run into an error, you should search for the error on the web. There you will see many instances of others posting solutions to your problem. Do this first, because it is the best way to get your problems solved. Most students of SQL admit that they learned more from web searches than from any instruction or instructor. The beauty of such learning is that you learn just enough to solve the problem at hand.
Common SQL Commands
Different implementations of SQL exist. In this chapter, we use Microsoft SQL Server's version. Other versions of SQL, such as dynamic SQL or Microsoft Access, are also available. If the reader is familiar with the concept of code laid out here, she can also find on the web the equivalent version of the code in a different language. Learn one and you have learned almost all SQL languages.
Primary and Foreign Keys
In EHRs, data reside in multiple tables. One of the fields in the table is a primary key, a unique value for each row of data in the table. All of the fields in the table provide information about this primary key. For example, we may have a table about the patient, which would include gender, race, birthday, and contact information, and a separate table about the encounter. The primary key in the patient table is a patient identifier, such as the medical record number. The primary key for the encounter table is a visit identification number.
The fields in the patient table (e.g., address) are all about the patient; the fields in the encounter table (e.g., diagnoses) are all about the encounter. The relationships among the tables are indicated by repeating the primary key of one table in another table. In these situations, the key is referred to as a foreign key. For example, in the encounter table, we indicate the patient by providing the field "patient ID." To have efficient databases with no duplication, database designers do not provide any other information about the patient (e.g., his address) in the encounter table. They provide the address in the patient table, and if the user needs the address of the patient, then she looks up the address using the ID in the patient table. In other words, databases use as little information as they can to preserve space and to improve data analysis time. Kent (1983) described this by saying that all the other data "must provide a fact about the key, the whole key, and nothing but the key." The "FROM" command specifies which tables should be used.
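This primary/foreign key arrangement can be sketched with two tiny invented tables (run through SQLite for illustration): the encounter table stores only the patient ID, and the patient's address is looked up through a join when needed.

```python
# Hypothetical sketch of a primary/foreign key relationship: the
# encounter table carries only the patient ID; the address lives in
# the patient table and is retrieved by joining on that key.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE patient (
        patient_id INTEGER PRIMARY KEY,   -- primary key
        address TEXT
    );
    CREATE TABLE encounter (
        visit_id INTEGER PRIMARY KEY,     -- primary key
        patient_id INTEGER REFERENCES patient(patient_id),  -- foreign key
        diagnosis TEXT
    );
    INSERT INTO patient VALUES (1, '10 Main St');
    INSERT INTO encounter VALUES (100, 1, 'influenza');
""")
row = con.execute("""
    SELECT e.visit_id, e.diagnosis, p.address
    FROM encounter e JOIN patient p ON p.patient_id = e.patient_id
""").fetchone()
print(row)  # the address is stored once but available per encounter
```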
SELECT and FROM Command
SQL reserves some words to be used as its commands. These words cannot be used as names of fields or as input to other commands. They are generally referred to as reserved words, meaning these words are reserved to describe commands in SQL. The SELECT command is the most common reserved word in SQL. It is almost always used. Its purpose is to filter data. It focuses the analysis on columns of data (i.e., fields) from a table. Here is the general form of the command:

SELECT column1, column2, ...
FROM table_name;

SELECT is usually followed by one or more field names separated by commas. The FROM portion of the command specifies the table the data should be read from. Here is an example of the SELECT command:
SELECT TOP 20 * FROM #temp
The above command tells the server to return the top 20 rows of data from the temporary table titled "#temp." The TOP 20 modification of the SELECT command is used to restrict the display of large data and enable faster debugging.

The prefix to a table must include the name of the database and whether it is a temporary or permanent table. To avoid repeatedly including the name of the database in the table names, the name of the database is defined at the start of the code with the USE command:

USE Database1

This code instructs the computer to use tables in Database1. Once the USE command has been specified, the table paths that specify the database can be dropped.
In addition, the query must identify the type of table that is used. The place where a table is written is dictated by its prefix. A prefix of "dbo" indicates that the table should be permanently written to the computer's data storage unit, essentially as a permanent table inside the database. These tables do not disappear until they are deleted.

FROM dbo.data

This command says that the query is referencing the permanent table named "data." One can also reference temporary tables:

FROM #data

The hash tag preceding the table name says that the query is referencing a temporary table. These types of tables disappear when the query that created them is closed. These data are not written to the computer's storage unit.
A prefix of double hash tags, ##, indicates that the table is temporary but should be available to all open windows of SQL code, not just the window for the session that created it. This is particularly helpful in transferring temporary data to procedures, which are parts of code that are in a different location. Thus, a single hash tag prefix indicates a local temporary file, a double hash tag prefix indicates a global temporary file, and the prefix dbo marks a permanent file.
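The # and ## prefixes are SQL Server conventions. Other engines express the same idea differently; as a hedged illustration, SQLite's closest analogue is CREATE TEMP TABLE, which likewise creates a table that disappears when the connection (session) closes.

```python
# Sketch of the temporary-table idea using SQLite's analogue: a table
# created with CREATE TEMP TABLE exists only for the current
# connection, much as a #-prefixed table in SQL Server exists only for
# the session that created it.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TEMP TABLE scratch (x INTEGER)")
con.execute("INSERT INTO scratch VALUES (1), (2)")
n = con.execute("SELECT COUNT(*) FROM scratch").fetchone()[0]
print(n)  # the scratch table is usable while the connection is open
```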
Creating Tables and Inserting Values
In this section, we review how CREATE TABLE and INSERT VALUES can be used to create three tables and link them together using SQL. Assume that we need to prepare a database that contains three entities: patients, providers, and encounters. For each of these three entities, we need to create separate tables. Each table will describe the attributes of one of the three entities. Each attribute will be a separate field. Most of the time, there is no need to create a table or insert its values, as the data needed are imported. Imports often include the table definition and field names. Sometimes the tables are not imported and must be created using SQL. To create a table, we need to specify its name and its fields. The command syntax is the following:

CREATE TABLE table_name (
    column1 datatype,
    column2 datatype,
    ...
);

The column parameters specify the names of the fields of the table. The "datatype" parameter specifies the type of data the column can hold. Data types are discussed on various online sites, but the most common are variable character, integer, float, date, and text. Always consult the web for the exact data types allowed in your implementation of SQL code, as there are variations in different implementations.
The patient attributes include first name, last name, date of birth, address (street name, street number, city, state, zip code), and e-mail. First name is a string of maximum size 20. Last name is a string of maximum size 50. These are not reasonable maximum lengths; many first and last names will exceed these sizes, but we are trying a simple example. Zip code is a string of five characters, all of which are digits. Date of birth is a date. The state field contains the state the patient lives in. The patient's telephone number should be text. A patient ID (autonumber) should be used as the primary key for the table. When the ID is set to autonumber, the software assigns each record the last number plus one—each record has a unique ID, and the numbers are sequential with no gaps.
Note that, in exhibit 2.1, two patients are shown to live in the same household and have the same last name. States are entered in different ways, sometimes referring to Virginia by its abbreviation and other times spelling it out. Note how the letter L in McLean is sometimes capitalized and other times not. For some phone numbers, the area code is in parentheses and for others not. All of this variability in data entry can create errors in data processing, and these variations must be corrected before proceeding. Here is code that can create the patient table. Field names are put in brackets because they contain spaces. As mentioned earlier, the # before the table name indicates that the table is temporary and will disappear once the SQL window is closed. The patient ID is generated automatically as an integer that is increased by 1 for each row of data:
CREATE TABLE #Patient (
[ID] INT IDENTITY(1,1) PRIMARY KEY,
[First Name] CHAR(20),
[Last Name] CHAR(50),
[Street Number] INT,
[Street] TEXT,
[City] TEXT,
[State] TEXT,
[Zip Code] CHAR(5),
[Birth Date] DATE,
[Email] TEXT,
[Phone] TEXT
);

[Exhibit 2.1 appears here: example rows of the patient table, with columns ID, First Name, Last Name, Street Number, Street, City, State, Zip Code, Date of Birth, Email, and Telephone.]
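Once a table exists, rows are added with INSERT VALUES. The hedged sketch below uses SQLite for illustration, where INTEGER PRIMARY KEY plays the role of SQL Server's IDENTITY(1,1) and assigns each row the next sequential ID; the rows themselves are invented, echoing the inconsistent "VA"/"Virginia" entries discussed above.

```python
# Sketch of INSERT VALUES: the database assigns sequential IDs
# automatically, so only the non-key fields are supplied.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE patient (
        id INTEGER PRIMARY KEY,   -- auto-assigned, like IDENTITY(1,1)
        first_name TEXT,
        last_name TEXT,
        state TEXT
    )
""")
con.executemany(
    "INSERT INTO patient (first_name, last_name, state) VALUES (?, ?, ?)",
    [("Jill", "Smith", "VA"), ("Jack", "Smith", "Virginia")],
)
ids = [r[0] for r in con.execute("SELECT id FROM patient ORDER BY id")]
print(ids)  # sequential IDs assigned automatically
```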
The provider attributes are assumed to be first name (size 20), last name (size 50), whether they are board certified (a yes/no value), date of hire, telephone entered as text, and e-mail entered as no longer than 75 characters. The employee's ID number should be the primary key for the table. Exhibit 2.2 shows the first three rows of data for providers; note that one of the providers, Jill Smith, was previously described in exhibit 2.1 as a patient.

In SQL Server, there is no "Yes/No" field. The closest data type is the bit type, which takes a value of 1, 0, or NULL. Also, note again that the provider ID is generated automatically. Here is the code that will create this table:
CREATE TABLE #Provider (
[ID] INT IDENTITY(1,1) PRIMARY KEY,
[First Name] CHAR(20),
[Last Name] CHAR(50),
[Board Certified] BIT,
[Date of Hire] DATE,
[Email] CHAR(75),
[Phone] TEXT
);
The encounter entity is assumed to have the following attributes: patient ID, provider ID, diagnosis (size 50), treatment (size 50), and date of encounter, with encounter ID as the primary key. Each encounter has its own ID number, which is generated automatically. Patient and provider IDs are also in the table, although here they are foreign keys, not primary keys. Exhibit 2.3 shows the first five rows of the encounter table. Here is the code that will create this table:
[Exhibit 2.2 appears here: Three Rows of Data for Example Providers Table, with columns ID, First Name, Last Name, Board Certified?, Date of Hire, Email, and Telephone.]