The amount of data in our world has been exploding, and analyzing large datasets (so-called big data) will become a key basis of competition in business. Statisticians and researchers will be updating their analytic approaches, methods, and research to meet the demands created by the availability of big data. The goal of this book is to show how advances in data science have the ability to fundamentally influence and improve organizational science and practice. This book is primarily designed for researchers and advanced undergraduate and graduate students in psychology, management, and statistics.
Scott Tonidandel is Associate Professor, Department of Psychology, Davidson College, NC. He received his PhD in Industrial Organizational Psychology from Rice University in 2001. He teaches courses in Psychological Research, Design and Analysis, and Research Methods and Issues in Psychology. His research includes issues related to computerized testing, and statistical and methodological issues.
Eden B. King is Associate Professor of Industrial Organizational Psychology at George Mason University. She earned her PhD from Rice University in 2006. Her research is mostly in the area of diversity, inclusion, and women in business. She is currently the Associate Editor of the Journal of Management and the Journal of Business and Psychology. She is also on the Editorial Board of the Academy of Management Journal.
Jose M. Cortina, Professor of Industrial Organizational Psychology at George Mason University, is President Elect of SIOP. He received his PhD in Psychology from Michigan State University. He serves as Editor of the I-O research methods journal Organizational Research Methods. He has an outstanding publication record and a tremendously high level of visibility in this field.
The Organizational Frontiers Series is sponsored by the Society for Industrial and Organizational Psychology (SIOP). Launched in 1983 to make scientific contributions accessible to the field, the series publishes books addressing emerging theoretical developments, fundamental and translational research, and theory-driven practice in the field of Industrial-Organizational Psychology and related organizational science disciplines including organizational behavior, human resource management, and labor and industrial relations.
Books in this series aim to inform readers of significant advances in research; challenge the research and practice community to develop and adapt new ideas; and promote the use of scientific knowledge in the solution of public policy issues and increased organizational effectiveness.
The Series originated in the hope that it would facilitate continuous learning and spur research curiosity about organizational phenomena on the part of both scientists and practitioners.
The Society for Industrial and Organizational Psychology (SIOP) is an international professional association with an annual membership of more than 8,000 industrial-organizational (I-O) psychologists who study and apply scientific principles to the workplace. I-O psychologists serve as trusted partners to business, offering strategically focused and scientifically rigorous solutions for a number of workplace issues. SIOP’s mission is to enhance human well-being and performance in organizational and work settings by promoting the science, practice, and teaching of I-O psychology. For more information about SIOP, please visit www.siop.org
Gilad Chen
University of Maryland
Age Workforce: A Use-Inspired Approach
Eby/Allen: (2012) Personal Relationships: The Effect on Employee Attitudes, Behavior, and Well-being
Goldman/Shapiro: (2012) The Psychology of Negotiations in the 21st Century Workplace: New Challenges and New Solutions
Ferris/Treadway: (2012) Politics in Organizations: Theory and Research Considerations.
Jones: (2011) Nepotism in Organizations
Hofmann/Frese: (2011) Error in Organizations
Outtz: (2009) Adverse Impact: Implications for Organizational Staffing and High Stakes Selection
Kozlowski/Salas: (2009) Learning, Training, and Development in Organizations
Klein/Becker/Meyer: (2009) Commitment in Organizations: Accumulated Wisdom and New Directions
Salas/Goodwin/Burke: (2009) Team Effectiveness in Complex Organizations
De Dreu/Gelfand: (2008) The Psychology of Conflict and Conflict Management in Organizations
Ostroff/Judge: (2007) Perspectives on Organizational Fit
Baum/Frese/Baron: (2007) The Psychology of Entrepreneurship
Weekley/Ployhart: (2006) Situational Judgment Tests: Theory, Measurement and Application
Dipboye/Colella: (2005) Discrimination at Work: The Psychological and Organizational Bases.
known or hereafter invented, including photocopying and recording, or in any information storage or retrieval system, without permission in writing from the publishers.
pages cm — (The organizational frontiers series)
Includes bibliographical references and index
1. Organizational behavior. 2. Big data. 3. Organizational sociology. 4. Psychology, Industrial. I. Tonidandel, Scott. II. King, Eden. III. Cortina, Jose M.
HD58.7.B534 2016
302.3'5—dc23
ISBN: 978-1-84872-581-2 (hbk)
ISBN: 978-1-84872-582-9 (pbk)
ISBN: 978-1-31578-050-4 (ebk)
Typeset in Minion by Apex CoVantage, LLC
For the questions, scientists, and loved ones who inspire us.
It is difficult to come up with a better example of a topic that addresses the needs of both scientists and practitioners who are interested in the human condition in work organizations than the one that is central to this volume. As pointed out by the editors, our field is already affected by the work of others claiming expertise in big data analytics. Policy and managerial decisions are being made based on the analysis of enormous pools of data recently available to us as a result of information technology that allows us to capture and store judgments and choices being made by millions of people on a daily basis. But there remains a major gap between what we know and don’t know about the best approaches to such things as human data acquisition, data storage and analysis, the most appropriate conceptual frameworks to be used, mitigating the weaknesses inherent in post hoc interpretations, or how best to scope out the ethical challenges (and limits) inherent to big data analytics. This gap is especially problematic when it comes to understanding the potential role of big data analytics in promoting the welfare of people working in organizations or the success of the company that employs them. The chapters of this volume have been written with these in mind. For those who have an immediate need to apply what we do know, there is still much to glean from a close reading of this volume. On the other hand, for those who seek better answers (or who may even be skeptical about the benefits of the big data analytic movement), the book’s contributors provide much to build on. Finally, for those who choose to interpret or intermediate the data analytics efforts of those in other, related disciplines (e.g., labor economics), there is much by way of excellent guidance provided. Indeed, even if only some of the goals for the volume as set out by the editors in their introductory chapter are achieved, our field in the future will be substantially better off. In this regard, I am reminded of the saying: “The best way to predict the future is to shape it.” Along these lines I hope that this volume stimulates the reader to accept such a premise and play a position when it comes to shaping the science and methods of big data people analytics.
Richard Klimoski
Series Editor
January 9, 2015
“Work-force science, in short, is what happens when big data meets H.R… In the past, studies of worker behavior were typically based on observing a few hundred people at most. Today, studies can include thousands or hundreds of thousands of workers, an exponential leap ahead.”
It is our view that I-O psychologists are poised at an opportune moment in history to leverage our knowledge of people, work, and quantitative methods to serve as ambassadors, interpreters, and translators between computer scientists and business clients. We are uniquely trained to help decipher and make sense of business-related data patterns through the lens of psychological science. We further argue that organizational psychologists may also be uniquely suited to address questions of privacy, ethics, and “dustbowl empiricism” that emerge in discussions of big data. Thus, this volume strives to accomplish two important goals: (1) to review critical issues in collecting, analyzing, communicating, and theorizing about big data, and (2) to ignite rigorous scholarship on big data in organizations.
To fulfill these objectives, this book is organized into two primary sections. The first section deals with technical and methodological aspects of big data (e.g., collecting, analyzing, warehousing, integrating, and visualizing) and the second addresses topical content areas where big data might be well positioned to contribute to paths of future inquiry (e.g., selection, teamwork, and diversity). Here we set the stage for these ideas by introducing a general definition of big data and generating ideas about opportunities for its integration in I-O psychology. We also describe potential practical and conceptual challenges that are brought about by big data. In the context of these descriptions, we foreshadow the chapters that follow; we briefly summarize the ways in which each chapter responds to the practical, conceptual, and substantive challenges and opportunities of big data. Altogether, these chapters will describe how advances in data science have the ability to fundamentally influence and improve organizational science and practice.
Big data can be understood with regard to three primary characteristics (the “three V’s”; Laney, 2001): (1) volume: a large number of data points; (2) velocity: both the throughput of the data (the amount being added constantly) and the latency in using this information; and (3) variety: multiple sources of data being integrated. Organizational psychologists may encounter data that fulfill all three of these key factors, but our interpretation is broader: big data in organizational sciences might not necessarily include all three of these characteristics. Moreover, we don’t believe that any particular amount of each V defines big data. Rather, data become big data when the different V’s force you to think about and interact with your data differently. For example, the most central component of big data in most people’s minds is volume. But there is no single sample size that qualifies as big data. The volume of data that we might deal with would most likely not reach the level of computer science applications (hundreds of terabytes), but high volume instead might be data sets that overwhelm commonly available computing resources and require nontraditional analytic procedures. Similarly, the actual sample size itself might not be very large, but there is big data volume because data for a large number of variables are being collected for each individual (think moment-by-moment performance or location data) that can’t be analyzed using traditional data reduction techniques. Or the volume of the data could be manageable, but the high velocity of the data forces us to abandon our theories and methods that were developed for more static studies. In each of these instances, we enter the realm of big data because the situation created by the V’s requires us to reconsider our science, to apply new theories and methods, and to ask new and different questions of the data that were not previously possible. Further, the emergence of these statistical and data management approaches allows us to apply new methods to old problems and potentially gain additional insight. That is, we may find novel and useful insights by applying new techniques to data sets that could be analyzed using standard methods, thus expanding the overall universe of insights we can bring to bear on the world of work.
What Are the Emerging Opportunities for Science and Practice?
What could it mean for the study and practice of organizational psychology if we had access to varied and dynamic data? How can we apply new analytic strategies to understand workplace dynamics in more nuanced ways? What could we learn and how could we enhance organizational effectiveness and employee wellness? This is the world of big data, which represents an opportunity to build our science and expand the impact of our discipline. Here we hope to ignite interest in this topic by brainstorming about the major areas of scholarship and practice in organizational psychology that could be explored, expanded, and impacted through big data. Testing of models in these areas (a small sample of which are listed in Table 1 and discussed in the second section of the book) might be facilitated through new data and techniques.
Emerging Tools and Potential Applications
In this section, we describe several new tools and sources of data that can be leveraged to build big data and organizational science: sociometric sensors, social media data and sentiment analyses, microexpression analyses, and psychophysiological measures. This is not an exhaustive list, but rather a preliminary set of data sources that (especially in combination) might offer new insights into I-O psychology. These and other tools, through complementary inductive and deductive approaches, allow new questions and ideas to be generated.
- Individually tailored training experiences
- Objective indicators of transfer of training
Occupational Health
- Preventative identification of health or safety risk factors or behaviors
- Adaptive gamification to motivate employee health behavior
- Exploration of policies, practices, signals, and traditions that comprise family-friendly cultures
communication, coordination, and friendship networks
- Identifying barriers through sentiment or microexpression analysis of intergroup communication
Future Horizons
- The influence of global or community factors/events on employees and organizations
Sociometric sensors. Sociometric sensors are wearable technology that can collect a wide range of information automatically from users and individuals around them. These devices exploit the fact that many people are already comfortable with wearable electronics, such as cell phones, digital watches, pedometers, and the emerging device category around personal biometrics such as Fitbit and Google Glass. These devices have a number of benefits over traditional observational data collection methods and can replace costly human observation, which is susceptible to subjective biases and memory errors. A variety of highly accurate data can be available such as nonlinguistic social signals (e.g., interest, excitement, influence) and relative location monitoring. Indeed, such sensors are being used to investigate a host of phenomena in organizational behavior. For example, activity and number of team interactions have been shown to be related to creativity (Tripathi & Burleson, 2012). Similarly, Olguín-Olguín and Pentland (2010) found that activity level and interaction patterns as measured by sociometric sensors predicted success by teams in an entrepreneurship competition. These same sensors have also been used in field studies to measure inter-team collaboration patterns as well as integration processes in multicultural teams (Kim, McFee, Olguín, Waber, & Pentland, 2012). In their chapter on teamwork, Kozlowski, Chao, Chang, and Fernandez describe the initial stages of team-based research that leverages this kind of technology to assess team process dynamics.
Social media data, text analysis, and sentiment analyses. An obvious area of focus in the world of big data is social media. Social media include websites and applications that enable users to create and share content or participate in social networking. The content of this electronic communication is a treasure trove of psychologically relevant information about people, their relationships, and their behavior. Analyses might involve simply tracking patterns of viewing or clicking, time spent in different virtual spaces, or social network patterns such as who is interacting with whom. According to IBM (2015), social media analytics are designed to “help organizations understand and act upon the social media impact of their products, services, markets, campaigns, employees and partners.” For employers, of course, social media can take on a different importance. Social media activity can give clues to employee engagement and warn of exit behavior. Recruiters can and do review social media information to find and vet candidates. Reviews on sites like Glassdoor affect employment branding and thereby influence the recruiting process of organizations.
Social media may be particularly informative through the lens of sentiment analyses. Sentiment analyses go beyond simple counts of frequency of clicks to analyze the content of what is spoken or written. For example, sentiment analysis could be used to examine the positive or negative content of tweets or to analyze an email to determine whether its author is happy, frustrated, or sad. More sophisticated forms of sentiment analyses use deep learning models to represent full sentences and capture the contexts around which particular language is used. The potential of big data sentiment analyses on business-relevant constructs is further evidenced by the chapter on Twitter analysis (Hernandez, Newman, & Jeon), which develops and applies a word count dictionary representing job satisfaction to a Twitter feed of over one million tweets per day. This is an exciting area, with emerging firms applying real-time methodologies like natural language processing that can recognize sarcasm and emotional nuances (e.g., Kanjoya) and connecting specific text strings and properties to outcomes (e.g., Textio). The leading players in this space are combining sophisticated algorithms with powerful computing and elegant visualization, and they are driving specific, measurable business outcomes.
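As a rough illustration of the word count dictionary idea mentioned above, the following Python sketch scores short texts against small positive and negative word lists. The word lists, the function name `score_text`, and the sample messages are all hypothetical; this is not the dictionary developed by Hernandez, Newman, and Jeon, and a real application would require an empirically validated dictionary.

```python
import re

# Hypothetical mini-dictionaries; a validated job-satisfaction dictionary
# would be far larger and empirically developed.
POSITIVE = {"love", "great", "enjoy", "proud", "happy"}
NEGATIVE = {"hate", "boring", "stressed", "quit", "awful"}


def score_text(text):
    """Return (positive count - negative count) / total words, or 0.0 if empty."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    return (pos - neg) / len(words)


# Invented example messages, used only to exercise the function.
messages = [
    "I love my job and I am proud of my team",
    "So stressed today, I hate this boring meeting",
]
scores = [score_text(m) for m in messages]
```

A simple count of this kind ignores negation, sarcasm, and context, which is one reason the more sophisticated sentence-level models described above may be preferred when those nuances matter.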
Microexpression analyses. Another exciting tool with a range of potential applications involves microexpression analyses. Microexpressions can be understood as representations of brief and unconscious reactions to stimuli that cannot be masked but can be detected through careful observation (Ekman, 2009). The facial action coding system was first published in 1978 and involved intensive ongoing training procedures and coding schemes. Technology has advanced to the point that these codings can be programmed and used to automatically assess genuine reactions and responses to stimuli (Shreve, Godavarthy, Goldgof, & Sarkar, 2011). This advancement has clear applications for law enforcement (i.e., detection of lies, hostility, and dangerous demeanor), but being able to objectively assess genuine human emotion can also be useful for phenomena relevant to I-O such as selection, decision-making, and leadership (see Barsade, Ramarajan, & Westen, 2009). The ability to link microexpression measurement to specific organizational stimuli may lead to insights into effective manager behavior and change management strategies, or to new approaches to measuring and managing employee engagement.
Neuro/psychophysiological tools. A wide set of tools has been developed to detect subtle changes in physiological reactions to stimuli such as brain activation, heart rate, and hormonal variation. This might include EEGs, blood pressure or heart rate monitors, automatic hormone testers, and fMRI (see Becker, Cropanzano, & Sanfey, 2011). This emerging set of technologies has much to tell us about the inner workings of the brain as well as indicators of health and fitness that have relevance for occupational health psychology. These are not new but are growing in production, shrinking in size and cost, and increasing in their potential applications for understanding workplace dynamics. The rise in personal biometrics systems opens the door to scalable data capture and enables fascinating new analyses. As offerings in this segment evolve to include biometric measurements and connectivity to smartphones and other devices, new opportunities emerge for real-time linkage analyses.
Novel Questions Enabled by Big Data
Perhaps the most highly touted advantage of big data is that it will allow us to answer old questions in more comprehensive ways. We must not forget, however, that big data may also allow (or force) us to reconceptualize phenomena that we have been studying for decades. Consider the findings of Ilies, Scott, and Judge (2006) that citizenship varies substantially within person, and that this variance can be explained in part by job attitudes (again, within person). Regardless of whether one considers the experience sampling approaches used in their study to be big data or not, there is no denying that the results suggest an entirely different citizenship phenomenon (i.e., some days I’m a good citizen, some days I’m not) from the one that appears in previous work (i.e., I’m a good citizen, you aren’t). As Cortina and Landis (2008) put it, “… these results call into question almost all of the previous research on this topic” (p. 303). Which phenomena might we understand in a completely different way after focusing big data approaches on them?
In this section, we briefly describe knowledge that might be gained through the lens of big data, using the methods described above, across topics in organizational science. We chose three examples from a broader list of potential topics (see Table 1) that are not covered elsewhere in the book but nonetheless exemplify the potential of big data to inspire scholarship.
Example 1: Work-family. The intersection of work and family, and the ways in which involvement in both spheres can be mutually enhancing or depleting, is inherently a phenomenon that takes place in dynamic times and places. Scholars have begun to employ event- and time-contingent experience sampling methods (e.g., daily diaries) to questions about the ongoing decisions that people make to prioritize work or family and the immediate affective and physical consequences (e.g., Shockley & Allen, 2013). The tools and sources of big data could take this approach, and with it our understanding, substantially further. Sensors with geospatial and relational trackers or health monitoring devices, for example, could provide incredibly rich, detailed, and objective observations of the behavioral patterns and corresponding physical consequences for men and women across work and family divides. Knowing where and with whom men and women with different family and work obligations spend their time may allow us to isolate the individual behaviors and decisions that lead to work-family conflict and balance, which in turn could have transformative effects on describing, predicting, and preventing work/life conflict more generally.
Example 2: Training. The proliferation of Internet- and/or computer-based training systems opens up the possibility for genuinely adaptive training systems. Algorithms could be constructed that, like computer adaptive testing, adapt automatically to the needs of learners and tailor their experiences appropriately. To the extent that these adaptive responses are consistent with empirically supported theories of learning, and potentially even tailored to individual differences such as goal orientation, training outcomes could be maximized. This could be accomplished through adaptations in the delivery modality or in the actual content delivered. That is, big data analyses could use techniques such as random forests analysis (see Chapter 3 by Oswald and Putka, this volume) to identify and automatically generate effective combinations of content, form, practice, and assessment that generate optimal training outcomes. In essence, big data give us the opportunity to move from rigid, prescriptive approaches to more agile strategies that capitalize on equifinality. These approaches may expand the absolute quantity of talent available in the marketplace by creating a greater development in a larger population of workers.
Example 3: Performance management. There are also real areas of potential in performance management, a context wherein real-time, continuous monitoring and automated feedback can be particularly useful. Current theories of performance tend to reflect the reality that formal performance appraisal, where it exists, occurs quite seldom (e.g., twice per year), and is necessarily general in nature. What would we learn about performance, its prediction, and its management if it were measured at a much more molecular level on an ongoing basis? In one study using wearable sociometric sensors, co-located and distributed team members were provided visual cues that represent the relative contributions of each team member to the task discussion (Kim, Chang, Holland, & Pentland, 2008). These cues motivated behavioral changes: teams engaged in more collaborative interactions and communication balance after the visual cues than before. Thus, the immediate feedback enabled by big data (its collection, computation, and visualization) enabled higher-performing teams. And of course, this raises new questions about the tradeoffs between resource allocation to the task and resource allocation to the processing of feedback, a question that seems irrelevant when feedback only occurs once or twice per year.
These examples complement the much more detailed descriptions of big data applications in selection and assessment (Illingworth, Lippstreu, & Deprez-Sims), teams (Kozlowski et al.), turnover (Hausknecht & Li), and diversity (Botsford Morgan, Dunleavy, & DeVries) that comprise the second half of this book.
Finally, it is worth noting that one of the significant promises of big data is the opportunity for many small improvements, rather than a few larger ones. When data collection was laborious, sample sizes were small, many relevant variables remained unmeasured, and computational power was a scarce resource, a conservative approach focusing on just a few sure bets was important. As we evolve to a world where data are ubiquitous, sample sizes are enormous, a nearly infinite variety of variables are measured, and computational power is abundant, it becomes efficient to pursue many small improvements, in essence building a mountain out of pebbles (see Schumpeter, 2014).
What Are the Dominant Challenges?
The world of big data, though full of opportunity, is not a panacea. Our enthusiasm must be tempered in light of practical and conceptual concerns that cannot be overlooked. Below, we briefly describe some of the most common challenges that big data engenders.
Practical Concerns
Analysis. The rise of big data poses significant challenges to traditional analytic methods that were developed for single-shot studies with a limited set of variables and a relatively small number of subjects. With sample sizes increasing by orders of magnitude, our customary reliance on statistical significance testing becomes obsolete. Increased volume in terms of larger sets of variables creates a need for nonstandard data reduction techniques, as established methods become intractable with many, many columns of data. Standard regression practices that typically rely on a well-defined set of variables or attempt to identify the best model from a larger set of variables are outmoded, as they fail to leverage all of the information available. Despite these obstacles, numerous advances in computer science and other domains can be successfully applied to the organizational sciences to provide better answers. Oswald and Putka review a number of these modern techniques.
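To make the point about significance testing concrete, the short Python sketch below computes an approximate two-sided p-value for the same very small correlation at three sample sizes. It relies on the standard Fisher z approximation; the function name `correlation_p_value`, the correlation of .01, and the sample sizes are illustrative assumptions rather than values from any study discussed in this volume.

```python
import math


def correlation_p_value(r, n):
    """Two-sided p-value for H0: rho = 0 using the Fisher z approximation."""
    z = math.atanh(r) * math.sqrt(n - 3)
    # Two-sided tail probability of a standard normal variate.
    return math.erfc(abs(z) / math.sqrt(2))


r = 0.01  # hypothetical, practically negligible correlation
for n in (200, 10_000, 1_000_000):
    p = correlation_p_value(r, n)
    print(f"n={n:>9,}  r={r}  variance explained={r**2:.4%}  p={p:.3g}")
```

Under these assumptions, the same correlation is far from conventional significance with 200 cases yet has a vanishingly small p-value with one million cases, even though it accounts for only 0.01% of the variance in either case. The p-value tracks sample size rather than practical importance, which is why effect sizes and predictive accuracy tend to be more informative at this scale.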
A related problem arises from the variety of data now available. Simple quantifiable metrics like test scores, responses to survey items, or supervisor ratings of behavior are being replaced by more varied and complex forms of data. These data may consist of textual data, location data, auditory information, or social network graphs. Two chapters in the current volume illustrate applications of these new forms of data to traditional organizational science research questions. Hernandez, Newman, and Jeon describe the challenges associated with the novel use of Twitter data to index job satisfaction, while Kozlowski et al. describe using wearable sensors to investigate team dynamics.
The sheer size and complexity of big data demand these new techniques. However, many of these techniques can be applied beneficially to smaller, less complex data sets as well, allowing new insights. For example, machine learning methods such as random forests can be applied where previously we may have used regression, and these techniques may yield additional insights. As the systems that apply insights become more sophisticated, some of these more nuanced findings become practical to investigate and apply.
The analysis of big data also necessitates a new look at the quality of our data. Another less common V sometimes mentioned as a defining feature of big data is veracity, to which I-O psychologists would refer as validity. Clearly, validity, defined as the degree to which data allow for appropriate inferences regarding objects of measurement, is crucial regardless of the size of data. In other words, veracity is not a problem unique to big data, which is why we omitted it from our definition. Nevertheless, when data contain substantial volume, velocity, and especially variety, it may be particularly challenging to ensure veracity. The appropriate use of big data requires making use of new statistical techniques to identify and correct (or discard) questionable data points and to translate those that remain into something that allows appropriate inferences regarding variables and phenomena of interest.
Moreover, our prior notions of measurement quality may need to be expanded or modified for big data applications. Concepts such as test-retest reliability are irrelevant for phenomena that vary from moment to moment, but we as a field may have overrated its relevance for such phenomena because we study them at a more temporally molar level. In contrast, more emphasis may need to be placed on errors specific to the measurement instrument. While surveys are identical instruments across individuals, big data measurement technology may not be. Take wearable sensors as an example. It is highly probable that these devices are manufactured such that variability exists in their baseline sensitivity. Furthermore, the sensitivity of any given device may vary with changes in the environment (e.g., walking through a doorway, sitting down at a desk). If these devices were used to identify team centrality, the individual with the most sensitive device, either as a matter of technology or as a matter of environment, may be identified as being most central in the network. Their device would pick up more team member signals on more occasions, not because the person is actually more central, but because their device is more successful at measuring the variable of interest than are the devices of others in the network. However, we believe that, as I-O psychologists, we are ideally situated to contribute to the big data movement because of our expertise with measurement issues. While measurement quality is seldom mentioned in most big data applications, we can bring to bear numerous theories of measurement, such as generalizability theory, that incorporate error from different sources into overall evaluation of measurement quality.
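The sensor sensitivity concern described above can be sketched with a small simulation. In the Python example below, every pair of team members truly meets equally often, so true centrality is identical for everyone, yet each badge registers a meeting only with a probability equal to its own sensitivity. The function name, the sensitivity values, the number of meetings, and the random seed are all hypothetical choices made for illustration; actual devices are unlikely to behave this simply.

```python
import random


def simulate_detected_contacts(sensitivities, meetings_per_pair=100, seed=0):
    """Count contacts each person's badge detects when true contact is equal.

    Every pair of team members meets `meetings_per_pair` times, so true
    centrality is identical for everyone. A badge registers a given meeting
    with probability equal to its own sensitivity.
    """
    rng = random.Random(seed)
    n = len(sensitivities)
    detected = [0] * n
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            for _ in range(meetings_per_pair):
                if rng.random() < sensitivities[i]:
                    detected[i] += 1
    return detected


# Hypothetical per-device detection probabilities for a five-person team.
sensitivities = [0.9, 0.7, 0.7, 0.6, 0.5]
detected = simulate_detected_contacts(sensitivities)
most_central = detected.index(max(detected))
```

In this sketch the person wearing the most sensitive badge appears most central purely as a measurement artifact. Treating the device as a source of error variance in its own right, for instance as a facet in a generalizability study, is one way such bias might be estimated and corrected.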
Integration. Another challenge posed by the variety of data sources comprising big data is integration. Employee survey data, human resource information system data, performance management data, and employee social interaction data represent some of the data variety that organizations may wish to leverage in a data analytics application. Unfortunately, these different types of data are often housed in completely separate systems with incompatible interfaces. Moreover, the volume of these data further exacerbates the difficulties of bringing together these disparate sources into a single data system. As discussed earlier in this chapter, much of the most exciting work in big data is in integrating data from multiple data sources, such as biometric sensors, or from entirely separate databases, such as financial systems, click data from websites, and so forth. Significant challenges are encountered in terms of just accessing the data necessary for a big data project. The chapter by Ryan discusses the myriad issues related to integrating disparate big data sources along with some proposed solutions for overcoming these difficulties.
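As a minimal sketch of what integration involves at the smallest scale, the following example combines records from three separate systems on a shared employee identifier. The system names, field names, and values are hypothetical; real HR systems rarely share a clean common key, which is part of the difficulty described above.

```python
# Hypothetical records from three separate systems, keyed by employee ID.
hris = {"E001": {"job_title": "Analyst"}, "E002": {"job_title": "Manager"}}
survey = {"E001": {"engagement": 4.2}, "E003": {"engagement": 3.1}}
performance = {"E001": {"rating": 3}, "E002": {"rating": 5}}


def integrate(sources):
    """Outer-join several {employee_id: fields} mappings into one view.

    Each combined record also lists the sources in which the employee
    was missing, so gaps stay visible rather than being silently dropped.
    """
    all_ids = sorted(set().union(*(records.keys() for records in sources.values())))
    combined = {}
    for emp_id in all_ids:
        row, missing = {}, []
        for name, records in sources.items():
            if emp_id in records:
                row.update(records[emp_id])
            else:
                missing.append(name)
        row["missing_sources"] = missing
        combined[emp_id] = row
    return combined


combined = integrate({"hris": hris, "survey": survey, "performance": performance})
for emp_id, row in combined.items():
    print(emp_id, row)
```

Keeping an explicit list of missing sources is one possible design choice. It lets an analyst see how much of the combined view rests on incomplete coverage before any modeling begins.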
Interpretation. The volume, velocity, and variety of big data also make attempts to interpret trends over time, individuals, and geospatial indicators incredibly complex. Traditional approaches to data interpretation are insufficient in describing big data; new interpretative lenses are essential. Two of the current chapters highlight strategies for data interpretation from the perspective of its visualization (Sinar), sonification, and multimodal displays (Stanton). Visualization is another element of analysis that has evolved dramatically with the rise of big data and that holds significant promise for driving impact. Good visualization is more than pretty pictures; good visualization will efficiently display multidimensional data and can either complement more traditional numerical data displays or stand alone to display data purely graphically. Additionally, many data visualization tools are navigable (they allow drill down within the visualization), and many display data as they change in real time. There is a dizzying variety of tools to generate data visualizations; one of the best known is Tableau. Data visualization presents a powerful, efficient new approach to rapidly assessing information. The description of these strategies will aid in the sense-making of business-related data patterns from the lens of psychological science. This is a critical and often overlooked area, and one where I-O psychologists can make a differential impact. A significant risk with big data is that findings will be highlighted without adequate or appropriate interpretation or contextualization, leading perhaps to counterproductive “solutions.” Deep content expertise can help in structuring analytic problems in such a way that findings are likely to be meaningful and actionable and can help identify whether findings are causal or attributable to an unmeasured third variable.
The question of novelty. Although the popular conception is that these approaches have just evolved since about 2010, in truth, applying science to data from work is old news. We have been doing data analytics about workplace dynamics for over a century. Clearly, the act of using data to inform the science of work is not new. Yet another chapter in the book (Illingworth et al.) details the ways in which big data approaches to selection and assessment differ from traditional lenses. We have moved from gigabytes to terabytes, from real-time capture to real-time analysis, and from structured to unstructured data. Together, these chapters suggest that while data science is indeed old, big data science is quite new.
The death of theory. A fear that scholars often raise is whether big data are simply technologically sophisticated forms of dustbowl empiricism; the mantra “correlation does not imply causation” is nearly shouted from soapboxes. A less cynical reaction to big data is that an iterative combination of inductive and deductive approaches can be supported by the voluminous data now available (see chapters in this volume by Putka & Oswald; Kozlowski et al.). Rather than the death of theory, big data can be leveraged to improve and enhance our current theories. Some would argue that many of our theories are somewhat limited and lack relevant variables. Moreover, we may not even know what those missing variables might be. The big data analytic approach permits us to test a wider variety of variables that could lead to the identification of new variables that should be part of our theories but currently aren’t. Thus, while we improve our predictive capabilities, we simultaneously improve our theory by incorporating new variables, testing out alternative models, and replicating our results. Robust methods such as (massive) field experiments like the one Facebook recently conducted on its members to determine whether moods can be contagious through social media (Kramer, Guillory, & Hancock, 2014) are now available as a compelling approach to testing meaningful theories. We would argue that theory is not necessarily dead, but rather that it must be applied thoughtfully to designing methods and interpreting findings from big data. Further, many organizations currently have nearly inconceivable volumes of data, but finite time and energy to devote to culling through it all. As a practical matter, simply choosing a starting place requires some level of working theory. As big data analytics evolve as an organizational practice, we are seeing depth of content expertise (which is to say, application of theory) as a critical differentiator between those internal research teams that produce the most consistent actionable insights and those that struggle more to create impact.
Ethics. An important boundary condition of the potential of big data is the set of ethical issues unique to it. Questions around the use of personal information, privacy of individuals, informed consent, and integrity of analysis are often raised in discussions of big data. Indeed, backlash from the general public to the Facebook study described in the previous section was immense (Goel, 2014). These are real and undeniable concerns that can constrain the opportunities available. In the final chapter of this book, Guzzo discusses initial ethical guidance for I-O psychologists dealing with big data that builds on principles developed by the Society for Industrial and Organizational Psychology. Ultimately, big data analytics must be held to the same standards of morality as traditional methods to be (1) viable and (2) supported in the long term. There are two significant challenges with this, however. First, the realities of current data management make this more difficult and demand more sophistication than we have required in the past. Cloud-based storage and reams of individual-level data, even if anonymized, create inherent privacy risks, and therefore introduce risk of harm that simply did not exist a decade ago. Second, the vast majority of people conducting big data research today have come to the work via engineering, mathematics, and computer science channels, rather than psychology. Data science as a field is still maturing, and consistent standards for ethics and privacy protections have not yet emerged. I-O psychology as a discipline can play a critical role in articulating a set of practical standards in this area.
As quantitatively minded scientist-practitioners, organizational psychologists are ideally situated to help drive the questions and analytics behind big data. Yet few scholars in our field have discussed the specific ways in which the lens of our science should be brought to bear on or might benefit from big data. We hope that this book encourages I-O psychologists to raise new questions, use novel tools, design new courses and curricula, build new partnerships, and fully engage in the world of big data.
Kim, T., Chang, A., Holland, L., & Pentland, A. (2008). Meeting mediator: Enhancing group collaboration using sociometric feedback. In Proceedings of the 2008 ACM Conference on Computer Supported Cooperative Work, 457–466. Retrieved from http://dl.acm.org/citation.cfm?id=1460563&picked=prox&CFID=523388114&CFTOKEN=38029144
Kim, T., McFee, E., Olguín, D. O., Waber, B., & Pentland, A. (2012). Sociometric badges: Using sensor technology to capture new forms of collaboration. Journal of Organizational Behavior, 33(3), 412–427.
Kramer, A. D., Guillory, J. E., & Hancock, J. T. (2014). Experimental evidence of massive-scale emotional contagion through social networks. Proceedings of the National Academy of Sciences, 111(24), 8788–8790.
Laney, D. (2001). 3-D data management: Controlling data volume, velocity and variety. META Group
Tripathi, P., & Burleson, W. (2012). Predicting creativity in the wild: Experience sample and sociometric modeling of teams. In Proceedings of the ACM 2012 Conference on Computer Supported Cooperative Work, 1203–1212. Retrieved from http://dl.acm.org/citation.cfm?id=2145204
Part I Big Issues for Big Data Methods
Jacqueline Ryan and Hailey Herleman
Industrial and organizational psychologists are very familiar with statistical techniques to find patterns in data that either support or fail to support a hypothesis. The field is less familiar with mining large datasets for patterns; however, the growth and variety of big data are causing researchers to take an interest in what can be learned from these datasets. One of the main challenges to progress in the academic literature relates to gaining access to the organizational datasets required to mine for patterns and/or test hypotheses. In fact, at the 2014 SIOP conference many presenters in sessions on big data stated, and restated, that their largest challenge to really understanding the implications of big data in the field was finding big data to analyze.
Organizations, however, have the opposite problem. They have so much data available from both internal and external sources such as transactional systems, events, emails, social media, and sensors (to name a few) that they are challenged to figure out how to find patterns and insights in the inherent complexity of data. Applied I-O psychologists struggle to supply data for research because figuring out where it is, how to get to it, and how to put it together in some meaningful form is quite a challenge. In order to advance the field and truly understand what can be learned from big data, more knowledge is needed about the basics of big data management and accessibility from both a methodology and software perspective.
The goal of this chapter is to share best practices in managing an information supply chain with I-O psychologists and other readers so that they can get the most value out of big data. These practices originated in big data analytics disciplines outside of the realm of human resources, and while the HR space creates some unique challenges, many of the techniques and lessons learned from other disciplines apply. The following sections will first define big data in the workforce, and then discuss the challenges around managing and accessing these data. Following this, the core capabilities required of a big data platform are discussed along with considerations for cloud-based and local hosting.
Big data in the realm of workforce management are not new; however, they are being created and stored in volumes we could not have imagined even ten years ago. What is new is the availability of technology to get value out of the breadth of data that exist about employees for use in workforce analytics. Let’s look at one perspective of big data in HR.
Data about employees can be categorized into at least five different subgroups: demographic data, compensation data, performance data, behavioral data, and social interaction data. Business results data could arguably also be included to create a sixth category for those roles where revenue can be directly attributed to a job role. Sources for employee data can include survey responses, systems of record, social business platforms, and learning systems, to name a few. Taken all together, a complete view of the employee (Figure 2.1) can start to be seen.
Figure 2.1
Employee data from different interactions within the workforce
Having a complete view of an employee is important because it allows for new
Demographic data are unique to a candidate or employee and are transferable through an employee’s career. Typically these data are captured during the job application process and the onboarding process and managed within a human resources information system (HRIS). Employee demographic data are essential for employers to understand their diversity and compliance profile. Examples include:
Employee identification (name, social security number, title)
Employee contact information (address, phone, email)
Education (highest educational level, school, degree, graduation year)
Diversity (gender, age, ethnicity, marital status, birth date & country, US veteran status, disability)
Skills/credentials (languages, credentials, certifications)
Employment history (within company, external to company)
Demographic data, while generally available on all employees, can be some of the most sensitive information to use, as there is an expectation, and in certain countries a legal requirement, to maintain and validate security compliance of personally identifiable information such as social security numbers. When demographic data security is properly managed, these data can provide important workforce segmentation insights when combined with other data such as social interaction data.
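One common way to combine demographic data with other sources while limiting exposure of direct identifiers is to replace the identifier with a keyed token before the data reach analysts. The sketch below is an illustration under stated assumptions, not a practice described by the authors: the field names, the sample values, and the key-handling arrangement are all hypothetical.

```python
import hashlib
import hmac

# Assumption: the key is held by a data steward and kept out of analysis files.
SECRET_KEY = b"replace-with-a-securely-stored-key"


def pseudonymize(identifier, key=SECRET_KEY):
    """Return a keyed SHA-256 digest that stands in for the raw identifier."""
    return hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


def strip_identifiers(record, id_field="ssn", drop_fields=("name", "ssn")):
    """Replace the direct identifier with a token and drop identifying fields."""
    cleaned = {k: v for k, v in record.items() if k not in drop_fields}
    cleaned["person_token"] = pseudonymize(record[id_field])
    return cleaned


# Hypothetical records from two systems that share an identifier.
demographic = {"name": "Pat Example", "ssn": "000-00-0000", "education": "BA"}
social = {"ssn": "000-00-0000", "messages_sent": 42}

demo_clean = strip_identifiers(demographic)
social_clean = strip_identifiers(social)

# The same person receives the same token in both datasets, so the records
# can still be linked without carrying the raw identifier.
print(demo_clean["person_token"] == social_clean["person_token"])
```

Tokenizing in this way is pseudonymization rather than anonymization. The remaining attributes may still identify a person in combination, so governance of the combined dataset remains necessary.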
Compensation data are specific to an employee’s job and employment within a company. These data reflect information about salary and benefits for the work an employee performs. Data in this category are typically quantitative; however, there are also qualitative attributes such as work flexibility, developmental opportunities, etc. Examples include:
Job description (job title, responsibilities, functions)
Compensation (base pay, commissions, overtime pay, bonus, stock)
Benefits (dental, insurance, medical, vacation, leaves of absence, retirement)
Rewards (awards, stock options, vacation)
Compensation data, like demographic data, are usually maintained in an HRIS and are also very sensitive in nature. In order to conduct employee research at the individual level, information such as compensation has to be carefully governed. However, because employee compensation often represents the most significant cost to an organization (Society for Human Resource Management, 2008), the data patterns surrounding compensation can be critical to understand in conjunction with other data such as employee performance and employee engagement.
Performance data are accumulated over the lifespan of an employee and describe their business impact and objectives achieved relative to their roles within the business. Examples include:
Behavioral data represent information about an employee’s or candidate’s cognitive ability, personality, preferences, and behavioral style. It is typically