Why Six-Sigma Science is Oxymoronic
Dr. Bruce J. West, ST
US Army Research Office, AMSRD-ARL-RO
Information Sciences Directorate, AMSRD-ARL-RO-I, Research Triangle Park, NC 27709
919-549-4257
DSN 832-549-4257
bruce.j.west@us.army.mil
Summary: The United States emerged from World War II as the world's leader in research, after which we initiated an experiment in government-funded, civilian-controlled military research. After 50 years, with the changes in the pressures and goals of the Army in transformation and the military taking on multiple roles in modern warfare, it is time to reexamine the management of science and engineering human resources. Herein we argue that the human resource management procedures put in place following Vannevar Bush's remarkable 1945 report do not properly take the complex human aspects of research into account.
In particular, we use data on such metrics as the number of published articles and the citations to those articles to show that the Six Sigma program for managing resources, so successful in manufacturing, is counterproductive in a research environment. Using these data we suggest, by way of the Pareto Principle, how to overcome the present-day distortion in the evaluation of the overall quality of research being done within federal laboratories and by government researchers.
1 Science in government
The Army is in the process of transforming itself from what it was in the late 1990s into the Future Combat Force. The intent is for science and technology to make the fighting forces more mobile, lethal and survivable, so they can successfully carry out today's less conventional missions. Under the rubric of network-centric warfare, the Army hoped to use computers and communications networks to connect all weapons, logistics and command networks and give our soldiers and commanders advantages in situational awareness and decision-making. This transformation process has been dependent at every stage on science and technology, but it has apparently been assumed that the procedures used for applying science and technology to this daunting task, and for managing the associated human resources, would remain essentially unchanged or even become more efficient during this period. The arguments presented herein do not address the general problems of managing human resources within the Army and the Department of Defense (DoD), but rather the more restricted set of problems associated with the management of human resources within the Army's science and technology community and how these problems may have been impacted by the Army's transformation.
Prior to World War II there was no large-scale scientific effort within the federal government, much less within the Army. After World War II, Vannevar Bush, who had presided over the government research effort during the war as Director of the Office of Scientific Research and Development (1941-47), responded to a request from President Roosevelt regarding the transition of war research to the private sector with the report Science, the Endless Frontier [1]. In this now legendary report V. Bush argued that the United States needed to retain the scientific advantage achieved during the war years, and he laid out the reasons for building a civilian-controlled organization for fundamental research with close liaison with the Army and Navy to support national needs and with the ability to initiate basic research.
V. Bush emphasized that historically scientists have been most successful in achieving breakthroughs when they work in an atmosphere relatively free from the adverse pressure of convention, prejudice, or commercial necessity. This freedom from hierarchical structure stands in sharp contrast to military tradition. He believed that it was possible to retain an alternate organizational structure, outside the more traditional military, but working in close collaboration with it. Such an organization would foster and nurture science and the application of science to new technologies, through engineering. In Bush's words [1]:
…such an agency … should be … devoted to the support of scientific research… Industry learned many years ago that basic research cannot often be fruitfully conducted as an adjunct to or a subdivision of an operating agency or department. Operating agencies have immediate operating goals and are under constant pressure to produce in a tangible way, for that is the test of their value. None of these conditions is favorable to basic research. Research is the exploration of the unknown and is necessarily speculative. It is inhibited by conventional approaches, traditions and standards. It cannot be satisfactorily conducted in an atmosphere where it is gauged and tested by operating or production standards. Basic scientific research should not, therefore, be placed under an operating agency whose paramount concern is anything other than research.
His vision materialized through the development of the Office of Naval Research in 1946, the Army Research Office in 1951 (as the Office of Ordnance Research), the Air Force Office of Scientific Research in 1950 (as the Air Research and Development Command), and the National Science Foundation in 1950; albeit none of these organizations followed all his suggestions regarding the management of scientific personnel and the support of science. These agencies continue to support research conducted on university campuses. While these agencies were being established, a substantial number of research laboratories were stood up by the services to house government scientists and engineers who would collaborate with academic and industry scientists and engineers. The voice of V. Bush concerning the incompatibility of fundamental research and mission agencies was prophetic. The dire consequence of that incompatibility was held off for 50 years, however, by a set of checks and balances put into place in order to insulate basic research (6.1) from the pressures of applied research (6.2). The separation of the research being supported through the services but done on university campuses (basic) from the research being done within government laboratories (both basic and applied) has maintained, by and large, the separation between fundamental and applied research. However, V. Bush's cautionary voice is now being echoed in a report [2] authored by members of the National Research Council. Congress directed DoD to have the National Research Council study the nature of the basic research being funded by the DoD. The findings of the report of most relevance to the present discussion are [2]:
A recent trend in basic research emphasis within the Department of Defense has led to a reduced effort in unfettered exploration, which historically has been a critical enabler of the most important breakthroughs in military capabilities… Generated by important near-term Department of Defense needs and by limitations in available resources, there is significant pressure to focus DoD basic research more narrowly in support of more specific needs…. The key to effective management of basic research lies in having experienced and empowered program managers. Current assignment policies and priorities (such as leaving a substantial number of program manager positions unfilled) are not always consistent with this need, which might result in negative consequences for the effectiveness of basic research management in the long term.
Two recent studies [4] focused on the present-day efficacy of the 700 laboratories and research centers constituting the Federated Laboratory System. John H. Hopps Jr., Deputy Director of Department of Defense Research & Engineering and Deputy Undersecretary of Defense in the Department of Defense, introduces the 2002 document with the observation that our "defense laboratories should have the same attributes as our transformed uniformed military forces." He specifically pointed out that scientific research should share the characteristic of the modularity of the joint forces, with the parallel attributes of [5]:
….productivity; responsiveness and adaptability; relevance, programming, and execution and application; and perpetuation of knowledge
In opposition to this notion of modularity it is interesting to recall V. Bush's remarks [1]:
Science is fundamentally a unitary thing…. Much medical progress, for example, will come from fundamental advances in chemistry. Separation of the sciences in tight compartments… would retard and not advance scientific knowledge as a whole.
The variability in research and creativity is fundamentally at odds with uniformity and regimentation. One does not create on demand; it is the exploration of new alternatives, going down blind alleys, and even failing that is at the heart of innovative research. Again quoting from V. Bush [1]:
Basic research is a long-term process - it ceases to be basic if immediate results are expected on short-term support.
So what are the specific management problems in the Army science and engineering community? One of the more significant problems is the age distribution, which is heavily weighted towards the senior ranks, with fewer junior scientists and engineers being attracted into and retained by the Army. This problem was also identified by V. Bush, who observed that [1]:
The procedures currently followed within the Government for recruiting, classifying and compensating such personnel place the Government under a severe handicap in competing with industry and the universities for first-class scientific talent.
His comments are as true today as they were in 1945.
A second significant problem is the morale of those scientists and engineers staying within the Army. A symptom of this dissatisfaction is the fact that 15-20% of the Army's senior research scientist (ST) population left the Army in 2004 [5].
I believe we have arrived at the present situation through a failure to properly take into account human nature in the management of scientists and engineers. To support this proposition let us examine the evidence regarding the 'unfair' nature of social organizations and how this unfairness can and should influence human resource management. The evidence I present concerns such apparently disconnected phenomena as the distribution of income in western societies, the distribution of computer connections on the World Wide Web, and the frequency of publication of, and citations to, research articles, as well as the description of other complex phenomena involving the multiple interactions of human beings [6]. However, we begin our investigation on more familiar ground, before venturing out onto the frontier, where understanding is rare and useful ideas are few and far between.
The modern theory of human resource management can be traced back to Joseph M. Juran, who wrote the standard reference work on quality control [7]. It is his work on management [8] that evolved into the Six Sigma program, which is the basis of quality management worldwide. The Six Sigma program has achieved a certain currency within the DoD and elsewhere because of its emphasis on metrics and the importance of being able to quantify the problems being addressed within an organization. What we propose to establish in the following sections is that the Six Sigma approach to management is fundamentally incompatible with the goals of science. Moreover, Six Sigma is counterproductive when applied to any metric of the quality of the research done by scientists and engineers.
2 Gauss versus Pareto
2.1 Myth of Normalcy
The name Six Sigma is taken from the bell-shaped probability distribution (see Figure 1) constructed by the mathematician Johann Carl Friedrich Gauss (1777-1855) to quantify the variability observed in the results of experimental measurements taken in the physical sciences. In the Gaussian worldview the average value of an observable is the single most important variable for characterizing a phenomenon, and the fluctuations around that average value are due to errors of measurement and random fluctuations of the environment. The parameter sigma (σ) quantifies these fluctuations; in statistics, sigma is called the standard deviation, and the smaller the value of σ relative to the average, the better the average represents the data. Consequently, in this view, the world consists of networks made up of linear additive processes, whose variability is described by a bell-shaped curve, centered on the average value, with a width determined by sigma.
The Gauss model is very useful in manufacturing, where the specifications for the production of a widget can be given to as high a tolerance as can be achieved on a given piece of machinery. Of course no two widgets coming off the production line are exactly the same; there is always some variability. Let us suppose that the widget can be characterized by a diameter and that the variation around the average value of the diameter is given by a bell-shaped curve. If the deviations of the measured diameters from the average are divided by the standard deviation for a given production run, then the new variable is expressed in terms of the standard deviation, sigma (σ). In Figure 1 the results of producing this hypothetical widget are shown. The attraction of this approach is the universality of the shape of the distribution when the standard deviation is the unit of measure. All processes that have Gaussian or Normal statistics can be superimposed on the given curve. A well-known property of the Normal distribution is that 68% of all widgets produced fall between plus and minus one sigma; 95% of all widgets produced fall between plus and minus two sigma; 99.7% of all widgets produced fall between plus and minus three sigma; and so on.
Figure 1: The Normal distribution in units of σ, centered on the average. The typical partitioning for grading in college is indicated, with the distribution providing the relative number of students in each interval.
The Six Sigma program maintains that the variability seen in Figure 1 is an undesirable property in the production of widgets and should be eliminated. Uniformity of outcome very near the specified tolerance level is the goal of managing the manufacturing process. To make the implications of this distribution more transparent let us assume that one million widgets are sold and 99.7% of them are within the specified tolerance limits. On its face this would appear to be very good, with the production line functioning at the three-sigma level. On the other hand, this also implies that the company will have 3,000 unhappy customers, because they receive widgets that do not satisfy specifications. If the widgets were armor vests slated to be used by soldiers in Iraq, a three-sigma level would not be acceptable. A six-sigma level of 99.9997%, with its approximately 3 faulty vests out of the million shipped, would be the more acceptable number.
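The arithmetic behind these sigma levels is easy to verify. The short Python sketch below, written for this discussion rather than taken from the Six Sigma literature, computes the fraction of output that falls within plus and minus k standard deviations of the average and the implied number of defects per million items produced.

from math import erf, sqrt

def coverage(k: float) -> float:
    """Two-sided probability that a Normal variate falls within +/- k sigma."""
    return erf(k / sqrt(2.0))

for k in (1, 2, 3, 6):
    inside = coverage(k)
    defects_per_million = (1.0 - inside) * 1_000_000
    print(f"{k} sigma: {inside:.7%} within tolerance, "
          f"about {defects_per_million:,.2f} defects per million")

# Note: the 99.9997% (about 3.4 defects per million) usually quoted for a
# six-sigma process follows the industry convention of allowing a 1.5 sigma
# drift in the mean; the pure +/- 6 sigma coverage computed here is tighter still.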
So this is what the Six Sigma program is all about: how to take a manufacturing plant, an organization, or any other activity whose output has an unacceptable level of variability and reduce that variability to a six-sigma level; variability is assumed to be bad and uniformity is assumed to be good.
When first introduced the six-sigma argument seems rather good, with simple but acceptable assumptions about the process being measured and ultimately controlled through management. After all, we have been exposed to arguments of this kind since we first entered college and found that every large class was graded on a curve. That curve invariably involves the Normal distribution, where, as shown in Figure 1, the grades of most of the students lie between +1 and -1. This is the C range, between plus and minus one σ of the class average, which includes 68% of the students. The next range is the equally wide B and D interval, from one to two σ in the positive and negative directions, respectively. These two intervals capture another 27% of the student body. Finally, the top and bottom of the class split the remaining 5% equally between A and F (or is it E?).
I remember that being graded on a curve made me uncomfortable, but I could not put my finger on the reason why. Subsequently, in upper-level courses the classes were small and the problem disappeared. At least I did not experience it again until in graduate school I began grading large Freshman Physics classes. By that time I knew that grading on a curve was a gross distortion of what the students knew and understood, and I even had a theory as to why such grading was wrong. I confronted the professor in charge of the course about grading on a curve and explained to him my theory, with the simple arrogance of a graduate student who knows he is right. The professor asked a simple question: Do you have any data to show that such grading is wrong and that can verify your theory? Of course I did not have such data, and therefore I continued grading on a curve.
Subsequently I learned that in the social sciences the bell-shaped curve introduced the notion of a statistical measure of performance relative to some goal. An organization, or network, is said to have a quantifiable goal when a particular measurable outcome serves the purpose of the organization. A sequence of realizations of this outcome produces a set of data; say, the number of research articles published per month by members of a research laboratory. Suppose a laboratory's ideal publication rate (goal) is specified in some way. The actual publication rate is not a fixed quantity, but varies a great deal from month to month. If 68% of the time the lab essentially achieves its goal, the lab is functioning at the one-sigma level. If another lab has 95% of its realizations at essentially the ideal rate, it is functioning at the two-sigma level. A third lab, one that seems to be doing extremely well, delivers 99.7% of the ideal rate and is functioning at the three-sigma level. Finally, a six-sigma lab delivers 99.9997% of its publications at, or very nearly at, the ideal rate [9]. What six-sigma means is that the variability in the publication rate has been reduced to nearly zero, and based on a linear additive model of the world this ought to be both desirable and achievable. The logic of assessing the quality of the laboratory research is the same as that used to assess the quality of the students, and it relies just as much on the bell-shaped curve.
Recently I ran across a paper in which the authors analyzed the achievement tests of over 65,000 students graduating from high school and taking the university entrance examination of Universidade Estadual Paulista (UNESP) in the state of São Paulo, Brazil [10]. In Figure 2 the results of the entrance exam are recorded for high- and low-income students. It is clear that the humanities data in Figure 2a seem to support the conjecture that the Normal distribution is appropriate for describing the distribution of grades in a large population of students. The solid curves are the best fits of a Normal distribution to the data, and they give a different mean and width for public and private schools. In the original publication the data were represented in a variety of ways, such as between the rich and the poor, but that does not concern us here, since the results turned out to be independent of the representation. In Figure 2b the data from the physical sciences are graphed under the same grouping as that of the humanities in Figure 2a. What is immediately clear is that the distribution is remarkably different from the bell-shaped curve. Figure 2c depicts the distribution of grades under the same grouping for the biological sciences.
The first thing to notice is that the distributions of grades for the biological sciences are more like those in the physical sciences than they are like those in the humanities. In fact the distribution of grades in the sciences is nothing like that in the humanities; the distributions are so different that, if they were not labeled, one would not be able to detect that they refer to the same general phenomenon of learning. So why does normalcy apply to the humanities and not to the sciences?
One possible explanation for this difference in the grade distributions between the humanities and the sciences has to do with the structural difference between the two learning categories. The humanities collect a disjoint group of disciplines, including language, philosophy, sociology, economics and a number of other relatively independent areas of study. We use the term independent because what is learned in sociology is not dependent on what is learned in economics and is at most weakly dependent on what is learned in language. Consequently, the grades obtained in each of these separate disciplines are essentially independent of one another, thereby satisfying the conditions of Gauss' argument. In meeting Gauss' conditions the distribution of grades in the humanities takes on normalcy.
Figure 2: The distribution of grades on the university entrance examination taken by 65,000 students at Universidade Estadual Paulista (UNESP) in the state of São Paulo, Brazil [10]: (a) humanities; (b) physical sciences; (c) biological sciences.
On the other hand, every science builds on previous knowledge. Elementary physics cannot be understood without algebra, and the more advanced physics cannot be understood without the calculus, which also requires an understanding of algebra. Similarly, understanding biology requires some mastery of chemistry and physics. The different scientific disciplines form an interconnecting web, starting from the most basic and building upward, a situation that violates Gauss' assumption of independence and undercuts the idea that the average value provides the best description of the process. The empirical distribution of grades in science clearly shows extensions out into the tail of the distribution, with no clear peak and consequently no characteristic scale with which to characterize the data. As a result, the average values, so important in normal processes, become irrelevant in complex networks. A better indicator of complex processes than the average is one that quantifies how rapidly the tail is quenched.
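One standard way to quantify how rapidly a tail is quenched is to estimate a power-law tail index from the largest observations, for example with the Hill estimator. The sketch below is illustrative only; the synthetic sample and the choice of estimator are assumptions of this discussion and are not drawn from the examination data of [10].

import math
import random

def hill_estimator(data, k):
    """Hill estimate of the power-law tail index from the k largest values."""
    x = sorted(data, reverse=True)[: k + 1]          # the k+1 largest order statistics
    log_excess = [math.log(v / x[k]) for v in x[:k]]  # log-exceedances over the threshold
    return k / sum(log_excess)                        # 1 / mean log-exceedance

random.seed(0)
true_alpha = 1.5
sample = [random.paretovariate(true_alpha) for _ in range(10_000)]
print(f"Hill estimate from the 500 largest points: {hill_estimator(sample, 500):.2f}")
# For Gaussian data the same estimate keeps climbing as fewer points are used,
# signalling a tail that decays faster than any power law.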
The distinction between the distributions of grades in the humanities and the sciences is clear evidence that the Normal distribution does not describe the normal situation. The bell curve of grades is imposed through educational orthodoxy and by our preconceptions, and it is not indicative of the process by which students master information and knowledge. Thus, the pursuit and achievement of intellectual goals, whether in science or engineering, is not normal.
Consequently, the Six Sigma program, being crucially dependent on Normal statistics, is not applicable to complex phenomena such as learning or scientific research. The variability targeted for elimination by the Six Sigma program under the assumption of Normal statistics is invalidated by the long-tailed distribution observed in truly complex networks. The tails indicate an intrinsic variability that is not contained in the simpler processes where Normal statistics are valid, and working to reduce the tail region may in fact remove the very property that makes the process valuable. With this in mind let us turn our attention away from the now discredited distribution of Gauss, at least discredited as a viable description of the outcome of complex phenomena, and examine the arguments of the first scientist to recognize the existence of the long tails depicted in Figure 2.
2.2 A tale of tails
We need to have at least a preliminary understanding of how human networks operate in order to determine how to manage scientists and engineers. To achieve this primitive level of understanding let us sketch how members of a community choose from a large number of options. Suppose a large cohort group has a hypothetical set of choices, say a large set of nodes on a computer network, to which they may connect. If we assume that the choices are made independently of the quality of the node, or without regard for the selections made by other members of the group, the resulting distribution in the number of times a given node is selected has the familiar bell shape. In this case the selection process is completely uniform, with no distinction based on personal taste, peer pressure or aesthetic judgment, resulting in a network of random links between humans and nodes or, more generally, between individuals.
The bell-shaped distribution describes the probable number of links a given node has in a random network. Barabási [11] determined such distributions to be unrealistic; that is, using real-world data he was able to show that the number of connections between nodes on the Internet and the World Wide Web deviates markedly from the bell-shaped distribution. In fact he, along with others [12,13], found that complex networks in general have inverse power-law rather than bell-shaped distributions. The inverse power law was first observed in the nineteenth century, in the systematic study of data on income in western societies by the engineer, economist and sociologist Marquis Vilfredo Federico Damaso Pareto (1848-1923). Subsequently, scientists recognized that phenomena described by such inverse power laws do not possess a characteristic scale and referred to them collectively as scale-free, in keeping with the history of such distributions in social phenomena [6].
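The contrast between random and scale-free connectivity can be made concrete with a toy growth model in the spirit of preferential attachment. The sketch below is an illustration assumed for this discussion; it is not an analysis of the Internet or World Wide Web data examined in [11-13].

import random
from collections import Counter

def grow(n_nodes: int, preferential: bool, seed: int = 1) -> Counter:
    """Grow a network one node at a time, each new node linking to one old node."""
    rng = random.Random(seed)
    degree = Counter({0: 1, 1: 1})   # start from a single linked pair of nodes
    endpoints = [0, 1]               # one entry per link endpoint, so choices are degree-weighted
    for new in range(2, n_nodes):
        if preferential:
            old = rng.choice(endpoints)   # probability proportional to current degree
        else:
            old = rng.randrange(new)      # uniform probability over existing nodes
        degree[new] += 1
        degree[old] += 1
        endpoints.extend([new, old])
    return degree

for label, flag in (("uniform attachment", False), ("preferential attachment", True)):
    degrees = grow(50_000, flag)
    print(f"{label}: the largest hub has {max(degrees.values())} links")
# Uniform attachment keeps the largest hub modest, with a thin tail of degrees;
# preferential attachment grows hubs that dwarf anything in the uniform network,
# the signature of an inverse power-law, scale-free degree distribution.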
Pareto worked as an engineer until middle age, from which he gained an appreciation for the quantitative representation of phenomena. On the death of his father he left engineering and took a faculty position in Lausanne, Switzerland. With his collection of data from various countries he became the first person to recognize that the distribution of income and wealth in a society is not random but follows a consistent pattern. This pattern could be described by an inverse power-law distribution, which now bears his name and which he called "The Law of the Unequal Distribution of Results" [14]. He referred to the inequality in his distribution more generally as a "predictable imbalance," which he was able to find in a variety of phenomena. This imbalance is ultimately interpretable as the implicit unfairness found in complex networks.
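Pareto's predictable imbalance lends itself to a simple calculation: for an inverse power-law distribution with tail index alpha greater than one, the share of the total held by the richest fraction p of the population is p raised to the power (1 - 1/alpha). The tail-index values below are chosen only for illustration.

def top_share(p: float, alpha: float) -> float:
    """Fraction of the total held by the top fraction p under a Pareto law with index alpha."""
    return p ** (1.0 - 1.0 / alpha)

for alpha in (1.16, 1.5, 2.5):
    print(f"alpha = {alpha}: the top 20% hold {top_share(0.20, alpha):.0%} of the total")
# A tail index near 1.16 reproduces the familiar 80/20 rule of the Pareto
# Principle; larger indices (thinner tails) spread the total more evenly, but
# never as evenly as a bell-shaped distribution would.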
So how do we go from random networks, with their average values and standard deviations, to networks that are scale-free; and, more importantly, how does this all relate to the management of scientists and engineers?
Figure 3: A schematic comparison between the bell-shaped distribution of Gauss and the inverse power-law distribution of Pareto. The vertical axis is the logarithm of the probability and the horizontal axis is the variable divided by σ.
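The contrast sketched in Figure 3 can be reproduced numerically; the distribution parameters below are assumptions chosen only for illustration.

from math import log, pi, sqrt

def log_gauss_density(x: float, sigma: float = 1.0) -> float:
    """Natural logarithm of the zero-mean Gaussian density."""
    return -0.5 * (x / sigma) ** 2 - log(sigma * sqrt(2.0 * pi))

def log_pareto_density(x: float, alpha: float = 1.5, x_min: float = 1.0) -> float:
    """Natural logarithm of the Pareto density, defined for x >= x_min."""
    return log(alpha) + alpha * log(x_min) - (alpha + 1.0) * log(x)

for x in (1, 2, 3, 6, 10):
    print(f"x = {x:2d} sigma: log Gaussian = {log_gauss_density(x):7.1f}, "
          f"log Pareto = {log_pareto_density(x):5.1f}")
# By six sigma the Gaussian log-density has collapsed to about -19, while the
# Pareto log-density has barely moved; events that a Normal model treats as
# effectively impossible remain entirely ordinary under an inverse power law.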
Let us examine the distribution of human achievement and consider a simple mechanism that explains why such distributions have long tails, such as that shown in Figure 3. In the next section we discuss the distribution of scientific publications and citations to scientific papers as exemplars of the many inverse power-law networks in the social network of scientists and engineers. Achievement is, in general, the outcome of a complex task. A complex task or project is multiplicative and not additive because an achievement requires the successful completion of a number of separate subtasks, and the failure of any one of them would lead to the failure of the project. As an example of such a process consider the publication of a scientific paper. A partial list of the abilities that might be important for the publication of a paper is: 1) the ability to think up a good problem; 2) the ability to work on the problem; 3) the ability to recognize a worthwhile result; 4) the ability to decide when to stop and write up the results; 5) the ability to write adequately; 6) the ability to profit from criticism; 7) the determination to submit the paper to a journal; and 8) the willingness to answer referees' objections. If we associate a probability with each of these abilities, then to some level of approximation the overall probability of publishing a paper would, based on this argument, be the product of the eight probabilities. The central limit theorem applied to such a process yields a distribution for the successful publication of a paper that is log-normal, or inverse power law at large scales [6]. Other, more mathematical, arguments lead to inverse power laws throughout the domain of the variate.
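The multiplicative argument can be checked with a short simulation. The eight ability 'scores' and their spread below are assumptions made only to illustrate the contrast between adding and multiplying independent contributions.

import random
import statistics

random.seed(0)

def additive_score(n_skills: int = 8) -> float:
    """Sum of independent skill levels: bell-shaped, per the central limit theorem."""
    return sum(random.uniform(0.1, 1.0) for _ in range(n_skills))

def multiplicative_score(n_skills: int = 8) -> float:
    """Product of independent skill levels: log-normal-like, with a long right tail."""
    score = 1.0
    for _ in range(n_skills):
        score *= random.uniform(0.1, 1.0)
    return score

for name, draw in (("additive", additive_score), ("multiplicative", multiplicative_score)):
    sample = [draw() for _ in range(100_000)]
    mean = statistics.mean(sample)
    median = statistics.median(sample)
    print(f"{name:>14}: mean = {mean:.3g}, median = {median:.3g}, "
          f"largest/mean = {max(sample) / mean:.1f}")
# In the additive case the mean and median agree and the largest outcome sits
# just above the mean; in the multiplicative case the mean lies well above the
# median and a few outcomes dwarf it, the skewed, long-tailed behavior the text
# associates with real measures of achievement.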
So how does the inverse power-law distribution affect the evaluation of the scientific and engineering work force?
Assume that a position has become available and a short list of candidates has been compiled. Suppose further that there are eight criteria being used in the evaluation of a group of candidates, all with ostensibly the same level of professional achievement. Using the