Foundations of Info-Metrics

MODELING, INFERENCE, AND IMPERFECT INFORMATION

Amos Golan
Oxford University Press is a department of the University of Oxford. It furthers
the University's objective of excellence in research, scholarship, and education
by publishing worldwide. Oxford is a registered trade mark of Oxford University
Press in the UK and certain other countries.

Published in the United States of America by Oxford University Press
198 Madison Avenue, New York, NY 10016, United States of America.

© Oxford University Press 2018

All rights reserved. No part of this publication may be reproduced, stored in
a retrieval system, or transmitted, in any form or by any means, without the
prior permission in writing of Oxford University Press, or as expressly permitted
by law, by license, or under terms agreed with the appropriate reproduction
rights organization. Inquiries concerning reproduction outside the scope of the
above should be sent to the Rights Department, Oxford University Press, at the
address above.

You must not circulate this work in any other form
and you must impose this same condition on any acquirer.
Library of Congress Cataloging-in-Publication Data
Names: Golan, Amos.
Title: Foundations of info-metrics: modeling, inference, and imperfect
information / Amos Golan.
Other titles: Foundations of info-metrics
Description: New York, NY: Oxford University Press, [2018] |
Includes bibliographical references and index.
Identifiers: LCCN 2016052820 | ISBN 9780199349524 (hardback: alk. paper) |
ISBN 9780199349531 (pbk.: alk. paper) | ISBN 9780199349548 (updf) |
ISBN 9780199349555 (ebook)
Subjects: LCSH: Measurement uncertainty (Statistics) | Inference | Mathematical statistics
‘Information Measurement’ and ‘Mathematical Modeling.’
Classification: LCC T50.G64 2017 |
DDC 519.5/4—dc23
LC record available at https://lccn.loc.gov/2016052820
9 8 7 6 5 4 3 2 1
Paperback printed by WebCom, Inc., Canada
Hardback printed by Bridgeport National Bindery, Inc., United States of America
… Benjamin Katz, and my children, Maureen and Ben
Contents

List of Figures xiii
List of Tables xv
List of Boxes xvii
Acknowledgments xix
1 Introduction 1
The Problem and Objectives 1
Outline of the Book 3
2 Rational Inference: A Constrained Optimization Framework 10
Inference Under Limited Information 11
Qualitative Arguments for Rational Inference 11
Probability Distributions: The Object of Interest 12
The Basic Questions 18
Motivating Axioms for Inference Under Limited Information 19
Axioms Set A: Defined on the Decision Function 20
Axioms Set B: Defined on the Inference Itself 20
Axioms Set C: Defined on the Inference Itself 21
Axioms Set D: Symmetry 22
Inference for Repeated Experiments 22
Axioms Versus Properties 24
3 The Metrics of Info- Metrics 32
Information, Probabilities, and Entropy 32
Information and Probabilities 37
Information Gain and Multiple Information Sources 43
4 Entropy Maximization 59
Formulation and Solution: The Basic Framework 60
Information, Model, and Solution: The Linear Constraints Case 60
Model Specification 60
The Method of Lagrange Multipliers: A Simple Derivation 61
Information, Model, and Solution: The Generalized Constraints Case 68
Basic Properties of the Maximal Entropy Distribution 71
Discussion 72
Uniformity, Uncertainty, and the Solution 72
Conjugate Variables 74
The Concentrated Framework 79
Examples in an Ideal Setting 82
Joint Scale and Scale- Free Moment Information 87
Likelihood, Information, and Maximum Entropy: A Qualitative Discussion 87
5 Inference in the Real World 107
Single- Parameter Problems 108
Exponential Distributions and Scales 108
Distribution of Rainfall 108
The Barometric Formula 110
Power and Pareto Laws: Scale- Free Distributions 112
Distribution of Gross Domestic Products 113
Multi- Parameter Problems 114
Size Distribution: An Industry Simulation 114
Incorporating Inequalities: Portfolio Allocation 117
Background 123
A Simple Info- Metrics Model 124
Efficient Network Aggregation 126
6 Advanced Inference in the Real World 135
Interval Information 136
Weather Pattern Analysis: The Case of New York City 140
Treatment Decision for Learning Disabilities 143
Brain Cancer: Analysis and Diagnostics 147
The Surprisal 151
Bayesian Updating: Individual Probabilities 154
7 Efficiency, Sufficiency, and Optimality 165
Priors, Treatment Effect, and Propensity Score Functions 222
9 A Complete Info- Metrics Framework 231
Information, Uncertainty, and Noise 232
Formulation and Solution 234
A Simple Example with Noisy Constraints 242
The Concentrated Framework 245
A Framework for Inferring Theories and Consistent Models 249
Examples in an Uncertain Setting 250
Example: Mixed Models in a Non- Ideal Setting 254
Uncertainty 259
Lagrange Multipliers 261
The Stochastic Constraints 262
Visual Representation of the Info-Metrics Framework 264
Adding Priors 268
10 Modeling and Theories 281
Core Questions 282
Basic Building Blocks 284
Incorporating Priors 286
Validation and Falsification 286
Prediction 288
A Detailed Social Science Example 288
Introducing the Basic Entities 289
Information and Constraints 291
The Statistical Equilibrium 294
Prices, Lagrange Multipliers, and Preferences 297
Priors, Validation, and Prediction 298
Other Classical Examples 300
11 Causal Inference via Constraint Satisfaction 307
Definitions 308
Info- Metrics and Nonmonotonic Reasoning 309
Typicality and Info- Metrics 316
The Principle of Causation 316
Info- Metrics and Causal Inference 318
Causality, Inference, and Markov Transition Probabilities: An Example 319
Inferred Causal Influence 322
12 Info- Metrics and Statistical Inference: Discrete Problems 334
Discrete Choice Models: Statement of the Problem 335
Definitions and Problem Specification 339
The Unconstrained Model as a Maximum Likelihood 340
The Constrained Optimization Model 341
The Info- Metrics Framework: A Generalized Likelihood 343
Real- World Examples 345
Tailoring Political Messages and Testing the Impact of Negative
Background on the Congressional Race and the Survey 346
Inference, Prediction, and the Effect of Different Messages 346
Background on Loans, Minorities, and Sample Size 347
Inference, Marginal Effects, Prediction, and Discrimination 348
The Benefits of Info- Metrics for Inference in Discrete Choice Problems 351
13 Info- Metrics and Statistical Inference: Continuous Problems 357
Continuous Regression Models: Statement of the Problem 358
Definitions and Problem Specification 359
Unconstrained Models in Traditional Inference 359
Rethinking the Problem as a Constrained Optimization 361
Specific Cases: Empirical and Euclidean Likelihoods 367
Exploring a Power Law: Shannon Entropy Versus Empirical Likelihood 371
Theoretical and Empirical Examples 373
Information- Theoretic Methods of Inference: Stochastic Moment Conditions 376
Misspecification 386
The Benefits of Info- Metrics for Inference in Continuous Problems 388
Information and Model Comparison 390
14 New Applications Across Disciplines 411
Option Pricing 412
Generalized Case: Inferring the Equilibrium Distribution 416
Implications and Significance 418
Predicting Coronary Artery Disease 418
The Complete Sample 420
Out-of-Sample Prediction 423
Sensitivity Analysis and Simulated Scenarios 424
Implications and Significance 425
Improved Election Prediction Using Priors on Individuals 426
Analyses and Results 427
The Data 427
The Priors and Analyses 428
Implications and Significance 431
Predicting Dose Effect: Drug- Induced Liver Injury 432
Inference and Predictions 434
A Linear Model 434
Analyzing the Residuals: Extreme Events 437
Implications and Significance 439
Epilogue 446
List of Symbols 449
Index 451
List of Figures

1.1 Chapter dependency chart 7
3.1 A graphical illustration of the information, entropy, and probability relationships for a binary random variable with probabilities p and 1–p 40
3.2 A simple representation of the interrelationships among entropies and mutual information for two dependent random variables 48
4.1 A two-dimensional representation of the maximum entropy solution for a discrete probability distribution defined over two possible events 65
4.2 A geometrical view of maximum entropy 66
4.3 A graphical representation of the log inequality 69
5.1 The distribution for rainfall in southwest England over the period 1914 to 1962 109
5.2 Pareto GDP tail distribution of the 39 largest countries (20% of the world's countries) in 2012 on a log-log plot of the distribution versus the GDP in US$ 113
5.3 A graphical illustration of the size distribution of firms in Uniformia under the a priori assumption that all states are equally likely 116
5.4 Entropy contours of the three assets 120
5.5 Network aggregation from 12 nodes to 7 nodes with simple weights 127
5.6 Two nonlinear network aggregation examples 129
6.1 A simple representation of the inferred temperature-range joint distribution for New York City 141
6.2 A higher-dimensional surprisal representation of the New York City weather results 142
8.1 A geometrical view of cross entropy with uniform priors 198
8.2 A geometrical view of cross entropy with nonuniform priors 199
8.3 The two-dice example, featuring a graphical representation of the relationship between elementary outcomes (represented by dots) and events 202
9.1 A simple representation of the info-metrics stochastic moments solution for a discrete, binary random variable 240
9.2 A simplex representation of the info-metrics solution for a discrete random variable with three possible outcomes 241
9.3 A simplex representation of the three-sided-die version of the example 242
9.4 The inferred Lagrange multipliers of two measurements of a six-sided die as functions of the support bounds for the errors 246
9.5 A simplex representation of the solution to a mixed-theory, three-event problem 260
9.6 The info-metrics constraints and noise 265
9.7 A two-dimensional representation of the info-metrics problem and solution 266
9.8 A simplex representation of the info-metrics problem and solution 267
9.9 A simplex representation of the info-metrics framework and solution for a discrete probability distribution defined over three possible events and nonuniform priors 270
11.1 The Martian creatures, part I 315
11.2 The Martian creatures, part II 315
13.1 Graphical comparisons of the Rényi, Tsallis, and Cressie-Read entropies of order α for a binary variable with value of 0 or 1 366
13.2 A comparison of the theoretical Benford (first-digit) distribution with the ME and EL inferred distributions under two scenarios 374
13.3 A simplex representation of the info-metrics solution for a linear regression problem with three parameters and ten observations 382
14.1 Risk-neutral inferred probability distributions of a Wells Fargo call option for October 9, 2015, specified on September 30, 2015 415
14.2 The predicted probability (gray line) of each patient together with the correct diagnosis (dark points) of being diseased or healthy 422
14.3 The predicted (out-of-sample) probability (gray) of each patient together with the correct diagnosis (dark points) of being diseased
14.7 The return levels for the four different doses based on the info-metrics model of the first stage when dose is treated as a set of binary variables 438
List of Tables

6.1 Correct Versus Inferred Multipliers 147
9.1 A Schematic Representation of a Three-State Transition Matrix 251
12.1 Inferred Parameters of Bank A 349
12.2 Inferred Probabilities for Bank A Using a More Comprehensive Analysis 350
14.1 The Marginal Effects of the Major Risk Factors and Individual Characteristics for Patients Admitted to the Emergency Room with Some Type of Chest Pain 420
14.2 Prediction Table of Both the Risk Factor Model (Regular Font) and the More Inclusive Model Using Also the Results of the Two Tests (Italic Font) 421
14.3 Diagnostic Simulation 424
14.4 Prediction Table of the Binary Voting Data Based on the November Sample 429
14.5 Prediction Table of the Multinomial Voting Data Based on the November Sample 429
14.6 Inferred Coefficients of the First-Stage Regression of the Drug-Induced Liver Damage Information 436
List of Boxes

1.1 A Concise Historical Perspective 4
2.1 On the Equivalence of Probabilities and Frequencies 14
2.2 A Simple Geometrical View of Decision-Making for Underdetermined Problems 16
3.1 Information and Guessing 35
3.2 Information, Logarithm Base, and Efficient Coding: Base 3 Is the Winner 36
3.3 Information and Entropy: A Numerical Example 41
4.1 Graphical Representation of Inequality Constraints 67
4.2 Temperature and Its Conjugate Variable 75
4.3 Primal-Dual Graphical Relationship 81
4.4 Bayes' Theorem 85
4.5 Maximum Entropy Inference: A Basic Recipe 89
5.1 Info-Metrics and Tomography 121
5.2 Networks for Food Webs 123
6.1 The Bose-Einstein Distribution 139
6.2 Prediction Table 146
6.3 Brain Tumor: Definitions and Medical Background 148
6.4 Prediction Accuracy, Significance Level, and miRNA 157
7.1 Maximum Entropy and Statistics: Interrelationships Among Their Objectives and Parameters 170
7.2 Variance and Maximum Entropy 172
7.3 Relative Entropy and the Cramér-Rao Bound 174
7.4 Information, Maximum Entropy, and Compression: Numerical Examples 183
8.1 Multivariate Discrete Distributions: Extending the Two-Dice Problem 205
8.2 Size Distribution Revisited: Constructing the Priors 206
8.3 Simple Variable Transformation 216
8.4 Priors for a Straight Line 218
9.1 Incorporating Theoretical Information in the Info-Metrics Framework: Inferring Strategies 255
10.1 A Toy Model of Single-Lane Traffic 299
11.1 A Six-Sided-Die Version of Default Logic 312
11.2 Markov Transition Probabilities and Causality: A Simulated Example 323
12.1 Die, Conditional Die, and Discrete Choice Models 337
13.1 Rényi's and Shannon's Entropies 365
13.2 Three-Sided-Die and Information-Theoretic Methods of Inference 370
13.3 Constraints from Statistical Requirements 383
13.4 Inequality and Nonlinear Constraints from Theory 383
Acknowledgments

This book is a result of what I have learned via numerous discussions, debates, tutorials, and interactions with many people over many years and across many disciplines. I owe thanks and gratitude to all those who helped me with this project. I feel fortunate to have colleagues, friends, and students who were willing to provide me with their critiques and ideas.

First, I wish to thank Raphael (Raphy) Levine for his contributions to some of the ideas in the early chapters of the book. In fact, Raphy's contributions should have made him an equal coauthor for much of the material in the first part of this book. We had many conversations and long discussions about info-metrics. We bounced ideas and potential examples back and forth. We sat over many cups of coffee and glasses of wine trying to understand—and then reconcile—the different ways natural and social scientists view the world and the place of info-metrics within that world. Raphy also made me understand the way prior information in the natural sciences can often emerge from the grouping property. For all of that, I am grateful to him.

Special thanks go to my colleague and friend Robin Lumsdaine, who has been especially generous with her time and provided me with comments and critiques on many parts of this book. Our weekly morning meetings and discussions helped to clarify many of the info-metrics problems and ideas discussed here. Ariel Caticha and I have sat together many times, trying to understand the fundamentals of info-metrics, bouncing around ideas, and bridging the gaps between our disciplines. In addition, Ariel provided me with his thoughts and sharp critique on some of the ideas discussed here. My good colleague Alan Isaac is the only one who has seen all of the material discussed in this book. Alan's vision and appraisal of the material, as well as major editorial suggestions on an earlier version, were instrumental. Furthermore, his careful analysis and patience during our many discussions contributed significantly to this book.

My academic colleagues from across many disciplines were generous with their time and provided me with invaluable comments on parts of the book. They include Radu Balan, Nataly Kravchenko-Balasha, Avi Bhati, Min Chen, J. Michael (Mike) Dunn, Mirta Galesic, Ramo Gencay, Boris Gershman, Justin Grana, Alastair Hall, Jim Hardy, John Harte, Kevin Knuth, Jeff Perloff, Steven Kuhn, Sid Redner, Xuguang (Simon) Sheng, Mike Stutzer, Aman Ullah, and John Willoughby. Min Chen also provided guidance on all of the graphics incorporated here, much of which was new to me. I also thank all the students and researchers who attended my info-metrics classes and tutorials during the years; their questions and critique were instrumental for my understanding of info-metrics. My students and colleagues T. S. Tuang Buansing, Paul Corral, Huancheng Du, Jambal Ganbaatar, and Skipper Seabold deserve special thanks. Tuang helped with all of the figures and some of the computational analysis. Huancheng worked on two of the major applications and did some of the computational work. Ganbaatar was instrumental in two of the applications. Skipper was instrumental in putting the Web page together, translating many of the codes to Python, and developing a useful testing framework for the codes. Paul tested most of the computer codes, developed new codes, helped with some of the experiments, and put all the references together. I also thank Arnob Alam, who developed the code for aggregating networks, one of the more complicated programs used in this book, and Aarti Reddy, who together with Arnob helped me during the last stage of this project.

I owe a special thanks to my former editor from Oxford University Press, Scott Parris. I have worked with Scott for quite a while (one previous book), and I am always grateful to him for his wisdom, thoughtful suggestions, recommendations, and patience. I am also thankful to Scott for sticking by me and guiding me during the long birth process of this book. I also thank David Pervin, my new editor, for his patience, suggestions, and effort. Finally, I thank Sue Warga, my copyeditor, for her careful and exceptional editing.

I am grateful for the institutions that hosted me, and for all the resources I have received in support of this project. I am indebted to the Info-Metrics Institute and its support of some of my research assistants and students. I thank the Faculty of Science at the Hebrew University for hosting me twice during the process of writing this book. I thank Raphy Levine for sharing his grant provided by the European Commission (FP7 Future and Emerging Technologies—Open Project BAMBI 618024) to partially support me during my two visits to the Hebrew University. I thank the Santa Fe Institute (SFI) for hosting me, and partially supporting me, during my many visits at the Institute during the last three years. I also thank my many colleagues at SFI for numerous enchanting discussions and for useful suggestions that came up during my presentations of parts of this book. I thank Pembroke College (Oxford) for hosting me a few times during the process of writing this book. I am also grateful to Assen Assenov from the Center for Teaching, Research and Learning at American University for his support and help.

Special thanks to Maureen and Ben for their contributions and edits.
Introduction
Chapter Contents
The Problem and Objectives 1
Outline of the Book 3
References 7
The Problem and Objectives
The material in this book derives from the simple observation that the available information is most often insufficient to provide a unique answer or solution for most interesting decisions or inferences we wish to make. In fact, insufficient information—including limited, incomplete, complex, noisy, and uncertain information—is the norm for most problems across all disciplines.

The pervasiveness of insufficient information across the sciences has resulted in the development of discipline-specific approaches to dealing with it. These different approaches provide different insights into the problem. They also provide grist for an interdisciplinary approach that leverages the strengths of each. This is the core objective of the book. Here I develop a unified constrained optimization framework—I call it info-metrics—for information processing, modeling, and inference for problems across the scientific spectrum. The interdisciplinary aspect of this book provides new insights and synergies between distinct scientific fields. It helps create a common language for scientific inference.

Info-metrics combines the tools and principles of information theory, within a constrained optimization framework, to tackle the universal problem of insufficient information for inference, model, and theory building. In broad terms, info-metrics is the discipline of scientific inference and efficient information processing. This encompasses inference from both quantitative and qualitative information, including nonexperimental information, information and data from laboratory experiments, data from natural experiments, the information embedded in theory, and fuzzy or uncertain information from varied sources or assumptions. The unified constrained optimization framework of info-metrics helps resolve the major challenge to scientists and decision-makers of how to reason under conditions of incomplete information.

In this book I provide the mathematical and conceptual foundations for info-metrics and demonstrate how to use it to process information, solve problems, and construct models or theories across all scientific disciplines. I present a framework for inference and model or theory building that copes with limited, noisy, and incomplete information. While the level and type of uncertainty can differ among disciplines, the unified info-metrics approach efficiently handles inferential problems across disciplines using all available information. The info-metric framework is suitable for constructing and validating new theories and models, using observed information that may be experimental or nonexperimental. It also enables us to test hypotheses about competing theories or causal mechanisms. I will show that the info-metrics framework is logically consistent and satisfies all important requirements. I will compare the info-metrics approach with other approaches to inference and show that it is typically simpler and more efficient to use and apply.

Info-metrics is at the intersection of information theory, statistical methods of inference, applied mathematics, computer science, econometrics, complexity theory, decision analysis, modeling, and the philosophy of science. In this book, I present foundational material emerging from these sciences as well as more detailed material on the meaning and value of information, approaches to data analysis, and the role of prior information. At the same time, this primer is not a treatise for the specialist; I provide a discussion of the necessary elementary concepts needed for understanding the methods of info-metrics and their applications. As a result, this book offers even researchers who have minimal quantitative skills the necessary building blocks and framework to conduct sophisticated info-metric analyses. This book is designed to be accessible for researchers, graduate students, and practitioners across the disciplines, requiring only some basic quantitative skills and a little persistence.

With this book, I aim to provide a reference text that elucidates the mathematical and philosophical foundations of information theory and maximum entropy, generalizes it, and applies the resulting info-metrics framework to a host of scientific disciplines. The book is interdisciplinary and applications-oriented. It provides all the necessary tools and building blocks for using the info-metrics framework for solving problems, making decisions, and constructing models under incomplete information. The multidisciplinary applications provide a hands-on experience for the reader. That experience can be enhanced via the exercises and problems at the end of each chapter.
Outline of the Book
The plan of the book is as follows. The current chapter is an introductory one. It expresses the basic problem and describes the objectives and outline of the book. The next three chapters present the building blocks of info-metrics. Chapter 2 provides the rationale for using constrained optimization to do inference on the basis of limited information. This chapter invokes a specific decision function to achieve the kind of inference we desire. It also summarizes the axioms justifying this decision function. Despite the axiomatic discussion, this is a nontechnical chapter. Readers familiar with constrained optimization and with the rationale of using entropy as the decision function may even skip this chapter.

The following two chapters present the mathematical framework underpinning the building blocks of Chapter 2. Chapter 3 explores the basic metrics of info-metrics; additional quantities will be defined in later chapters. Chapter 4 formulates the inferential problem as a maximum entropy problem within the constrained optimization framework of Chapter 2, which is then formulated as an unconstrained optimization. Chapter 4 also develops the methods of validation to evaluate the inferred solutions.

The two chapters after that provide a mix of detailed cross-disciplinary applications illustrating the maximum entropy method in action. They demonstrate its formulation, its simplicity, and its generality in real-world settings. Chapter 5 starts with a relatively simple set of problems. Chapter 6 presents more advanced problems and case studies.

Chapter 7 develops some of the basic properties of the info-metrics framework. It builds directly on Chapter 4 and concentrates on the properties of efficiency, optimality, and sufficiency. Chapter 7 fully quantifies the notion of "best solution."
Having formulated the basic building blocks, the book moves on to the broader, more general info-metrics framework. Chapter 8 introduces the concept of prior information and shows how to incorporate such information into the framework. This chapter also takes up the critical question of how to construct this prior information, and it explores three different routes. The first approach is based on the grouping property—a property of the Boltzmann-Gibbs-Shannon entropy, defined in Chapter 3—which is less familiar to social and behavioral scientists. The second approach is based on the more obscure concept of transformation groups. Finally, the chapter considers empirical priors—a concept that is familiar to social scientists but often misused. Chapter 8 places special emphasis on the extension of these ideas to common problems in the social sciences. Chapter 9 extends all previous results to accommodate all types of uncertainties, including model and parameter uncertainties. This chapter provides the complete info-metrics framework. All applications and specific problems can be modeled within the complete framework. It encompasses all inferential and model construction problems under insufficient information. Chapter 9 fully develops the complete interdisciplinary vision of this book and the complete info-metrics framework. The examples throughout the book complement that vision.

BOX 1.1 } A Concise Historical Perspective

I provide here a brief historical perspective on some of the major research on inference that leads us to info-metrics. This background is for historical interest only; it is not needed to understand this book.

The problem of inference under uncertainty is as old as human history. Possibly the work of the Greek philosophers and Aristotle (fourth century BC), where the first known study of formal logic started, led to the foundations for logical inference. But not until the seventeenth century were the mathematical foundations of inference under uncertainty formally founded. None of this pre-seventeenth-century work extended "to the consideration of the problem: How, from the outcome of a game (or several outcomes of the same game), could one learn about the properties of the game and how could one quantify the uncertainty of our inferred knowledge of these properties?" (Stigler 1986, 63).

The foundations of info-metrics can be traced to Jacob Bernoulli's work in the late 1600s. He established the mathematical foundations of uncertainty, and he is widely recognized as the father of probability theory. Bernoulli's work is summarized in the Art of Conjecture (1713), published eight years after his death. Bernoulli introduced the "principle of insufficient reason," though at times that phrase is also used to recognize some of Laplace's work. De Moivre and Laplace followed on Bernoulli's work and established the mathematical foundations of the theory of inference. De Moivre's three books (1718, 1738, 1756) on probability, chance, and the binomial expansion appeared before Laplace arrived on the scene. To support himself early on, De Moivre provided tutorials and consulting, in London's coffeehouses, for clients interested in learning mathematics and quantitative inference.

Although De Moivre developed a number of groundbreaking results in mathematics and probability theory and practiced (simple) quantitative inference, his work did not have an immediate impact on the more empirical scientists who were interested in knowing how to convert observable quantities into information about the underlying process generating these observables. Approximately at the time the second edition of De Moivre's Doctrine of Chances was published, Simpson (1755) and Bayes (1764) engaged (independently) in pushing Bernoulli's work toward establishing better tools of inference. But it was Laplace (1774, 1886), with his deep understanding of the notions of inverse probability and "inverse inference," who finally laid the foundations for statistical and probabilistic reasoning or logical inference under uncertainty. The foundations of info-metrics grew out of that work.

To complete this brief historical note, I jump forward almost two centuries to the seminal work of Shannon (1948) on the foundations of information theory. Jaynes recognized the common overall objectives and the common mathematical procedures used in all of the earlier research on inference and modeling, and understood that the new theory developed by Shannon could help in resolving some of the remaining open questions. As a consequence, Jaynes formulated his classical work on the maximum entropy (ME) formalism (1957a, 1957b). Simply stated, facing the fundamental question of drawing inferences from limited and insufficient information, Jaynes proposed a generalization of Bernoulli's and Laplace's principle of insufficient reason. Jaynes's original ME formalism aimed at solving any inferential problem with a well-defined hypothesis space and noiseless but incomplete information. This formalism was subsequently extended and applied by a large number of researchers across many disciplines, including Levine (1980), Levine and Tribus (1979), Tikochinsky, Tishby, and Levine (1984), Skilling (1988), Hanson and Silver (1996), and Golan, Judge, and Miller (1996). Axiomatic foundations for this approach were developed by Shore and Johnson (1980), Skilling (1989), and Csiszar (1991). See also Jaynes 1984 and his nice 2003 text for additional discussion.

Though the present book contains minimal discussion of Bayes' theorem and related Bayesian and information-theoretic methods, the two (Bayes' theorem and information-theoretic inference) are highly related. For a nice exposition, see Caticha 2012 and the recent work of Toda (2012) as well as the original work of Zellner (1988).

Naturally, like all scientific methods and developments, info-metrics grew out of many independent and at times interconnected lines of research and advancements. But unlike most other scientific advancements, info-metrics developed out of the intersection of inferential methods across all disciplines. Thus, rather than providing a long historical perspective, we move straight into the heart of the book. For the history of statistics and probability prior to 1900, see Stigler's classic and insightful 1986 work. For a more comprehensive historical perspective on info-metrics and information-theoretic inference, see Golan 2008, which extends the brief historical thread provided by Jaynes (1978).

Combining the ideas of Chapter 9 with those of the earlier chapters takes us to model and theory building, causal inference, and the relationship between the two. The fundamental problem of model development and theory building is the subject of Chapter 10. The premise of this chapter is that the info-metrics framework can be viewed as a "meta-theory"—a theory of how to construct theories and models given the imperfect information we have. That framework provides a rational perspective that helps us to identify the elements needed for building a reasonably sound model. That premise is demonstrated via multidisciplinary examples, one of which is very detailed. The same building blocks are used to construct each one of these examples. This chapter also places emphasis on the idea that a model should be constructed on all of the information and structure we know or assume, even if part of that information is unobserved. In such cases a mechanism for connecting the observable information to the unobserved entities of interest must be provided. Examples of such a mechanism are discussed as well.
In Chapter 11 the emphasis shifts from model and theory building to causal inference via constraint satisfaction. The term causal inference here is taken to mean the causality inferred from the available information; we infer that A causes B from information concerning the occurrences of both. The first part of this chapter concentrates on nonmonotonic and default logic, which were developed to deal with extremely high conditional probabilities. The second part deals with cause and effect in a probabilistic way given the information we have and the inferential framework we use. The chapter also provides a detailed example that connects some of the more traditional ideas of causal inference to the info-metrics framework.

The next two chapters connect the info-metrics framework with more traditional statistical methods of inference. In particular, they show that the family of information-theoretic methods of estimation and inference are subsumed within the info-metrics framework. These chapters use duality theory to connect info-metrics with all other methods. Chapter 12 concentrates on discrete models. In that setting, specific maximum likelihood approaches are special cases of the info-metrics framework. Chapter 13 concentrates on continuous models, such as linear and nonlinear regression analysis and system of equations analysis. It compares the info-metrics framework with the familiar least-squares technique and other methods of moments approaches for continuous models. Chapter 13 also shows that the info-metrics framework can accommodate possible misspecifications in empirical models. Misspecification issues are common across the social and behavioral sciences, where the researcher does not have sufficient information to determine the functional form of the structure to be inferred. Chapter 13 also demonstrates, via familiar examples, the trade-offs between functional forms (the constraints in our framework) and the decision function used in the inference. To demonstrate this, the chapter shows that two different formulations yield the same inferred distribution even though one of the two is misspecified.
Chapter 14 provides four detailed, cross-disciplinary applications developed especially for this book. These applications represent diverse fields of investigation: the medical sciences, political science, and finance. The chapter illustrates the generality and simplicity of the info-metrics approach, while demonstrating some of the features discussed throughout the book. Each case study presents the required empirical background, the necessary analytics conditional on the input information, the inferred solution, and a brief summary of its implications.

Each chapter includes exercises and extended problems. Each chapter ends with a notes section, which summarizes the main references to that chapter as well as readings on related topics. The book is complemented by a website, http://info-metrics.org, that provides supporting codes and data sets (or links to the data) for many of the examples presented in the book. It also provides extended analyses of some of the examples as well as additional examples.

FIGURE 1.1 Chapter dependency chart. The chart provides the logical flow of the book. Though there are many examples throughout the book, the three chapters devoted solely to examples, shown above, can be read in order (see arrows) or at any time after reading the relevant chapters.

A simple chart illustrating the logical dependencies among the chapters is provided above. It shows the flow of the book. Though I recommend reading the chapters in order, the diagram helps those who may be more informed or are just interested in a certain topic or problem. It also provides the necessary details for instructors and students.

References

Bayes, T. 1764. "An Essay Towards Solving a Problem in the Doctrine of Chances." Philosophical Transactions of the Royal Society of London 53: 370–418.
Bernoulli, J. 1713. Art of Conjecturing. Basel: Thurneysen Brothers.
Caticha, A. 2012. Entropic Inference and the Foundations of Physics. Monograph commissioned by the 11th Brazilian Meeting on Bayesian Statistics, EBEB 2012. São Paulo: University of São Paulo Press.
Trang 31Csiszar, I 1991 “Why Least Squares and Maximum Entropy? An Axiomatic Approach to
Inference for Linear Inverse Problem.” Annals of Statistics 19: 2032– 66.
De Moivre, A 1718 The Doctrine of Chances London: W Pearson.
— — — 1738 The Doctrine of Chances 2nd ed London: Woodfall.
— — — 1756 The Doctrine of Chances: or, A Method for Calculating the Probabilities of Events
in Play 3rd ed London: A Millar.
Golan, A 2008 “Information and Entropy Econometrics: A Review and Synthesis.”
Foundations and Trends in Econometrics 2, nos 1– 2: 1– 145.
Golan, A., G Judge, and D Miller 1996 Maximum Entropy Econometrics: Robust Estimation
with Limited Data Chichester, UK: John Wiley & Sons.
Hanson, K. M., and R. N. Silver, eds. 1996. Maximum Entropy and Bayesian Methods. Dordrecht: Kluwer Academic.
Jaynes, E. T. 1957a. "Information Theory and Statistical Mechanics." Physical Review 106, no. 4: 620–30.
———. 1957b. "Information Theory and Statistical Mechanics. II." Physical Review 108, no. 2: 171–90.
———. 1978. "Where Do We Stand on Maximum Entropy." In The Maximum Entropy Formalism, ed. R. D. Levine and M. Tribus, 15–118. Cambridge, MA: MIT Press.
———. 1984. "Prior Information and Ambiguity in Inverse Problems." In Inverse Problems, ed. D. W. McLaughlin, 151–66. Providence, RI: American Mathematical Society.
———. 2003. Probability Theory: The Logic of Science. Cambridge: Cambridge University Press.
Laplace, P. S. 1774. "Mémoire sur la probabilité des causes par les évènemens." Mémoires de l'Académie Royale des Sciences 6: 621–56.
———. 1886. Théorie analytique des probabilités. 3rd ed. Paris: Gauthier-Villars. Originally published 1820.
Levine, R. D. 1980. "An Information Theoretical Approach to Inversion Problems." Journal of Physics A: Mathematical and General 13, no. 1: 91.
Levine, R. D., and M. Tribus, eds. 1979. The Maximum Entropy Formalism. Cambridge, MA: MIT Press.
Shannon, C. E. 1948. "A Mathematical Theory of Communication." Bell System Technical Journal 27: 379–423.
Shore, J. E., and R. W. Johnson. 1980. "Axiomatic Derivation of the Principle of Maximum Entropy and the Principle of Minimum Cross-Entropy." IEEE Transactions on Information Theory IT-26, no. 1: 26–37.
Simpson, T. 1755. "A Letter to the Right Honourable George Earl of Macclesfield, President of the Royal Society, on the Advantage of Taking the Mean of a Number of Observations, in Practical Astronomy." Philosophical Transactions of the Royal Society of London 49: 82–93.
Skilling, J. 1988. "The Axioms of Maximum Entropy." In Maximum-Entropy and Bayesian Methods in Science and Engineering, ed. Gary J. Erickson and C. Ray Smith, 173–187. Fundamental Theories of Physics, vols. 31–32. Boston: Kluwer Academic.
———. 1989. "Classic Maximum Entropy." In Maximum Entropy and Bayesian Methods, Cambridge, England, 1988, ed. J. Skilling, 45–52. Boston: Kluwer Academic.
Stigler, S. M. 1986. The History of Statistics: The Measurement of Uncertainty Before 1900. Cambridge, MA: Harvard University Press.
Tikochinsky, Y., N. Z. Tishby, and R. D. Levine. 1984. "Alternative Approach to Maximum-Entropy Inference." Physical Review A 30, no. 5: 2638–44.
Toda, A. A. 2012. "Axiomatization of Maximum Entropy Without the Bayes Rule." In AIP Conference Proceedings. New York: American Institute of Physics.
Zellner, A. 1988. "Optimal Information Processing and Bayes Theorem." American Statistician 42: 278–84.
Rational Inference
A CONSTRAINED OPTIMIZATION FRAMEWORK
Chapter Contents
Inference Under Limited Information 11
Qualitative Arguments for Rational Inference 11
Probability Distributions: The Object of Interest 12
Constrained Optimization: A Preliminary Formulation 15
The Basic Questions 18
Motivating Axioms for Inference Under Limited Information 19
Axioms Set A: Defined on the Decision Function 20
Axioms Set B: Defined on the Inference Itself 20
Axioms Set C: Defined on the Inference Itself 21
Axioms Set D: Symmetry 22
Inference for Repeated Experiments 22
Axioms Versus Properties 24
…we desire. In the second part I summarize four sets of axioms to justify the decision function I argue for in the first part.

The second part, axioms, is not essential for the mathematical and technical understanding nor for the implementation of the inferential methods discussed in the following chapters. If you skip it now, I hope that curiosity will bring you back to it once you have perused a few applications in the coming chapters.

I begin by defining rational inference in terms of an information decision function, which I call H. Then I briefly reflect on four sets of axioms that provide alternative logical foundations for info-metric problems dealing with inference of probability distributions. All four alternatives point to the entropy function of Boltzmann, Gibbs, and Shannon. I then discuss an important subset of problems: those where an experiment is independently repeated a very large number of times. These types of problems are quite common in the natural sciences, but they can also arise elsewhere. When a system is governed by a probability distribution, the frequency of observing a certain event of that system, in a large number of trials, approximates its probability. I show that, just as in other inferential problems I discuss, the repeated experiment setting also naturally leads us to the entropy function of Boltzmann, Gibbs, and Shannon as the decision function of choice. Finally, rather than investigate other sets of axioms leading to the same conclusion, I take the complementary approach and reflect on the basic properties of the inferential rule itself.
Inference Under Limited Information
QUALITATIVE ARGUMENTS FOR RATIONAL INFERENCE
I discuss here a framework for making rational inference based on partial, and often uncertain and noisy, information. By partial or noisy information, I mean that the problem is logically underdetermined: there is more than a single inference that can be logically consistent with that information. But even in the rare case that we are lucky and there is no uncertainty surrounding the incomplete information we face, there may still be more than a single solution. Think, for example, of the very trivial problem of figuring out the ages of Adam and Eve from the information that their joint age is fifty-three. This problem generates a continuum of solutions. Which one should we choose? The problem is magnified if there is additional uncertainty about their joint age. Again, there is more than a single inference that can be logically consistent with that information. In order to choose among the consistent inferences, we need to select an inferential method and a decision criterion.
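To preview how a decision criterion singles out one of these consistent solutions, here is a minimal worked version of the Adam and Eve example. Treating the two ages as shares of the joint age is my own illustrative device rather than the book's formal setup: write the ages as a_1 and a_2 with a_1 + a_2 = 53, define the shares p_i = a_i / 53, and pick the shares that maximize an entropy-type criterion (the kind of decision function argued for later in this chapter):

\[
\max_{p_1, p_2 \ge 0} \;\left( -p_1 \ln p_1 - p_2 \ln p_2 \right)
\quad \text{subject to} \quad p_1 + p_2 = 1 .
\]

The maximum is at p_1 = p_2 = 1/2, that is, ages of 26.5 each. A split such as 40 and 13 satisfies the same constraint, but it is not the solution this particular criterion selects; a different decision function could select a different point.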
The inferential method for which I argue is grounded in ordinary notions of rational choice as optimization of an objective, or decision criterion—often called a decision or utility function. We optimize that decision function in order to choose a solution for our inferential problem from a set of logically consistent solutions. The process of choosing the solution with the help of our decision criterion is called an optimization process. We optimize (minimize or maximize) that decision function while taking into account all of the information we know. No other hidden information, such as hidden structures, is imposed in the inferential process. This optimization process is our characterization of rational inference.
In this chapter, I discuss the logic for using a particular decision function. I also provide details about the inferential method: what the decision function is optimized on, what I mean by "optimization," and in what way that optimization should be done.

There are other ways of attacking underdetermined problems. In particular, we might try to transform the problem with additional resources or assumptions. For example, we could collect more data or impose restrictions on functional forms. The first option is often unrealistic: we must deal with the data at hand. Further, with noisy information the problem may remain underdetermined regardless of the amount of information we can gather. The second option has an unattractive feature: it requires imposing structure we cannot verify. Therefore, I treat all of our inferential problems as inherently underdetermined.
PROBABILITY DISTRIBUTIONS: THE OBJECT OF INTEREST
Practically all inferential problems, across all disciplines, deal with the inference of probability distributions. These distributions summarize our inferences about the structure of the systems analyzed. With the help of these inferred distributions, we can express the theory in terms of the inferred parameters, or any other quantity of interest. Info-metrics, like other inferential methods, is a translation of limited information about the true and unknown probability density function (pdf) toward a greater knowledge of that pdf. Therefore, we express any problem in terms of inferring a probability distribution—often a conditional probability distribution. That distribution is our fundamental unobserved quantity of interest.

Naturally, we want the data to inform our inferences. We specify observed quantities via some functions of the data—such as moments of the data, or other linear or nonlinear relationships. We choose functions that connect the unobserved probability distributions to the observed quantities. These functions are called the constraints; they capture the information we know (or assume we know) and use for our inference.

In info-metric inference, the entity of interest is unobserved. For example, it may be a probability (or conditional probability) distribution of characteristics in a certain species, or a parameter capturing the potential impact of a certain symptom of some disease. The observed quantities are usually some expected values, such as the arithmetic or geometric means.

Often, the unobserved quantities are the micro states, while the observed quantities capture the macro state of the system. By micro state, I mean the precise details of the entities of interest—the elements of the system studied, such as the exact positions and velocities of individual molecules in a container of gas, or the exact allocation of goods and means of production among agents in a productive social system. By macro state, I mean the values of attributes of the population or system as a whole, such as the volume, mass, or total number of molecules in a container of gas. A single macro state can correspond to many different possible micro states. The macro state is usually characterized by macro variables, whereas different micro states may be associated with a particular set of values of the macro variables. A statistical analogy of these micro-macro relationships may help. The micro state provides a high-resolution description of the physical state of the system that captures all the microscopic details. The macro state provides a coarser description of lower resolution, in terms of averages or moments of the micro variables, where different micro states can have similar moments.
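Before moving on, it may help to make the notion of a constraint concrete with a small stylized example of my own (it is not one of the book's case studies). Let X be the outcome of a six-sided die, so K = 6 with x_k = k, and suppose the only observed quantity is the arithmetic mean of a large number of throws, say 4.5. The constraint connecting the unobserved probabilities to that observed value is

\[
\sum_{k=1}^{6} p_k \, x_k = 4.5 ,
\qquad \text{together with normalization} \quad \sum_{k=1}^{6} p_k = 1 .
\]

Two pieces of information against six unknown probabilities: many distributions satisfy both conditions, which is exactly the underdetermined situation the decision function discussed next is meant to resolve.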
Once the constraints are specified, we optimize a decision function subject to these constraints and other requirements. That decision function is a function of the fundamental probability distribution of interest. There are other potential decision criteria, some of which I discuss briefly in subsequent chapters, but in this chapter I provide the rationale for the choice of decision function used here. Using a fundamental method for finding optima—the tools of the calculus of variations in general, and the variational principle (attributed to Leibniz around 1707) in particular—we employ our decision criterion for inferring probability distributions conditional on our limited information.
Our objects of interest are most often unobserved probability distributions. We infer these probabilities using the tools of info-metrics. With these inferred probability distributions we can predict, or further infer, any object of interest. However, dealing with observed information as the inputs for our inference means that in practice we deal with frequencies rather than with "pure" probabilities. In our analyses, I do not differentiate among these quantities. I provide the reasoning for that in Box 2.1. Briefly stated, we can say that probabilities are never observed. We can think of probability as a likelihood—a theoretical expectation of the frequency of occurrence based on some laws of nature. Some argue that probabilities are grounded in these laws. (See Box 2.1 for a more detailed discussion.) Others consider them to be somewhat subjective or involving degrees of belief. Regardless of the exact definition or the researcher's individual interpretation, we want to infer these probabilities. Stated differently, regardless of the exact definition, I assume that either there is one correct (objective) probability distribution that the system actually has or that there is a unique most rational way to assign degrees of belief to the states of the system. These are the quantities we want to infer. Frequencies, on the other hand, are observed. They are not subjective. We use our inferred probabilities to update our subjective probabilities using the information we have.

BOX 2.1 } On the Equivalence of Probabilities and Frequencies

I summarize here the reasoning for conceptually comparing probabilities with frequencies.

One way to interpret probabilities is as likelihoods. The likelihood of an event is measured in terms of the observed favorable cases in relation to the total number of cases possible. From that point of view, a probability is not a frequency of occurrence. Rather, it is a likelihood—a theoretical expectation of the frequency of occurrence based on some laws of nature.

Another, more commonly used interpretation is that probabilities convey the actual frequencies of events. But that interpretation holds only for events that occur under similar circumstances (events that arise from the exact same universe) an arbitrarily large number of times. Under these circumstances, the likelihood and frequency definitions of probabilities converge as the number of independent trials becomes large. In that case, the notion of probability can be viewed in an objective way as a limiting frequency.

Thus, under a repeated experiment setup, I treat probabilities and frequencies similarly. This view is quite similar to that in Gell-Mann and Lloyd 1996.

But how should we handle the probabilities of events that are not repeatable—a common case in info-metrics inference? The problem here is that we cannot employ the notion of probability in an objective way as a limiting frequency; rather, we should use the notion of subjective probability as a degree of rational expectation (e.g., Dretske 2008).

In that case we can relate the notion of subjective probabilities (subjective degree of rational expectation) to that of a subjective interpretation of likelihood. Within an arbitrarily large number of repeated experiments (arising from the same universe), our subjective probabilities imply relative frequency predictions. But this doesn't solve our problem completely, as we still may have non-reproducible events. So I add the following: For non-reproducible events, probabilities and observed frequencies are not the same. But the best we can do is use our observed information under the strict assumption that the expected values we use for the inference are correct. If they are, then we know that our procedure will provide the desired inferred probabilities. In that case we can say that our "predicted frequencies" correspond to our "subjective probabilities."

Given the above arguments, our formulation in the repeated experiments section, and the axiom at the beginning of the axioms section—all observed samples come from a well-defined population (universe) even if that universe is unknown to us—I treat probabilities and frequencies as likelihoods (even if subjective at times).

For a deeper discussion of this, see the original work of Keynes (1921), Jeffreys (1939), Cox (1946), and Jaynes (1957a and b) and more recent discussions in the work of Gell-Mann and Lloyd (1996, 2003) and MacKay (2003).
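The limiting-frequency view summarized in Box 2.1 is easy to check numerically. The snippet below is only an illustrative sketch; the three-state distribution and the sample sizes are arbitrary choices of mine, not quantities taken from the book. It draws independent outcomes from a fixed distribution and shows the observed frequencies approaching the underlying probabilities as the number of trials grows.

```python
import numpy as np

rng = np.random.default_rng(0)
p_true = np.array([0.2, 0.5, 0.3])      # assumed "true" probabilities of three states

for n in (100, 10_000, 1_000_000):      # increasing numbers of independent trials
    draws = rng.choice(3, size=n, p=p_true)
    freq = np.bincount(draws, minlength=3) / n
    print(n, np.round(freq, 3))         # observed frequencies approach p_true as n grows
```

For a non-reproducible event no such limit is available, which is why the box falls back on the notion of subjective probability.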
CONSTRAINED OPTIMIZATION: A PRELIMINARY FORMULATION
Simply stated, the basic inferential problem can be specified as follows. We want to infer a discrete probability distribution P associated with the distribution of values of a random variable X, where X can take on K discrete and distinct values, k = 1, ..., K. There are two basic constraints on these probabilities: the probability of any particular value is not negative, p_k ≥ 0, and the probabilities must assume values such that Σ_k p_k = 1. The latter is called the normalization constraint. The random variable X is a certain variable of interest with K mutually exhaustive states, such as the number of atoms or molecules in a gas, or individuals' wealth, or the incidence of a certain disease. In a more general setting, the K possible values of X stand for possible states of a social, behavioral, physical, or other system, or possibly the set of outcomes of a certain experiment, or even the classification of different mutually exclusive propositions.

Let f_m(X), m = 1, ..., M, be functions with mean values defined as

\[
y_m \equiv \langle f_m(X) \rangle = E\!\left[ f_m(X) \right] = \sum_{k} p_k \, f_m(x_k),
\]

where ⟨·⟩ and E[·] are synonymous notations for the expectation operation, and ⟨f_m(X)⟩ and y_m stand for the expected value of f_m(X). Our inference will be based on M + 1 pieces of information about the p_k's, namely, normalization and the M expected values. This is the only information we have. In this chapter we are interested in the underdetermined case (the most common case in info-metrics inference), where (M + 1) < K: the number of constraints is smaller (often very much smaller) than the number, K, of unknown probabilities that we wish to infer.

Suppose we have a differentiable decision function, H(P). Then our inference problem is amenable to standard variational techniques. We need to maximize (or minimize) H subject to the M + 1 constraints. The mathematical framework for solving the problem is within the field of calculus, or for continuous probability density functions it is within the calculus of variations (or the variational principle), by searching for a stationary value, say the minimum or maximum, of some function. Expressed in mathematical shorthand, the constrained optimization inferential framework is: over all probability distributions P,

\[
\begin{aligned}
\text{Maximize} \quad & H(P) \\
\text{subject to} \quad & \sum_{k} p_k \, f_m(x_k) = y_m, \quad m = 1, \ldots, M; \\
& \sum_{k} p_k = 1 \quad \text{and} \quad p_k \ge 0, \quad k = 1, \ldots, K.
\end{aligned}
\]
Consider the simplest underdetermined problem (top panel of the figure below) of solving for x1 and x2 given the linear condition W=α1 1x +α2 2x where
α1 and α2 are known constants and the value of W is the constraint (See dark
line, with a negative slope, on the top panel of the figure that shows the x1 and
x2 plane.) Every point on that (dark) line is consistent with the condition As the plot shows, there are infinitely many such points Which point should we choose? We need a “decision- maker.” Such a decider can be mathematically
viewed as a concave function H whose contours are plotted in the figure Maximizing the value of the concave function H subject to the linear constraint
W makes for a well- posed problem: at the maximal value of H that just falls on
the constraint we determine the unique optimal solution x1* and x2* (contour C2
in the figure).
Mathematically, the slope of H must equal the slope of the linear function
at the optimal solution The optimal solution is the single point on H where the
linear constraint is the tangent to the contour (I ignore here the case of a corner
solution where only one of the two x’s is chosen and it is positive.) The contours
to the northeast (above C2) are those that are above the constraint for all their values Those that are below C2 intersect with the constraint at more than a single value Only C2 satisfies the optimality condition.
The bottom panel provides a similar representation of an underdetermined problem, but this time it is the more realistic case where the value of the constraint may be noisy The light gray area in between the dashed lines captures this “noisy” constraint (instead of the dark line in the left panel) The noisy constraint can
be expressed as W=α1 1x +α2 2x +ε, where ε represents the noise such that the mean of ε is zero Equivalently, we can express the noisy information as
W+ ∆W =α1 1x +α2 2x , where now ∆W captures the realized value of the noise
Again the solution is at the point where the plane is tangent to H, but this time the
optimal solution is different due to the noise.
SOME EXAMPLES OF OPTIMIZATION PROBLEMS
Economics— Consumer: Given two goods x1 and x2 , the linear line is the budget constraint; the α’s are the prices of each good (the slope of the line is the price
ratio), and H is a preference function (utility function, which is strictly quasi-
concave) The x* ’s are the consumer’s optimal choice.
Economics— Producer: If the x’s are inputs, the linear line is the input price ratio
and H in this case is some concave production function.
Operation Research: Any optimization problem, of any dimension, where we
need to find an optimal solution for a problem with many solutions The H to be
used is problem specific; often it is in terms of minimizing a certain cost function.
Info- metrics: All the problems we deal with in this book where the objective
is to infer certain quantities (e.g., probabilities) from partial information H
is the entropy (as defined in Chapter 3) The x*’s may be the optimal choice of
probabilities.
(continued)
Trang 40FIGURE BOX 2.2 Constrained optimization of an underdetermined problem: a graphical representation The figure shows the optimal solution of solving for x1 and x2 given the linear condition W=α1 1x +α2 2x , where α1 and α2 are known constants and the value of W is the constraint The top panel shows the perfect
case where there is no additional noise The dark line, with a negative slope, is the constraint Every point on that (dark) line is consistent with the condition There are infinitely many such points We use the concave
function H, whose contours are plotted in the figure, to choose the optimal solution The unique optimal
solution x1* and x2* (Contour C2 in the Figure) is at the maximal value of H that just falls on the constraint
The contours to the north- east (above C2) are those that are above the constraint for all their values Those that are below C2 intersect with the constraint at more than a single value Only C2 satisfies the optimality condition The bottom panel provides a similar representation of an underdetermined problem but where the value of the constraint is noisy The light gray area in between the dashed lines captures this “noisy” constraint:
W=α1 1x +α2 2x +ε , where ε represents the noise such that the mean of ε is zero Again, the solution is at
the point where the plane is tangent to H but this time the optimal solution is different due to the noise.
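To make the preceding formulation, and the info-metrics entry in Box 2.2, concrete, here is a minimal computational sketch that maximizes a concave decision function (the Shannon entropy, formally introduced in Chapter 3) subject to a single linear moment constraint and normalization for a six-sided die. The choice of SciPy's SLSQP solver and of 4.5 as the observed mean are my own illustrative assumptions, not the book's prescribed implementation; Chapter 4 derives this kind of solution analytically via Lagrange multipliers.

```python
import numpy as np
from scipy.optimize import minimize

x = np.arange(1, 7)            # the K = 6 possible outcomes of the die
y_mean = 4.5                   # the single observed expected value (assumed)

def neg_entropy(p):
    # Decision function: negative Shannon entropy (minimizing it maximizes H).
    p = np.clip(p, 1e-12, 1.0)
    return np.sum(p * np.log(p))

constraints = (
    {"type": "eq", "fun": lambda p: np.sum(p) - 1.0},         # normalization
    {"type": "eq", "fun": lambda p: np.sum(p * x) - y_mean},  # moment constraint
)
bounds = [(0.0, 1.0)] * 6                                     # p_k >= 0

p0 = np.full(6, 1.0 / 6.0)     # start from the uniform distribution
res = minimize(neg_entropy, p0, bounds=bounds,
               constraints=constraints, method="SLSQP")

print(np.round(res.x, 4))      # the inferred maximum-entropy distribution
print(res.x @ x)               # check: the inferred mean reproduces 4.5
```

Setting the observed mean to 3.5 instead returns the uniform distribution; with 4.5 the inferred distribution tilts toward the larger outcomes while still reproducing the constrained mean exactly.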