
Statistics, data mining, and machine learning in astronomy



DOCUMENT INFORMATION

Basic information

Title: Statistics, Data Mining, and Machine Learning in Astronomy
Institution: Penn State University
Subject: Statistics, Data Mining, and Machine Learning
Year of publication: 2006
Format:
Pages: 8
File size: 349.43 KB


Contents

Statistics, Data Mining, and Machine Learning in Astronomy: Index


1/f process, 458

1/f² process, 458

α value, 150

absolute deviation, 79

ACF, see correlation functions,

autocorrelation

active learning, 49

Advances in Machine Learning

and Data Mining for

Astronomy (WSAS), 10, 46,

47, 49, 275, 278, 280, 317

Akaike information criterion

(AIC), 134, 352, 406, 432, 442

aliasing, 412

All of Nonparametric Statistics

and All of Statistics: A Concise

Course in Statistical Inference

(Wass10), 9, 69, 85, 123, 128,

134, 199, 243, 251, 254

analytic function, 6

angular frequency, 408

AR(1) process, 464

ARIMA, see autoregressive

models

arithmetic mean, 78; clipped, 94;

standard error of, 83

ARMA, see autoregressive

models

Arnoldi decomposition, 309

associated set, 168

AstroML, 511; installing, 37

astronomical flux

measurements, 15

atmospheric seeing, 410

autocorrelation function, see

correlation functions,

autocorrelation

autocovariance function, 459

Auton Lab, 11

autoregressive models, 461–463;

ARIMA, 463, 465; ARMA,

463, 465; linear, 462

bagging, see bootstrap aggregating

band limited, 412

bandwidth, 378

Bar89, see Statistics: A Guide to the Use of Statistical Methods

in the Physical Sciences

Bayes classifier, 369

Bayes’; rule, 73, 369; theorem, 177, 368

Bayes, Thomas, 175

BayesCosmo, see Bayesian Methods in Cosmology

Bayesian blocks, 442, 451, 465

Bayesian inference, 175

—Bayes factor, 187, 225

—Bayes’ theorem, see Bayes’

theorem

—classifier, 369

—conjugate priors, 184

—consistency principle, 181

—credible region, 179, 185

—empirical methods, 184, 368

—flat prior, 181

—global likelihood, 187

—hierarchical model, 184

—hyperparameters, 184

—hypothesis testing, 180, 188

—improper prior, 181

—indifference principle, 181

—informative priors, 180

—Jeffreys’ odds ratio scale, 187

—MAP estimate, 179

—marginal likelihood, 186

—marginal posterior pdf, 185

—marginalization, 179, 185

—Markov chain Monte Carlo, 230

—maximum entropy principle, 181

—model odds ratio, 186, 225

—model selection, 186, 223

—nonuniform priors, 191

—nuisance parameters, 177, 185

—numerical methods, 229

—Occam’s razor, 189

—parameter estimation, 196; binomial distribution, 206; Cauchy distribution, 208; effects of binning, 215; Gaussian distribution, 196; outlier rejection, 219; signal and background, 213; uniform distribution, 211

—posterior mean, 179

—prior, 176, 177

—prior predictive probability, 178

—priors, 180

—scale-invariant prior, 181

—uninformative priors, 180

Bayesian information criterion (BIC), 134, 190, 352, 406, 432, 442

Bayesian Logical Data Analysis for the Physical Sciences

(Greg05), 9, 105, 182,

184, 231, 232, 243, 408, 413

Bayesian method, 4

Bayesian Methods in Cosmology

(BayesCosmo), 10, 231

Bayesian models, 46

beam convolution, 410

Bernoulli’s theorem, 105

Bessel’s correction, 82

bias, 7, 82

BIC, see Bayesian information

criterion (BIC)

“big O” notation, 44, 45

Bošković, Rudjer, 345

boosted decision tree, 394

boosting, 393, 398, 399

bootstrap, 140, 391

bootstrap aggregating (bagging), 144


Brownian motion, 458

burst signal, 453

c statistic, 147

CAR(1) process, 463, 464

Center for Astrostatistics at

Penn State University, 11

central limit theorem, 105

Cepheid variables, 403, 426

chirp signal, 406, 453

class labels, 135

classification, 8, 145, 368;

Benjamini and Hochberg

method, 147; binomial

logistic regression, 381;

boundary, 146; c statistic,

147; comparison of methods,

397; completeness, 145;

contamination, 145; decision

tree, 399; discriminative, 367,

380, 385, 397; efficiency, 147;

expectation maximization, see expectation maximization; false discovery rate,

147; Gaussian Bayes, 374;

Gaussian naive Bayes, 372,

373, 395; generative, see

generative classification;

GMM Bayes classifier, 377,

398; k-nearest-neighbor, 399;

logistic regression, 381, 398,

399; loss, see loss function;

naive Bayes, 136, 371, 372,

399; nearest-neighbor, 378,

399; periodic light curves,

427, 443; RR Lyrae stars, 380;

sensitivity, 147; simple, 145;

supervised, 4, 365, 443;

unsupervised, 3, 250, 365, 443

cluster finding, 249

clustering, 3, 270; K-means,

270; “friends-of-friends”, 275;

comparison of methods, 281;

dendrogram, 274; hierarchical,

274; max-radius

minimization, 271; mean

shift, 271; minimum

spanning tree, 275;

unsupervised, 250

clusters of galaxies, 4

Cochran’s theorem, 200

cocktail party problem, 313

code management tools, 13;

CVS, 13; Git, 13; GitHub, 13

color–magnitude diagram, 22

comparable set, 168

completeness, 368, 372, 395

completeness vs purity, 8

compressed sensing, 303

conditional density distribution, 379

conditional independence, 376

confidence estimation, 123

contamination, 368, 372

contingency table, 75

convolution, 407; convolving pattern, 410; of two functions, 409, 410; theorem, 409, 410, 419

coordinate gradient descent, 336

correlation coefficient, 109, 115;

Kendall’s, 116; Pearson’s, 109, 115; population, 109; sample, 109; Spearman’s, 116

correlation functions, 277, 456;

autocorrelation, 407, 456–458, 460, 461;

covariance, 460;

cross-correlation, 460;

discrete correlation, 460;

Edelson and Krolik’s discrete correlation function, 461;

evenly sampled data, 460;

n-point, 278; slot

autocorrelation, 460;

two-point, 277

cosine window, 416

cost function, 131

covariance, 46, 108, 456

covariance matrix, 294

credible region, 179

cross-matching, 47, 54

cross-validation, 144, 164, 352, 355, 379, 390, 392, 398

cross-validation error, 336

cross-validation score, 254

ctypes, see Python/wrapping compiled code

cumulative distribution function, 6

curse of dimensionality, 59, 289

cython, see Python/wrapping

compiled code

damped random walk, 463, 464

data structures; cone trees, 62; cover trees, 62

Data Analysis: A Bayesian Tutorial (Siv06), 9, 181, 182,

208

data cloning, 120, 264

data compression, 299

data mining, 3, 8

data set tools, 14

—fetch_dr7_quasar, 23, 24, 396

—fetch_imaging_sample, 14,

18, 19, 269

—fetch_LINEAR_sample, 29,

440, 442, 443

—fetch_moving_objects, 30, 31, 34

—fetch_sdss_S82standards, 27,

28, 32, 33, 269

—fetch_sdss_specgals, 22, 23,

167, 280, 390, 392, 395

—fetch_sdss_spectrum, 19–21,

425, 426

—fetch_sdss_sspp, 25, 26, 34,

261, 272, 274, 396

—plotting, 31; all-sky distributions, 35; basemap, 37; contour, 32; density, 32; Hammer–Aitoff projection,

35, 36; HEALPix, 37; high dimension, 33; Lambert azimuthal equal-area projection, 36; Mercator projection, 35; Mollweide projection, 36

data sets

—LIGO “Big Dog” data, 16,

416, 417

—LINEAR, 27, 29, 403, 438,

440, 442, 443, 445, 446, 448, 449

—RR Lyrae stars, 365, 372, 374, 376–378, 380, 382, 384–388,

395, 396, 426

—SDSS galaxy data, 21, 23, 167,

280, 390, 392, 395

—SDSS imaging data, 16, 269

—SDSS moving objects, 30, 31, 34

—SDSS photometric redshift data, 394, 395


—SDSS quasar data, 23, 24, 366,

396

—SDSS spectroscopic data, 19,

21, 291, 298–300, 304, 425,

426

—SDSS stars, 25, 26, 32, 34, 425,

426

—SDSS stellar data, 261, 272,

274, 366, 396

—SDSS Stripe 82, 26; standard

stars, 26, 28, 32, 33, 269, 365;

simulated supernovas, 5, 325,

328

data smoothing, 249

data structures; kd-tree, 58, 60;

B-tree, 51, 53; ball-tree, 60,

62; cosine trees, 62;

maximum margin trees, 62;

multidimensional tree, 53;

oct-tree, 57; orthogonal

search trees, 62; partition, 59;

quad-tree, 57–59; trees, 47,

51, 386

data types, 43; categorical, 8, 43;

circular variables, 43;

continuous, 43; nominal, 43;

ordinal, 43; ranked variables,

43

data whitening, 298

decision boundary, 370, 380,

386, 397

decision tree, 386, 388, 389, 398,

399

declination, 16, 18

deconvolution, 407; of noisy

data, 410

degree of freedom, 98

δ Scu, 446

density estimation, 3, 249, 367,

371; Bayesian blocks, 259;

comparison of methods, 281;

deconvolution KDE, 256;

extreme deconvolution, 264;

Gaussian mixtures, 259;

kernel (KDE), 48, 251; kernel

cross-validation, 254;

nearest-neighbor, 257;

nonparametric, 250; number

of components, 264;

parametric, 259

descriptive statistics, 78

DFT, see Fourier analysis,

discrete Fourier transform

Dickey–Fuller statistic, 463

differential distribution function, 5

digital filtering, 421

Dijkstra algorithm, 311

dimensionality, 8

dimensionality reduction, 289;

comparison of methods, 316

discriminant function, 369, 375,

384, 395

discriminative classification, see classification

distance metrics, 61

distribution functions, 85

—χ², 96

—Bernoulli, 89, 381

—beta, 101

—binomial, 89

—bivariate, 108; Gaussian, 109

—Cauchy, 92, 459

—exponential, 95

—Fisher’s F, 100

—gamma, 102

—Gauss error, 88

—Gaussian, 87; convolution, 88;

Fourier transform, 88

—Hinkley, 94

—Laplace, 95

—Lilliefors, 158

—Lorentzian, 92

—multinomial, 90

—multivariate, 108; Gaussian, 372, 373

—normal, 87

—Poisson, 91

—Student’s t, 99

—uniform, 85

—Weibull, 103

DR7 Quasar Catalog, 366

dynamic programming, 47, 228

Eddington–Malmquist bias, 191

Edgeworth series, 160

efficiency, 395

eigenspectra, 298

eigenvalue decomposition, 294

empirical Bayes, see Bayesian inference

empirical pdf, 6–8

ensemble learning, 391, 398

entropy, 389

Epanechnikov kernel, 255, 273

error bar, 7

error distribution, 7, 8

error rate, 367

estimator, 82

—asymptotically normal, 83

—bias of, 82

—consistent, 82

—efficiency, 83

—Huber, 345

—Landy–Szalay, 279

—luminosity function, 166

—Lynden-Bell’s C−, 168

—maximum a posteriori (MAP), 179

—maximum likelihood, 124, 125; censored data, 129; confidence interval, 128; heteroscedastic Gaussian, 129; homoscedastic Gaussian, 126; properties, 127;

truncated data, 129

—minimum variance unbiased, 83

—robust, 83

—Schmidt’s 1/Vmax, 168

—unbiased, 82

—uncertainty, 82

—variance of, 82

Euler’s formula, 409

expectation maximization (EM), 46, 136, 204, 223, 260, 374

expectation value, 78 exploratory data analysis, 4, 249 extreme deconvolution, 264

f2py, see Python/wrapping compiled code

false alarm probability, 437

false discovery rate, 147

false negative, 145, 368

false positive, 145, 368

false-positive rate, 405

FastICA, 315

FB2012, see Modern Statistical Methods for Astronomy With

R Applications

FFT, see Fourier analysis, fast Fourier transform

fingerprint database, 418

finite sample size, 7

Fisher’s linear discriminant (FLD), 375

fitting, 4

flicker noise, 458

Floyd–Warshall, 311


flux measurements,

astronomical, 15

Fourier analysis, 406

—band limit, 521

—Bayesian viewpoint, 433

—discrete analog of PSD, 412

—discrete Fourier transform

(DFT), 410, 521

—fast Fourier transform (FFT),

408, 415, 521; aliasing, 522; in

Python, 500, 523; ordering of

frequencies, 522

—Fourier integrals, 410

—Fourier terms, 465

—Fourier transform, 459;

approximation via FFT, 521;

inverse discrete Fourier

transform, 411; inverse

Fourier transform, 422;

irregular sampling window,

414; regularly spaced Fourier

transform, 414; RR Lyrae

light curves, 406; transform

of a pdf, 409; truncated

Fourier series, 442; window

function, 414

Freedman–Diaconis rule, 164

frequentist paradigm, 123

function transforms, 48

functions; beta, 100;

characteristic, 105;

correlation, see correlation

functions; gamma, 97, 101;

Gauss error, 88; Huber loss,

345; kernel, 251; likelihood,

125; marginal probability, 72;

probability density, 71;

regression, 334; selection, 166

GalaxyZoo, 367

Galton, Francis, 321

Gardner, Martin, 74

Gauss–Markov theorem, 332

Gaussian distribution, see

distribution functions

Gaussian mixture model

(GMM), 46, 259, 377

Gaussian mixtures, 134, 374,

400, 446, 447

Gaussian process regression, 48

generative classification, 367,

368, 397

geometric random walk, 462

Gini coefficient, 154, 389

GMM Bayes classification, see classification

goodness of fit, 132

Gram–Charlier series, 81, 160

graphical models, 46

Greg05, see Bayesian Logical Data Analysis for the Physical Sciences

Guttman–Kaiser criterion, 302

Hadoop, 44

Hanning, 416

hashing and hash functions, 51

Hertzsprung–Russell diagram, 25

Hess diagram, 32

heteroscedastic errors, 460, 465

hidden variables, 135

high-pass filtering, 424

histograms, 6, 163; Bayesian blocks, 228; comparison of methods, 226; errors, 165;

Freedman–Diaconis rule, 164; Knuth’s method, 225;

optimal choice of bin size, 6;

Scott’s rule, 164

homoscedastic errors, 7, 460;

Gaussian, 405, 427

HTF09, see The Elements of Statistical Learning: Data Mining, Inference, and Prediction

Hubble, Edwin, 365

hypersphere, 290

hypothesis testing, 77, 123, 144, 370, 404; multiple, 146

independent component analysis (ICA), 313

inference, 4

—Bayesian, see Bayesian

inference

—classical, 71

—statistical, 123; types of, 123

information content, 389

information gain, 389

Information Theory, Inference, and Learning Algorithms, 10

installing AstroML, 37

interpolation, 412, 501

interquartile range, 81

intrinsic dimension, 63

IsoMap, 311

isometric mapping, 311

IVOA (International Virtual Observatory Alliance), 11

jackknife, 140

Jay03, see Probability Theory: The Logic of Science

Jeffreys, Harold, 175

K nearest neighbors, see clustering

Kaiser’s rule, 302

Kalman filters, 465

Karhunen–Loève transform, 292

Karpathia, 130

kernel density estimation, 49,

see density estimation

kernel discriminant analysis,

377, 378, 398, 399

kernel regression, 48, 338, 379

knowledge discovery, 3

Kullback–Leibler divergence, 183, 389

kurtosis, 79

Lagrangian multipliers, 182, 294

Landy–Szalay estimator, 279

Laplace smoothing, 372

Laplace, Pierre Simon, 175

Laser Interferometric Gravitational Observatory (LIGO), 16, 403, 415

LASSO regression, 48, 335

learning curves, 356

leptokurtic, 80

LEV diagram, 302

Levenberg–Marquardt algorithm, 341

light curves, 5, 404

LIGO, see Laser Interferometric

Gravitational Observatory

likelihood, 125

LINEAR, 16

linear algebraic problems, 46

LINEAR data set, see data sets

linear discriminant analysis (LDA), 374, 376, 381, 398

locality, 47


locally linear embedding (LLE),

3, 307

locally linear regression, 339

location parameter, 78

logistic regression, see

classification

loss function, 345, 367

lossy compression, 303

low signal-to-noise, 465

low-pass filters, 422

lowess method, 340

luminosity distribution, 4

luminosity functions; 1/Vmax

method, 168; C−method,

168; Bayesian approach, 172;

estimation, 166

Lup93, see Statistics in Theory

and Practice

Lutz–Kelker bias, 191

Lynden-Bell’s C−method, 168

machine learning, 3, 4, 8

magic functions, 51

magnitudes, 515; astronomical,

78; standard systems, 516

Mahalanobis distance, 374, 379

Malmquist bias, 191

manifold learning, 47, 306;

weaknesses, 312

MAP, 429, 441

MapReduce, 49

Markov chain Monte Carlo

(MCMC), 46, 231, 451, 453,

454; detailed balance

condition, 231; emcee

package, 235;

Metropolis–Hastings

algorithm, 231, 340; PyMC

package, 233

Markov chains, 465

matched filters, 418, 452, 454,

465

maximum likelihood, see

estimator

maximum likelihood

estimation, 371

McGrayne, Sharon Bertsch, 175

mean, 46

mean deviation, 81

mean integrated square error

(MISE), 131

median, 79; standard error, 84

memoization, 47

Miller, George, 365

minimum component filtering, 424

minimum detectable amplitude, 405

minimum variance bound, 83

misclassification rate, 367

mixtures of Gaussians, see

Gaussian mixture model (GMM)

mode, 79

model comparison, 133

model parameters, 8

model selection, 77, 398, 452

models; Bayesian, 46; Gaussian

mixtures, see Gaussian

mixture model (GMM);

hierarchical Bayesian, 184;

non-Gaussian mixtures, 140;

state-space, 465

Modern Statistical Methods for Astronomy With R

Applications (FB2012), 10,

437, 458, 463

Monte Carlo, 229; samples, 119

Monty Hall problem, 73

morphological classification of galaxies, 365

multidimensional color space, 4

multidimensional scaling framework (MDS), 311

multiple harmonic model, 438

MythBusters, 74

N-body problems, 46, 53

Nadaraya–Watson regression, 338

naive Bayes, see Bayesian

inference

nearest neighbor, 47, 49;

all-nearest-neighbor search, 54; approximate methods, 63;

bichromatic case, 54;

monochromatic case, 54;

nearest-neighbor distance, 57; nearest-neighbor search, 53

neural networks, 398–400

no free lunch theorem, 397

nonlinear regression, 340

nonnegative matrix factorization (NMF), 305

nonparametric bootstrap resampling, 437

nonparametric method, 6

nonparametric models, 4, 6

nonuniformly sampled data, 414

null hypothesis, 144

number of neighbors, 379

Numerical Recipes: The Art of Scientific Computing

(NumRec), 8, 50, 120, 135,

141, 151, 156, 162, 408, 415,

418, 422, 424, 435, 436

NumRec, see Numerical Recipes: The Art of Scientific

Computing

Nyquist; frequency, 415, 436, 522; limit, 422;

Nyquist–Shannon theorem, 412; sampling theorem, 412, 521

O(N), 45

Occam’s razor, 189

online learning, 48

optical curve, 448

optimization, 46, 501

Ornstein–Uhlenbeck process, 463

outliers, 80, 83

overfitting, 380, 391

p value, 144

parallel computing, 49

parallelism, 49

parameter estimation, 406, 452; deterministic models, 406

parametric methods, 6, 398

Pareto distribution, 459

Parseval’s theorem, 409

Pattern Recognition and Machine Learning, 10

pdf, 5

periodic models, 405

periodic time series, 426

periodic variability, 465

periodicity, 434

periodograms, 430, 441, 444, 448; definition of, 430; generalized Lomb–Scargle,


438; Lomb–Scargle

periodogram, 426, 430,

434–436, 438, 442, 444, 449,

465; noise, 431

phased light curves, 441, 442

photometric redshifts, 366, 390

pink noise, 409, 458

platykurtic, 80

point estimation, 123

population pdf, 6, 7

population statistics, 78

power spectrum, 407, 409, 430,

454; estimation, 415

Practical Statistics for

Astronomers (WJ03), 9, 69,

424

precision, see efficiency

prediction, 4

principal axes, 111

principal component analysis

(PCA), 3, 49, 292, 444;

missing data, 302

principal component

regression, 337

probability, 69

—axioms, 69; Cox, 71;

Kolmogorov, 70; conditional,

70, 72; density function, 71;

law of total, 71, 72; notation,

69; random variable, 5; sum

rule, 70

probability density, 368

probability density functions, 5,

6

probability distribution, 5, 43

probability mass function, 5

Probability Theory: The Logic of

Science (Jay03), 9, 71, 182

programming languages

—Python, 471

—C, 507

—C++, 507

—Fortran, 37, 507

—IDL, 37

—Python, 12

—R, 10

—SQL (Structured Query

Language), 14–16, 44, 50, 53,

519; where, 17

projection pursuit, 3, 314

PSD, see power spectrum

Python

—AstroML, see AstroML

—further references, 508

—installation, 474

—introduction, 471

—IPython, 473, 486;

documentation, 487; magic functions, 488

—Matplotlib, 473, 494

—NumPy, 472, 488, 498;

efficient coding, 503;

scientific computing, 472;

SciPy, 472, 498; tutorial, 474;

wrapping compiled code, 506

quadratic discriminant analysis (QDA), 375, 376, 398

quadratic programming, 383

quantile, 79; function, 6; standard error, 84

quartile, 81

quasar, 5

quasar variability, 458, 460, 463, 464

quicksort, 51

random forests, 391, 398, 399

random number generation, 119

random walk, 449, 458, 462, 463

rank error, 63

Rayleigh test, 448

RDBMS, see Relational

Database Management System

recall, see completeness, 368

recall rate, 147

receiver operating characteristic (ROC) curve, 147, 395

red noise, 409, 458

regression, 4, 321

—Bayesian outlier methods, 346

—comparison of methods, 361

—cross-validation, 355; K-fold, 360; leave-one-out, 360; random subset, 360; twofold, 360

—design matrix, 327

—formulation, 322

—Gaussian basis functions, 331

—Gaussian process, 349

—Gaussian vs Poissonian likelihood, 215

—Kendall method, 345

—kernel, 338

—LASSO, 335

—learning curves, 356

—least absolute value, 345

—least angle, 336

—linear models, 325

—local polynomial, 340

—locally linear, 339

—M estimators, 345

—maximum likelihood solution, 327

—method of least squares, 326

—multivariate, 329

—nonlinear, 340

—overfitting, 352

—polynomial, 330

—principal component, 337

—regularization, 332

—ridge, 333

—robust to outliers, 344

—sigma clipping, 345

—Theil–Sen method, 345

—toward the mean, 150

—uncertainties in the data, 342

—underfitting, 352

regression function, 369

regularization, 332; LASSO regression, 335; ridge regression, 333; Tikhonov, 333

Relational Database Management System, 44

relative error, 63

resolution, 412

responsibility, 136

ridge regression, 333

ridge regularization, 384

right ascension, 16, 18

risk, 367

robustness, 80

runtime, 45

sample contamination, 405

sample selection, 4

sample size, 8

sample statistics, 78, 81

sampling, 49; window, 414; window function, 414

Savitzky–Golay filter, 424

scale parameter, 78

scatter, 7

SciDB, 44

Scott’s rule, 164

scree plot, 298

SDSS “Great Wall”, 250, 255, 275

searching and sorting, 50, 51


SEGUE Stellar Parameters

Catalog, 366

selection effects, 166

selection function, 8

self-similar classes, 5

sensitivity, see completeness

Shannon interpolation formula,

412

shape parameter, 78

Sheldon, Erin, 13

significance level, 144

Simon, Herbert, 365

sinc-shifting, 412

sine wave, 415

single harmonic model, 405,

427, 433, 435, 438, 465

single-valued quantity, 7

singular value decomposition,

295, 337

singular vectors, 295

Siv06, see Data Analysis: A

Bayesian Tutorial

skewness, 79

Sloan Digital Sky Survey

(SDSS), 15, 250

—Catalog Archive Server

(CAS), 15; CASJobs, 17;

PhotoObjAll, 17; PhotoTag,

17; Schema Browser, 17

—Data Release 7, 15

—Data Release 8, 22

—Data Release 9, 25

—flags, 17

—magnitudes; model

magnitudes, 17; Petrosian

magnitudes, 22; PSF

magnitudes, 17; object types,

17; SEGUE Stellar Parameters

Pipeline, 25; spectroscopic

follow-up, 15; Stripe 82, 15,

32, 372

Sobolev space, 163

software packages; Python, 471;

AstroML, 12, 511; AstroPy,

13; AstroPython, 13; Chaco,

473; CosmoloPy, 13; esutil,

13; HealPy, 13; IPython, 14,

52, 473; Kapteyn, 13; Markov

chain Monte Carlo, 13;

Matplotlib, 12, 473; MayaVi,

473; NetworkX, 473;

Numerical Python, 12, 18,

472; Pandas, 473; PyMC, 13;

Python, 12; Scientific Python,

12, 472; Scikit-learn, 12, 473;

Scikits-image, 473;

Statsmodels, 473; SymPy, 473;

urllib2, 20

sorting, 51

specific flux, 515

spectral window function, 414

spherical coordinate systems, 35

spherical harmonics, 37

standard deviation, 7, 79

state-space models, 465

stationary signal, 452

statistically independent, 8

Statistics in Theory and Practice

(Lup93), 9, 37, 69, 81, 84, 85,

105, 117, 118, 127, 141–143,

176, 208

Statistics: A Guide to the Use of Statistical Methods in the Physical Sciences (Bar89), 9,

69

stochastic programming, 48

stochastic time series, 458

stochastic variability, 455

streaming, 48

structure function, 458, 460

sufficient statistics, 199

sum of sinusoids, 406

supervised classification, see

classification

supervised learning, 4

support vector machines, 48, 382, 384, 398, 399

support vectors, 382

SWIG, see Python/wrapping

compiled code

SX Phe, 446

telescope diffraction pattern, 410

temporal correlation, 404

tests; Anderson–Darling, 154,

157; F, 162; Fasano and

Franceschini, 156;

Kolmogorov–Smirnov, 151;

Kuiper, 152;

Mann–Whitney–Wilcoxon, 155; non-Gaussianity, 157;

nonparametric, 151;

parametric, 160; power, 145;

Shapiro–Wilk, 158; t, 161; U, 155; Welch’s t, 162; Wilcoxon

rank-sum, 155; Wilcoxon signed-rank, 155

The Elements of Statistical Learning: Data Mining, Inference, and Prediction

(HTF09), 9, 134, 136, 137,

141, 147, 181

“The magical number 7 ± 2”, 365

The Visual Display of Quantitative Information, 31

time series, 403, 406, 458; comparison of methods, 465

top-hat, 416

total least squares, 343

training sample, 5

tree traversal patterns, 378

tricubic kernel, 340

trigonometric basis functions, 417

Two Micron All Sky Survey (2MASS), 15

Type I and II errors, 145, 368

uncertainty distribution, 7

uneven sampling, 465

unevenly sampled data, 460, 461

uniformly sampled data, 410

unsupervised classification, see

classification

unsupervised clustering, see

clustering

unsupervised learning, 4

Utopia, 130

variability, 404

variable

—categorical, 371

—continuous, 372

—random, 71; continuous, 71; discrete, 71; independent, 71; independent identically distributed, 71;

transformation, 77

variance, 46, 79; of a well-sampled time series, 405

variogram, 458

vectorized, 55


Voronoi tessellation, 379

vos Savant, Marilyn, 74

Wass10, see All of

Nonparametric Statistics and

All of Statistics: A Concise

Course in Statistical Inference

wavelets, 418, 454; Daubechies,

418; discrete wavelet

transform (DWT), 418; Haar,

418; Mexican hat, 418;

Morlet, 418; PyWavelets, 418;

wavelet PSD, 418, 419

weave, see Python/wrapping

compiled code

Welch’s method, 416

whitening, 298

Whittaker–Shannon, 412

width parameter, 78

Wiener filter, 422, 423

Wiener–Khinchin theorem,

457, 461, 463

WJ03, see Practical Statistics for Astronomers

WMAP cosmology, 170

WSAS, see Advances in Machine Learning and Data Mining for Astronomy

zero-one loss, 367

Date posted: 20/11/2022, 11:16