
Statistics, data mining, and machine learning in astronomy


Visual Figure Index

This is a visual listing of the figures within the book. The first number below each thumbnail gives the figure number within the text; the second (in parentheses) gives the page number on which the figure can be found.

1.1 (19): panels Galaxies, Stars; axis g − r

1.2 (21): axis λ (Å); Plate = 1615, MJD = 53166, Fiber = 513

1.3 (23): axis u − r

1.4 (24): axis redshift

1.5 (26): axis Teff (K)

1.6 (28): axis g − r

1.7 (29): Example of phased light curve; axes phase, g − r, Period (days)

1.8 (31): axis Semimajor Axis (AU)

1.9 (32): axis g − r

1.10 (33): axis g − r

1.11 (34): axis Teff; panels number in pixel, mean [Fe/H] in pixel

1.12 (34): axes a∗, a (AU)

1.13 (35): Mercator projection

1.14 (36): panels Hammer projection, Aitoff projection, Mollweide projection, Lambert projection

1.15 (38): panels HEALPix Pixels (Mollweide), Raw WMAP data

2.1 (45): Scaling of Search Algorithms; axis Length of Array; legend linear search (O[N]), efficient search (O[log N])

2.2 (52): Scaling of Sort Algorithms; axis Length of Array; legend list sort, NumPy sort, O[N log N], O[N]

2.3 (58): Quad-tree Example

2.4 (59): kd-tree Example

2.5 (61): Ball-tree Example

3.1 (70): p(A ∪ B) = p(A) + p(B) − p(A ∩ B)

3.2 (72): panels Joint Probability, Conditional Probability; axes x, p(x), p(y)

3.3 (76)

3.4 (77): panels p_x(x) = Uniform(x), y = exp(x), p_y(y) = p_x(ln y)/y; axes x, y
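The transformation labelled in figure 3.4, p_y(y) = p_x(ln y)/y for y = exp(x), can be checked numerically. The following is a minimal sketch, assuming x is drawn from Uniform(0, 1) (the interval is an assumption read off the axis ticks, not stated in the index); the helper names `p_x` and `p_y` are hypothetical.

```python
import numpy as np

def p_x(x):
    """Uniform(0, 1) density (the interval is an assumption)."""
    x = np.asarray(x, dtype=float)
    return np.where((x >= 0.0) & (x <= 1.0), 1.0, 0.0)

def p_y(y):
    """Density of y = exp(x) by change of variables: p_y(y) = p_x(ln y) / y."""
    y = np.asarray(y, dtype=float)
    return p_x(np.log(y)) / y

# Draw x, transform to y, and compare a normalized histogram of y
# with the analytic density evaluated at the bin centers.
rng = np.random.default_rng(0)
y_samples = np.exp(rng.uniform(0.0, 1.0, size=200_000))

counts, edges = np.histogram(y_samples, bins=20, range=(1.0, np.e), density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
max_abs_diff = np.max(np.abs(counts - p_y(centers)))
print(f"largest histogram/analytic difference: {max_abs_diff:.4f}")
```

The factor 1/y is the Jacobian |dx/dy| of x = ln y; without it the transformed density would not integrate to one over [1, e].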

3.5 (78): panels 20% flux error, mag = −2.5 log10(flux); axes flux, mag


3.6 (80): Skew Σ and Kurtosis K; axis x; legend Gaussian, Σ = 0; mod. Gauss, Σ = −0.36; log normal, Σ = 11.2; Laplace, K = +3; Gaussian, K = 0; Cosine, K = −0.59; Uniform, K = −1.2

3.7 (86): Uniform Distribution; axis x; legend µ = 0, W = 1

3.8 (87): Gaussian Distribution; axis x; legend µ = 0, σ = 0.5

3.9 (90): Binomial Distribution; axis x; legend b = 0.2, n = 20

3.10 (92): Poisson Distribution; axis x; legend µ = 1, µ = 15

3.11 (93): Cauchy Distribution; axis x; legend µ = 0, γ = 0.5

3.12 (94): axis Sample Size; legend mean, median, robust mean (mixture), robust mean (sigma-clip)

3.13 (96): Laplace Distribution; axis x; legend µ = 0, ∆ = 0.5

3.14 (97): χ² Distribution; axis Q; legend k = 1, k = 7

3.15 (99): Student's t Distribution; axis x; legend t(k = ∞), t(k = 2.0), t(k = 1.0), t(k = 0.5)

3.16 (101): Fisher's Distribution; axis x; legend d1 = 1, d2 = 1; d1 = 5, d2 = 2; d1 = 2, d2 = 5; d1 = 10, d2 = 50

3.17 (102): Beta Distribution; axis x; legend α = 0.5, β = 0.5; α = 0.5, β = 1.5

3.18 (103): Gamma Distribution; axis x; legend k = 1.0, θ = 2.0; k = 5.0, θ = 0.5

3.19 (104): Weibull Distribution; axis x; legend k = 0.5, λ = 1; k = 2.0, λ = 2

3.20 (106): panels N = 2, N = 3, N = 10

3.21 (107): panels µ̄ = mean(x) with σ = (1/√12) · W/√N; µ̄ = ½[max(x) + min(x)] with σ = (1/√12) · 2W/N; axis N

3.22 (110): axis x; labels σ1 = 2, σ2 = 1, α = π/4; σx = 1.58, σy = 1.58, σxy = 1.50

3.23 (113): panels 5% outliers, 15% outliers; axis x; legend Input Fit, Robust Fit

3.24 (119): panels Pearson-r, Spearman-r, Kendall-τ; legend No Outliers

3.25 (121): panels Input data distribution, Cumulative Distribution, Inverse Cuml. Distribution, KS test: Cloned Distribution; axes x, p(< x)

4.1 (133): axis observations; panels correct errors (µ̂ = 9.99, χ²dof = 0.96 (−0.2σ)); overestimated errors (µ̂ = 9.99, χ²dof = 0.24 (−3.8σ)); underestimated errors (χ²dof = 3.84 (14σ)); incorrect model (µ̂ = 10.16, χ²dof = 2.85 (9.1σ))

4.2 (139): Best-fit Mixture; axes x, n. components, AIC

4.3 (142): axis σ; legend σ (std. dev.), σG (quartile)

4.4 (143): axes σ∗, σ; legend σ∗ (std. dev.), σ∗G (quartile), σ (std. dev.), σG (quartile)

4.5 (145): axis x; labels hB(x), hS(x), xc = 120 (x > xc classified as sources)

4.6 (149): axis p = 1 − HB(i)

4.7 (153): axis x; panel 1: Anderson-Darling A² = 0.29, Kolmogorov-Smirnov D = 0.0076, Shapiro-Wilk W = 1, Z1 = 0.2; panel 2: Anderson-Darling A² = 194.50, Kolmogorov-Smirnov D = 0.28, Shapiro-Wilk W = 0.94, Z1 = 32.2, Z2 = 2.5

4.8 (167): axes x, y; labels xmax, ymax(x), (x_i, y_i), J_i, (x_k, y_k), J_k

4.9 (170): axes x, y, p(x); label 5580 points

4.10 (171): axes z, M; panels u − r > 2.22, u − r < 2.22; label N = 45010

5.1 (188): axis k; panels n = 10, n = 20; legend b∗ = 0.5, b∗ = 0.1

5.2 (192): axes xobs, xobs − xtrue; labels sampled region; legend true distribution, observed distribution, observed sample, random sample

5.3 (195): axes mobs, σπ/π; legend scatter, bias; label p = 2

5.4 (199): L(µ, σ) for x̄ = 1, V = 4, n = 10; axis µ

5.5 (201): axis µ


5.6 (203): axis x; legend σ fit, σG fit

5.7 (205): L(µ, σ) for x̄ = 1, σtrue = 1, n = 10; axis µ

5.8 (206): axes µ, σ; legend marginalized, approximate

5.9 (208): axis b

5.10 (209): L(µ, γ) for x̄ = 0, γ = 2, n = 10; axis µ

5.11 (210): axes µ, γ

5.12 (212): L(µ, W) uniform, n = 100; axes µ, p(W)

5.13 (214): L(σ, A) (Gauss + bkgd, n = 200); axes σ, x

5.14 (216): axes x, a; legend 500 pts, 20 pts

5.15 (218): panels 50 points, 5 bins; 50 points, 40 bins; axes x, y_i, a∗; legend Poisson Likelihood, Gaussian Likelihood

5.16 (219): axis a; legend continuous; discrete, 1000 bins; discrete, 2 bins

5.17 (222): axis µ

5.18 (223): axis g1; legend p(g1) (bad point), p(g1|µ0) (bad point), p(g1) (good point), p(g1|µ0) (good point)

5.19 (224): axis Sample Size N

5.20 (227): axis x; Gaussian distribution: Scott's Rule 38 bins, Freed.-Diac. 49 bins, Knuth's Rule 38 bins; non-Gaussian distribution: Scott's Rule 24 bins, Freed.-Diac. 97 bins, Knuth's Rule 99 bins

5.21 (229): axis x; legend Knuth Histogram, Bayesian Blocks

5.22 (232)

5.23 (235): Input pdf and sampled data; axis x; labels µ1 = 0, σ1 = 0.3; µ2 = 1, σ2 = 1.0; ratio = 1.5; legend true distribution, best fit normal

5.24 (236): Single Gaussian fit; axes µ, µ1, µ2, σ2

5.25 (239): axis µ

5.26 (240): axes µ, A, x, yobs

6.1 (252): axis x

6.2 (253): axis u; legend Gaussian, Exponential, Top-hat

6.3 (255): axis y (Mpc); panels top-hat (h = 10), exponential (h = 5)

6.4 (259): axis y (Mpc); panels k-neighbors (k = 5), k-neighbors (k = 40)

6.5 (260): axis x; legend Nearest Neighbors (k = 10), Nearest Neighbors (k = 100), Kernel Density (h = 0.1), Bayesian Blocks

6.6 (261): axes [Fe/H], N components; panels Input, Converged

6.7 (262): axis y (Mpc)

6.8 (263): axis x; legend Mixture Model (3 components), Mixture Model (10 components), Kernel Density (h = 0.1), Bayesian Blocks

6.9 (265): panels N = 100 points, N = 1000 points, N = 10000 points; axis n. clusters; legend N = 100, N = 1000, N = 10000

6.10 (266): panels Input Distribution, Density Model, Cloned Distribution; axis x

6.11 (268): panels Extreme Deconvolution resampling, Extreme Deconvolution cluster locations; axis x

6.12 (269): panels Extreme Deconvolution resampling, Extreme Deconvolution cluster locations; axes g − r, w = −0.227g + 0.792r − 0.567i + 0.05; legend single epoch σG = 0.016, standard stars σG = 0.010, XD resampled σG = 0.008

6.13 (272): axis [Fe/H]

6.14 (274): axis [Fe/H]


6.15 (276): axis y (Mpc)

6.16 (278): labels r12, r23, r31, r13, r24, r14, r34

6.17 (280): axis θ (deg); panels u − r > 2.22 (N = 38017), u − r < 2.22 (N = 16883)

7.1 (291): axis wavelength (Å)

7.2 (293): axes x, y

7.3 (295)

7.4 (298): panels PCA components (mean, components 1 to 4), ICA components (mean, components 1 to 4), NMF components (components 1 to 5); axis wavelength (Å)

7.5 (299): axis Eigenvalue Number

7.6 (300): panels mean; mean + 4 components (σ²tot = 0.85); mean + 8 components (σ²tot = 0.93); mean + 20 components (σ²tot = 0.94); axis wavelength (Å)

7.7 (304): axis λ (Å); label True spectrum (nev = 10)

7.8 (307): panel PCA projection

7.9 (310): axes c1, c2, c3; legend absorption galaxy, galaxy, emission galaxy, narrow-line QSO, broad-line QSO

8.1 (323): axes x, θ1; points x1, x2, x3, x4; legend True fit, fit to {x1, x2, x3}

8.2 (328): axis z; panels Straight-line Regression (χ²dof = 1.57), 4th degree Polynomial Regression (χ²dof = 1.02), Gaussian Basis Function Regression (χ²dof = 1.09), Gaussian Kernel Regression (χ²dof = 1.11)

8.3 (333): axes θ1, θ2; labels θ normal equation, θ ridge, θ lasso, r

8.4 (335): axis z; panels Linear Regression, Ridge Regression, Lasso Regression

8.5 (341): axes z, ΩM; label 100 observations

8.6 (344): axes x, slope

8.7 (346): axis t; legend c = 1, c = 2, c = 5, c = ∞

8.8 (347): legend squared loss: y = 1.08x + 213.3; Huber loss: y = 1.96x + 70.0

8.9 (348): axis intercept; panels no outlier correction (dotted fit), mixture model (dashed fit), outlier rejection (solid fit)

8.10 (350)

8.11 (352)

8.12 (353): axis x; label d = 1

8.13 (354): axis x; panels d = 2, d = 3, d = 19

8.14 (356): axis polynomial degree; legend cross-validation, training

8.15 (358): axis Number of training points; legend training

9.1 (370): axis x; legend g1(x), g2(x)

9.2 (373): axis x

9.3 (374): axes u − g, N colors

9.4 (376): axes u − g, N colors

9.5 (376): axes u − g, N colors

9.6 (378): axes u − g, N colors; label N = 1

9.7 (380): axes u − g, N colors; label k = 10; legend k = 1, k = 10

9.8 (382): axes u − g, N colors


9.9 (383): axis x

9.10 (385): axes u − g, N colors

9.11 (386): axes u − g, N colors

9.12 (387): decision tree. Numbers are count of non-variable / RR Lyrae in each node.
level 0: 69509 / 346 (split on g − r)
level 1: 2841 / 333 (split on u − g); 66668 / 13 (split on g − r)
level 2: 1666 / 23 (split on g − r); 1175 / 310 (split on r − i); 1645 / 11 (split on u − g); 65023 / 2 (split on r − i)
level 3: 392 / 16 (split on i − z); 1274 / 7 (split on u − g); 756 / 41 (split on r − i); 419 / 269 (split on i − z); 1616 / 3 (split on u − g); 29 / 8 (split on r − i); 6649 / 2 (split on u − g); 58374 / 0 (non-variable)
level 4: 126 / 1 (split on i − z); 266 / 15 (split on g − r); 1001 / 2 (split on i − z); 273 / 5 (split on g − r); 379 / 0 (non-variable); 377 / 41 (split on u − g); 123 / 18 (split on g − r); 296 / 251 (split on i − z); 1296 / 0 (non-variable); 320 / 3 (split on i − z); 21 / 1 (split on g − r); 8 / 7 (split on r − i); 5200 / 0 (non-variable); 1449 / 2 (split on u − g)
Training Set Size: 69855 objects. Cross-Validation, with 137 RR Lyraes (positive) and 23149 non-variables (negative): false positives 53 (43.4%), false negatives 68 (0.3%).

9.13 (388): axes u − g, N colors; label depth = 12; legend depth = 7, depth = 12

9.14 (390): axes depth of tree, ztrue; legend cross-validation, training set; label depth = 13, rms = 0.020

9.15 (392): axes depth of tree, ztrue, zfit; legend cross-validation, training set; label depth = 20, rms = 0.017

9.16 (395): axes number of boosts, ztrue; legend cross-validation, training set; labels Tree depth: 3; N = 500, rms = 0.018

9.17 (396): axes false positive rate, efficiency; legend GNB, QDA, LR, KNN, DT, GMMB

9.18 (396): axes u − g, false positive rate; legend GNB, QDA, LR, KNN, DT, GMMB

10.1 (407): panels 1 mode, 3 modes, 8 modes; axis phase

10.2 (411): axes x, k; labels data, window W(x); Convolution: [D ∗ W](x); [D ∗ W](x) = F⁻¹{F[D] · F[W]}; F(D), F(W); Pointwise product: F(D) · F(W)
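The identity labelled in figure 10.2, [D ∗ W](x) = F⁻¹{F[D] · F[W]}, can be verified directly with NumPy. This is a minimal sketch for the discrete, circular (periodic) form of the convolution; the arrays `D` and `W` are illustrative stand-ins, not the data shown in the book's figure.

```python
import numpy as np

rng = np.random.default_rng(42)
N = 64
D = rng.normal(size=N)          # illustrative "data" array
W = np.zeros(N)
W[:5] = 1.0 / 5.0               # illustrative top-hat window with unit sum

# Direct circular convolution: (D * W)[n] = sum_m D[m] * W[(n - m) mod N]
direct = np.array([
    sum(D[m] * W[(n - m) % N] for m in range(N))
    for n in range(N)
])

# The same quantity via the convolution theorem: F^-1{ F[D] . F[W] }
via_fft = np.fft.ifft(np.fft.fft(D) * np.fft.fft(W)).real

print(np.allclose(direct, via_fft))
```

The direct sum costs O(N²) operations while the FFT route costs O(N log N), which is the practical reason for working in the frequency domain. For a non-periodic (linear) convolution, both arrays would need zero-padding before the transforms.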

10.3 (413): panels Well-sampled data: ∆t < tc; Undersampled data: ∆t > tc. Each shows Time Domain: Multiplication (Signal and Sampling Window, Sampling Rate ∆t; Sampled signal: pointwise multiplication) and Frequency Domain: Convolution (FT of Signal and Sampling Window, f = 1/∆t; Convolution of signal FT and window FT); axes t, f

10.4 (414): panels Data, Data PSD, Window, Window PSD; axes t, f

10.5 (416): axes t, f

10.6 (417): axes time (s), frequency (Hz); panels Top-hat window, Hanning (cosine) window

10.7 (419): panels Input Signal: Localized Gaussian noise; Example Wavelet (t0 = 0, f0 = 1.5, Q = 1.0), w(t; t0, f0, Q) = e^(−[f0(t − t0)/Q]²) e^(2πi f0(t − t0)), legend real part, imag part; Wavelet PSD; axis t

10.8 (420): panels Input Signal: Localized spike plus noise; Example Wavelet (t0 = 0, f0 = 1/8, Q = 0.3), w(t; t0, f0, Q) = e^(−[f0(t − t0)/Q]²) e^(2πi f0(t − t0)), legend real part, imag part; Wavelet PSD; axis t

10.9 (421): axis t; labels f0 = 5; f0 = 10, Q = 1.0; f0 = 10, Q = 0.5

10.10 (423): panels Input Signal, Filtered Signal, Input PSD, Filtered PSD; axes λ, f; legend Wiener, Savitzky-Golay

10.11 (424): panels Effective Wiener Filter Kernel, Kernel smoothing result; axis λ

10.12 (425): axes λ (Å), f

10.13 (426): axes λ (Å), f; label SDSS white dwarf 52199-659-381

10.14 (431): panels Data, Window PSD, Data PSD, Data PSD (10x errors); axes t, f, P_LS

10.15 (437): axes time (days), period (days)

10.16 (439): axes t, ω, P_LS; legend standard, generalized

10.17 (440): axis phase; panels ID = 14752041, P = 8.76 hr; ID = 1009459, P = 2.95 hr; ID = 10022663, P = 14.78 hr; ID = 10025796, P = 3.31 hr; ID = 11375941, P = 2.58 hr; ID = 18525697, P = 13.93 hr

10.18 (442): axes ω, phase; panels ω0 = 17.22, P0 = 8.76 hours; ω0 = 8.61, P0 = 17.52 hours; legend 6 terms, 1 term

10.19 (443): axis N frequencies; panels ω0 = 17.22, ω0 = 8.61; label zoomed view

10.20 (445): axes g − i, A

10.21 (446): axis g − i

10.22 (448): axes g − i, A

10.23 (449): axes g − i, A

10.24 (452): axes φ, t

10.25 (454): axes b0, T, t


10.26 (455): axes t, b0, ω

10.27 (456): axes T, ω, t

10.28 (457): panels Input Signal: chirp, Wavelet PSD; axis t

10.29 (459): panels P(f) ∝ f⁻¹, P(f) ∝ f⁻²; axes t, f

10.30 (462): axis t (days); legend Scargle, True, Edelson-Krolik

A.1 (493): panels np.arange(3) + 5; np.ones((3, 3)) + np.arange(3); np.arange(3).reshape((3, 1)) + np.arange(3)
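The three expressions labelling figure A.1 are NumPy broadcasting examples and can be run as written. The sketch below only evaluates them; the variable names `a`, `b`, and `c` are added here for illustration.

```python
import numpy as np

# A scalar is broadcast across every element of a 1-D array.
a = np.arange(3) + 5

# A 1-D array of shape (3,) is broadcast across each row of a (3, 3) array.
b = np.ones((3, 3)) + np.arange(3)

# A (3, 1) column and a (3,) row are both stretched to a (3, 3) grid.
c = np.arange(3).reshape((3, 1)) + np.arange(3)

print(a)
print(b)
print(c)
```

The results are `[5 6 7]` for `a`, three identical rows `[1. 2. 3.]` for `b`, and the grid `[[0 1 2], [1 2 3], [2 3 4]]` for `c`. In each case NumPy aligns shapes from the trailing dimension and stretches any axis of length 1 without copying data.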

A.2 (496): Simple Sinusoid Plot; axis x

A.3 (496): Simple Sinusoid Plot; axis x

A.4 (497): axis x

A.5 (502)

C.1 (516): SDSS Filters and Reference Spectrum; axis Wavelength (Angstroms)

E.1 (524): axes t, f; legend Re[h], Im[h]

Date posted: 20/11/2022, 11:14