
10 Minutes to pandas

This is a short introduction to pandas, geared mainly for n...

1/2/2016 10 Minutes to pandas — pandas 0.17.1 documentation

Creating a DataFrame by passing a numpy array, with a datetime index and labeled columns:

In [6]: dates = pd.date_range('20130101', periods=)
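A minimal sketch of the construction described above. The excerpt elides the `periods` value and does not show the DataFrame call, so `periods=6`, the column labels, and the random data are assumptions made only for illustration.

```python
import numpy as np
import pandas as pd

# Assumed: six daily timestamps (the excerpt does not show the `periods` value).
dates = pd.date_range('20130101', periods=6)

# A 6x4 NumPy array of random values, labeled by date (rows) and letter (columns).
df = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list('ABCD'))

print(df.shape)
print(df.index[0])
```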

   ....: 'C' : pd.Series(1, index=list(range(4)), dtype='float32'),
   ....: 'D' : np.array([3] * 4, dtype='int32'),
   ....: 'E' : pd.Categorical(["test", "train", "test", "train"]),
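The fragments above come from a DataFrame built out of a dict of column-like objects. The sketch below is one way such a call could look; the scalar column 'A' is a hypothetical addition, and the index length of 4 for 'C' is inferred from the lengths of 'D' and 'E'.

```python
import numpy as np
import pandas as pd

# Each dict value becomes one column; a scalar is broadcast to the index length.
df2 = pd.DataFrame({
    'A': 1.0,  # hypothetical scalar column, not shown in the excerpt
    'C': pd.Series(1, index=list(range(4)), dtype='float32'),
    'D': np.array([3] * 4, dtype='int32'),
    'E': pd.Categorical(["test", "train", "test", "train"]),
})

# Every column keeps its own dtype.
print(df2.dtypes)
```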

Selection

Note: While standard Python / NumPy expressions for selecting and setting are intuitive and come in handy for interactive work, for production code we recommend the optimized pandas data access methods: .at, .iat, .loc, .iloc and .ix.

See the indexing documentation: Indexing and Selecting Data, and MultiIndex / Advanced Indexing.
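A short sketch of the label-based and position-based accessors named in the note, using a small hypothetical frame. `.ix` is left out because it was removed in pandas 1.0; the other four exist in both old and current releases.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 6 dates x 4 columns holding the integers 0..23.
dates = pd.date_range('20130101', periods=6)
df = pd.DataFrame(np.arange(24).reshape(6, 4), index=dates, columns=list('ABCD'))

by_label = df.loc['20130102':'20130104', ['A', 'B']]  # label slice: both endpoints included
by_position = df.iloc[1:3, 0:2]                       # positional slice: end excluded
one_label = df.at[dates[0], 'A']                      # fast scalar access by label
one_position = df.iat[1, 1]                           # fast scalar access by position

print(by_label.shape, by_position.shape, one_label, one_position)
```

The difference in slice semantics is the usual source of surprises: the `.loc` slice returns three rows, while the `.iloc` slice over `1:3` returns two.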

In [55]: df1 = df.reindex(index=dates[0:4], columns=list(df.columns) + ['E'])

In [56]: df1.loc[dates[0]:dates[1], 'E'] = 1
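The two statements above can be tried on a small stand-in frame. In this sketch the data and `periods=6` are assumptions; it shows that reindexing introduces missing values in the new column and how they can then be detected, dropped, or filled.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the tutorial's df.
dates = pd.date_range('20130101', periods=6)
df = pd.DataFrame(np.arange(24.0).reshape(6, 4), index=dates, columns=list('ABCD'))

# Keep the first four dates and add a new, initially empty column 'E'.
df1 = df.reindex(index=dates[0:4], columns=list(df.columns) + ['E'])
df1.loc[dates[0]:dates[1], 'E'] = 1  # label slice: both endpoints are set

missing = df1['E'].isna().tolist()       # which rows of 'E' are still missing
complete_rows = df1.dropna(how='any')    # drop rows with any missing value
filled = df1.fillna(value=5)             # or replace missing values with 5

print(missing, complete_rows.shape, filled['E'].tolist())
```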

2013-01-01  False  False  False  False   True  False
2013-01-02  False  False  False  False  False  False
2013-01-03  False  False  False  False  False   True
2013-01-04  False  False  False  False  False   True

Freq: D, dtype: float64

In [65]: df.sub(s, axis='index')

Out[65]:

A B C D F

2013-01-01 NaN NaN NaN NaN NaN

2013-01-02 NaN NaN NaN NaN NaN
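The NaN rows in this output follow from index alignment: wherever the Series has no value for a date, the subtraction yields NaN across the whole row. A small sketch, assuming a frame of ones and a Series shifted by two positions (both hypothetical):

```python
import numpy as np
import pandas as pd

dates = pd.date_range('20130101', periods=6)
df = pd.DataFrame(np.ones((6, 3)), index=dates, columns=list('ABC'))

# shift(2) moves the values down two rows, leaving NaN in the first two slots.
s = pd.Series([1, 3, 5, np.nan, 6, 8], index=dates).shift(2)

# axis='index' aligns s with the row labels and subtracts it from every column.
result = df.sub(s, axis='index')
print(result)
```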

In [68]: s = pd.Series(np.random.randint(0, 7, size=10))
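One common thing to do with a Series of small integers like this is to count how often each value occurs. The excerpt does not show the next step, so `value_counts()` is used here as an assumption; fixed values replace the random draw to keep the result reproducible.

```python
import pandas as pd

# Fixed values in the range 0..6, standing in for the random integers.
s = pd.Series([4, 2, 1, 2, 6, 4, 4, 2, 4, 4])

# value_counts() returns the distinct values as the index, most frequent first.
counts = s.value_counts()
print(counts)
```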

In [77]: left = pd.DataFrame({'key': ['foo', 'foo'], 'lval': [1, 2]})

In [78]: right = pd.DataFrame({'key': ['foo', 'foo'], 'rval': [4, 5]})
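With these two frames, a SQL-style join on `key` is a natural next step. The sketch below uses `pd.merge`; because every key is 'foo' on both sides, each left row pairs with each right row.

```python
import pandas as pd

left = pd.DataFrame({'key': ['foo', 'foo'], 'lval': [1, 2]})
right = pd.DataFrame({'key': ['foo', 'foo'], 'rval': [4, 5]})

# Inner join on 'key': 2 matching rows x 2 matching rows gives 4 rows.
merged = pd.merge(left, right, on='key')
print(merged)
```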

http://pandas.pydata.org/pandas-docs/stable/10min.html

Applying a function to each group independently

Combining the results into a data structure

See the sections on Hierarchical Indexing and Reshaping.

In [86]: df = pd.DataFrame({'A' : ['foo', 'bar', 'foo', 'bar',
   ....: 'foo', 'bar', 'foo', 'foo'],
   ....: 'B' : ['one', 'one', 'two', 'three',
   ....: 'two', 'two', 'one', 'three'],
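The frame definition is cut off in this excerpt. The sketch below completes it with a hypothetical numeric column 'C' (the integers 0..7) so that the apply and combine steps can be seen on concrete numbers.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'foo'],
    'B': ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three'],
    'C': np.arange(8),  # hypothetical values, not from the excerpt
})

# Group by one key, apply sum() to each group, combine into a Series.
by_a = df.groupby('A')['C'].sum()

# Grouping by several keys produces a hierarchical (MultiIndex) result.
by_a_b = df.groupby(['A', 'B'])['C'].sum()

print(by_a)
print(by_a_b)
```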

With a “stacked” DataFrame or Series (having a MultiIndex as the index), the inverse operation of stack() is unstack(), which by default unstacks the last level:

In [90]: tuples = list(zip(*[['bar', 'bar', 'baz', 'baz',
   ....: 'foo', 'foo', 'qux', 'qux'],
   ....: ['one', 'two', 'one', 'two',
   ....: 'one', 'two', 'one', 'two']]))
   ....:

In [91]: index = pd.MultiIndex.from_tuples(tuples, names=['first', 'second'])

In [92]: df = pd.DataFrame(np.random.randn(8, 2), index=index, columns=['A', 'B'])
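A reduced sketch of the stack / unstack round trip on a frame like the one above. It uses four rows and fixed integers instead of random data (both assumptions) so the shapes are easy to follow.

```python
import numpy as np
import pandas as pd

# zip(*lists) pairs the two label lists element by element.
tuples = list(zip(*[['bar', 'bar', 'baz', 'baz'],
                    ['one', 'two', 'one', 'two']]))
index = pd.MultiIndex.from_tuples(tuples, names=['first', 'second'])
df = pd.DataFrame(np.arange(8).reshape(4, 2), index=index, columns=['A', 'B'])

stacked = df.stack()           # column labels become the innermost index level
restored = stacked.unstack()   # default: unstack the last level, back to columns A, B
by_first = stacked.unstack(0)  # unstack a different level: 'first' becomes the columns

print(stacked.shape, restored.shape, by_first.shape)
```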

Time Series

pandas has simple, powerful, and efficient functionality for performing resampling operations during frequency conversion (e.g., converting secondly data into 5-minutely data). This is extremely

In [103]: rng = pd.date_range('1/1/2012', periods=100, freq='S')

In [104]: ts = pd.Series(np.random.randint(0, 500, len(rng)), index=rng)

In [105]: ts.resample('5Min', how='sum')

Out[105]:

2012-01-01 25083

Freq: 5T, dtype: int32
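The `how='sum'` keyword shown above belongs to the 0.17.1 API documented here. Later pandas releases replaced it with a method call on the resampler, so a sketch for a current installation might look as follows; the fixed data and the lowercase frequency aliases are choices made for this example.

```python
import numpy as np
import pandas as pd

# 100 one-second timestamps, all within the first five minutes of the day.
rng = pd.date_range('1/1/2012', periods=100, freq='s')
ts = pd.Series(np.arange(100), index=rng)  # fixed values instead of random ones

# Group into 5-minute bins, then aggregate each bin with sum().
five_min = ts.resample('5min').sum()
print(five_min)
```

Because all 100 seconds fall into a single 5-minute bin, the result has one row, matching the single-row output shown above.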

In [106]: rng = pd.date_range('3/6/2012 00:00', periods=, freq='D')

In [107]: ts = pd.Series(np.random.randn(len(rng)), rng)

Freq: D, dtype: float64

In [109]: ts_utc = ts.tz_localize('UTC')
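A sketch of time zone handling around the `tz_localize` call above. The `periods=5` value, the fixed data, and the target zone 'America/New_York' are assumptions for illustration; the excerpt does not show them.

```python
import numpy as np
import pandas as pd

rng = pd.date_range('3/6/2012 00:00', periods=5, freq='D')  # periods=5 is assumed
ts = pd.Series(np.arange(5.0), index=rng)

# tz_localize attaches a zone to naive timestamps without shifting the clock time.
ts_utc = ts.tz_localize('UTC')

# tz_convert keeps the same instants and re-expresses them in another zone.
ts_ny = ts_utc.tz_convert('America/New_York')

print(ts_utc.index[0], ts_ny.index[0])
```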

In [112]: rng = pd.date_range('1/1/2012', periods=, freq='M')

In [113]: ts = pd.Series(np.random.randn(len(rng)), index=rng)

Freq: MS, dtype: float64

In [118]: prng = pd.period_range('1990Q1', '2000Q4', freq='Q-NOV')

In [119]: ts = pd.Series(np.random.randn(len(prng)), prng)

In [120]: ts.index = (prng.asfreq('M', 'e') + 1).asfreq('H', 's') + 9

In [121]: ts.head()

Out[121]:

1990-03-01 09:00 -0.902937
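The period arithmetic in In [120] can be unpacked step by step. The intermediate variable names below are hypothetical; lowercase 'h' is used because recent pandas releases deprecate the uppercase 'H' alias.

```python
import pandas as pd

# Quarterly periods for a fiscal year ending in November.
prng = pd.period_range('1990Q1', '2000Q4', freq='Q-NOV')

month_end = prng.asfreq('M', 'e')          # last month of each quarter
next_month = month_end + 1                 # the month after the quarter ends
hour_start = next_month.asfreq('h', 's')   # first hour of that month
nine_am = hour_start + 9                   # 09:00 on the first day of that month

print(prng[0], month_end[0], next_month[0], nine_am[0])
```

For 1990Q1 (December 1989 through February 1990) this yields 1990-03-01 09:00, which agrees with the first row of the output above.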

Name: grade, dtype: category

Categories (3, object): [a, b, e]

Freq: H, dtype: float64

In [122]: df = pd.DataFrame({"id":[1,,,,,], "raw_grade":['a', 'b', 'b', 'a',

In [125]: df["grade"].cat.categories = ["very good", "good", "very bad"]

In [126]: df["grade"] = df["grade"].cat.set_categories(["very bad", "bad", "medium", "good", "very good"])

Name: grade, dtype: category

Categories (5, object): [very bad, bad, medium, good, very good]
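A sketch of the categorical workflow these fragments come from. The `id` and `raw_grade` values are illustrative because the excerpt truncates them. Assigning to `.cat.categories` directly, as in In [125], is the 0.17.1 idiom and was removed in pandas 2.0, so this sketch uses `rename_categories` instead.

```python
import pandas as pd

# Illustrative data; the excerpt shows these lists only partially.
df = pd.DataFrame({"id": [1, 2, 3, 4, 5, 6],
                   "raw_grade": ['a', 'b', 'b', 'a', 'a', 'e']})

# Convert the raw strings to a categorical column with categories a, b, e.
df["grade"] = df["raw_grade"].astype("category")

# Give the categories more meaningful names (a, b, e in that order).
df["grade"] = df["grade"].cat.rename_categories(["very good", "good", "very bad"])

# Reorder the categories and add the ones that do not occur in the data.
df["grade"] = df["grade"].cat.set_categories(
    ["very bad", "bad", "medium", "good", "very good"])

# Sorting follows the category order, not alphabetical order.
ordered = df.sort_values(by="grade")
print(ordered)
```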

On DataFrame, plot() is a convenience to plot all of the columns with labels:

In [133]: df = pd.DataFrame(np.random.randn(1000, 4), index=ts.index,
   .....: columns=['A', 'B', 'C', 'D'])
   .....:

In [134]: df = df.cumsum()

In [135]: plt.figure(); df.plot(); plt.legend(loc='best')

Out[135]: <matplotlib.legend.Legend at 0xab53b26c>

See Comparisons for an explanation and what to do.

See Gotchas as well.

In [141]: pd.read_excel('foo.xlsx', 'Sheet1', index_col=None, na_values=['NA'])

>>> if pd.Series([False, True, False]):
...     print("I was true")
Traceback
ValueError: The truth value of an array is ambiguous. Use a.empty, a.any() or a.all().
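The error above can be avoided by stating explicitly which reduction is meant. A minimal sketch, with hypothetical variable names:

```python
import pandas as pd

flags = pd.Series([False, True, False])

# Using the Series directly in a boolean context raises ValueError.
try:
    if flags:
        print("I was true")
    raised = False
except ValueError:
    raised = True

# Explicit alternatives named in the error message.
any_true = flags.any()   # is at least one element True?
all_true = flags.all()   # are all elements True?
is_empty = flags.empty   # does the Series have no elements?

print(raised, any_true, all_true, is_empty)
```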

Date posted: 08/09/2022, 11:25