
Pandas: Grouping

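Grouping in pandas is similar to GROUP BY in SQL and involves the following steps:

Split the data into groups according to some rule;
Apply one or more aggregation functions to each group;
Collect the results into a single data structure for display.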

Official documentation: http://pandas.pydata.org/pandas-docs/stable/groupby.html#groupby. According to the official description, grouping consists of the following steps:

By “group by” we are referring to a process involving one or more of the following steps:

Splitting the data into groups based on some criteria

Applying a function to each group independently

Combining the results into a data structure
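
The three steps above can be sketched by hand. This is only an illustration, not how pandas implements groupby internally; the column names `key` and `value` and the numbers are made up so the sums are easy to verify.

```python
import pandas as pd

# Small hand-made frame (hypothetical values, chosen so the sums are easy to check).
df = pd.DataFrame({
    'key': ['a', 'b', 'a', 'b'],
    'value': [1, 2, 3, 4],
})

# 1. Split: iterating over a GroupBy yields (group name, sub-DataFrame) pairs.
pieces = {name: group for name, group in df.groupby('key')}

# 2. Apply: compute a result for each piece independently.
sums = {name: group['value'].sum() for name, group in pieces.items()}

# 3. Combine: gather the per-group results into a single Series.
manual = pd.Series(sums, name='value')
manual.index.name = 'key'

# The usual one-liner performs all three steps at once.
builtin = df.groupby('key')['value'].sum()

print(manual)
print(builtin)
```

Both printed Series should hold the same per-group totals; in practice the one-liner is the form to use.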

Example:

import numpy as np
import pandas as pd

# C and D are random, so the values printed below will differ on each run.
df = pd.DataFrame({
    'A': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'foo'],
    'B': ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three'],
    'C': np.random.randn(8),
    'D': np.random.randn(8),
})

print(df)

     A      B         C         D
0  foo    one  0.650290 -0.158116
1  bar    one  0.150322 -0.767010
2  foo    two -1.160527 -0.307235
3  bar  three -0.514078 -0.139870
4  foo    two  1.030693  0.660093
5  bar    two  0.094431  0.713019
6  foo    one -1.112724 -1.266692
7  foo  three -0.173227  0.303787

Sum after grouping by column A:

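# numeric_only=True sums only C and D; without it, pandas 2.0+ would also
# concatenate the strings in column B.
sum1 = df.groupby('A').sum(numeric_only=True)
print(sum1)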

            C         D
A                      
bar -0.269325 -0.193861
foo -0.765495 -0.768163
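
The aggregation applied to each group need not be a sum. As a minimal sketch, the snippet below uses `agg` to apply several aggregation functions at once; the fixed numbers in column `C` are made up (not the random values used above) so the result can be checked by hand.

```python
import pandas as pd

# Hypothetical fixed data, so the aggregates are easy to verify.
df = pd.DataFrame({
    'A': ['foo', 'bar', 'foo', 'bar'],
    'C': [1.0, 2.0, 3.0, 6.0],
})

# Select column C, then compute three aggregates per group in one call.
# The result has one row per group and one column per function.
stats = df.groupby('A')['C'].agg(['sum', 'mean', 'max'])
print(stats)
```

Here the `bar` group contains 2.0 and 6.0, so its sum is 8.0, its mean 4.0, and its max 6.0.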

Sum after grouping by multiple columns:

sum2 = df.groupby(['A', 'B']).sum()
print(sum2)

                  C         D
A   B                        
bar one    0.150322 -0.767010
    three -0.514078 -0.139870
    two    0.094431  0.713019
foo one   -0.462435 -1.424808
    three -0.173227  0.303787
    two   -0.129834  0.352858
