Pandas: Grouping


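Grouping in pandas is similar to GROUP BY in SQL and involves the following steps:

  • Split the data into groups according to some criteria;
  • Apply a function (for example, an aggregation) to each group independently;
  • Combine the results into a single data structure for display.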
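
The three steps above can be sketched by hand before relying on the built-in shortcut. This is only an illustration: the small frame and its values are made up for the sketch, and the variable names (pieces, sums, manual, builtin) are not from the pandas documentation.

```python
import pandas as pd

# Hypothetical frame with fixed values so the result is reproducible.
df = pd.DataFrame({
    'A': ['foo', 'bar', 'foo', 'bar'],
    'C': [1.0, 2.0, 3.0, 4.0],
})

# 1. Split: iterating over a GroupBy object yields (key, sub-DataFrame) pairs.
pieces = {key: group for key, group in df.groupby('A')}

# 2. Apply: compute one value per group, independently of the other groups.
sums = {key: group['C'].sum() for key, group in pieces.items()}

# 3. Combine: collect the per-group results into a single Series.
manual = pd.Series(sums, name='C').sort_index()

# The usual one-liner performs the same three steps internally.
builtin = df.groupby('A')['C'].sum()

print(manual)
print(manual.to_dict() == builtin.to_dict())
```

In practice the one-liner is the form to use; the manual version is only there to make each step visible.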

The official documentation (http://pandas.pydata.org/pandas-docs/stable/groupby.html#groupby) describes the grouping steps as follows:

By “group by” we are referring to a process involving one or more of the following steps

  • Splitting the data into groups based on some criteria
  • Applying a function to each group independently
  • Combining the results into a data structure

Example (columns C and D are filled with random numbers, so the values differ from run to run):

import numpy as np
import pandas as pd

df = pd.DataFrame({'A': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'foo'],
                   'B': ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three'],
                   'C': np.random.randn(8),
                   'D': np.random.randn(8)})

print(df)
     A      B         C         D
0  foo    one  0.650290 -0.158116
1  bar    one  0.150322 -0.767010
2  foo    two -1.160527 -0.307235
3  bar  three -0.514078 -0.139870
4  foo    two  1.030693  0.660093
5  bar    two  0.094431  0.713019
6  foo    one -1.112724 -1.266692
7  foo  three -0.173227  0.303787

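Group by column A, then sum each group:

# numeric_only=True keeps the string column B out of the result;
# recent pandas versions would otherwise try to sum B as well.
sum1 = df.groupby('A').sum(numeric_only=True)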
print(sum1)
            C         D
A                      
bar -0.269325 -0.193861
foo -0.765495 -0.768163
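
sum is only one of the functions that can be applied to each group. As a hedged sketch, the same grouped object can also take other aggregations, or a different aggregation per column. The frame below uses made-up fixed values, the output names c_sum, d_max and n_rows are hypothetical, and the keyword form of agg assumes pandas 0.25 or later.

```python
import pandas as pd

# Hypothetical frame with fixed values (not the random data used in this article).
df = pd.DataFrame({
    'A': ['foo', 'bar', 'foo', 'bar'],
    'C': [1.0, 2.0, 3.0, 4.0],
    'D': [10.0, 20.0, 30.0, 40.0],
})

grouped = df.groupby('A')

# The same aggregation applied to the selected numeric columns.
means = grouped[['C', 'D']].mean()

# A different aggregation per column, written as
# output_name=(input_column, function_name).
summary = grouped.agg(
    c_sum=('C', 'sum'),
    d_max=('D', 'max'),
    n_rows=('C', 'count'),
)

print(means)
print(summary)
```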

Group by several columns, then sum each group:

sum2 = df.groupby(['A', 'B']).sum()
print(sum2)
                  C         D
A   B                        
bar one    0.150322 -0.767010
    three -0.514078 -0.139870
    two    0.094431  0.713019
foo one   -0.462435 -1.424808
    three -0.173227  0.303787
    two   -0.129834  0.352858
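
Grouping by two columns produces a result whose index has two levels (A and B). The sketch below shows a few ways to read from such a result or to flatten it back into ordinary columns. It uses a small frame with made-up fixed values, and the variable names are chosen only for illustration.

```python
import pandas as pd

# Hypothetical frame with fixed values so the sums are easy to check.
df = pd.DataFrame({
    'A': ['foo', 'bar', 'foo', 'bar', 'foo'],
    'B': ['one', 'one', 'two', 'two', 'one'],
    'C': [1.0, 2.0, 3.0, 4.0, 5.0],
})

by_ab = df.groupby(['A', 'B'])[['C']].sum()

# Select a single (A, B) combination with a tuple key.
foo_one = by_ab.loc[('foo', 'one'), 'C']

# Select every row under one outer key; the result is indexed by B only.
foo_rows = by_ab.loc['foo']

# Turn both index levels back into ordinary columns.
flat = by_ab.reset_index()

# Ask for the flat layout up front instead.
flat2 = df.groupby(['A', 'B'], as_index=False)[['C']].sum()

print(foo_one)
print(foo_rows)
print(flat)
```

reset_index is convenient when the grouped result feeds into further column-based processing, while the two-level index is handy for lookups by key.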
