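Pandas: Grouping
Grouping in pandas works much like GROUP BY in SQL and consists of the following steps:
- split the data into groups according to some rule;
- apply a function (for example an aggregation) to each group independently;
- collect the results into a single data structure for display.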
Official documentation: http://pandas.pydata.org/pandas-docs/stable/groupby.html#groupby. According to the official introduction, grouping consists of the following steps:
By “group by” we are referring to a process involving one or more of the following steps:
- Splitting the data into groups based on some criteria
- Applying a function to each group independently
- Combining the results into a data structure
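The three steps can be spelled out by hand to see what groupby does internally. The following is only a minimal sketch: the small frame and its values are made up for illustration and are not the data used in the example further down.

```python
import pandas as pd

# Small deterministic frame (hypothetical values, chosen for illustration).
df = pd.DataFrame({
    'A': ['foo', 'bar', 'foo', 'bar'],
    'C': [1.0, 2.0, 3.0, 4.0],
})

# 1. Split: iterating over a GroupBy object yields (key, sub-DataFrame) pairs.
pieces = {key: group for key, group in df.groupby('A')}

# 2. Apply: run a function on each piece independently.
sums = {key: group['C'].sum() for key, group in pieces.items()}

# 3. Combine: collect the per-group results into one data structure.
manual = pd.Series(sums, name='C').sort_index()

# The built-in call performs all three steps at once.
builtin = df.groupby('A')['C'].sum()

print(manual)
print(builtin)
```

Both printed Series hold the same per-group totals; in practice the built-in call is the one to use, and the manual version is shown only to make the steps visible.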
Example:
import numpy as np
import pandas as pd

# Columns C and D are random, so the values differ on every run.
df = pd.DataFrame({'A': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'foo'],
                   'B': ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three'],
                   'C': np.random.randn(8),
                   'D': np.random.randn(8)})
print(df)
     A      B         C         D
0  foo    one  0.650290 -0.158116
1  bar    one  0.150322 -0.767010
2  foo    two -1.160527 -0.307235
3  bar  three -0.514078 -0.139870
4  foo    two  1.030693  0.660093
5  bar    two  0.094431  0.713019
6  foo    one -1.112724 -1.266692
7  foo  three -0.173227  0.303787
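Group by column A, then sum each group:
# numeric_only=True skips the string column B; recent pandas versions
# would otherwise concatenate its strings into the result.
sum1 = df.groupby('A').sum(numeric_only=True)
print(sum1)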
            C         D
A
bar -0.269325 -0.193861
foo -0.765495 -0.768163
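Summing is only one function that can be used in the apply step. As a rough sketch on a made-up frame (the values below are hypothetical, not the random data above), agg accepts either a list of function names or a dict that maps each column to its own function:

```python
import pandas as pd

# Hypothetical deterministic data, chosen for illustration.
df = pd.DataFrame({
    'A': ['foo', 'bar', 'foo', 'bar'],
    'C': [1.0, 2.0, 3.0, 4.0],
    'D': [10.0, 20.0, 30.0, 40.0],
})

# Several aggregations on one column: one result column per function.
stats = df.groupby('A')['C'].agg(['sum', 'mean', 'max'])

# A different aggregation for each column, given as a dict.
per_column = df.groupby('A').agg({'C': 'sum', 'D': 'mean'})

print(stats)
print(per_column)
```

The dict form is useful when columns need different treatment, for example a total for one column and an average for another.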
Group by multiple columns, then sum:
sum2 = df.groupby(['A', 'B']).sum()
print(sum2)
                  C         D
A   B
bar one    0.150322 -0.767010
    three -0.514078 -0.139870
    two    0.094431  0.713019
foo one   -0.462435 -1.424808
    three -0.173227  0.303787
    two   -0.129834  0.352858
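Grouping by several columns produces a result indexed by a MultiIndex with one level per grouping column. The sketch below, again on made-up values rather than the random data above, shows one way to look up a single group and to turn the index levels back into ordinary columns:

```python
import pandas as pd

# Hypothetical deterministic data, chosen for illustration.
df = pd.DataFrame({
    'A': ['foo', 'bar', 'foo', 'foo'],
    'B': ['one', 'one', 'two', 'one'],
    'C': [1.0, 2.0, 3.0, 4.0],
})

grouped = df.groupby(['A', 'B']).sum()

# Look up one group by its (A, B) key pair.
foo_one = grouped.loc[('foo', 'one'), 'C']

# Move the index levels A and B back into regular columns.
flat = grouped.reset_index()

print(foo_one)
print(flat)
```

The flat form is often easier to export or merge with other tables, while the MultiIndex form is convenient for key-based lookups.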