Merging in pandas
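pandas provides many methods that make it easy to combine Series and DataFrame objects (and Panel objects, in older releases that still include Panel). For details, see the Merging section of the official documentation:
http://pandas.pydata.org/pandas-docs/stable/merging.html#merging. The official documentation covers the topic in depth; this article only gives a brief overview of the following three functions.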
Concat
Simple row-wise concatenation:
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randn(10, 4))
print(df)
0 1 2 3
0 1.503786 -0.015502 1.399567 0.857002
1 1.391407 -0.565491 0.587861 -1.934991
2 -1.364511 -0.754196 -1.157549 -0.063248
3 -0.917173 -0.429209 0.676149 -0.245727
4 0.165568 -1.031282 -1.861940 -0.300801
5 0.436347 -0.352189 -0.438893 -1.687227
6 1.635600 0.792631 0.043945 2.041445
7 -0.856410 0.476403 0.788491 1.003481
8 -0.588387 1.865996 -0.886048 -0.033396
9 0.508678 -2.174582 -0.193041 0.217243
# pieces is a list; each element is a DataFrame (a row slice of df), with shapes 3x4, 4x4 and 3x4
pieces = [df[:3], df[3:7], df[7:]]
print(type(pieces))
print(pieces)
<class 'list'>
[ 0 1 2 3
0 1.503786 -0.015502 1.399567 0.857002
1 1.391407 -0.565491 0.587861 -1.934991
2 -1.364511 -0.754196 -1.157549 -0.063248, 0 1 2 3
3 -0.917173 -0.429209 0.676149 -0.245727
4 0.165568 -1.031282 -1.861940 -0.300801
5 0.436347 -0.352189 -0.438893 -1.687227
6 1.635600 0.792631 0.043945 2.041445, 0 1 2 3
7 -0.856410 0.476403 0.788491 1.003481
8 -0.588387 1.865996 -0.886048 -0.033396
9 0.508678 -2.174582 -0.193041 0.217243]
print(pd.concat(pieces))
0 1 2 3
0 1.503786 -0.015502 1.399567 0.857002
1 1.391407 -0.565491 0.587861 -1.934991
2 -1.364511 -0.754196 -1.157549 -0.063248
3 -0.917173 -0.429209 0.676149 -0.245727
4 0.165568 -1.031282 -1.861940 -0.300801
5 0.436347 -0.352189 -0.438893 -1.687227
6 1.635600 0.792631 0.043945 2.041445
7 -0.856410 0.476403 0.788491 1.003481
8 -0.588387 1.865996 -0.886048 -0.033396
9 0.508678 -2.174582 -0.193041 0.217243
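As a supplementary sketch (not part of the original example), the snippet below shows two optional arguments of pd.concat that affect the resulting index: ignore_index and keys. The frames a and b and their column names are made up for illustration.

```python
import pandas as pd

# Two small hypothetical frames with the same columns.
a = pd.DataFrame({"x": [1, 2], "y": [3, 4]})
b = pd.DataFrame({"x": [5, 6], "y": [7, 8]})

# Default: rows are stacked and the original index labels are kept.
stacked = pd.concat([a, b])

# ignore_index=True renumbers the rows 0..n-1.
renumbered = pd.concat([a, b], ignore_index=True)

# keys= adds an outer index level recording which piece each row came from.
labelled = pd.concat([a, b], keys=["a", "b"])

print(stacked.index.tolist())      # [0, 1, 0, 1]
print(renumbered.index.tolist())   # [0, 1, 2, 3]
print(labelled.loc["b"])           # only the rows that came from b
```

In the example above the pieces were sliced from one frame, so their index labels never collide; when concatenating independent frames, ignore_index or keys is often what keeps the result unambiguous.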
Join
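SQL-style join operations; see Database-style joining in the official documentation for details. When the same key value appears more than once on both sides, the result contains every pairing of the matching rows, i.e. the Cartesian product for that key.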
left = pd.DataFrame({'key': ['foo', 'foo'], 'lval': [1, 2]})
right = pd.DataFrame({'key': ['foo', 'foo'], 'rval': [4, 5]})
print(left)
print('\n',right)
mlr = pd.merge(left, right, on='key')
print('\n',mlr)
key lval
0 foo 1
1 foo 2
key rval
0 foo 4
1 foo 5
key lval rval
0 foo 1 4
1 foo 1 5
2 foo 2 4
3 foo 2 5
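The row count above can be checked by hand: each key contributes (number of matching rows on the left) times (number of matching rows on the right). The following is a minimal sketch of that check, reusing the same two frames; the variable names expected_rows and raised are introduced here for illustration. It also shows the validate argument of pd.merge, which can be used to fail fast when duplicate keys are not intended.

```python
import pandas as pd

left = pd.DataFrame({"key": ["foo", "foo"], "lval": [1, 2]})
right = pd.DataFrame({"key": ["foo", "foo"], "rval": [4, 5]})

merged = pd.merge(left, right, on="key")

# For an inner join, each shared key contributes
# (matches in left) * (matches in right) rows.
expected_rows = sum(
    (left["key"] == k).sum() * (right["key"] == k).sum()
    for k in set(left["key"]) & set(right["key"])
)
print(len(merged), expected_rows)  # 4 4

# validate= raises MergeError when the keys are not unique as declared.
raised = False
try:
    pd.merge(left, right, on="key", validate="one_to_one")
except pd.errors.MergeError as exc:
    raised = True
    print("not a one-to-one merge:", exc)
```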
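Another example, in which each key value is unique in both frames, so every row has exactly one match:
left = pd.DataFrame({'key': ['foo', 'bar'], 'lval': [1, 2]})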
right = pd.DataFrame({'key': ['foo', 'bar'], 'rval': [4, 5]})
print(left)
print(right)
mlr = pd.merge(left, right, on='key')
print(mlr)
key lval
0 foo 1
1 bar 2
key rval
0 foo 4
1 bar 5
key lval rval
0 foo 1 4
1 bar 2 5
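Both examples above use keys that exist in both frames. As a hedged illustration of what happens otherwise, the sketch below changes the right frame so that one key on each side has no partner (the 'baz' row is invented for this purpose) and compares the how options of pd.merge. The indicator=True argument adds a _merge column showing where each row came from.

```python
import pandas as pd

left = pd.DataFrame({"key": ["foo", "bar"], "lval": [1, 2]})
# Hypothetical right frame: 'baz' has no partner in left, and 'bar' is absent.
right = pd.DataFrame({"key": ["foo", "baz"], "rval": [4, 6]})

# Default how="inner": only keys present in both frames survive.
inner = pd.merge(left, right, on="key")

# how="left": every left row is kept; missing right values become NaN.
left_join = pd.merge(left, right, on="key", how="left")

# how="outer": keys from either side are kept; _merge records the origin.
outer = pd.merge(left, right, on="key", how="outer", indicator=True)

print(inner)
print(left_join)
print(outer)
```

Unmatched rows are silently dropped by the default inner join, so choosing how explicitly is a reasonable habit when the key sets may differ.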
Append
Append new rows to the end of a DataFrame; see Appending in the official documentation for details.
df = pd.DataFrame(np.random.randn(8, 4), columns=['A','B','C','D'])
print(df)
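# Take the row at position 3 and append it to the end
s = df.iloc[3]
# Note: DataFrame.append is only available in pandas versions before 2.0
print(df.append(s, ignore_index=True))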
A B C D
0 0.578197 -0.721444 0.452432 -0.529774
1 -1.213266 0.922351 -0.215287 1.127292
2 -0.158181 -1.489970 -0.728378 0.697814
3 -1.599426 -0.134784 -1.147437 -0.704976
4 -0.195494 0.903841 -0.815240 0.804362
5 2.255849 0.151316 -1.186190 0.871065
6 -0.933151 -0.689410 -0.546619 0.298793
7 0.060045 0.154370 0.877249 0.609006
A B C D
0 0.578197 -0.721444 0.452432 -0.529774
1 -1.213266 0.922351 -0.215287 1.127292
2 -0.158181 -1.489970 -0.728378 0.697814
3 -1.599426 -0.134784 -1.147437 -0.704976
4 -0.195494 0.903841 -0.815240 0.804362
5 2.255849 0.151316 -1.186190 0.871065
6 -0.933151 -0.689410 -0.546619 0.298793
7 0.060045 0.154370 0.877249 0.609006
8 -1.599426 -0.134784 -1.147437 -0.704976
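On pandas 2.0 and later, where DataFrame.append is no longer available, the same result can be obtained with pd.concat. The following is a minimal sketch of one way to do it, not the only one; the names row and result are introduced here for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(8, 4), columns=["A", "B", "C", "D"])

# Row at position 3, as a Series indexed by the column names.
s = df.iloc[3]

# Turn the Series into a one-row DataFrame, then concatenate it below df.
row = s.to_frame().T
result = pd.concat([df, row], ignore_index=True)

print(result.shape)  # (9, 4)
print(result.tail(2))
```

The transpose is needed because s.to_frame() produces a single column; without it, pd.concat would align on the wrong axis.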