Pandas: Merge


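pandas provides many facilities for easily combining Series and DataFrame objects (older releases also supported Panel, which was removed in pandas 0.25). For details, see the Merging section of the official documentation:

http://pandas.pydata.org/pandas-docs/stable/merging.html#merging

The official documentation covers the topic in depth; this article only gives a brief introduction to the following three operations.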

  • Concat

Simple row-wise concatenation:

import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randn(10, 4))
print(df)
         0         1         2         3
0  1.503786 -0.015502  1.399567  0.857002
1  1.391407 -0.565491  0.587861 -1.934991
2 -1.364511 -0.754196 -1.157549 -0.063248
3 -0.917173 -0.429209  0.676149 -0.245727
4  0.165568 -1.031282 -1.861940 -0.300801
5  0.436347 -0.352189 -0.438893 -1.687227
6  1.635600  0.792631  0.043945  2.041445
7 -0.856410  0.476403  0.788491  1.003481
8 -0.588387  1.865996 -0.886048 -0.033396
9  0.508678 -2.174582 -0.193041  0.217243
# pieces is a list; each element is a DataFrame slice, with shapes 3x4, 4x4 and 3x4
pieces = [df[:3], df[3:7], df[7:]]
print(type(pieces))

print(pieces)
<class 'list'>
[          0         1         2         3
0  1.503786 -0.015502  1.399567  0.857002
1  1.391407 -0.565491  0.587861 -1.934991
2 -1.364511 -0.754196 -1.157549 -0.063248,           0         1         2         3
3 -0.917173 -0.429209  0.676149 -0.245727
4  0.165568 -1.031282 -1.861940 -0.300801
5  0.436347 -0.352189 -0.438893 -1.687227
6  1.635600  0.792631  0.043945  2.041445,           0         1         2         3
7 -0.856410  0.476403  0.788491  1.003481
8 -0.588387  1.865996 -0.886048 -0.033396
9  0.508678 -2.174582 -0.193041  0.217243]
print(pd.concat(pieces))
          0         1         2         3
0  1.503786 -0.015502  1.399567  0.857002
1  1.391407 -0.565491  0.587861 -1.934991
2 -1.364511 -0.754196 -1.157549 -0.063248
3 -0.917173 -0.429209  0.676149 -0.245727
4  0.165568 -1.031282 -1.861940 -0.300801
5  0.436347 -0.352189 -0.438893 -1.687227
6  1.635600  0.792631  0.043945  2.041445
7 -0.856410  0.476403  0.788491  1.003481
8 -0.588387  1.865996 -0.886048 -0.033396
9  0.508678 -2.174582 -0.193041  0.217243
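
The split-and-rejoin above can be checked programmatically. The sketch below is a minimal illustration, not part of the original walkthrough: it swaps the random data for a deterministic frame built with np.arange, and the labels passed to keys= are made up for the example.

```python
import numpy as np
import pandas as pd

# Deterministic stand-in for the random frame used above (assumed data).
df = pd.DataFrame(np.arange(40).reshape(10, 4))
pieces = [df[:3], df[3:7], df[7:]]

# Row-wise concatenation of the slices restores the original frame.
restored = pd.concat(pieces)
print(restored.equals(df))

# keys= adds an outer index level naming each piece (labels are hypothetical).
labeled = pd.concat(pieces, keys=["head", "middle", "tail"])
print(labeled.loc["middle"].shape)

# ignore_index=True drops the original row labels and renumbers from 0.
reordered = pd.concat([pieces[2], pieces[0]], ignore_index=True)
print(list(reordered.index))
```

Without ignore_index=True, the last call would keep the original labels 7, 8, 9, 0, 1, 2, which is often surprising when pieces are combined out of order.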
  • Join

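SQL-style merges; for details, see Database style joining. When the join key is duplicated on both sides, as in the first example below, the result is the Cartesian product of the matching rows.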

left = pd.DataFrame({'key': ['foo', 'foo'], 'lval': [1, 2]})
right = pd.DataFrame({'key': ['foo', 'foo'], 'rval': [4, 5]})
print(left)
print('\n',right)

mlr = pd.merge(left, right, on='key')
print('\n',mlr)
   key  lval
0  foo     1
1  foo     2

    key  rval
0  foo     4
1  foo     5

    key  lval  rval
0  foo     1     4
1  foo     1     5
2  foo     2     4
3  foo     2     5
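
The four rows above come from pairing every left row with every right row that shares the key. The following sketch, assuming the same two frames, counts the rows and then uses the validate argument of pd.merge to show how an unintended many-to-many merge can be caught; the unique_keys flag is a name introduced only for this illustration.

```python
import pandas as pd

left = pd.DataFrame({'key': ['foo', 'foo'], 'lval': [1, 2]})
right = pd.DataFrame({'key': ['foo', 'foo'], 'rval': [4, 5]})

merged = pd.merge(left, right, on='key')
# 2 left rows x 2 right rows share the key 'foo', so 4 rows come out.
print(len(merged))

# validate= asks pandas to check the key relationship before merging
# and to raise MergeError when the check fails.
try:
    pd.merge(left, right, on='key', validate='one_to_one')
    unique_keys = True
except pd.errors.MergeError:
    unique_keys = False
print(unique_keys)
```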
left = pd.DataFrame({'key': ['foo', 'bar'], 'lval': [1, 2]})
right = pd.DataFrame({'key': ['foo', 'bar'], 'rval': [4, 5]})
print(left)
print(right)
mlr = pd.merge(left, right, on='key')
print(mlr)
   key  lval
0  foo     1
1  bar     2

   key  rval
0  foo     4
1  bar     5

   key  lval  rval
0  foo     1     4
1  bar     2     5
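
Both examples above use the default inner join, where every key appears on both sides. As a rough sketch of how the SQL analogy extends, the frames below are changed (an assumption for illustration) so that 'bar' exists only on the left and 'baz' only on the right, and the how and indicator arguments of pd.merge are used to show which rows survive.

```python
import pandas as pd

left = pd.DataFrame({'key': ['foo', 'bar'], 'lval': [1, 2]})
right = pd.DataFrame({'key': ['foo', 'baz'], 'rval': [4, 5]})

# Default how='inner': only keys present in both frames are kept.
inner = pd.merge(left, right, on='key')
print(inner)

# how='outer' keeps every key; indicator=True adds a _merge column
# recording whether each row came from the left, the right, or both.
outer = pd.merge(left, right, on='key', how='outer', indicator=True)
print(outer)

# how='left' keeps all left rows; missing right values become NaN.
left_join = pd.merge(left, right, on='key', how='left')
print(left_join)
```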
  • Append

Append new rows to a DataFrame; for details, see Appending.

df = pd.DataFrame(np.random.randn(8, 4), columns=['A','B','C','D'])
print(df)

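# Take the row at position 3 and add it to the end.
# DataFrame.append was removed in pandas 2.0, so pd.concat is used instead.
s = df.iloc[3]
print(pd.concat([df, s.to_frame().T], ignore_index=True))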
          A         B         C         D
0  0.578197 -0.721444  0.452432 -0.529774
1 -1.213266  0.922351 -0.215287  1.127292
2 -0.158181 -1.489970 -0.728378  0.697814
3 -1.599426 -0.134784 -1.147437 -0.704976
4 -0.195494  0.903841 -0.815240  0.804362
5  2.255849  0.151316 -1.186190  0.871065
6 -0.933151 -0.689410 -0.546619  0.298793
7  0.060045  0.154370  0.877249  0.609006

          A         B         C         D
0  0.578197 -0.721444  0.452432 -0.529774
1 -1.213266  0.922351 -0.215287  1.127292
2 -0.158181 -1.489970 -0.728378  0.697814
3 -1.599426 -0.134784 -1.147437 -0.704976
4 -0.195494  0.903841 -0.815240  0.804362
5  2.255849  0.151316 -1.186190  0.871065
6 -0.933151 -0.689410 -0.546619  0.298793
7  0.060045  0.154370  0.877249  0.609006
8 -1.599426 -0.134784 -1.147437 -0.704976
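
Adding rows one at a time copies the whole frame on each step. When several rows have to be added, one possible pattern is to collect them first and concatenate once. The sketch below assumes a small two-column frame and made-up row values; new_rows is a name introduced for the example.

```python
import pandas as pd

df = pd.DataFrame({'A': [1.0, 2.0], 'B': [3.0, 4.0]})

# Hypothetical new rows, gathered first as plain dicts.
new_rows = [{'A': 5.0, 'B': 6.0}, {'A': 7.0, 'B': 8.0}]

# Build one frame from the collected rows and concatenate a single time.
result = pd.concat([df, pd.DataFrame(new_rows)], ignore_index=True)
print(result)
```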
