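26. Data Concatenation in Pandas: Inner and Outer Joins
When concatenating data, Pandas offers operations similar to a database's inner and outer joins. The default is an outer join; the join parameter of pd.concat can be set to "inner" to request an inner join instead.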
import pandas as pd
import numpy as np
# Column labels: the two frames share "hello", "the" and "world"
col1 = "hello the cruel world".split()
col2 = "hello the nice world".split()
# Row labels
idx1 = ["a", "b", "c", "d"]
idx2 = ["a", "b", "d", "e"]
val1 = np.arange(16).reshape(4, 4)
val2 = np.arange(20, 36).reshape(4, 4)
df1 = pd.DataFrame(val1, index=idx1, columns=col1)
print(df1)
df2 = pd.DataFrame(val2, index=idx2, columns=col2)
print(df2)
print("***outer join", "*" * 20)
# sort=True orders the columns alphabetically, as in the output below;
# recent pandas versions no longer sort them by default
print(pd.concat([df1, df2], join="outer", sort=True))
print("***inner join", "*" * 20)
print(pd.concat([df1, df2], join="inner"))
Output of the program:
   hello  the  cruel  world
a      0    1      2      3
b      4    5      6      7
c      8    9     10     11
d     12   13     14     15
   hello  the  nice  world
a     20   21    22     23
b     24   25    26     27
d     28   29    30     31
e     32   33    34     35
***outer join ********************
   cruel  hello  nice  the  world
a      2      0   NaN    1      3
b      6      4   NaN    5      7
c     10      8   NaN    9     11
d     14     12   NaN   13     15
a    NaN     20    22   21     23
b    NaN     24    26   25     27
d    NaN     28    30   29     31
e    NaN     32    34   33     35
***inner join ********************
   hello  the  world
a      0    1      3
b      4    5      7
c      8    9     11
d     12   13     15
a     20   21     23
b     24   25     27
d     28   29     31
e     32   33     35
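An inner-join concatenation effectively takes the intersection of the two sets of column labels: only the columns present in both DataFrames (hello, the, world) are kept, while every row from both DataFrames is retained. An outer join takes the union of the column labels instead and fills the missing cells with NaN.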
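The set relationship can be checked directly. The following is a minimal sketch that rebuilds the same two DataFrames and compares the resulting column labels against Python set operations; the variable names inner, outer, common and every are chosen here for illustration only.

```python
import numpy as np
import pandas as pd

# Same frames as in the example above.
df1 = pd.DataFrame(np.arange(16).reshape(4, 4),
                   index=list("abcd"),
                   columns="hello the cruel world".split())
df2 = pd.DataFrame(np.arange(20, 36).reshape(4, 4),
                   index=list("abde"),
                   columns="hello the nice world".split())

inner = pd.concat([df1, df2], join="inner")
outer = pd.concat([df1, df2], join="outer")

# Inner join keeps only the column labels common to both frames.
common = set(df1.columns) & set(df2.columns)
print(set(inner.columns) == common)

# Outer join keeps every column label that appears in either frame.
every = set(df1.columns) | set(df2.columns)
print(set(outer.columns) == every)

# Rows are never dropped when concatenating along the default axis.
print(len(inner) == len(df1) + len(df2))
```

All three checks should print True. Note that the join applies to the axis that is not being concatenated: with the default axis=0 the rows are stacked and the join decides which columns survive.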