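26. Pandas Data Concatenation: Inner and Outer Joins
When concatenating data, Pandas offers operations similar to a database's inner and outer joins. The default is an outer join; pass join = "inner" to pd.concat to request an inner join instead.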
import pandas as pd
import numpy as np

# Two sets of column labels that share "hello", "the" and "world"
col1 = "hello the cruel world".split()
col2 = "hello the nice world".split()
# Two sets of row labels that share "a", "b" and "d"
idx1 = ["a", "b", "c", "d"]
idx2 = ["a", "b", "d", "e"]
val1 = np.arange(16).reshape(4, 4)
val2 = np.arange(20, 36).reshape(4, 4)
df1 = pd.DataFrame(val1, index=idx1, columns=col1)
print(df1)
df2 = pd.DataFrame(val2, index=idx2, columns=col2)
print(df2)
print("***outer join", "*" * 20)
# Outer join (the default): keep the union of the columns, filling gaps with NaN
print(pd.concat([df1, df2], join="outer"))
print("***inner join", "*" * 20)
# Inner join: keep only the columns present in both frames
print(pd.concat([df1, df2], join="inner"))
Program output (column order and number formatting may differ between pandas versions):
   hello  the  cruel  world
a      0    1      2      3
b      4    5      6      7
c      8    9     10     11
d     12   13     14     15
   hello  the  nice  world
a     20   21    22     23
b     24   25    26     27
d     28   29    30     31
e     32   33    34     35
***outer join ********************
   cruel  hello  nice  the  world
a      2      0   NaN    1      3
b      6      4   NaN    5      7
c     10      8   NaN    9     11
d     14     12   NaN   13     15
a    NaN     20    22   21     23
b    NaN     24    26   25     27
d    NaN     28    30   29     31
e    NaN     32    34   33     35
***inner join ********************
   hello  the  world
a      0    1      3
b      4    5      7
c      8    9     11
d     12   13     15
a     20   21     23
b     24   25     27
d     28   29     31
e     32   33     35
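An inner-join concatenation effectively takes the intersection of the two sets of column labels: only the columns present in both DataFrames (hello, the, world) are kept, while all rows from both frames are retained. An outer join takes the union of the column labels and fills the missing cells with NaN.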
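The set interpretation can be checked directly. The following is a minimal sketch that rebuilds the same two frames and compares the resulting column labels with ordinary Python set operations. The variable names outer, inner and side_inner are illustrative and do not appear in the original example. The last step assumes axis=1, which is not used above, to show that the join then applies to the row labels instead of the columns.

```python
import numpy as np
import pandas as pd

# Same frames as in the example above.
col1 = "hello the cruel world".split()
col2 = "hello the nice world".split()
df1 = pd.DataFrame(np.arange(16).reshape(4, 4), index=list("abcd"), columns=col1)
df2 = pd.DataFrame(np.arange(20, 36).reshape(4, 4), index=list("abde"), columns=col2)

outer = pd.concat([df1, df2], join="outer")
inner = pd.concat([df1, df2], join="inner")

# Compare the resulting column labels with plain set operations.
outer_is_union = set(outer.columns) == (set(col1) | set(col2))
inner_is_intersection = set(inner.columns) == (set(col1) & set(col2))
print(outer_is_union, inner_is_intersection)

# Along the default axis=0 no rows are dropped: both results have 4 + 4 rows.
print(outer.shape, inner.shape)

# With axis=1 the frames are placed side by side and the join
# applies to the index labels rather than the columns.
side_inner = pd.concat([df1, df2], axis=1, join="inner")
print(sorted(side_inner.index))
```

Both comparisons should print True, and the two shapes differ only in the number of columns (five for the outer join, three for the inner join).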