Missing values in pandas
1. pandas uses np.nan to represent missing values.
2. Missing values are excluded from computations by default.
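As a small illustration of point 2, the sketch below runs a few reductions over a Series that contains one NaN. The Series `s` and its values are made up for this example; `skipna` is the standard pandas parameter that controls whether NaN is ignored.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: three values, one of them missing
s = pd.Series([1.0, np.nan, 3.0])

# By default NaN is skipped, so only 1.0 and 3.0 contribute
total = s.sum()
mean = s.mean()

# count() reports the number of non-missing entries
count = s.count()

# With skipna=False the NaN propagates into the result
strict_mean = s.mean(skipna=False)

print(total, mean, count, strict_mean)
```

Passing `skipna=False` can be useful when a missing value should invalidate the whole result rather than be silently ignored.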
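Reindexing lets you change the index along a given axis (the code below builds a sample DataFrame). It returns a new object, and any row or column label that did not exist in the original is filled with NaN.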
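import numpy as np
import pandas as pd

# Six consecutive daily timestamps to use as the row index
dates = pd.date_range('20130101', periods=6)
print(dates)

# 6x4 frame of random normal values with columns A through D
df = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list('ABCD'))
print('\n', df)

# Keep the first four rows and add columns E and F, which start out as all NaN
df1 = df.reindex(index=dates[0:4], columns=list(df.columns) + ['E', 'F'])
# Label slices include both endpoints; dates[4] is not in df1, so this sets 2013-01-02 through 2013-01-04
df1.loc[dates[1]:dates[4], 'E'] = 1
df1.loc[dates[0]:dates[1], 'F'] = 1
df1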
DatetimeIndex(['2013-01-01', '2013-01-02', '2013-01-03', '2013-01-04',
'2013-01-05', '2013-01-06'],
dtype='datetime64[ns]', freq='D')
A B C D
2013-01-01 -0.140132 -1.197951 0.197453 0.269451
2013-01-02 0.198453 -0.114003 1.267294 0.105047
2013-01-03 -0.964521 -1.637295 0.443044 -1.146653
2013-01-04 0.840861 -2.151081 -0.063940 -0.886323
2013-01-05 0.514730 -0.610716 -0.719966 -0.863810
2013-01-06 -1.675452 2.274091 0.055401 0.679852
A B C D E F
2013-01-01 0.000000 0.000000 -1.509059 5 NaN 1
2013-01-02 1.212112 -0.173215 0.119209 5 1 1
2013-01-03 -0.861849 -2.104569 -0.494929 5 2 NaN
2013-01-04 0.721555 -0.706771 -1.039575 5 3 NaN
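Because the frame above is built from random numbers, a deterministic sketch may make the effect of reindex easier to see. The frame `small` and the labels "w" and "B" are invented for this example.

```python
import numpy as np
import pandas as pd

# Hypothetical three-row frame with a single column
small = pd.DataFrame({"A": [1.0, 2.0, 3.0]}, index=["x", "y", "z"])

# Keep rows x and y, and ask for a row "w" and a column "B" that do not exist yet
expanded = small.reindex(index=["x", "y", "w"], columns=["A", "B"])

# The new row and the new column hold NaN; the original frame is left untouched
print(expanded)
print(small)
```

Row "z" is dropped because it was not requested, while "w" and "B" appear as NaN placeholders.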
Drop every row that contains a NaN
df1.dropna()
A B C D E F
2013-01-02 1.212112 -0.173215 0.119209 5 1 1
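dropna also accepts a few parameters that change what gets dropped. The sketch below uses a small made-up frame (`frame`, with row labels r0 to r2) to compare the default with `how`, `subset`, and `axis`.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: r0 is missing F, r2 is missing E, r1 is complete
frame = pd.DataFrame(
    {"A": [1.0, 2.0, 3.0], "F": [np.nan, 1.0, 2.0], "E": [1.0, 1.0, np.nan]},
    index=["r0", "r1", "r2"],
)

# Default (how="any"): drop a row if any of its values is NaN
any_missing = frame.dropna()

# how="all": drop a row only if every value in it is NaN
all_missing = frame.dropna(how="all")

# subset: look for NaN only in the listed columns
by_subset = frame.dropna(subset=["E"])

# axis=1: drop columns that contain NaN instead of rows
drop_cols = frame.dropna(axis=1)

print(any_missing, all_missing, by_subset, drop_cols, sep="\n\n")
```

Each call returns a new DataFrame, so `frame` itself keeps all three rows.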
Fill in missing values
df1.fillna(value=5)
A B C D E F
2013-01-01 0.000000 0.000000 -1.509059 5 5 1
2013-01-02 1.212112 -0.173215 0.119209 5 1 1
2013-01-03 -0.861849 -2.104569 -0.494929 5 2 5
2013-01-04 0.721555 -0.706771 -1.039575 5 3 5
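A single constant is not the only choice of fill value. As a rough sketch on made-up data (the frame `frame` below is hypothetical), fillna can take a per-column dict, and ffill carries the last valid value forward.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with gaps in both columns
frame = pd.DataFrame({"F": [np.nan, 1.0, 2.0], "E": [1.0, np.nan, np.nan]})

# A dict gives each column its own fill value (here 0.0 for F, the column mean for E)
per_column = frame.fillna({"F": 0.0, "E": frame["E"].mean()})

# Forward fill: each NaN takes the last valid value above it;
# a leading NaN has nothing above it and stays NaN
forward = frame.ffill()

print(per_column)
print(forward)
```

Both calls return new objects and leave `frame` unchanged.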
Get a boolean mask showing which values are NaN
pd.isnull(df1)
A B C D E F
2013-01-01 False False False False True False
2013-01-02 False False False False False False
2013-01-03 False False False False False True
2013-01-04 False False False False False True
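A boolean mask like the one above is often combined with sum or any to summarize or select missing data. The sketch below assumes a small invented frame (`frame`) and uses the `isna` method, which is equivalent to `isnull`.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: row 0 is missing F, row 2 is missing E
frame = pd.DataFrame({"F": [np.nan, 1.0, 2.0], "E": [1.0, 1.0, np.nan]})

# True where a value is missing
mask = frame.isna()

# True counts as 1, so summing the mask counts missing values per column
missing_per_column = mask.sum()

# Rows that contain at least one NaN
rows_with_missing = frame[mask.any(axis=1)]

# Rows with no NaN at all
complete_rows = frame[frame.notna().all(axis=1)]

print(missing_per_column)
print(rows_with_missing)
print(complete_rows)
```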