Pandas: Missing Values


1. pandas uses np.nan to represent missing values.

2. By default, missing values are excluded from calculations (for example, sum() and mean() skip NaN).
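Both points can be seen on a tiny example. This is a minimal sketch using a hypothetical Series `s` (not part of the original tutorial): np.nan marks the gap, reductions such as `sum()`, `mean()` and `count()` skip it, and passing `skipna=False` makes the NaN propagate instead.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the third value is missing
s = pd.Series([1.0, 2.0, np.nan, 4.0])

print(s.isna().tolist())       # [False, False, True, False]
print(s.sum())                 # 7.0, the NaN is skipped
print(s.mean())                # 7.0 / 3 non-missing values, about 2.333
print(s.mean(skipna=False))    # nan, because NaN is no longer skipped
print(s.count())               # 3, count() only counts non-missing values
```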

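Reindexing lets you change, add, or delete the index on a specified axis, and it returns a copy of the data. The code below builds an example DataFrame, then uses reindex to keep the first four rows and add two new columns, E and F, which hold NaN until values are assigned. Because the data comes from np.random.randn, your numbers will differ from the output shown.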

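import numpy as np
import pandas as pd

# Six consecutive daily timestamps to use as the row index
dates = pd.date_range('20130101', periods=6)
print(dates)

# 6x4 frame of standard-normal random numbers
df = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list('ABCD'))
print('\n', df)

# Keep the first 4 rows and add new columns E and F; new labels are filled with NaN
df1 = df.reindex(index=dates[0:4], columns=list(df.columns) + ['E', 'F'])
# Label slice: dates[4] is not in df1, so this covers 2013-01-02 through 2013-01-04
df1.loc[dates[1]:dates[4], 'E'] = 1
df1.loc[dates[0]:dates[1], 'F'] = 1
print(df1)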
DatetimeIndex(['2013-01-01', '2013-01-02', '2013-01-03', '2013-01-04',
               '2013-01-05', '2013-01-06'],
              dtype='datetime64[ns]', freq='D')

                    A         B         C         D
2013-01-01 -0.140132 -1.197951  0.197453  0.269451
2013-01-02  0.198453 -0.114003  1.267294  0.105047
2013-01-03 -0.964521 -1.637295  0.443044 -1.146653
2013-01-04  0.840861 -2.151081 -0.063940 -0.886323
2013-01-05  0.514730 -0.610716 -0.719966 -0.863810
2013-01-06 -1.675452  2.274091  0.055401  0.679852
                   A         B         C         D    E    F
2013-01-01 -0.140132 -1.197951  0.197453  0.269451  NaN  1.0
2013-01-02  0.198453 -0.114003  1.267294  0.105047  1.0  1.0
2013-01-03 -0.964521 -1.637295  0.443044 -1.146653  1.0  NaN
2013-01-04  0.840861 -2.151081 -0.063940 -0.886323  1.0  NaN
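Columns E and F start out as NaN because reindex has no existing data for labels that were not in the original frame. A small sketch with a hypothetical Series `s` shows the same effect on the row axis, plus reindex's `fill_value` parameter, which substitutes a value instead of NaN.

```python
import pandas as pd

# Hypothetical integer series with labels a, b, c
s = pd.Series([10, 20, 30], index=['a', 'b', 'c'])

# 'd' is a new label, so reindex inserts NaN for it
r = s.reindex(['a', 'b', 'c', 'd'])
print(r)

# fill_value supplies a value for new labels instead of NaN
r0 = s.reindex(['a', 'b', 'c', 'd'], fill_value=0)
print(r0)
```

Since NaN is a float, `r` is upcast to float64 even though `s` held integers, while `r0` keeps its integer dtype.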

Drop every row that contains at least one NaN:

print(df1.dropna())
                   A         B         C         D    E    F
2013-01-02  0.198453 -0.114003  1.267294  0.105047  1.0  1.0
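With no arguments, dropna() removes any row holding a NaN. The sketch below uses a hypothetical frame `frame` to contrast that default with three other documented dropna parameters: `how='all'`, `subset`, and `thresh`.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: non-missing counts per row are 1, 0, 2, 3
frame = pd.DataFrame({
    'x': [1.0, np.nan, np.nan, 4.0],
    'y': [np.nan, np.nan, 3.0, 4.0],
    'z': [np.nan, np.nan, 3.0, 4.0],
})

any_nan = frame.dropna()               # drop rows with any NaN: keeps row 3
all_nan = frame.dropna(how='all')      # drop only all-NaN rows: removes row 1
by_x = frame.dropna(subset=['x'])      # only column x decides: keeps rows 0 and 3
at_least_2 = frame.dropna(thresh=2)    # keep rows with >= 2 non-NaN values: rows 2 and 3

print(any_nan, all_nan, by_x, at_least_2, sep='\n\n')
```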

Fill missing values with a constant:

print(df1.fillna(value=5))
                   A         B         C         D    E    F
2013-01-01 -0.140132 -1.197951  0.197453  0.269451  5.0  1.0
2013-01-02  0.198453 -0.114003  1.267294  0.105047  1.0  1.0
2013-01-03 -0.964521 -1.637295  0.443044 -1.146653  1.0  5.0
2013-01-04  0.840861 -2.151081 -0.063940 -0.886323  1.0  5.0
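A single constant is not the only option. This sketch, on a hypothetical frame `frame`, fills each column with its own value via a dict, forward-fills with `ffill()`, and fills with column means by passing a Series to fillna (the Series index is matched against the column names).

```python
import numpy as np
import pandas as pd

# Hypothetical frame with gaps in both columns
frame = pd.DataFrame({'E': [np.nan, 1.0, 2.0, 3.0],
                      'F': [1.0, 3.0, np.nan, np.nan]})

# Dict: a different fill value per column
per_col = frame.fillna(value={'E': 0.0, 'F': -1.0})

# Forward fill: copy the last valid value downward (a leading NaN stays NaN)
forward = frame.ffill()

# Column means (E: 2.0, F: 2.0) used as replacement values
mean_filled = frame.fillna(frame.mean())

print(per_col, forward, mean_filled, sep='\n\n')
```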

Get a boolean mask showing which values are NaN:

print(pd.isnull(df1))
                A      B      C      D      E      F
2013-01-01  False  False  False  False   True  False
2013-01-02  False  False  False  False  False  False
2013-01-03  False  False  False  False  False   True
2013-01-04  False  False  False  False  False   True
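The mask is most useful when combined with other operations. A minimal sketch on a hypothetical frame `frame`: since True counts as 1, summing the mask counts NaN per column, and `any`/`all` along `axis=1` select rows with or without gaps. `pd.isna` and `pd.isnull` are aliases.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with the same NaN pattern as columns E and F above
frame = pd.DataFrame({'E': [np.nan, 1.0, 1.0, 1.0],
                      'F': [1.0, 1.0, np.nan, np.nan]})

mask = pd.isna(frame)                         # identical to pd.isnull(frame)
per_column = mask.sum()                       # NaN count per column: E 1, F 2
rows_with_nan = frame[mask.any(axis=1)]       # rows containing at least one NaN
complete = frame[frame.notna().all(axis=1)]   # rows with no NaN, like dropna()

print(per_column, rows_with_nan, complete, sep='\n\n')
```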
