Steven's Blog

A Dream Land of Peace!

A Collection of Pandas Tips

The snippets below assume `import numpy as np` and `import pandas as pd`.

1. Replace a value

# Mark the '?' placeholder as missing
auto['horsepower'] = auto['horsepower'].replace('?', np.nan)

2. Drop rows with missing values

auto = auto.dropna()

3. Change a column's dtype

auto['horsepower'] = auto['horsepower'].astype('int')
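
Tips 1 to 3 are usually applied together, so here is a minimal sketch that chains them on a small made-up frame (the `auto` data and its values are assumptions for illustration, not the real Auto dataset). It also shows `pd.to_numeric(..., errors='coerce')`, which is not in the tips above but does the "replace and convert" part in one call.

```python
import numpy as np
import pandas as pd

# Hypothetical data: horsepower read as strings, with '?' marking missing values
auto = pd.DataFrame({
    'name': ['a', 'b', 'c', 'd'],
    'horsepower': ['130', '?', '95', '150'],
})

# Tips 1-3: mark '?' as missing, drop those rows, then cast to int
cleaned = auto.copy()
cleaned['horsepower'] = cleaned['horsepower'].replace('?', np.nan)
cleaned = cleaned.dropna()
cleaned['horsepower'] = cleaned['horsepower'].astype('int')

# Alternative: coerce anything non-numeric to NaN in a single call
coerced = pd.to_numeric(auto['horsepower'], errors='coerce')

print(cleaned)
print(coerced.isna().sum())
```

Note that `astype('int')` only works once the missing values are gone, which is why the `dropna()` step comes first.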

4. Sort by one or more columns

auto = auto.sort_values(by=['horsepower'], ascending=True, axis=0)

boston.sort_values(by=['CRIM', 'TAX', 'PTRATIO'], ascending=False).head().index
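
`ascending` also accepts a list, one flag per sort column. A small sketch on made-up numbers (the frame below only borrows the `CRIM` and `TAX` column names; the values are hypothetical):

```python
import pandas as pd

# Hypothetical data standing in for the Boston columns used above
df = pd.DataFrame({
    'CRIM': [0.5, 0.5, 0.1, 0.9],
    'TAX': [300, 250, 400, 250],
})

# Sort by CRIM descending; break ties by TAX ascending
ordered = df.sort_values(by=['CRIM', 'TAX'], ascending=[False, True])

# The original row labels are kept, so .index shows which rows rank first
top_labels = ordered.head(2).index.tolist()

print(ordered)
print(top_labels)
```

Because the row labels survive the sort, taking `.head().index` as in the tip above returns the labels of the top rows rather than 0, 1, 2, ...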

5. Set the index

college = college.set_index(['Unnamed: 0'], append=True, verify_integrity=True)
college.rename_axis([None, 'Name'], inplace=True)
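
To see what `append=True` actually produces, here is a minimal sketch on a two-row frame (the school names and numbers are made up for illustration):

```python
import pandas as pd

# Hypothetical frame; 'Unnamed: 0' is the name pandas gives a header-less first column
college = pd.DataFrame({'Unnamed: 0': ['Abilene', 'Adelphi'],
                        'Apps': [1660, 2186]})

# append=True keeps the existing integer index as level 0 and adds the names as level 1;
# verify_integrity=True raises if the resulting index contains duplicates
college = college.set_index(['Unnamed: 0'], append=True, verify_integrity=True)

# Rename the index levels: level 0 unnamed, level 1 called 'Name'
college = college.rename_axis([None, 'Name'])

print(college.index.names)
print(college.loc[(0, 'Abilene'), 'Apps'])
```

The result is a two-level MultiIndex, so rows are addressed with a tuple such as `(0, 'Abilene')`. Dropping `append=True` would replace the integer index instead of extending it.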

6. Set a column's values based on a condition

college['Elite'] = np.where(college['Top10perc'] > 50, 'Yes', 'No')

# Using map
credit['Student2'] = credit.Student.map({'No': 0, 'Yes': 1})
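
The two approaches differ in how they treat values they do not expect. A minimal sketch, assuming a made-up `credit` frame in which `'Unknown'` is deliberately left out of the mapping:

```python
import numpy as np
import pandas as pd

# Hypothetical data; 'Unknown' is deliberately absent from the mapping below
credit = pd.DataFrame({'Student': ['No', 'Yes', 'Unknown'],
                       'Balance': [100, 900, 400]})

# np.where: a two-way choice based on a boolean condition
credit['HighBalance'] = np.where(credit['Balance'] > 500, 'Yes', 'No')

# map: dictionary lookup; values missing from the dict become NaN
credit['Student2'] = credit['Student'].map({'No': 0, 'Yes': 1})

print(credit)
```

`np.where` always yields one of its two outputs, while `map` leaves a NaN for anything not in the dictionary, which is a handy way to spot unexpected categories.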

7. Count the values in a column

college['Elite'].value_counts()

8. Filter rows by a column's value

elite_colleges = college[college['Elite'] == 'Yes']

9. Bin a numeric column

college['Enroll'] = pd.cut(college['Enroll'], bins=3, labels=['Low', 'Medium', 'High'])
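
With `bins=3`, `pd.cut` splits the value range into three equal-width intervals, so a skewed column can end up with almost everything in one bin. The sketch below uses made-up enrollment figures with one outlier and compares against `pd.qcut`, which is not used in the tip above but bins by quantile instead:

```python
import pandas as pd

# Hypothetical enrollment figures with one large outlier
enroll = pd.Series([100, 200, 300, 400, 500, 3000])

# cut: three equal-width intervals over the observed range; retbins returns the edges
by_width, edges = pd.cut(enroll, bins=3, labels=['Low', 'Medium', 'High'],
                         retbins=True)

# qcut: three intervals holding roughly equal numbers of rows
by_count = pd.qcut(enroll, q=3, labels=['Low', 'Medium', 'High'])

print(edges)
print(by_width.value_counts())
print(by_count.value_counts())
```

Which one is appropriate depends on whether the labels should describe absolute size (`cut`) or rank within the data (`qcut`).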

10. Check the number of unique values per column and column dtype information

auto.nunique()
auto.info()
auto['horsepower'].unique()  # unique values in one column
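
`nunique()` and `unique()` are easy to mix up: the first returns counts, the second returns the values themselves. A quick sketch on a made-up frame:

```python
import pandas as pd

# Hypothetical data for illustration
auto = pd.DataFrame({'cylinders': [4, 6, 4, 8],
                     'origin': [1, 1, 2, 3]})

# DataFrame.nunique: number of distinct values in each column
counts = auto.nunique()

# Series.unique: the distinct values themselves, in order of first appearance
values = auto['cylinders'].unique()

# Series.nunique: the count for a single column
n_cylinders = auto['cylinders'].nunique()

print(counts)
print(values, n_cylinders)
```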

11. Select a single column or multiple columns

info['range'] = info['max'] - info['min']
info = info[['mean', 'range', 'std']]

12. Select and drop multiple rows by index

# Drop the rows at positions 10 to 84, then summarize the remaining rows
info = auto.drop(auto.index[10:85]).describe().T
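
Note that `drop` with index labels removes rows; removing columns needs the `columns` keyword. A minimal sketch on a made-up five-row frame:

```python
import pandas as pd

# Hypothetical frame with string row labels
df = pd.DataFrame({'mpg': [18, 15, 16, 17, 14],
                   'weight': [3504, 3693, 3436, 3433, 3449]},
                  index=['r0', 'r1', 'r2', 'r3', 'r4'])

# df.index[1:3] looks up the labels at positions 1 and 2; drop removes those rows
fewer_rows = df.drop(df.index[1:3])

# Dropping columns needs the columns keyword instead
fewer_cols = df.drop(columns=['weight'])

# describe().T gives one row per column, with count, mean, std, ... as columns
summary = fewer_rows.describe().T

print(summary)
```

Transposing the `describe()` output is what makes expressions like `info['max'] - info['min']` in tip 11 possible, since the statistics become columns.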

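13. Load data and specify column names

# load_boston was removed in scikit-learn 1.2, so this requires an older release
boston = pd.DataFrame(load_boston().data, columns=load_boston().feature_names)

# Equivalent form when boston holds the object returned by load_boston()
data = pd.DataFrame(boston.data, columns=boston['feature_names'])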

14. Select rows and columns with iloc

corr_matrix = boston.corr()
corr_matrix.iloc[1:, 0].sort_values()

# Select all rows, and all columns from the second one onward
data = data.iloc[:, 1:]
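
The `iloc[1:, 0]` pattern reads as "every row except the first, first column only". On a correlation matrix that gives the correlations of the first variable with all the others. A sketch with made-up numbers:

```python
import pandas as pd

# Hypothetical numeric frame: b rises with a, c falls with a
df = pd.DataFrame({'a': [1, 2, 3, 4],
                   'b': [2, 4, 6, 8],
                   'c': [4, 3, 2, 1]})

corr_matrix = df.corr()

# Rows 1 onward of column 0: correlations of 'a' with every other column,
# skipping the trivial self-correlation in row 0
with_a = corr_matrix.iloc[1:, 0].sort_values()

# All rows, every column from the second one onward
rest = df.iloc[:, 1:]

print(with_a)
print(rest.columns.tolist())
```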

15. Concatenate DataFrames by row or column

# axis=1 places the frames side by side (column-wise), aligned on the row index
features = pd.concat([constant, features], axis=1)
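
The `axis` argument is the part that is easy to get backwards, so here is a small sketch of both directions (the `constant` and `features` frames are made up for illustration):

```python
import pandas as pd

features = pd.DataFrame({'x1': [1.0, 2.0], 'x2': [3.0, 4.0]})
# Hypothetical intercept column sharing the same row index as features
constant = pd.DataFrame({'const': [1.0, 1.0]})

# axis=1: place frames side by side, aligning on the row index
wide = pd.concat([constant, features], axis=1)

# axis=0 (the default): stack rows; ignore_index renumbers the result
tall = pd.concat([features, features], axis=0, ignore_index=True)

print(wide.shape, tall.shape)
```

With `axis=1`, rows are matched by index label, so frames whose indexes differ will produce NaN-filled rows rather than an error.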

16. Load data and select columns

credit = pd.read_csv('Data/Credit.csv', usecols=list(range(1, 12)))
advertising = pd.read_csv('Data/Advertising.csv', usecols=[1, 2, 3, 4])

17. Declare how missing values are represented when reading data

auto = pd.read_csv('Data/Auto.csv', na_values='?').dropna()
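
`usecols` from tip 16 and `na_values` from tip 17 can be combined in one call. The sketch below reads from an in-memory string instead of a file so it runs anywhere; the CSV content is made up for illustration.

```python
import io
import pandas as pd

# Hypothetical CSV text standing in for a file on disk
csv_text = """id,mpg,horsepower,name
0,18.0,130,chevelle
1,15.0,?,skylark
2,16.0,150,satellite
"""

# usecols keeps only the listed columns; na_values turns '?' into NaN on read
auto = pd.read_csv(io.StringIO(csv_text),
                   usecols=['mpg', 'horsepower'],
                   na_values='?')

print(auto.dtypes)          # horsepower is parsed as float because of the NaN
print(auto.dropna().shape)
```

Handling `'?'` at read time means the column arrives numeric, so the separate replace step from tip 1 is no longer needed.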

18. Build a DataFrame

X = np.random.normal(size=100)
y = np.random.permutation(X)
data = pd.DataFrame({'X': X, 'y': y})

19. Set a column as the index when reading a text file

data = pd.read_csv('../data/Smarket.csv', index_col=0)

20. Replace in place

df_x.replace(to_replace={0: 'No', 1: 'Yes', 'True': 'Yes', 'False': 'No'}, inplace=True)
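
For comparison, here is a minimal sketch of the same mapping without `inplace=True`, on a made-up `df_x`. In that form `replace` returns a new frame and leaves the original untouched:

```python
import pandas as pd

# Hypothetical frame mixing 0/1 codes and 'True'/'False' strings
df_x = pd.DataFrame({'a': [0, 1, 1], 'b': ['True', 'False', 'True']})

mapping = {0: 'No', 1: 'Yes', 'True': 'Yes', 'False': 'No'}

# Without inplace=True, replace returns a new frame and leaves df_x unchanged
replaced = df_x.replace(to_replace=mapping)

print(df_x['a'].tolist())
print(replaced)
```

The keys `'True'` and `'False'` here are strings, so they match text in the data, not Python booleans.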

References