Pandas: Reading and Writing Data

Official documentation: http://pandas.pydata.org/pandas-docs/stable/io.html#io-store-in-csv

pandas I/O

The pandas API for working with files, also called pandas I/O, handles common formats such as CSV, JSON, HTML, and Excel.

The pandas I/O API is a set of top-level reader functions, accessed like pd.read_csv(), that generally return a pandas object. The corresponding writer functions are object methods that are accessed like df.to_csv().

Format Type	Data Description	Reader	Writer
text	CSV	read_csv	to_csv
text	JSON	read_json	to_json
text	HTML	read_html	to_html
text	Local clipboard	read_clipboard	to_clipboard
binary	MS Excel	read_excel	to_excel
binary	HDF5 Format	read_hdf	to_hdf
binary	Feather Format	read_feather	to_feather
binary	Parquet Format	read_parquet	to_parquet
binary	Msgpack	read_msgpack	to_msgpack
binary	Stata	read_stata	to_stata
binary	SAS	read_sas
binary	Python Pickle Format	read_pickle	to_pickle
SQL	SQL	read_sql	to_sql
SQL	Google Big Query	read_gbq	to_gbq
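
The reader/writer pairing in the table works the same way for formats other than CSV. Below is a minimal sketch using the JSON pair (to_json and read_json); the sample data and variable names are made up for illustration and are not from the original article.

```python
from io import StringIO

import pandas as pd

# Hypothetical sample data, used only for illustration.
df = pd.DataFrame({"name": ["alice", "bob"], "score": [90, 85]})

# Writer: a DataFrame method. With no path argument it returns the JSON text.
json_text = df.to_json(orient="records")

# Reader: a top-level function that builds a DataFrame from the JSON text.
restored = pd.read_json(StringIO(json_text), orient="records")

print(json_text)
print(restored)
```

Wrapping the text in StringIO makes it behave like a file, so the same call works for a real path as well.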

Here is an informal performance comparison for some of these IO methods.

Note

For examples that use the StringIO class, make sure you import it according to your Python version, i.e. from StringIO import StringIO for Python 2 and from io import StringIO for Python 3.
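
As a rough illustration of the note above, the sketch below reads CSV text through StringIO with the reader function and writes it back out with the writer method. It assumes Python 3; the sample data is hypothetical.

```python
from io import StringIO  # Python 3 import, as described in the note

import pandas as pd

# Hypothetical sample data, used only for illustration.
csv_text = "name,score\nalice,90\nbob,85\n"

# Reader: StringIO lets read_csv treat an in-memory string like a file.
df = pd.read_csv(StringIO(csv_text))

# Writer: with no path argument, to_csv returns the CSV as a string.
round_trip = df.to_csv(index=False)

print(df)
print(round_trip)
```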

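pandas I/O performance

Pandas provides I/O tools that can read a large file in chunks. In one test, fully loading 98 million rows took only about 263 seconds, which is quite good.

Reference article: http://python.jobbole.com/84118/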
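
Chunked reading can be sketched as follows. This is a minimal example, not the benchmark from the referenced article: a small in-memory CSV stands in for a large file, and the column names and chunk size are assumptions chosen for illustration.

```python
from io import StringIO

import pandas as pd

# Hypothetical stand-in for a large file: 10 rows of in-memory CSV.
csv_text = "id,value\n" + "".join(f"{i},{i * 2}\n" for i in range(10))

total_rows = 0
total_value = 0

# With chunksize set, read_csv returns an iterator of DataFrames
# instead of loading the whole file at once.
for chunk in pd.read_csv(StringIO(csv_text), chunksize=4):
    total_rows += len(chunk)
    total_value += int(chunk["value"].sum())

print(total_rows, total_value)
```

Only one chunk is held in memory at a time, so aggregates like the totals here can be computed over files larger than available RAM.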

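Reading CSV files with pandas

CSV (Comma-Separated Values) files store tabular data as plain text. A spreadsheet tool such as Excel can only handle a limited amount of data (a worksheet holds at most 1,048,576 rows), so very large CSV files cannot simply be opened there, whereas pandas handles them much more easily.

Reading a local CSV file into a DataFrame (the pandas tabular data structure)

For a simple tutorial, see: http://blog.csdn.net/sinat_29957455/article/details/79054126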

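import pandas as pd

# 'filename' may be a full path that starts at the drive letter and names every folder down to the CSV file.
# header=None means the file has no header row; sep=' ' means fields are separated by a space.
# If the separator is a comma, use sep=',' instead.
df = pd.read_csv('filename', header=None, sep=' ')

# As an example, print the first 5 and the last 5 rows of the CSV file.
# Five rows is the pandas default; pass a number, e.g. df.head(10), to change it.
print(df.head())
print(df.tail())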
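
To try the same options without a file on disk, here is a small self-contained sketch. The space-separated sample data is hypothetical; it only shows what header=None and sep=' ' do and how to pass a row count to head() and tail().

```python
from io import StringIO

import pandas as pd

# Hypothetical space-separated data with no header row.
raw = "1 2 3\n4 5 6\n7 8 9\n"

# header=None: the first line is data, so pandas numbers the columns 0, 1, 2.
df = pd.read_csv(StringIO(raw), header=None, sep=" ")

first_two = df.head(2)  # first 2 rows instead of the default 5
last_one = df.tail(1)   # last row only

print(first_two)
print(last_one)
```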

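Note:

File handling will be covered in more detail in later lessons, driven by practical needs. They will focus on using Python with large databases and on combining Python with Spark to process large files.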
