A Basic Introduction to Pandas
Official Pandas website: http://pandas.pydata.org
Python Data Analysis Library
pandas is an open source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language.
pandas is a NumFOCUS sponsored project. This will help ensure the success of development of pandas as a world-class open-source project, and makes it possible to donate to the project.
Best way to Install
The best way to get pandas is via conda
conda install pandas
Packages are available for all supported python versions on Windows, Linux, and MacOS.
Wheels are also uploaded to PyPI and can be installed with
pip install pandas
What problem does pandas solve?
Python has long been great for data munging and preparation, but less so for data analysis and modeling. pandas helps fill this gap, enabling you to carry out your entire data analysis workflow in Python without having to switch to a more domain specific language like R.
Combined with the excellent IPython toolkit and other libraries, the environment for doing data analysis in Python excels in performance, productivity, and the ability to collaborate.
pandas does not implement significant modeling functionality outside of linear and panel regression; for this, look to statsmodels and scikit-learn. More work is still needed to make Python a first class statistical modeling environment, but we are well on our way toward that goal.
Developer documentation
Release
0.22.0 - December 2017
Development
0.23.0 - 2018
What Pandas Is Mainly Used For
[The following passage is quoted from the article: http://codingpy.com/article/a-quick-intro-to-pandas/]
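Pandas is one of my favorite libraries. With labeled columns and indexes, Pandas lets us work with data in a way that anyone can understand. It lets us import data effortlessly from files such as CSV, and we can use it to quickly perform complex transformations, filtering, and other operations on that data. Pandas is simply superb.
I think that, together with Numpy and Matplotlib, it forms a powerful foundation for data exploration and analysis in Python. Scipy (to be introduced in the next post) is of course also a major workhorse and an absolutely great library, but I think the first three are the true pillars of scientific computing in Python.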
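In short, Pandas is used most often to read data from files in a variety of formats and then process it. Anyone who has studied databases knows that SQL, as well as the currently popular MongoDB, provides operations for joining, filtering, transforming, and aggregating data. Remarkably, Pandas provides all of these as well, which makes complex computations on data very convenient.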
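To make the SQL comparison concrete, here is a minimal sketch of the read, join, filter, transform, and aggregate steps in pandas. The two small CSV tables, their column names, and the 1.1 tax factor are made-up illustration data; they are read from in-memory strings via `io.StringIO` so the example runs without files on disk. Only standard pandas calls are used (`read_csv`, `merge`, boolean indexing, `assign`, `groupby(...).agg`).

```python
import io

import pandas as pd

# Hypothetical sample data standing in for two CSV files on disk
orders_csv = io.StringIO(
    "order_id,customer_id,amount\n"
    "1,101,250.0\n"
    "2,102,80.5\n"
    "3,101,120.0\n"
    "4,103,45.0\n"
)
customers_csv = io.StringIO(
    "customer_id,city\n"
    "101,Beijing\n"
    "102,Shanghai\n"
    "103,Beijing\n"
)

# Read: load each CSV into a DataFrame
orders = pd.read_csv(orders_csv)
customers = pd.read_csv(customers_csv)

# Join: comparable to SQL "INNER JOIN customers USING (customer_id)"
merged = orders.merge(customers, on="customer_id", how="inner")

# Filter: comparable to SQL "WHERE amount > 50"
large = merged[merged["amount"] > 50]

# Transform: derive a new column (hypothetical 10% tax) from an existing one
large = large.assign(amount_with_tax=large["amount"] * 1.1)

# Aggregate: comparable to SQL "GROUP BY city" with SUM(amount) and COUNT(amount)
summary = large.groupby("city")["amount"].agg(["sum", "count"])
print(summary)
```

Each pandas method returns a new object, so the steps can be checked one at a time or chained together.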
Overview
pandas is a powerful tool. The following introduces Pandas under the topics listed below, though not necessarily in strictly this order.
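1) Data structures: introducing and constructing Series and DataFrame
2) Indexing, slicing, and filtering of Series and DataFrame; arithmetic and data alignment; function mapping; sorting; and more
3) Summarizing Series and DataFrame and computing descriptive statistics
4) Hierarchical indexing in Series and DataFrame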
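As a quick taste of topic 1, the sketch below builds the two core structures. The labels and numbers are arbitrary illustration values. A `Series` is a one-dimensional array of values paired with an index of labels; a `DataFrame` is a two-dimensional table whose columns are each a `Series` sharing one row index.

```python
import pandas as pd

# A Series: values plus an explicit index of labels
s = pd.Series([4, 7, -5, 3], index=["d", "b", "a", "c"])

# A Series can also be built from a dict; the keys become the index
population = pd.Series({"Ohio": 35000, "Texas": 71000, "Oregon": 16000})

# A DataFrame from a dict of equal-length lists: each key becomes a column
frame = pd.DataFrame(
    {
        "state": ["Ohio", "Ohio", "Nevada"],
        "year": [2000, 2001, 2001],
        "pop": [1.5, 1.7, 2.4],
    }
)

print(s["a"])          # label-based lookup on a Series
print(frame["year"])   # selecting one column of a DataFrame yields a Series
print(frame.shape)     # (number of rows, number of columns)
```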
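This post focuses on the essential functionality of Series and DataFrame, including indexing, slicing, filtering, arithmetic and data alignment, function mapping, and sorting. It is organized as follows:
1) Reindexing
2) Dropping entries from an axis
3) Indexing, selection, and filtering
4) Arithmetic and data alignment
5) Function application and mapping
6) Sorting
7) Axis indexes with duplicate labels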
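Before each topic is treated in detail, the following minimal sketch touches all seven items on two toy `Series` objects (the labels and values are made up for illustration). It relies only on standard pandas methods: `reindex`, `drop`, boolean indexing, `+` with index alignment, `map`, and `sort_values`.

```python
import pandas as pd

s1 = pd.Series([1.0, 2.0, 3.0], index=["a", "b", "c"])
s2 = pd.Series([10.0, 20.0, 30.0], index=["b", "c", "d"])

# 1) Reindexing: conform to a new index; labels not present become NaN
r = s1.reindex(["a", "b", "c", "e"])

# 2) Dropping: returns a new Series without the given label
d = s1.drop("b")

# 3) Filtering with a boolean mask
f = s1[s1 > 1.5]

# 4) Arithmetic aligns on index labels; labels present in only one operand give NaN
total = s1 + s2

# 5) Applying a function element-wise
squared = s1.map(lambda x: x ** 2)

# 6) Sorting by value, largest first
ordered = s2.sort_values(ascending=False)

# 7) Duplicate labels: selecting a repeated label returns a Series, not a scalar
dup = pd.Series(range(3), index=["x", "x", "y"])
print(dup["x"])
```

Note item 4 in particular: the result index is the union of both inputs' labels, which is why `total` has entries for both "a" and "d".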