Common pandas Functions in Detail

Anonymous, 2019-04-09, Notes

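The groupby function

pandas provides a flexible and efficient groupby facility that lets you slice, dice, and summarize a dataset in a natural way. It splits a pandas object by one or more keys (functions, arrays, Series, or DataFrame column names) and then computes a summary statistic for each group, such as a count, mean, or standard deviation, or applies a user-defined function.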
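As a quick illustration of the split-then-summarize idea, here is a minimal sketch. The `sales` table, its column names, and the `value_range` helper are made up for this example and are not part of the original data.

```python
import pandas as pd

# Hypothetical toy data, used only for this illustration.
sales = pd.DataFrame({
    "store": ["A", "A", "B", "B", "B"],
    "amount": [10, 20, 5, 15, 25],
})

# Split the rows by store, then compute several summaries per group.
summary = sales.groupby("store")["amount"].agg(["count", "mean", "std"])
print(summary)

# A user-defined function works too: here, the range (max - min) per group.
def value_range(values):
    return values.max() - values.min()

ranges = sales.groupby("store")["amount"].agg(value_range)
print(ranges)
```

Passing a list of function names to `agg` gives one column per statistic, which is handy when you want several summaries side by side.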


import numpy as np
import pandas as pd

ipl_data = {'Team': ['Riders', 'Riders', 'Devils', 'Devils', 'Kings',
         'kings', 'Kings', 'Kings', 'Riders', 'Royals', 'Royals', 'Riders'],
         'Rank': [1, 2, 2, 3, 3, 4, 1, 1, 2, 4, 1, 2],
         'Year': [2014, 2015, 2014, 2015, 2014, 2015, 2016, 2017, 2016, 2014, 2015, 2017],
         'Points': [876, 789, 863, 673, 741, 812, 756, 788, 694, 701, 804, 690]}
# Build the DataFrame that the examples below refer to as df.
df = pd.DataFrame(ipl_data)

1. Grouping by a Series

To group by Team and compute the mean of the Points column, select Points first and then call groupby with the Team column as the key:

grouped = df['Points'].groupby(df['Team'])
# Equivalent to df['Points'].groupby(df.Team) and df['Points'].groupby(df.Team.values)
print(grouped.groups)
print(grouped.mean())

Output:

{'Devils': Int64Index([2, 3], dtype='int64'), 'Kings': Int64Index([4, 6, 7], dtype='int64'), 'Riders': Int64Index([0, 1, 8, 11], dtype='int64'), 'Royals': Int64Index([9, 10], dtype='int64'), 'kings': Int64Index([5], dtype='int64')}

Team
Devils    768.000000
Kings     761.666667
Riders    762.250000
Royals    752.500000
kings     812.000000
Name: Points, dtype: float64


Note: the data (a Series) is aggregated by the grouping key, producing a new Series whose index holds the unique values of the Team column.
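Because the result index is just the unique key values, keys that differ only in case form separate groups, which is why 'kings' shows up next to 'Kings' in the output above. The sketch below assumes the two spellings mean the same team (the original does not say so) and merges them by normalizing the key before grouping.

```python
import pandas as pd

# Only the two columns needed here, copied from the ipl_data example.
df = pd.DataFrame({
    "Team": ["Riders", "Riders", "Devils", "Devils", "Kings", "kings",
             "Kings", "Kings", "Riders", "Royals", "Royals", "Riders"],
    "Points": [876, 789, 863, 673, 741, 812, 756, 788, 694, 701, 804, 690],
})

# The result is a Series indexed by the unique key values.
means = df["Points"].groupby(df["Team"]).mean()
print(means.index.tolist())

# Assumption: 'kings' and 'Kings' are the same team. Normalizing the key
# with str.title() before grouping merges the two groups.
team_key = df["Team"].str.title()
merged_means = df["Points"].groupby(team_key).mean()
print(merged_means)
```

The key passed to groupby does not have to be a column of the frame itself; any Series aligned with the data, such as `team_key` here, will do.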

2. Grouping by arrays. The grouping key can be any array of the right length (equal to the number of rows); it does not have to be a Series.

grouped = df['Points'].groupby([df['Team'], df['Year']])
print(grouped.groups)

age = np.array([2, 3, 3, 4, 5, 6, 7, 8, 8, 8, 8, 13])
grouped = df['Points'].groupby(age)
print(grouped.groups)

Output:

{('Devils', 2014): Int64Index([2], dtype='int64'), ('Devils', 2015): Int64Index([3], dtype='int64'), ('Kings', 2014): Int64Index([4], dtype='int64'), ('Kings', 2016): Int64Index([6], dtype='int64'), ('Kings', 2017): Int64Index([7], dtype='int64'), ('Riders', 2014): Int64Index([0], dtype='int64'), ('Riders', 2015): Int64Index([1], dtype='int64'), ('Riders', 2016): Int64Index([8], dtype='int64'), ('Riders', 2017): Int64Index([11], dtype='int64'), ('Royals', 2014): Int64Index([9], dtype='int64'), ('Royals', 2015): Int64Index([10], dtype='int64'), ('kings', 2015): Int64Index([5], dtype='int64')}

{2: Int64Index([0], dtype='int64'), 3: Int64Index([1, 2], dtype='int64'), 4: Int64Index([3], dtype='int64'), 5: Int64Index([4], dtype='int64'), 6: Int64Index([5], dtype='int64'), 7: Int64Index([6], dtype='int64'), 8: Int64Index([7, 8, 9, 10], dtype='int64'), 13: Int64Index([11], dtype='int64')}
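To see what aggregating with these keys looks like, here is a small sketch on a four-row subset of the data. The `labels` array is a made-up key used only to show that a plain NumPy array works.

```python
import numpy as np
import pandas as pd

# Four-row subset of the ipl_data example.
df = pd.DataFrame({
    "Team": ["Riders", "Riders", "Devils", "Devils"],
    "Year": [2014, 2015, 2014, 2015],
    "Points": [876, 789, 863, 673],
})

# Two keys give a result with a two-level (Team, Year) MultiIndex.
by_team_year = df["Points"].groupby([df["Team"], df["Year"]]).sum()
print(by_team_year)

# unstack() pivots the inner level (Year) into columns.
wide = by_team_year.unstack()
print(wide)

# Any array with one entry per row can serve as a key (hypothetical labels).
labels = np.array(["x", "y", "x", "y"])
by_label = df["Points"].groupby(labels).sum()
print(by_label)
```

With two keys the group names are tuples, which matches the `('Devils', 2014)` style keys in the `groups` dictionary printed above.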


3. Using a column name as the grouping key

grouped = df.groupby('Team')
print(grouped.groups)
print(grouped.mean())

Output:

{'Devils': Int64Index([2, 3], dtype='int64'), 'Kings': Int64Index([4, 6, 7], dtype='int64'), 'Riders': Int64Index([0, 1, 8, 11], dtype='int64'), 'Royals': Int64Index([9, 10], dtype='int64'), 'kings': Int64Index([5], dtype='int64')}

            Rank         Year      Points
Team                                     
Devils  2.500000  2014.500000  768.000000
Kings   1.666667  2015.666667  761.666667
Riders  1.750000  2015.500000  762.250000
Royals  2.500000  2014.500000  752.500000
kings   4.000000  2015.000000  812.000000
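The call above works because every remaining column is numeric. As far as I know, pandas 2.0 and later no longer drop non-numeric columns silently when taking a grouped mean, so a frame with a string column needs `numeric_only=True` or an explicit column selection. The `Captain` column below is hypothetical and only serves to show this.

```python
import pandas as pd

df = pd.DataFrame({
    "Team": ["Riders", "Riders", "Devils", "Devils"],
    "Captain": ["Ann", "Ann", "Bo", "Bo"],  # hypothetical non-numeric column
    "Rank": [1, 2, 2, 3],
    "Points": [876, 789, 863, 673],
})

# Restrict the mean to numeric columns so the string column is skipped.
numeric_means = df.groupby("Team").mean(numeric_only=True)
print(numeric_means)

# as_index=False keeps Team as an ordinary column instead of the index.
flat = df.groupby("Team", as_index=False)[["Rank", "Points"]].mean()
print(flat)
```

The `as_index=False` form is convenient when the result is going to be merged or written out as a flat table.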


1. Iterating over groupby results

grouped = df.groupby(['Team', 'Year'])
# With two keys, each group name is a (team, year) tuple.
for (team, year), v in grouped:
    print("{0}--{1}".format(team, year))
    print(v)
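A few related patterns, sketched on a four-row subset of the data: iterating with a single key, collecting the pieces into a dict, and fetching one group directly with `get_group`. The variable names are my own.

```python
import pandas as pd

# Four-row subset of the ipl_data example.
df = pd.DataFrame({
    "Team": ["Riders", "Riders", "Devils", "Devils"],
    "Year": [2014, 2015, 2014, 2015],
    "Points": [876, 789, 863, 673],
})

# With a single column name as key, each iteration yields (name, sub-frame).
for team, rows in df.groupby("Team"):
    print(team, len(rows))

# Collect the pieces into a dict keyed by (Team, Year) tuples.
pieces = {key: rows for key, rows in df.groupby(["Team", "Year"])}
print(pieces[("Riders", 2015)])

# get_group fetches one group directly without iterating.
riders_2014 = df.groupby(["Team", "Year"]).get_group(("Riders", 2014))
print(riders_2014)
```

Each piece is an ordinary DataFrame that keeps the original row labels, so it can be processed like any other frame.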

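2. Selecting one or more columns

A GroupBy object created from a DataFrame can be indexed with a single column name (a string) or several column names (a list of strings), so that only those columns are aggregated.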

# The next two lines are equivalent
print(df.groupby('Team')['Points'].groups)
print(df['Points'].groupby(df['Team']).groups)
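The difference between indexing with a string and with a list shows up in the type of the result. A minimal sketch on a four-row subset of the data:

```python
import pandas as pd

# Four-row subset of the ipl_data example.
df = pd.DataFrame({
    "Team": ["Riders", "Riders", "Devils", "Devils"],
    "Rank": [1, 2, 2, 3],
    "Points": [876, 789, 863, 673],
})

# A single column name gives a grouped Series; the result is a Series.
points_mean = df.groupby("Team")["Points"].mean()
print(type(points_mean).__name__)

# A list of column names gives a grouped DataFrame; the result is a DataFrame.
subset_mean = df.groupby("Team")[["Rank", "Points"]].mean()
print(type(subset_mean).__name__)

# The two spellings from the text produce the same result.
same = points_mean.equals(df["Points"].groupby(df["Team"]).mean())
print(same)
```

Selecting columns before aggregating also avoids computing statistics for columns you do not need, which can matter on wide frames.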


When reposting, please credit 小翔博客: https://www.liuyixiang.com/post/26918.html
