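Today we will look at how to create a GroupBy object in Python's pandas library and how that object works. We will walk through each step of the grouping process, see which methods can be applied to a GroupBy object, and find out what useful information we can extract from it.
Let's get started.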
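The three steps of groupby
First of all, any groupby process involves some combination of the following three steps:
- Splitting the original object into groups
- Applying a function to each group
- Combining the results
Let us first take a quick look at the test dataset used today, a table of Nobel Prize laureates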
import pandas as pd
import numpy as np

# Show every column when a DataFrame is printed
pd.set_option('display.max_columns', None)

df = pd.read_csv('complete.csv')
df = df[['awardYear', 'category', 'prizeAmount', 'prizeAmountAdjusted', 'name', 'gender', 'birth_continent']]
df.head()
Output:
   awardYear           category  prizeAmount  prizeAmountAdjusted               name gender birth_continent
0       2001  Economic Sciences     10000000             12295082  A. Michael Spence   male   North America
1       1975            Physics       630000              3404179       Aage N. Bohr   male          Europe
2       2004          Chemistry     10000000             11762861  Aaron Ciechanover   male            Asia
3       1982          Chemistry      1150000              3102518         Aaron Klug   male          Europe
4       1979            Physics       800000              2988048        Abdus Salam   male            Asia
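Splitting the original object into groups
At this stage we call the pandas DataFrame.groupby() function. We use it to split the data into groups according to predefined criteria, along rows (axis=0, the default) or along columns (axis=1; note that the axis argument is deprecated in recent pandas releases). In other words, this function maps labels to group names.
For example, in our case we can group the Nobel Prize data by prize category: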
grouped = df.groupby('category')
We can also group the data by several columns by passing a list of column names. Let us first group the data by prize category and then, within each of the resulting groups, apply an additional grouping by award year:
grouped_category_year = df.groupby(['category', 'awardYear'])
Now, if we try to print one of the two GroupBy objects we just created, we will not actually see any groups:
print(grouped)
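Output (the memory address differs from run to run):
<pandas.core.groupby.generic.DataFrameGroupBy object at 0x...>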
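Note that creating a GroupBy object only checks that we passed a valid mapping. No step of the split-apply-combine chain is actually executed until we explicitly call a method on the object or access one of its attributes.
To briefly inspect the resulting GroupBy object and check how the groups were split, we can read its groups or indices attribute. Both return a dictionary whose keys are the group names and whose values describe the rows of the original DataFrame that belong to each group: axis labels for the groups attribute, integer positions for the indices attribute: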
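The point about validation at creation time can be illustrated with a small sketch. The `toy` DataFrame and the column name `no_such_column` are made up for this illustration and are not part of the Nobel Prize dataset.

```python
import pandas as pd

# Hypothetical toy data standing in for the real dataset
toy = pd.DataFrame({
    'category': ['Physics', 'Chemistry', 'Physics'],
    'prizeAmount': [630000, 10000000, 800000],
})

# A valid key is accepted immediately; no statistics are computed yet
toy_grouped = toy.groupby('category')

# An invalid key is rejected as soon as the GroupBy object is created
try:
    toy.groupby('no_such_column')
    key_error_raised = False
except KeyError:
    key_error_raised = True

print(type(toy_grouped).__name__)
print(key_error_raised)
```

The first print shows only the type of the object, and the second confirms that the bad key fails before any aggregation is requested.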
grouped.indices
Output:
{‘Chemistry’: array([ 2, 3, 7, 9, 10, 11, 13, 14, 15, 17, 19, 39, 62, 64, 66, 71, 75, 80, 81, 86, 92, 104, 107, 112, 129, 135, 153, 169, 175, 178, 181, 188, 197, 199, 203, 210, 215, 223, 227, 239, 247, 249, 258, 264, 265, 268, 272, 274, 280, 282, 284, 289, 296, 298, 310, 311, 317, 318, 337, 341, 343, 348, 352, 357, 362, 365, 366, 372, 374, 384, 394, 395, 396, 415, 416, 419, 434, 440, 442, 444, 446, 448, 450, 455, 456, 459, 461, 463, 465, 469, 475, 504, 505, 508, 518, 522, 523, 524, 539, 549, 558, 559, 563, 567, 571, 572, 585, 591, 596, 599, 627, 630, 632, 641, 643, 644, 648, 659, 661, 666, 667, 668, 671, 673, 679, 681, 686, 713, 715, 717, 719, 720, 722, 723, 725, 726, 729, 732, 738, 742, 744, 746, 751, 756, 759, 763, 766, 773, 776, 798, 810, 813, 814, 817, 827, 828, 829, 832, 839, 848, 853, 855, 862, 866, 880, 885, 886, 888, 889, 892, 894, 897, 902, 904, 914, 915, 920, 921, 922, 940, 941, 943, 946, 947], dtype=int64), ‘Economic Sciences’: array([ 0, 5, 45, 46, 58, 90, 96, 139, 140, 145, 152, 156, 157, 180, 187, 193, 207, 219, 231, 232, 246, 250, 269, 279, 283, 295, 305, 324, 346, 369, 418, 422, 425, 426, 430, 432, 438, 458, 467, 476, 485, 510, 525, 527, 537, 538, 546, 580, 594, 595, 605, 611, 636, 637, 657, 669, 670, 678, 700, 708, 716, 724, 734, 737, 739, 745, 747, 749, 750, 753, 758, 767, 800, 805, 854, 856, 860, 864, 871, 882, 896, 912, 916, 924], dtype=int64), ‘Literature’: array([ 21, 31, 40, 49, 52, 98, 100, 101, 102, 111, 115, 142, 149, 159, 170, 177, 201, 202, 220, 221, 233, 235, 237, 253, 257, 259, 275, 277, 278, 286, 312, 315, 316, 321, 326, 333, 345, 347, 350, 355, 359, 364, 370, 373, 385, 397, 400, 403, 406, 411, 435, 439, 441, 454, 468, 479, 480, 482, 483, 492, 501, 506, 511, 516, 556, 569, 581, 602, 604, 606, 613, 614, 618, 631, 633, 635, 640, 652, 653, 655, 656, 665, 675, 683, 699, 761, 765, 771, 774, 777, 779, 780, 784, 786, 788, 796, 799, 803, 836, 840, 842, 850, 861, 867, 868, 878, 881, 883, 910, 917, 919, 927, 928, 929, 930, 936], dtype=int64), 
‘Peace’: array([ 6, 12, 16, 25, 26, 27, 34, 36, 44, 47, 48, 54, 61, 65, 72, 78, 79, 82, 95, 99, 116, 119, 120, 126, 137, 146, 151, 166, 167, 171, 200, 204, 205, 206, 209, 213, 225, 236, 240, 244, 255, 260, 266, 267, 270, 287, 303, 320, 329, 356, 360, 361, 377, 386, 387, 388, 389, 390, 391, 392, 393, 433, 447, 449, 471, 477, 481, 489, 491, 500, 512, 514, 517, 528, 529, 530, 533, 534, 540, 542, 544, 545, 547, 553, 555, 560, 562, 574, 578, 590, 593, 603, 607, 608, 609, 612, 615, 616, 617, 619, 620, 628, 634, 639, 642, 664, 677, 688, 697, 703, 705, 710, 727, 736, 787, 793, 795, 806, 823, 846, 847, 852, 865, 875, 876, 877, 895, 926, 934, 935, 937, 944, 948, 949], dtype=int64), ‘Physics’: array([ 1, 4, 8, 20, 23, 24, 30, 32, 38, 51, 59, 60, 67, 68, 69, 70, 74, 84, 89, 97, 103, 105, 108, 109, 114, 117, 118, 122, 125, 127, 128, 130, 133, 141, 143, 144, 155, 162, 163, 164, 165, 168, 173, 174, 176, 179, 183, 195, 212, 214, 216, 222, 224, 228, 230, 234, 238, 241, 243, 251, 256, 263, 271, 276, 291, 292, 297, 301, 306, 307, 308, 323, 327, 328, 330, 335, 336, 338, 349, 351, 353, 354, 363, 367, 375, 376, 378, 381, 382, 398, 399, 402, 404, 405, 408, 410, 412, 413, 420, 421, 424, 428, 429, 436, 445, 451, 453, 457, 460, 462, 470, 472, 487, 495, 498, 499, 509, 513, 515, 521, 526, 532, 535, 536, 541, 548, 550, 552, 557, 561, 564, 565, 566, 573, 576, 577, 579, 583, 586, 588, 592, 601, 610, 621, 622, 623, 629, 647, 650, 651, 654, 658, 674, 676, 682, 684, 690, 691, 693, 694, 695, 696, 698, 702, 707, 711, 714, 721, 730, 731, 735, 743, 752, 755, 770, 772, 775, 781, 785, 790, 792, 797, 801, 802, 808, 822, 833, 834, 835, 844, 851, 870, 872, 879, 884, 887, 890, 893, 900, 901, 903, 905, 907, 908, 909, 913, 925, 931, 932, 933, 938, 942, 945], dtype=int64), ‘Physiology or Medicine’: array([ 18, 22, 28, 29, 33, 35, 37, 41, 42, 43, 50, 53, 55, 56, 57, 63, 73, 76, 77, 83, 85, 87, 88, 91, 93, 94, 106, 110, 113, 121, 123, 124, 131, 132, 134, 136, 138, 147, 148, 150, 154, 158, 160, 161, 172, 182, 184, 
185, 186, 189, 190, 191, 192, 194, 196, 198, 208, 211, 217, 218, 226, 229, 242, 245, 248, 252, 254, 261, 262, 273, 281, 285, 288, 290, 293, 294, 299, 300, 302, 304, 309, 313, 314, 319, 322, 325, 331, 332, 334, 339, 340, 342, 344, 358, 368, 371, 379, 380, 383, 401, 407, 409, 414, 417, 423, 427, 431, 437, 443, 452, 464, 466, 473, 474, 478, 484, 486, 488, 490, 493, 494, 496, 497, 502, 503, 507, 519, 520, 531, 543, 551, 554, 568, 570, 575, 582, 584, 587, 589, 597, 598, 600, 624, 625, 626, 638, 645, 646, 649, 660, 662, 663, 672, 680, 685, 687, 689, 692, 701, 704, 706, 709, 712, 718, 728, 733, 740, 741, 748, 754, 757, 760, 762, 764, 768, 769, 778, 782, 783, 789, 791, 794, 804, 807, 809, 811, 812, 815, 816, 818, 819, 820, 821, 824, 825, 826, 830, 831, 837, 838, 841, 843, 845, 849, 857, 858, 859, 863, 869, 873, 874, 891, 898, 899, 906, 911, 918, 923, 939], dtype=int64)}
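With the default integer index the two attributes look almost identical, so the difference is easier to see on a DataFrame with string labels. The following is a minimal sketch on a hypothetical `toy` DataFrame (the labels 'a', 'b', 'c' are invented for the example):

```python
import pandas as pd

# Hypothetical toy data with a non-default (string) index
toy = pd.DataFrame(
    {'category': ['Physics', 'Chemistry', 'Physics']},
    index=['a', 'b', 'c'],
)
toy_grouped = toy.groupby('category')

# groups maps each group name to the axis labels of its rows
labels = {name: list(idx) for name, idx in toy_grouped.groups.items()}

# indices maps each group name to the integer positions of its rows
positions = {name: pos.tolist() for name, pos in toy_grouped.indices.items()}

print(labels)
print(positions)
```

Here `labels` holds 'a', 'b', 'c' style labels while `positions` holds 0-based row positions, which is the distinction described above.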
To find the number of groups in a GroupBy object, we can read its ngroups attribute or call Python's built-in len function:
print(grouped.ngroups)
print(len(grouped))
Output:
6
6
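The same two checks work for a grouping on several columns, where every distinct combination of key values forms its own group. A small sketch, assuming a hypothetical `toy` DataFrame rather than the real data:

```python
import pandas as pd

# Hypothetical toy data; two Physics rows share the same award year
toy = pd.DataFrame({
    'category': ['Physics', 'Chemistry', 'Physics', 'Peace', 'Chemistry'],
    'awardYear': [1975, 2004, 1975, 2019, 1982],
})

by_category = toy.groupby('category')
by_category_year = toy.groupby(['category', 'awardYear'])

# One group per distinct category
n_single = by_category.ngroups

# One group per distinct (category, awardYear) pair
n_multi = len(by_category_year)

print(n_single, n_multi)
```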
If we need to look at all or some of the entries of each group, we can iterate over the GroupBy object:
for name, entries in grouped:
    print(f'First 2 entries for the "{name}" category:')
    print(30 * '-')
    print(entries.head(2))
    print()  # blank line between groups
Output:
First 2 entries for the "Chemistry" category:
------------------------------
   awardYear   category  prizeAmount  prizeAmountAdjusted               name  \
2       2004  Chemistry     10000000             11762861  Aaron Ciechanover
3       1982  Chemistry      1150000              3102518         Aaron Klug

  gender birth_continent
2   male            Asia
3   male          Europe

First 2 entries for the "Economic Sciences" category:
------------------------------
   awardYear           category  prizeAmount  prizeAmountAdjusted  \
0       2001  Economic Sciences     10000000             12295082
5       2019  Economic Sciences      9000000              9000000

                name gender birth_continent
0  A. Michael Spence   male   North America
5   Abhijit Banerjee   male            Asia

First 2 entries for the "Literature" category:
------------------------------
    awardYear    category  prizeAmount  prizeAmountAdjusted  \
21       1957  Literature       208629              2697789
31       1970  Literature       400000              3177966

                     name gender birth_continent
21           Albert Camus   male          Africa
31  Alexandr Solzhenitsyn   male          Europe

First 2 entries for the "Peace" category:
------------------------------
    awardYear category  prizeAmount  prizeAmountAdjusted  \
6        2019    Peace      9000000              9000000
12       1980    Peace       880000              2889667

                     name gender birth_continent
6          Abiy Ahmed Ali   male          Africa
12  Adolfo Pérez Esquivel   male   South America

First 2 entries for the "Physics" category:
------------------------------
   awardYear category  prizeAmount  prizeAmountAdjusted          name gender  \
1       1975  Physics       630000              3404179  Aage N. Bohr   male
4       1979  Physics       800000              2988048   Abdus Salam   male

  birth_continent
1          Europe
4            Asia

First 2 entries for the "Physiology or Medicine" category:
------------------------------
    awardYear                category  prizeAmount  prizeAmountAdjusted  \
18       1963  Physiology or Medicine       265000              2839286
22       1974  Physiology or Medicine       550000              3263449

             name gender birth_continent
18   Alan Hodgkin   male          Europe
22  Albert Claude   male          Europe
If, instead, we want to select a single group as a DataFrame, we use the get_group() method on the GroupBy object:
grouped.get_group('Economic Sciences')
Output:
     awardYear           category  prizeAmount  prizeAmountAdjusted                 name gender birth_continent
0         2001  Economic Sciences     10000000             12295082    A. Michael Spence   male   North America
5         2019  Economic Sciences      9000000              9000000     Abhijit Banerjee   male            Asia
45        2012  Economic Sciences      8000000              8361204        Alvin E. Roth   male   North America
46        1998  Economic Sciences      7600000              9713701          Amartya Sen   male            Asia
58        2015  Economic Sciences      8000000              8384572         Angus Deaton   male          Europe
..         ...                ...          ...                  ...                  ...    ...             ...
882       2002  Economic Sciences     10000000             12034660      Vernon L. Smith   male   North America
896       1973  Economic Sciences       510000              3331882     Wassily Leontief   male          Europe
912       2018  Economic Sciences      9000000              9000000  William D. Nordhaus   male   North America
916       1990  Economic Sciences      4000000              6329114    William F. Sharpe   male   North America
924       1996  Economic Sciences      7400000              9490424      William Vickrey   male   North America
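For intuition, selecting one group this way picks the same rows as an ordinary boolean mask on the grouping column. A minimal sketch on a hypothetical `toy` DataFrame:

```python
import pandas as pd

# Hypothetical toy data standing in for the real dataset
toy = pd.DataFrame({
    'category': ['Physics', 'Chemistry', 'Physics', 'Peace'],
    'prizeAmount': [630000, 10000000, 800000, 9000000],
})

# Select one group through the GroupBy object
physics = toy.groupby('category').get_group('Physics')

# Select the same rows with a boolean mask
masked = toy[toy['category'] == 'Physics']

print(physics)
print(physics.index.equals(masked.index))
```

Both selections keep the original row labels, so the result can be traced back to the source DataFrame.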
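Applying a function by group
After splitting the original data and inspecting the resulting groups, we can perform one of the following operations, or a combination of them, on each group:
- Aggregation: computing a summary statistic for each group (for example group size, mean, median, or sum) and outputting a single number for many data points
- Transformation: performing some operation group by group, such as computing the z-score within each group
- Filtration: rejecting some groups according to a predefined condition, such as group size, mean, median, or sum; this can also include filtering out particular rows from each group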
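The three kinds of operation listed above can be sketched side by side on a tiny example. The `toy` DataFrame and its column names `group` and `value` are hypothetical and chosen only to make the results easy to check by hand.

```python
import pandas as pd

# Hypothetical toy data: group A has 2 rows, group B has 3 rows
toy = pd.DataFrame({
    'group': ['A', 'A', 'B', 'B', 'B'],
    'value': [1, 3, 10, 20, 30],
})
g = toy.groupby('group')

# Aggregation: one number per group
aggregated = g['value'].mean()

# Transformation: one number per original row (distance from the group mean)
transformed = g['value'].transform(lambda x: x - x.mean())

# Filtration: keep only the rows of groups with more than 2 members
filtered = g.filter(lambda x: len(x) > 2)

print(aggregated)
print(transformed)
print(filtered)
```

Note how the output shrinks to one row per group for aggregation, keeps the original length for transformation, and keeps a subset of the original rows for filtration.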
Aggregation
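To aggregate the data of a GroupBy object (that is, to compute a summary statistic by group), we can use the agg() method on the object:
# Show only 1 decimal place for all float numbers
pd.options.display.float_format = '{:.1f}'.format
# Select the numeric columns explicitly: recent pandas versions no longer
# silently drop non-numeric columns (such as name) when averaging
grouped[['awardYear', 'prizeAmount', 'prizeAmountAdjusted']].agg('mean')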
Output:
                        awardYear  prizeAmount  prizeAmountAdjusted
category
Chemistry                  1972.3    3629279.4            6257868.1
Economic Sciences          1996.1    6105845.2            7837779.2
Literature                 1960.9    2493811.2            5598256.3
Peace                      1964.5    3124879.2            6163906.9
Physics                    1971.1    3407938.6            6086978.2
Physiology or Medicine     1970.4    3072972.9            5738300.7
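The code above produces a DataFrame with the group names as its new index and, as its values, the mean of each numeric column within each group
Instead of going through agg(), we can also call the corresponding pandas methods directly on the GroupBy object. The most commonly used ones are mean(), median(), sum(), size(), count(), min(), max(), std(), var() (computes the variance of each group), describe() (outputs descriptive statistics by group), and nunique() (gives the number of unique values in each group). A GroupBy object has no mode() method of its own; the mode can be obtained through agg(pd.Series.mode) instead
# numeric_only=True restricts the sum to the numeric columns
grouped.sum(numeric_only=True)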
Output:
                        awardYear  prizeAmount  prizeAmountAdjusted
category
Chemistry                  362912    667787418           1151447726
Economic Sciences          167674    512891000            658373449
Literature                 227468    289282102            649397731
Peace                      263248    418733807            825963521
Physics                    419837    725890928           1296526352
Physiology or Medicine     431508    672981066           1256687857
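Usually we are interested in the statistics of only some particular columns, so we need to specify them. In the example above, summing the award years is clearly meaningless; what we may want instead is the sum of the prize values per prize category. To get it, we select the prizeAmountAdjusted column of the GroupBy object, just as we would select a column of a DataFrame, and then apply sum() to it:
grouped['prizeAmountAdjusted'].sum()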
Output:
category
Chemistry                 1151447726
Economic Sciences          658373449
Literature                 649397731
Peace                      825963521
Physics                   1296526352
Physiology or Medicine    1256687857
Name: prizeAmountAdjusted, dtype: int64
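For the snippet above, there is an equivalent syntax that applies the function to the whole GroupBy object first and selects the column afterwards: grouped.sum()['prizeAmountAdjusted']. The first form is preferable, however, because it only aggregates the column we need and therefore performs better, which becomes especially noticeable on large datasets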
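That the two spellings give the same result can be checked on a small example. This is only a sketch on a hypothetical `toy` DataFrame; it compares the results and does not measure performance.

```python
import pandas as pd

# Hypothetical toy data with one text column and one numeric column
toy = pd.DataFrame({
    'category': ['Physics', 'Chemistry', 'Physics', 'Chemistry'],
    'name': ['Aage N. Bohr', 'Aaron Ciechanover', 'Abdus Salam', 'Aaron Klug'],
    'prizeAmountAdjusted': [3404179, 11762861, 2988048, 3102518],
})
g = toy.groupby('category')

# Select the column first, then aggregate only that column
column_first = g['prizeAmountAdjusted'].sum()

# Aggregate every numeric column, then select the one we need
aggregate_first = g.sum(numeric_only=True)['prizeAmountAdjusted']

print(column_first)
print(column_first.equals(aggregate_first))
```

The column-first form never touches the other columns, which is where the saving comes from on wide or large tables.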
If we need to aggregate the data of two or more columns, we use double square brackets:
grouped[['prizeAmount', 'prizeAmountAdjusted']].sum()
Output:
                        prizeAmount  prizeAmountAdjusted
category
Chemistry                 667787418           1151447726
Economic Sciences         512891000            658373449
Literature                289282102            649397731
Peace                     418733807            825963521
Physics                   725890928           1296526352
Physiology or Medicine    672981066           1256687857
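We can apply several functions at once to one or more columns of a GroupBy object. For this we again need the agg() method, this time with a list of the functions of interest:
# String names are used here; passing NumPy callables such as np.sum is deprecated in recent pandas versions
grouped[['prizeAmount', 'prizeAmountAdjusted']].agg(['sum', 'mean', 'std'])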
Output:
                       prizeAmount                       prizeAmountAdjusted
                               sum       mean        std                 sum       mean        std
category
Chemistry                667787418  3629279.4  4070588.4          1151447726  6257868.1  3276027.2
Economic Sciences        512891000  6105845.2  3787630.1           658373449  7837779.2  3313153.2
Literature               289282102  2493811.2  3653734.0           649397731  5598256.3  3029512.1
Peace                    418733807  3124879.2  3934390.9           825963521  6163906.9  3189886.1
Physics                  725890928  3407938.6  4013073.0          1296526352  6086978.2  3294268.5
Physiology or Medicine   672981066  3072972.9  3898539.3          1256687857  5738300.7  3241781.0
In addition, we can apply different aggregation functions to different columns of a GroupBy object by passing a dictionary:
grouped.agg({'prizeAmount': ['sum', 'size'], 'prizeAmountAdjusted': 'mean'})
Output:
                       prizeAmount       prizeAmountAdjusted
                               sum size                 mean
category
Chemistry                667787418  184            6257868.1
Economic Sciences        512891000   84            7837779.2
Literature               289282102  116            5598256.3
Peace                    418733807  134            6163906.9
Physics                  725890928  213            6086978.2
Physiology or Medicine   672981066  219            5738300.7
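The dictionary form produces two-level column names. As an alternative that the article itself does not use, pandas also supports named aggregation, where each output column gets a flat name of our choosing. A minimal sketch on a hypothetical `toy` DataFrame; the output names `total_prize`, `n_laureates`, and `mean_adjusted` are invented for the example:

```python
import pandas as pd

# Hypothetical toy data standing in for the real dataset
toy = pd.DataFrame({
    'category': ['Physics', 'Chemistry', 'Physics', 'Chemistry'],
    'prizeAmount': [630000, 10000000, 800000, 1150000],
    'prizeAmountAdjusted': [3404179, 11762861, 2988048, 3102518],
})

# Each keyword is an output column: (input column, aggregation function)
summary = toy.groupby('category').agg(
    total_prize=('prizeAmount', 'sum'),
    n_laureates=('prizeAmount', 'count'),
    mean_adjusted=('prizeAmountAdjusted', 'mean'),
)

print(summary)
```

The result has a single level of column names, which is often easier to work with in later steps.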
Transformation
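Unlike aggregation methods, transformation methods return a new DataFrame with the same shape and index as the original one, but with transformed individual values. Note that a transformation must not modify any values of the original DataFrame; in other words, these operations cannot be performed in place
The most common pandas method for transforming the data of a GroupBy object is transform(). For example, it can help compute the z-score within each group:
grouped[['prizeAmount', 'prizeAmountAdjusted']].transform(lambda x: (x - x.mean()) / x.std())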
Output:
     prizeAmount  prizeAmountAdjusted
0            1.0                  1.3
1           -0.7                 -0.8
2            1.6                  1.7
3           -0.6                 -1.0
4           -0.6                 -0.9
..           ...                  ...
945         -0.7                 -0.8
946         -0.8                 -1.1
947         -0.9                  0.3
948         -0.5                 -1.0
949         -0.7                 -1.0
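To make the z-score computation concrete, here is a small sketch in which the numbers can be verified by hand. The `toy` DataFrame and its column names are hypothetical. The lambda form is compared with an equivalent form that broadcasts the group mean and standard deviation back to the original rows.

```python
import pandas as pd

# Hypothetical toy data: each group has mean and std that are easy to check
toy = pd.DataFrame({
    'group': ['A', 'A', 'A', 'B', 'B', 'B'],
    'value': [1, 2, 3, 10, 20, 30],
})
g = toy.groupby('group')['value']

# z-score computed group by group with a lambda
z_lambda = g.transform(lambda x: (x - x.mean()) / x.std())

# Same result from group statistics broadcast to every row
z_broadcast = (toy['value'] - g.transform('mean')) / g.transform('std')

print(z_lambda.tolist())
print((z_lambda - z_broadcast).abs().max())
```

Group A has mean 2 and sample standard deviation 1, group B has mean 20 and sample standard deviation 10, so both groups map to the same z-scores even though their raw scales differ.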
With transformation methods we can also replace missing data with the group mean, median, mode, or any other value:
grouped['gender'].transform(lambda x: x.fillna(x.mode()[0]))
Output:
0        male
1        male
2        male
3        male
4        male
        ...
945      male
946      male
947    female
948      male
949      male
Name: gender, Length: 950, dtype: object
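The same pattern works for numeric columns, for instance filling gaps with the group mean. A minimal sketch on a hypothetical `toy` DataFrame, which also shows that the original data is left untouched:

```python
import pandas as pd

# Hypothetical toy data with one missing value in each group
toy = pd.DataFrame({
    'group': ['A', 'A', 'A', 'B', 'B'],
    'value': [1.0, None, 3.0, 10.0, None],
})

# Replace each missing value with the mean of its own group
filled = toy.groupby('group')['value'].transform(lambda x: x.fillna(x.mean()))

print(filled.tolist())
print(toy['value'].isna().sum())  # the original column still has its gaps
```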
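There are, of course, other pandas methods we can use to transform the data of a GroupBy object: bfill(), ffill(), diff(), pct_change(), rank(), shift(), and so on. (quantile(), by contrast, returns one value per group and therefore behaves like an aggregation.)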
Filtration
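Filtration methods discard whole groups, or particular rows from each group, according to a predefined condition and return a subset of the original data. For example, we may want to keep the values of a certain column only for those groups whose mean for that column is greater than a predefined value. For our DataFrame, let us filter out all the groups whose mean prizeAmountAdjusted is below 7,000,000 and keep only this column in the output:
grouped['prizeAmountAdjusted'].filter(lambda x: x.mean() > 7000000)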
Output:
0      12295082
5       9000000
45      8361204
46      9713701
58      8384572
         ...
882    12034660
896     3331882
912     9000000
916     6329114
924     9490424
Name: prizeAmountAdjusted, Length: 84, dtype: int64
Another example is filtering out the groups that have more than a certain number of elements:
grouped['prizeAmountAdjusted'].filter(lambda x: len(x) < 100)
Output:
0      12295082
5       9000000
45      8361204
46      9713701
58      8384572
         ...
882    12034660
896     3331882
912     9000000
916     6329114
924     9490424
Name: prizeAmountAdjusted, Length: 84, dtype: int64
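In both operations above we used the filter() method and passed it a lambda function. Such a function is applied to a whole group and returns True or False depending on how that group compares with the predefined statistical condition. In other words, the function inside filter() decides which groups are kept in the output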
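One way to see what filter() does is to rebuild its result with a boolean mask: compute the group statistic for every row with transform() and keep the rows where the condition holds. A sketch on a hypothetical `toy` DataFrame with a made-up threshold of 5:

```python
import pandas as pd

# Hypothetical toy data: group A has mean 2, group B has mean 20
toy = pd.DataFrame({
    'group': ['A', 'A', 'B', 'B', 'B'],
    'value': [1, 3, 10, 20, 30],
})
g = toy.groupby('group')['value']

# filter(): the function returns one True/False per group
kept_by_filter = g.filter(lambda x: x.mean() > 5)

# Equivalent mask: broadcast the group mean to every row, then compare
mask = g.transform('mean') > 5
kept_by_mask = toy.loc[mask, 'value']

print(kept_by_filter.tolist())
print(kept_by_mask.tolist())
```

Both approaches keep every row of group B and drop every row of group A, and both preserve the original row labels.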
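Besides filtering out whole groups, we can also discard certain rows from each group. Useful methods here are first(), last(), and nth(). Applying one of them to a GroupBy object returns the first, last, or nth entry of each group, respectively. Note that first() and last() take the first or last non-null value of each column, so a returned row may combine values from different original rows: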
grouped.last()
Output:
                        awardYear  prizeAmount  prizeAmountAdjusted                                      name  gender birth_continent
category
Chemistry                    1911       140695              7327865                               Marie Curie  female          Europe
Economic Sciences            1996      7400000              9490424                           William Vickrey    male   North America
Literature                   1968       350000              3052326                         Yasunari Kawabata    male            Asia
Peace                        1963       265000              2839286  International Committee of the Red Cross    male            Asia
Physics                      1972       480000              3345725                              John Bardeen    male   North America
Physiology or Medicine       2016      8000000              8301051                          Yoshinori Ohsumi    male            Asia
For the nth() method, we have to pass an integer giving the position of the entry we want returned from each group:
grouped.nth(1)
Output:
                        awardYear  prizeAmount  prizeAmountAdjusted                   name gender birth_continent
category
Chemistry                    1982      1150000              3102518             Aaron Klug   male          Europe
Economic Sciences            2019      9000000              9000000       Abhijit Banerjee   male            Asia
Literature                   1970       400000              3177966  Alexandr Solzhenitsyn   male          Europe
Peace                        1980       880000              2889667  Adolfo Pérez Esquivel   male   South America
Physics                      1979       800000              2988048            Abdus Salam   male            Asia
Physiology or Medicine       1974       550000              3263449          Albert Claude   male          Europe
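The code above collects the second entry of every group, since positions are counted from 0. Note that in pandas 2.0 and later nth() behaves like a filter and keeps the original row index instead of using the group names as the index, so the output may look different there
Two more methods for filtering rows within each group are head() and tail(), which return the first or last n rows of each group, respectively (n is 5 by default):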
grouped.head(3)
Output:
    awardYear                category  prizeAmount  prizeAmountAdjusted                   name  gender birth_continent
0        2001       Economic Sciences     10000000             12295082      A. Michael Spence    male   North America
1        1975                 Physics       630000              3404179           Aage N. Bohr    male          Europe
2        2004               Chemistry     10000000             11762861      Aaron Ciechanover    male            Asia
3        1982               Chemistry      1150000              3102518             Aaron Klug    male          Europe
4        1979                 Physics       800000              2988048            Abdus Salam    male            Asia
5        2019       Economic Sciences      9000000              9000000       Abhijit Banerjee    male            Asia
6        2019                   Peace      9000000              9000000         Abiy Ahmed Ali    male          Africa
7        2009               Chemistry     10000000             10958504          Ada E. Yonath  female            Asia
8        2011                 Physics     10000000             10545557          Adam G. Riess    male   North America
12       1980                   Peace       880000              2889667  Adolfo Pérez Esquivel    male   South America
16       2007                   Peace     10000000             11301989                Al Gore    male   North America
18       1963  Physiology or Medicine       265000              2839286           Alan Hodgkin    male          Europe
21       1957              Literature       208629              2697789           Albert Camus    male          Africa
22       1974  Physiology or Medicine       550000              3263449          Albert Claude    male          Europe
28       1937  Physiology or Medicine       158463              4716161   Albert Szent-Györgyi    male          Europe
31       1970              Literature       400000              3177966  Alexandr Solzhenitsyn    male          Europe
40       2013              Literature      8000000              8365867           Alice Munro  female   North America
45       2012       Economic Sciences      8000000              8361204          Alvin E. Roth    male   North America
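As the output shows, head() returns the selected rows in their original order and with their original index rather than sorted by group. The sketch below reproduces this on a hypothetical `toy` DataFrame and rebuilds head(2) from the per-group row counter given by cumcount():

```python
import pandas as pd

# Hypothetical toy data with interleaved groups
toy = pd.DataFrame({
    'group': ['B', 'A', 'B', 'A', 'B', 'A'],
    'value': [1, 2, 3, 4, 5, 6],
})
g = toy.groupby('group')

# First two rows of every group, in original row order
first_two = g.head(2)

# Last row of every group
last_one = g.tail(1)

# Same selection as head(2): rows whose position inside their group is 0 or 1
via_cumcount = toy[g.cumcount() < 2]

print(first_two)
print(last_one)
```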
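Combining the results
The last stage of the split-apply-combine chain, combining the results, is performed by pandas behind the scenes. It consists of taking the outputs of all the operations performed on the GroupBy object and putting them back together into a new data structure, such as a Series or a DataFrame. By assigning this data structure to a variable, we can use it for other tasks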
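To make the hidden combine step visible, the whole chain can be written out by hand and compared with the one-line groupby call. This is only an illustrative sketch on a hypothetical `toy` DataFrame, not a description of how pandas implements the step internally.

```python
import pandas as pd

# Hypothetical toy data standing in for the real dataset
toy = pd.DataFrame({
    'category': ['Physics', 'Chemistry', 'Physics', 'Chemistry'],
    'value': [630000, 10000000, 800000, 1150000],
})

# Split and apply by hand: one sum per group, collected in a dict
pieces = {name: group['value'].sum() for name, group in toy.groupby('category')}

# Combine by hand: reassemble the per-group results into one Series
manual = pd.Series(pieces, name='value')
manual.index.name = 'category'

# The same chain performed by pandas in a single call
builtin = toy.groupby('category')['value'].sum()

print(manual)
print(manual.to_dict() == builtin.to_dict())
```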
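Summary
Today we covered many aspects of using the pandas groupby function and working with the resulting object:
- The steps that make up the grouping process
- How the split-apply-combine chain works step by step
- How to create a GroupBy object
- How to briefly inspect a GroupBy object
- The attributes of a GroupBy object
- The operations that can be applied to a GroupBy object
- How to compute summary statistics by group and which methods are available for this purpose
- How to apply several functions at once to one or more columns of a GroupBy object
- How to apply different aggregation functions to different columns of a GroupBy object
- How and why to transform the values of the original DataFrame
- How to filter the groups of a GroupBy object or particular rows of each group
- How pandas combines the results of the grouping process
- The data structures produced by the grouping process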