

User submission · 19 August 2023, 15:03 · Games


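Reposted from: 36大數據 (dashuju36)

Preface

After scraping job postings from Zhilian Zhaopin, the next question is how to analyse the content and extract useful information.

This article analyses the data scraped for the keyword "python", including the Top 10 cities nationwide by number of Python job postings and other related information.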

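I. Main analysis steps

Read the data

Clean the data

Analyse how the number of postings is distributed across the major cities nationwide

Analyse monthly salaries nationwide

Build a word cloud from the job requirement descriptions to find the most frequent keywords

Pick two cities and, for each, analyse the monthly salary distribution and the word cloud of job requirements

II. Detailed analysis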

    import pymongo
    import pandas as pd
    import matplotlib.pyplot as plt
    import numpy as np

    %matplotlib inline
    plt.style.use('ggplot')

    # Make matplotlib display Chinese text correctly
    plt.rcParams['font.sans-serif'] = ['SimHei']  # set the default font
    plt.rcParams['axes.unicode_minus'] = False  # stop the minus sign '-' rendering as a box in saved figures

1 Reading the data

    client = pymongo.MongoClient('localhost')
    db = client['zhilian']
    table = db['python']
    columns = ['zwmc', 'gsmc', 'zwyx', 'gbsj', 'gzdd', 'fkl', 'brief', 'zw_link', '_id', 'save_date']

    # url_set = set([records['zw_link'] for records in table.find()])
    # print(url_set)

    df = pd.DataFrame([records for records in table.find()], columns=columns)

    # columns_update = ['job title',
    #                   'company name',
    #                   'monthly salary',
    #                   'posting date',
    #                   'work location',
    #                   'response rate',
    #                   'job description',
    #                   'page link',
    #                   '_id',
    #                   'date saved']
    # df.columns = columns_update

    print('Total rows: {}'.format(df.shape[0]))
    df.head(2)
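As a rough illustration of what the `columns` argument does when a DataFrame is built from MongoDB-style documents, the sketch below uses two made-up records instead of a live `table.find()` cursor. The record contents and the `extra` key are assumptions for the example only; the real documents in the `zhilian` database are not shown in the article.

```python
import pandas as pd

# Hypothetical documents, standing in for the dicts returned by table.find()
records = [
    {'_id': 1, 'zwmc': 'Python developer', 'gsmc': 'Company A', 'extra': 'ignored'},
    {'_id': 2, 'zwmc': 'Data engineer'},  # no 'gsmc' key in this document
]
columns = ['zwmc', 'gsmc', '_id']

# Only the listed keys are kept, in the listed order; missing keys become NaN
df_demo = pd.DataFrame(records, columns=columns)
print(df_demo)
```

Listing the columns explicitly keeps the frame's layout stable even when individual documents carry extra or missing fields.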

The output of df.head(2) is shown in Figure 1:

　　

2 Cleaning the data

2.1 Converting the date from str to datetime

    df['save_date'] = pd.to_datetime(df['save_date'])
    print(df['save_date'].dtype)
    # df['save_date']

Output:

    datetime64[ns]
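A minimal sketch of the same conversion on a toy Series follows. The date strings are assumed for illustration, since the article does not show the actual format stored in `save_date`.

```python
import pandas as pd

# Hypothetical date strings; the real save_date format is not shown in the article
save_date = pd.Series(['2017-05-17', '2017-05-18'])

converted = pd.to_datetime(save_date)
print(converted.dtype)

# Once converted, the .dt accessor exposes date components
days = converted.dt.day.tolist()
print(days)
```

After the conversion the column can be compared, sorted and grouped as real dates rather than as text.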

2.2 Keeping only salaries of the form "XXXX-XXXX"

    df_clean = df[['zwmc', 'gsmc', 'zwyx', 'gbsj', 'gzdd', 'fkl', 'brief', 'zw_link', 'save_date']]

    # Filter the salary data, keeping only values of the form "XXXX-XXXX" to simplify the later analysis
    df_clean = df_clean[df_clean['zwyx'].str.contains(r'\d+-\d+', regex=True)]
    print('Total rows: {}'.format(df_clean.shape[0]))
    # df_clean.head()

Output:

    Total rows: 22605
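To make the effect of the salary filter concrete, here is a small sketch on invented values. The sample strings are assumptions, not values taken from the scraped data; the pattern is digits, a hyphen, then digits.

```python
import pandas as pd

# Hypothetical salary strings; only the "digits-digits" form should survive
salaries = pd.Series(['8001-10000', 'negotiable', '15000-25000', 'below 1000'])

mask = salaries.str.contains(r'\d+-\d+', regex=True)
kept = salaries[mask].tolist()
print(kept)
```

Note that `str.contains` matches anywhere in the string, so a value with extra text around the range would also pass this filter.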

2.3 Splitting the salary field into its lower and upper bounds

    # http://stackoverflow.com/questions/14745022/pandas-dataframe-how-do-i-split-a-column-into-two
    # http://stackoverflow.com/questions/20602947/append-column-to-pandas-dataframe
    # df_temp.loc[:, 'zwyx_min'], df_temp.loc[:, 'zwyx_max'] = df_temp.loc[:, 'zwyx'].str.split('-', 1).str  # raises a warning

    # expand=True returns the two halves as columns; unpacking the .str accessor fails on recent pandas
    df_split = df_clean['zwyx'].str.split('-', n=1, expand=True)
    df_split.columns = ['zwyx_min', 'zwyx_max']
    df_clean_concat = pd.concat([df_clean, df_split], axis=1)

    # df_clean['zwyx_min'].astype(int)
    df_clean_concat['zwyx_min'] = pd.to_numeric(df_clean_concat['zwyx_min'])
    df_clean_concat['zwyx_max'] = pd.to_numeric(df_clean_concat['zwyx_max'])
    # print(df_clean['zwyx_min'].dtype)
    print(df_clean_concat.dtypes)
    df_clean_concat.head(2)

The output is shown in Figure 2:

　　

Sort the records by monthly salary

    df_clean_concat.sort_values('zwyx_min', inplace=True)
    # df_clean_concat.tail()
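As an alternative way to picture the split and sort steps together, the sketch below pulls both bounds out with one regular expression via `str.extract` and then sorts by the lower bound. This is not the article's method, only an illustration on made-up rows; the frame name `df_demo` is hypothetical.

```python
import pandas as pd

# Hypothetical salary ranges in the "XXXX-XXXX" form
df_demo = pd.DataFrame({'zwyx': ['15000-25000', '8001-10000', '10001-15000']})

# Named groups become the column names of the extracted frame
bounds = df_demo['zwyx'].str.extract(r'(?P<zwyx_min>\d+)-(?P<zwyx_max>\d+)').astype(int)

# Attach the numeric bounds and sort by the lower bound
df_demo = pd.concat([df_demo, bounds], axis=1).sort_values('zwyx_min')
print(df_demo)
```

Rows that do not match the pattern would come back as NaN from `str.extract`, so on unfiltered data the integer conversion would need a `dropna()` first.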

Check whether the scraped data contains duplicates

    # Check whether the scraped data contains duplicates
    print(df_clean_concat[df_clean_concat.duplicated('zw_link')])

Output:

    Empty DataFrame
    Columns: [zwmc, gsmc, zwyx, gbsj, gzdd, fkl, brief, zw_link, save_date, zwyx_min, zwyx_max]
    Index: []

As the output shows, the data contains no duplicates.
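For comparison, this is roughly what the duplicate check reports when a link does repeat, and how such rows could be dropped. The rows and URLs below are invented for the example.

```python
import pandas as pd

# Hypothetical postings; the first and third rows share the same link
df_demo = pd.DataFrame({
    'zwmc': ['Python developer', 'Data engineer', 'Python developer'],
    'zw_link': ['http://example.com/1', 'http://example.com/2', 'http://example.com/1'],
})

# duplicated() flags every repeat after the first occurrence
dupes = df_demo[df_demo.duplicated('zw_link')]
print(dupes)

# drop_duplicates() keeps the first occurrence of each link
deduped = df_demo.drop_duplicates(subset='zw_link', keep='first')
print(len(deduped))
```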

3 Analysing job postings nationwide

3.1 Distribution of job postings across the major cities

    # from IPython.core.display import display, HTML
    ADDRESS = ['北京', '上海', '廣州', '深圳', '天津', '武漢', '西安', '成都', '大連', '長春', '沈陽',
               '南京', '濟南', '青島', '杭州', '蘇州', '無錫', '寧波', '重慶', '鄭州', '長沙', '福州',
               '廈門', '哈爾濱', '石家莊', '合肥', '惠州', '太原', '昆明', '煙臺', '佛山', '南昌',
               '貴陽', '南寧']
    df_city = df_clean_concat.copy()

    # Work locations are written in several ways: for 北京, for example, many entries read 北京-朝陽區 and so on.
    # They can be normalised by substitution, here with pandas' replace() method
    for city in ADDRESS:
        df_city['gzdd'] = df_city['gzdd'].replace([(city + '.*')], [city], regex=True)

    # Restrict the analysis to the major cities
    df_city_main = df_city[df_city['gzdd'].isin(ADDRESS)]
    # Select the two columns with a list; the tuple form ['zwmc', 'gsmc'] fails on recent pandas
    df_city_main_count = df_city_main.groupby('gzdd')[['zwmc', 'gsmc']].count()
    df_city_main_count['gsmc'] = df_city_main_count['gsmc'] / (df_city_main_count['gsmc'].sum())
    df_city_main_count.columns = ['number', 'percentage']

    # Sort by number of postings
    df_city_main_count.sort_values(by='number', ascending=False, inplace=True)

    # Add a helper column holding city and percentage, for use in the plot later
    df_city_main_count['label'] = df_city_main_count.index + ' ' + ((df_city_main_count['percentage'] * 100).round()).astype('int').astype('str') + '%'
    print(type(df_city_main_count))

    # Top 10 cities by number of postings
    print(df_city_main_count.head(10))

Output:

    <class 'pandas.core.frame.DataFrame'>
          number  percentage   label
    gzdd
    北京      6936    0.315948  北京 32%
    上海      3213    0.146358  上海 15%
    深圳      1908    0.086913   深圳 9%
    成都      1290    0.058762   成都 6%
    杭州      1174    0.053478   杭州 5%
    廣州      1167    0.053159   廣州 5%
    南京       826    0.037626   南京 4%
    鄭州       741    0.033754   鄭州 3%
    武漢       552    0.025145   武漢 3%
    西安       473    0.021546   西安 2%
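The normalise-then-count idea in 3.1 can be sketched on a handful of invented locations. This variant is an assumption-laden alternative rather than the article's code: it extracts the city prefix with a single anchored regular expression and gets the share directly from `value_counts(normalize=True)`. The short `ADDRESS_DEMO` list and the sample locations are made up.

```python
import pandas as pd

ADDRESS_DEMO = ['北京', '上海']  # hypothetical short city list

# Hypothetical work locations, some with a district suffix
gzdd = pd.Series(['北京-海淀', '北京', '上海-普陀', '北京-西城', '其他'])

# Keep only the leading city name; locations outside the list become NaN
pattern = '^(' + '|'.join(ADDRESS_DEMO) + ')'
city = gzdd.str.extract(pattern, expand=False)

number = city.value_counts()                 # postings per city
share = city.value_counts(normalize=True)    # fraction of postings per city
print(number)
print(share)
```

Anchoring the pattern with `^` means a city name is only recognised at the start of the location string.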

Plot the counts in df_city_main_count:

    from matplotlib import cm

    label = df_city_main_count['label']
    sizes = df_city_main_count['number']

    # Set the size of the plotting area
    fig, axes = plt.subplots(figsize=(10, 6), ncols=2)
    ax1, ax2 = axes.ravel()
    colors = cm.PiYG(np.arange(len(sizes)) / len(sizes))  # colormaps: Paired, autumn, rainbow, gray, spring, Darks

    # There are too many cities to show labels and percentages on the pie itself
    patches, texts = ax1.pie(sizes, labels=None, shadow=False, startangle=0, colors=colors)
    ax1.axis('equal')
    ax1.set_title('Distribution of job postings', loc='center')

    # ax2 only shows the legend
    ax2.axis('off')
    ax2.legend(patches, label, loc='center left', fontsize=9)
    plt.savefig('job_distribute.jpg')
    plt.show()

The resulting pie chart is shown below:

　　

3.2 Monthly salary distribution (nationwide)

    from matplotlib.ticker import FormatStrFormatter

    fig, (ax1, ax2) = plt.subplots(figsize=(10, 8), nrows=2)
    x_pos = list(range(df_clean_concat.shape[0]))
    y1 = df_clean_concat['zwyx_min']
    ax1.plot(x_pos, y1)
    ax1.set_title('Trend of min monthly salary in China', size=14)
    ax1.set_xticklabels([])
    ax1.set_ylabel('min monthly salary(RMB)')

    bins = [3000, 6000, 9000, 12000, 15000, 18000, 21000, 24000, 100000]
    # density=True replaces the normed=1 argument, which recent matplotlib no longer accepts
    counts, bins, patches = ax2.hist(y1, bins, density=True, histtype='bar', facecolor='g', rwidth=0.8)
    ax2.set_title('Hist of min monthly salary in China', size=14)
    ax2.set_yticklabels([])
    # ax2.set_xlabel('min monthly salary(RMB)')

    # http://stackoverflow.com/questions/6352740/matplotlib-label-each-bin
    ax2.set_xticks(bins)  # use the bin edges as xticks
    ax2.set_xticklabels(bins, rotation=-90)  # set the orientation of the xticklabels

    # Label the raw counts and the percentages below the x-axis...
    bin_centers = 0.5 * np.diff(bins) + bins[:-1]
    for count, x in zip(counts, bin_centers):
        # # Label the raw counts
        # ax2.annotate(str(count), xy=(x, 0), xycoords=('data', 'axes fraction'),
        #              xytext=(0, -70), textcoords='offset points', va='top', ha='center', rotation=-90)

        # Label the percentages
        percent = '%0.0f%%' % (100 * float(count) / counts.sum())
        ax2.annotate(percent, xy=(x, 0), xycoords=('data', 'axes fraction'),
                     xytext=(0, -40), textcoords='offset points', va='top', ha='center',
                     rotation=-90, color='b', size=14)

    fig.savefig('salary_quanguo_min.jpg')
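One detail worth checking when labelling bins with percentages: a density-normalised histogram divides each bar by its bin width, so with unequal bins (such as the wide last bin, 24000 to 100000) shares derived from bar heights differ from shares of postings. The sketch below, on invented salaries, computes per-bin percentages from raw counts with `np.histogram` and compares them with the height-based figure. The sample values are assumptions.

```python
import numpy as np

# Hypothetical lower-bound salaries
salaries = np.array([3500, 4000, 7000, 8000, 8500, 13000, 30000, 50000])
bins = [3000, 6000, 9000, 12000, 15000, 18000, 21000, 24000, 100000]

# Raw counts per bin, then each bin's share of all postings
counts, edges = np.histogram(salaries, bins=bins)
percent = 100 * counts / counts.sum()

# Shares derived from density heights, for comparison
density, _ = np.histogram(salaries, bins=bins, density=True)
density_share = 100 * density / density.sum()

bin_centers = edges[:-1] + 0.5 * np.diff(edges)
for x, p, d in zip(bin_centers, percent, density_share):
    print('%8.0f  count-based %5.1f%%  height-based %5.1f%%' % (x, p, d))
```

In this toy data the last bin holds a quarter of the postings, yet its height-based share is far smaller because the bin is much wider than the others.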

The resulting figure (salary_quanguo_min.jpg) is shown below:

　　

Monthly salary distribution after excluding some extreme values

    df_zwyx_adjust = df_clean_concat[df_clean_concat['zwyx_min'] <= 20000]

    fig, (ax1, ax2) = plt.subplots(figsize=(10, 8), nrows=2)
    x_pos = list(range(df_zwyx_adjust.shape[0]))
    y1 = df_zwyx_adjust['zwyx_min']
    ax1.plot(x_pos, y1)
    ax1.set_title('Trend of min monthly salary in China (adjust)', size=14)
    ax1.set_xticklabels([])
    ax1.set_ylabel('min monthly salary(RMB)')

    bins = [3000, 6000, 9000, 12000, 15000, 18000, 21000]
    # density=True replaces the normed=1 argument, which recent matplotlib no longer accepts
    counts, bins, patches = ax2.hist(y1, bins, density=True, histtype='bar', facecolor='g', rwidth=0.8)
    ax2.set_title('Hist of min monthly salary in China (adjust)', size=14)
    ax2.set_yticklabels([])
    # ax2.set_xlabel('min monthly salary(RMB)')

    # http://stackoverflow.com/questions/6352740/matplotlib-label-each-bin
    ax2.set_xticks(bins)  # use the bin edges as xticks
    ax2.set_xticklabels(bins, rotation=-90)  # set the orientation of the xticklabels

    # Label the raw counts and the percentages below the x-axis...
    bin_centers = 0.5 * np.diff(bins) + bins[:-1]
    for count, x in zip(counts, bin_centers):
        # # Label the raw counts
        # ax2.annotate(str(count), xy=(x, 0), xycoords=('data', 'axes fraction'),
        #              xytext=(0, -70), textcoords='offset points', va='top', ha='center', rotation=-90)

        # Label the percentages
        percent = '%0.0f%%' % (100 * float(count) / counts.sum())
        ax2.annotate(percent, xy=(x, 0), xycoords=('data', 'axes fraction'),
                     xytext=(0, -40), textcoords='offset points', va='top', ha='center',
                     rotation=-90, color='b', size=14)

    fig.savefig('salary_quanguo_min_adjust.jpg')

The resulting figure (salary_quanguo_min_adjust.jpg) is shown below:

　　

3.3 Skill requirements

    brief_list = list(df_clean_concat['brief'])
    brief_str = ''.join(brief_list)
    print(type(brief_str))
    # print(brief_str)
    # with open('brief_quanguo.txt', 'w', encoding='utf-8') as f:
    #     f.write(brief_str)

Output:

    <class 'str'>

Build a word cloud from the collected job requirements with the following code:

    # -*- coding: utf-8 -*-
    """
    Created on Wed May 17 2017

    @author: lemon
    """
    import jieba
    from wordcloud import WordCloud, ImageColorGenerator
    import matplotlib.pyplot as plt
    import os
    import PIL.Image as Image
    import numpy as np

    with open('brief_quanguo.txt', encoding='utf-8') as f:  # read the file contents
        text = f.read()

    # First segment the text with the jieba Chinese word segmentation tool
    wordlist = jieba.cut(text, cut_all=False)  # cut_all: True is full mode, False is accurate mode
    wordlist_space_split = ' '.join(wordlist)

    d = os.path.dirname(__file__)
    alice_coloring = np.array(Image.open(os.path.join(d, 'colors.png')))
    my_wordcloud = WordCloud(background_color='#F0F8FF', max_words=100, mask=alice_coloring,
                             max_font_size=300, random_state=42).generate(wordlist_space_split)

    image_colors = ImageColorGenerator(alice_coloring)
    my_wordcloud.recolor(color_func=image_colors)  # recolour the words from the mask image
    plt.imshow(my_wordcloud)  # display the word cloud as an image
    plt.axis('off')  # hide the axes
    plt.show()
    my_wordcloud.to_file(os.path.join(d, 'brief_quanguo_colors_cloud.png'))
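A word cloud sizes each word by how often it occurs, so the underlying numbers can be inspected with a plain frequency count. The sketch below uses a made-up token list in place of the output of `jieba.cut(text)`, and a tiny hypothetical stop-word set; neither comes from the article.

```python
from collections import Counter

# Hypothetical tokens, standing in for the output of jieba.cut(text)
tokens = ['Python', 'django', 'linux', 'python', 'and', 'mysql', 'linux', 'python', ' ']

# Hypothetical stop words to leave out of the count
STOPWORDS = {'and'}

# Normalise case, skip blanks and stop words, then count
freq = Counter(
    t.lower() for t in tokens
    if t.strip() and t.lower() not in STOPWORDS
)
top = freq.most_common(2)
print(top)
```

Printing the most common tokens this way is a quick check that the largest words in the image match the actual counts.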

The resulting word cloud:

　　

4 Beijing

4.1 Monthly salary distribution

    df_beijing = df_clean_concat[df_clean_concat['gzdd'].str.contains('北京.*', regex=True)]
    df_beijing.to_excel('zhilian_kw_python_bj.xlsx')
    print('Total rows: {}'.format(df_beijing.shape[0]))
    # df_beijing.head()

Output:

    Total rows: 6936

Reusing the code from the nationwide analysis, the monthly salary distribution looks as follows:

　　

4.2 Skill requirements

    brief_list_bj = list(df_beijing['brief'])
    brief_str_bj = ''.join(brief_list_bj)
    print(type(brief_str_bj))
    # print(brief_str_bj)
    # with open('brief_beijing.txt', 'w', encoding='utf-8') as f:
    #     f.write(brief_str_bj)

Output:

    <class 'str'>

The word cloud is shown below:

　　

5 Changsha

5.1 Monthly salary distribution

    df_changsha = df_clean_concat[df_clean_concat['gzdd'].str.contains('長沙.*', regex=True)]
    # df_changsha = pd.DataFrame(df_changsha, ignore_index=True)
    df_changsha.to_excel('zhilian_kw_python_cs.xlsx')
    print('Total rows: {}'.format(df_changsha.shape[0]))
    # df_changsha.tail()

Output:

    Total rows: 280

Reusing the code from the nationwide analysis, the monthly salary distribution looks as follows:

　　

5.2 Skill requirements

    brief_list_cs = list(df_changsha['brief'])
    brief_str_cs = ''.join(brief_list_cs)
    print(type(brief_str_cs))
    # print(brief_str_cs)
    # with open('brief_changsha.txt', 'w', encoding='utf-8') as f:
    #     f.write(brief_str_cs)

Output:

    <class 'str'>

The word cloud is shown below:

　　
