

Decoding the Magical World of Harry Potter with Python: Charm With Nowhere to Hide

admin, 2019-05-14

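First, a brief introduction to jieba, a Chinese word-segmentation package. jieba has three main segmentation modes:

• Precise mode: the default. It segments the text as accurately as possible and is suited to text analysis;
• Full mode: emits every word that can be formed from the text, so the result contains ambiguous, overlapping words;
• Search-engine mode: builds on precise mode and splits long words again, which suits search-engine indexing.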

Commonly used jieba calls:

• Precise-mode segmentation: jieba.cut(text, cut_all=False); with cut_all=True it runs in full mode
• Load a custom dictionary: jieba.load_userdict(file_name)
• Add a word: jieba.add_word(word, freq, tag)
• Delete a word: jieba.del_word(word)

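Harry Potter is a series of fantasy novels by the British author J.K. Rowling. It follows the adventures of the protagonist, Harry Potter, during his seven years of study at Hogwarts School of Witchcraft and Wizardry. Below, we put jieba into practice using the tangled web of character relationships in Harry Potter as the example.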


# Load the required packages
# Note: `from pyecharts import Bar, WordCloud` is the pyecharts 0.5.x import style
import codecs

import numpy as np
import pandas as pd
import jieba
import jieba.posseg as pseg  # part-of-speech tagging module
from pyecharts import Bar, WordCloud

# Load the character names, the stopwords and the custom word list
renmings = pd.read_csv('人名.txt', engine='python', encoding='utf-8', names=['renming'])['renming']
stopwords = pd.read_csv('mystopwords.txt', engine='python', encoding='utf-8', names=['stopwords'])['stopwords'].tolist()
with open('哈利波特.txt', encoding='utf-8') as f:
    book = f.read()
jieba.load_userdict('哈利波特词库.txt')

# Define a segmentation function
def words_cut(book):
    words = list(jieba.cut(book))
    stopwords1 = [w for w in words if len(w) == 1]  # treat single-character tokens as stopwords too
    seg = set(words) - set(stopwords) - set(stopwords1)  # filter out stopwords for a cleaner result
    result = [i for i in words if i in seg]
    return result

# First segmentation pass
bookwords = words_cut(book)
renming = [i.split(' ')[0] for i in set(renmings)]  # keep only the name, dropping frequency and part of speech
renming_set = set(renming)  # build the set once instead of once per token
nameswords = [i for i in bookwords if i in renming_set]  # pick out the character names

# Count word frequencies
bookwords_count = pd.Series(bookwords).value_counts().sort_values(ascending=False)
nameswords_count = pd.Series(nameswords).value_counts().sort_values(ascending=False)
bookwords_count[:100].index
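
The filter-then-count step can be sketched without pandas, using only the standard library. The token list and stopword set below are made-up stand-ins for the output of `jieba.cut` and for `mystopwords.txt`, and `filter_tokens` is a hypothetical helper that mirrors what `words_cut` does after segmentation.

```python
from collections import Counter

def filter_tokens(tokens, stopwords):
    """Drop stopwords and single-character tokens, keeping order and repeats."""
    return [t for t in tokens if len(t) > 1 and t not in stopwords]

# Made-up tokens standing in for the output of jieba.cut(book).
tokens = ["哈利", "的", "魔杖", "和", "赫敏", "的", "魔杖", "哈利", "说", "然后", "哈利"]
stopwords = {"然后"}  # stand-in for the stopword list

kept = filter_tokens(tokens, stopwords)
counts = Counter(kept)

# most_common() sorts by descending count, like value_counts() in pandas.
top = counts.most_common(2)
print(top)  # [('哈利', 3), ('魔杖', 2)]
```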

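After the first pass, most words are segmented correctly, but a small number of name-like tokens are still wrong, for example '布利', '罗恩说', '伏地', '斯内' and '地说'. Others, such as '乌姆里奇' and '霍格沃兹', are split into two words.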

# Add and remove some words by hand
jieba.add_word('邓布利多', 100, 'nr')
jieba.add_word('霍格沃茨', 100, 'n')
jieba.add_word('乌姆里奇', 100, 'nr')
jieba.add_word('拉唐克斯', 100, 'nr')
jieba.add_word('伏地魔', 100, 'nr')
jieba.del_word('罗恩说')
jieba.del_word('地说')
jieba.del_word('斯内')

# Second segmentation pass
bookwords = words_cut(book)
renming_set = set(renming)  # build the set once instead of once per token
nameswords = [i for i in bookwords if i in renming_set]
bookwords_count = pd.Series(bookwords).value_counts().sort_values(ascending=False)
nameswords_count = pd.Series(nameswords).value_counts().sort_values(ascending=False)
bookwords_count[:100].index

After the second pass, the errors from the first pass have been corrected, so we can move on to the statistical analysis.
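
Why a dictionary entry changes the result can be illustrated with a toy forward-maximum-matching segmenter. This is deliberately not jieba's algorithm (jieba builds a word graph from a prefix dictionary and uses an HMM for unknown words). `fmm_cut`, the tiny vocabularies and the sample sentence are all hypothetical, and only show how a missing long word ends up split into fragments.

```python
def fmm_cut(text, vocab, max_len=4):
    """Greedy forward maximum matching: take the longest vocab word at each position."""
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            # Fall back to a single character when nothing in vocab matches.
            if size == 1 or piece in vocab:
                tokens.append(piece)
                i += size
                break
    return tokens

sentence = "邓布利多看着哈利"

# Without the full name, only the fragment '布利' is known.
vocab_before = {"布利", "看着", "哈利"}
before = fmm_cut(sentence, vocab_before)

# Adding the full name plays the role of jieba.add_word('邓布利多', ...).
vocab_after = vocab_before | {"邓布利多"}
after = fmm_cut(sentence, vocab_after)

print(before)  # ['邓', '布利', '多', '看着', '哈利']
print(after)   # ['邓布利多', '看着', '哈利']
```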

# Top 15 most frequent words
bar = Bar('Top 15 most frequent words', background_color='white', title_pos='center', title_text_size=20)
x = bookwords_count[:15].index.tolist()
y = bookwords_count[:15].values.tolist()
bar.add('', x, y, xaxis_interval=0, xaxis_rotate=30, is_label_show=True)
bar

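The 15 most frequent words in the whole series include Harry, Hermione, Ron, Dumbledore, wand, magic, Malfoy, Snape and Sirius.

Stringing these together gives a rough idea of the plot: accompanied by his friends Hermione and Ron, and with the help and guidance of the great wizard Dumbledore, Harry uses his wand and magic to knock out the big boss, Voldemort. Of course, Harry Potter itself is a great read.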


# Top 20 most frequent character names
bar = Bar('Top 20 main characters', background_color='white', title_pos='center', title_text_size=20)
x = nameswords_count[:20].index.tolist()
y = nameswords_count[:20].values.tolist()
bar.add('', x, y, xaxis_interval=0, xaxis_rotate=30, is_label_show=True)
bar

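Ranked by number of appearances across the whole series, Harry's position as protagonist is unshakeable: he appears over 13,000 more times than second-placed Hermione. That is hardly surprising. After all, the books are called Harry Potter, not Hermione Granger.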

# Word cloud for the whole series
name = bookwords_count.index.tolist()
value = bookwords_count.values.tolist()
wc = WordCloud(background_color='white')
wc.add("", name, value, word_size_range=[10, 200], shape='diamond')
wc

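# Character relationship analysis
names = {}          # name -> number of appearances
relationships = {}  # name -> {other name -> co-occurrence count}
lineNames = []      # character names found on each line
name_set = set(nameswords)  # build the set once instead of once per token
with codecs.open('哈利波特.txt', 'r', 'utf8') as f:
    n = 0
    for line in f.readlines():
        n += 1
        print('Processing line {}'.format(n))
        poss = pseg.cut(line)
        lineNames.append([])
        for w in poss:
            if w.word in name_set:
                lineNames[-1].append(w.word)
                if names.get(w.word) is None:
                    names[w.word] = 0
                    relationships[w.word] = {}
                names[w.word] += 1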

# Count how often each pair of different names appears on the same line
for line in lineNames:
    for name1 in line:
        for name2 in line:
            if name1 == name2:
                continue
            if relationships[name1].get(name2) is None:
                relationships[name1][name2] = 1
            else:
                relationships[name1][name2] = relationships[name1][name2] + 1

# Build the node and edge tables; keep only edges with more than 3 co-occurrences
node = pd.DataFrame(columns=['Id', 'Label', 'Weight'])
edge = pd.DataFrame(columns=['Source', 'Target', 'Weight'])
for name, times in names.items():
    node.loc[len(node)] = [name, name, times]
for name, edges in relationships.items():
    for v, w in edges.items():
        if w > 3:
            edge.loc[len(edge)] = [name, v, w]
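
The co-occurrence counting above can be condensed with the standard library. In this sketch, which runs on made-up lines of already-extracted names, `cooccurrence` is a hypothetical helper. Like the loops above, it counts every ordered pair of different names on the same line, so each relationship appears in both directions.

```python
from collections import Counter, defaultdict
from itertools import permutations

def cooccurrence(line_names):
    """Count ordered pairs of different names that share a line."""
    relationships = defaultdict(Counter)
    for names_in_line in line_names:
        # permutations() pairs positions, so a name repeated on a line
        # contributes more than once, as in the nested loops above.
        for name1, name2 in permutations(names_in_line, 2):
            if name1 != name2:
                relationships[name1][name2] += 1
    return relationships

# Made-up example: the character names found on three lines of text.
line_names = [
    ["哈利", "罗恩", "赫敏"],
    ["哈利", "罗恩"],
    ["哈利"],
]

rel = cooccurrence(line_names)
print(rel["哈利"]["罗恩"])  # 2

# Keep only edges above a threshold (the script above uses w > 3).
edges = [(a, b, w) for a, nbrs in rel.items() for b, w in nbrs.items() if w > 1]
print(sorted(edges))  # [('哈利', '罗恩', 2), ('罗恩', '哈利', 2)]
```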

After processing, we find that the same character appears under different names, so we merge them and recount, which yields 88 nodes.

# Merge the different names of the same character
node.loc[node['Id']=='哈利','Id'] = '哈利波特'
node.loc[node['Id']=='波特','Id'] = '哈利波特'
node.loc[node['Id']=='阿不思','Id'] = '邓布利多'
node.loc[node['Label']=='哈利','Label'] = '哈利波特'
node.loc[node['Label']=='波特','Label'] = '哈利波特'
node.loc[node['Label']=='阿不思','Label'] = '邓布利多'
edge.loc[edge['Source']=='哈利','Source'] = '哈利波特'
edge.loc[edge['Source']=='波特','Source'] = '哈利波特'
edge.loc[edge['Source']=='阿不思','Source'] = '邓布利多'
edge.loc[edge['Target']=='哈利','Target'] = '哈利波特'
edge.loc[edge['Target']=='波特','Target'] = '哈利波特'
edge.loc[edge['Target']=='阿不思','Target'] = '邓布利多'

# Sum the weights of the merged nodes. as_index=False keeps Id and Label as columns,
# so they are still written when to_csv is called with index=False.
node['Weight'] = node['Weight'].astype(int)
nresult = node.groupby(['Id', 'Label'], as_index=False)['Weight'].sum().sort_values('Weight', ascending=False)
eresult = edge.sort_values('Weight', ascending=False)
nresult.to_csv('node.csv', index=False)
eresult.to_csv('edge.csv', index=False)
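
The repeated `.loc` assignments can also be expressed as one alias table. The sketch below uses plain dictionaries and invented counts; `ALIASES`, `canonical`, `merge_nodes` and `merge_edges` are hypothetical names. Unlike the script above, it also sums duplicate edges and drops the self-loops that arise when two aliases of one character co-occur, which is a design choice rather than something the original does.

```python
from collections import Counter

# Alias -> canonical name, taken from the replacements above.
ALIASES = {"哈利": "哈利波特", "波特": "哈利波特", "阿不思": "邓布利多"}

def canonical(name):
    """Map an alias to its canonical name; other names pass through."""
    return ALIASES.get(name, name)

def merge_nodes(names):
    """Sum appearance counts of all aliases of the same character."""
    merged = Counter()
    for name, times in names.items():
        merged[canonical(name)] += times
    return merged

def merge_edges(edges):
    """Sum weights of edges that coincide after renaming; drop self-loops."""
    merged = Counter()
    for source, target, weight in edges:
        s, t = canonical(source), canonical(target)
        if s != t:
            merged[(s, t)] += weight
    return merged

# Invented counts, for illustration only.
names = {"哈利": 10, "波特": 4, "阿不思": 1, "邓布利多": 5, "赫敏": 6}
edges = [("哈利", "赫敏", 5), ("波特", "赫敏", 2), ("哈利", "波特", 3)]

nodes = merge_nodes(names)
print(dict(nodes))
print(dict(merge_edges(edges)))
```

Keeping the aliases in one dictionary means a newly discovered alias is a one-line change instead of four more `.loc` statements.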

With node.csv and edge.csv exported, the character relationships in Harry Potter can then be analysed in Gephi.
