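import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(2020)
logits = torch.randn(4, 3)  # 4 samples, 3 classes
print(logits)

# Softmax over dim=0 normalizes each column (across samples)
sm = nn.Softmax(dim=0)
x0 = sm(logits)
print(x0)

# Softmax over dim=1 normalizes each row (across classes)
sm = nn.Softmax(dim=1)
x1 = sm(logits)
print(x1)

# Log of the class probabilities
x2 = torch.log(x1)
print(x2)

# NLLLoss picks the log-probability of each target class and negates it
target = torch.tensor([0, 2, 1, 2])
print(-(x2[0, 0] + x2[1, 2] + x2[2, 1] + x2[3, 2]) / 4)  # manual mean
print(F.nll_loss(x2, target))                   # default reduction='mean'
print(F.nll_loss(x2, target, reduction='sum'))  # replaces the deprecated size_average=False

# CrossEntropyLoss merges the Softmax -> Log -> NLLLoss steps above into one call
loss = nn.CrossEntropyLoss()
print(loss(logits, target))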
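To make the Softmax, Log, NLLLoss chain concrete without depending on PyTorch, here is a minimal standard-library sketch of the same computation. The logits below are made-up values (not the seeded tensor above), and the helper names `log_softmax` and `cross_entropy` are illustrative, not PyTorch's API.

```python
import math

def log_softmax(row):
    """Return log(softmax(row)), computed stably with the log-sum-exp trick."""
    m = max(row)
    log_sum = m + math.log(sum(math.exp(v - m) for v in row))
    return [v - log_sum for v in row]

def cross_entropy(logits, targets, reduction="mean"):
    """Softmax -> log -> negative log-likelihood over a batch of rows."""
    losses = [-log_softmax(row)[t] for row, t in zip(logits, targets)]
    total = sum(losses)
    return total / len(losses) if reduction == "mean" else total

# Hypothetical logits: 4 samples x 3 classes.
logits = [[2.0, 1.0, 0.1], [0.5, 0.5, 3.0], [1.0, 2.0, 0.0], [0.0, 0.0, 0.0]]
targets = [0, 2, 1, 2]

mean_loss = cross_entropy(logits, targets)
sum_loss = cross_entropy(logits, targets, reduction="sum")
print(mean_loss, sum_loss)
```

The `"sum"` result is the `"mean"` result multiplied by the batch size, which is the same relationship as between the two `F.nll_loss` calls above.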

transformers

Official documentation

PyTorch: the 22 methods of transforms

Packages

Stanford parser

NLP tools: Stanford Parser user manual

(Java) Implementing a tool that extracts the set of nouns in a sentence with Stanford Parser and multithreading

Meaning of the Stanford Parser tags

from stanfordcorenlp import StanfordCoreNLP
path = '../stanford-corenlp-4.2.2'
nlp = StanfordCoreNLP(path, lang='en')
s = 'Stanford University is located in California. It is a great university, founded in 1891.'
 
token = nlp.word_tokenize(s)
postag = nlp.pos_tag(s)
ner = nlp.ner(s)
parse = nlp.parse(s)
dependencyParse = nlp.dependency_parse(s)
 
print(' '.join(token))
print('|'.join([','.join(i) for i in postag]))
print('|'.join([','.join(i) for i in ner]))
print(parse)
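# Each triple is (relation, governor index, dependent index). Indices are 1-based,
# ROOT uses governor 0, and numbering restarts for every sentence, so lookups into
# the flat token list are only reliable for the first sentence of the input.
for rel, head, dep in dependencyParse:
    head_tok = token[head - 1] if head > 0 else 'ROOT'
    print(rel, '-'.join([str(head), head_tok]), '-'.join([str(dep), token[dep - 1]]))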

nlp.close()

torchcrf

Using pytorch-crf

pip install pytorch-crf==0.4.0
from torchcrf import CRF

Training

torch.nn.DataParallel

CUDA out of memory

import os
# Set this before CUDA is initialized, i.e. before the first CUDA call in torch
os.environ["CUDA_VISIBLE_DEVICES"] = '3,4,5'

PyTorch (5) Getting started: DataLoader and Dataset

Multi-GPU training

Batch

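If the batch size is too small, each epoch takes longer and the gradient estimates oscillate heavily, which hinders convergence. If the batch size is too large, the gradient direction barely changes from one batch to the next, and training can easily get stuck in a local minimum.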
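The small-batch half of this claim can be checked on a toy problem. The sketch below measures how much the mini-batch gradient varies from batch to batch for several batch sizes. The data, the single-weight model, and the helper names are all hypothetical, and it only illustrates gradient noise, not the local-minimum behaviour.

```python
import random
import statistics

random.seed(0)

# Hypothetical toy data: y = 3x + noise, fitted by a single weight w.
xs = [random.uniform(-1.0, 1.0) for _ in range(1024)]
ys = [3.0 * x + random.gauss(0.0, 0.5) for x in xs]

def minibatch_grad(w, batch_size):
    """Gradient of the mean squared error w.r.t. w on one random mini-batch."""
    idx = random.sample(range(len(xs)), batch_size)
    return sum(2.0 * (w * xs[i] - ys[i]) * xs[i] for i in idx) / batch_size

def grad_std(w, batch_size, trials=500):
    """Spread of the mini-batch gradient across many random batches."""
    grads = [minibatch_grad(w, batch_size) for _ in range(trials)]
    return statistics.pstdev(grads)

spread = {b: grad_std(0.0, b) for b in (1, 8, 64)}
for b, s in spread.items():
    print(f"batch_size={b:3d}  gradient std={s:.3f}")
```

The spread shrinks as the batch grows, which matches the observation that tiny batches give noisy updates while very large batches give nearly identical gradients from step to step.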

Classic models

Evaluation

NLP (23): Using seqeval, an evaluation module for sequence labeling algorithms - 山阴少年 - 博客园 (cnblogs.com)

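Macro-F1: compute precision, recall, and the F1 score for each class separately, then take the unweighted mean of the per-class F1 scores as the F1 score for the whole sample set.

Micro-F1: does not distinguish between classes; precision and recall are computed over all samples pooled together, and the F1 score is derived from those.

Weighted-F1: builds on macro-F1, but averages the per-class F1 scores with weights proportional to the number of samples in each class.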
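The three averages can be compared on a small example. This is a minimal standard-library sketch for single-label, per-item classification. The labels and the helper name `f1_scores` are made up for illustration, and seqeval itself scores whole entities rather than individual labels.

```python
from collections import Counter

def f1_scores(y_true, y_pred):
    """Return (macro, micro, weighted) F1 for single-label multi-class data."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p, but it was wrong
            fn[t] += 1  # true class t was missed

    def f1(tp_, fp_, fn_):
        denom = 2 * tp_ + fp_ + fn_
        return 2 * tp_ / denom if denom else 0.0

    per_class = {c: f1(tp[c], fp[c], fn[c]) for c in labels}
    support = Counter(y_true)  # number of true samples per class
    macro = sum(per_class.values()) / len(labels)
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    weighted = sum(per_class[c] * support[c] for c in labels) / len(y_true)
    return macro, micro, weighted

# Hypothetical labels: class A is the majority class.
y_true = ["A", "A", "A", "A", "B", "B", "C"]
y_pred = ["A", "A", "A", "B", "B", "C", "C"]
macro, micro, weighted = f1_scores(y_true, y_pred)
print(f"macro={macro:.4f} micro={micro:.4f} weighted={weighted:.4f}")
```

In this single-label setting micro-F1 equals plain accuracy (5 of 7 correct here), while macro-F1 is pulled down by the weaker minority classes and weighted-F1 sits closer to the majority class score.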

micro-f1 & macro-f1 & weighted-f1

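Frequently used

Anaconda: making a chosen CUDA version coexist with the system default version

Working with Excel in Python: using xlwt to set cell background colors and bold fonts

Reading and writing CSV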

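import csv

line_num = 2    # number of rows to print before stopping
file_path = ""  # path to the CSV file
with open(file_path, 'r', newline='') as f:
    reader = csv.reader(f)
    print(type(reader))
    # Each row is returned as a list of strings
    for i, row in enumerate(reader, start=1):
        print(row)
        if i == line_num:
            break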
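The snippet above covers reading only. For the writing half, a minimal sketch with `csv.writer` could look like the following. The rows and the temporary output path are made up for illustration.

```python
import csv
import os
import tempfile

# Hypothetical rows: a header followed by two data rows.
rows = [["token", "tag"], ["Stanford", "B-ORG"], ["University", "I-ORG"]]

# Hypothetical output location; a temporary directory keeps the sketch self-contained.
out_path = os.path.join(tempfile.mkdtemp(), "example.csv")

# newline="" lets the csv module control line endings (avoids blank rows on Windows).
with open(out_path, "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerows(rows)

# Read the file back to confirm the round trip.
with open(out_path, "r", newline="", encoding="utf-8") as f:
    read_back = list(csv.reader(f))

print(read_back)
```

Every value comes back as a string, so numeric columns need an explicit conversion after reading.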