二狗子 · Published on 2024-12-02

Python's augmented assignment operators unexpectedly break gradient computation

I once ran into a class of baffling errors: the reported error location made no sense, and following the error's own hint did not help either. The code looked like this:

fine_tunning_label_text_features = fine_tunning_model.encode_text(label_text)
# normalize the text features -- note the in-place /= on the encode_text output
fine_tunning_label_text_features /= fine_tunning_label_text_features.norm(
    dim=-1, keepdim=True)
fine_tunning_output_label = (fine_tunning_image_features @
    fine_tunning_label_text_features.T).softmax(dim=-1)

It failed with:

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.HalfTensor [6, 1024]], which is output 0 of MmBackward, is at version 1; expected version 0 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True). 
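For reference, the hint is asking for a single switch at the top of the script; with it enabled, autograd records a traceback for every operation and tries to point at the one whose saved tensor was modified:

import torch

# Record a creation traceback for every autograd op; the eventual error
# message will then identify the offending operation.
torch.autograd.set_detect_anomaly(True)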

Following the hint and enabling detect_anomaly did not fix it either. I searched for a long time, and many answers abroad (e.g. on StackOverflow) just tell you to set ReLU's inplace argument to False. But the real culprit is Python's augmented assignment operator (/=), and the same reasoning extends to the other augmented assignments such as -=, += and *=. For plain Python numbers these operators simply compute the result of both sides and rebind it to the left-hand name, but a PyTorch tensor overloads them as in-place operations (/= dispatches to Tensor.div_, *= to Tensor.mul_, and so on), mutating the tensor's storage directly. That is where the problem appears:

a *= b     # in-place: mutates a's storage -> breaks autograd
a = a * b  # out-of-place: builds a new tensor -> fine
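Here is a minimal, self-contained repro of the failure (the tensors w, t and n are made up for illustration; norm is just one example of an op whose backward needs its input):

import torch

w = torch.randn(3, requires_grad=True)
t = w * 2           # non-leaf tensor, analogous to the encode_text output
n = t.norm()        # norm's backward needs t, so autograd saves t here
t /= n              # in-place division bumps t's version counter to 1
t.sum().backward()  # RuntimeError: ... modified by an inplace operation

Note that the error only fires at backward(), far from the offending /= line, which is exactly why the reported location looks so confusing. Replacing the in-place line with t = t / n creates a fresh tensor, leaves the saved t at version 0, and backward() succeeds.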

If a tensor's value will later be needed by autograd to compute gradients, it must not be changed in place (an inplace operation). So the fix is simply to rewrite *= (or /=, +=, -=) in the more explicit out-of-place form.
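Applied to the snippet at the top, that means:

fine_tunning_label_text_features = fine_tunning_model.encode_text(label_text)
# out-of-place division: a new tensor is created, so the encode_text output
# that autograd saved for backward stays at version 0
fine_tunning_label_text_features = (fine_tunning_label_text_features /
    fine_tunning_label_text_features.norm(dim=-1, keepdim=True))
fine_tunning_output_label = (fine_tunning_image_features @
    fine_tunning_label_text_features.T).softmax(dim=-1)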

