How to Fix Random Seeds in PyTorch, TensorFlow, and Keras

2022-11-02 | Artificial Intelligence

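1. Where randomness can come from

The many nondeterministic algorithms in cuDNN
GPU multithreading
Data loading with num_workers greater than 1
Randomness in complex models (for example, the initialization of RNN, LSTM, Conv, Dropout, Dense, and GRUCell layers in some versions)
Third-party libraries (the RNG of each such library therefore needs its own seed)
Optimizers (for example, Adam)
Differences in the development environment, such as software versions and CPU type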

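2. How to fix random seeds in PyTorch

At the top of the file, before importing other modules or running any other code (apart from the imports that seed_torch() itself needs), call seed_torch() to fix the random seeds, that is, to seed each random number generator (RNG).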


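import os
import random

import numpy as np
import torch

def seed_torch(seed=42):
    random.seed(seed)                         # Python's built-in RNG
    os.environ['PYTHONHASHSEED'] = str(seed)  # takes effect only for processes started afterwards; see section 4
    np.random.seed(seed)                      # NumPy's global RNG
    torch.manual_seed(seed)                   # PyTorch RNG
    torch.cuda.manual_seed(seed)              # current GPU
    torch.cuda.manual_seed_all(seed)          # all GPUs, for multi-GPU runs

    torch.backends.cudnn.benchmark = False     # do not auto-tune the convolution algorithm
    torch.backends.cudnn.deterministic = True  # use deterministic cuDNN algorithms

seed_torch(seed=42)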
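A quick sanity check is to seed, draw a few numbers, seed again, and confirm that the draws repeat. The sketch below assumes NumPy and PyTorch are installed. The names seed_everything and draw are illustrative only, and the GPU and cuDNN settings of seed_torch() are left out so that it runs on a CPU-only machine.

```python
import random

import numpy as np
import torch


def seed_everything(seed=42):
    """Illustrative CPU-only subset of seed_torch()."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)


def draw():
    """Draw one value from each of the three seeded generators."""
    return random.random(), float(np.random.rand()), torch.rand(3).tolist()


seed_everything(42)
first = draw()
seed_everything(42)
second = draw()

print(first == second)  # True: reseeding replays the same sequence
```

If this check fails in a real project, some code between the two calls is probably consuming random numbers from an RNG that was not reseeded.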

Other notes:

In fastai, augmentation introduces randomness because data loading is multithreaded. When you use threads for data loading, the augmentation for each image is done inside different threads. So even if you have set a random seed beforehand, the threads (since they share resources) update the state of the random number generator and share that state as they perform augmentations.
Always provide the random seed to functions that accept one; a classic example is train_test_split(), which takes a random_state parameter.
In PyTorch, DataLoader reseeds its workers following the "Randomness in multi-process data loading" algorithm. Use worker_init_fn() and generator to preserve reproducibility.
One user reported that a PyTorch run that could not be reproduced became reproducible once gradient clipping was added.
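The worker_init_fn and generator advice for DataLoader can be sketched as follows. The seed_worker function follows the pattern shown in the PyTorch documentation; make_loader and epoch_order are hypothetical helper names introduced here for illustration, and the batch size and worker count are arbitrary assumptions.

```python
import random

import numpy as np
import torch
from torch.utils.data import DataLoader, TensorDataset


def seed_worker(worker_id):
    # Inside a worker process, torch.initial_seed() returns that worker's seed.
    # Reuse it for NumPy and Python's random so augmentations are repeatable.
    worker_seed = torch.initial_seed() % 2**32
    np.random.seed(worker_seed)
    random.seed(worker_seed)


def make_loader(dataset, seed=42, batch_size=4, num_workers=2):
    # A dedicated generator fixes the shuffling order and the workers' base seed.
    g = torch.Generator()
    g.manual_seed(seed)
    return DataLoader(
        dataset,
        batch_size=batch_size,
        shuffle=True,
        num_workers=num_workers,
        worker_init_fn=seed_worker,
        generator=g,
    )


def epoch_order(loader):
    # Flatten one epoch into the list of sample values, in the order served.
    return [int(x) for (batch,) in loader for x in batch]
```

Two loaders built with the same seed over the same dataset, for example make_loader(TensorDataset(torch.arange(16)), seed=0), should serve samples in the same order in their first epoch.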

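Background:

torch.backends.cudnn.benchmark controls whether cuDNN searches for the fastest algorithm: when it is enabled, cuDNN benchmarks several ways of implementing a convolution of a given size, picks one, and uses it from then on. Due to benchmarking noise and different hardware, the benchmark may select different algorithms on subsequent runs, even on the same machine. Setting it to False ensures that the same algorithm is selected on every run, but does not guarantee that the algorithm itself is deterministic. This setting concerns cuDNN convolution operations.

torch.use_deterministic_algorithms(True) and torch.backends.cudnn.deterministic = True both make the selected algorithm deterministic. The latter controls only that cuDNN behavior, whereas torch.use_deterministic_algorithms(True) also covers other PyTorch operations: it uses a deterministic implementation where one is available and raises an error when an operation is known to be nondeterministic and has no deterministic alternative. Reference: the Reproducibility notes in the PyTorch documentation.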
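The stricter of the two switches can be turned on as in the sketch below. The function name enable_strict_determinism is hypothetical. The CUBLAS_WORKSPACE_CONFIG value is the one the PyTorch documentation suggests for some CUDA versions, and whether it is needed depends on your setup, so treat it as an assumption.

```python
import os

import torch


def enable_strict_determinism():
    # Needed by some cuBLAS operations on certain CUDA versions; it has to be
    # set before the CUDA context is created, so call this function early.
    os.environ.setdefault("CUBLAS_WORKSPACE_CONFIG", ":4096:8")

    # Do not let cuDNN auto-tune (and possibly change) the convolution algorithm.
    torch.backends.cudnn.benchmark = False

    # Prefer deterministic implementations; raise an error for operations
    # that are known to have none.
    torch.use_deterministic_algorithms(True)


enable_strict_determinism()
print(torch.are_deterministic_algorithms_enabled())  # True
```

An error raised after this call is useful information: it names an operation that would otherwise silently make the run nondeterministic.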

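3. How to fix random seeds in TensorFlow and Keras

Keras gets its source of randomness from the NumPy random number generator, so the NumPy seed must be set regardless of whether the backend is Theano or TensorFlow.

In addition, TensorFlow has its own random number generator, which must also be seeded immediately after the NumPy generator; calling seed_tensorflow() does both.

All of this should be done at the top of the file. This is best practice, because Keras, Theano, and other libraries may already use some randomness while they are imported and initialized, before you use them directly.

The code is as follows: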

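import os
import random

import numpy as np
import tensorflow as tf

def seed_tensorflow(seed=42):
    os.environ['PYTHONHASHSEED'] = str(seed)  # takes effect only for processes started afterwards; see section 4
    random.seed(seed)                         # Python's built-in RNG
    np.random.seed(seed)                      # NumPy's global RNG
    tf.random.set_seed(seed)                  # TensorFlow's global seed
    os.environ['TF_DETERMINISTIC_OPS'] = '1'  # request deterministic op implementations

seed_tensorflow(42)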

After calling seed_tensorflow(), you also need to set shuffle=False and workers=1 in model.fit().
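On recent TensorFlow 2.x releases the same idea can be written more compactly. The sketch below assumes a version that provides tf.keras.utils.set_random_seed and tf.config.experimental.enable_op_determinism; older versions lack these calls. The function name seed_tensorflow_v2 is made up for this example.

```python
import tensorflow as tf


def seed_tensorflow_v2(seed=42):
    # Seeds Python's random module, NumPy, and TensorFlow in a single call.
    tf.keras.utils.set_random_seed(seed)
    # Ask TensorFlow to run ops deterministically, possibly at a speed cost.
    tf.config.experimental.enable_op_determinism()


seed_tensorflow_v2(42)
a = tf.random.uniform((3,)).numpy().tolist()

seed_tensorflow_v2(42)
b = tf.random.uniform((3,)).numpy().tolist()

print(a == b)
```

Reseeding resets TensorFlow's global random sequence, so the two draws are expected to match.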

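Other notes

Using Theano as the Keras backend: "switched to Theano as Keras backend with "conv.algo_bwd_data=deterministic" and "conv.algo_bwd_filter=deterministic" in config file. Looks reproducible. Was not able to reproduce with Tensorflow." Others report that results are still not reproducible, however. Also note that Theano defaults to the channels_first data format.
In older versions of TF (graph execution with a Session), random operations depend on two seeds: the global seed and the operation seed. tf.set_random_seed sets the global seed, while the seed passed as an argument to an operation is the operation seed. The random values depend on both the state of the graph and the seeds we set. In addition, the global seed is sensitive to changes made to the graph before an operation is created. Newer versions of TF seem to behave somewhat differently.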
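Install the tensorflow-determinism package and apply its patch:

pip install tensorflow-determinism

from tfdeterminism import patch
patch()

Alternatively, install tensorflow-determinism and simply set the environment variable (the value must be a string):

pip install tensorflow-determinism
os.environ['TF_DETERMINISTIC_OPS'] = '1'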

Do not import from tf.python.keras. Make sure all imports are consistent (i.e. don't mix from keras.layers import ... with from tensorflow.keras.optimizers import ...).

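4. Additional reminders

Setting PYTHONHASHSEED inside the code can be too late, because the interpreter reads it at startup. In that case, set it in the terminal when launching the script.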

PYTHONHASHSEED=0 python3 train.py
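The effect of setting PYTHONHASHSEED at launch can be checked with the standard library alone. In this sketch, string_hash_in_child is a hypothetical helper that starts a fresh interpreter with the variable set and returns the hash of a fixed string.

```python
import os
import subprocess
import sys


def string_hash_in_child(hash_seed):
    """Start a new Python process with PYTHONHASHSEED set and hash a string there."""
    env = dict(os.environ, PYTHONHASHSEED=str(hash_seed))
    result = subprocess.run(
        [sys.executable, "-c", "print(hash('reproducibility'))"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout)


# Two separate interpreters launched with the same value agree on the hash.
print(string_hash_in_child(0) == string_hash_in_child(0))  # True
```

Without the variable, string hashes are randomized per process by default, which can change the iteration order of sets of strings from run to run.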

Avoid parallelism. Because of floating point errors, the order of execution matters. For example, 1 + 1 + 1/3 is not perfectly equal to 1/3 + 1 + 1 (check it out in a Python shell). To make TensorFlow single-threaded, you can set the TF_NUM_INTEROP_THREADS and TF_NUM_INTRAOP_THREADS environment variables to 1 before starting Python (or at least before importing TensorFlow).
Read the official API documentation to understand where randomness comes from.
Good places to search include GitHub, StackOverflow, CrossValidated, Machine Learning Mastery, Quora, StackExchange, and Medium.
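The floating point example and the two environment variables mentioned above can be tried directly. This is a minimal sketch: it only sets the variables in the current process and does not import TensorFlow, on the assumption that the import happens later in the same script.

```python
import os

# Set these before TensorFlow is imported so that it starts single-threaded.
os.environ["TF_NUM_INTEROP_THREADS"] = "1"
os.environ["TF_NUM_INTRAOP_THREADS"] = "1"

# The same three numbers added in two different orders.
left = 1 + 1 + 1 / 3
right = 1 / 3 + 1 + 1

print(left == right)  # False: the rounding differs by one unit in the last place
print(left, right)
```

The difference is tiny, but over millions of accumulated operations during training it can grow into visibly different weights.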

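5. Making peace with randomness

Randomness is one of the defining characteristics of deep learning: it's a feature, not a bug!

If you view the operations of a neural network as mathematical operations, you expect everything to be deterministic. Convolution, activation, and cross-entropy computation should all be deterministic. Even pseudo-random operations such as shuffling, dropout, and noise should be fully determined by the seed.

However, if you view those operations as massively parallel computation (multi-threaded), randomness appears unless you are very careful.

When every thread works only on its own data, the result is deterministic; an activation, for example, is deterministic. Nondeterminism in parallel computation arises when several threads have to synchronize, as in a sum: the result then depends on the order of summation, or on which thread finishes first. In addition, "I think it has to do with rounding errors that get accumulated."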
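The dependence of a sum on the order in which partial results arrive can be simulated without any threads. In this sketch the list partials stands in for per-thread partial sums (the values are chosen to exaggerate rounding), and each permutation stands in for one possible order in which threads finish; reduce_in_order is an illustrative name.

```python
from itertools import permutations


def reduce_in_order(partials, order):
    """Accumulate the partial sums in the given arrival order."""
    total = 0.0
    for index in order:
        total += partials[index]
    return total


# Hypothetical partial sums from four threads.
partials = [1e16, 1.0, -1e16, 1.0]

# Every possible arrival order, and the set of distinct totals they produce.
results = {
    reduce_in_order(partials, order)
    for order in permutations(range(len(partials)))
}

print(sorted(results))  # more than one value, although the inputs never change
```

The mathematically exact sum is 2.0, yet some arrival orders lose one or both of the small terms to rounding. A GPU reduction whose order depends on thread scheduling can vary between runs in the same way.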

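Perhaps fixing the random seeds is not that important. Because of randomness, the loss you get from a training run is a value drawn from a confidence interval. Comparing such values to tune hyperparameters while ignoring the confidence intervals makes little sense, so in my view spending too much effort on eliminating nondeterminism, in this case and many others, is futile.

You can run the training several times and compute the mean to measure effectiveness.

Also, even if you cannot fix every random seed, fixing some of them still makes the training results more stable than fixing none; you can check this by computing the standard deviation (std) of the results.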
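Reporting a mean and a standard deviation over several seeds takes only a few lines. In the sketch below, train_once is a placeholder for a real training run: it returns a made-up score generated from the seed, so the numbers it produces mean nothing in themselves. Both function names are hypothetical.

```python
import random
import statistics


def train_once(seed):
    """Placeholder for a real training run; returns a fake validation score."""
    rng = random.Random(seed)
    return 0.90 + rng.gauss(0.0, 0.01)


def summarize(seeds):
    """Run one training per seed and report the mean and sample std of the scores."""
    scores = [train_once(seed) for seed in seeds]
    return statistics.mean(scores), statistics.stdev(scores)


mean, std = summarize(range(5))
print(f"score = {mean:.4f} +/- {std:.4f} over 5 seeds")
```

Comparing two hyperparameter settings then means comparing two such summaries, not two single runs.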

None of NVIDIA, TensorFlow, or PyTorch officially guarantees reproducibility.

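6. Summary

The most important thing is to call the seed function (seed_torch() or seed_tensorflow()) at the very beginning. For tf+keras, also set shuffle=False and workers=1 in model.fit(), and additionally use tensorflow-determinism (tfdm). Other measures, such as manually setting the seed of Conv2D, Conv3D, Dense, and Dropout layers, are mostly unnecessary.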

7. References

Original: https://blog.csdn.net/weixin_43987408/article/details/122492885
Author: rain-fallz
Title: Pytorch、TensorFlow、Keras如何固定随机种子 (How to Fix Random Seeds in PyTorch, TensorFlow, and Keras)
