CSpace
Asynchronous SGD with stale gradient dynamic adjustment for deep learning training
Tan, Tao1; Xie, Hong1; Xia, Yunni2; Shi, Xiaoyu3; Shang, Mingsheng3
2024-10-01
Abstract: Asynchronous stochastic gradient descent (ASGD) is a computationally efficient algorithm that speeds up deep learning training and plays an important role in distributed deep learning. However, ASGD suffers from the stale gradient problem, i.e., the gradient computed by a worker may mismatch the current weights on the parameter server. This problem seriously degrades model performance and can even cause divergence. To address this issue, this paper designs a dynamic adjustment scheme via the momentum algorithm that uses both a stale penalty and a stale compensation: the stale penalty reduces trust in stale gradients, while the stale compensation mitigates the harm they cause. Based on this dynamic adjustment scheme, this paper proposes a dynamic asynchronous stochastic gradient descent algorithm (DASGD), which dynamically adjusts the compensation factor and the penalty factor according to the staleness size. Moreover, we prove that DASGD converges under some mild assumptions. Finally, we build a real distributed training cluster to evaluate DASGD on the Cifar10 and ImageNet datasets. Compared with four SOTA baselines, the experimental results confirm the superior performance of DASGD. More specifically, DASGD achieves nearly the same test accuracy as SGD on Cifar10 and ImageNet, while using only around 27.6% and 40.8% of SGD's training time, respectively.
Keywords: ASGD; DASGD; Stale compensation; Stale penalty
DOI: 10.1016/j.ins.2024.121220
Journal: INFORMATION SCIENCES
ISSN: 0020-0255
Volume: 681; Pages: 16
Corresponding Author: Xie, Hong (xiehong2018@foxmail.com)
Indexed By: SCI
WOS ID: WOS:001302691400001
Language: English
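
The abstract describes a server-side momentum update in which each worker gradient is scaled by a staleness-dependent penalty factor and corrected by a staleness-dependent compensation term. The paper's exact factor formulas are not reproduced in this record, so the Python sketch below is only a hedged illustration: the penalty uses an assumed 1/(1 + staleness) form, and the compensation uses a first-order correction in the spirit of delay-compensated ASGD rather than the paper's own rule; the function and parameter names (staleness_adjusted_update, lam) are hypothetical.

import numpy as np

def staleness_adjusted_update(w, v, grad, w_snapshot, staleness,
                              lr=0.01, beta=0.9, lam=0.04):
    """One illustrative server-side momentum-SGD step with a stale-gradient
    penalty and a compensation term. Placeholder formulas only; not the
    paper's exact DASGD factor schedules.

    w          : current parameter-server weights
    v          : momentum buffer
    grad       : gradient reported by a worker (possibly stale)
    w_snapshot : weights the worker used to compute `grad`
    staleness  : number of server updates since the worker pulled w_snapshot
    """
    # Stale penalty: trust the gradient less as staleness grows (assumed form).
    penalty = 1.0 / (1.0 + staleness)

    # Stale compensation: first-order correction toward the current weights,
    # in the style of delay-compensated ASGD (assumed form, not the paper's).
    compensation = lam * grad * grad * (w - w_snapshot)

    adjusted_grad = penalty * (grad + compensation)

    # Standard momentum update applied with the adjusted gradient.
    v = beta * v + adjusted_grad
    w = w - lr * v
    return w, v

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.normal(size=10)
    v = np.zeros_like(w)
    w_snapshot = w.copy()           # weights the worker started from
    grad = rng.normal(size=10)      # stand-in for a worker's stale gradient
    w, v = staleness_adjusted_update(w, v, grad, w_snapshot, staleness=3)
    print(w[:3])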