CSpace
Asynchronous SGD with stale gradient dynamic adjustment for deep learning training
Tan, Tao1; Xie, Hong1; Xia, Yunni2; Shi, Xiaoyu3; Shang, Mingsheng3
2024-10-01
Abstract: Asynchronous stochastic gradient descent (ASGD) is a computationally efficient algorithm that speeds up deep learning training and plays an important role in distributed deep learning. However, ASGD suffers from the stale gradient problem, i.e., the gradient computed by a worker may mismatch the current weights on the parameter server. This problem seriously degrades model performance and can even cause divergence. To address this issue, this paper designs a dynamic adjustment scheme via the momentum algorithm that uses both a stale penalty and a stale compensation: the stale penalty reduces trust in stale gradients, while the stale compensation mitigates the harm they cause. Based on this dynamic adjustment scheme, this paper proposes a dynamic asynchronous stochastic gradient descent algorithm (DASGD), which dynamically adjusts the compensation factor and the penalty factor according to the staleness size. Moreover, we prove that DASGD converges under some mild assumptions. Finally, we build a real distributed training cluster to evaluate DASGD on the Cifar10 and ImageNet datasets. Compared with four SOTA baselines, the experimental results confirm the superior performance of DASGD. More specifically, DASGD achieves nearly the same test accuracy as SGD on Cifar10 and ImageNet, while using only around 27.6% and 40.8% of SGD's training time, respectively.
Keywords: ASGD; DASGD; Stale compensation; Stale penalty
DOI: 10.1016/j.ins.2024.121220
Journal: INFORMATION SCIENCES
ISSN: 0020-0255
Volume: 681; Pages: 16
Corresponding Author: Xie, Hong (xiehong2018@foxmail.com)
Indexed By: SCI
WOS ID: WOS:001302691400001
Language: English
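
The abstract describes a server-side momentum update in which each worker gradient is scaled by a staleness-dependent penalty factor and corrected by a staleness-dependent compensation term. The paper's exact factor formulas are not reproduced in this record, so the Python sketch below is only a hedged illustration: the penalty uses an assumed 1/(1 + staleness) form, and the compensation uses a first-order correction in the spirit of delay-compensated ASGD rather than the paper's own rule; the function and parameter names (staleness_adjusted_update, lam) are hypothetical.

import numpy as np

def staleness_adjusted_update(w, v, grad, w_snapshot, staleness,
                              lr=0.01, beta=0.9, lam=0.04):
    """One illustrative server-side momentum-SGD step with a stale-gradient
    penalty and a compensation term. Placeholder formulas only; not the
    paper's exact DASGD factor schedules.

    w          : current parameter-server weights
    v          : momentum buffer
    grad       : gradient reported by a worker (possibly stale)
    w_snapshot : weights the worker used to compute `grad`
    staleness  : number of server updates since the worker pulled w_snapshot
    """
    # Stale penalty: trust the gradient less as staleness grows (assumed form).
    penalty = 1.0 / (1.0 + staleness)

    # Stale compensation: first-order correction toward the current weights,
    # in the style of delay-compensated ASGD (assumed form, not the paper's).
    compensation = lam * grad * grad * (w - w_snapshot)

    adjusted_grad = penalty * (grad + compensation)

    # Standard momentum update applied with the adjusted gradient.
    v = beta * v + adjusted_grad
    w = w - lr * v
    return w, v

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.normal(size=10)
    v = np.zeros_like(w)
    w_snapshot = w.copy()           # weights the worker started from
    grad = rng.normal(size=10)      # stand-in for a worker's stale gradient
    w, v = staleness_adjusted_update(w, v, grad, w_snapshot, staleness=3)
    print(w[:3])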