Asynchronous SGD with stale gradient dynamic adjustment for deep learning training
Tan, Tao1; Xie, Hong1; Xia, Yunni2; Shi, Xiaoyu3; Shang, Mingsheng3
2024-10-01
Abstract | Asynchronous stochastic gradient descent (ASGD) is a computationally efficient algorithm that speeds up deep learning training and plays an important role in distributed deep learning. However, ASGD suffers from the stale gradient problem, i.e., the gradient computed by a worker may mismatch the weights held by the parameter server. This problem seriously degrades model performance and may even cause divergence. To address this issue, this paper designs a dynamic adjustment scheme based on the momentum algorithm that uses both a stale penalty and a stale compensation: the stale penalty reduces the trust placed in a stale gradient, while the stale compensation offsets the harm the stale gradient causes. Based on this dynamic adjustment scheme, this paper proposes a dynamic asynchronous stochastic gradient descent algorithm (DASGD), which dynamically adjusts the compensation factor and the penalty factor according to the staleness. Moreover, we prove that DASGD converges under some mild assumptions. Finally, we build a real distributed training cluster to evaluate DASGD on the Cifar10 and ImageNet datasets. Compared with four SOTA baselines, the experimental results confirm the superior performance of DASGD. More specifically, DASGD achieves nearly the same test accuracy as SGD on Cifar10 and ImageNet while using only around 27.6% and 40.8% of SGD's training time, respectively. (An illustrative sketch of the staleness-adjusted update appears after this record.)
Keywords | ASGD; DASGD; Stale compensation; Stale penalty
DOI | 10.1016/j.ins.2024.121220 |
Journal | INFORMATION SCIENCES
ISSN | 0020-0255 |
Volume | 681
Pages | 16
Corresponding Author | Xie, Hong (xiehong2018@foxmail.com)
Indexed By | SCI
WOS Record Number | WOS:001302691400001
Language | English
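
The abstract above describes DASGD only at a high level, so the following Python sketch is a minimal guess at the shape of a staleness-aware parameter-server update: a penalty factor that shrinks with staleness and a first-order compensation term, both folded into a momentum update. The function name dasgd_server_update, the factor schedules, and the compensation form are assumptions for illustration only; the paper's actual update rules are given in the full text, not reproduced here.

import numpy as np

def dasgd_server_update(w, v, grad, w_snapshot, staleness,
                        lr=0.01, momentum=0.9, lam=0.5):
    """Illustrative staleness-aware update (assumed form, not the paper's exact rule).

    w          -- current parameter-server weights (np.ndarray)
    v          -- momentum buffer (np.ndarray)
    grad       -- gradient a worker computed on the stale snapshot
    w_snapshot -- the (possibly stale) weights that worker used
    staleness  -- number of server updates applied since the snapshot was sent
    """
    # Stale penalty: trust a gradient less the staler it is (assumed schedule).
    penalty = 1.0 / (1.0 + staleness)
    # Stale compensation: a first-order, element-wise correction toward the
    # current weights, in the spirit of delay-compensated ASGD (assumed form).
    compensated = grad + lam * grad * grad * (w - w_snapshot)
    # Fold the penalized, compensated gradient into a momentum update.
    v = momentum * v + penalty * compensated
    w = w - lr * v
    return w, v

# Toy usage: one update on random vectors with a staleness of 3.
rng = np.random.default_rng(0)
w = rng.standard_normal(10)
v = np.zeros(10)
w_snapshot = w + 0.01 * rng.standard_normal(10)   # weights the worker saw
grad = rng.standard_normal(10)                     # worker's stale gradient
w, v = dasgd_server_update(w, v, grad, w_snapshot, staleness=3)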