#5851. Improving scalability of parallel CNN training by adaptively adjusting parameter update frequency

Publication date: July 2026
Proposal available until 17-05-2025
Total number of authors per manuscript: 4

The title of the journal is available only to authors who have already paid.
Journal’s subject area:
Theoretical Computer Science;
Computer Networks and Communications;
Hardware and Architecture;
Software;
Artificial Intelligence;
Places in the authors’ list:
Place 1 - free (for sale): 2350 $ (Contract5851.1)
Place 2 - free (for sale): 1200 $ (Contract5851.2)
Place 3 - free (for sale): 1050 $ (Contract5851.3)
Place 4 - free (for sale): 900 $ (Contract5851.4)

Abstract:
Synchronous SGD with data parallelism, the most popular parallelization strategy for CNN training, suffers from the expensive communication cost of averaging gradients among all workers. The iterative parameter updates of SGD cause frequent communication, which becomes the performance bottleneck. In this paper, we propose a lazy parameter update algorithm that adaptively adjusts the parameter update frequency to address this communication cost. The algorithm keeps accumulating gradients locally as long as the difference between the accumulated gradients and the latest gradients is sufficiently small. The resulting less frequent parameter updates reduce the per-iteration communication cost while maintaining the model accuracy.
Keywords:
Communication cost; Data parallelism; Deep learning; Parameter update frequency
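
The abstract only sketches the idea at a high level, so the following is a minimal illustrative sketch of a lazy parameter update step, not the paper's actual algorithm. It assumes a PyTorch data-parallel setup with torch.distributed already initialized; the function name lazy_update_step, the accumulator accum_grads, the relative-difference criterion, and the threshold tau are all assumptions introduced here for illustration, and the paper's precise skipping rule and gradient scaling may differ.

```python
# Illustrative sketch only: accumulate gradients locally and skip the
# all-reduce while the latest gradient stays close to the accumulation.
# Assumes torch.distributed has been initialized (e.g. via init_process_group).
import torch
import torch.distributed as dist


def lazy_update_step(model, optimizer, accum_grads, tau=0.1):
    """Run one lazy update step.

    accum_grads: list of tensors holding locally accumulated gradients
                 (pass an empty list on the first call).
    tau:         hypothetical relative-difference threshold; the paper's
                 actual criterion may be defined differently.
    Returns True if a communication + parameter update happened.
    """
    params = [p for p in model.parameters() if p.grad is not None]
    grads = [p.grad.detach().clone() for p in params]

    if accum_grads:
        # Relative difference between accumulated and latest gradients.
        diff = sum(torch.sum((a - g) ** 2) for a, g in zip(accum_grads, grads))
        norm = sum(torch.sum(a ** 2) for a in accum_grads) + 1e-12
        small = torch.sqrt(diff / norm).item() < tau
        for a, g in zip(accum_grads, grads):
            a.add_(g)  # keep accumulating locally
    else:
        accum_grads.extend(grads)  # first call: nothing to compare against yet
        small = True

    if small:
        # Difference is small: skip communication this iteration.
        optimizer.zero_grad()
        return False

    # Difference grew too large: average the accumulated gradients across
    # workers, apply one (less frequent) parameter update, reset accumulation.
    # (A real implementation would also rescale by the number of accumulated
    # iterations; that detail is omitted here.)
    for p, a in zip(params, accum_grads):
        dist.all_reduce(a, op=dist.ReduceOp.SUM)
        p.grad.copy_(a / dist.get_world_size())
    optimizer.step()
    optimizer.zero_grad()
    accum_grads.clear()
    return True
```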

Contacts: