CuWide Towards Efficient Flow-based Training for Sparse Wide Models on GPUs
Abstract: Recommendation, CTR prediction, and image recognition use wide models like generalized linear models and factorization-based models. The memory-bounded models limit CPU performance improvement. GPUs have many computation units and high memory bandwidth, making them…