Python Machine Learning Projects

Abstract:

Wide models, such as generalized linear models and factorization-based models, are widely used in recommendation, click-through-rate (CTR) prediction, and image recognition. Because training these models is memory-bound, CPUs offer limited room for performance improvement. GPUs, with their many computation units and high memory bandwidth, are in principle well suited to training machine learning models; however, due to the sparsity and irregularity of wide-model workloads, GPU training of wide models is suboptimal, and GPU-based wide models can even be slower than their CPU counterparts. The root cause is that wide-model training is not optimized for the GPU architecture: it incurs many random memory accesses and redundant reads and writes of intermediate values. We present cuWide, an efficient GPU training framework for large-scale wide models. cuWide adopts a new flow-based training schema that fully exploits the GPU memory hierarchy and reduces communication with GPU global memory. To realize the flow-based schema efficiently, we propose a bigraph computation model with three flexible programming interfaces. Building on this graph abstraction, we apply a 2D partition of each mini-batch (in the sample and feature dimensions) to optimize GPU memory access for sparse data, and we employ several spatial-temporal caching mechanisms (importance-based model caching and cross-stage accumulation caching) to achieve high-performance kernels. To implement cuWide efficiently, we further propose GPU-oriented optimizations: a feature-oriented data layout to improve data locality, a replication mechanism to reduce update conflicts in shared memory, and multi-stream scheduling to overlap data transfer with kernel computation. We demonstrate that cuWide is more than 20x faster than current GPU and multi-core CPU solutions.
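To illustrate the 2D partition idea mentioned in the abstract, the sketch below groups the nonzeros of a sparse mini-batch into tiles along both the sample and feature dimensions. This is a minimal CPU-side illustration in plain Python/NumPy, not cuWide's actual GPU implementation; the function name, tile sizes, and toy data are assumptions for the example.

```python
import numpy as np

def partition_2d(rows, cols, vals, tile_r, tile_c):
    """Group the nonzeros of a sparse mini-batch (COO triples) into 2D tiles.

    Each tile covers tile_r samples x tile_c features, mirroring the
    sample/feature 2D partition described in the abstract. On a GPU, each
    tile could then be processed by one thread block so that the features
    it touches fit in shared memory (illustrative only).
    """
    tiles = {}
    for r, c, v in zip(rows, cols, vals):
        key = (int(r) // tile_r, int(c) // tile_c)  # (sample block, feature block)
        tiles.setdefault(key, []).append((int(r), int(c), float(v)))
    return tiles

# Toy sparse mini-batch: 4 samples x 6 features, 5 nonzeros.
rows = np.array([0, 1, 2, 3, 3])
cols = np.array([0, 5, 2, 1, 4])
vals = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

tiles = partition_2d(rows, cols, vals, tile_r=2, tile_c=3)
```

With 2x3 tiles, samples {2, 3} with features {1, 2} land in the same tile, so their updates to those features can be accumulated locally before touching global memory.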

