Python Machine Learning Projects

Abstract:

Wide models, such as generalized linear models and factorization-based models, are widely used in recommendation, click-through-rate (CTR) prediction, and image recognition. Because training these models is memory-bound, CPUs offer limited room for performance improvement. GPUs, with their many computation units and high memory bandwidth, are in principle well suited to training machine learning models; however, due to the sparsity and irregularity of wide-model workloads, GPU training of wide models is suboptimal, and GPU-based wide models can even be slower than their CPU counterparts. The root cause is that wide-model training is not optimized for the GPU architecture: it incurs many random memory accesses and redundant reads and writes of intermediate values. We present cuWide, an efficient GPU training framework for large-scale wide models. cuWide adopts a new flow-based training schema that fully exploits the GPU memory hierarchy and reduces communication with GPU global memory. To realize the flow-based schema efficiently, we propose a bigraph computation model with three flexible programming interfaces. Building on this graph abstraction, we apply a 2D partition of each mini-batch (in the sample and feature dimensions) to optimize GPU memory access for sparse data, and we employ several spatial-temporal caching mechanisms (importance-based model caching and cross-stage accumulation caching) to achieve high-performance kernels. To implement cuWide efficiently, we further propose GPU-oriented optimizations: a feature-oriented data layout to improve data locality, a replication mechanism to reduce update conflicts in shared memory, and multi-stream scheduling to overlap data transfer with kernel computation. We demonstrate that cuWide is more than 20x faster than current GPU and multi-core CPU solutions.
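To illustrate the 2D partition idea mentioned in the abstract, the sketch below groups the nonzeros of a sparse mini-batch into tiles along both the sample and feature dimensions. This is a minimal CPU-side illustration in plain Python/NumPy, not cuWide's actual GPU implementation; the function name, tile sizes, and toy data are assumptions for the example.

```python
import numpy as np

def partition_2d(rows, cols, vals, tile_r, tile_c):
    """Group the nonzeros of a sparse mini-batch (COO triples) into 2D tiles.

    Each tile covers tile_r samples x tile_c features, mirroring the
    sample/feature 2D partition described in the abstract. On a GPU, each
    tile could then be processed by one thread block so that the features
    it touches fit in shared memory (illustrative only).
    """
    tiles = {}
    for r, c, v in zip(rows, cols, vals):
        key = (int(r) // tile_r, int(c) // tile_c)  # (sample block, feature block)
        tiles.setdefault(key, []).append((int(r), int(c), float(v)))
    return tiles

# Toy sparse mini-batch: 4 samples x 6 features, 5 nonzeros.
rows = np.array([0, 1, 2, 3, 3])
cols = np.array([0, 5, 2, 1, 4])
vals = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

tiles = partition_2d(rows, cols, vals, tile_r=2, tile_c=3)
```

With 2x3 tiles, samples {2, 3} with features {1, 2} land in the same tile, so their updates to those features can be accumulated locally before touching global memory.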

