Cloud Computing Projects

Abstract:

In-memory computing frameworks such as Spark, Tez, and Piccolo rely heavily on memory caches. Because these caches are small yet have a large impact on performance, data analytics clusters must manage them efficiently. However, most data analytics systems use LRU or LFU cache-management policies, which ignore the application semantics of data dependency expressed as directed acyclic graphs (DAGs).

Without this knowledge, cache management can only "guess" future data access patterns from history, resulting in inefficient caching with low hit rates and long response times. Worse, without data-dependency knowledge, cluster applications cannot preserve their all-or-nothing cache property: a compute task is not sped up unless all of its dependent input data resides in main memory.

Least Reference Count (LRC) is a new cache-replacement policy that exploits application data-dependency information to optimize cache management. LRC always evicts the data block with the lowest reference count, defined as the number of dependent child blocks that have not yet been computed.
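To make the eviction rule concrete, here is a minimal sketch in Python. This is not the authors' actual Spark implementation; the class name, the `children` map, and all method names are illustrative assumptions.

```python
class LRCCache:
    """Minimal sketch of Least Reference Count (LRC) eviction.

    ref_count[b] = number of child blocks depending on b that have
    not yet been computed, derived from the application DAG.
    """

    def __init__(self, capacity):
        self.capacity = capacity   # max number of cached blocks
        self.cache = {}            # block id -> data
        self.ref_count = {}        # block id -> remaining references

    def register_dag(self, children):
        # children: block id -> list of dependent (child) block ids
        for block, kids in children.items():
            self.ref_count[block] = len(kids)

    def task_consumed(self, block):
        # A child of `block` finished computing: one fewer future use.
        self.ref_count[block] = max(0, self.ref_count.get(block, 0) - 1)

    def put(self, block, data):
        if len(self.cache) >= self.capacity:
            # Evict the cached block with the lowest reference count.
            victim = min(self.cache, key=lambda b: self.ref_count.get(b, 0))
            del self.cache[victim]
        self.cache[block] = data


# Hypothetical usage: "b" feeds only one pending child, so it is
# evicted first even if it was accessed more recently than "a".
cache = LRCCache(capacity=2)
cache.register_dag({"a": ["x", "y"], "b": ["x"], "c": ["x", "y", "z"]})
cache.put("a", "data-a")
cache.put("b", "data-b")
cache.put("c", "data-c")   # evicts "b" (reference count 1)
```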

LRC also accommodates the all-or-nothing requirement by coordinating the reference counts of all input data blocks belonging to the same computation. Empirical analysis and cluster deployments on popular benchmark workloads demonstrate LRC's efficacy.
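One plausible coordination scheme, extending the sketch above, is shown below. It assumes that once any input of a task is evicted, caching the task's remaining inputs no longer speeds that task up, so their reference counts are demoted together; the paper's exact mechanism may differ, and the `task_inputs` map is a hypothetical name.

```python
class AllOrNothingLRCCache(LRCCache):
    """Sketch of LRC with coordinated (all-or-nothing) reference counts."""

    def __init__(self, capacity, task_inputs):
        super().__init__(capacity)
        self.task_inputs = task_inputs   # task id -> list of input block ids

    def put(self, block, data):
        if len(self.cache) >= self.capacity:
            victim = min(self.cache, key=lambda b: self.ref_count.get(b, 0))
            del self.cache[victim]
            self._demote_peers(victim)
        self.cache[block] = data

    def _demote_peers(self, victim):
        # Once one input of a task is gone, caching its sibling inputs
        # no longer helps that task, so deprioritize them as well.
        for inputs in self.task_inputs.values():
            if victim in inputs:
                for peer in inputs:
                    if peer != victim and peer in self.cache:
                        self.ref_count[peer] = max(0, self.ref_count[peer] - 1)
```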

Our Spark implementation shows that the proposed policies meet the all-or-nothing requirement and significantly improve cache performance. For typical production-cluster workloads, LRC outperforms LRU and MEMTUNE by 22 percent and 284 percent, respectively.

