Cloud Computing Projects

Abstract:

In-memory computing frameworks such as Spark, Tez, and Piccolo rely heavily on memory caches. Because these caches are small yet have a large impact on performance, data analytics clusters must manage them efficiently. However, most data analytics systems use LRU or LFU cache-management policies, which ignore the application semantics of data dependency expressed as directed acyclic graphs (DAGs).

Without this knowledge, cache management can only "guess" future data access patterns from history, resulting in inefficient caching with low hit rates and long response times. Worse, without data-dependency knowledge, cluster applications cannot preserve their all-or-nothing cache property: a compute task is not sped up unless all of its dependent input data resides in main memory.

Least Reference Count (LRC) is a new cache-replacement policy that exploits application data-dependency information to optimize cache management. LRC always evicts the data block with the lowest reference count, defined as the number of dependent child blocks that have not yet been computed.
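To make the eviction rule concrete, here is a minimal sketch in Python. This is not the authors' actual Spark implementation; the class name, the `children` map, and all method names are illustrative assumptions.

```python
class LRCCache:
    """Minimal sketch of Least Reference Count (LRC) eviction.

    ref_count[b] = number of child blocks depending on b that have
    not yet been computed, derived from the application DAG.
    """

    def __init__(self, capacity):
        self.capacity = capacity   # max number of cached blocks
        self.cache = {}            # block id -> data
        self.ref_count = {}        # block id -> remaining references

    def register_dag(self, children):
        # children: block id -> list of dependent (child) block ids
        for block, kids in children.items():
            self.ref_count[block] = len(kids)

    def task_consumed(self, block):
        # A child of `block` finished computing: one fewer future use.
        self.ref_count[block] = max(0, self.ref_count.get(block, 0) - 1)

    def put(self, block, data):
        if len(self.cache) >= self.capacity:
            # Evict the cached block with the lowest reference count.
            victim = min(self.cache, key=lambda b: self.ref_count.get(b, 0))
            del self.cache[victim]
        self.cache[block] = data


# Hypothetical usage: "b" feeds only one pending child, so it is
# evicted first even if it was accessed more recently than "a".
cache = LRCCache(capacity=2)
cache.register_dag({"a": ["x", "y"], "b": ["x"], "c": ["x", "y", "z"]})
cache.put("a", "data-a")
cache.put("b", "data-b")
cache.put("c", "data-c")   # evicts "b" (reference count 1)
```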

LRC also accommodates the all-or-nothing requirement by coordinating the reference counts of all input data blocks belonging to the same computation. Empirical analysis and cluster deployments on popular benchmark workloads demonstrate LRC's efficacy.
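One plausible coordination scheme, extending the sketch above, is shown below. It assumes that once any input of a task is evicted, caching the task's remaining inputs no longer speeds that task up, so their reference counts are demoted together; the paper's exact mechanism may differ, and the `task_inputs` map is a hypothetical name.

```python
class AllOrNothingLRCCache(LRCCache):
    """Sketch of LRC with coordinated (all-or-nothing) reference counts."""

    def __init__(self, capacity, task_inputs):
        super().__init__(capacity)
        self.task_inputs = task_inputs   # task id -> list of input block ids

    def put(self, block, data):
        if len(self.cache) >= self.capacity:
            victim = min(self.cache, key=lambda b: self.ref_count.get(b, 0))
            del self.cache[victim]
            self._demote_peers(victim)
        self.cache[block] = data

    def _demote_peers(self, victim):
        # Once one input of a task is gone, caching its sibling inputs
        # no longer helps that task, so deprioritize them as well.
        for inputs in self.task_inputs.values():
            if victim in inputs:
                for peer in inputs:
                    if peer != victim and peer in self.cache:
                        self.ref_count[peer] = max(0, self.ref_count[peer] - 1)
```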

Our Spark implementation shows that the proposed policies meet the all-or-nothing requirement and significantly improve cache performance. For typical production-cluster workloads, LRC outperforms LRU and MEMTUNE by 22 percent and 284 percent, respectively.

