Abstract:
Slow (straggler) tasks in a Spark computing environment extend stage execution time. Spark mitigates the straggler problem with speculative execution, which launches a backup copy of a straggler task so that the stage can finish earlier. However, Spark's original speculative execution strategy and its existing improvements handle this problem poorly because of the diversity of task characteristics and the complexity of the runtime environment.
This paper introduces ETWR, a strategy that improves the efficiency of Spark's speculative execution. It addresses straggler identification, backup node selection, and effectiveness guarantees in speculative execution, taking the heterogeneity of the environment into account.
First, we divide each task into sub-phases according to its task type and use the processing speed and progress rate within a phase to identify stragglers quickly.
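The within-phase detection step can be illustrated with a minimal sketch. The function name, the task representation, and the `slow_ratio` threshold below are illustrative assumptions, not values from the paper; the idea is simply that a task whose progress rate falls well below the median of its peers in the same phase is flagged as a straggler.

```python
from statistics import median

def find_stragglers(tasks, slow_ratio=0.5):
    """Flag tasks whose within-phase progress rate is well below the
    median rate of their peers. `slow_ratio` is an assumed tunable
    threshold. `tasks` maps task_id -> (progress_fraction, elapsed_s)."""
    rates = {tid: p / t for tid, (p, t) in tasks.items() if t > 0}
    med = median(rates.values())
    return sorted(tid for tid, r in rates.items() if r < slow_ratio * med)

# Example: task "t3" progresses far more slowly than its peers.
tasks = {"t1": (0.8, 10), "t2": (0.9, 10), "t3": (0.2, 10), "t4": (0.85, 10)}
print(find_stragglers(tasks))  # ['t3']
```

Comparing rates within a single phase (rather than whole-task progress) is what lets slow tasks be detected early, before they have fallen far behind overall.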
Second, we estimate a task's execution time with a Locally Weighted Regression model in order to compute both its remaining time and the time a backup copy would need.
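The time-estimation step can be sketched as follows. For simplicity this uses a degree-0, kernel-weighted form of locally weighted regression (a weighted mean of nearby historical observations); the bandwidth `tau`, the choice of input size as the feature, and the function names are assumptions for illustration, and the paper's exact model may differ.

```python
import math

def lwr_predict(history, x, tau=0.5):
    """Kernel-weighted (degree-0) locally weighted regression sketch:
    predict execution time for input size `x` from (size, time) pairs,
    weighting nearby observations with a Gaussian kernel of width tau."""
    weights = [math.exp(-((xi - x) ** 2) / (2 * tau ** 2)) for xi, _ in history]
    return sum(w * yi for w, (_, yi) in zip(weights, history)) / sum(weights)

def remaining_and_backup_time(history, input_size, elapsed):
    """Estimate (time left on the current node, time a fresh backup needs)."""
    total = lwr_predict(history, input_size)
    return max(total - elapsed, 0.0), total

# Historical (input_size, execution_time) observations from completed tasks.
history = [(1.0, 10.0), (2.0, 19.0), (3.0, 31.0)]
remaining, backup = remaining_and_backup_time(history, 2.0, 5.0)
```

A speculative copy is only worth launching when the estimated backup time is meaningfully smaller than the straggler's estimated remaining time, which is exactly the comparison these two estimates enable.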
Third, we present the iMCP model to guarantee the effectiveness of speculative tasks and to balance load across nodes.
Finally, backup nodes are chosen based on node speed and data locality. Extensive experiments show that, compared with Spark-2.2.0, ETWR reduces job execution time by 23.8 percent and improves cluster throughput by 33.2 percent.