Abstract:
Slow (straggler) tasks in a Spark computing environment extend stage execution time. Spark mitigates the straggler problem with speculative execution, which launches a backup copy of a straggler task so that the stage can finish earlier. However, Spark's original speculative execution strategy and its existing improvements handle this problem poorly because of the diversity of task characteristics and the complexity of the runtime environment.
This paper introduces ETWR, a strategy that improves the efficiency of Spark's speculative execution. It addresses straggler identification, backup node selection, and effectiveness guarantees in speculative execution, taking the heterogeneity of the environment into account.
First, we divide each task into sub-phases according to its task type and use the processing speed and progress rate within a phase to identify stragglers quickly.
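The within-phase detection step can be illustrated with a minimal sketch. The function name, the task representation, and the `slow_ratio` threshold below are illustrative assumptions, not values from the paper; the idea is simply that a task whose progress rate falls well below the median of its peers in the same phase is flagged as a straggler.

```python
from statistics import median

def find_stragglers(tasks, slow_ratio=0.5):
    """Flag tasks whose within-phase progress rate is well below the
    median rate of their peers. `slow_ratio` is an assumed tunable
    threshold. `tasks` maps task_id -> (progress_fraction, elapsed_s)."""
    rates = {tid: p / t for tid, (p, t) in tasks.items() if t > 0}
    med = median(rates.values())
    return sorted(tid for tid, r in rates.items() if r < slow_ratio * med)

# Example: task "t3" progresses far more slowly than its peers.
tasks = {"t1": (0.8, 10), "t2": (0.9, 10), "t3": (0.2, 10), "t4": (0.85, 10)}
print(find_stragglers(tasks))  # ['t3']
```

Comparing rates within a single phase (rather than whole-task progress) is what lets slow tasks be detected early, before they have fallen far behind overall.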
Second, we estimate a task's execution time with a Locally Weighted Regression model in order to compute both its remaining time and the time a backup copy would need.
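The time-estimation step can be sketched as follows. For simplicity this uses a degree-0, kernel-weighted form of locally weighted regression (a weighted mean of nearby historical observations); the bandwidth `tau`, the choice of input size as the feature, and the function names are assumptions for illustration, and the paper's exact model may differ.

```python
import math

def lwr_predict(history, x, tau=0.5):
    """Kernel-weighted (degree-0) locally weighted regression sketch:
    predict execution time for input size `x` from (size, time) pairs,
    weighting nearby observations with a Gaussian kernel of width tau."""
    weights = [math.exp(-((xi - x) ** 2) / (2 * tau ** 2)) for xi, _ in history]
    return sum(w * yi for w, (_, yi) in zip(weights, history)) / sum(weights)

def remaining_and_backup_time(history, input_size, elapsed):
    """Estimate (time left on the current node, time a fresh backup needs)."""
    total = lwr_predict(history, input_size)
    return max(total - elapsed, 0.0), total

# Historical (input_size, execution_time) observations from completed tasks.
history = [(1.0, 10.0), (2.0, 19.0), (3.0, 31.0)]
remaining, backup = remaining_and_backup_time(history, 2.0, 5.0)
```

A speculative copy is only worth launching when the estimated backup time is meaningfully smaller than the straggler's estimated remaining time, which is exactly the comparison these two estimates enable.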
Third, we present the iMCP model to guarantee the effectiveness of speculative tasks and to balance load across nodes.
Finally, backup nodes are chosen based on node speed and data locality. Extensive experiments show that, compared with Spark-2.2.0, ETWR reduces job execution time by 23.8 percent and improves cluster throughput by 33.2 percent.