Abstract:
Data uncertainty is a major issue in database systems because many applications produce inaccurate or imprecise data. Probabilistic databases store such uncertain data and answer queries with associated confidence values. However, when uncertainty propagates through a system, query and mining results may become unreliable.
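This paper uses crowdsourcing to improve the quality of uncertain data by designing a set of Human Intelligence Tasks (HITs) for crowd workers to answer. In particular, we consider crowds whose workers answer HITs with different accuracy rates. The goal is to maximize the improvement in data quality while issuing as few HITs as possible.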
Two obstacles make selecting the best HITs computationally expensive. First, workers may answer incorrectly, and the probability of an incorrect answer differs from worker to worker. Second, the HITs derived by decomposing uncertain data are often correlated with one another.
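To illustrate the first obstacle, the following minimal sketch shows how a single yes/no answer changes the probability that an uncertain tuple is correct, depending on the worker's accuracy. It is an illustration only, not the paper's algorithm: the function name `posterior_after_answer` and the simple Bayesian model (one binary HIT per tuple, a known accuracy strictly between 0 and 1) are assumptions made for this example.

```python
def posterior_after_answer(prior: float, accuracy: float, says_true: bool) -> float:
    """Bayes update of P(tuple is correct) after one worker's yes/no answer.

    Assumed model (hypothetical, for illustration): the worker reports the
    true state with probability `accuracy`, independently of everything else.
    """
    # Likelihood of the observed answer if the tuple is actually correct.
    like_if_correct = accuracy if says_true else 1.0 - accuracy
    # Likelihood of the observed answer if the tuple is actually incorrect.
    like_if_incorrect = 1.0 - accuracy if says_true else accuracy
    evidence = prior * like_if_correct + (1.0 - prior) * like_if_incorrect
    return prior * like_if_correct / evidence


if __name__ == "__main__":
    prior = 0.6  # initial probability that the tuple is correct
    for accuracy in (0.5, 0.6, 0.9):
        updated = posterior_after_answer(prior, accuracy, says_true=True)
        print(f"accuracy={accuracy:.1f} -> posterior={updated:.3f}")
```

Under this assumed model, a worker with accuracy 0.5 leaves the probability unchanged, while a more accurate worker moves it further from the prior. This is why the same HIT can be worth more or less depending on who answers it.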
This paper presents an efficient approximation algorithm and a heuristic solution for crowds with diverse accuracy rates. To improve efficiency, we derive tight lower and upper bounds that are used for filtering and estimation. Our solutions are evaluated on both simulated and real crowdsourcing platforms.