Python Machine Learning Projects

Abstract:

Computing the similarity (or distance) between data items is a fundamental problem that underpins many machine learning and data mining applications. The “3V” nature of big data (volume, velocity, and variety) makes exact similarity computation impractical in large-scale real-world scenarios. Hashing techniques have proven, both in theory and in practice, to estimate similarity efficiently. Weighted MinHash is a popular method for estimating the generalized Jaccard similarity of weighted sets, which includes binary sets as a special case. In this review, we classify weighted MinHash algorithms into quantization-based approaches, “active index”-based approaches, and others, and we trace how they evolved, and how they relate to one another, as the supported weights moved from integers to real values. We have released a Python toolbox of the algorithms on GitHub. We evaluate the standard and weighted MinHash algorithms on similarity estimation error and on information retrieval.
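
To make the quantity being estimated concrete, here is a minimal sketch of the generalized Jaccard similarity together with a simple quantization-style weighted MinHash for positive integer weights. It is an illustration only and is not taken from the toolbox mentioned above: the function names, the use of `hashlib.blake2b` as the hash family, and the example data are all assumptions. Each element with weight w is expanded into w distinct copies, and ordinary MinHash is applied to the expanded set.

```python
import hashlib


def generalized_jaccard(s, t):
    """Exact generalized Jaccard: sum of element-wise minima over sum of maxima."""
    keys = set(s) | set(t)
    num = sum(min(s.get(k, 0), t.get(k, 0)) for k in keys)
    den = sum(max(s.get(k, 0), t.get(k, 0)) for k in keys)
    return num / den if den else 0.0


def _hash(seed, key, copy):
    # Hypothetical hash family: one deterministic 64-bit hash per seed.
    data = f"{seed}|{key}|{copy}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def quantized_minhash(weights, num_hashes=256):
    """Signature for a non-empty weighted set with positive integer weights.

    An element with weight w is treated as w distinct copies; for each hash
    function we keep the (key, copy) pair with the smallest hash value.
    """
    signature = []
    for seed in range(num_hashes):
        best = min(
            (_hash(seed, key, copy), key, copy)
            for key, w in weights.items()
            for copy in range(int(w))
        )
        signature.append(best[1:])  # store (key, copy), drop the hash value
    return signature


def estimate_similarity(sig_a, sig_b):
    """Fraction of signature positions on which the two sets collide."""
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)


# Illustrative data (assumed, not from the review).
s = {"a": 3, "b": 1, "c": 2}
t = {"a": 1, "b": 2, "d": 2}

exact = generalized_jaccard(s, t)  # (1 + 1 + 0 + 0) / (3 + 2 + 2 + 2) = 2/9
approx = estimate_similarity(quantized_minhash(s), quantized_minhash(t))
print(f"exact={exact:.3f} estimate={approx:.3f}")
```

The cost of this approach grows with the total weight, since every copy is hashed explicitly, and it does not handle real-valued weights directly. Those limitations are what motivate the more refined quantization-based and “active index”-based algorithms that the review surveys.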
