Data Mining Projects

Large Scale Tensor Factorization via Parallel Sketches

Abstract: Tensor factorization methods have become popular, in part because tensors can model multi-relational data directly. We propose ParaSketch, a massively parallel tensor factorization algorithm for large tensors. The idea is to compress the large tensor into multiple small ones, decompose…

Iterative Refinement for Multi-source Visual Domain Adaptation

Abstract: Multi-source domain adaptation requires reducing the domain discrepancy between the source and target domains, then assessing domain relevance to determine how much knowledge should be transferred. Most previous approaches ignored domain discrepancies and relevance….

Improving I/O Complexity of Triangle Enumeration

Abstract: In the age of big data, many graph algorithms must operate in external memory and maintain performance regardless of problem size. Triangle listing algorithms must carefully combine edges from multiple partitions to detect cycles…

Identifying User Relationship on WeChat Money-Gifting Network

Abstract: With the rise of online social networks, identifying or classifying real-life relationships between users has become useful for financial fraud detection. People in different relationships usually give each other meaningful gifts on different dates….

Heuristic 3D Interactive Walks for Multilayer Network Embedding

Abstract: Network embedding is widely used for network analytics tasks. Existing methods focus on single-layer homogeneous or heterogeneous networks. Multilayer networks (heterogeneous networks with multiple edge/relation types) can naturally represent many real-world complex systems. Multilayer network embedding struggles to capture and use…

HAMHAM: Hybrid Associations Models for Sequential Recommendation

Abstract: Given a user’s purchase or rating trajectory, sequential recommendation aims to suggest the next few items the user is most likely to buy or review. It helps users choose their favorites. This manuscript uses hybrid associations models (HAM) to generate sequential recommendations…

GloDyNE: Global Topology Preserving Dynamic Network Embedding

Abstract: Due to the time-evolving nature of many real-world networks, learning low-dimensional topological representations in dynamic environments is gaining attention. Dynamic Network Embedding (DNE) aims to efficiently update node embeddings while preserving network topology at…

Fully Dynamic k-Center Clustering With Improved Memory Efficiency

Abstract: Machine learning libraries need static and dynamic clustering algorithms. The sliding window model or simpler models have dominated dynamic machine learning and data mining algorithm development. Many real-world applications require arbitrary deletions and insertions….