Robust and Skew-resistant Parallel Joins in Shared-Nothing Systems


Long Cheng, Spyros Kotoulas, Tomas Ward, Georgios Theodoropoulos

Publication Type: 
Refereed Conference Meeting Proceeding
The performance of joins in parallel database management systems is critical for data-intensive operations such as querying. Since data skew is common in many applications, poorly engineered join operations result in load imbalance and performance bottlenecks. State-of-the-art methods designed to handle this problem offer significant improvements over naive implementations. However, performance could be further improved by removing the dependency on global skew knowledge and broadcasting. In this paper, we propose PRPQ (partial redistribution & partial query), an efficient and robust join algorithm for processing large-scale joins over distributed systems. We present the detailed implementation and a quantitative evaluation of our method. The experimental results demonstrate that the proposed PRPQ algorithm is robust and scalable under a wide range of skew conditions. Specifically, compared to the state-of-the-art PRPD method, we achieve a 16% to 167% performance improvement and 24% to 54% less network communication under different join workloads.
Conference Name: 
Conference on Information and Knowledge Management
Proceedings of the 23rd ACM International Conference on Information and Knowledge Management
Digital Object Identifier (DOI): 
Publication Date: 
Conference Location: 
United States of America
Research Group: 
Open access repository: