Robust and Skew-resistant Parallel Joins in Shared-Nothing Systems
Refereed Conference Meeting Proceeding
The performance of joins in parallel database management systems is critical for data-intensive operations such as querying. Since data skew is common in many applications, poorly engineered join operations result in load imbalance and performance bottlenecks. State-of-the-art methods designed to handle this problem offer significant improvements over naive implementations. However, performance could be further improved by removing the dependency on global skew knowledge and broadcasting. In this paper, we propose PRPQ (partial redistribution & partial query), an efficient and robust join algorithm for processing large-scale joins over distributed systems. We present the detailed implementation and a quantitative evaluation of our method. The experimental results demonstrate that the proposed PRPQ algorithm is indeed robust and scalable under a wide range of skew conditions. Specifically, compared to the state-of-the-art PRPD method, we achieve a 16% to 167% performance improvement and 24% to 54% less network communication under different join workloads.
Conference on Information and Knowledge Management
Proceedings of the 23rd ACM International Conference on Conference on Information and Knowledge Management
Digital Object Identifier (DOI):
United States of America
Open access repository: