The experimental study of performance impairment of big data processing in dynamic and opportunistic environments
journal contribution
posted on 2021-07-19, 04:18, authored by Wei Li, Wanwu Guo
In contrast to HPC clusters, when big data is processed in a distributed, and particularly a dynamic and opportunistic, environment, the overall performance is inevitably impaired, and may even be bottlenecked, by the dynamics of the overlay and the opportunism of the computing nodes. The dynamics and opportunism are caused by the churn and unreliability of a generic distributed environment, and they can be neither ignored nor avoided. Understanding the impact factors, their impact strength and the relevance between their impacts is the foundation of any potential optimization. This paper presents the research background, methodology and results by reasoning about the necessity of distributed environments for big data processing, scrutinizing the dynamics and opportunism of such environments, classifying the impact factors, proposing evaluation metrics and carrying out a series of intensive experiments. The analysis of the results provides important insights into the impact strength of the factors and the relevance of impact across the factors. These results aim to pave the way for future optimization or avoidance of potential bottlenecks in big data processing in distributed environments.