The scalability of volunteer computing for MapReduce big data applications
Volunteer Computing (VC) has been successfully applied to many compute-intensive scientific projects to solve embarrassingly parallel computing problems. There have been some efforts in the literature to apply VC to data-intensive (i.e. big data) applications, but none of them has confirmed the scalability of VC for such applications in opportunistic volunteer environments. This paper chooses MapReduce as a typical computing paradigm for big data processing in distributed environments and models it on a DHT (Distributed Hash Table) P2P overlay to bring this paradigm into VC environments. The modelling results in a distributed prototype implementation and a simulator. The experimental evaluation confirms the scalability of VC for MapReduce big data applications (up to 10 TB) in cases where the number of volunteers is fairly large (up to 10K), the volunteers exhibit high churn rates (up to 90%), and they have heterogeneous compute capacities (the fastest is 6 times as fast as the slowest) and bandwidths (the fastest is up to 75 times as fast as the slowest).