Master the robust features of R parallel programming to accelerate your data science computations
About This Book
- Create R programs that exploit the computational capability of your cloud platforms and computers to the fullest
- Become an expert in writing the most efficient, highest-performance parallel algorithms in R
- Get to grips with the concept of parallelism to accelerate your existing R programs
Who This Book Is For
This book is for R programmers who want to step beyond R's inherent single-threaded execution and memory limitations and learn how to implement highly accelerated, scalable algorithms, which are a necessity for the performant processing of Big Data. No previous knowledge of parallelism is required. The book also caters to more advanced technical programmers seeking to go beyond high-level parallel frameworks.
What You Will Learn
- Create and structure efficient load-balanced parallel computation in R, using R's built-in parallel package
- Deploy and utilize cloud-based parallel infrastructure from R, including launching a distributed computation on Hadoop running on Amazon Web Services (AWS)
- Understand parallel efficiency, and apply simple techniques to benchmark, measure speed, and target improvements in your own code
- Develop complex parallel processing algorithms with the standard Message Passing Interface (MPI) using the Rmpi, pbdMPI, and SPRINT packages
- Build and extend a parallel R package (SPRINT) with your own MPI-based routines
- Implement accelerated numerical functions in R utilizing the vector processing capability of your Graphics Processing Unit (GPU) with OpenCL
- Understand parallel programming pitfalls, such as deadlock and numerical instability, and the approaches to handle and avoid them
- Build task farm master-worker, spatial grid, and hybrid parallel R programs
In Detail