Explore architectural approaches to building Data Lakes that ingest, index, manage, and analyze massive amounts of data using Big Data technologies
About This Book
- Comprehend the intricacies of architecting a Data Lake and build a data strategy around your current data architecture
- Efficiently manage vast amounts of data and deliver it to multiple applications and systems with a high degree of performance and scalability
- Packed with industry best practices and use-case scenarios to get you up and running
Who This Book Is For
This book is for architects and senior managers who are responsible for building a strategy around their current data architecture, and it helps them identify the need for a Data Lake implementation in an enterprise context. Readers will need a good knowledge of master data management and information lifecycle management, as well as experience with Big Data technologies.
What You Will Learn
- Identify the need for a Data Lake in your enterprise context and learn to architect a Data Lake
- Learn to build various tiers of a Data Lake, such as data intake, management, consumption, and governance, with a focus on practical implementation scenarios
- Learn the key considerations to take into account when building each tier of the Data Lake
- Understand Hadoop-oriented data transfer mechanisms to ingest data in batch, micro-batch, and real-time modes
- Explore various data integration needs and learn how to perform data enrichment and data transformations using Big Data technologies
- Enable data discovery on the Data Lake so that users can find the data they need
- Discover how data is packaged and provisioned for consumption
- Comprehend the importance of including data governance disciplines while building a Data Lake
In Detail
A Data Lake is a highly scalable platform for storing huge volumes of multistructured data