Building a Multi-Purpose Logical Data Lake – The Engine Room of the Data-Driven Enterprise
Mike Ferguson
Managing Director
Intelligent Business Strategies Ltd


Monday, March 18, 2019
08:30 AM - 11:45 AM

Level:  Intermediate

Over the last several years there has been much discussion of data lakes, where a data lake was originally defined as a central data store on Hadoop holding raw data. Since then, new configurations have emerged, such as data lakes on cloud storage and the logical data lake: multiple data stores spread across the data centre(s) and cloud(s) where many organisations capture data today. Data lakes were initially seen as a place where raw data could be brought together to support data science. That is a single-purpose use case, and for many organisations it is far too restrictive. All that data is too valuable to set aside for a very small number of data scientists when such a valuable collection of data could serve many other purposes. For example, it could be used to stage and process data to build a data warehouse, to build and maintain master data management (MDM) or reference data management (RDM) systems, or to build a single customer view for marketing, all in addition to data science. In that sense the data lake can be multi-purpose. It is this realisation that has opened up the idea that, rather than building separate systems such as data warehouses and MDM systems in silos, organisations could turn the data lake into a huge engine-room data hub that produces all the data assets needed to create a data-driven enterprise. This session looks at that possibility and shows how companies can create a multi-purpose data lake to build re-usable, trusted data and analytical assets that enable rapid delivery of data warehouses, data marts, MDM, RDM, single customer view and data science.
It looks not only at how companies can create these assets, but also at how they can publish them in a catalog to make them findable, and how these assets can be linked together as components to rapidly build data and analytical pipelines for competitive advantage.

  • What is a data lake and how have they evolved?
  • Why create a data lake? - The data science use case
  • Limitations of a single purpose data lake
  • The benefits of a multi-purpose data lake
  • What is needed to build a multi-purpose data lake?
  • Key technologies in a multi-purpose data lake
    • The data management platform / data fabric
    • The data catalog
    • Machine learning and advanced analytics
    • Data virtualisation
  • The importance of pipelines in a multi-purpose data lake
  • Building trusted re-usable data and analytical assets using pipelines
  • From data lake to multi-purpose logical data hub - Publishing assets to a catalog to fuel re-use
  • Orchestrating data and analytical assets in pipelines to rapidly deliver high value insights for competitive advantage

Mike Ferguson is the Managing Director of Intelligent Business Strategies. An independent IT industry analyst, he specializes in data management, analytics, big data, and enterprise architecture. With over 40 years of experience, Mike has consulted for dozens of companies on BI/analytics, data strategy, technology selection, enterprise architecture, and data management. He is also conference chairman of Big Data LDN, the largest data and analytics conference in Europe, and a member of the EDM Council CDMC executive advisory board. He has spoken at events all over the world and written numerous articles. He was formerly a principal and co-founder of Codd and Date, the consultancy of relational model pioneers E.F. Codd and C.J. Date, and a Chief Architect at Teradata. He teaches classes in Data Warehouse Modernization, Big Data Architecture & Technology, Centralized Data Governance of a Distributed Data Landscape, Practical Guidelines for Implementing a Data Mesh, Embedded Analytics, Intelligent Apps & AI Automation, Migrating Your Data Warehouse to the Cloud, Modern Data Architecture, and Data Virtualization.