Practical Guidelines for Designing Modern Data Architectures
Attend Rick van der Lans' modern data architecture course! Rick van der Lans is a highly respected independent analyst, consultant, author, and internationally acclaimed lecturer specializing in data architectures, data warehousing, business intelligence, big data, data virtualization, and database technology. He works for R20/Consultancy, which he founded in 1987.
Organized by DND and the other experts in the BI & Analytics specialist group.
Should the new data architecture be based on a data lake, a more traditional data warehouse, a data hub, a data fabric, a data lakehouse, or a data mesh? Or should it be a combination? Should the architecture run in the cloud? Is it important to migrate to an analytical SQL database server, or deploy data warehouse automation? How can data streaming be included? What are the new requirements with respect to anonymization and other data privacy aspects? In general, what should a data architecture document contain? And where do you start the design process? So many questions must be answered when designing a new data architecture.
New data architectures are needed because organizations want to do more with data. Data must be used more widely, more efficiently, and more effectively to improve business and decision-making processes and to increase competitive power.
Technically, this implies that new forms of data usage need to be deployed, such as data science, real-time dashboarding, embedded BI, edge analytics, and customer-driven BI. Unfortunately, current IT systems, such as the data warehouse and the transactional systems, cannot cope with these new, more intensive, and more resource-demanding forms of data usage. The current systems for data delivery are already overstretched. Additionally, because they have become static and inflexible, implementing new reports, changing existing applications, and executing new forms of analytics have become time-consuming exercises. In other words, the current data architecture can't cope with today's demand to do more with data.
Many organizations find designing a new data architecture challenging. This two-day seminar answers the questions architects have when designing a modern data architecture. Guidelines, tips, and design rules are discussed. Concepts and technologies, such as data lakes, data hubs, data fabrics, big data, cloud, data virtualization, Hadoop, NoSQL, data catalogs, data warehouse automation, and anonymization of data, are explained. The seminar is based on practical experience gained while designing and implementing modern data architectures. The relationship between a modern data architecture and organizational aspects is also addressed, including data quality, data governance, data strategy, and migration to the new architecture.
Subjects
Part 1: Introduction – What is a Data Architecture?
- Why a new data architecture?
- What are the key elements of a data architecture?
- What are the differences between a data architecture and a solutions architecture?
- Benefits, drawbacks, and shortcomings of well-known reference architectures, such as the classic data warehouse architecture, the data lake, data hub, and data mesh
- The impact of new technology on data architectures – the holistic approach to designing data architectures
- 10 steps to design a data architecture
Part 2: Initial Phases of the Project
- Determine the real business motivations for a new data architecture: ICT cost reduction, competitive improvement, new business model, new laws and regulations, improving reaction speed to business demands, or a more efficient exploitation of available data?
- Relationship with business strategy and data strategy
- Determine new requirements and constraints
- Analyze the existing environment
- Determine maturity level of the IT organization
Part 3: Overview of Technologies and Products that Influence Data Architectures
- Data storage: analytical SQL, NoSQL, Hadoop, translytical SQL
- Data integration: ETL, data virtualization, data replication, data warehouse automation, enterprise service bus, API gateway
- Data streaming: messaging, Kafka, streaming SQL
- Data documentation: data glossary, data catalog, metadata management
- Reporting tools: self-service BI, dashboards, embedded BI
- Data science tools: programming languages, such as R and Python, machine learning automation tools, data science workbenches
- Data security: anonymization, authorization
Part 4: Design Principles for Data Architectures
- First the technology or first the data architecture?
- The importance of data processing specifications for the integration, filtering, correction, aggregation, masking, and transformation of data
- Why migration to the cloud: unburdening, high performance, scalability, available software?
- Data minimization: a new design principle that focuses on minimizing data copying, resulting in more data-on-demand architectures
- Are all software products suitable for the cloud?
- Design principles for dealing with data history and data cleansing
- Modernization of a classic data warehouse architecture
- Generating a data warehouse architecture with data warehouse automation tools
- New requirements for transactional systems, such as storing historic data and continuous logging
- The influence of GDPR: deleting customer data
Part 5: Innovative New Data Architectures
- The logical data warehouse architecture as an agile alternative
- Design rules, do’s and don’ts for a logical data warehouse architecture
- The changing role of the data lake: From a single-purpose to a multi-purpose data lake
- Processing and sharing operational data with a data hub
- A data lakehouse to support both the BI use case and the data science use case
- Developing a data mesh to avoid a centralized, monolithic database
- Requirements for implementing data science models, such as transparency, immutability, and version control
- A data streaming architecture: when every microsecond counts
- Technical challenges: performance, inconsistent data streams, storing massive amounts of messages for analytics afterwards
- Operationalization of data science models
- Merging data architectures into one unified data delivery platform
- Differences between data lake, data hub, and data warehouse
- The data fabric architecture for frictionless access to data
Part 6: Designing a New Data Architecture
- Using track diagrams to design a new data architecture
- Data processing specifications are key to the architecture; they are the intellectual property of an organization
- Focus on the data processing specifications first, before data storage components, such as data lakes, hubs and warehouses are introduced
- Breaking the development of the architecture into small steps – think big, act small
- A metadata architecture is as important as a data architecture
- Determine the implementation approach
- Tips for selecting new products and technologies
- Prepare the organization for the new data architecture
Part 7: Closing Remarks
Learning Objectives
- What steps lead to the perfect data architecture? From requirements analysis via proofs of concept to a data architecture.
- What is the importance of a holistic approach to analyzing technology, organization, and architecture in conjunction?
- What are real life examples of new data architectures?
- How can new technology be used optimally within a new data architecture?
- How do you develop a data architecture?
- Which components make up a data architecture?
- What are the use cases, pros and cons of new technologies and how do they influence data architectures?
- What is the value of well-known reference architectures, such as the logical data warehouse architecture, the data lake, the data hub, and the data mesh?
- What are the right criteria for a data architecture?
Related Articles and Blogs:
Part 1: Drowning in Data Delivery Systems, May 2018
Part 2: Key Benefits of a Unified Data Delivery Platform, June 2018
Part 3: How Siloed Data Delivery Systems Were Born, June 2018
Part 4: Big Data is Not the Biggest Change in IT, June 2018
Part 5: Requirements for a Unified Data Delivery Platform, June 2018
Part 6: A Unified Data Delivery Platform – A Summary, June 2018