Online course, 12 October: New Big Database Technologies

Join the course "New Big Database Technologies - Market Overview of Technologies and Products" with the renowned author, analyst, consultant, and lecturer Rick van der Lans.

Organized by Frances K. D’Silva and the other professionals in the BI & Analytics special interest group.

Note: The seminar is held in English.

Geared to:

• IT architects
• Database specialists
• Big data specialists
• BI specialists
• Data warehouse designers
• Technology planners
• Technical architects
• Enterprise architects
• IT consultants
• IT strategists
• Systems analysts
• Database developers
• Database administrators
• Solutions architects
• Data architects

With the introduction of big data and cloud platforms, a tsunami of new technologies and products for data storage, processing, and analytics has arrived. Hadoop, Spark, NoSQL, NewSQL, triplestores, and SQL-on-Hadoop are just a few of the countless technologies that have become available for developing big data systems. Many powerful new database engines have also entered the market, including Amazon Athena, Cloudera, Exasol, Google BigQuery, Microsoft Synapse, MongoDB, Neo4j, SingleStore, SnowflakeDB, Splice Machine, and Starburst.

Most organizations have many questions. How mature are all these new technologies? Are they worthy replacements for the more traditional SQL products? How should they be incorporated into existing data warehouse architectures? Should they be used to develop data lakes? Are they the perfect platforms for data science, or for operational BI?

This seminar gives a clear, extensive, and critical overview of all the new key technologies for storing, processing, and analyzing big data. Technologies are explained, market overviews are presented, strengths and weaknesses are discussed, and guidelines and best practices are given. It’s the perfect update for those interested in the new market of big data technology.

Subjects

Big Data: State of the art

What exactly do we mean by big data?
The key application area of big data: business analytics
Differences between semi-structured, poly-structured, multi-structured, and unstructured data
Big data systems require specialization of database engines

Analytical SQL Database Servers

Classification of analytical SQL database servers; can they compete with NoSQL products?
Techniques to improve performance and scalability, including column-based storage, sharding, in-memory analytics, and query compilation
How important is in-database analytics?
Is loading databases into internal memory the solution? Is it feasible?
Market overview, including Amazon Athena, Exasol, Google BigQuery, HP/Vertica, Microsoft Synapse, SingleStore, SnowflakeDB, Splice Machine, and Starburst

The World of Hadoop and Spark

The Hadoop stack explained: HDFS, MapReduce, Spark, Hive, HBase, YARN, ZooKeeper, Pig, HCatalog, and so on
Characteristics and consequences of HDFS and file formats
Alternative implementations by Amazon, Google, and Microsoft
Kafka for fast messaging

NoSQL Database Stores

Classification of NoSQL products: key-value stores, document stores, column-family stores, and graph data stores
It’s all about data scalability and performance
Why is schema-on-read more flexible than schema-on-write?
Are NoSQL products really database servers?
Market overview, including Apache HBase and CouchDB, Cassandra, Cloudera, DataStax, InfiniteGraph, MongoDB, and Neo4j

Exploring Data in Hadoop Using SQL

Making Hadoop data available for reporting and analysis through SQL-on-Hadoop engines
Examples of SQL-on-Hadoop engines, including Apache Drill, Apache Hive, Apache Phoenix, Cloudera Impala, HP Vertica, Pivotal HAWQ, Spark SQL, and Splice Machine
Data virtualization for unleashing the information hidden in NoSQL and SQL systems

NewSQL and Translytical SQL Database Servers for Transaction Workloads

NewSQL database servers are designed for high-performance transactional systems
Simpler transaction mechanisms
The challenge of multi-table joins
Market overview, including CitusDB, Clustrix, and SingleStore

Concluding Remarks

What You Will Learn:

Why traditional database technology is not “big” enough
How analytical SQL engines can help to simplify data architectures
How Hadoop and NoSQL differ from traditional technology
How new and existing technologies such as Hadoop, NoSQL, and NewSQL can help develop BI and big data systems
How to embed Hadoop technologies in existing BI systems
How Spark can boost performance for analytics
How to distinguish between three NoSQL subcategories: key-value, document, and column-family stores
Why graph databases are very different from all other systems
When to use NewSQL or NoSQL for developing transactional systems
How to simplify data access through SQL-on-Hadoop engines
When to use which new data storage technology and the pros and cons of each solution
Which products and technologies are winners and which are losers

Speakers:

Rick van der Lans

Rick van der Lans is an independent analyst and consultant specializing in data warehousing, business intelligence, big data, and various database technologies and data architectures.