Data Reliability for Data Lakes | Databricks

Video description:

ABOUT THE KEYNOTE
Building a modern data lake requires dealing with a lot of complexity: querying historical and streaming data simultaneously (lambda architecture), validation to ensure data isn't too messy for data science and machine learning, reprocessing to handle failures, and ensuring ACID-compliant data updates. We created the Delta Lake project, open sourced under the Linux Foundation, to relieve data scientists and data engineers from these complex systems problems and instead enable them to focus on extracting value from data. In this talk, we'll dive into these challenges and how ACID transactions solve them. We'll discuss patterns that emerge when you can focus on data quality, and the nitty-gritty internals of ACID on Spark which enable this focus.

ABOUT THE KEYNOTE SPEAKER
Michael Armbrust is a committer and PMC member of Apache Spark and the original creator of Spark SQL. He currently leads the team at Databricks that designed and built Structured Streaming and Databricks Delta. He received his PhD from UC Berkeley in 2013, and was advised by Michael Franklin, David Patterson, and Armando Fox. His thesis focused on building systems that allow developers to rapidly build scalable interactive applications, and specifically defined the notion of scale independence. His interests broadly include distributed systems, large-scale structured storage, and query optimization.
