Mark Rittman is joined by Dylan Baker, freelance analytics consultant, to talk about thinking probabilistically, analytics within venture-funded startups, and devops and its role in scaling out the modern BI stack.
Mark Rittman is joined by Matthew Halliday to talk about the challenge of ETL and analytics on complex relational OLTP data models, previous attempts to solve these problems with products such as Oracle Essbase and Oracle E-Business Suite Extensions for Oracle Endeca and how those experiences, and others, led to his current role as co-founder and VP of Products at Incorta.
Mark Rittman is joined by returning Special Guest Jake Stein, former co-founder and CEO of Stitch and now SVP of Stitch at Talend, to talk about the evolution of the data pipeline-as-a-service, data catalogs and data governance, and the vision behind Talend’s acquisition of Stitch.
Mark Rittman is joined by returning Special Guest Mark Grover to talk about his move from Cloudera and product engineering to a product manager role at Lyft; analytics use-cases in the ride-sharing industry; and the move from conversations about ETL tools, technology and engines to templates, paradigms and developer productivity.
- Mark Grover LinkedIn Profile and GitHub Profile
- "Hadoop Application Architectures"
- "Drill to Detail Ep. 7 'Apache Spark and Hadoop Application Architectures'"
- Lyft Engineering Blog
- "Software Engineer to Product Manager" blog by Gwen Shapira
- "Introduction to the Oracle Data Integrator Topology" from the Oracle Data Integrator docs site
- Apache Airflow and Amazon Kinesis homepages
- "Experimentation in a Ridesharing Marketplace" by Nicholas Chamandy, Head of Data Science at Lyft
- "How Uber Eats Works with Restaurants"
- "Deliveroo has built a bunch of tiny kitchens to feed more hungry Londoners" - Wired.co.uk
Mark Rittman is joined by Will Davis from Trifacta to talk about the public beta of Google Cloud Dataprep, Trifacta's data wrangling platform and topics including metadata management, data quality and data management for big data and cloud data sources.
- Google Cloud Dataprep on Google Cloud Platform
- "Google Cloud Dataprep: Spreadsheet-Style Data Wrangling Powered by Google Cloud Dataflow"
- "A New Cloud-Based Data Prep Solution from Google & Trifacta"
- Trifacta website
- "A Breakthrough Approach to Exploring and Preparing Data"
- Trifacta platform architecture
- "Garbage In, Garbage Out: Why Data Quality Matters"
- "How to Put an Effective Metadata Strategy in Place"
Mark Rittman is joined in this episode by Taylor Brown from Fivetran to talk about middleware for SaaS data, their focus on integrations with SaaS vendors and how this differentiates their offering, his thoughts on packaged analytic applications announced at the recent Looker Join conference ... and where the name "Fivetran" came from.
In this episode Mark is joined by Jake Stein to talk about Stitch Data and their ETL tool for data engineers, the new open-source project Singer and his experiences building a software startup that both partners and competes with the big cloud platform vendors.
- Stitch Data
- Singer: Simple, Composable Open-Source ETL
- Setting the Data Strategy for Your Growing Organization
- The State of Data Engineering
- The State of Data Science
- Why our ETL Tool Doesn't Do Transformations
- Airflow: a workflow management platform
- Goodbye RJMetrics, Hello Fishtown Analytics
- Engineers Shouldn’t Write ETL: A Guide to Building a High Functioning Data Science Department
In this episode Mark is joined by Tristan Handy from Fishtown Analytics to talk about building out analytics functions in high-growth startups, and three related blog posts he wrote on this topic.
Stewart Bryson returns to the show to join Mark Rittman to discuss new-world BI and data warehousing development using Google BigQuery and Amazon Athena, Apache Kafka and StreamSets, and talks about his experiences with Looker, the cloud-native BI tool that brings semantic modeling and modern development practices to the world of business intelligence.
Mark Rittman is joined by Gwen Shapira from Confluent to talk about Apache Kafka, streaming data integration and how it differs from batch-based, GUI-developed ETL, the problem with architects, exactly-once processing and how data governance is coming to Kafka development with Confluent's new schema registry server.