Mark is joined in this episode of Drill to Detail by Wes McKinney to talk about the origins of pandas, the open-source Python package for data analysis, and his subsequent work as a contributor to the Apache Kudu (incubating) and Apache Parquet projects, and to Apache Arrow, an in-memory data structure specification for engineers building data systems and the de facto standard for columnar in-memory processing and interchange.
Mark is joined in this episode by Avi Zloof from Evaluex to talk about the new world of elastically provisioned, cloud-hosted analytic databases such as Google BigQuery and Amazon Athena, how their pricing models and vendor strategies differ from those of traditional database vendors, and how machine learning can be used to automate performance tuning and optimize workloads in this new world of large-scale distributed query and storage.
Mark is joined in this episode by Google Cloud Platform Developer Advocate Felipe Hoffa, talking about getting started as a developer using Google BigQuery along with Google Cloud Dataflow, Google Cloud Dataprep and Google Cloud Platform's machine learning APIs.
Mark Rittman is joined by industry analyst Mark Madsen to talk about marketing analytics and the rise of the omni-channel consumer, the use of AI in analytics and personalization, and what this all means for brands, advertisers and marketers.
Mark Rittman is joined by Donald Farmer to talk about his work at Microsoft on SQL Server Analysis Services and Integration Services, why he moved to Qlik and the challenges of evolving a BI product strategy from focusing on desktops to focusing on the enterprise, and some advice for customers, software vendors and partners working with data and analytics tools.
In this episode Mark is joined by Tristan Handy from Fishtown Analytics to talk about building out analytics functions in high-growth startups, and three related blog posts he wrote on the topic.
Mark Rittman is joined by Gwen Shapira from Confluent to talk about Apache Kafka, streaming data integration and how it differs from batch-based, GUI-driven ETL development, the problem with architects, exactly-once processing, and how data governance is coming to Kafka development with Confluent's new Schema Registry server.
Mark Rittman is joined by Maxime Beauchemin to talk about analytics and data integration at Airbnb, the Apache Airflow and Superset open-source projects he helped launch and now works with day-to-day at Airbnb, and his recent Medium article "The Rise of the Data Engineer".
- "The Rise of the Data Engineer" blog by Maxime Beauchemin
- Apache Airflow
- Airbnb Superset
- "Engineers Shouldn’t Write ETL: A Guide to Building a High Functioning Data Science Department" blog by Jeff Magnusson
Mark Rittman is joined by Timo Elliott, originally of Business Objects and now Innovation Evangelist for SAP, to talk about the origins of self-service BI with Business Objects' innovative "Universe" and the role analytics now plays within SAP; why analytics is the most important function within your organization and why the vast majority of analytics is still reporting (which isn't so bad); and the role AI and other innovations will play in analytics in the future.
Mark Rittman is joined by Daniel Mintz from Looker to talk about BI and analytics on Google BigQuery, data modelling on the new generation of cloud-based, distributed data warehousing platforms, and Looker's reintroduction of semantic models to big data analytics developers.