Posts

Maximizing Efficiency: Streamline Your Data Processing with Azure Event-Driven Ingestion

Event-driven data is a powerful concept that allows you to process new information as soon as it becomes available. In this blog post, we will explore how to use Azure services such as Logic Apps, Event Grid, and Blob Storage to create an event-driven data ingestion system. The first step in this process is to create fake order data using a Python tool that simulates an API and generates JSON files. These files are then uploaded to an Azure Blob Storage container. Once the files are uploaded, Azure Event Grid is configured to detect the new files and trigger a workflow using Logic Apps. The Logic App reads the JSON files and ingests the data into an Azure SQL Database. This makes the data available in a relational database for real-time querying and analysis. We will also walk through the process of setting up this event-driven data ingestion system step by step, using Azure services. We will cover how to configure Event Grid and Logic Apps, and how to ingest t...
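As a taste of that first step, here is a minimal sketch (not the post's actual generator) of producing one fake order and uploading it as a JSON blob, assuming the azure-storage-blob package and placeholder values for the connection string and container name:

```python
import json
import uuid
import random
from datetime import datetime, timezone

from azure.storage.blob import BlobServiceClient  # pip install azure-storage-blob

# Assumed placeholders -- substitute your own connection string and container name.
CONNECTION_STRING = "<your-storage-account-connection-string>"
CONTAINER_NAME = "orders"


def fake_order() -> dict:
    """Build a single fake order record, roughly like a simulated API response."""
    return {
        "order_id": str(uuid.uuid4()),
        "customer_id": random.randint(1, 1000),
        "amount": round(random.uniform(5.0, 500.0), 2),
        "ordered_at": datetime.now(timezone.utc).isoformat(),
    }


def upload_order(order: dict) -> None:
    """Upload one order as a JSON blob; the new blob is what Event Grid reacts to."""
    service = BlobServiceClient.from_connection_string(CONNECTION_STRING)
    blob_name = f"order_{order['order_id']}.json"
    blob_client = service.get_blob_client(container=CONTAINER_NAME, blob=blob_name)
    blob_client.upload_blob(json.dumps(order), overwrite=True)


if __name__ == "__main__":
    upload_order(fake_order())
```

Each uploaded blob raises a BlobCreated event, which is the trigger the Logic App listens for.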

External Data: A Guide to Working with Data Outside of Snowflake

Today, we'll explore how to work with data outside of Snowflake. Specifically, we'll walk through the steps of creating a stage pointing to external files in an Azure Blob container, creating a file format for it, and then creating a view on it, using data on one of my favorite topics: Assassin's Creed. We'll begin by creating a stage to the external files in an Azure Blob container, which I'll share for free. This will allow us to access and manipulate the data stored within the container. Then we'll create a file format for the data in the container, ensuring that Snowflake can properly interpret and utilize the information. Once the stage and file format are in place, we'll set up a view on the data. In this case, we'll focus on all Assassin's Creed games, including their release year and the name of their protagonist. By the end of this process, we'll have a good view of the Assassin's Creed franchise that can be easily queried and analyzed. Let...
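For a rough idea of the statements involved, here is a sketch that runs the stage, file format, and view DDL through the Snowflake Python connector. The account details, Azure URL, SAS token, and the CSV column layout (title, release year, protagonist) are assumptions for illustration, not the post's exact setup:

```python
import snowflake.connector  # pip install snowflake-connector-python

# Assumed placeholder credentials and objects.
conn = snowflake.connector.connect(
    account="<account_identifier>", user="<user>", password="<password>",
    warehouse="<warehouse>", database="<database>", schema="<schema>",
)

statements = [
    # Stage pointing at the external files in the Azure Blob container.
    """
    CREATE OR REPLACE STAGE ac_stage
      URL = 'azure://<storage-account>.blob.core.windows.net/<container>'
      CREDENTIALS = (AZURE_SAS_TOKEN = '<sas-token>')
    """,
    # File format so Snowflake knows how to parse the files (assuming CSV here).
    """
    CREATE OR REPLACE FILE FORMAT ac_csv
      TYPE = CSV FIELD_DELIMITER = ',' SKIP_HEADER = 1
    """,
    # View over the staged files: one row per Assassin's Creed game.
    """
    CREATE OR REPLACE VIEW assassins_creed_games AS
    SELECT t.$1::varchar AS title,
           t.$2::int     AS release_year,
           t.$3::varchar AS protagonist
    FROM @ac_stage (FILE_FORMAT => 'ac_csv') t
    """,
]

cur = conn.cursor()
try:
    for sql in statements:
        cur.execute(sql)
finally:
    cur.close()
    conn.close()
```

The same DDL could just as well be run directly in a Snowflake worksheet; the Python wrapper is only there to keep the example self-contained.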

Semi-Structured Data: How to work with nested JSON in Snowflake

As the amount of data in the world continues to grow, the formats in which it is stored have also evolved. Gone are the days of strictly structured data. Instead, we are now dealing more and more with semi-structured formats such as XML and JSON. These formats, while providing more flexibility and expressiveness, can also make working with the data more challenging. With APIs outputting JSON data, it's essential to have a robust and efficient way to handle this data. Snowflake offers a solution to this problem by providing an easy-to-use way to work with semi-structured and nested semi-structured data. In this post, we will explore how to use Snowflake to work with JSON data, using one of my favorite topics - Star Wars - as an example. Join me as we dive into the world of Star Wars characters and discover how to handle this data. Before we can start working, we first need to create a new database to hold our data. In Snowflake, we can ...
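As a small preview of the technique, here is a sketch using the Snowflake Python connector that stores a nested JSON document in a VARIANT column and queries it with path notation and LATERAL FLATTEN. The connection details and the sw_characters table are illustrative, not the ones from the post:

```python
import snowflake.connector  # pip install snowflake-connector-python

# Assumed placeholder credentials.
conn = snowflake.connector.connect(
    account="<account_identifier>", user="<user>", password="<password>",
    warehouse="<warehouse>", database="<database>", schema="<schema>",
)
cur = conn.cursor()

# A tiny nested JSON document stored in a VARIANT column.
cur.execute("CREATE OR REPLACE TABLE sw_characters (doc VARIANT)")
cur.execute(
    "INSERT INTO sw_characters SELECT PARSE_JSON("
    "'{\"name\": \"Luke Skywalker\", "
    "\"homeworld\": {\"name\": \"Tatooine\"}, "
    "\"starships\": [{\"name\": \"X-wing\"}, {\"name\": \"Imperial shuttle\"}]}')"
)

# Colon path notation reaches into nested objects; LATERAL FLATTEN
# turns the nested array into one row per element.
cur.execute("""
    SELECT doc:name::string            AS character,
           doc:homeworld:name::string  AS homeworld,
           s.value:name::string        AS starship
    FROM sw_characters,
         LATERAL FLATTEN(input => doc:starships) s
""")
for row in cur.fetchall():
    print(row)

cur.close()
conn.close()
```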

Streaming Data with Kafka in Python: A small walkthrough

Kafka is a powerful message streaming platform that allows for the efficient exchange of data between systems and applications. In this blog post, we will walk through the process of setting up a Kafka environment, creating a topic, and creating both a consumer and a producer in Python. Setting up a Kafka environment: First, we will use Docker with a docker-compose file to create a Kafka broker and ZooKeeper. A Kafka broker is a server that runs Kafka and handles all the incoming and outgoing data from and to the clients, while ZooKeeper is a distributed coordination service that is often used in conjunction with Kafka to manage the distributed nature of a Kafka cluster. To get started, you will need to have Docker installed on your machine. You can follow this guide to install and start Docker on your PC; I'm using Docker Desktop: https://docs.docker.com/desktop/install/windows-install/ Now let's begin. If you want, you can just clone my repo or create a new project...
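For a quick preview, here is a minimal producer/consumer pair using the kafka-python client (the post may use a different client library); the broker address and topic name are assumptions that match a typical docker-compose setup:

```python
import json
from kafka import KafkaProducer, KafkaConsumer  # pip install kafka-python

BOOTSTRAP = "localhost:9092"  # assumed port exposed by the docker-compose broker
TOPIC = "demo-topic"          # placeholder topic name

# Producer: serialize dicts to JSON bytes and send them to the topic.
producer = KafkaProducer(
    bootstrap_servers=BOOTSTRAP,
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)
producer.send(TOPIC, {"message_id": 1, "payload": "hello kafka"})
producer.flush()

# Consumer: read the topic from the beginning and decode the JSON payloads.
consumer = KafkaConsumer(
    TOPIC,
    bootstrap_servers=BOOTSTRAP,
    auto_offset_reset="earliest",
    consumer_timeout_ms=5000,  # stop iterating if nothing arrives for 5 s
    value_deserializer=lambda b: json.loads(b.decode("utf-8")),
)
for message in consumer:
    print(message.value)
```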

Data Transfer from Azure SQL to Snowflake: My Experience Using Slowly Changing Dimensions, Change Data Capture, and Column Data Masking with Fivetran

Creating a data pipeline from Azure SQL Server to Snowflake can be a complex and time-consuming task, but with the help of Fivetran, the process is made much simpler. In this blog post, we will discuss the steps involved in creating a robust pipeline that includes the use of Slowly Changing Dimensions Type 2 for historical data, Change Data Capture for transferring only new data to Snowflake, and Column Data Masking for hiding Personally Identifiable Information (PII) for security reasons. We treat the Azure SQL Database as an OLTP database from which we want to pull the data into the OLAP system. This blog is split into multiple parts: Azure SQL Server Setup, Snowflake Setup, Fivetran Setup, Results, and The End. Slowly Changing Dimensions Type 2 is a technique used to track changes to data over time. It allows us to maintain a history of changes to data, which is essential when working with large datasets. By using this method, we can ensure that our data is accurate and up t...
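As one illustrative piece of the source-side setup, here is a hedged sketch of enabling Change Data Capture on the Azure SQL database with pyodbc; the server, credentials, and the dbo.Customers table are placeholders, and the Fivetran connector and masking configuration from the post are not shown here:

```python
import pyodbc  # pip install pyodbc

# Assumed placeholder connection details and sample table.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 18 for SQL Server};"
    "SERVER=<server>.database.windows.net;"
    "DATABASE=<database>;UID=<user>;PWD=<password>",
    autocommit=True,
)
cur = conn.cursor()

# Enable Change Data Capture at the database level...
cur.execute("EXEC sys.sp_cdc_enable_db")

# ...and then for the table whose new and changed rows should flow downstream.
cur.execute("""
    EXEC sys.sp_cdc_enable_table
         @source_schema = N'dbo',
         @source_name   = N'Customers',
         @role_name     = NULL
""")

cur.close()
conn.close()
```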