External Data: A Guide to Working with Data Outside of Snowflake
Today, we'll explore how to work with data outside of Snowflake.
Specifically, we'll walk through the steps of creating a stage to external files in an Azure Blob container, creating a file format for it, and then creating a view on it using data on one of my favorite topics: Assassin's Creed.
This will allow us to access and manipulate the data stored within the container.
Then we'll create a file format for the data in the container, ensuring that Snowflake can properly interpret and utilize the information.
In this case, we'll focus on all Assassin's Creed games, including their release year and the name of their protagonist.
By the end of this process, we'll have a good view of the Assassin's Creed franchise that can be easily queried and analyzed.
Let's begin by setting up the necessary components.
First, we'll use the sysadmin role and create a new database and schema.
Once the stage is created, it's a good idea to check that it was created correctly:
To list all the files that exist in the stage, we can use the following command:
When loading data from a file in a stage, Snowflake automatically assigns a default column name to each column in the file, starting with $1 for the first column, $2 for the second column, and so on.
Using the $ symbol allows us to reference these automatically assigned columns in our query.
From this, we can see that the delimiter used in the file is ";", but also we can see that some entries have spaces before their name.
To fix this, we can create a new file format that includes a TRIM_SPACE parameter set to TRUE.
The FIELD_DELIMITER parameter is set to ";", which matches the delimiter used in the file.
The TRIM_SPACE parameter is set to TRUE to remove any unwanted spaces.
Now that we have a working query to select the data from the file, we can create a view on top of it for easy access in the future.
This view is a virtual table that references the data in the file, rather than storing a copy of it.
Additionally, by creating a view, we could also apply security policies and masking rules on it so that users can only see the data that they are authorized to see.
We can now easily query this view just like we would a regular table.
This is because a view in Snowflake simply reuses the query that was used to create it, rather than storing a copy of the data.
So, when the view is queried, the data is queried again and the most up-to-date version is returned.
Instead, the view will always provide the most current information available.
One example is external tables, which allow you to define a table in Snowflake that references data stored outside of Snowflake.
These tables can be updated manually or even automatically by Snowflake using tasks or with a bit help of the cloud providers.
In future blog posts, we will for sure dive deeper into these things. Stay tuned for more updates on this topic.
Thanks for reading!
Greetings
Patrick
Comments
Post a Comment