How to use a custom Docker image to access a Snowflake DB in Deepnote.
This post will be about Data Science Notebook Tools and their limitations when it comes to R support. Additionally I want to show how to create a custom Docker image with an added Snowflake Driver to connect to database tables from within an R Deepnote notebook.
Why am I looking into Notebook tools in the first place? Because in our Data Team, we decided to move away from Tableau as our main Data visualization or BI tool and therefore explored other options.
There are several reasons leading to this step. One relatively obvious one is the high price.
Tableau can be a costly investment for organizations, especially as the number of users increases. Every user needs to have an account even if s/he only wants to view dashboards and is not planning to create their own. Therefore it isn’t easily possible to share links to dashboards to an entire organization. There are of course ways to share accounts but this is not the intended use of this software.
Another point are the limited collaboration features. Tableau’s collaboration and sharing features have been improving over time, but they still fall short when compared to some of the newer BI tools. We’d like a platform that fosters collaboration, enables real-time sharing, and allows stakeholders to interact with the data directly. Some modern BI tools provide better user management, real-time commenting, and collaborative features that streamline decision-making processes.
In the end we weren’t able to leverage the potential of Tableau and bring its value to our entire organization. This lead us to explore other options.
One possibility is the combination of a data science notebook tool for (advanced) data analyses or ad hoc reports as well as Googles lookerstudio (which is free) for standard visualizations and metrics.
We looked into multiple options and checked out tools like Hyperquery, Husprey, Deepnote, Noteable etc. We’re currently testing Deepnote further as it has SQL, Python as well as R support and the option of scheduling notebooks. Additionally it seems that there is a decent commenting and collaboration functionality. That means we can hopefully collaborate in real-time and share notebooks easily enabling seamless discussions and feedback. Team members as well as other stakeholders can leave comments directly on code cells, graphs or markdown sections, promoting communication which may lead to enhancing the overall quality of data analysis projects.
As described in a previous post we’re working with a Snowflake/dbt datawarehouse setup and I’d like to do analyses in R. One requirement is that I can connect to database tables and access data directly within a notebook. When testing notebook tools, I discovered that the support for the R language is quite bad. The dominance of Python as the lingua franca of Data Science led to businesses focusing on supporting mainly Python.
If they provide the option to also run R notebooks the set of features is incredibly limited. That usually means you cannot
I posed a question in the deepnote community forum discussing the R support and a Customer Support Manager confirmed that they are not actively working on improving the R experience.
One nice feature Deepnote offers is the possibility to use custom environments through Docker images. Here I thought this at least allowed me to avoid the shortcoming of not being able to query snowflake db tables from within an R notebook.
They provide their own R docker images on dockerhub from which I can then add a snowflake odbc driver. The requirements for the custom image is as follows:
There are many wonderful blog posts, introductions and helpful links for using Docker:
First we’ll create a new directory.
mkdir docker-r && cd docker-r
As we want to include a Snowflake odbc Driver we visit their website
and select the relevant file that we want to download. In our case we need a driver built for the
Linux/amd64 platform (as seen in the requirements).
This means I’ll download the snowflake_linux_x8664_odbc-2.25.12.tgz
file (x86-64 is also known as x64, AMD64 or Intel 64) and save it
in my newly created docker-r
folder.
Then we’ll need to create a Dockerfile. This will hold the instructions to build our image. Below are the contents of this file:
FROM deepnote/ir-with-libs:4.2.0
WORKDIR /app
COPY snowflake_linux_x8664_odbc-2.25.12.tgz /app/
RUN tar -xzvf snowflake_linux_x8664_odbc-2.25.12.tgz
RUN snowflake_odbc/unixodbc_setup.sh
When I unzipped the snowflake driver file I found the unixodbc_setup.sh
bash script that had all the relevant instructions to install the driver.
To read the Dockerfile:
deepnote/ir-with-libs:4.2.0
app
foldersnowflake_odbc/unixodbc_setup.sh
shell scriptNow we’ve got everything we need to build our image. We need to make sure that Docker Desktop is running (and has been installed previously) Then we can run
docker build --tag snowflaker .
If everything works well this builds our image. Afterwards we can run the image with
docker run snowflaker
Then we can start R from within this container and check if the Driver is installed by typing in
odbc::odbcListDrivers()
To host our newly created image on Dockerhub, I like to use the VS-Code Docker extension. There it is pretty straight forward. You can see on the left side all your images and containers and can right click to push them to Dockerhub.
This should then push your image to Dockerhub (here is the link to my image). In Deepnote you can point to your newly created image and are now able to run an R notebook that has a Snowflake Driver installed which lets you connect to the desired database table. One last step that is important is the handling of passwords. Here Deepnote offers the possibility to use Environment Variables.
A code cell establishing a connection in an R notebook in Deepnote migth look like this:
With the help of Sys.getenv("SNOWFLAKE_PWD)
we don’t have to expose our password
in the notebook.
That’s it! Now you’re able to connect to your Snowflake DB tables from within an Deepnote R notebook.
Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".
For attribution, please cite this work as
Gröninger (2023, July 29). Lukas R Blog: R Support for Data Science Notebook Tools. Retrieved from https://lukas-r.blog/posts/2023-07-29-r-support-for-data-science-notebook-tools/
BibTeX citation
@misc{gröninger2023r, author = {Gröninger, Lukas}, title = {Lukas R Blog: R Support for Data Science Notebook Tools}, url = {https://lukas-r.blog/posts/2023-07-29-r-support-for-data-science-notebook-tools/}, year = {2023} }