R Support for Data Science Notebook Tools

R Database connection Docker

How to use a custom Docker image to access a Snowflake DB in Deepnote.

Author

Affiliation

Lukas Gröninger

 

Published

July 28, 2023

Citation

Gröninger, 2023

This post will be about Data Science Notebook Tools and their limitations when it comes to R support. Additionally I want to show how to create a custom Docker image with an added Snowflake Driver to connect to database tables from within an R Deepnote notebook.

Random Ernst Ludwig Kirchner Dall-E image about a notebook (and R)

Why am I looking into Notebook tools in the first place? Because in our Data Team, we decided to move away from Tableau as our main Data visualization or BI tool and therefore explored other options.

Why did we decide to move away from Tableau?

There are several reasons leading to this step. One relatively obvious one is the high price.

Source: https://www.tableau.com/pricing/teams-orgs

Tableau can be a costly investment for organizations, especially as the number of users increases. Every user needs to have an account even if s/he only wants to view dashboards and is not planning to create their own. Therefore it isn’t easily possible to share links to dashboards to an entire organization. There are of course ways to share accounts but this is not the intended use of this software.

Another point are the limited collaboration features. Tableau’s collaboration and sharing features have been improving over time, but they still fall short when compared to some of the newer BI tools. We’d like a platform that fosters collaboration, enables real-time sharing, and allows stakeholders to interact with the data directly. Some modern BI tools provide better user management, real-time commenting, and collaborative features that streamline decision-making processes.

In the end we weren’t able to leverage the potential of Tableau and bring its value to our entire organization. This lead us to explore other options.

What else is out there?

One possibility is the combination of a data science notebook tool for (advanced) data analyses or ad hoc reports as well as Googles lookerstudio (which is free) for standard visualizations and metrics.

We looked into multiple options and checked out tools like Hyperquery, Husprey, Deepnote, Noteable etc. We’re currently testing Deepnote further as it has SQL, Python as well as R support and the option of scheduling notebooks. Additionally it seems that there is a decent commenting and collaboration functionality. That means we can hopefully collaborate in real-time and share notebooks easily enabling seamless discussions and feedback. Team members as well as other stakeholders can leave comments directly on code cells, graphs or markdown sections, promoting communication which may lead to enhancing the overall quality of data analysis projects.

R Support for Data Science Notebook Tools (Deepnote)

As described in a previous post we’re working with a Snowflake/dbt datawarehouse setup and I’d like to do analyses in R. One requirement is that I can connect to database tables and access data directly within a notebook. When testing notebook tools, I discovered that the support for the R language is quite bad. The dominance of Python as the lingua franca of Data Science led to businesses focusing on supporting mainly Python.

Random Ernst Ludwig Kirchner Dall-E image of a big Python looking at a man with a notebook

If they provide the option to also run R notebooks the set of features is incredibly limited. That usually means you cannot

I posed a question in the deepnote community forum discussing the R support and a Customer Support Manager confirmed that they are not actively working on improving the R experience.

Custom Environments through Docker

One nice feature Deepnote offers is the possibility to use custom environments through Docker images. Here I thought this at least allowed me to avoid the shortcoming of not being able to query snowflake db tables from within an R notebook.

They provide their own R docker images on dockerhub from which I can then add a snowflake odbc driver. The requirements for the custom image is as follows:

Source: https://deepnote.com/docs/custom-environments

Creating a Docker Image

There are many wonderful blog posts, introductions and helpful links for using Docker:

First we’ll create a new directory.

mkdir docker-r && cd docker-r

As we want to include a Snowflake odbc Driver we visit their website and select the relevant file that we want to download. In our case we need a driver built for the Linux/amd64 platform (as seen in the requirements). This means I’ll download the snowflake_linux_x8664_odbc-2.25.12.tgz file (x86-64 is also known as x64, AMD64 or Intel 64) and save it in my newly created docker-r folder.

Then we’ll need to create a Dockerfile. This will hold the instructions to build our image. Below are the contents of this file:

FROM deepnote/ir-with-libs:4.2.0

WORKDIR /app

COPY snowflake_linux_x8664_odbc-2.25.12.tgz /app/

RUN tar -xzvf snowflake_linux_x8664_odbc-2.25.12.tgz 

RUN snowflake_odbc/unixodbc_setup.sh

When I unzipped the snowflake driver file I found the unixodbc_setup.sh bash script that had all the relevant instructions to install the driver. To read the Dockerfile:

Building the Docker image and host it on Dockerhub

Now we’ve got everything we need to build our image. We need to make sure that Docker Desktop is running (and has been installed previously) Then we can run

docker build --tag snowflaker .

If everything works well this builds our image. Afterwards we can run the image with

docker run snowflaker

Then we can start R from within this container and check if the Driver is installed by typing in

odbc::odbcListDrivers()

To host our newly created image on Dockerhub, I like to use the VS-Code Docker extension. There it is pretty straight forward. You can see on the left side all your images and containers and can right click to push them to Dockerhub.

This should then push your image to Dockerhub (here is the link to my image). In Deepnote you can point to your newly created image and are now able to run an R notebook that has a Snowflake Driver installed which lets you connect to the desired database table. One last step that is important is the handling of passwords. Here Deepnote offers the possibility to use Environment Variables.

A code cell establishing a connection in an R notebook in Deepnote migth look like this:

Example of how to connect to Database table With the help of Sys.getenv("SNOWFLAKE_PWD) we don’t have to expose our password in the notebook.

That’s it! Now you’re able to connect to your Snowflake DB tables from within an Deepnote R notebook.

Footnotes

    Reuse

    Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".

    Citation

    For attribution, please cite this work as

    Gröninger (2023, July 29). Lukas R Blog: R Support for Data Science Notebook Tools. Retrieved from https://lukas-r.blog/posts/2023-07-29-r-support-for-data-science-notebook-tools/

    BibTeX citation

    @misc{gröninger2023r,
      author = {Gröninger, Lukas},
      title = {Lukas R Blog: R Support for Data Science Notebook Tools},
      url = {https://lukas-r.blog/posts/2023-07-29-r-support-for-data-science-notebook-tools/},
      year = {2023}
    }