Integrating Git with Snowflake and Streamlit to Empower Machine Learning

Introduction:

Integrating Git with Snowflake supports efficient collaboration. Code changes are tracked transparently in version control, which reduces the risk of errors. Teams can use Snowflake's capabilities while keeping their code reliable and traceable.

This blog walks through two tasks. First, it links Snowflake to a Git repository. Second, it deploys an ML model in a Streamlit app whose scripts are sourced from Git. We used the Iris dataset from scikit-learn to build a classification model with Python scripts stored in Git, and then built a Streamlit app within Snowflake. Each step is explained in detail below.

To demonstrate this capability with the Iris flower dataset, the app includes variable distribution charts and prediction functionality.

Step 1: Create API Integration:

Create an API integration that defines how Snowflake interacts with the Git repository's API.

    • Create an API integration named git_sample_integration. It allows Git traffic from the Snowflake account to the specified URL or URL patterns.

Note: Creating an API integration requires the ACCOUNTADMIN role, or a role that has been granted the CREATE INTEGRATION privilege.

    • A SECRET is a Snowflake object that stores credentials securely. In this case it holds the username and access token for the Git repository, so these sensitive values do not have to be embedded in scripts.
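The following is a hedged sketch of Step 1. These Python helpers build the two statements as strings, so you can review them or run them through Snowpark's `session.sql(...).collect()`.

All of the following are placeholder assumptions chosen for illustration: the names `git_secret` and `git_sample_integration`, the GitHub URL prefix, the helper functions, and the use of `TYPE = password` with a personal access token. The secret is created first here so that the integration can reference it in `ALLOWED_AUTHENTICATION_SECRETS`.

```python
from __future__ import annotations


def _quote(value: str) -> str:
    """Wrap a value in single quotes, doubling embedded quotes (SQL string-literal escaping)."""
    return "'" + value.replace("'", "''") + "'"


def create_secret_sql(name: str, username: str, token: str) -> str:
    """Build a CREATE SECRET statement storing Git credentials (username + access token)."""
    return (
        f"CREATE OR REPLACE SECRET {name}\n"
        "  TYPE = password\n"
        f"  USERNAME = {_quote(username)}\n"
        f"  PASSWORD = {_quote(token)};"
    )


def create_api_integration_sql(name: str, allowed_prefixes: list[str], secret: str) -> str:
    """Build a CREATE API INTEGRATION statement allowing Git traffic to the given URL prefixes."""
    prefixes = ", ".join(_quote(p) for p in allowed_prefixes)
    return (
        f"CREATE OR REPLACE API INTEGRATION {name}\n"
        "  API_PROVIDER = git_https_api\n"
        f"  API_ALLOWED_PREFIXES = ({prefixes})\n"
        f"  ALLOWED_AUTHENTICATION_SECRETS = ({secret})\n"
        "  ENABLED = TRUE;"
    )


if __name__ == "__main__":
    # Placeholder values: replace with your own Git host, user, and token.
    print(create_secret_sql("git_secret", "my-git-user", "<personal-access-token>"))
    print(create_api_integration_sql(
        "git_sample_integration", ["https://github.com/my-org"], "git_secret"))
```

Run both statements with a role that has the privileges described in the note above.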

Step 2: Create Git Repository:

    • Create a Git repository object in the Snowflake account that references the API integration created above.
    • Identify the branch that contains the files needed for development. Files from the repository are synchronized into Snowflake, and this branch is the one referenced in the following steps.
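The following is a hedged sketch of Step 2, in the same style as Step 1. The first helper builds a CREATE GIT REPOSITORY statement that points at the repository URL (`ORIGIN`) through the API integration. The second builds the `@repo/branches/<branch>` path that Snowflake uses to address a branch of the clone.

The repository name `git_sample`, the URL, and the helper names are placeholders chosen for illustration. `GIT_CREDENTIALS` is only needed for private repositories.

```python
from __future__ import annotations


def create_git_repository_sql(name: str, origin: str, api_integration: str,
                              secret: str | None = None) -> str:
    """Build a CREATE GIT REPOSITORY statement that clones `origin` via an API integration."""
    lines = [
        f"CREATE OR REPLACE GIT REPOSITORY {name}",
        f"  ORIGIN = '{origin}'",
        f"  API_INTEGRATION = {api_integration}",
    ]
    if secret:  # credentials are only required for private repositories
        lines.append(f"  GIT_CREDENTIALS = {secret}")
    return "\n".join(lines) + ";"


def branch_path(repo: str, branch: str) -> str:
    """Path to one branch of the repository clone, e.g. @git_sample/branches/main."""
    return f"@{repo}/branches/{branch}"


if __name__ == "__main__":
    print(create_git_repository_sql(
        "git_sample", "https://github.com/my-org/my-repo.git",
        "git_sample_integration", "git_secret"))
    print(branch_path("git_sample", "main"))
```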

Step 3: List the files:

    • Now, list the files available in the Git repository to confirm that they have been synchronized.
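Files in the repository clone are listed with the LS command against a branch path. The hypothetical helper below builds that statement, optionally narrowed to a subdirectory. The names `git_sample`, `main`, and `iris_app` are placeholders.

```python
def list_files_sql(repo: str, branch: str, subdir: str = "") -> str:
    """Build an LS statement listing files on one branch of a Snowflake Git repository clone."""
    path = f"@{repo}/branches/{branch}/"
    if subdir:
        # Normalise leading/trailing slashes so the path has exactly one separator.
        path += subdir.strip("/") + "/"
    return f"LS {path};"


if __name__ == "__main__":
    print(list_files_sql("git_sample", "main"))
    print(list_files_sql("git_sample", "main", "iris_app"))
```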

Step 4: Create Streamlit app:

    • Choose or create the database in which the Streamlit app will be created.
    • Choose the root location path in the Git repository where the Python script for the Streamlit app is located.
    • Create a Streamlit app named git_example_iris with the CREATE STREAMLIT command, pointing it at that path.

Here, MAIN_FILE is the Python script that runs the app, and ROOT_LOCATION is the path of the directory in the Git repository that contains it.
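The following is a sketch of the Step 4 command. It assumes the following placeholders:

- the repository clone is `my_db.my_schema.git_sample`;
- the script sits in an `iris_app` directory on `main`;
- the app runs on a warehouse named `my_wh`.

The helper assembles a CREATE STREAMLIT statement from these values. Only `git_example_iris` and `Iris_reg.py` come from this walkthrough.

```python
def create_streamlit_sql(app: str, repo: str, branch: str, directory: str,
                         main_file: str, warehouse: str) -> str:
    """Build a CREATE STREAMLIT statement whose code is read from a Git repository branch."""
    root = f"@{repo}/branches/{branch}"
    if directory.strip("/"):
        root += "/" + directory.strip("/")
    # MAIN_FILE is given relative to ROOT_LOCATION, with a leading slash.
    main = "/" + main_file.lstrip("/")
    return (
        f"CREATE OR REPLACE STREAMLIT {app}\n"
        f"  ROOT_LOCATION = '{root}'\n"
        f"  MAIN_FILE = '{main}'\n"
        f"  QUERY_WAREHOUSE = {warehouse};"
    )


if __name__ == "__main__":
    print(create_streamlit_sql("git_example_iris", "my_db.my_schema.git_sample",
                               "main", "iris_app", "Iris_reg.py", "my_wh"))
```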

Step 5: App Development:

    • When the app is deployed this way and needs a new Python library, the library cannot be installed or imported directly into the Streamlit app as it typically would be.
    • To address this, we use an environment file, environment.yml.
    • This file lists the required Python packages. It sits in the same Git directory as the Python script that launches the Streamlit app (in this case, Iris_reg.py).

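environment.yml file format: the file names the environment, lists snowflake under channels (Streamlit in Snowflake installs packages from the Snowflake Anaconda channel), and lists each required package under dependencies.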
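To illustrate the environment.yml layout, the hypothetical helper below renders the file from a package list. The environment name `sf_env` and the example package list are assumptions; the post does not show the exact file used for the Iris app.

```python
from __future__ import annotations


def render_environment_yml(packages: list[str], env_name: str = "sf_env") -> str:
    """Render a minimal environment.yml for Streamlit in Snowflake."""
    lines = [f"name: {env_name}", "channels:", "  - snowflake", "dependencies:"]
    # dict.fromkeys drops duplicate entries while keeping the caller's order.
    lines += [f"  - {pkg}" for pkg in dict.fromkeys(packages)]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_environment_yml(["scikit-learn", "pandas", "seaborn"]), end="")
```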

    • After updating environment.yml and committing the change to Git, refresh the repository in Snowflake with ALTER GIT REPOSITORY <repository_name> FETCH so that the changes are reflected in the Streamlit app.
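The refresh can also be run from Python through Snowpark. The statement `ALTER GIT REPOSITORY <name> FETCH` pulls the latest commits into Snowflake's copy of the repository.

In this sketch, `fetch_sql` and `refresh_repository` are hypothetical helpers. `session` is assumed to be a `snowflake.snowpark.Session`, whose `sql(...).collect()` executes a statement and returns its result rows.

```python
def fetch_sql(repo: str) -> str:
    """Build the statement that fetches the latest commits from the remote Git repository."""
    return f"ALTER GIT REPOSITORY {repo} FETCH;"


def refresh_repository(session, repo: str):
    """Run the FETCH through a Snowpark Session (or any object exposing .sql(query).collect())."""
    return session.sql(fetch_sql(repo)).collect()
```

Typical usage after pushing a change is `refresh_repository(session, "git_sample")`, where `git_sample` is a placeholder repository name.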

Now consider a scenario where a new package is introduced in the Python file. As mentioned earlier, it cannot be imported directly in the Streamlit app. If the package has not been added to environment.yml, the app fails with an error because the module cannot be found. In this example, the script uses the seaborn package (imported as sns), but seaborn has not yet been added to environment.yml.

The error is resolved by adding seaborn under dependencies in environment.yml and fetching the change into Snowflake.
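A small pre-push check can catch this class of error by comparing the script's imports with environment.yml. The sketch below is our own illustration, not a Snowflake tool. It uses Python's `ast` module to collect top-level imports, then reports any package that environment.yml does not declare.

Several parts are assumptions:

- the import-to-package mapping;
- the `PROVIDED` set of modules treated as available without an entry;
- the deliberately simple YAML parsing, which handles only flat `- package` lists.

A real project could use a proper YAML parser instead.

```python
from __future__ import annotations

import ast
import sys

# Import names that differ from their package names (extend as needed).
IMPORT_TO_PACKAGE = {"sklearn": "scikit-learn"}
# Assumption for this sketch: these are available without an environment.yml entry.
PROVIDED = {"streamlit", "snowflake"}
# Standard-library modules never need an entry (sys.stdlib_module_names exists on Python 3.10+).
STDLIB = set(getattr(sys, "stdlib_module_names", {"os", "sys", "math", "json", "re"}))


def imported_modules(source: str) -> set[str]:
    """Top-level module names imported by a Python script (relative imports ignored)."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            found.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            found.add(node.module.split(".")[0])
    return found


def declared_packages(env_yml: str) -> set[str]:
    """Package names listed under 'dependencies:' in a simple environment.yml (pins dropped)."""
    packages, in_deps = set(), False
    for raw in env_yml.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if not raw[0].isspace() and not line.startswith("-") and ":" in line:
            # A new top-level key: only items under 'dependencies' count.
            in_deps = line.split(":", 1)[0].strip() == "dependencies"
        elif in_deps and line.startswith("- "):
            name = line[2:].strip()
            for sep in ("==", ">=", "<=", "=", ">", "<"):
                name = name.split(sep)[0]
            packages.add(name.strip().lower())
    return packages


def missing_packages(source: str, env_yml: str) -> set[str]:
    """Packages the script imports but environment.yml does not declare."""
    needed = {IMPORT_TO_PACKAGE.get(m, m).lower()
              for m in imported_modules(source) - PROVIDED - STDLIB}
    return needed - declared_packages(env_yml)
```

Run against the script from this example before adding seaborn, the check would report seaborn as missing.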

With seaborn listed, the app loads without errors and displays the visualizations and predictions.

Prediction:

The classification model is trained on the Iris dataset, and the app displays its prediction results.
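The post does not show the model code. The following is therefore only a sketch of how the classification step might look. It assumes scikit-learn's `LogisticRegression`, which is our choice, not necessarily the model in `Iris_reg.py`. The helper names, the 80/20 split, and the random seed are also illustrative assumptions.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split


def train_iris_classifier(random_state: int = 42):
    """Train a logistic-regression classifier on Iris; return (model, test accuracy, class names)."""
    iris = load_iris()
    X_train, X_test, y_train, y_test = train_test_split(
        iris.data, iris.target, test_size=0.2,
        random_state=random_state, stratify=iris.target,
    )
    model = LogisticRegression(max_iter=1000)  # extra iterations so the solver converges
    model.fit(X_train, y_train)
    return model, model.score(X_test, y_test), iris.target_names


def predict_species(model, class_names, sepal_length, sepal_width, petal_length, petal_width):
    """Predict the species name for a single flower from its four measurements (cm)."""
    label = model.predict([[sepal_length, sepal_width, petal_length, petal_width]])[0]
    return str(class_names[label])
```

In the Streamlit script, the four measurements could come from `st.slider` widgets and the predicted species could be shown with `st.write`. Caching the trained model, for example with `st.cache_resource`, avoids retraining on every rerun.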

Conclusion:

In this use case, we connected Snowflake to a Git repository and deployed a Streamlit app for an ML model from the scripts stored there.

The Git integration makes code and scripts reusable. The same repository can be connected to any Snowflake account, which supports collaboration and simplifies deploying applications. The FETCH option refreshes Snowflake's copy of the repository so that updates made in Git are reflected in the app. Storing the Git credentials in a Snowflake SECRET kept them secure.

Overall, integrating Git with Snowflake enables collaborative development with transparent, accountable code changes. It also streamlines development and supports continuous integration and deployment, which improves productivity and code reliability.

Cittabase is a select partner with Snowflake. Please feel free to contact us regarding your Snowflake solution needs. Our Snowflake solutions encompass a suite of services for your data integration and migration needs. We are committed to providing personalized assistance and support customized to your requirements.