Snowflake Native App

Documentation for the ChatBees Snowflake Native App

Welcome to ChatBees

Run a state-of-the-art serverless RAG pipeline within your own Snowflake account with ChatBees. ChatBees provides enterprise users with a complete and flexible end-to-end solution.

With a set of simple APIs, you can ingest and index a variety of unstructured and semi-structured data formats, including PDF, CSV, DOCX, TXT, and MD.

Required privileges for installation

ChatBees requires the following global privileges:

  • CREATE COMPUTE POOL: Required to run the ChatBees service inside Snowpark Container Services.

  • CREATE WAREHOUSE: Required for ChatBees to execute queries.

  • IMPORTED PRIVILEGES ON SNOWFLAKE DB: Required to access Snowflake Cortex embedding and completion functions.

Note: Granting the IMPORTED PRIVILEGES ON SNOWFLAKE DB privilege allows ChatBees to see information about usage and costs associated with the consumer account. ChatBees is committed to never accessing this data or any other unrelated user data.
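These privileges are typically approved in Snowsight when the app requests them at install time. They can also be granted manually; a sketch, assuming the app is installed as chatbees_app:

```sql
-- Grant the requested account-level privileges to the app
-- (assumes the installed app name is chatbees_app)
GRANT CREATE COMPUTE POOL ON ACCOUNT TO APPLICATION chatbees_app;
GRANT CREATE WAREHOUSE ON ACCOUNT TO APPLICATION chatbees_app;
GRANT IMPORTED PRIVILEGES ON DATABASE SNOWFLAKE TO APPLICATION chatbees_app;
```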

Additionally, you can configure egress network connections to:

  • HuggingFace if you configure ChatBees to host an embedding model within the app.

  • Third-party API endpoints, such as Azure OpenAI, if you configure ChatBees to access models outside of Snowflake.

See the Configure models section for details.
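The exact egress configuration the app expects can be inspected from the app itself via admin.get_external_access_config (documented in the APIs section):

```sql
-- Inspect the external access configuration ChatBees requires
-- for third-party model vendors
CALL chatbees_app.admin.get_external_access_config('model_vendor');
```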

Quickstart

First, configure the embedding and completion models you wish to use with the ChatBees app. ChatBees supports models from major LLM vendors, including Snowflake Cortex.

You can change and experiment with the completion model at any time. However, you must delete all existing collections before changing the embedding model.

For example, use the following command to configure ChatBees with the voyage-multilingual-2 embedding model and the llama3.1-70b completion model from Snowflake Cortex.

NOTE: Replace chatbees_app with your actual installed app name.

-- Create ChatBees with Cortex models
CALL chatbees_app.admin.start_app('cortex/voyage-multilingual-2', 'cortex/llama3.1-70b', 'CPU_X64_S');  

-- Make sure to wait until ChatBees service is ready before proceeding
CALL chatbees_app.admin.service_status();

You can also configure the compute instance family of ChatBees. We recommend:

  • CPU_X64_S if you use Snowflake Cortex models exclusively

  • GPU_NV_S or GPU_NV_XS if you configure a HuggingFace embedding model. You can restart ChatBees with a different compute instance family at any time to scale up or down.

Next, create a collection. A Collection serves as the fundamental unit for data organization. You can put different data sets into different collections.

SELECT chatbees_app.api.create_collection('llm_documents');

Next, ingest all files inside a stage via the chatbees_app.api.ingest_files function. ChatBees supports a variety of file formats, such as PDF, CSV, TXT, and DOCX.

-- Grant ChatBees app read-only access to mystage  
GRANT USAGE ON DATABASE spcs_db TO APPLICATION chatbees_app;  
GRANT USAGE ON SCHEMA spcs_db.spcs_schema TO APPLICATION chatbees_app;  
GRANT READ ON STAGE spcs_db.spcs_schema.mystage TO APPLICATION chatbees_app;  
  
SELECT chatbees_app.api.ingest_files('llm_documents', 'spcs_db.spcs_schema.mystage');
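ingest_files reads the stage's directory table (see the api.ingest_files reference), so if ingestion finds no files, check that the directory table is enabled and refreshed. Assuming the stage above:

```sql
-- Enable the directory table on the stage (if not already enabled)
ALTER STAGE spcs_db.spcs_schema.mystage SET DIRECTORY = (ENABLE = TRUE);

-- Refresh it so newly uploaded files are visible to ingest_files
ALTER STAGE spcs_db.spcs_schema.mystage REFRESH;
```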

Finally, you can query your collection. ChatBees supports both RAG and semantic search.

-- Semantic search, returns top 15 results
SELECT chatbees_app.api.search('llm_documents', 'what is LLM', 15);  
  
-- RAG  
SELECT chatbees_app.api.ask('llm_documents', 'what is LLM');  

Configure models

You can configure ChatBees to use a variety of models from different vendors. ChatBees can also host a HuggingFace embedding model for you within the app. Select embedding and completion models from the supported vendors below; they can be from the same vendor or different vendors.

Supported embedding model vendors and hosting options

  • Snowflake Cortex: All models

  • Hosted HuggingFace: All public models. Contact us if you need access to gated models

  • OpenAI: All models

  • Azure OpenAI: All models

Supported completion model vendors and hosting options

  • Snowflake Cortex: Llama3.1 and 3.2 models. Support for other models is coming soon

  • OpenAI: All models

  • Azure OpenAI: All models

  • Anthropic: (coming soon)

More vendors and hosting options can be supported upon request.

If you choose a third-party model, network access must be set up under the "Connections" tab.

For HuggingFace models: ChatBees downloads the relevant model files from one of HuggingFace's CDN endpoints (cdn-lfs.hf.co, cdn-lfs-eu-1.hf.co, cdn-lfs-us-1.hf.co).

For OpenAI models: ChatBees connects to api.openai.com.

For Azure OpenAI models: ChatBees connects to your Azure OpenAI deployment at <resource-name>.openai.azure.com.
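For reference, the Snowflake objects behind such an egress connection look roughly like the following. The Connections tab normally handles this for you, and the object names here are illustrative:

```sql
-- Network rule allowing egress to the OpenAI API endpoint
CREATE OR REPLACE NETWORK RULE openai_egress_rule
  MODE = EGRESS
  TYPE = HOST_PORT
  VALUE_LIST = ('api.openai.com:443');

-- External access integration the app can be granted access to
CREATE OR REPLACE EXTERNAL ACCESS INTEGRATION openai_access_integration
  ALLOWED_NETWORK_RULES = (openai_egress_rule)
  ENABLED = TRUE;
```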

Next, configure access credentials if you use OpenAI or Azure OpenAI models.

-- Configure OpenAI
CALL chatbees_app.admin.configure_openai('<openai_key>');

-- Configure Azure OpenAI
CALL chatbees_app.admin.configure_azure_openai('<azure_openai_key>', '<azure_openai_endpoint>', '<api_version>');

Finally, specify the embedding and completion models you'd like to use. You can mix and match embedding and completion models from any supported vendor! Make sure to select a GPU instance if you're running an embedding model within ChatBees, for optimal performance.

-- Use Alibaba-NLP/gte-large-en-v1.5 embedding model hosted within ChatBees  
CALL chatbees_app.admin.start_app('Alibaba-NLP/gte-large-en-v1.5', 'cortex/llama3.1-70b', 'GPU_NV_S');  

-- Use OpenAI
CALL chatbees_app.admin.start_app('openai/text-embedding-3-large', 'openai/gpt-4o', 'CPU_X64_S');  

-- Use Azure OpenAI
CALL chatbees_app.admin.start_app('azureopenai/embedding_deployment', 'azureopenai/completion_deployment', 'CPU_X64_S');  

-- Mix and match!
CALL chatbees_app.admin.start_app('cortex/voyage-multilingual-2', 'openai/gpt-4o', 'CPU_X64_S');  

Updating embedding and completion models

You can restart the ChatBees app with a different completion model at any time. First call admin.stop_app() to stop the existing ChatBees service, then start the app with the new configuration.

-- Use llama3.1-8b completion model  
CALL chatbees_app.admin.stop_app();
CALL chatbees_app.admin.start_app('cortex/voyage-multilingual-2', 'cortex/llama3.1-8b', 'CPU_X64_S');  
  
-- Later on, experiment with the llama3.1-70b model.
-- Stop the service, then restart the ChatBees app with the new config.
CALL chatbees_app.admin.stop_app();
CALL chatbees_app.admin.start_app('cortex/voyage-multilingual-2', 'cortex/llama3.1-70b', 'CPU_X64_S');  

However, the embedding model can only be changed when the app does not have any collections. If you need to change the embedding model, delete all existing collections first.

-- Use cortex/voyage-multilingual-2 embedding model  
CALL chatbees_app.admin.start_app('cortex/voyage-multilingual-2', 'cortex/llama3.1-70b', 'CPU_X64_S');  
  
-- Later on, experiment with multilingual-e5-large embedding model.  
  
-- All collections must be dropped first  
SELECT chatbees_app.api.delete_collection(*) FROM VALUES ('col1'), ('col2'), ('col3'), ... ;  
-- Confirm there are no collections left  
SELECT chatbees_app.api.list_collections();  
-- Change embedding model, then recreate collections and ingest data
CALL chatbees_app.admin.start_app('cortex/multilingual-e5-large', 'cortex/llama3.1-70b', 'CPU_X64_S');  

Performance considerations

The performance of the ChatBees app depends on both the ChatBees RAG engine and the performance of the underlying embedding and completion models.

For example, if ChatBees is configured with Snowflake Cortex, the latency you experience will depend directly on the performance of the Cortex models.

  • Data ingestion: uses embedding model

  • Semantic search: uses embedding model

  • RAG: uses both embedding and completion models

You can export access logs to view a detailed breakdown of RAG performance:

-- Export access logs
SELECT chatbees_app.api.export_access_logs();

-- Inspect performance metrics
SELECT * FROM chatbees_app.access_logs.access_log_1730933619388682450;

-- Below is an example of a RAG metric. 4.8s was spent in the completion model.
-- You can choose a more performant model to reduce the overall runtime of the API call.
{
  "completion_time_ms": 4892,
  "duration_ms": 5501,
  "get_embedding_time_ms": 0,
  "vector_search_time_ms": 463,
  "input_tokens": 8025,
  "output_tokens": 597,
  ... other metrics
}

Tips to optimize the performance of your ChatBees app:

  • Completion time too high: Try a smaller model or a different vendor

  • Embedding time too high: Try a smaller model or a different vendor

  • Embedding time too high (HuggingFace): Try a smaller embedding model, or use a GPU instance

APIs

admin.start_app

Starts the ChatBees app. You can invoke this procedure again to change the embedding or completion model.

  • embedding_model: The embedding model to use. Format is '<vendor>/<model>'

  • completion_model: The completion model to use. Format is '<vendor>/<model>'

  • instance_family: The instance family of ChatBees compute. We recommend CPU_X64_S if you are not running a HuggingFace embedding model.

-- Example 1: Use Cortex embedding and completion models
CALL chatbees_app.admin.start_app('cortex/multilingual-e5-large', 'cortex/llama3.1-70b', 'CPU_X64_S');  

-- Example 2: Use the https://huggingface.co/Alibaba-NLP/gte-large-en-v1.5 embedding model.
-- Use a GPU instance to speed up embedding computation.
CALL chatbees_app.admin.start_app('Alibaba-NLP/gte-large-en-v1.5', 'cortex/llama3.1-70b', 'GPU_NV_S'); 

admin.stop_app

Stops the ChatBees app and its compute pool.

CALL chatbees_app.admin.stop_app();

admin.service_status

Returns the current status of the ChatBees app (e.g. PENDING, READY)

CALL chatbees_app.admin.service_status();

admin.get_external_access_config

Returns the required external access configuration.

  • reference_name: Name of the external access config. Only model_vendor is supported.

-- Example: Get model vendor external access integration config
CALL chatbees_app.admin.get_external_access_config('model_vendor');

admin.configure_openai

Configures the connection to OpenAI

  • api_key: OpenAI API key

CALL chatbees_app.admin.configure_openai('apikey');

admin.configure_azure_openai

Configures the connection to Azure OpenAI

  • api_key: Azure OpenAI API key

  • endpoint: Azure OpenAI endpoint

  • api_version: Azure OpenAI API version

CALL chatbees_app.admin.configure_azure_openai('apikey', 'https://org.openai.azure.com/', '2024-06-01');

admin.configure_huggingface

(coming soon) Configures the connection to HuggingFace to access gated repositories

  • api_key: HuggingFace API key

CALL chatbees_app.admin.configure_huggingface('apikey');

api.create_collection

Creates a collection. A Collection serves as the fundamental unit for data organization. You can put different data sets into different collections.

  • collection_name: Name of the collection

SELECT chatbees_app.api.create_collection('hello');

api.list_collections

Lists all collections in ChatBees

SELECT chatbees_app.api.list_collections();

api.delete_collection

Deletes a collection and its content

  • collection_name: Name of the collection

SELECT chatbees_app.api.delete_collection('hello');

api.ingest_files

Ingests files from a stage into the ChatBees RAG pipeline. Make sure to grant the ChatBees app the READ privilege on the stage, as well as the USAGE privilege on the parent schema and database. This function supports incremental ingestion: previously ingested files will not be ingested again.

  • collection_name: The name of the collection to ingest into

  • stage_path: The fully qualified name of the stage

If you accidentally delete a file and wish to ingest it again, re-upload the file to the stage, refresh the directory table, then invoke ingest_files. Note: you must enable the directory table on the stage and make sure it is refreshed before calling ingest_files.
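A sketch of that re-ingestion flow (the stage and collection names are illustrative):

```sql
-- After re-uploading the file, refresh the stage's directory table
ALTER STAGE mydb.myschema.mystage REFRESH;

-- Ingest again; the re-uploaded file will be picked up
SELECT chatbees_app.api.ingest_files('col1', 'mydb.myschema.mystage');
```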

-- Example: Ingest all files from mydb.myschema.mystage
SELECT chatbees_app.api.ingest_files('col1', 'mydb.myschema.mystage');

-- Add or modify new files from stage

-- Ingest again. New files will be ingested into collection
SELECT chatbees_app.api.ingest_files('col1', 'mydb.myschema.mystage');

api.list_files

Lists all files inside a collection.

  • collection_name: Name of the collection

SELECT chatbees_app.api.list_files('col1');

api.delete_file

Deletes a file from a collection

  • collection_name: Name of the collection

  • file_name: Name of the file to delete

SELECT chatbees_app.api.delete_file('col', 'myfile.pdf');

api.ask

ChatBees RAG API. Gets a conversational answer to your question, based on the data inside your collection.

  • collection_name: Name of the collection

  • question: Question to ask

SELECT chatbees_app.api.ask('col', 'What is a transformer?');

api.search

ChatBees semantic search API. Returns the top_k most relevant results from collection_name

  • collection_name: Name of the collection

  • question: Question to ask

  • top_k: How many search results to return

SELECT chatbees_app.api.search('col', 'What is a transformer?', 10);

api.export_access_logs

Exports all access logs into a Snowflake table.

SELECT chatbees_app.api.export_access_logs();

Cost considerations

The ChatBees app uses the following resources:

  • 1x MEDIUM warehouse for running queries. This warehouse is auto-suspended if ChatBees is not actively processing requests.

  • 1x compute pool with one node to run ChatBees Service. You can specify the compute pool instance family at app startup.

You can stop the ChatBees app at any time and resume later. Our cloud-native architecture ensures that your data is always persisted on durable storage.
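For example, to pause compute spend and pick up where you left off later:

```sql
-- Stop ChatBees to release the compute pool and let the warehouse suspend
CALL chatbees_app.admin.stop_app();

-- Resume later with the same model configuration
CALL chatbees_app.admin.start_app('cortex/voyage-multilingual-2', 'cortex/llama3.1-70b', 'CPU_X64_S');
```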
