Welcome to ChatBees
Run a state-of-the-art serverless RAG pipeline within your own Snowflake account with ChatBees. ChatBees provides enterprise users with a complete and flexible end-to-end solution.
With a set of simple APIs, you can ingest and index a variety of unstructured and semi-structured data, including PDF, CSV, DOCX, TXT, and MD files.
Required privileges for installation
ChatBees requires the following global privileges:
CREATE COMPUTE POOL: Required to run the ChatBees service inside Snowpark Container Services.
CREATE WAREHOUSE: Required for ChatBees to execute queries.
IMPORTED PRIVILEGES ON SNOWFLAKE DB: Required to access Snowflake Cortex embedding and completion functions.
Note: Granting the IMPORTED PRIVILEGES ON SNOWFLAKE DB privilege allows ChatBees to see information about usage and costs associated with the consumer account. ChatBees is committed to never accessing this data or other unrelated user data.
Additionally, you can configure EGRESS network connections to:
HuggingFace, if you configure ChatBees to host an embedding model within the app.
Third-party API endpoints, such as Azure OpenAI, if you configure ChatBees to access models outside of Snowflake.
See the Configure models section for details.
Quickstart
First, configure the embedding and completion models you wish to use with the ChatBees app. ChatBees supports models from major LLM vendors, including Snowflake Cortex.
You can change and experiment with the completion model at any time. However, you must delete all existing collections before changing the embedding model.
For example, use the following commands to configure ChatBees to use the voyage-multilingual-2 embedding model and the llama3.1-70b completion model from Snowflake Cortex.
NOTE: Replace chatbees_app with your actual installed app name.
-- Create ChatBees with Cortex models
CALL chatbees_app.admin.start_app('cortex/voyage-multilingual-2', 'cortex/llama3.1-70b', 'CPU_X64_S');
-- Make sure to wait until ChatBees service is ready before proceeding
CALL chatbees_app.admin.service_status();
You can also configure the compute instance family of ChatBees. We recommend:
CPU_X64_S if you use Snowflake Cortex models exclusively
GPU_NV_S or GPU_NV_XS if you configure a HuggingFace embedding model
You can restart ChatBees with a different compute instance family at any time to scale up or down.
Next, create a collection. A Collection serves as the fundamental unit for data organization. You can put different data sets into different collections.
SELECT chatbees_app.api.create_collection('llm_documents');
Next, ingest all files inside a stage via the chatbees_app.api.ingest_files function. ChatBees supports a variety of file formats such as PDF, CSV, TXT, and DOCX.
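If you do not have a stage yet, a minimal setup might look like the following sketch. The database, schema, and stage names are placeholders; note that ingest_files requires a directory table on the stage (see the API reference):

```sql
-- Hypothetical stage setup; names are placeholders
CREATE DATABASE IF NOT EXISTS spcs_db;
CREATE SCHEMA IF NOT EXISTS spcs_db.spcs_schema;
CREATE STAGE IF NOT EXISTS spcs_db.spcs_schema.mystage
  DIRECTORY = (ENABLE = TRUE);  -- directory table is required by ingest_files

-- Upload files from a local machine (run from SnowSQL or another client that supports PUT)
PUT file:///path/to/docs/*.pdf @spcs_db.spcs_schema.mystage;

-- Refresh the directory table so the new files are visible
ALTER STAGE spcs_db.spcs_schema.mystage REFRESH;
```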
-- Grant ChatBees app read-only access to mystage
GRANT USAGE ON DATABASE spcs_db TO APPLICATION chatbees_app;
GRANT USAGE ON SCHEMA spcs_db.spcs_schema TO APPLICATION chatbees_app;
GRANT READ ON STAGE spcs_db.spcs_schema.mystage TO APPLICATION chatbees_app;
SELECT chatbees_app.api.ingest_files('llm_documents', 'spcs_db.spcs_schema.mystage');
Finally, you can query your collection. ChatBees supports both RAG and semantic search.
-- Semantic search, returns top 15 results
SELECT chatbees_app.api.search_collection('llm_documents', 'what is LLM', 15);
-- RAG
SELECT chatbees_app.api.ask('llm_documents', 'what is LLM');
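Since ask returns a regular SQL value, you can also persist answers for later review. A sketch (the table name is hypothetical):

```sql
-- Hypothetical: store RAG answers alongside the questions
CREATE OR REPLACE TABLE rag_answers AS
SELECT
  'what is LLM' AS question,
  chatbees_app.api.ask('llm_documents', 'what is LLM') AS answer;
```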
Configure models
You can configure ChatBees to use a variety of models from different vendors. ChatBees can also host a HuggingFace embedding model for you within the app. Select embedding and completion models from the supported vendors below; they can be from the same or different vendors.
Supported embedding model vendors and hosting options
Snowflake Cortex: All models
Hosted HuggingFace: All public models. Contact us if you need access to gated models
OpenAI: All models
Azure OpenAI: All models
Supported completion model vendors and hosting options
Snowflake Cortex: Llama3.1 and 3.2 models. Support for other models is coming soon
OpenAI: All models
Azure OpenAI: All models
Anthropic: (coming soon)
More vendors and hosting options can be supported by request
If you choose a third-party model, network access must be set up under the "Connections" tab.
For HuggingFace models: ChatBees will download the relevant model files from one of HF's CDN endpoints (cdn-lfs.hf.co, cdn-lfs-eu-1.hf.co, cdn-lfs-us-1.hf.co)
For OpenAI models: ChatBees will connect to api.openai.com
For Azure OpenAI models: ChatBees will connect to your Azure OpenAI deployment endpoint (*.openai.azure.com)
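For reference, Snowflake models this kind of egress as a network rule plus an external access integration. A sketch of a rule covering the HuggingFace CDN endpoints above (the app's actual setup is driven by the "Connections" tab, and the rule name is hypothetical):

```sql
-- Sketch only: the "Connections" tab drives the real configuration.
-- A network rule allowing egress to the HuggingFace CDN endpoints:
CREATE OR REPLACE NETWORK RULE hf_cdn_egress
  MODE = EGRESS
  TYPE = HOST_PORT
  VALUE_LIST = ('cdn-lfs.hf.co', 'cdn-lfs-eu-1.hf.co', 'cdn-lfs-us-1.hf.co');
```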
Next, configure access credentials if you use OpenAI or Azure OpenAI models.
-- Configure OpenAI
CALL chatbees_app.admin.configure_openai('SET', '<openai_key>');
-- Configure Azure OpenAI
CALL chatbees_app.admin.configure_azure_openai('SET', '<azure_openai_key>', '<azure_openai_endpoint>', '<api_version>');
Finally, specify the embedding and completion models you'd like to use. You can mix and match embedding and completion models from any supported vendor! Make sure to select a GPU instance if you're running an embedding model within ChatBees for optimal performance.
-- Use Alibaba-NLP/gte-large-en-v1.5 embedding model hosted within ChatBees
CALL chatbees_app.admin.start_app('Alibaba-NLP/gte-large-en-v1.5', 'cortex/llama3.1-70b', 'GPU_NV_S');
-- Use OpenAI
CALL chatbees_app.admin.start_app('openai/text-embedding-3-large', 'openai/gpt-4o', 'CPU_X64_S');
-- Use Azure OpenAI
CALL chatbees_app.admin.start_app('azureopenai/embedding_deployment', 'azureopenai/completion_deployment', 'CPU_X64_S');
-- Mix and match!
CALL chatbees_app.admin.start_app('cortex/voyage-multilingual-2', 'openai/gpt-4o', 'CPU_X64_S');
Update embedding and completion models
You can restart the ChatBees app with a different completion model at any time. You must call admin.stop_app() first to stop any existing ChatBees service.
-- Use llama3.1-8b completion model
CALL chatbees_app.admin.stop_app();
CALL chatbees_app.admin.start_app('cortex/voyage-multilingual-2', 'cortex/llama3.1-8b', 'CPU_X64_S');
-- Later on, experiment with llama3.1-70b model.
-- Simply restart the ChatBees app with the new config.
CALL chatbees_app.admin.start_app('cortex/voyage-multilingual-2', 'cortex/llama3.1-70b', 'CPU_X64_S');
However, the embedding model can only be changed when the app does not have any collections. If you need to change the embedding model, all existing collections must be deleted first.
-- Use cortex/voyage-multilingual-2 embedding model
CALL chatbees_app.admin.start_app('cortex/voyage-multilingual-2', 'cortex/llama3.1-70b', 'CPU_X64_S');
-- Later on, experiment with multilingual-e5-large embedding model.
-- All collections must be dropped first
SELECT chatbees_app.api.delete_collection(column1) FROM VALUES ('col1'), ('col2'), ('col3'), ... ;
-- Confirm there are no collections left
SELECT chatbees_app.api.list_collections();
-- Change embedding model, then recreate collections and ingest data
CALL chatbees_app.admin.start_app('cortex/multilingual-e5-large', 'cortex/llama3.1-70b', 'CPU_X64_S');
Performance considerations
The performance of ChatBees app depends on both ChatBees RAG engine performance as well as the performance of the underlying embedding and completion models.
For example, if ChatBees is configured with Snowflake Cortex, the latency you experience will depend directly on the performance of the Cortex models.
Data ingestion: uses embedding model
Semantic search: uses embedding model
RAG: uses both embedding and completion models
You can export access logs to view a detailed breakdown of RAG performance
-- Export access logs
SELECT chatbees_app.api.export_access_logs();
-- Inspect performance metrics
SELECT * FROM chatbees_app.access_logs.access_log_1730933619388682450;
-- Below is an example of a RAG metric. 4.8s was spent in the completion model.
-- You can choose a more performant model to reduce the overall runtime of the API call.
{
"completion_time_ms": 4892,
"duration_ms": 5501,
"get_embedding_time_ms": 0,
"vector_search_time_ms": 463,
"input_tokens": 8025,
"output_tokens": 597,
... other metrics
}
Tips to optimize the performance of your ChatBees app:
Completion time too high: Try a smaller model or a different vendor
Embedding time too high: Try a smaller model or a different vendor
Embedding time too high (HuggingFace): Try a smaller embedding model, or use a GPU instance
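To see where time goes, you can compute each phase's share of the total duration from the exported metrics. A small sketch (field names taken from the example metric above):

```python
import json

def rag_time_breakdown(metric_json: str) -> dict:
    """Return each timed RAG phase as a fraction of total duration_ms."""
    m = json.loads(metric_json)
    total = m["duration_ms"]
    phases = ("get_embedding_time_ms", "vector_search_time_ms", "completion_time_ms")
    return {p: round(m[p] / total, 2) for p in phases if p in m}

# Using the example metric above
metric = """{
  "completion_time_ms": 4892,
  "duration_ms": 5501,
  "get_embedding_time_ms": 0,
  "vector_search_time_ms": 463,
  "input_tokens": 8025,
  "output_tokens": 597
}"""
print(rag_time_breakdown(metric))
# {'get_embedding_time_ms': 0.0, 'vector_search_time_ms': 0.08, 'completion_time_ms': 0.89}
```

Here roughly 89% of the call is spent in the completion model, so switching to a faster completion model would have the largest impact.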
APIs
admin.start_app
Starts the ChatBees app. You can invoke this procedure again to change the embedding or completion model.
embedding_model: The embedding model to use. Format is '<vendor>/<model>'
completion_model: The completion model to use. Format is '<vendor>/<model>'
instance_family: The instance family of ChatBees compute. We recommend CPU_X64_S if you are not running a HuggingFace embedding model.
-- Example1: Use cortex embedding and completion models
CALL chatbees_app.admin.start_app('cortex/multilingual-e5-large', 'cortex/llama3.1-70b', 'CPU_X64_S');
-- Example2: Use https://huggingface.co/Alibaba-NLP/gte-large-en-v1.5 embedding model.
-- Use a GPU instance to speed up embedding computation.
CALL chatbees_app.admin.start_app('Alibaba-NLP/gte-large-en-v1.5', 'cortex/llama3.1-70b', 'GPU_NV_S');
admin.stop_app
Stops the ChatBees app and stops compute pool.
CALL chatbees_app.admin.stop_app();
admin.service_status
Returns the current status of the ChatBees app (e.g. PENDING, READY)
CALL chatbees_app.admin.service_status();
admin.get_external_access_config
Returns the required external access configuration.
reference_name: Name of the external access config. Only model_vendor is supported.
-- Example: Get model vendor external access integration config
CALL chatbees_app.admin.get_external_access_config('model_vendor');
admin.configure_openai
Configures connection to OpenAI
api_key: OpenAI API key
CALL chatbees_app.admin.configure_openai('apikey');
admin.configure_azure_openai
Configures connection to Azure OpenAI
api_key: Azure OpenAI API key
endpoint: Azure OpenAI endpoint
api_version: Azure OpenAI API version
CALL chatbees_app.admin.configure_azure_openai('apikey', 'https://org.openai.azure.com/', '2024-06-01');
admin.configure_huggingface
(coming soon) Configures connection to Huggingface to use gated repositories
api_key: HuggingFace API key
CALL chatbees_app.admin.configure_huggingface('apikey');
api.create_collection
Creates a collection. A Collection serves as the fundamental unit for data organization. You can put different data sets into different collections.
collection_name: Name of the collection
SELECT chatbees_app.api.create_collection('hello');
api.list_collections
Lists all collections in ChatBees
SELECT chatbees_app.api.list_collections();
api.delete_collection
Deletes a collection and its content
collection_name: Name of the collection
SELECT chatbees_app.api.delete_collection('hello');
api.ingest_files
Ingests files from stage into ChatBees RAG pipeline. Please make sure to grant ChatBees app READ privilege of the stage, as well as USAGE privilege of the parent schema and database. This function supports incremental ingestion. Previously ingested files will not be ingested again.
collection_name: The name of the collection to ingest into
stage_path: The fully qualified name of the stage
If you accidentally deleted a file and wish to ingest it again, re-upload the file into the stage, refresh the directory table, then invoke ingest_files. Note: You must enable the directory table on the stage and make sure it is refreshed before calling ingest_files.
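For example, to make a re-uploaded file visible before re-ingesting (the stage and file names are placeholders):

```sql
-- Re-upload the file, then refresh the stage's directory table
PUT file:///path/to/myfile.pdf @mydb.myschema.mystage OVERWRITE = TRUE;
ALTER STAGE mydb.myschema.mystage REFRESH;
```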
-- Example: Ingest all files from mydb.myschema.mystage
SELECT chatbees_app.api.ingest_files('col1', 'mydb.myschema.mystage');
-- Add or modify new files from stage
-- Ingest again. New files will be ingested into collection
SELECT chatbees_app.api.ingest_files('col1', 'mydb.myschema.mystage');
api.list_files
Lists all files inside a collection.
collection_name: Name of the collection
SELECT chatbees_app.api.list_files('col1');
api.delete_file
Deletes a file from a collection
collection_name: Name of the collection
file_name: Name of the file to delete
SELECT chatbees_app.api.delete_file('col', 'myfile.pdf');
api.ask
ChatBees RAG API. Gets conversational answer to your question, based on data inside your collection.
collection_name: Name of the collection
question: Question to ask
SELECT chatbees_app.api.ask('col', 'What is a transformer?');
api.search
ChatBees semantic search API. Returns the top_k most relevant results from collection_name.
collection_name: Name of the collection
question: Question to ask
top_k: How many search results to return
SELECT chatbees_app.api.search('col', 'What is a transformer?', 10);
api.export_access_logs
Exports all access logs into a snowflake table.
SELECT chatbees_app.api.export_access_logs();
Cost considerations
The ChatBees app uses the following resources:
1x MEDIUM warehouse for running queries. This warehouse is auto-suspended if ChatBees is not actively processing requests.
1x compute pool with one node to run the ChatBees service. You can specify the compute pool instance family at app startup.
You can stop the ChatBees app at any time and resume later. Our cloud-native architecture ensures that your data is always persisted on durable storage.