# Snowflake Native App

### Welcome to ChatBees

Run a state-of-the-art serverless RAG pipeline within your own Snowflake account with ChatBees. ChatBees provides enterprise users with a complete and flexible end-to-end solution.

With a set of simple APIs, you can ingest and index a variety of unstructured and semi-structured data, including `PDF`, `CSV`, `DOCX`, `TXT`, and `MD`.

### Required privileges for installation

ChatBees requires the following global privileges:

* `CREATE COMPUTE POOL`: Required to run the ChatBees service in Snowpark Container Services.
* `CREATE WAREHOUSE`: Required for ChatBees to execute queries.
* `IMPORTED PRIVILEGES ON SNOWFLAKE DB`: Required to access Snowflake Cortex embedding and completion functions.

**Note**: Granting the `IMPORTED PRIVILEGES ON SNOWFLAKE DB` privilege allows ChatBees to see information about usage and costs associated with the consumer account. ChatBees is committed to never accessing this data or any other unrelated user data.
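If these global privileges were not granted during installation, an account administrator can typically grant them afterward with account-level `GRANT` statements. A sketch, assuming the app is installed as `chatbees_app` (this is standard Snowflake Native App grant syntax, not a ChatBees-specific API):

```sql
-- Run with a role that can manage account-level grants (e.g. ACCOUNTADMIN)
GRANT CREATE COMPUTE POOL ON ACCOUNT TO APPLICATION chatbees_app;
GRANT CREATE WAREHOUSE ON ACCOUNT TO APPLICATION chatbees_app;
GRANT IMPORTED PRIVILEGES ON DATABASE snowflake TO APPLICATION chatbees_app;
```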

Additionally, you can configure `EGRESS` network connections to:

* HuggingFace, if you configure ChatBees to host an embedding model within the app.
* Third-party API endpoints, such as Azure OpenAI, if you configure ChatBees to access models outside of Snowflake.

See the `Configure models` section for details.

### Quickstart

First, configure the **embedding** and **completion** models you wish to use with the ChatBees app. ChatBees supports models from major LLM vendors, including Snowflake Cortex.

You can change and experiment with the **completion** model at any time. However, you must delete all existing collections before changing the **embedding** model.

For example, use the following commands to configure ChatBees with the `voyage-multilingual-2` embedding model and the `llama3.1-70b` completion model from Snowflake Cortex.

NOTE: Replace `chatbees_app` with your actual installed app name.

```sql
-- Create ChatBees with Cortex models
CALL chatbees_app.admin.start_app('cortex/voyage-multilingual-2', 'cortex/llama3.1-70b', 'CPU_X64_S');  

-- Make sure to wait until ChatBees service is ready before proceeding
CALL chatbees_app.admin.service_status();
```

You can also configure the compute instance family of ChatBees. We recommend:

* `CPU_X64_S` if you use Snowflake Cortex models exclusively
* `GPU_NV_S` or `GPU_NV_XS` if you configure a HuggingFace embedding model

You can restart ChatBees with a different compute instance family at any time to scale up or down.
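For example, to scale a hosted HuggingFace embedding configuration down from `GPU_NV_S` to `GPU_NV_XS`, stop the app and restart it with the new instance family (the model names are the examples used elsewhere in this guide):

```sql
-- Stop the running service, then restart on a smaller GPU instance
CALL chatbees_app.admin.stop_app();
CALL chatbees_app.admin.start_app('Alibaba-NLP/gte-large-en-v1.5', 'cortex/llama3.1-70b', 'GPU_NV_XS');

-- Wait until the ChatBees service is ready before proceeding
CALL chatbees_app.admin.service_status();
```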

Next, create a collection. A collection serves as the fundamental unit of data organization. You can put different data sets into different collections.

```sql
SELECT chatbees_app.api.create_collection('llm_documents');
```

Next, ingest all files inside a stage via the `chatbees_app.api.ingest_files` function. ChatBees supports a variety of file formats, such as PDF, CSV, TXT, and DOCX.

```sql
-- Grant ChatBees app read-only access to mystage  
GRANT USAGE ON DATABASE spcs_db TO APPLICATION chatbees_app;  
GRANT USAGE ON SCHEMA spcs_db.spcs_schema TO APPLICATION chatbees_app;  
GRANT READ ON STAGE spcs_db.spcs_schema.mystage TO APPLICATION chatbees_app;  
  
SELECT chatbees_app.api.ingest_files('llm_documents', 'spcs_db.spcs_schema.mystage');
```

Finally, you can query your collection. ChatBees supports both RAG and semantic search.

```sql
-- Semantic search, returns top 15 results
SELECT chatbees_app.api.search_collection('llm_documents', 'what is LLM', 15);  
  
-- RAG  
SELECT chatbees_app.api.ask('llm_documents', 'what is LLM');  
```

### Configure models

You can configure ChatBees to use a variety of models from different vendors. ChatBees can also host a HuggingFace embedding model for you within the app. Select embedding and completion models from the supported vendors below; they can be from the same vendor or from different ones.

**Supported embedding model vendors and hosting options**

* **Snowflake Cortex**: All models
* **Hosted HuggingFace**: All public models. Contact us if you need access to gated models
* **OpenAI**: All models
* **Azure OpenAI**: All models

**Supported completion model vendors and hosting options**

* **Snowflake Cortex**: Llama3.1 and 3.2 models. Support for other models is coming soon
* **OpenAI**: All models
* **Azure OpenAI**: All models
* **Anthropic**: (coming soon)

*More vendors and hosting options can be supported by request*

If you choose a third-party model, network access must be set up under the "Connections" tab.

* For HuggingFace models, ChatBees downloads the relevant model files from one of HF's CDN endpoints (cdn-lfs.hf.co, cdn-lfs-eu-1.hf.co, cdn-lfs-us-1.hf.co).
* For OpenAI models, ChatBees connects to api.openai.com.
* For Azure OpenAI models, ChatBees connects to your Azure OpenAI deployment at `<your-resource>.openai.azure.com`.

Next, configure access credentials if you use OpenAI or Azure OpenAI models.

```sql
-- Configure OpenAI
CALL chatbees_app.admin.configure_openai('SET', '<openai_key>');

-- Configure Azure OpenAI
CALL chatbees_app.admin.configure_azure_openai('SET', '<azure_openai_key>', '<azure_openai_endpoint>', '<api_version>');
```

Finally, specify the embedding and completion models you'd like to use. You can mix and match embedding and completion models from any supported vendor! For optimal performance, make sure to select a GPU instance if you're running an embedding model within ChatBees.

```sql
-- Use Alibaba-NLP/gte-large-en-v1.5 embedding model hosted within ChatBees  
CALL chatbees_app.admin.start_app('Alibaba-NLP/gte-large-en-v1.5', 'cortex/llama3.1-70b', 'GPU_NV_S');  

-- Use OpenAI
CALL chatbees_app.admin.start_app('openai/text-embedding-3-large', 'openai/gpt-4o', 'CPU_X64_S');  

-- Use Azure OpenAI
CALL chatbees_app.admin.start_app('azureopenai/embedding_deployment', 'azureopenai/completion_deployment', 'CPU_X64_S');  

-- Mix and match!
CALL chatbees_app.admin.start_app('cortex/voyage-multilingual-2', 'openai/gpt-4o', 'CPU_X64_S');  
```

### Update embedding and completion models

You can restart the ChatBees app with a different **completion** model at any time. You must first call `admin.stop_app()` to stop the running ChatBees service.

```sql
-- Use llama3.1-8b completion model  
CALL chatbees_app.admin.stop_app();
CALL chatbees_app.admin.start_app('cortex/voyage-multilingual-2', 'cortex/llama3.1-8b', 'CPU_X64_S');  
  
-- Later on, experiment with the llama3.1-70b model.
-- Stop the app, then restart it with the new config.
CALL chatbees_app.admin.stop_app();
CALL chatbees_app.admin.start_app('cortex/voyage-multilingual-2', 'cortex/llama3.1-70b', 'CPU_X64_S');
```

However, the **embedding** model can only be changed when the app has no collections. If you need to change the embedding model, you must delete all existing collections first.

```sql
-- Use cortex/voyage-multilingual-2 embedding model  
CALL chatbees_app.admin.start_app('cortex/voyage-multilingual-2', 'cortex/llama3.1-70b', 'CPU_X64_S');  
  
-- Later on, experiment with multilingual-e5-large embedding model.  
  
-- All collections must be dropped first
-- (column1 refers to the first column produced by VALUES)
SELECT chatbees_app.api.delete_collection(column1) FROM VALUES ('col1'), ('col2'), ('col3'), ... ;
-- Confirm there are no collections left
SELECT chatbees_app.api.list_collections();
-- Change embedding model, then recreate collections and ingest data
CALL chatbees_app.admin.start_app('cortex/multilingual-e5-large', 'cortex/llama3.1-70b', 'CPU_X64_S');  
```

### Performance considerations

The performance of the ChatBees app depends on both the ChatBees RAG engine and the underlying embedding and completion models.

For example, if ChatBees is configured with Snowflake Cortex, the latency you experience will depend directly on the performance of the Cortex models.

* Data ingestion: uses **embedding** model
* Semantic search: uses **embedding** model
* RAG: uses both **embedding** and **completion** models

You can export access logs to view a detailed breakdown of RAG performance:

```sql
-- Export access logs
SELECT chatbees_app.api.export_access_logs();

-- Inspect performance metrics
SELECT * FROM chatbees_app.access_logs.access_log_1730933619388682450;

-- Below is an example of a RAG metric. 4.8s was spent in the completion model.
-- You can choose a more performant model to reduce the overall runtime of the API call.
{
  "completion_time_ms": 4892,
  "duration_ms": 5501,
  "get_embedding_time_ms": 0,
  "vector_search_time_ms": 463,
  "input_tokens": 8025,
  "output_tokens": 597,
  ... other metrics
}
```
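To see at a glance where time goes, you could compute each phase's share of the total duration from the exported log table. This is a sketch only: it assumes the metric fields shown above land in a VARIANT column, hypothetically named `metrics` here; check the actual schema of your exported table.

```sql
-- Phase-by-phase share of total request time
-- ('metrics' as a VARIANT column is an assumption; adjust to the real column name)
SELECT
  metrics:completion_time_ms    / metrics:duration_ms AS completion_share,
  metrics:get_embedding_time_ms / metrics:duration_ms AS embedding_share,
  metrics:vector_search_time_ms / metrics:duration_ms AS vector_search_share
FROM chatbees_app.access_logs.access_log_1730933619388682450;
```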

Tips to optimize the performance of your ChatBees app:

* Completion time too high: try a smaller completion model or a different vendor
* Embedding time too high: try a smaller embedding model or a different vendor
* Embedding time too high (HuggingFace): try a smaller embedding model, or use a GPU instance

### APIs

#### admin.start\_app

Starts the ChatBees app. You can invoke this procedure again to change the embedding or completion model (stop any running service first with `admin.stop_app()`).

* `embedding_model`: The embedding model to use. Format is `'<vendor>/<model>'`
* `completion_model`: The completion model to use. Format is `'<vendor>/<model>'`
* `instance_family`: The instance family of ChatBees compute. We recommend `CPU_X64_S` if you are not running a HuggingFace embedding model.

```sql
-- Example1: Use cortex embedding and completion models
CALL chatbees_app.admin.start_app('cortex/multilingual-e5-large', 'cortex/llama3.1-70b', 'CPU_X64_S');  

-- Example2: Use https://huggingface.co/Alibaba-NLP/gte-large-en-v1.5 embedding model. 
-- Use a GPU instance to speed up embedding computation.
CALL chatbees_app.admin.start_app('Alibaba-NLP/gte-large-en-v1.5', 'cortex/llama3.1-70b', 'GPU_NV_S'); 
```

#### admin.stop\_app

Stops the ChatBees app and stops compute pool.

```sql
CALL chatbees_app.admin.stop_app();
```

#### admin.service\_status

Returns the current status of the ChatBees app (e.g. `PENDING`, `READY`).

```sql
CALL chatbees_app.admin.service_status();
```

#### admin.get\_external\_access\_config

Returns the external access configuration required for the given reference.

* **reference\_name**: Name of the external access config. Only `model_vendor` is supported.

```sql
-- Example: Get model vendor external access integration config
CALL chatbees_app.admin.get_external_access_config('model_vendor');
```

#### admin.configure\_openai

Configures the connection to OpenAI.

* **api\_key**: OpenAI API key

```sql
CALL chatbees_app.admin.configure_openai('apikey');
```

#### admin.configure\_azure\_openai

Configures the connection to Azure OpenAI.

* **api\_key**: Azure OpenAI API key
* **endpoint**: Azure OpenAI endpoint
* **api\_version**: Azure OpenAI API version

```sql
CALL chatbees_app.admin.configure_azure_openai('apikey', 'https://org.openai.azure.com/', '2024-06-01');
```

#### admin.configure\_huggingface

(coming soon) Configures the connection to HuggingFace to use gated repositories.

* **api\_key**: HuggingFace API key

```sql
CALL chatbees_app.admin.configure_huggingface('apikey');
```

#### api.create\_collection

Creates a collection. A collection serves as the fundamental unit of data organization. You can put different data sets into different collections.

* **collection\_name**: Name of the collection

```sql
SELECT chatbees_app.api.create_collection('hello');
```

#### api.list\_collections

Lists all collections in ChatBees

```sql
SELECT chatbees_app.api.list_collections();
```

#### api.delete\_collection

Deletes a collection and its contents.

* **collection\_name**: Name of the collection

```sql
SELECT chatbees_app.api.delete_collection('hello');
```

#### api.ingest\_files

Ingests files from a stage into the ChatBees RAG pipeline. Make sure to grant the ChatBees app the READ privilege on the stage, as well as the USAGE privilege on the parent schema and database. This function supports incremental ingestion: previously ingested files will not be ingested again.

* **collection\_name**: The name of the collection to ingest into
* **stage\_path**: The fully qualified name of the stage

If you accidentally delete a file and wish to ingest it again, re-upload the file to the stage, refresh the directory table, then invoke `ingest_files`. **Note**: You must enable the directory table on the stage and make sure it is refreshed before calling `ingest_files`.

```sql
-- Example: Ingest all files from mydb.myschema.mystage
SELECT chatbees_app.api.ingest_files('col1', 'mydb.myschema.mystage');

-- Add new or modified files to the stage and refresh its directory table

-- Ingest again. Only new files will be ingested into the collection
SELECT chatbees_app.api.ingest_files('col1', 'mydb.myschema.mystage');
```
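If newly uploaded files do not show up, the stage's directory table may need a refresh before re-ingesting (as noted above). The standard Snowflake command is:

```sql
-- Refresh the directory table so newly uploaded files are visible to ingest_files
ALTER STAGE mydb.myschema.mystage REFRESH;
```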

#### api.list\_files

Lists all files inside a collection.

* **collection\_name**: Name of the collection

```sql
SELECT chatbees_app.api.list_files('col1');
```

#### api.delete\_file

Deletes a file from a collection

* **collection\_name**: Name of the collection
* **file\_name**: Name of the file to delete

```sql
SELECT chatbees_app.api.delete_file('col', 'myfile.pdf');
```

#### api.ask

ChatBees RAG API. Gets a conversational answer to your question based on the data inside your collection.

* **collection\_name**: Name of the collection
* **question**: Question to ask

```sql
SELECT chatbees_app.api.ask('col', 'What is a transformer?');
```

#### api.search

ChatBees semantic search API. Returns the `top_k` most relevant results from `collection_name`.

* **collection\_name**: Name of the collection
* **question**: Question to ask
* **top\_k**: How many search results to return

```sql
SELECT chatbees_app.api.search('col', 'What is a transformer?', 10);
```

#### api.export\_access\_logs

Exports all access logs into a Snowflake table.

```sql
SELECT chatbees_app.api.export_access_logs();
```

### Cost considerations

The ChatBees app uses the following resources:

* 1x `MEDIUM` warehouse for running queries. This warehouse is auto-suspended if ChatBees is not actively processing requests.
* 1x compute pool with one node to run ChatBees Service. You can specify the compute pool instance family at app startup.

You can stop the ChatBees app at any time and resume it later. Our cloud-native architecture ensures that your data is always persisted on durable storage.
