📖Serverless RAG

ChatBees Serverless Retrieval-Augmented Generation (RAG) is a new service designed from the ground up to provide a scalable and intelligent solution for information retrieval and response generation.

The Serverless architecture of ChatBees includes three key components:

Intelligent Ingest Engine

The Ingest Engine integrates with websites, Google Drive, Confluence, and Notion, and supports numerous text-based formats such as PDF, CSV, DOCX, MD, and TXT files. The engine transforms the data into a structured format that is both searchable and machine-readable. It generates embeddings, or vector representations of the data, which allow complex queries to be executed accurately and quickly. The Ingest Engine is designed to be dynamic and enables continuous data ingestion for operations teams.
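To make the ingestion step concrete, here is a minimal sketch of splitting a document into chunks and attaching a vector to each chunk. It is not the ChatBees implementation: the function names (`chunk_text`, `toy_embed`, `ingest`), the record layout, and the hashed bag-of-words "embedding" are all assumptions chosen for illustration, and a real system would use a learned embedding model instead.

```python
import hashlib
import math


def chunk_text(text, max_words=50):
    """Split text into chunks of at most max_words whitespace-separated words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def toy_embed(text, dim=64):
    """Hypothetical stand-in for an embedding model: hashed bag-of-words, L2-normalized."""
    vec = [0.0] * dim
    for token in text.lower().split():
        digest = hashlib.md5(token.encode("utf-8")).digest()
        idx = int.from_bytes(digest[:4], "big") % dim  # bucket for this token
        vec[idx] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def ingest(doc_id, text, max_words=50):
    """Turn one document into structured, searchable records (assumed layout)."""
    return [
        {"doc_id": doc_id, "chunk_id": i, "text": chunk, "embedding": toy_embed(chunk)}
        for i, chunk in enumerate(chunk_text(text, max_words))
    ]
```

Calling `ingest("handbook", some_text)` would return one record per chunk, each carrying its source text and a fixed-length vector that a downstream index can search.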

Data Engine

Once the data is ingested, it is logged and stored in secure Blob storage. The Index Builder loads the data logs and creates indices for the data and vector embeddings. These indices enable quick retrieval, helping the service find the most relevant information efficiently. The data and vector indices are also stored in the secure Blob storage. The vector engine is developed by ChatBees and provides a scalable, secure vector store that retrieves data efficiently.
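The log-then-index flow described above can be sketched as follows. This is an illustrative assumption, not the ChatBees vector engine: the `VectorIndex` class, the record fields (`id`, `embedding`), and the brute-force cosine search are hypothetical, and a production engine would typically use an approximate nearest-neighbor structure and persist the index to blob storage.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class VectorIndex:
    """Hypothetical in-memory index built by replaying an append-only data log."""

    def __init__(self):
        self.entries = []  # list of (record_id, embedding)

    def build_from_log(self, log):
        """Load every logged record into the index."""
        for record in log:
            self.entries.append((record["id"], record["embedding"]))

    def search(self, query, k=3):
        """Return the k most similar records as (record_id, score), best first."""
        scored = [(cosine(query, emb), rid) for rid, emb in self.entries]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [(rid, score) for score, rid in scored[:k]]
```

Because the index is derived entirely from the log, it can be rebuilt at any time from what is stored, which is what lets the index itself live alongside the data.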

Because all data is stored in Blob storage, compute resources can scale dynamically and ChatBees becomes fully serverless. The engine includes a caching layer, which keeps frequently accessed data close to compute to reduce latency and improve response times for common queries.
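One common way to build such a caching layer is a least-recently-used (LRU) cache in front of the slower store. The source does not say which eviction policy ChatBees uses, so the sketch below is only an assumption; `LRUCache` and the `loader` callback (standing in for a blob storage read) are hypothetical names.

```python
from collections import OrderedDict


class LRUCache:
    """Hypothetical read-through cache with least-recently-used eviction."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._items = OrderedDict()  # oldest entry first, newest last
        self.hits = 0
        self.misses = 0

    def get_or_load(self, key, loader):
        """Return the cached value, or call loader(key) on a miss and cache the result."""
        if key in self._items:
            self._items.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self._items[key]
        self.misses += 1
        value = loader(key)  # slow path, e.g. a read from blob storage
        self._items[key] = value
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict the least recently used entry
        return value
```

Repeated requests for the same key are served from memory, so only the first request (or a request after eviction) pays the cost of going to storage.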

Intelligent Agentic Retrieval Engine

The Retrieval Engine is a key component of the ChatBees service. It processes user requests and retrieves the appropriate data and vector embeddings. It is an agentic framework that applies techniques such as auto-tuning and auto-reranking to identify and retrieve the most relevant information based on the query and embeddings. We use reflection techniques to auto-tune the output. This self-improving mechanism is intended to make the service smarter and more efficient with each query it processes.
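The general shape of a rerank-then-reflect loop can be illustrated with a small sketch. Everything here is an assumption made for illustration and not the ChatBees agent: the word-overlap scorer stands in for a real reranking model, the `search` callback stands in for the vector store, and the "reflection" step is reduced to a simple self-check that widens the candidate set when the best result looks weak.

```python
def overlap_score(query, text):
    """Fraction of distinct query words that also appear in the text."""
    query_words = set(query.lower().split())
    text_words = set(text.lower().split())
    return len(query_words & text_words) / len(query_words) if query_words else 0.0


def rerank(query, candidates):
    """Order candidates by overlap with the query, best first."""
    return sorted(candidates, key=lambda c: overlap_score(query, c), reverse=True)


def retrieve_with_reflection(query, search, k=2, max_k=8, min_score=0.5):
    """Retrieve, rerank, and retry with a larger k while the top result scores too low.

    search(query, k) is a hypothetical callback returning up to k candidate texts.
    Returns (best_candidate_or_None, k_used).
    """
    while True:
        ranked = rerank(query, search(query, k))
        if ranked and overlap_score(query, ranked[0]) >= min_score:
            return ranked[0], k  # good enough, stop here
        if k >= max_k:
            return (ranked[0] if ranked else None), k  # give up widening
        k = min(k * 2, max_k)  # self-correction step: look at more candidates
```

The design choice worth noting is that the loop evaluates its own output before returning it, and changes a retrieval parameter when the result fails that check.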
