AI models

In the Ozeki AI Chat system, AI models generate intelligent responses, enhancing the user experience across various applications. By combining local GGUF and vLLM AI models with online AI models, such as ChatGPT or Copilot, users can achieve excellent productivity by building AI-driven workflows. This article gives you an insight into the AI models available when you set up your Ozeki AI Chat system.

What is an AI model

An AI model refers to a type of artificial intelligence (AI) designed to simulate human-like conversations through text-based interactions. These models use natural language processing (NLP) and machine learning algorithms to understand and respond to user queries in a more context-dependent and personalized manner. Chat models can serve various purposes, including customer service, technical support, language translation, and even providing recommendations based on user preferences.

Figure 1 - The role of the AI model in the Ozeki Chat System

AI models in the Ozeki AI Chat system

AI models in the Ozeki AI Chat system are responsible for generating responses from a chat history and a user prompt. The role of such models is simple: they provide the chat bots with smart responses.
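As an illustration, the "chat history plus user prompt" input described above can be sketched as a list of messages in the widely used role/content convention. The function name below is hypothetical and not part of Ozeki's API:

```python
# Sketch: a chat history plus a new user prompt form the input of an AI model.
# The message structure follows the common role/content convention; the
# function name is a placeholder, not part of the Ozeki API.
def build_model_input(chat_history, user_prompt):
    """Append the new user prompt to the existing chat history."""
    return list(chat_history) + [{"role": "user", "content": user_prompt}]

history = [
    {"role": "system", "content": "You are a helpful chat bot."},
    {"role": "user", "content": "Hi!"},
    {"role": "assistant", "content": "Hello, how can I help?"},
]
messages = build_model_input(history, "What AI models can Ozeki use?")
print(len(messages))  # → 4
```

The model then produces the next assistant message from this list, which is what "generating responses from a chat history and a user prompt" means in practice.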

Local GGUF AI models

Running local AI models stored in GGUF files on Windows using Ozeki AI Chat offers several advantages. Firstly, it allows for efficient use of consumer-grade hardware, eliminating the need for high-end GPUs or specialized equipment. This makes advanced AI capabilities more accessible and cost-effective. The GGUF format enhances interoperability and standardization, ensuring seamless integration with various tools and platforms, and thousands of GGUF AI models are available for download on the Internet. Additionally, Ozeki AI leverages quantization techniques to reduce model size and memory footprint while maintaining performance. This results in faster processing times and lower resource consumption.

Figure 2 - Local GGUF AI models

Moreover, running models locally ensures data privacy and security, as sensitive information remains on the user’s machine. These benefits collectively make this setup ideal for developers, researchers and businesses looking to harness the power of large language models efficiently and securely.
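To see why quantization matters for consumer-grade hardware, a back-of-the-envelope calculation shows the approximate weight storage of a model at different bit widths. These are rough estimates that ignore file metadata and mixed-precision layers:

```python
def approx_model_size_gb(params_billions, bits_per_weight):
    """Approximate weight storage: parameters × bits per weight, in gigabytes."""
    # (params_billions * 1e9) weights * bits / 8 bits-per-byte / 1e9 bytes-per-GB
    return params_billions * bits_per_weight / 8

# A 7-billion-parameter model at common precisions:
for bits in (16, 8, 4):
    print(f"{bits}-bit: ~{approx_model_size_gb(7, bits):.1f} GB")
# 16-bit ≈ 14.0 GB, 4-bit ≈ 3.5 GB — roughly a 4× reduction
```

This is why a 4-bit quantized GGUF file of a 7B model can fit comfortably in the RAM of an ordinary desktop PC.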

Learn how to use local GGUF AI models in Ozeki AI Chat

Local vLLM AI models

The best way to distribute AI load among local servers is to install Linux servers running vLLM. Ozeki AI Chat can connect to such local AI models hosted on Linux and use them as information providers for chat bots. This gives you the ability to offload your workload to multiple Linux servers where the AI processing is done. It also gives you access to models that are not offered as GGUF files.

Figure 3 - Local vLLM AI models

If you would like to use this option, you must understand what vLLM is. vLLM is a high-throughput, memory-efficient serving engine designed for large language models. It runs on Linux, is written in Python, and uses PyTorch to serve AI models through a simple web server. It supports a wide range of open-source models, including those from HuggingFace Transformers. vLLM optimizes GPU memory usage and enhances inference performance through techniques like PagedAttention and continuous batching. This makes it ideal for applications requiring fast and efficient processing of large-scale language models.

Using local vLLM AI models offers several advantages. Firstly, they provide increased throughput: more servers can be added, allowing more requests to be processed per second than running multiple models on a single server. This is complemented by reduced latency, ensuring faster response times for individual queries. Additionally, local vLLM deployments are memory efficient, optimizing GPU memory usage to support larger models or more concurrent requests. This efficiency translates into cost-effectiveness, as better hardware utilization reduces the overall cost of serving large language models.

Moreover, running models locally enhances privacy and control, as data remains on the user's machines, ensuring higher security and customization. These benefits make local vLLM AI models a powerful tool for various applications.
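As a rough sketch of how a client such as Ozeki AI Chat talks to such a server: a vLLM instance started on a Linux host exposes an OpenAI-compatible HTTP API, and requests against it look like the one built below. The host, port and model name are placeholder assumptions, and the request is built but not sent:

```python
import json
import urllib.request

# Placeholder assumptions: a vLLM server listening on localhost:8000 with its
# OpenAI-compatible /v1/chat/completions endpoint; the model name is an example.
VLLM_URL = "http://localhost:8000/v1/chat/completions"

def build_vllm_request(model, messages):
    """Build (but do not send) an HTTP request for a vLLM chat completion."""
    payload = {"model": model, "messages": messages}
    return urllib.request.Request(
        VLLM_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )

req = build_vllm_request(
    "meta-llama/Meta-Llama-3-8B-Instruct",
    [{"role": "user", "content": "Hello"}],
)
# urllib.request.urlopen(req) would return the JSON completion once a server runs
```

Because the wire format is the same OpenAI-style chat API, the same client code can target any of the Linux servers in the pool.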

Local AI pipelines

An AI pipeline is a chain of models. It is also referred to as Retrieval-Augmented Generation (RAG). The idea behind it is to ask the AI multiple questions and use the answers returned in subsequent questions to get to the desired result. When you build an AI pipeline in Ozeki AI Studio, it will be available for chatbots as if it were a single local AI model.

Figure 4 - AI pipelines

This approach allows for incremental refinement of responses, where each model's output informs the next step, leading to more accurate and contextually relevant results. By asking the AI multiple questions and using the answers in subsequent queries, an AI pipeline can handle complex tasks more effectively than a single model. This method also enhances flexibility and adaptability, as different models can be tailored to specific subtasks within the pipeline. Additionally, it improves efficiency by breaking down large problems into manageable parts, ensuring a more streamlined and coherent solution. Overall, an Ozeki AI pipeline leverages the strengths of multiple models to achieve more precise and comprehensive outcomes.
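The incremental-refinement idea can be sketched with a toy pipeline in which each step's answer is fed into the next step's prompt. The `ask` function below is a canned stand-in for a real model call, not Ozeki's API:

```python
# Toy pipeline: each step's answer is inserted into the next step's prompt.
# `ask` is a placeholder for a real AI model call, answering by keyword.
def ask(prompt):
    canned = {"topic": "GGUF models", "summary": "a local AI file format"}
    for keyword, answer in canned.items():
        if keyword in prompt:
            return answer
    return "unknown"

def run_pipeline(question):
    # Step 1: classify the question; step 2: use the answer in the next prompt.
    topic = ask(f"Which topic does this question concern: {question}")
    detail = ask(f"Give a one-line summary of {topic}")
    return f"{topic}: {detail}"

print(run_pipeline("How do I run a model locally?"))
# → "GGUF models: a local AI file format"
```

The shape is what matters: the first model's output becomes part of the second model's prompt, which is exactly how a pipeline built in Ozeki AI Studio refines a result step by step.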

Online AI models

Online AI models are artificial intelligence models hosted and run on remote servers, typically accessed via the internet, and you are usually required to pay a fee to use them. These models are often part of cloud-based services, allowing users to leverage powerful computational resources without needing specialized hardware.

Figure 5 - Online AI models

The primary advantages of online AI models include lower maintenance requirements, as the servers they run on are hosted and operated by a third party. These models are often updated and trained on other users' inputs as well, so they become smarter over time. Since online AI models benefit from collaborative development, where multiple users contribute to and enhance the model's capabilities, they are a good option for education, not-for-profit organizations and businesses where data privacy is not a fundamental aspect of the business.
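As an example of how such a hosted service is reached, the sketch below builds, without sending, a request against OpenAI's REST chat-completions endpoint. The API key placeholder and the model name are assumptions; a real key and a paid account would be required to actually send it:

```python
import json
import os
import urllib.request

# The key is read from the environment, with a non-working placeholder so
# this sketch stays runnable offline; the model name is an example.
API_KEY = os.environ.get("OPENAI_API_KEY", "sk-placeholder")

def build_openai_request(prompt, model="gpt-4o-mini"):
    """Build (but do not send) a chat-completion request to the hosted service."""
    payload = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        "https://api.openai.com/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
    )

req = build_openai_request("Hello")
# urllib.request.urlopen(req) would return the model's reply (usage fees apply)
```

Note that the request shape matches the one used for a local vLLM server; only the URL and the authentication header differ, which is what lets Ozeki AI Chat treat local and online models uniformly.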

Learn how to configure OpenAI's Chat GPT AI Model in Ozeki AI Chat

Conclusion

To sum it up, AI models play a pivotal role in Ozeki AI technology, enabling sophisticated and context-aware interactions across various applications. The Ozeki AI Chat system exemplifies the versatility and efficiency of these models, particularly through the use of local GGUF and vLLM AI models. Ozeki also offers access to online AI models such as ChatGPT, Copilot and others to allow users to take advantage of the capabilities of such services. By leveraging these technologies, users can achieve high performance, cost-effectiveness, and enhanced data privacy. Whether for developers, researchers, or businesses, the integration of AI models into local systems offers a robust solution for harnessing the power of artificial intelligence securely and efficiently.

More information