Getting the most out of your AI hardware and using the optimal AI models
If you're looking to get the most out of your AI hardware and models with Ozeki AI Chat, here's a breakdown of what we recommend for various levels of users, from beginners to professionals. This guide will help you choose the right hardware, scale performance efficiently, and understand AI model file naming conventions and architecture. If you are an absolute beginner, you should also read our AI technology overview chapter first.
Ozeki's recommended hardware
If you are new, this is what we recommend at Ozeki for you to start with:
PC: Intel i9 14900K CPU, 128GB DDR5 RAM, 4TB NVMe SSD
GPU: Nvidia RTX 3090 with 24GB VRAM
OS: Windows 11
AI software: Ozeki AI Chat
LLM model file: Meta-Llama-3.1-8B-Instruct-Q6_K.gguf
Use two SSD disks in RAID 0 configuration to double disk speed when loading models
Use two GPUs to double the GPU VRAM size. 2x24GB VRAM (48GB) can run a 70B model in Q4
Use faster GPUs: the Nvidia RTX 5090 is much faster than the Nvidia RTX 4090, which is much faster than the Nvidia RTX 3090
Increase the number of GPUs to get as much VRAM as possible. 4 Nvidia GPUs can offer 96GB VRAM
Get a workstation motherboard with a lot of RAM slots (8 slots or 16 slots), and put a lot of RAM into your system
Get a workstation CPU, such as an Intel Xeon, AMD EPYC or AMD Threadripper, that can handle lots of RAM
Use dedicated AI GPUs: Nvidia Tesla A100 and H100 graphics cards with 80GB of VRAM are the way to go.
Use multiple servers: Build a distributed system with Ozeki Cluster to allow parallel AI execution across servers.
(Two servers with 1 GPU each is better than 1 server with 2 GPUs.)
Use Ozeki AI Chat: Ozeki AI has a high-performance chat system that can distribute load automatically
between your AI servers. Ozeki AI Chat can serve thousands of simultaneous chat users per computer. It was designed to push as much computational
load as possible to the clients (client browsers) and keep server resource usage to a minimum.
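To see why 2x24GB of VRAM is enough for a 70B model in Q4, it helps to do the arithmetic. The sketch below is a hypothetical helper (not an Ozeki API): quantized weights take roughly parameters × bits-per-weight / 8 bytes, plus an assumed ~20% overhead for the KV cache and runtime buffers.

```python
# Rough VRAM estimate for a quantized LLM.
# Assumption: ~20% overhead on top of the raw weight size for the
# KV cache and runtime buffers (the real figure depends on context length).

def estimate_vram_gb(params_billion, bits_per_weight, overhead=1.2):
    weight_gb = params_billion * bits_per_weight / 8  # 1B params ~ 1 GB at 8-bit
    return weight_gb * overhead

# A 70B model in Q4 (roughly 4.5 effective bits per weight):
print(f"{estimate_vram_gb(70, 4.5):.0f} GB")  # 47 GB -> fits in 2x24GB (48GB)
```

The same formula shows why the beginner setup works: an 8B model in Q6 needs only around 8-9 GB, well within a single RTX 3090's 24GB.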
AI model file names
Ozeki AI uses the GGUF file format for AI models. The file names are formatted according to a naming convention, and it is a good idea to spend a minute reading the following article to understand them better. AI model file naming conventions for GGUF (the binary format introduced by llama.cpp as the successor to GGML) are designed to provide clear and consistent information about a model's architecture, quantization type, and other relevant details. GGUF is a format used for compressed, efficient versions of large language models, such as LLaMA or GPT variants, optimized for faster inference and reduced memory usage. File names in GGUF typically include key information such as the model size (e.g., number of parameters), quantization method (e.g., Q4, Q8), and sometimes the model version or specific architecture. These conventions ensure compatibility across systems and allow users to easily identify the model's characteristics at a glance.
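As an illustration of the convention, the following sketch parses a file name such as the recommended Meta-Llama-3.1-8B-Instruct-Q6_K.gguf into its parts. The `<Name>-<Size>B-<Variant>-<Quant>.gguf` pattern assumed here covers the common case only; real file names vary.

```python
import re

# Hypothetical parser for the common GGUF naming pattern
# <Name>-<Size>B-<Variant>-<Quant>.gguf; not every published file follows it.

def parse_gguf_name(filename):
    m = re.match(
        r"(?P<base>.+?)-(?P<size>\d+(?:\.\d+)?)B-(?P<variant>.+?)"
        r"-(?P<quant>Q\d+(?:_[A-Z0-9_]+)?)\.gguf$",
        filename,
    )
    if not m:
        return None
    return {
        "model": m.group("base"),            # base model name and version
        "params_billion": float(m.group("size")),
        "variant": m.group("variant"),       # e.g. Instruct, Chat
        "quantization": m.group("quant"),    # e.g. Q4_K_M, Q6_K
    }

print(parse_gguf_name("Meta-Llama-3.1-8B-Instruct-Q6_K.gguf"))
```

Running it on the recommended model file yields the model name Meta-Llama-3.1, an 8-billion-parameter size, the Instruct variant, and Q6_K quantization.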
Learn about AI model file names
AI hardware architectures
Picking the right hardware to run your AI models, and understanding which AI platform best fits your requirements, requires an understanding of their capabilities. Technologies like CPU-based vectorization (AVX2 and ARM NEON), GPU acceleration (Metal, cuBLAS, rocBLAS, Vulkan), and cross-platform frameworks (SYCL, CLBlast, Kompute) play key roles in speeding up tasks such as deep learning and data processing. Ozeki AI Studio is most often used on Windows systems, either on general-purpose Intel or AMD CPUs, or on NVIDIA CUDA GPUs such as the NVIDIA RTX 3090, RTX 4090 and RTX 5090. Learn about the differences in the hardware architectures used for AI
Understand AI hardware architectes (Nvidia Cuda, Intel AVX2, ARM NEON, Apple Metal...)
AI GPU introduction
Learn how to install the CUDA framework for your NVIDIA GPU, how to switch from a general-purpose CPU to an NVIDIA GPU in Ozeki AI Studio, how to calculate the GPU layer count for a given model, and how to configure your GPU for optimal performance, and get a good overview of AI hardware.
How to setup your NVidia GPU on Windows
How to configure your system to use GPU for AI
How to calculate GPU layer offload count
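The idea behind the layer offload count can be sketched with back-of-the-envelope arithmetic. The formula and the 2GB headroom figure below are assumptions for illustration, not Ozeki's exact method: you offload as many layers as fit in VRAM after reserving space for the KV cache and CUDA buffers.

```python
# Assumed rule of thumb: one layer's size is roughly the model file size
# divided by the layer count; offload whatever fits in the remaining VRAM.

def gpu_layers_to_offload(vram_gb, model_file_gb, total_layers, headroom_gb=2.0):
    per_layer_gb = model_file_gb / total_layers   # approximate size of one layer
    usable_gb = max(vram_gb - headroom_gb, 0)     # leave headroom for KV cache
    return min(total_layers, int(usable_gb / per_layer_gb))

# Llama 3.1 8B at Q6_K is about 6.6 GB and has 32 transformer layers,
# so on a 24 GB RTX 3090 every layer fits on the GPU:
print(gpu_layers_to_offload(24, 6.6, 32))  # 32
```

With a larger model that does not fit entirely, the same calculation tells you how many layers to keep on the GPU while the rest run on the CPU.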
Summary
Ozeki AI Chat aims to maximize the performance of AI systems in operation. It can be used on simple configurations, and can scale up to advanced setups using multiple servers and dedicated AI GPUs like the Nvidia Tesla A100/H100 for professional environments. We recommend you start your AI journey now and discover this amazing area of IT.