Over the past year or so, open source AI models have largely closed the performance gap with popular closed source models from OpenAI, Google, and others. They have not been widely adopted by developers, however, because of the overhead of deploying and maintaining these models across different hardware. To solve this problem, Hugging Face today announced Hugging Face Generative AI Services (HUGS), optimized, zero-configuration inference microservices that help developers accelerate the development of AI apps built on open models.
HUGS deployments also expose an OpenAI-compatible API, so they can serve as a drop-in replacement for existing apps built against a model provider's API. This lets developers migrate apps from OpenAI's models to open source models with little effort.
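As a rough sketch of what drop-in compatibility means in practice, the example below points the standard OpenAI Python client at a self-hosted endpoint instead of OpenAI's servers. The base URL, API key, and model ID are illustrative placeholders, not values published by Hugging Face:

```python
# Sketch: reusing the OpenAI Python SDK against an OpenAI-compatible
# endpoint. The URL and model ID are illustrative placeholders.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/v1",  # assumed address of a self-hosted HUGS container
    api_key="unused",                     # a local deployment may not check the key
)

response = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3.1-8B-Instruct",  # placeholder open model ID
    messages=[{"role": "user", "content": "Explain HUGS in one sentence."}],
)
print(response.choices[0].message.content)
```

Because only the base URL changes, the rest of an existing OpenAI-based codebase can stay as it is.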
HUGS is built on open source technologies such as Text Generation Inference (TGI) and Transformers, and is optimized to run open models on a variety of hardware accelerators, including NVIDIA GPUs, AMD GPUs, AWS Inferentia (coming soon), and Google TPUs (coming soon). Thirteen popular open LLMs are currently supported, including Meta's Llama, with more to follow. HUGS can be deployed on Amazon Web Services, Google Cloud Platform, and Microsoft Azure (coming soon), with on-demand pricing based on the uptime of each container on the public cloud.
According to Hugging Face, HUGS offers the following benefits:
- Within your infrastructure: Deploy open models inside your own secure environment, keeping your data and models off the internet.
- Zero-configuration deployment: HUGS reduces deployment time from weeks to minutes, automatically optimizing model and serving configurations for NVIDIA GPUs, AMD GPUs, or AI accelerators.
- Hardware-optimized inference: HUGS is built on Hugging Face's Text Generation Inference (TGI) and is tuned for peak performance across a variety of hardware configurations.
- Hardware flexibility: Run HUGS on a variety of accelerators, including NVIDIA and AMD GPUs, with support for AWS Inferentia and Google TPUs coming soon.
- Model flexibility: HUGS is compatible with a wide range of open source models, ensuring flexibility and choice for your AI applications.
- Industry-standard API: Deploy HUGS on Kubernetes with OpenAI API-compatible endpoints and minimal code changes (a hypothetical deployment sketch follows this list).
- Enterprise distribution: HUGS is an enterprise distribution of Hugging Face's open source technology, offering long-term support, rigorous testing, and SOC 2 compliance.
- Enterprise compliance: Minimize compliance risks; the necessary licenses and terms of service are included.
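To make the Kubernetes point concrete, below is a minimal, hypothetical deployment sketch using the official Kubernetes Python client. The container image name, labels, and GPU resource request are assumptions for illustration; Hugging Face has not published these details in the announcement:

```python
# Hypothetical sketch: deploying a HUGS container on Kubernetes via the
# official Python client. Image name, port, and GPU request are
# illustrative assumptions, not published HUGS values.
from kubernetes import client, config

config.load_kube_config()  # authenticate using the local kubeconfig

container = client.V1Container(
    name="hugs-llama",
    image="registry.example.com/hugs/llama:latest",  # placeholder image
    ports=[client.V1ContainerPort(container_port=8080)],
    resources=client.V1ResourceRequirements(limits={"nvidia.com/gpu": "1"}),
)

deployment = client.V1Deployment(
    api_version="apps/v1",
    kind="Deployment",
    metadata=client.V1ObjectMeta(name="hugs-llama"),
    spec=client.V1DeploymentSpec(
        replicas=1,
        selector=client.V1LabelSelector(match_labels={"app": "hugs-llama"}),
        template=client.V1PodTemplateSpec(
            metadata=client.V1ObjectMeta(labels={"app": "hugs-llama"}),
            spec=client.V1PodSpec(containers=[container]),
        ),
    ),
)

client.AppsV1Api().create_namespaced_deployment(namespace="default", body=deployment)
```

Once the pod is running and exposed through a Service, the OpenAI-compatible client shown earlier would simply point its base URL at that Service.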
Learn more about HUGS here. With a focus on open source and ease of use, HUGS has the potential to democratize access to powerful AI models and accelerate the development of innovative AI applications.