About

Welcome to Nolano.
Our primary focus is on drastically reducing the cost of running large-scale generative AI models by
(1) developing and deploying state-of-the-art model compression techniques and
(2) enabling fast inference on the resulting compressed models.
We aim to reduce the size and inference cost of modern generative AI models as far as possible while maintaining their capabilities. Our models are designed to be easily deployable across a variety of platforms and applications, from mobile devices to cloud-based servers, providing a versatile solution for businesses and individuals alike.
Additionally, we offer Compression as a Service, which reduces the size of your models so they run efficiently in the cloud, on phones, and on laptops. Our open-source libraries aim to make AI more accessible and usable.