Together AI is a cloud platform that enables developers and enterprises to run, fine-tune, and deploy open-source large language models (LLMs) at scale with high performance and cost efficiency. The platform provides access to a wide range of open-source models, including Llama, Mistral, and others, through a unified API, along with tools for custom model fine-tuning and inference optimization. Together AI also conducts AI research and has developed its own inference infrastructure designed to deliver fast and affordable generative AI capabilities.
Founded: 2022
Company size: 51-200 employees
Headquarters: San Francisco, USA
Funding: Series B
Drop-in replacement endpoints for OpenAI APIs supporting text, code, and multimodal models.
Deploy models on dedicated infrastructure optimized for speed, control, and economics.
Self-service GPU clusters of up to 64 NVIDIA GPUs, deployable in minutes with no wait times.
Inference engine up to 4x faster, powered by cutting-edge optimizations and the Together Kernel Collection.
Full and lightweight fine-tuning with private data control for 200+ open-source models.
Automatic scaling for GPU clusters to maintain performance without paying for idle resources.
Dedicated Grafana dashboards with GPU, networking, storage, and Kubernetes telemetry.
Free data transfer with managed storage optimized for AI workloads.
Fully managed inference API that auto-scales with request volume.
Access to H100, H200, B200, GB200 GPUs with InfiniBand and NVLink networking.
Turn-key health checks and remediations for production-ready infrastructure.
No subscriptions; pay per token for inference/fine-tuning or hourly for GPU rentals.
Managed orchestration supporting Kubernetes and Slurm for AI workloads.
Hosts 200+ models including Llama, DeepSeek, Qwen, and Mixtral for text, code, and multimodal use.
SDKs and RESTful APIs for developers and enterprise teams.
Common questions about Together AI features, pricing, and capabilities
Which open-source models does Together AI support?
Together AI supports a wide range of state-of-the-art open-source models, including Llama 3, Mixtral, Qwen, and Stable Diffusion. We constantly update our library to include the latest releases, so you can run inference on high-performance models without managing your own infrastructure.
Can I fine-tune models with my own data?
Yes, Together AI provides a dedicated fine-tuning API that lets you customize open-source models with your proprietary data. Our infrastructure is optimized for efficient training, enabling you to create specialized versions of models like Llama for your specific business use cases.
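As a hedged illustration of preparing data for such fine-tuning: supervised fine-tuning datasets are commonly packaged as JSON Lines, one chat example per line. The schema below is a widespread convention, not a confirmed Together AI format, and the toy examples are invented for illustration.

```python
import json

# Hypothetical toy dataset: the chat-message schema here is a common
# convention for SFT data, not a confirmed Together AI format.
examples = [
    {"messages": [
        {"role": "user", "content": "What is our refund window?"},
        {"role": "assistant", "content": "Refunds are accepted within 30 days."},
    ]},
    {"messages": [
        {"role": "user", "content": "Do you ship internationally?"},
        {"role": "assistant", "content": "Yes, to over 40 countries."},
    ]},
]

# Serialize to JSON Lines: one self-contained training example per line.
jsonl = "\n".join(json.dumps(ex) for ex in examples)
print(jsonl.count("\n") + 1)  # number of examples in the file
```

The resulting string would typically be written to a file such as `train.jsonl` and uploaded through the fine-tuning API.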
How do GPU Clusters differ from the Inference API?
While the Inference API provides serverless access to models, GPU Clusters offer dedicated H100 or A100 instances for large-scale training and research. These clusters are interconnected with high-speed networking, making them ideal for organizations building their own foundation models from scratch.
How do I get started?
You can get started in minutes by creating an account, generating an API key, and using our playground to test prompts. Our serverless infrastructure handles deployment automatically, so there is no need to manually provision servers or manage complex Kubernetes clusters.
What help is available for migrating existing workloads?
We provide comprehensive documentation, migration guides, and code samples to help you move your workloads from local hardware to our cloud. Our support team can also help you tune your inference parameters to achieve the best performance-to-cost ratio.
Is the API compatible with OpenAI's?
Together AI offers an OpenAI-compatible API, meaning you can often switch your existing integrations by simply changing the base URL and API key. This lets developers migrate their applications to open-source models with minimal code changes and no architectural overhaul.
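That compatibility can be sketched with only the Python standard library. The base URL, endpoint path, and model name below are assumptions for illustration, and the request is constructed but never sent.

```python
import json
import os
import urllib.request

# Assumed OpenAI-compatible base URL, for illustration only.
BASE_URL = "https://api.together.xyz/v1"

def build_chat_request(model: str, messages: list, api_key: str) -> urllib.request.Request:
    """Construct (but do not send) an OpenAI-style chat completion request."""
    payload = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=payload,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_chat_request(
    "meta-llama/Llama-3-8b-chat-hf",  # illustrative model name
    [{"role": "user", "content": "Hello"}],
    os.environ.get("TOGETHER_API_KEY", "dummy-key"),  # nothing is sent here
)
```

Because the request shape is the OpenAI one, an existing OpenAI client can usually be repointed by swapping the base URL and key, as the answer above describes.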
Which SDKs and languages are supported?
We provide official SDKs for Python and TypeScript/JavaScript to streamline development. Additionally, because we expose a standard RESTful API, you can integrate Together AI services into any environment that supports HTTP requests, including Go, Ruby, and Java.
How is inference priced?
Our Inference API uses a transparent pay-as-you-go model based on the number of tokens processed (for LLMs) or images generated. This lets you scale your application without upfront costs, paying only for the exact volume of requests your users generate each month.
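A back-of-the-envelope sketch of that per-token billing. The per-million-token prices used here are made-up placeholders, not actual Together AI rates.

```python
def monthly_token_cost(input_tokens: int, output_tokens: int,
                       in_price_per_m: float, out_price_per_m: float) -> float:
    """Cost for one month of usage under per-1M-token billing."""
    return (input_tokens / 1_000_000) * in_price_per_m \
         + (output_tokens / 1_000_000) * out_price_per_m

# Placeholder prices ($ per 1M tokens) -- illustrative only.
cost = monthly_token_cost(input_tokens=2_000_000, output_tokens=1_000_000,
                          in_price_per_m=0.20, out_price_per_m=0.60)
print(f"${cost:.2f}")  # 2 * $0.20 + 1 * $0.60
```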
Are there discounts for high-volume usage?
Yes, we offer custom enterprise pricing and committed-use discounts for customers with high-volume requirements. If your application processes millions of tokens daily or requires dedicated capacity, our sales team can structure a plan that reduces your effective cost per request.
Is my data used to train your models?
We prioritize data privacy and do not use your input data or output generations to train our base models. All data is encrypted in transit using TLS, and we offer enterprise-grade security features to keep your proprietary information confidential during processing.
What compliance certifications does Together AI hold?
Together AI is committed to high security standards and maintains SOC 2 Type II compliance. We provide the documentation and security controls required by enterprise legal and IT departments to ensure our infrastructure meets rigorous data protection requirements.
What support options are available?
All users have access to our extensive documentation, community forums, and email support. Enterprise customers receive enhanced support packages, including dedicated Slack channels, faster response times, and direct access to our engineering team for architectural reviews.
Pay-as-you-go serverless inference for LLMs and Vision models
Starting at $0.00/month
Rates quoted per 1M input tokens, per 1M output tokens, or as a flat rate per 1M tokens for input/output.
Single-tenant GPU instances with guaranteed performance
Starting at $2,872.80/month
Rates quoted hourly for H100 and H200 instances.
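As a sanity check on the dedicated-endpoint figure above: assuming a 720-hour (30-day) billing month, the $2,872.80 monthly starting price corresponds to roughly $3.99 per hour. The hourly rate here is inferred from the monthly figure, not quoted by the source.

```python
# Inferred relationship between the monthly and hourly dedicated-endpoint
# prices, assuming a 720-hour billing month (an assumption, not a quoted term).
HOURS_PER_MONTH = 24 * 30  # = 720

monthly_price = 2872.80                        # "starting at" figure above
hourly_rate = monthly_price / HOURS_PER_MONTH  # inferred: ~$3.99/h
print(round(hourly_rate, 2))
```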
Pay-as-you-go GPU capacity billed at an on-demand hourly rate
Contact for pricing
Reserved GPU capacity for commitments of 6+ months
Contact sales for reservation pricing
Train open-source models (SFT and DPO)
Starting at $0.00/month
Supervised fine-tuning (SFT) with LoRA for models up to 16B parameters
Direct Preference Optimization (DPO) with full fine-tuning for 70-100B parameter models
High-bandwidth parallel filesystem
Starting at $0.00/month
Storage cost per GiB per month