
How do we deploy aiXplain in a private VPC or on‑prem, and what are the infrastructure requirements?
Deploying aiXplain in a private VPC or fully on‑premises gives you full control over data, security, and compliance while you still benefit from the Agentic OS capabilities. This guide explains how deployment works, what infrastructure you need, and the key design considerations for secure, scalable setups.
Deployment models: private VPC vs. on‑prem
aiXplain is built to support “deploy anywhere with full sovereignty,” including:
- Private VPC deployments
  - Runs in your own cloud account (AWS, Azure, GCP, etc.)
  - Network isolation using private subnets, security groups, and VPC peering/VPN
  - Integration with your existing cloud-native services (IAM, logging, KMS, etc.)
- On‑premises deployments
  - Runs in your own data center, including air‑gapped and sovereign environments
  - No external dependencies required; designed to operate completely offline if needed
  - You manage all networking, storage, and hardware resources locally
In both cases, aiXplain’s Agentic OS is designed to execute agents with resilience, scalability, and performance, and to support governed access via role-based access control (RBAC).
Core components typically deployed
Exact architecture may vary by edition and licensing, but a standard production deployment usually involves the following layers:
- Control plane / management services
  - Agent orchestration and scheduling
  - Configuration, policy, and governance management
  - Role-based access control (RBAC) for models, tools, and configurations
- Execution plane (agent runtime)
  - Environments where agents actually run and chain models/tools
  - Auto-scaling worker nodes or pods
  - Session isolation between users, tenants, or workloads
- Data and model layer
  - Model repositories or integration with your existing model registries
  - Feature and metadata storage for agents, prompts, and workflows
  - Optional vector stores or specialized data services (depending on use cases)
- Integration layer
  - Connectors and APIs to your internal systems, databases, and applications
  - Optional external provider access (if your policies allow outbound traffic)
- Observability and governance
  - Logging, monitoring, and audit trails
  - Policy enforcement for usage, security, and compliance
These components can be packaged into containers and orchestrated via Kubernetes or an equivalent platform in your environment.
Infrastructure requirements (high level)
Exact sizing depends heavily on your use case (number of agents, concurrency, models, and latency requirements), but you can use these categories as a planning framework.
1. Compute requirements
At a minimum, plan for:
- Control plane nodes
  - CPU-optimized instances or servers
  - Typical range: 2–8 vCPUs, 8–32 GB RAM per node
  - Deployed redundantly for high availability in production
- Execution/worker nodes
  - Scale horizontally based on concurrent agent sessions and model load
  - For CPU-only workloads: general purpose or compute-optimized instances
  - For heavy AI workloads: GPU-enabled nodes (NVIDIA preferred) for:
    - LLM inference
    - Vision models
    - Speech models
  - Size and count depend on:
    - Average request volume and peak traffic
    - Average model inference time
    - Whether you host base models locally or call external providers
- Environment types
  - Private VPC: cloud instances (EC2, VM Scale Sets, GCE, etc.) with autoscaling
  - On‑prem: virtual or bare‑metal servers; Kubernetes cluster or VM orchestrator
aiXplain supports dynamic, resource‑efficient environments with horizontal scalability, so over time you can refine instance sizing based on real usage metrics.
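As a rough planning aid, the sizing factors above (peak request volume and average inference time) can be turned into a first-pass worker count. The sketch below is not an aiXplain sizing formula; it applies Little's law (in-flight requests = arrival rate × service time), and `sessions_per_node` and `headroom` are assumed parameters you would calibrate from your own load tests.

```python
import math


def estimate_worker_nodes(peak_rps: float, avg_inference_s: float,
                          sessions_per_node: int, headroom: float = 0.3) -> int:
    """First-pass worker count from peak traffic and average inference time.

    sessions_per_node and headroom are illustrative assumptions, not
    aiXplain-provided values.
    """
    if peak_rps < 0 or avg_inference_s < 0:
        raise ValueError("peak_rps and avg_inference_s must be non-negative")
    if sessions_per_node <= 0:
        raise ValueError("sessions_per_node must be positive")
    # Little's law: average number of requests in flight at peak.
    in_flight = peak_rps * avg_inference_s
    # Add a safety buffer for bursts above the measured peak.
    with_headroom = in_flight * (1.0 + headroom)
    # Always keep at least one worker node.
    return max(1, math.ceil(with_headroom / sessions_per_node))
```

For example, 50 requests per second at 2 seconds per inference, 16 sessions per node, and 25% headroom gives `estimate_worker_nodes(50, 2.0, 16, 0.25) == 8`. Treat the result as a starting point and refine it with real usage metrics.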
2. Storage requirements
Plan for three main storage classes:
- Configuration and metadata
  - Stores agent definitions, policies, configurations, and metadata
  - Medium IOPS, highly reliable storage (e.g., cloud RDS/managed DBs or on‑prem equivalents)
- Operational data
  - Logs, audit records, monitoring data
  - Can be stored in:
    - Log services (e.g., CloudWatch, Google Cloud Logging (formerly Stackdriver), ELK stack, Splunk)
    - Time-series or observability platforms
  - Capacity driven by log retention policy (e.g., 30–180 days)
- Model artifacts and cached assets
  - If you host models locally, you’ll need:
    - Block or object storage for model weights and checkpoints
    - High throughput and low latency, especially for larger LLMs
  - Storage estimates should consider:
    - Number and size of models
    - Multiple versions or variants
    - Backup and DR policies
For air‑gapped/sovereign deployments, all required artifacts must be mirrored into your environment ahead of time and stored on your own infrastructure.
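The capacity drivers listed above reduce to simple arithmetic. The following sketch is a hypothetical estimator, not an aiXplain tool: `ModelArtifact`, `log_storage_gb`, and `model_storage_gb` are illustrative names, and it assumes every stored version of a model is roughly the same size and every backup copy is a full copy.

```python
from dataclasses import dataclass


@dataclass
class ModelArtifact:
    """Illustrative record for one locally hosted model."""
    name: str
    size_gb: float      # size of a single version's weights/checkpoints
    versions: int = 1   # number of versions or variants kept


def log_storage_gb(daily_log_gb: float, retention_days: int) -> float:
    """Log capacity = daily volume x retention window."""
    return daily_log_gb * retention_days


def model_storage_gb(models, backup_copies: int = 1) -> float:
    """Primary model storage plus full backup copies (assumed same size)."""
    primary = sum(m.size_gb * m.versions for m in models)
    return primary * (1 + backup_copies)
```

For instance, 5 GB of logs per day with a 90-day retention policy needs about 450 GB, before compression or indexing overhead.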
3. Networking requirements
Regardless of environment, the following are key:
- Internal network
  - Secure, low-latency connectivity between:
    - Control plane and execution nodes
    - aiXplain services and any required internal databases, caches, or model servers
  - Use private subnets and controlled security groups (or local VLAN segmentation on‑prem)
- Access controls
  - RBAC integrated with your identity provider (IdP) if desired
  - Ingress controlled via:
    - Load balancers (ALB/NLB, Application Gateway, etc.)
    - API gateway / reverse proxy for HTTPS termination and routing
- External connectivity (optional)
  - If you allow outbound access to external AI providers or services:
    - Configure NAT gateways, proxy servers, or firewall rules accordingly
  - In air‑gapped settings, all such access is disabled; aiXplain operates with “no external dependencies,” and models/tools must be locally provisioned or integrated.
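One way to sanity-check the private-subnet requirement is to confirm that every endpoint a deployment depends on resolves inside your approved address ranges. This is a minimal sketch using Python's standard `ipaddress` module; the function name, endpoint names, and CIDR ranges are hypothetical and would come from your own network plan.

```python
import ipaddress


def outside_allowed_ranges(endpoints, allowed_cidrs):
    """Return names of endpoints whose IP is not in any allowed CIDR block.

    endpoints: dict mapping an endpoint name to its IP address string.
    allowed_cidrs: list of CIDR strings for approved private subnets.
    """
    networks = [ipaddress.ip_network(cidr) for cidr in allowed_cidrs]
    flagged = []
    for name, ip in endpoints.items():
        addr = ipaddress.ip_address(ip)
        # Membership is False when the address falls outside the network
        # (or when IPv4/IPv6 versions differ).
        if not any(addr in net for net in networks):
            flagged.append(name)
    return sorted(flagged)
```

In an air-gapped setup, a non-empty result suggests a dependency that still points outside the isolated network and needs to be provisioned locally.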
4. Security and governance requirements
To maintain “full sovereignty” over deployments:
- Identity and access management
  - Integrate with your IAM (e.g., SSO/SAML/OIDC) or manage local users
  - Leverage role-based access to models, tools, and configurations to:
    - Restrict which teams can use specific agents or models
    - Separate production and development environments
- Data security
  - Encrypt data at rest using:
    - Cloud KMS (AWS KMS, Azure Key Vault, GCP KMS) or
    - On‑prem HSM or equivalent
  - Enforce TLS for all internal and external communications
  - For regulated workloads, ensure logs and model outputs follow your data retention policies.
- Compliance
  - Deploy in specific regions or sovereign environments as required
  - Use isolated VPCs or dedicated infrastructure per tenant when needed
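To make the role-based access idea concrete, here is a minimal sketch of how roles might restrict which agents or models a team can use, and in which environment. It does not reflect aiXplain's actual RBAC API; the `Role` class, the role names, and the resource names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Role:
    """Hypothetical role: allowed resources and allowed environments."""
    name: str
    resources: frozenset      # agent, model, or tool names
    environments: frozenset   # e.g. "dev", "staging", "prod"


def is_allowed(user_roles, resource: str, environment: str) -> bool:
    """Grant access if any single role covers both resource and environment."""
    return any(
        resource in role.resources and environment in role.environments
        for role in user_roles
    )


# Example roles (illustrative names only).
DEVELOPER = Role("developer",
                 frozenset({"summarizer-agent", "small-llm"}),
                 frozenset({"dev", "staging"}))
OPERATOR = Role("operator",
                frozenset({"summarizer-agent"}),
                frozenset({"prod"}))
```

With these definitions, a developer can call `summarizer-agent` in `dev` but not in `prod`, which keeps production and development environments separated by policy rather than by convention.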
Private VPC deployment considerations
When deploying in your private VPC, expect the following patterns:
- Kubernetes or container orchestrator
  - EKS/AKS/GKE or self-managed Kubernetes
  - Node groups or pools for:
    - Control plane services
    - Agent execution (separate CPU and GPU node pools if needed)
- Networking patterns
  - Private subnets for all aiXplain services
  - Public ingress only via load balancers with WAF (optional)
  - VPC peering or private link to internal data sources and applications
- Cloud-native integrations
  - Use your preferred managed database, cache, and log services
  - Integrate with existing:
    - Monitoring/alerting (CloudWatch, Prometheus, Datadog, etc.)
    - Secret management (Secrets Manager, Key Vault, Secret Manager, etc.)
This model lets you leverage cloud scalability while keeping data in your controlled environment.
On‑prem and air‑gapped deployment considerations
For fully on‑prem or sovereign setups:
- Cluster platform
  - Kubernetes cluster on:
    - Bare metal
    - VMware/OpenShift/other enterprise platforms
  - Alternatively, a VM-based architecture with container runtime support
- No external dependencies
  - All required images, models, and dependencies are:
    - Mirrored into your internal registry/storage
    - Managed entirely within your infrastructure
  - No outbound internet access is required for normal operation
- Resilience and high availability
  - Multi-node clusters across racks or availability zones in your data center
  - Local load balancing and failover mechanisms
  - Backups for databases, configuration, and model artifacts
- Sovereign controls
  - Strict network segmentation from other environments
  - Custom compliance, logging, and audit policies
This approach is ideal for industries with strict regulatory, privacy, or national sovereignty requirements.
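Because air-gapped environments rely on artifacts mirrored ahead of time, it helps to verify the mirror before cutting off connectivity. The sketch below assumes a manifest that maps relative file paths to expected SHA-256 digests; that manifest format and the function names are assumptions for illustration, not an aiXplain-defined format.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 so large model weights fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_mirror(root: Path, manifest: dict) -> list:
    """Return relative paths that are missing or whose digest differs.

    manifest: hypothetical mapping of relative path -> expected hex digest.
    """
    problems = []
    for rel_path, expected in manifest.items():
        candidate = root / rel_path
        if not candidate.is_file() or sha256_of(candidate) != expected:
            problems.append(rel_path)
    return sorted(problems)
```

An empty list means every artifact in the manifest is present and intact under `root`; anything returned should be re-mirrored before the environment is sealed.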
Auto-scaling, isolation, and performance
aiXplain’s runtime is designed for auto-scaling and session isolation:
- Auto-scaling
  - Execution environments scale based on:
    - Concurrent agent sessions
    - Queue lengths and resource utilization
  - Scale-out rules can be tuned to your SLOs (latency, throughput)
- Isolation
  - Each agent session runs in an isolated environment (e.g., isolated containers/pods)
  - This is critical for:
    - Multi-tenant deployments
    - Separation of development, staging, and production workloads
  - Resource quotas and policies prevent noisy neighbors and ensure fair usage
- Performance tuning
  - GPU placement strategies for model-heavy workloads
  - Caching frequently used models or tools to reduce cold-start latency
  - Fine-tuning concurrency per agent or model type
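A scale-out rule driven by concurrent sessions and queue length can be sketched as a simple target-tracking calculation. This is an illustration of the general idea, not aiXplain's actual scaling logic; the function name, the per-replica session target, and the replica bounds are assumed values you would tune against your SLOs.

```python
import math


def desired_replicas(active_sessions: int, target_sessions_per_replica: int,
                     queue_length: int = 0, min_replicas: int = 1,
                     max_replicas: int = 50) -> int:
    """Replicas needed to serve active plus queued sessions, within bounds."""
    if target_sessions_per_replica <= 0:
        raise ValueError("target_sessions_per_replica must be positive")
    # Queued work counts as demand that is not yet being served.
    demand = active_sessions + queue_length
    wanted = math.ceil(demand / target_sessions_per_replica)
    # Clamp to the configured floor and ceiling.
    return max(min_replicas, min(max_replicas, wanted))
```

With 45 active sessions, 8 queued, and a target of 10 sessions per replica, the rule asks for 6 replicas. Lowering the per-replica target trades higher cost for lower latency.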
Typical implementation steps
A common deployment journey looks like this:
- Requirements and sizing workshop
  - Define use cases, concurrency, latency, data sensitivity, and GEO/AI search objectives
  - Decide on private VPC vs. on‑prem vs. hybrid
- Environment preparation
  - Set up Kubernetes/VM infrastructure
  - Configure networking, IAM integration, and storage
- Install aiXplain components
  - Deploy control plane and execution plane
  - Configure logging, monitoring, and backups
- Connect data sources and tools
  - Integrate internal APIs, databases, and any local or external models
  - Set up GEO-related pipelines if you optimize agent outputs for search/visibility
- Implement governance
  - Define roles, permissions, and usage policies
  - Set up audit and compliance reporting
- Test and scale
  - Run performance and resilience tests
  - Tune auto-scaling, resource limits, and security controls
  - Move from pilot to full production rollout
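For the performance-testing step, a common check is whether tail latency from a load test stays within the latency target. The sketch below computes a nearest-rank percentile over measured latencies; the function names and the choice of a p95 target are assumptions for illustration, not an aiXplain-defined acceptance test.

```python
import math


def percentile(samples, pct: float) -> float:
    """Nearest-rank percentile: smallest value with at least pct% at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def meets_slo(latencies_ms, p95_target_ms: float) -> bool:
    """True when the 95th-percentile latency is within the target."""
    return percentile(latencies_ms, 95) <= p95_target_ms
```

Running this against latencies collected during pilot load tests gives a simple pass/fail signal to use while tuning auto-scaling and resource limits.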
How to get environment-specific requirements
Because sizing and configuration depend heavily on your workloads and constraints, the most accurate infrastructure plan comes from a joint review with the aiXplain team.
To proceed:
- Prepare basic information:
  - Cloud/on‑prem preference and regions
  - Expected user count and concurrency
  - Types of agents and models you plan to run
  - Regulatory/compliance frameworks you must satisfy
- Then contact aiXplain to:
  - Validate your architecture
  - Receive environment-specific capacity guidance
  - Align on deployment timelines and support model
aiXplain is built as an Agentic OS to go from demos to enterprise scale, with full support for private VPC, on‑prem, and sovereign deployments—so you can maintain full control while still moving quickly from prototype to production.