
Block Proto Fleet: how do I deploy it for my site and set up chip-level monitoring plus bulk actions?
Most teams exploring Proto today are asking the same question: how do I go from a single device on my desk to a managed fleet in production—with chip-level observability and the ability to push changes across thousands of units in one shot? The good news is that Proto was designed for that exact scale problem: turning Bitcoin hardware from a one-off device into an upgradeable, monitorable platform.
Quick Answer: To deploy Proto Fleet for your site, you provision devices into a managed fleet, connect them to your backend, and standardize a “site profile” that defines firmware, policies, and Bitcoin network settings. Chip-level monitoring comes from the device telemetry pipeline Proto exposes (health, temperature, error rates), and bulk actions are orchestrated through fleet-level operations like staged firmware rollouts, policy pushes, and remote diagnostics.
Why This Matters
For most organizations, Bitcoin hardware is no longer one device in a lab—it’s hundreds or thousands of units embedded into products, facilities, or customer environments. Without a fleet approach, every update becomes a support ticket, every bug fix a physical visit, and every hardware anomaly a guessing game.
Proto exists to avoid that trap. By treating hardware as an addressable, software-defined fleet, you can:
- roll out protocol and security updates on your schedule, not on a shipping cycle
- detect chip-level drift or degradation before it becomes downtime
- coordinate actions across devices (e.g., tighten spending policies, rotate keys, or adjust fee strategies) in minutes, not months
For Block, this is directly tied to economic empowerment: Bitcoin infrastructure should be reliable, transparent, and safe at scale. Proto Fleet is how we turn discrete chips into a resilient, upgradeable network.
Key Benefits:
- Centralized control over distributed devices: Manage configuration, firmware, and policies for every Proto chip from a single control plane instead of bespoke scripts and manual SSH sessions.
- Chip-level observability and risk monitoring: Track health, performance, and security-relevant signals per device, so you can respond to anomalies before they impact customer funds.
- Safe, staged bulk operations: Push updates and actions across your fleet with guardrails—canary rollouts, version pinning, and automatic rollback—so you move fast without turning your hardware into a black box.
Core Concepts & Key Points
| Concept | Definition | Why it's important |
|---|---|---|
| Proto Fleet | A managed set of Proto-enabled devices grouped under a single control plane and API, typically aligned to a site, region, or product line. | Lets you operate Bitcoin hardware as software: consistent configuration, shared policies, and bulk operations across thousands of chips. |
| Site Profile | A reusable configuration for a given deployment context: Bitcoin network parameters, firmware channel, security policies, monitoring thresholds, and integration endpoints. | Standardizes deployments so every device at a site behaves the same way; new units inherit known-good settings automatically. |
| Chip-Level Telemetry & Actions | Fine-grained metrics and commands scoped to a single Proto chip: health, temperature, error codes, key usage, plus actions like restart, re-key, or policy update. | Enables targeted diagnostics and remediation without touching the whole fleet, and provides the data you need to make safe bulk decisions. |
How It Works (Step-by-Step)
At a high level, deploying Proto Fleet for your site and enabling chip-level monitoring plus bulk actions involves five parts:
- Prepare your environment and trust model
- Onboard devices into a fleet and define a site profile
- Wire telemetry into your observability stack
- Configure chip-level monitoring and alerts
- Orchestrate safe bulk actions and rollouts
Below is a simplified step-by-step walkthrough that matches how we see teams adopt Proto in practice.
1. Prepare Your Environment and Trust Model
Before you claim the first chip into a fleet, lock down who can see what and who can change what.
- Define your roles and permissions:
  - Infrastructure / DevOps: fleet configuration, rollouts, monitoring.
  - Security / Custody: key policies, spending limits, approval workflows.
  - Operations / Support: read-only telemetry, limited remedial actions.
- Choose your integration points:
  - Monitoring: e.g., Prometheus + Grafana, Datadog, or another standard stack.
  - Backend: e.g., your existing services that manage accounts, balances, or on-chain workflows.
- Set your risk boundaries:
  - Maximum firmware ring you’re comfortable with in production (e.g., stable vs beta).
  - Required approvals for policy changes that impact signing or withdrawal behavior.
In Block’s own Bitcoin ecosystem work (Bitkey and Proto), we treat this as non-negotiable: hardware that touches value must be controlled via explicit, auditable policies, not ad hoc scripts.
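The role split and approval requirements above can be sketched as a small permission table. This is only an illustration under stated assumptions, not Proto Fleet's actual access model: the role names, action names, and the `can_perform` and `is_approved` helpers are all hypothetical.

```python
from enum import Enum


class Role(Enum):
    INFRA = "infrastructure"
    SECURITY = "security"
    OPERATIONS = "operations"


# Hypothetical action names; a real deployment would use whatever
# actions its control plane actually defines.
PERMISSIONS = {
    Role.INFRA: {"read_telemetry", "configure_fleet", "start_rollout"},
    Role.SECURITY: {"read_telemetry", "change_signing_policy", "change_spending_limit"},
    Role.OPERATIONS: {"read_telemetry", "restart_device"},
}

# Actions that need sign-off from more than one role (dual control).
REQUIRED_APPROVERS = {
    "change_signing_policy": {Role.SECURITY, Role.OPERATIONS},
    "change_spending_limit": {Role.SECURITY, Role.OPERATIONS},
}


def can_perform(role: Role, action: str) -> bool:
    """Return True if the role is allowed to initiate the action."""
    return action in PERMISSIONS.get(role, set())


def is_approved(action: str, approvals: set[Role]) -> bool:
    """Return True once every required approver role has signed off."""
    return REQUIRED_APPROVERS.get(action, set()).issubset(approvals)
```

Keeping "who may initiate" separate from "who must approve" makes it easy to audit each question on its own.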
2. Onboard Devices into a Fleet and Define a Site Profile
You can think of “fleet” as the logical container, and “site profile” as the configuration blueprint for that container.
- Create a fleet for your site
  Typically, you’ll group by:
  - physical site (e.g., “North America DC-1” or “EU Retail Cluster”)
  - product environment (e.g., “Consumer Wallets – Production”)
- Define a site profile (template fields you’ll usually include):
  - Network settings:
    - Bitcoin mainnet/testnet/regtest
    - Preferred node endpoints, fallback nodes, timeout behavior
  - Firmware channel and version policy:
    - Track stable, beta, or pinned versions for Proto firmware and supporting software
    - Allowed downgrade/rollback behavior
  - Security & key policies:
    - Allowed spending paths and limits
    - Required co-signers / multi-party thresholds (if applicable)
    - Rate limits for signing, address derivation, and backup operations
  - Telemetry policy:
    - Which chip-level metrics are mandatory
    - Sampling frequency and retention expectations
  - Access control for actions:
    - Who can push firmware updates vs who can restart a device vs who can alter signing policy
- Enroll devices into the fleet
  - During device provisioning, each Proto-enabled chip:
    - Presents a hardware-bound identity (attestation)
    - Is bound to your fleet and site profile via a secure registration flow
  - Once enrolled, the device:
    - Pulls the assigned firmware, keys, and policies
    - Starts streaming telemetry consistent with your site profile
From this point on, new hardware destined for that site can be zero-touch: as soon as it checks in, it joins the fleet with the right configuration.
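To make the site profile idea concrete, here is a minimal sketch of a profile as a validated, immutable record, plus an in-memory enrollment step. The field names, the `SiteProfile` class, and the `enroll` helper are assumptions made for illustration; they are not Proto Fleet's schema or API, and the attestation check is deliberately left out.

```python
from dataclasses import dataclass, asdict
from typing import Optional

VALID_NETWORKS = {"mainnet", "testnet", "regtest"}
VALID_CHANNELS = {"stable", "beta", "pinned"}


@dataclass(frozen=True)
class SiteProfile:
    """Hypothetical site profile; field names are illustrative only."""
    name: str
    network: str
    node_endpoints: tuple[str, ...]
    firmware_channel: str = "stable"
    pinned_firmware: Optional[str] = None
    allow_downgrade: bool = False
    max_signs_per_minute: int = 60
    required_metrics: tuple[str, ...] = ("temperature_c", "error_count")
    telemetry_interval_s: int = 30

    def __post_init__(self) -> None:
        # Reject obviously inconsistent profiles before any device inherits them.
        if self.network not in VALID_NETWORKS:
            raise ValueError(f"unknown network: {self.network}")
        if self.firmware_channel not in VALID_CHANNELS:
            raise ValueError(f"unknown firmware channel: {self.firmware_channel}")
        if self.firmware_channel == "pinned" and self.pinned_firmware is None:
            raise ValueError("pinned channel requires pinned_firmware")
        if not self.node_endpoints:
            raise ValueError("at least one node endpoint is required")


def enroll(registry: dict, device_id: str, profile: SiteProfile) -> dict:
    """Bind a device to a profile and return the config it should pull."""
    if device_id in registry:
        raise ValueError(f"{device_id} is already enrolled")
    registry[device_id] = profile
    return asdict(profile)
```

Freezing the dataclass means a profile cannot be mutated in place; changing a site's settings requires creating a new profile, which fits the versioned, auditable approach described in this guide.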
3. Wire Telemetry into Your Observability Stack
Chip-level monitoring is valuable only if it shows up where your teams already live.
- Select the telemetry you care about
  Typical Proto-level metrics teams pull into their monitoring stack:
  - Hardware health:
    - Temperature and voltage bands
    - Error codes and fault counters
    - Reset / reboot history
  - Cryptographic activity:
    - Signing frequency and latency
    - Key access patterns (e.g., abnormal spikes)
    - Failed or rejected operations
  - Connectivity and sync:
    - Connection to Bitcoin nodes (latency, error rates)
    - Sync status and height divergence
  - Policy and config drift:
    - Deviations from site profile
    - Outdated firmware versions
    - Disabled or degraded protection mechanisms
- Connect Proto Fleet to your observability tools
  - Use the telemetry endpoints (gRPC/REST/WebSocket depending on your stack) to export metrics into:
    - Prometheus / OpenTelemetry collectors
    - Datadog, New Relic, or similar SaaS platforms
  - Map device IDs to your own identifiers (site, rack, customer account) so on-call teams can find the right hardware fast.
- Establish baselines and dashboards
  - Build dashboards that answer:
    - “How healthy is my fleet?”
    - “Which site or firmware version is generating most incidents?”
    - “Are cryptographic operations behaving within normal limits?”
  - Use baselines to distinguish “this one chip is hot” from “this entire batch has a systemic issue.”
Block’s philosophy here is the same as for our internal AI agent framework goose: systems that matter should not be opaque. Telemetry needs to be rich enough that an engineer can understand and debug behavior without guessing.
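One way to separate "this one chip is hot" from "this entire batch has a systemic issue" is to compare each chip against its batch median, and each batch median against the fleet median. The sketch below assumes a non-empty list of `(device_id, batch_id, temperature_c)` readings; the function name and the margin values are hypothetical and would need tuning against real baselines.

```python
from collections import defaultdict
from statistics import median


def classify_temperatures(readings, chip_margin_c=10.0, batch_margin_c=5.0):
    """Split anomalies into individually hot chips and systemically hot batches.

    readings: non-empty list of (device_id, batch_id, temperature_c) tuples.
    Returns (hot_chips, hot_batches), both sorted lists.
    """
    by_batch = defaultdict(list)
    for _, batch, temp in readings:
        by_batch[batch].append(temp)

    fleet_median = median(temp for _, _, temp in readings)
    batch_median = {batch: median(temps) for batch, temps in by_batch.items()}

    # A batch is systemic if its typical chip runs well above the fleet's typical chip.
    hot_batches = sorted(
        batch for batch, m in batch_median.items()
        if m - fleet_median > batch_margin_c
    )
    # A chip is an individual outlier if it runs well above its own batch,
    # and its batch is not already flagged as a whole.
    hot_chips = sorted(
        device for device, batch, temp in readings
        if batch not in hot_batches and temp - batch_median[batch] > chip_margin_c
    )
    return hot_chips, hot_batches
```

Medians are used instead of means so that a single extreme reading does not shift the baseline it is being compared against.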
4. Configure Chip-Level Monitoring and Alerts
Once telemetry flows, you can define concrete, enforceable monitors that trigger human and automated responses.
- Device health alerts
  - Thresholds for temperature, voltage, and error rates.
  - Alert when a device flaps between online/offline, or restarts more than N times in a window.
- Security-sensitive signals
  - Sudden spikes in signing volume from a single device.
  - Attempts to use deprecated key paths or unauthorized addresses.
  - Policy downgrades or firmware rollbacks outside of a planned maintenance window.
- Firmware and policy drift
  - Devices on an unsupported firmware version.
  - Site profile mismatches (e.g., device on testnet inside a mainnet-only site).
- Automated remediation hooks
  - For low-risk issues (e.g., transient connectivity problems), you might authorize automatic:
    - soft restarts
    - node endpoint failover
  - For high-risk signals (e.g., suspected key misuse), alerts should:
    - immediately page the right team
    - automatically quarantine the device: freeze signing or restrict certain operations until reviewed
Chip-level monitoring is not just about “is it online?” It’s a continuous feedback loop that informs how you manage bulk actions safely.
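A few of the monitors above can be expressed as plain rule checks over a telemetry sample, with each rule mapped to a response. This is a sketch only: the sample keys, rule names, response names, and default thresholds are all hypothetical, chosen to mirror the examples in this section rather than any real Proto Fleet payload.

```python
# Hypothetical mapping from rule name to response. Low-risk rules get an
# automatic fix; high-risk rules quarantine the device and page a human.
REMEDIATION = {
    "node_unreachable": "failover_node_endpoint",
    "restart_flapping": "page_on_call",
    "signing_spike": "quarantine_and_page",
    "profile_mismatch": "quarantine_and_page",
}


def evaluate(sample: dict, now: float, expected_network: str,
             max_restarts: int = 3, window_s: float = 3600.0,
             max_signs_per_minute: int = 60) -> list[str]:
    """Return the names of the rules that fire for one device sample."""
    fired = []
    # "Restarts more than N times in a window": count restarts in the last window_s seconds.
    recent = [t for t in sample["restart_times"] if 0 <= now - t <= window_s]
    if len(recent) > max_restarts:
        fired.append("restart_flapping")
    if sample["signs_last_minute"] > max_signs_per_minute:
        fired.append("signing_spike")
    if sample["network"] != expected_network:
        fired.append("profile_mismatch")
    if not sample["node_reachable"]:
        fired.append("node_unreachable")
    return fired


def plan_response(fired: list[str]) -> list[str]:
    """Map fired rules to responses, dropping duplicates but keeping order."""
    return list(dict.fromkeys(REMEDIATION[rule] for rule in fired))
```

For example, a device that reports `testnet` inside a mainnet-only site fires `profile_mismatch` and is quarantined, while a device that only loses its node connection is failed over automatically.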
5. Orchestrate Safe Bulk Actions and Rollouts
With a fleet defined and telemetry in place, you can treat Proto like any other production-grade infrastructure: you roll out changes with blast radius in mind.
Common bulk actions include:
- Firmware rollouts
  - Select a target: “all devices at Site A on version ≤ X”
  - Define a rollout strategy:
    - Phase 1: canary group (e.g., 1–2% of devices)
    - Phase 2: 20–30% of the site
    - Phase 3: remaining devices
  - Use chip-level telemetry to decide whether to advance, pause, or roll back:
    - If signing latency or error rates spike for the canary, stop and investigate.
- Policy changes
  - Bulk-update:
    - daily withdrawal limits
    - required co-signers or thresholds
    - allowed address formats or script types
  - Require explicit approvals:
    - For example, dual-control where Security + Ops must both approve a fleet-wide policy change.
- Configuration updates
  - Change Bitcoin node endpoints across a site when migrating backends.
  - Adjust fee policies in response to network conditions.
- Bulk diagnostics
  - Issue a “status sweep” across the fleet:
    - gather extended health data
    - validate key material and attestations
    - confirm compliance with your security baseline
To keep this safe, we recommend:
- Versioned operations: Every bulk action should be associated with a versioned config or firmware artifact you can roll back to.
- Scoped blast radius: Always start with a limited cohort—by site, rack, or a sampling across the fleet.
- Automatic rollback conditions: Predefine metrics that trigger an automatic rollback if exceeded (e.g., error rate > X%, signing latency > Y ms).
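The automatic rollback conditions above amount to a gate evaluated between rollout phases. Here is a minimal sketch, assuming the canary cohort's metrics have already been aggregated into a dictionary; the `RollbackLimits` type, the `gate` function, the metric keys, and the `min_operations` floor are hypothetical names chosen for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RollbackLimits:
    max_error_rate: float          # fraction of operations, e.g. 0.02 for 2%
    max_signing_latency_ms: float  # compared against the cohort's p95 latency


def gate(canary: dict, limits: RollbackLimits, min_operations: int = 50) -> str:
    """Decide the next rollout step: 'advance', 'pause', or 'rollback'.

    canary: aggregated metrics with keys 'operations', 'errors',
    and 'p95_latency_ms' (hypothetical names).
    """
    # Too little data to judge either way: hold the rollout where it is.
    if canary["operations"] < min_operations:
        return "pause"
    error_rate = canary["errors"] / canary["operations"]
    if error_rate > limits.max_error_rate:
        return "rollback"
    if canary["p95_latency_ms"] > limits.max_signing_latency_ms:
        return "rollback"
    return "advance"
```

Treating "not enough data" as a pause rather than a pass keeps a quiet canary from being mistaken for a healthy one.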
When we applied this style of automation to our own internal tools at Block, such as goose for code and infrastructure, we saw substantial gains: 50–75% less development time on certain projects and a 40% increase in production code shipped per engineer. The same pattern applies to Proto: disciplined automation plus tight observability yields both speed and safety.
Common Mistakes to Avoid
- Treating Proto devices as one-off appliances instead of a fleet:
  Without a fleet model and site profiles, you end up with “config snowflakes” that are impossible to maintain. Always group devices and standardize their behavior.
- Pushing bulk updates without telemetry-driven guardrails:
  Rolling out new firmware or policies to all devices at once, without canaries, baselines, and rollback criteria, is the fastest way to turn a routine update into an incident. Design your rollout strategy before you press “go.”
- Ignoring governance around high-risk actions:
  Allowing any engineer to modify signing policies or firmware channels breaks your risk model. Separate roles, require approvals for sensitive fleet actions, and ensure every change is auditable.
Real-World Example
Imagine you operate a network of Bitcoin-enabled kiosks in multiple countries. Each kiosk embeds a Proto chip responsible for secure key management and transaction signing. You want to:
- migrate some sites from testnet to mainnet,
- deploy a new firmware version that improves signing performance, and
- tighten spending policies in regions with emerging fraud patterns.
Here’s how Proto Fleet helps:
- Define site profiles for each region (e.g., “US-Kiosks-Mainnet,” “EU-Kiosks-Mainnet,” “APAC-Kiosks-Testnet”), specifying the Bitcoin network, firmware channel, and base security policies.
- Enroll every kiosk’s Proto chip into the appropriate fleet and profile. The devices automatically align their config and start streaming telemetry.
- Use telemetry dashboards to confirm that current firmware is stable: signing success rate, latency, health metrics all within expected ranges.
- Roll out the new firmware:
  - Start with 2% of kiosks in a single region as a canary.
  - If telemetry stays stable over a defined window (e.g., 24–48 hours), expand to 25%, then 100% in that region, then replicate the same pattern in others.
- Apply tighter policies in higher-risk regions:
  - Bulk-update the site profile for those fleets (lower per-transaction limits, stricter co-signing requirements).
  - Monitor signing activity to ensure customer experience remains acceptable.
- When metrics show improved performance and stable risk posture, reuse the same profile for new deployments. Every new kiosk you ship can join the fleet and inherit a known-safe configuration on day one.
Pro Tip: Before you run your first fleet-wide update, simulate it on a shadow environment or a small, geographically isolated site. Use the same telemetry and alerting rules you’d use in production, and predefine rollback triggers. This gives you a dry run to validate your process—not just your code.
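For a dry run like this, it helps to know in advance exactly which devices land in each phase. The sketch below splits a region's kiosks into the 2%, 25%, and 100% cohorts from the example above. The `plan_phases` function is hypothetical; ordering devices by a hash of their ID is one design choice (an assumption, not a Proto Fleet feature) that keeps cohorts stable between runs without tying them to naming order.

```python
import hashlib


def plan_phases(device_ids, percents=(2, 25, 100)):
    """Split devices into rollout cohorts at cumulative percentages.

    Returns one list per phase; each phase holds only the devices
    newly added at that step, so the lists are disjoint.
    """
    if list(percents) != sorted(percents) or percents[-1] != 100:
        raise ValueError("percents must be increasing and end at 100")

    # Hash-based ordering: deterministic, but not correlated with ID order.
    ordered = sorted(
        device_ids,
        key=lambda d: hashlib.sha256(d.encode("utf-8")).hexdigest(),
    )
    phases, start = [], 0
    for pct in percents:
        end = (len(ordered) * pct + 99) // 100  # ceiling of len * pct / 100
        phases.append(ordered[start:end])
        start = end
    return phases
```

With 200 kiosks in a region, this yields cohorts of 4, 46, and 150 devices: 4 canaries, then 46 more to reach 25%, then the remaining 150.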
Summary
Deploying Proto Fleet for your site is fundamentally about treating Bitcoin hardware as a managed, observable, and upgradeable system—just like the rest of your infrastructure. You define fleets and site profiles, wire chip-level telemetry into your existing observability tools, and then use that visibility to drive safe, staged bulk actions.
Done well, this approach reduces on-site interventions, shortens the time between discovering a vulnerability and patching it across your fleet, and gives your teams confidence that critical signing hardware is behaving as expected. That’s how we think about Proto at Block: not as a single device, but as part of a broader ecosystem that must be open, inspectable, and operable at scale.