    OSCP Protocol Documentation¶

    This document complements spec.md by describing the current implemented OSCP connection-lifecycle slice rather than the full conceptual OSCP design.

    Use spec.md as the source of truth for the broader architecture, domain boundaries, and future-shape OSCP model. Use this document for the current runtime FSM, handshake/timeout/heartbeat behaviour, implementation compromises in the delivered lifecycle flow, and the confirmed target behaviour that follows from lifecycle degradation into fallback mode or intentional disconnect.

    Purpose¶

    This document exists to:

    • define the BetterFleet interpretation of the OSCP connection lifecycle
    • describe the persisted runtime state model
    • document the handshake, timeout, and heartbeat logical flows
    • document how offline lifecycle detection hands off into fallback mode
    • document the intentional-disconnect handover back to non-OSCP control
    • capture the current implementation compromises

    Scope¶

    This document currently covers:

    • connection establishment
    • connection timeout handling
    • heartbeat handling and liveness
    • lifecycle-triggered fallback-mode behaviour
    • intentional disconnect handover
    • BetterFleet and CP simulator role boundaries used for development

    This document does not currently (but may in future) define:

    • registration behaviour
    • capability / forecast payload semantics
    • long-term audit or event-sourcing strategy

    Supplementary Resources¶

    • OSCP specification: docs/reference/system-design/oscp/spec.md
    • Vector DERMS specification: docs/work/active/vector-derms/spec.md

    Roles¶

    For BetterFleet's current OSCP implementation:

    • BetterFleet acts as the flexibility provider, FP
    • the peer system acts as the capacity provider, CP

    Route ownership follows that split:

    • BetterFleet exposes FP routes such as /oscp/fp/2.0/...
    • the CP simulator exposes CP routes such as /oscp/cp/2.0/...

    The OSCP specification also defines a capacity optimizer (CO) role, which is currently out of scope.

    OSCP Conceptual Model¶

    OSCP has three broad concerns:

    • Registration
    • Active connection lifecycle
    • Capabilities

    State Model¶

    The connection lifecycle is described through the following finite state machine.

    stateDiagram-v2
        direction LR
        UNREGISTERED --> NO_CONNECTION: Register (out of scope)
    
        NO_CONNECTION --> CONNECTING: Start connection (rx204)
        NO_CONNECTION --> ACCEPTING_CONNECTION: Receive connection request (tx204)
    
        CONNECTING --> CONNECTED: Receive acknowledgement (tx204)
        CONNECTING --> NO_CONNECTION: Timeout or protocol failure
    
        ACCEPTING_CONNECTION --> CONNECTED: Send acknowledgement (rx204)
        ACCEPTING_CONNECTION --> NO_CONNECTION: Timeout or protocol failure
    
        CONNECTED --> CONNECTED: Receive heartbeat within threshold
        CONNECTED --> NO_CONNECTION: Heartbeat threshold missed or protocol failure
    • Registration can only occur while in the UNREGISTERED state
    • Active connection lifecycle relates to all other states
    • Capabilities can only be enacted while in the CONNECTED state
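
The state diagram above can also be read as a transition table. The following is a minimal sketch rather than BetterFleet's implementation: the `Event` names, the `TRANSITIONS` table and the `next_state` helper are hypothetical, and registration is left out because it is out of scope.

```python
from enum import Enum


class State(Enum):
    UNREGISTERED = "UNREGISTERED"
    NO_CONNECTION = "NO_CONNECTION"
    CONNECTING = "CONNECTING"
    ACCEPTING_CONNECTION = "ACCEPTING_CONNECTION"
    CONNECTED = "CONNECTED"


class Event(Enum):  # hypothetical event names, chosen for illustration only
    START_CONNECTION = "start_connection"
    HANDSHAKE_RECEIVED = "handshake_received"
    ACK_RECEIVED = "ack_received"
    ACK_SENT = "ack_sent"
    HEARTBEAT_RECEIVED = "heartbeat_received"
    FAILURE = "failure"  # timeout, missed heartbeat threshold, or protocol failure


# One entry per edge in the state diagram (registration edge omitted).
TRANSITIONS = {
    (State.NO_CONNECTION, Event.START_CONNECTION): State.CONNECTING,
    (State.NO_CONNECTION, Event.HANDSHAKE_RECEIVED): State.ACCEPTING_CONNECTION,
    (State.CONNECTING, Event.ACK_RECEIVED): State.CONNECTED,
    (State.CONNECTING, Event.FAILURE): State.NO_CONNECTION,
    (State.ACCEPTING_CONNECTION, Event.ACK_SENT): State.CONNECTED,
    (State.ACCEPTING_CONNECTION, Event.FAILURE): State.NO_CONNECTION,
    (State.CONNECTED, Event.HEARTBEAT_RECEIVED): State.CONNECTED,
    (State.CONNECTED, Event.FAILURE): State.NO_CONNECTION,
}


def next_state(state, event):
    """Return the next state, or None when the diagram has no such edge."""
    return TRANSITIONS.get((state, event))
```

Returning `None` for a missing edge keeps the table the single place that decides which transitions are possible.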

    Communication Expectations¶

    Accepted OSCP requests in this slice are acknowledged with HTTP 204.

    Invalid or impossible protocol transitions are rejected with the appropriate HTTP error, for example 403 when the current runtime state does not allow the requested transition.
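
As an illustration of the two outcomes described above, the sketch below maps an inbound message and the current runtime state to a status code. The `ALLOWED` table and `status_for` function are hypothetical; the table is an assumption inferred from the state diagram, and the real service may use other error codes in cases not covered here.

```python
from http import HTTPStatus

# Assumed mapping of inbound message type to the states that accept it.
ALLOWED = {
    "Handshake": {"NO_CONNECTION"},
    "HandshakeAcknowledge": {"CONNECTING"},
    "Heartbeat": {"CONNECTED"},
}


def status_for(message, current_state):
    """Return 204 for an accepted request, 403 for a disallowed transition."""
    if current_state in ALLOWED.get(message, set()):
        return HTTPStatus.NO_CONTENT
    return HTTPStatus.FORBIDDEN
```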

    Registration¶

    Registration exists conceptually but is out of scope for the current implementation.

    The intended end state is that registration owns creation of the persisted OSCP connection-state record.

    Under that model:

    • an OSCP connection must be registered before handshake or heartbeat lifecycle commands are valid
    • later lifecycle commands transition an existing registered record rather than creating one on first use
    • duplicate first-touch record creation races are treated as "already registered" conflicts rather than infrastructure failures
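
The intended ownership rule can be sketched with an in-memory store. This is an assumption-laden illustration, not the planned implementation: `ConnectionRegistry`, `AlreadyRegistered` and `NotRegistered` are hypothetical names, and a real store would most likely detect the duplicate through a database uniqueness constraint rather than a dictionary lookup.

```python
class AlreadyRegistered(Exception):
    """Raised when a connection record already exists (a conflict, not an outage)."""


class NotRegistered(Exception):
    """Raised when a lifecycle command targets an unregistered connection."""


class ConnectionRegistry:
    def __init__(self):
        self._records = {}  # connection_id -> lifecycle state

    def register(self, connection_id):
        # Registration is the only operation allowed to create a record.
        if connection_id in self._records:
            raise AlreadyRegistered(connection_id)
        self._records[connection_id] = "NO_CONNECTION"

    def transition(self, connection_id, new_state):
        # Lifecycle commands update an existing record and never create one.
        if connection_id not in self._records:
            raise NotRegistered(connection_id)
        self._records[connection_id] = new_state

    def state_of(self, connection_id):
        return self._records[connection_id]
```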

    Connection¶

    Connection consists of:

    • handshake initiation
    • handshake acknowledgement
    • heartbeat exchange
    • timeout / disconnect handling

    Crucially, communication sessions are initiated and handled over HTTP/S, so the notion of an ongoing 'connection' is ephemeral. These are not WebSockets, although the protocol imitates a persistent connection.

    Starting a connection¶

    This is initiated by BetterFleet by sending a handshake to the capacity provider.

    Expected sequence¶
    sequenceDiagram
        participant CP as Capacity Provider
        participant FP as BetterFleet
        participant State as DB State Store
    
        FP->>State: Persist state = CONNECTING
        FP->>CP: Send Handshake
        CP-->>FP: HTTP 204
        CP->>FP: HandshakeAcknowledge
        FP-->>CP: HTTP 204
        FP->>State: Persist state = CONNECTED
        FP->>State: Set connection_started_at if first connection

    The outbound communication initialisation must also be robust to timeout.

    The timeout handling is necessary to cover two distinct failure modes:

    • the peer does not return the immediate HTTP 204 to the outbound Handshake
    • the peer returns 204, but never sends the later HandshakeAcknowledge
    Timeout handling¶
    sequenceDiagram
        participant CP as Capacity Provider
        participant FP as BetterFleet
        participant State as DB State Store
        participant Scheduler as Event Bridge
    
        FP->>State: Persist state = CONNECTING
        FP->>Scheduler: Create one-shot CONNECTING timeout
        FP->>CP: Send handshake
    
        alt handshake_acknowledge received before timeout
            CP-->>FP: HTTP 204
            CP->>FP: HandshakeAcknowledge
            FP-->>CP: HTTP 204
            FP->>State: Persist state = CONNECTED
            FP->>Scheduler: Cancel CONNECTING timeout
        else timeout event fires first
            Scheduler->>FP: Trigger OSCP_CONNECTION_TIMEOUT
            FP->>State: Re-check expected state + state_updated_at
            FP->>State: Disconnect/reset state
            FP->>Scheduler: Delete timeout event
        end
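
The "re-check expected state + state_updated_at" step in the diagram above guards against a stale timeout resetting a newer connection attempt. A minimal sketch, assuming the timeout event carries the state and timestamp that were current when it was scheduled (`ConnectionRecord` and `handle_timeout` are hypothetical names):

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class ConnectionRecord:
    state: str
    state_updated_at: datetime


def handle_timeout(record, expected_state, scheduled_updated_at):
    """Reset to NO_CONNECTION only if the record is unchanged since scheduling.

    Returns True when the reset was applied, False when the timeout is stale.
    """
    if record.state != expected_state:
        return False  # the handshake completed or failed in the meantime
    if record.state_updated_at != scheduled_updated_at:
        return False  # same state, but it belongs to a newer attempt
    record.state = "NO_CONNECTION"
    record.state_updated_at = datetime.now(timezone.utc)
    return True
```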

    Receiving a connection request¶

    This is initiated by the capacity provider by sending a handshake to BetterFleet.

    The acknowledgement is deliberately deferred to an in-process async follow-up task so that BetterFleet can (nearly) guarantee that its HTTP 204 response to the inbound Handshake is returned before the outbound HandshakeAcknowledge is sent.
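
The ordering can be illustrated with plain `asyncio`. This is a sketch under stated assumptions, not the service code: the handler and task names are hypothetical, the `events` list stands in for real I/O, and the short sleep represents the deliberate delay.

```python
import asyncio


async def send_acknowledge(events, delay_seconds=0.01):
    # The delay gives the HTTP layer time to flush the 204 first.
    await asyncio.sleep(delay_seconds)
    events.append("send HandshakeAcknowledge")
    events.append("state=CONNECTED")


async def handle_inbound_handshake(events, background_tasks):
    events.append("state=ACCEPTING_CONNECTION")
    # Schedule the follow-up without awaiting it, keeping a reference so the
    # task is not garbage-collected before it runs.
    task = asyncio.create_task(send_acknowledge(events))
    background_tasks.add(task)
    task.add_done_callback(background_tasks.discard)
    events.append("return 204")
    return 204


async def demo():
    events, tasks = [], set()
    status = await handle_inbound_handshake(events, tasks)
    await asyncio.gather(*tasks)  # wait for the follow-up to finish
    return status, events
```

The guarantee is only "nearly" complete because, in a real server, the response is written by the web framework after the handler returns, so the delay provides a margin rather than a strict ordering.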

    Expected sequence¶
    sequenceDiagram
        participant CP as Capacity Provider
        participant FP as BetterFleet
        participant State as DB State Store
        participant BG as Async follow-up task
    
        CP->>FP: Handshake
        FP->>State: Persist state = ACCEPTING_CONNECTION
        FP->>BG: Schedule delayed HandshakeAcknowledge
        FP-->>CP: HTTP 204
        BG->>CP: Send HandshakeAcknowledge
        CP-->>FP: HTTP 204
        FP->>State: Persist state = CONNECTED
        FP->>State: Set connection_started_at if first connection

    The inbound communication acceptance must also be robust to timeout.

    The timeout handling is necessary to cover the case where the peer never returns the expected HTTP 204 to BetterFleet's outbound HandshakeAcknowledge.

    More information is given under Timeouts in Implementation Constraints and Compromises at the end of this document.

    Timeout handling¶
    sequenceDiagram
        participant CP as Capacity Provider
        participant FP as BetterFleet
        participant State as DB State Store
        participant Scheduler as Event Bridge
        participant BG as Async follow-up task
    
        CP->>FP: Send handshake
        FP->>State: Persist state = ACCEPTING_CONNECTION
        FP->>Scheduler: Create one-shot ACCEPTING_CONNECTION timeout
        FP->>BG: Schedule delayed HandshakeAcknowledge
        FP-->>CP: HTTP 204
    
        alt delayed HandshakeAcknowledge succeeds before timeout
            BG->>CP: Send HandshakeAcknowledge
            CP-->>FP: HTTP 204
            FP->>State: Persist state = CONNECTED
            FP->>Scheduler: Cancel ACCEPTING_CONNECTION timeout
        else timeout event fires first
            Scheduler->>FP: Trigger ACCEPTING_CONNECTION timeout
            FP->>State: Re-check expected state + state_updated_at
            FP->>State: Disconnect/reset state
            FP->>Scheduler: Delete timeout event
        end

    Maintaining a connection¶

    This is done through the use of heartbeats.

    • Inbound heartbeat is accepted only when the connection is effectively CONNECTED.
    • A valid heartbeat does not trigger a state transition.
    • A valid heartbeat updates heartbeat_expires_at.
    • Once heartbeat_expires_at has passed, the connection is effectively offline.
    • Heartbeat liveness uses a separate stale grace window so acceptance is not tied exactly to the raw heartbeat interval.
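
The liveness rules above can be sketched as two small functions. How `heartbeat_expires_at` is actually derived is not stated in this document; the formula below (receipt time plus interval plus grace window) is an assumption, and both function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def next_heartbeat_expiry(received_at, heartbeat_interval_s, stale_grace_s):
    """Assumed rule: expiry = receipt time + heartbeat interval + grace window."""
    return received_at + timedelta(seconds=heartbeat_interval_s + stale_grace_s)


def is_effectively_connected(state, heartbeat_expires_at, now):
    """CONNECTED only counts as live while heartbeat_expires_at has not passed."""
    return state == "CONNECTED" and now <= heartbeat_expires_at
```

Note that a valid heartbeat only moves the expiry forward; the persisted state stays CONNECTED throughout.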
    Outbound heartbeat¶
    sequenceDiagram
        participant CP as Capacity Provider
        participant FP as BetterFleet
        participant State as DB State Store
        participant Scheduler as Event Bridge
    
        FP->>State: Persist state = CONNECTED
        FP->>Scheduler: Create recurring heartbeat schedule
    
        loop While connected
            Scheduler->>FP: Trigger heartbeat
            FP->>CP: Send heartbeat
        end
    Inbound heartbeat¶
    sequenceDiagram
        participant CP as Capacity Provider
        participant FP as BetterFleet
        participant State as DB State Store
    
        loop While connected
            CP->>FP: Heartbeat with offline_mode_at
            FP->>State: Persist heartbeat_expires_at
        end
    Heartbeat expiry¶
    sequenceDiagram
        participant CP as Capacity Provider
        participant FP as BetterFleet
        participant State as DB State Store
        participant Scheduler as Event Bridge
    
        alt heartbeat_expires_at has passed
            FP->>State: Interpret connection as offline
            FP->>State: Disconnect/reset state
            FP->>Policy: Resolve fallback or gap-policy constraint for now
            FP->>Ops: Show yellow fallback mode + create notification/incident
            FP->>Scheduler: Delete heartbeat schedule
        end
    Fallback-mode handover after offline detection¶
    • Fallback mode is not a separate OSCP connection state. It is an operator-visible operating mode layered on top of an offline or degraded connection.
    • When heartbeat expiry or equivalent offline detection occurs, BetterFleet transitions the OSCP connection to its offline lifecycle state and then resolves the managed-scope constraint to apply next.
    • If valid fallback forecast coverage exists for now, BetterFleet activates fallback-derived constraint state for the mapped managed scope.
    • If no matching fallback coverage exists for now, BetterFleet applies the configured gap policy. The only option currently supported for selection is the existing circuit safe default or non-OSCP path.
    • In both cases, BetterFleet surfaces a yellow fallback mode rather than a red fail-safe alarm and creates notification and incident context for operators.
    • When the connection is restored and valid primary forecast coverage resumes, BetterFleet exits fallback mode and restores the non-fallback OSCP-controlled path unless a newer accepted forecast supersedes it.
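
The choice between fallback coverage and the gap policy can be sketched as follows. The `ForecastBlock` shape, the `capacity_kw` field, the half-open coverage interval and the `"circuit_safe_default"` label are assumptions made for illustration; the document does not define the fallback forecast payload.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class ForecastBlock:
    start: datetime
    end: datetime
    capacity_kw: float  # hypothetical unit and field name


def resolve_offline_constraint(fallback_blocks, now, gap_policy="circuit_safe_default"):
    """Return (source, capacity) for the managed scope while offline.

    Uses the fallback block covering `now` if one exists, otherwise the
    configured gap policy. Either way the operator sees yellow fallback mode.
    """
    for block in fallback_blocks:
        if block.start <= now < block.end:
            return ("fallback_forecast", block.capacity_kw)
    return (gap_policy, None)
```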
    Intentional disconnect handover¶
    sequenceDiagram
        participant User as Operator
        participant FP as BetterFleet
        participant State as DB State Store
        participant MG as Managed Scope / Compatibility Path
        participant Ops as Operator Surface
    
        User->>FP: Disconnect OSCP connection
        FP->>Ops: Warn that active OSCP constraints will be cleared
        User->>FP: Confirm disconnect
        FP->>State: Mark connection disconnected / reset lifecycle state
        FP->>MG: Withdraw active OSCP forecast and fallback envelopes
        FP->>Ops: Exit fallback mode if active
        FP->>Ops: Show local control resumed / non-OSCP path active

    Capabilities¶

    Capability exchange and forecast actions are later slices. They depend on the connection model being stable first.

    Runtime State Model¶

    The current persisted runtime FSM uses four states:

    • NO_CONNECTION: no active protocol session exists.
    • CONNECTING: BetterFleet initiated connection establishment by sending Handshake; the connection is not yet established and BetterFleet is waiting for inbound HandshakeAcknowledge.
    • ACCEPTING_CONNECTION: the remote party initiated connection establishment; BetterFleet accepted the inbound handshake path and the connection is not yet established while the acknowledgement exchange completes.
    • CONNECTED: the handshake exchange is complete and the session remains live until heartbeat expectations are violated or a protocol failure occurs.

    Fallback mode is not an additional persisted FSM state in this document. It is an operational mode that can be active while the connection lifecycle is effectively offline or intentionally disconnected.

    Implementation Constraints and Compromises¶

    AWS Event Bridge Constraint¶

    Amazon EventBridge Scheduler recurring schedules have minute granularity. BetterFleet therefore:

    • keeps the protocol heartbeat preference in seconds during handshake communication
    • rounds scheduler cadence up to whole minutes for recurring outbound heartbeat schedules
    • rounds connection-timeout scheduling up to whole minutes for one-shot timeout events

    This is an implementation compromise, not a protocol ideal.
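
The rounding rule is simple ceiling arithmetic. A minimal sketch, with a hypothetical function name:

```python
import math


def scheduler_minutes(interval_seconds):
    """Round a seconds-based protocol interval up to whole scheduler minutes."""
    if interval_seconds <= 0:
        raise ValueError("interval must be positive")
    return math.ceil(interval_seconds / 60)
```

For example, a negotiated heartbeat preference of 90 seconds would be scheduled every 2 minutes, so the effective cadence can be slower than the value exchanged during the handshake.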

    Timeouts¶

    BetterFleet uses connection timeouts as a lifecycle failsafe rather than as an OSCP-defined protocol state.

    This exists to ensure that transient failures do not leave the local lifecycle stuck in CONNECTING or ACCEPTING_CONNECTION forever.

    Once BetterFleet has already sent an outbound request and is waiting for HTTP 204, lack of response is handled through the transport timeout / request failure path rather than by waiting for the lifecycle timeout.

    A race condition exists in which a HandshakeAcknowledge and its paired timeout are processed at almost the same time, so the end state depends on processing order.

    • If the timeout is processed first, then the connection is killed and the acknowledgement fails.
    • If the acknowledgement is processed first, then the connection is made and the timeout fails.

    This is intended behaviour. The two events may be processed by different instances or servers, so they rely exclusively on the deterministic transitions of the finite state model (for example, an acknowledgement does nothing in the NO_CONNECTION state). This makes the mechanism safe.

    It is also important to note that a CP that sends the acknowledgement is actively attempting to connect to BetterFleet, whereas the timeout is purely a failsafe and not part of the OSCP spec. It is therefore acceptable for the connection to be made even after the timeout window has formally passed.
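
The two processing orders can be checked with a small sketch. It assumes each event is applied atomically against the persisted state; the `apply` and `run` helpers and the event labels are hypothetical.

```python
def apply(state, event):
    """Return (new_state, accepted) for one event applied to the current state."""
    if state == "CONNECTING" and event == "ack":
        return "CONNECTED", True
    if state == "CONNECTING" and event == "timeout":
        return "NO_CONNECTION", True
    return state, False  # any other combination leaves the state unchanged


def run(events, state="CONNECTING"):
    """Apply events in order and report which ones took effect."""
    outcomes = []
    for event in events:
        state, accepted = apply(state, event)
        outcomes.append(accepted)
    return state, outcomes
```

Whichever event is processed first wins, and the second is rejected without altering the state, so both orders end in a valid state.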

    Self-Healing¶

    In practice, the current lifecycle is intended to be self-healing. This means that if states become misaligned (perhaps through an uncaught race condition), this misalignment will only persist for a short period of time as each peer will quickly realise that there is no live connection, and so the connection process can recommence from a known good state.
