    Production Hotfix Release

    3AM Quick Guide

    • Use this when production needs one urgent fix and the current release branch may contain unreleased staging changes.
    • Record the currently deployed production SHA from the target repository's Operate -> Environments page.
    • Restore release to match what production is already running, commit that restore, and push it. The commands below replace the files in your local release checkout with the files from the production commit, so your next commit makes release match production again.
    # Assumes PRODUCTION_SHA1 is exported and you are at the root of a clean release checkout (see Normal Procedure).
    git restore --source="${PRODUCTION_SHA1}" --staged --worktree .
    git commit --allow-empty --message "fix: restore release to last production state"
    git push origin release
    git fetch origin release
    git diff "${PRODUCTION_SHA1}"..origin/release
    # output should be empty
    
    • After that clean baseline is on origin/release, cherry-pick the approved hotfix into release using the GitLab UI.
    • The release pipeline is manually activated, so it is safe to push, revert, and cherry-pick on release until you are ready to start the release.
    • Open the orchestration pipeline and set action=deploy-release-branch-to-staging, project=<target repository>, and confirm=true: https://gitlab.com/evenergi/develop/-/pipelines/new
    • Run the downstream manual production deployment pipeline as well. Both gates are manual by design for safety and control.
    • Do not choose bring-all-code-to-release-branch. That can repopulate release with unrelated unreleased changes.
    • Watch the production deploy settle, run the focused smoke checks, and update the GitLab incident record.

    Purpose And When To Use It

    This runbook explains how to rebuild the release branch from the last known production deployment, push that clean baseline, cherry-pick a merged hotfix onto it, and trigger a production release for the affected repository.

    Use this workflow when production needs an urgent fix and the current release branch may already contain unreleased staging changes that must not be promoted with the hotfix.

    Do not use this workflow for the normal weekly release, for routine staging promotion, or when the issue can wait for the standard release cadence.

    Prerequisites And Permissions

    Before using this workflow:

    • the hotfix merge request must already be merged and approved for production
    • you must be able to identify the last known-good production deployment commit
    • you must have permission to push to the release branch
    • you must have permission to create and run the production release pipeline and the downstream manual production deployment pipeline
    • you must have access to the target service's production environment page, logs, and health checks
    • you must have permission to create or update the GitLab incident record
    • you must know the smoke checks or focused validations required after the production deploy
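
    The requirement to identify the last known-good production commit can be partly checked before any restore. The sketch below is a hypothetical helper, not part of the BetterFleet tooling: the name require_production_sha is invented here, and it assumes a SHA-1 repository (40 lowercase hex characters) and that you run it inside a fetched clone of the target repository. It rejects an empty value in particular, because git diff treats an omitted side of ".." as HEAD, so an unset variable would make the later diff compare against HEAD rather than production.

```shell
# Hypothetical helper (not part of bf-dev): fail fast unless the argument is a
# full 40-character commit SHA that exists in the local clone.
require_production_sha() (
  sha="$1"
  # Reject empty values and anything containing non-hex characters.
  case "$sha" in
    *[!0-9a-f]*|"")
      echo "not a lowercase hex SHA: '${sha}'" >&2
      exit 1
      ;;
  esac
  # Reject abbreviated SHAs so the recorded evidence is unambiguous.
  if [ "${#sha}" -ne 40 ]; then
    echo "expected 40 hex characters, got ${#sha}" >&2
    exit 1
  fi
  # cat-file -e exits non-zero when the object is missing or is not a commit.
  git cat-file -e "${sha}^{commit}" || {
    echo "commit ${sha} not found locally; run git fetch --all first" >&2
    exit 1
  }
)
```

    A possible use is require_production_sha "${PRODUCTION_SHA1}" immediately after exporting the variable, continuing only if it returns success.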

    In the current BetterFleet setup, operators typically inspect the deployed production SHA from the target repository's Operate -> Environments page, and they trigger the shared orchestration pipeline from:

    • https://gitlab.com/evenergi/develop/-/pipelines/new

    For a selective production hotfix, use these guided inputs in bf-dev:

    • action=deploy-release-branch-to-staging
    • project=<target repository>
    • confirm=true

    This BFDev action starts the selected repository's release-branch pipeline. The production promotion itself still happens in that downstream repository's manual production deployment step.
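
    Because the wrong action can repopulate release, it may help to sanity-check the three guided inputs before submitting the pipeline form. The function below is a minimal sketch under stated assumptions: check_hotfix_inputs is a hypothetical name, it only validates strings locally, and it does not call GitLab or reflect how bf-dev itself validates inputs. The project value bf-manage-core in the usage note is purely an example.

```shell
# Hypothetical pre-flight check for the guided inputs; it validates values
# locally and never contacts GitLab.
check_hotfix_inputs() (
  action="$1" project="$2" confirm="$3"
  # The cut-off action must never be used for a selective hotfix.
  if [ "$action" = "bring-all-code-to-release-branch" ]; then
    echo "refusing: this action can repopulate release with unreleased changes" >&2
    exit 1
  fi
  if [ "$action" != "deploy-release-branch-to-staging" ]; then
    echo "unexpected action for a hotfix: ${action}" >&2
    exit 1
  fi
  if [ -z "$project" ]; then
    echo "project must name the target repository" >&2
    exit 1
  fi
  if [ "$confirm" != "true" ]; then
    echo "confirm must be true" >&2
    exit 1
  fi
  echo "inputs ok: action=${action} project=${project} confirm=${confirm}"
)
```

    For example, check_hotfix_inputs deploy-release-branch-to-staging bf-manage-core true prints a single "inputs ok" line, while passing bring-all-code-to-release-branch returns a non-zero status.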

    Normal Procedure

    1. Capture the last known-good production SHA. Open the target repository's production environment and record the commit currently deployed to production.
    2. Restore, commit, and push release from that production SHA. Use a clean local checkout to replace the files on release with the files from the recorded production commit, then create and push a new commit that represents the rollback to the last production state. The git restore command below does not change which commit release points to yet. It simply swaps the checked-out files to match PRODUCTION_SHA1 so the next commit recreates that last known-good production state on release.
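    # SHA recorded from the production environment in step 1.
    export PRODUCTION_SHA1="<last-production-sha>"
    
    # Refresh all remotes, then create or reset the local release branch to origin/release.
    git fetch --all
    git checkout -B release origin/release
    
    # Run from the repository root so that "." covers every path.
    # Make the index and working tree match the production commit, then commit that state.
    # --allow-empty lets the commit succeed even if release already matches production.
    git restore --source="${PRODUCTION_SHA1}" --staged --worktree .
    git commit --allow-empty --message "fix: restore release to last production state"
    
    git push origin release
    git fetch origin release
    
    # Compare the production commit with the pushed branch.
    git diff "${PRODUCTION_SHA1}"..origin/release
    # output should be empty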
    
    3. Cherry-pick the merged hotfix into release. After the restore commit is on origin/release, use the GitLab merge request cherry-pick action so only the approved hotfix is layered on top of the restored production baseline. The release pipeline is manual, so pushing the restore commit first is safe and gives you a clean point from which to cherry-pick, revert, or retry as needed before releasing.
    4. Verify the release branch contents. Confirm that origin/release now contains the restore commit plus only the intended hotfix commit or commits.
    5. Create a new orchestration pipeline. Open the shared orchestration pipeline page and start a fresh pipeline with action=deploy-release-branch-to-staging, project=<target repository>, and confirm=true.
    6. Do not choose the cut-off action. bring-all-code-to-release-branch syncs the default branch into release, which can pull in unrelated unreleased changes and defeat the purpose of this recovery path.
    7. Run and watch the manual production deployment pipeline through to completion. The downstream production deployment pipeline is manual as well, for the same safety and control reasons as the guided release invocation. Trigger only the intended production deployment path, then follow the downstream pipeline, environment page, service logs, and health checks until the deployment settles.
    8. Validate the production hotfix. Run the focused production smoke checks needed for the incident or defect.
    9. Record the operational incident. Create or update the GitLab incident with the problem, timeline, deployed fix, validation result, and any remaining follow-up actions.
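
    The check that release contains the restore commit plus only the intended hotfix can be sketched as a small helper. This is an illustration rather than an official script: commits_since_restore is a hypothetical name, and it assumes the restore commit was created with exactly the message used in this runbook and that origin/release has been fetched. A plain git log from PRODUCTION_SHA1 would not work for this purpose, because the older unreleased commits remain in the branch history beneath the restore commit.

```shell
# Hypothetical helper: list the commits that sit on top of the most recent
# restore commit on origin/release.
commits_since_restore() (
  msg="fix: restore release to last production state"
  # --fixed-strings matches the message literally; -n 1 keeps the newest match.
  restore=$(git log origin/release --fixed-strings --grep="$msg" -n 1 --format=%H) || exit 1
  if [ -z "$restore" ]; then
    echo "no restore commit found on origin/release" >&2
    exit 1
  fi
  # Commits reachable from origin/release but not from the restore commit.
  git log --oneline "${restore}..origin/release"
)
```

    After the cherry-pick, the output should list only the approved hotfix commit or commits. Any other line is a reason to stop before starting the release.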

    Reference Screenshots

    These screenshots use one example service and repository. The Run new pipeline image now reflects the current guided action / project / confirm entrypoint.

    Steps 1 And 7: Check The Environments Page Before And After The Release

    Use Operate -> Environments to identify the last known-good production deployment before restoring release, and then revisit the same page after the release to confirm production now points at the intended hotfix deployment.

    [Screenshot: GitLab environments view showing the production and staging deployment history]

    Step 3: Cherry-Pick Only The Approved Hotfix Into release

    What To Check | Screenshot
    Start from the merged merge request and use the Cherry-pick action. | [Screenshot: Merged merge request with the Cherry-pick action highlighted]
    In the cherry-pick dialog, target the release branch so the hotfix lands on the restored production baseline. | [Screenshot: Cherry-pick dialog targeting the release branch]

    Steps 5 And 6: Start The Guided Release Invocation

    Use the orchestration pipeline with action=deploy-release-branch-to-staging, project=<target repository>, and confirm=true. This starts the selected repository's release-branch pipeline; production promotion still happens in the downstream manual production deployment step. The key safety check is to avoid bring-all-code-to-release-branch, because that can repopulate release with unrelated unreleased changes. The screenshot below shows the current Run new pipeline screen with the guided inputs ready for submission.

    [Screenshot: Run new pipeline screen showing the guided action, project, and confirm inputs]

    Decision Points And Exceptions

    • If the current release branch already matches the last production state and contains no unreleased changes, you can skip the restore step and promote only the hotfix.
    • If the hotfix depends on unreleased release-branch changes, stop and escalate the decision. A selective production hotfix is no longer the right workflow.
    • If the production environment page does not clearly show the last deployed SHA, stop and gather evidence from deploy logs, release records, or the last successful production pipeline before changing release.
    • If the cherry-pick conflicts, resolve locally, inspect the final tree carefully, and push only after confirming that unrelated changes were not pulled into release.
    • If multiple repositories are involved, repeat the guided release invocation and validation steps for each affected repository and record which targets were changed.
    • If the issue turns out to be operational rather than code-related, stop using this runbook and switch to the service-specific incident or infrastructure recovery path.
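
    The first decision point, whether release already matches production, can be tested directly rather than judged by eye. The following is a minimal sketch under stated assumptions: release_matches_production is a hypothetical name, and it compares file contents only, so it reports a match whenever the two trees are identical regardless of commit history. Any git error, such as an unknown SHA, is treated as "differs", which errs on the side of performing the restore.

```shell
# Hypothetical helper: report whether origin/release already has the same
# file contents as the recorded production commit.
release_matches_production() (
  sha="$1"
  git fetch origin release || exit 2
  # --quiet implies --exit-code: status 0 means no differences.
  if git diff --quiet "$sha" origin/release; then
    echo "origin/release matches ${sha}; the restore step can be skipped"
  else
    echo "origin/release differs from ${sha}; restore before cherry-picking"
    exit 1
  fi
)
```

    Called as release_matches_production "${PRODUCTION_SHA1}", a zero exit status corresponds to the case where the restore step can be skipped.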

    Validation And Evidence

    Treat the production hotfix as complete only when you can point to evidence for each of these checks:

    • the last known-good production SHA was recorded before the restore
    • the git diff "${PRODUCTION_SHA1}"..origin/release check was empty immediately after the restore commit
    • the release branch contains only the restore step plus the intended hotfix
    • the orchestration pipeline used action=deploy-release-branch-to-staging, the intended project, and confirm=true
    • bring-all-code-to-release-branch was not selected for the hotfix
    • the production deployment completed successfully for the target service
    • the production environment reflects the intended hotfix commit or build
    • the focused production smoke checks passed
    • the GitLab incident records the timeline, fix, validation evidence, and any follow-up work
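
    One of these checks, that production reflects the intended hotfix commit, can be approximated by comparing the SHA shown on the Environments page after the deploy with the tip of origin/release. This sketch makes assumptions that may not hold for every repository: deployed_matches_release is a hypothetical name, the deployed commit is assumed to be the tip of release, and the full SHA is assumed to have been copied from the Environments page by hand.

```shell
# Hypothetical helper: compare the SHA recorded from the Environments page
# after the deploy with the current tip of origin/release.
deployed_matches_release() (
  deployed="$1"
  git fetch origin release || exit 2
  tip=$(git rev-parse origin/release) || exit 2
  if [ "$deployed" = "$tip" ]; then
    echo "production is at the tip of origin/release: ${tip}"
  else
    echo "mismatch: production=${deployed} release=${tip}" >&2
    exit 1
  fi
)
```

    The printed line can be pasted into the GitLab incident as part of the validation evidence. A mismatch does not prove the deploy failed, only that the assumption above needs checking for that repository.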

    Rollback And Recovery

    If the production hotfix fails or causes a regression:

    • repeat the release-branch restore procedure using the last known-good production SHA
    • push the restored release branch
    • rerun the orchestration pipeline with action=deploy-release-branch-to-staging, the same target project, and confirm=true, then run the downstream manual production deployment step
    • recheck production health, smoke tests, and monitoring
    • update the GitLab incident with the rollback time and observed outcome

    If the service does not recover after the code rollback, stop using this runbook and follow the service-specific incident, infrastructure, or data recovery documentation.

    Links To Service-Specific Details

    • Shared CI and release context: CI and Release Integration
    • BetterFleet service lookup: Service Matrix
    • BetterFleet Manage service docs: Manage Services
    • BetterFleet Plan service docs: Plan Services
    • GitLab incident template: https://gitlab.com/evenergi/develop/-/issues/new?issuable_template=incident&type=INCIDENT
    • Target repository details: the target repository's .gitlab-ci.yml, release jobs, environment page, and service-specific operational docs