    Production Hotfix Release

    3AM Quick Guide

    • Use this when production needs one urgent fix and the current release branch may contain unreleased staging changes.
    • Record the currently deployed production SHA from the target repository's Operate -> Environments page.
    • Restore release to what production is currently running, commit that restore, and push it. The command below replaces the files in your local release checkout with the files from the production commit, so your next commit makes release match production again.
    git restore --source="${PRODUCTION_SHA1}" --staged --worktree .
    git commit --allow-empty --message "fix: restore release to last production state"
    git push origin release
    git fetch origin release
    git diff "${PRODUCTION_SHA1}"..origin/release
    # output should be empty
    
    • After that clean baseline is on origin/release, cherry-pick the approved hotfix into release using the GitLab UI.
    • The release pipeline is manually activated, so it is safe to push, revert, and cherry-pick on release until you are ready to start the release.
    • Open the orchestration pipeline and run only release <service name>: https://gitlab.com/evenergi/develop/-/pipelines/new
    • Run the downstream manual production deployment pipeline as well. Both gates are manual by design for safety and control.
    • Do not run cut-off. That can repopulate release with unrelated unreleased changes.
    • Watch the production deploy settle, run the focused smoke checks, and update the GitLab incident record.

    Purpose And When To Use It

    This runbook explains how to rebuild the release branch from the last known production deployment, push that clean baseline, cherry-pick a merged hotfix onto it, and trigger a production release for the affected service.

    Use this workflow when production needs an urgent fix and the current release branch may already contain unreleased staging changes that must not be promoted with the hotfix.

    Do not use this workflow for the normal weekly release, for routine staging promotion, or when the issue can wait for the standard release cadence.

    Prerequisites And Permissions

    Before using this workflow:

    • the hotfix merge request must already be merged and approved for production
    • you must be able to identify the last known-good production deployment commit
    • you must have permission to push to the release branch
    • you must have permission to create and run the production release pipeline and the downstream manual production deployment pipeline
    • you must have access to the target service's production environment page, logs, and health checks
    • you must have permission to create or update the GitLab incident record
    • you must know the smoke checks or focused validations required after the production deploy

    In the current BetterFleet setup, operators typically inspect the deployed production SHA from the target repository's Operate -> Environments page, and they trigger the shared orchestration pipeline from:

    • https://gitlab.com/evenergi/develop/-/pipelines/new

    The service-specific manual job usually follows the pattern release <service name>, for example 01 release vemo-core.

    Normal Procedure

    1. Capture the last known-good production SHA. Open the target repository's production environment and record the commit currently deployed to production.
    2. Restore, commit, and push release from that production SHA. Use a clean local checkout to replace the files on release with the files from the recorded production commit, then create and push a new commit that represents the rollback to the last production state. The git restore command below does not change which commit release points to yet. It simply swaps the checked-out files to match PRODUCTION_SHA1 so the next commit recreates that last known-good production state on release.
    export PRODUCTION_SHA1="<last-production-sha>"
    
    git fetch --all
    git checkout -B release origin/release
    
    git restore --source="${PRODUCTION_SHA1}" --staged --worktree .
    git commit --allow-empty --message "fix: restore release to last production state"
    
    git push origin release
    git fetch origin release
    
    git diff "${PRODUCTION_SHA1}"..origin/release
    # output should be empty
    
    3. Cherry-pick the merged hotfix into release. After the restore commit is on origin/release, use the GitLab merge request cherry-pick action so only the approved hotfix is layered on top of the restored production baseline. The release pipeline is manual, so pushing the restore commit first is safe and gives you a clean point from which to cherry-pick, revert, or retry as needed before releasing.
    4. Verify the release branch contents. Confirm that origin/release now contains the restore commit plus only the intended hotfix commit or commits.
    5. Create a new orchestration pipeline. Open the shared orchestration pipeline page and start a fresh pipeline for the release flow.
    6. Run the service release job. Select the manual job for the target service, usually named release <service name>.
    7. Do not run the cut-off job. The cut-off job syncs the default branch into release, which can pull in unrelated unreleased changes and defeat the purpose of this recovery path.
    8. Run and watch the manual production deployment pipeline through to completion. The downstream production deployment pipeline is manual as well, for the same safety and control reasons as the release pipeline. Trigger only the intended production deployment path, then follow the downstream pipeline, environment page, service logs, and health checks until the deployment settles.
    9. Validate the production hotfix. Run the focused production smoke checks needed for the incident or defect.
    10. Record the operational incident. Create or update the GitLab incident with the problem, timeline, deployed fix, validation result, and any remaining follow-up actions.
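The "Verify the release branch contents" step can be sketched with two small helpers. This is a minimal sketch, not part of the BetterFleet tooling: the function names and the RESTORE_SHA variable are hypothetical. It assumes you recorded the restore commit right after pushing it, for example with `RESTORE_SHA="$(git rev-parse origin/release)"`. Because the restore is a new commit on top of the existing release history, comparing against the restore commit (for commits) and against the production tree (for files) is more telling than listing everything between production and release.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; names are illustrative only.

# List the commits that landed on the release ref after the restore commit.
# Expectation for a clean hotfix: only the cherry-picked hotfix commit(s).
commits_since_restore() {
  local restore_sha="$1" release_ref="${2:-origin/release}"
  git log --oneline "${restore_sha}..${release_ref}"
}

# List the files whose content differs between the production commit
# and the release ref. Expectation: only files touched by the hotfix.
files_changed_vs_production() {
  local production_sha="$1" release_ref="${2:-origin/release}"
  git diff --name-only "${production_sha}" "${release_ref}"
}
```

For example, after `git fetch origin release`, running `commits_since_restore "${RESTORE_SHA}"` and `files_changed_vs_production "${PRODUCTION_SHA1}"` should list only the hotfix commit and the files it changed. Anything else is a reason to stop before starting the release pipeline.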

    Reference Screenshots

    These screenshots use one example service and repository. Your target project name and release <service name> job label may differ.

    Steps 1 And 8: Check The Environments Page Before And After The Release

    Use Operate -> Environments to identify the last known-good production deployment before restoring release, and then revisit the same page after the release to confirm production now points at the intended hotfix deployment.

    [Screenshot: GitLab environments view showing the production and staging deployment history]

    Step 3: Cherry-Pick Only The Approved Hotfix Into release

    | What To Check | Screenshot |
    | --- | --- |
    | Start from the merged merge request and use the Cherry-pick action. | Merged merge request with the Cherry-pick action highlighted |
    | In the cherry-pick dialog, target the release branch so the hotfix lands on the restored production baseline. | Cherry-pick dialog targeting the release branch |

    Steps 5 Through 7: Run The Service Release Job

    Use the orchestration pipeline to run the release <service name> job for the affected service. The key safety check is to choose the service release job and not the cut-off job, because cut-off can repopulate release with unrelated unreleased changes.

    [Screenshot: Release-stage pipeline showing a service-specific release job]

    Decision Points And Exceptions

    • If the current release branch already matches the last production state and contains no unreleased changes, you can skip the restore step and promote only the hotfix.
    • If the hotfix depends on unreleased release-branch changes, stop and escalate the decision. A selective production hotfix is no longer the right workflow.
    • If the production environment page does not clearly show the last deployed SHA, stop and gather evidence from deploy logs, release records, or the last successful production pipeline before changing release.
    • If the cherry-pick conflicts, resolve locally, inspect the final tree carefully, and push only after confirming that unrelated changes were not pulled into release.
    • If multiple services are involved, repeat the release and validation steps for each affected service and record which services were changed.
    • If the issue turns out to be operational rather than code-related, stop using this runbook and switch to the service-specific incident or infrastructure recovery path.
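The first decision point, whether release already matches production, can be checked mechanically rather than by eye. The helper below is a minimal sketch: the function name is hypothetical, and it assumes PRODUCTION_SHA1 has been recorded and origin/release has been fetched. It compares trees only, so it answers "are the files identical?" and says nothing about commit history.

```shell
#!/usr/bin/env bash
# Hypothetical helper; the name is illustrative only.

# Succeeds (exit status 0) when the release ref has exactly the same file
# contents as the production commit, and fails (non-zero) otherwise.
release_matches_production() {
  local production_sha="$1" release_ref="${2:-origin/release}"
  # --quiet suppresses output and makes git diff exit non-zero on differences.
  git diff --quiet "${production_sha}" "${release_ref}"
}
```

A possible use is `if release_matches_production "${PRODUCTION_SHA1}"; then echo "release already matches production"; fi`. If the check fails, the restore step is still needed.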

    Validation And Evidence

    Treat the production hotfix as complete only when you can point to evidence for each of these checks:

    • the last known-good production SHA was recorded before the restore
    • the git diff "${PRODUCTION_SHA1}"..origin/release check was empty immediately after the restore commit was pushed
    • the release branch contains only the restore step plus the intended hotfix
    • the correct service release job was run, not cut-off
    • the production deployment completed successfully for the target service
    • the production environment reflects the intended hotfix commit or build
    • the focused production smoke checks passed
    • the GitLab incident records the timeline, fix, validation evidence, and any follow-up work
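Some of this evidence can be captured as plain text for pasting into the GitLab incident. The function below is a sketch under assumptions: its name and output layout are hypothetical, and it covers only the Git-side evidence (the SHAs and the files that differ from production). Pipeline status, environment state, and smoke check results still have to be recorded from GitLab and the service itself.

```shell
#!/usr/bin/env bash
# Hypothetical helper; the name and output format are illustrative only.

# Print the production SHA, the current release head, and the files that
# differ between them, in a form suitable for an incident comment.
hotfix_evidence() {
  local production_sha="$1" release_ref="${2:-origin/release}"
  printf 'production_sha=%s\n' "$(git rev-parse "${production_sha}")"
  printf 'release_head=%s\n' "$(git rev-parse "${release_ref}")"
  printf 'files_changed_vs_production:\n'
  # Indent each changed file by two spaces under the heading line.
  git diff --name-only "${production_sha}" "${release_ref}" | sed 's/^/  /'
}
```

Running `hotfix_evidence "${PRODUCTION_SHA1}"` after the cherry-pick and again after the deployment gives two snapshots that can be compared in the incident timeline.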

    Rollback And Recovery

    If the production hotfix fails or causes a regression:

    • repeat the release-branch restore procedure using the last known-good production SHA
    • push the restored release branch
    • rerun the relevant release <service name> job
    • recheck production health, smoke tests, and monitoring
    • update the GitLab incident with the rollback time and observed outcome

    If the service does not recover after the code rollback, stop using this runbook and follow the service-specific incident, infrastructure, or data recovery documentation.

    Links To Service-Specific Details

    • Shared CI and release context: CI and Release Integration
    • BetterFleet service lookup: Service Matrix
    • BetterFleet Manage service docs: Manage Services
    • BetterFleet Plan service docs: Plan Services
    • GitLab incident template: https://gitlab.com/evenergi/develop/-/issues/new?issuable_template=incident&type=INCIDENT
    • Target repository details: the target repository's .gitlab-ci.yml, release jobs, environment page, and service-specific operational docs