    0015 - Async/co-routine exception handling pattern

    Date

    2024-01-10

    Status

    Proposed

    Context

    In our system architecture, we utilise long-running coroutines that are initialised at application startup. These coroutines are pivotal for continuous background processing and various asynchronous tasks. Effective error handling is crucial to ensure the resilience and reliability of these coroutines, especially since they run persistently and handle a range of operations.

    Decision

    We have decided to implement a two-tiered error handling strategy for our long-running coroutines:

    1. Inner Loop Handling (Recoverable Errors):
        • Within the inner loop of each coroutine, we will focus on handling errors that are deemed recoverable.
        • This includes implementing retry logic or other recovery mechanisms for transient or expected errors, such as temporary network issues or service interruptions.
        • The coroutine should attempt to resolve these issues internally and continue its normal operation without escalating to a full restart or termination.
    2. Outer Loop Handling (Critical Errors):
        • Outside the inner loop, we will implement an exception handler that catches any errors not addressed within the inner loop.
        • Upon encountering such an error, the coroutine will be terminated. This action is reserved for critical issues that indicate a fundamental problem which cannot be recovered from within the scope of the coroutine’s logic.
        • A critical log message will be generated upon such termination to alert system administrators and developers to the issue.
    3. Use of Bare Except Clause:
        • We will employ a bare except clause to ensure all exceptions are caught, including those not explicitly anticipated.
        • While this approach can capture unexpected errors and we would not use it for short-lived code, in the context of long-running coroutines where continuous operation is desired, it provides a safety net. We recognise that this comes with the trade-off of potentially catching exceptions that might not need special handling, but in this scenario, the goal of uninterrupted service outweighs this concern.
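The two tiers described above could be sketched roughly as follows. This is a minimal illustration, not the team's actual implementation: it assumes Python's `asyncio`, and the names `run_forever`, `step`, `retry_delay`, and `RECOVERABLE_ERRORS` are hypothetical. One further assumption is that `asyncio.CancelledError` is re-raised ahead of the bare `except`, because a bare `except` would otherwise also intercept task cancellation during shutdown.

```python
import asyncio
import logging

logger = logging.getLogger("worker")

# Hypothetical: the ADR does not list which errors count as recoverable.
RECOVERABLE_ERRORS = (ConnectionError, TimeoutError)


async def run_forever(name, step, retry_delay=1.0):
    """Call the coroutine function `step` repeatedly with two-tiered handling."""
    try:
        while True:  # inner loop
            try:
                await step()
            except RECOVERABLE_ERRORS as exc:
                # Tier 1: recover inside the loop and keep the coroutine alive.
                logger.warning("%s: recoverable error, retrying: %r", name, exc)
                await asyncio.sleep(retry_delay)
    except asyncio.CancelledError:
        # Assumption (not stated in the ADR): cancellation is a normal
        # shutdown signal, so it is propagated rather than logged as critical.
        raise
    except:  # noqa: E722 - the bare except is the deliberate choice here
        # Tier 2: anything the inner loop did not handle ends the coroutine,
        # with a critical log entry that includes the traceback.
        logger.critical("%s: unrecoverable error, terminating", name, exc_info=True)
```

At application startup, such a coroutine would typically be scheduled with something like `asyncio.create_task(run_forever("poller", poll_once))`, where `poll_once` stands in for whatever unit of work the coroutine performs.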

    Consequences

    • Increased Resilience: The system will be more robust against recoverable errors, reducing the likelihood of coroutines failing due to transient issues.
    • Improved Error Visibility: Critical errors that cause coroutine termination will be clearly logged, making it easier to diagnose and address underlying issues.
    • Potential Overcatching: Using a bare except clause may lead to some exceptions being caught that could otherwise be handled differently, which could mask certain issues. However, this is considered an acceptable trade-off in this specific context of long-running coroutines.
    • Implementation Complexity: Developers will need to carefully distinguish between recoverable and critical errors and implement appropriate handling logic in both the inner and outer loops.
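One way to keep the recoverable/critical distinction manageable is to define it in a single place that every coroutine consults. The sketch below is only an assumption about how that might look; `RECOVERABLE` and `classify` are hypothetical names, and the choice of exception types is a per-service decision that this record does not prescribe.

```python
# Hypothetical classification: which errors are recoverable is a
# per-service decision that the ADR leaves to the implementer.
RECOVERABLE = (ConnectionError, TimeoutError)


def classify(exc: BaseException) -> str:
    """Return 'recoverable' for errors the inner loop should retry, else 'critical'."""
    return "recoverable" if isinstance(exc, RECOVERABLE) else "critical"
```

Because `isinstance` respects the exception hierarchy, subclasses such as `ConnectionResetError` are classified as recoverable without being listed explicitly.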