0016 - Logging & log levels
Date
2024-01-16
Status
Proposed
Context
In our Python applications, we need a consistent and effective way to log various events and errors. Logging levels help in categorising the severity and type of logs, which in turn aids in monitoring and troubleshooting.
Decision
We will use the following logging levels in our Python applications:
- CRITICAL: Used when a critical error occurs and the application cannot recover from it. This level is for events that require immediate attention, where the likelihood of the application recovering on its own is close to zero. Examples include system outages, critical data corruption, failing authorizations to external services, resources at maximum capacity for extended periods, or security breaches. Critical logs trigger alerts, which are currently SNS emails forwarded to Slack.
- ERROR: Used for significant issues where the application has encountered a problem but may still continue running. The odds of recovery are high, or the failure might be an expected outcome. Errors do not trigger alerts but are logged for investigation. Examples include failed transactions, missing data in non-critical paths, or unexpected states. Additionally, when a result from an external service is being processed and a catch-all handler (for example: except Exception as e) catches an exception, it should be logged as an error.
- WARNING: Used for situations that are not errors but could potentially be harmful or indicate an impending issue. Warnings do not indicate a failure but are logged for awareness and potential future action. Typical scenarios are an endpoint returning a 404 response when the existence of the entity is uncertain, or an exception being caught while the code path still returns a value (even if empty).
- INFO: Used for general operational messages that track the normal functioning of the application. These logs are helpful for understanding the flow of the application and include events such as service startup, shutdown, routine operations, or configuration changes.
- DEBUG: The most granular logging level, intended for detailed diagnostic information useful during development or deep troubleshooting. This includes internal state, intermediate values, SQL queries, loop progress, and other low-level details that would be too noisy for normal production logs.
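As an illustration of how the levels above might map onto code, here is a minimal sketch using Python's standard logging module. The logger name, the fetch_entity function, the client object with its get method, and NotFoundError are all hypothetical names chosen for this example; they are not part of the decision itself.

```python
import logging

logger = logging.getLogger("example_service")  # hypothetical logger name


class NotFoundError(Exception):
    """Hypothetical exception standing in for a 404 from an external service."""


def fetch_entity(client, entity_id):
    """Fetch an entity from a hypothetical client, logging per the levels above."""
    # DEBUG: low-level detail, too noisy for normal production logs.
    logger.debug("Fetching entity %s", entity_id)
    try:
        entity = client.get(entity_id)
    except NotFoundError:
        # WARNING: a 404 where existence is uncertain; a value is still returned.
        logger.warning("Entity %s not found", entity_id)
        return None
    except Exception:
        # ERROR: catch-all around an external service result.
        # logger.exception logs at ERROR level and attaches the traceback.
        logger.exception("Unexpected failure fetching entity %s", entity_id)
        return None
    # INFO: routine operation completed normally.
    logger.info("Fetched entity %s", entity_id)
    return entity
```

In this sketch the specific exception is handled before the catch-all, so an expected "not found" outcome is recorded as a warning while anything unexpected is recorded as an error.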
Consequences
- More consistent logs across services and developers.
- Better filtering and alerting because log intent is more explicit.
- Some judgement is still required, especially around deciding when a warning should instead be treated as an error or critical issue.
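To illustrate the filtering and alerting consequence, the following sketch attaches a handler that only accepts CRITICAL records, so errors are logged but never alerted on. AlertHandler and the logger name are hypothetical; the handler only collects messages in a list, and any real delivery mechanism (such as publishing to SNS) is assumed and not shown.

```python
import logging


class AlertHandler(logging.Handler):
    """Hypothetical handler collecting messages that would be sent as alerts."""

    def __init__(self):
        # Only records at CRITICAL or above reach emit().
        super().__init__(level=logging.CRITICAL)
        self.sent = []

    def emit(self, record):
        # A real implementation might publish the alert here (assumption);
        # this sketch only stores the formatted message.
        self.sent.append(self.format(record))


logger = logging.getLogger("example_alerts")  # hypothetical logger name
logger.setLevel(logging.INFO)
logger.propagate = False  # keep the example isolated from the root logger

alert_handler = AlertHandler()
logger.addHandler(alert_handler)

logger.error("Transaction failed; logged for investigation")
logger.critical("Authorization to external service is failing")
```

After these two calls, only the critical message has been collected by alert_handler; the error message was filtered out by the handler's level.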