OTEP: Recording exceptions as log based events #4333
I think this is a related issue:
Co-authored-by: Joao Grassi <[email protected]>
A small doubt:
Although (I think) it's not called out, my understanding is that exceptions should now be explicitly reported as both 1) a Span.Event and 2) a Log/Event. That is, code-wise you should do this: `currentSpan.recordException(e);` followed by `logger.logRecordBuilder.addException(e);`. Is this the case?
Overall I'm very supportive. Just some nits and one mitigation I'd like to see called out/addressed.
5. An error should be logged with appropriate severity depending on the available context.

- Errors that don't indicate any issue should be recorded with severity not higher than `Info`.
What's an example of this? I'm struggling to understand when this would be used.
5. An error should be logged with appropriate severity depending on the available context.

- Errors that don't indicate any issue should be recorded with severity not higher than `Info`.
- Transient errors (even if it's the last try) should be recorded with severity not higher than `Warning`.
How do you know an error is transient when writing instrumentation? I think you mean errors that you KNOW the application will attempt to handle / retry, right?
I'd suggest rewording (or defining the meaning of transient).
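To make the severity guidance quoted above concrete, here is a minimal sketch of one way an instrumentation could cap the severity by error kind. It uses Python's standard `logging` levels as stand-ins for OTel severities; the `ErrorKind` enum and `capped_severity` function are hypothetical names for illustration and are not part of the OTEP or of any OTel API. How an instrumentation decides that an error is "transient" is exactly the open question raised above, so the sketch takes the classification as an input.

```python
import logging
from enum import Enum


class ErrorKind(Enum):
    """Hypothetical classification; the OTEP text does not define these names."""
    NO_ISSUE = "no_issue"    # error that does not indicate any problem
    TRANSIENT = "transient"  # error the caller knows will be retried or handled
    OTHER = "other"          # anything else: no cap applied


# Upper bounds taken from the quoted guidance: Info and Warning.
_SEVERITY_CAP = {
    ErrorKind.NO_ISSUE: logging.INFO,
    ErrorKind.TRANSIENT: logging.WARNING,
}


def capped_severity(kind: ErrorKind, requested: int) -> int:
    """Return the requested level, lowered to the cap for this error kind."""
    cap = _SEVERITY_CAP.get(kind)
    return requested if cap is None else min(requested, cap)
```

The cap only ever lowers a level: a transient error requested at `DEBUG` stays at `DEBUG`, matching the "not higher than" wording.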
4. It's not recommended to record the same error as it propagates through the stack trace or
attach the same instance of exception to multiple log records.

5. An error should be logged with appropriate severity depending on the available context.
I like the goal of the taxonomy, but think we need to crisp up the language around `Info`/`Warning`.
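Item 4 in the quoted text (do not record the same error again as it propagates) can be illustrated with a small sketch. This is only one possible approach, assuming a language where a marker can be attached to the exception instance; the `record_exception_once` helper and the marker attribute name are hypothetical and not proposed by the OTEP.

```python
import logging

_RECORDED_FLAG = "_exception_already_recorded"  # hypothetical marker name


def record_exception_once(logger: logging.Logger, exc: BaseException,
                          level: int = logging.ERROR) -> bool:
    """Log exc unless an earlier frame already did; return True if logged."""
    if getattr(exc, _RECORDED_FLAG, False):
        # A lower layer already produced a log record for this instance.
        return False
    logger.log(level, "%s: %s", type(exc).__name__, exc, exc_info=exc)
    # Mark the instance so callers further up the stack skip it.
    setattr(exc, _RECORDED_FLAG, True)
    return True
```

Each layer that catches and re-raises can call the helper; only the first call emits a record.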
> OTel should provide API like `setException` when creating log record that will record only necessary information depending
> on the configuration and log severity.

It should not be an instrumentation library concern to decide whether exception stack trace should be recorded or not.
Two things:
- SHOULD is normative, so please capitalize (and I think this IS a normative statement here).
- This may not be language neutral, so I think SHOULD is the right guidance here. For example, in Rust, stack traces are something you can opt in to on an error. They leave some details to libraries (see e.g. the Rust Backtrace/Source capabilities at https://docs.rs/thiserror/latest/thiserror/, or for C++ [prior to 23] https://github.com/jeremy-rifkin/cpptrace).
Additionally, in some highly green-thread/async APIs, I've seen custom stack traces created (e.g. Scala's ZIO, where they try to preserve the logical stack when the physical stack is a confusing mess of work-stealing green threads). We should allow these to interact with exception reporting in OTel in some fashion.
I agree with the sentiment, but I'd expand the wording to allow languages like Rust/C++ (and the Java ecosystem) to provide stack trace compatibility with their library ecosystem.
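A rough sketch of what a `setException`-style helper could do is shown below: it always records the exception type and message, and adds the stack trace only when the log severity reaches a configured threshold. The function name, the threshold parameter, and the pluggable formatter are all assumptions made for illustration; only the `exception.type`, `exception.message`, and `exception.stacktrace` attribute names come from the existing semantic conventions. The formatter hook is one way to accommodate the custom or opt-in stack traces mentioned above.

```python
import logging
import traceback
from typing import Callable, Dict


def _default_stacktrace(exc: BaseException) -> str:
    """Format the physical stack trace using the standard library."""
    return "".join(traceback.format_exception(type(exc), exc, exc.__traceback__))


def exception_attributes(
    exc: BaseException,
    severity: int,
    stacktrace_min_severity: int = logging.ERROR,  # hypothetical config knob
    format_stacktrace: Callable[[BaseException], str] = _default_stacktrace,
) -> Dict[str, str]:
    """Build log attributes; include the stack trace only at or above the threshold."""
    attrs = {
        "exception.type": type(exc).__qualname__,
        "exception.message": str(exc),
    }
    if severity >= stacktrace_min_severity:
        attrs["exception.stacktrace"] = format_stacktrace(exc)
    return attrs
```

With this shape the instrumentation passes the exception and the severity, and the decision about the stack trace stays with configuration rather than with the instrumentation library.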
with appropriate severity (or stop reporting them).
- We should provide opt-in mechanism for existing instrumentations to switch to logs.

2. Recording exceptions as log-based events would result in UX degradation for users
I think we should also call out the fact that we now have two channels for exporting/batching/recording exception information and traces. In this new world, you may see a trace before an exception or vice versa, and one may be dropped where the other is not.
We probably need some other mitigation so that knowledge of an exception event under a Span is no longer required (e.g. more aggressively using Span.status and attributes around "transient failures", as we discussed in the Semconv SIG).
## Motivation

Today OTel supports recording exceptions using span events available through Trace API. Outside of OTel world, exceptions are usually recorded by user apps and libraries using logging libraries and may be recorded as OTel logs via logging bridge.
We should also call out that it would be helpful for metrics instrumentation libraries (those that do not produce spans at all, like runtime instrumentation), so they could report exceptions (errors).
Related to open-telemetry/semantic-conventions#1536
Changes
Recording exceptions as span events is problematic since it
This OTEP provides guidance on how to record exceptions using OpenTelemetry logs focusing on minimizing duplication and providing context to reduce the noise.
If accepted, the follow-up spec changes are expected to replace existing (stable) documents:
- Related OTEP(s) #
- CHANGELOG.md file updated for non-trivial changes
- spec-compliance-matrix.md updated if necessary