Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OTEP: Recording exceptions as log based events #4333

Open
wants to merge 15 commits into
base: main
Choose a base branch
from

Conversation

lmolkova
Copy link
Contributor

@lmolkova lmolkova commented Dec 10, 2024

Related to open-telemetry/semantic-conventions#1536

Changes

Recording exceptions as span events is problematic since it

  • ties recording exceptions to tracing/sampling
  • duplicates exceptions recorded by instrumented libraries on logs
  • does not leverage log features such as typical log filtering based on severity

This OTEP provides guidance on how to record exceptions using OpenTelemetry logs focusing on minimizing duplication and providing context to reduce the noise.

If accepted, the follow-up spec changes are expected to replace existing (stable) documents:


@lmolkova lmolkova changed the title OTEP: Recording exceptions and errors with OpenTelemetry OTEP: Recording exceptions as log based events Dec 10, 2024
@pellared
Copy link
Member

pellared commented Dec 10, 2024

I think this is a related issue:

oteps/4333-recording-exceptions-on-logs.md Outdated Show resolved Hide resolved
oteps/4333-recording-exceptions-on-logs.md Outdated Show resolved Hide resolved
oteps/4333-recording-exceptions-on-logs.md Outdated Show resolved Hide resolved
oteps/4333-recording-exceptions-on-logs.md Outdated Show resolved Hide resolved
oteps/4333-recording-exceptions-on-logs.md Outdated Show resolved Hide resolved
oteps/4333-recording-exceptions-on-logs.md Outdated Show resolved Hide resolved
oteps/4333-recording-exceptions-on-logs.md Outdated Show resolved Hide resolved
oteps/4333-recording-exceptions-on-logs.md Outdated Show resolved Hide resolved
oteps/4333-recording-exceptions-on-logs.md Outdated Show resolved Hide resolved
oteps/4333-recording-exceptions-on-logs.md Outdated Show resolved Hide resolved
@tedsuo tedsuo added the OTEP OpenTelemetry Enhancement Proposal (OTEP) label Dec 12, 2024
@lmolkova lmolkova force-pushed the exceptions-on-logs-otep branch 2 times, most recently from b06a09f to 76c7d85 Compare December 17, 2024 17:30
@lmolkova lmolkova force-pushed the exceptions-on-logs-otep branch from 9bc6220 to a306972 Compare December 27, 2024 17:42
@lmolkova lmolkova force-pushed the exceptions-on-logs-otep branch from a306972 to 974505f Compare December 27, 2024 18:12
oteps/4333-recording-exceptions-on-logs.md Outdated Show resolved Hide resolved
oteps/4333-recording-exceptions-on-logs.md Outdated Show resolved Hide resolved
oteps/4333-recording-exceptions-on-logs.md Outdated Show resolved Hide resolved
oteps/4333-recording-exceptions-on-logs.md Outdated Show resolved Hide resolved
oteps/4333-recording-exceptions-on-logs.md Outdated Show resolved Hide resolved
oteps/4333-recording-exceptions-on-logs.md Outdated Show resolved Hide resolved
oteps/4333-recording-exceptions-on-logs.md Outdated Show resolved Hide resolved
oteps/4333-recording-exceptions-on-logs.md Outdated Show resolved Hide resolved
oteps/4333-recording-exceptions-on-logs.md Outdated Show resolved Hide resolved
oteps/4333-recording-exceptions-on-logs.md Outdated Show resolved Hide resolved
@lmolkova lmolkova force-pushed the exceptions-on-logs-otep branch from db27087 to e9f38aa Compare January 3, 2025 01:42
@carlosalberto
Copy link
Contributor

A small doubt:

If this instrumentation supports tracing, it should capture the error in the scope of the processing span.

Although (I think) it's not called out, I'm understanding exceptions should now be explicitly reported as both 1) Span.Event and 2) Log/Event? i.e. coding wise you should do this:

currentSpan.recordException(e);
logger.logRecordBuilder
    .addException(e);

Is this the case?

Copy link
Contributor

@jsuereth jsuereth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall I'm very supportive. Just some nits and one mitigation I'd like to see called out/addressed.


5. An error should be logged with appropriate severity depending on the available context.

- Errors that don't indicate any issue should be recorded with severity not higher than `Info`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's an example of this? I'm struggling to understand when this would be used.

5. An error should be logged with appropriate severity depending on the available context.

- Errors that don't indicate any issue should be recorded with severity not higher than `Info`.
- Transient errors (even if it's the last try) should be recorded with severity not higher than `Warning`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do you know an error is transient when writing instrumentation? I think you mean errors that you KNOW the application will attempt to handle / retry, right?

I'd suggest rewording (or defining the meaning of transient).

4. It's not recommended to record the same error as it propagates through the stack trace or
attach the same instance of exception to multiple log records.

5. An error should be logged with appropriate severity depending on the available context.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the goal of the taxonomy, but think we need to crisp up the language around Info/Warning

> OTel should provide API like `setException` when creating log record that will record only necessary information depending
> on the configuration and log severity.

It should not be an instrumentation library concern to decide whether exception stack trace should be recorded or not.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two things:

  1. SHOULD is normative, so please capatilize (and I think IS a normative statement here).
  2. This may not be language neutral, so I think SHOULD is the right guidance here. For example, in Rust, stack traces are something you can opt-in on an error. They leave some details to libraries (see Rust Backtrace/Source capabilities on https://docs.rs/thiserror/latest/thiserror/ e.g. or C++ [prior to 23] https://github.com/jeremy-rifkin/cpptrace).
    Additionally, in some highly green-thread/async APIs, I've seen custom stack traces created (e.g. Scala's ZIO where they try to preserve logical stack when physical stack is a confusing mess of work-stealing green-threads. We should allow these to interact with exception reporting in OTEL in some fashion.

I agree with the sentiment, I'd expand the wording though to allow languages like Rust/C++ (and Java ecosystem) to provide stack trace compatibility with their library ecosystem.

with appropriate severity (or stop reporting them).
- We should provide opt-in mechanism for existing instrumentations to switch to logs.

2. Recording exceptions as log-based events would result in UX degradation for users
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should also call out the fact that now we have two channels of exporting/batching/recording information of exceptions and Traces. In this new world, you may see a trace before an exception or vice versa, and one may be dropped where the other is not.

We probably need some other mitigatioin should that requiring knowledge of an exception event under a Span is no longer needed (e.g. more aggressively using Span.status and attributes around "transient failures" as we discussed in Semconv SIG.


## Motivation

Today OTel supports recording exceptions using span events available through Trace API. Outside of OTel world, exceptions are usually recorded by user apps and libraries using logging libraries and may be recorded as OTel logs via logging bridge.
Copy link
Member

@pellared pellared Jan 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should also call out that that it would be helpful for metrics instrumentation libraries (that do not produce spans at all, like runtime instrumentation) so they could report exceptions (errors).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
OTEP OpenTelemetry Enhancement Proposal (OTEP)
Projects
Status: In progress
Development

Successfully merging this pull request may close these issues.

9 participants