[elasticsearchexporter] Direct serialization without objmodel in OTel mode #37032
base: main
Benchmark results (TL;DR): throughput is almost 2x for metrics and more than 2x for logs and traces, and allocated bytes/op are reduced by over 70% across the board.
scopeMetrics := scopeMetrics.At(j)
scope := scopeMetrics.Scope()
groupedDataPointsByIndex := make(map[string]map[uint32][]dataPoint)
Note to reviewer: I made it so that documents from different scopes are never merged. This simplifies the serialization logic. It also fixes a subtle bug in the current implementation: only the scope attributes are hashed, not the scope name, so different scopes can end up grouped into the same document. As a consequence, I guess we should also add the scope name as a dimension in the mappings.
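A minimal Go sketch of the hashing bug described above. It uses plain string maps in place of pcommon.Map and FNV-64a as the hash. The names attrsHash and scopeHash are hypothetical and are not the exporter's code. The PR's actual fix is to never merge across scopes; this sketch only shows why hashing the attributes alone is not enough to tell two scopes apart.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"io"
	"sort"
)

// writeSortedAttrs feeds the attributes into w in sorted key order, so the
// result does not depend on Go's random map iteration order. A zero byte
// separates each key and value.
func writeSortedAttrs(w io.Writer, attrs map[string]string) {
	keys := make([]string, 0, len(attrs))
	for k := range attrs {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		io.WriteString(w, k)
		w.Write([]byte{0})
		io.WriteString(w, attrs[k])
		w.Write([]byte{0})
	}
}

// attrsHash hashes only the scope attributes, mirroring the bug: two scopes
// with different names but equal attributes get the same key.
func attrsHash(attrs map[string]string) uint64 {
	h := fnv.New64a()
	writeSortedAttrs(h, attrs)
	return h.Sum64()
}

// scopeHash also feeds the scope name into the hash, so differently named
// scopes are kept apart.
func scopeHash(name string, attrs map[string]string) uint64 {
	h := fnv.New64a()
	io.WriteString(h, name)
	h.Write([]byte{0})
	writeSortedAttrs(h, attrs)
	return h.Sum64()
}

func main() {
	// Two different scopes, "lib/a" and "lib/b", with identical attributes.
	attrsA := map[string]string{"env": "prod"}
	attrsB := map[string]string{"env": "prod"}
	fmt.Println(attrsHash(attrsA) == attrsHash(attrsB))                   // true: the scopes collide
	fmt.Println(scopeHash("lib/a", attrsA) == scopeHash("lib/b", attrsB)) // false: the scopes stay apart
}
```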
I think by moving this here, rather than outside of the scopeMetrics loop, we're assuming that there will never be two identical scopes within a resource. Is that a safe assumption?
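The concern can be illustrated with a simplified Go sketch. The types and functions (dataPoint, scopeMetrics, countDocsPerScope, countDocsPerResource) are hypothetical stand-ins, not the exporter's code, and the real grouping key is reduced to a single dims string. If the grouping map is created inside the scope loop and the same scope appears twice in one resource, the result is two documents with identical dimensions. A map shared across the whole resource would merge them instead.

```go
package main

import "fmt"

// dataPoint is a hypothetical stand-in; dims summarizes the timestamp and
// dimensions that decide which document a point belongs to.
type dataPoint struct {
	dims  string
	value float64
}

// scopeMetrics is a hypothetical stand-in for one scope within a resource.
type scopeMetrics struct {
	scopeKey string
	points   []dataPoint
}

// countDocsPerScope creates the grouping map inside the scope loop, so
// points from different scopes are never merged.
func countDocsPerScope(scopes []scopeMetrics) int {
	docs := 0
	for _, sm := range scopes {
		groups := make(map[string][]dataPoint)
		for _, dp := range sm.points {
			groups[dp.dims] = append(groups[dp.dims], dp)
		}
		docs += len(groups)
	}
	return docs
}

// countDocsPerResource shares one map across all scopes of the resource and
// keys it by scope identity plus dims, so identical scopes are merged.
func countDocsPerResource(scopes []scopeMetrics) int {
	groups := make(map[string][]dataPoint)
	for _, sm := range scopes {
		for _, dp := range sm.points {
			key := sm.scopeKey + "\x00" + dp.dims
			groups[key] = append(groups[key], dp)
		}
	}
	return len(groups)
}

func main() {
	// The same scope appears twice in one resource, each with a point at t1.
	scopes := []scopeMetrics{
		{scopeKey: "lib/a", points: []dataPoint{{dims: "t1", value: 1}}},
		{scopeKey: "lib/a", points: []dataPoint{{dims: "t1", value: 2}}},
	}
	fmt.Println(countDocsPerScope(scopes))    // 2: two documents with identical dimensions
	fmt.Println(countDocsPerResource(scopes)) // 1
}
```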
I suppose it's no worse than the existing assumption that resourceMetrics is free of duplicate resources.
What makes it safe is that the only consequence of the assumption being wrong is leaving some storage savings on the table. In other words, we should prioritize elastic/elasticsearch#99123, which has turned out to be more of an issue than anticipated in various contexts.
Wouldn't duplicate resources/scopes lead to duplicate _tsid values and document rejections? I definitely agree on prioritising that issue, though.
bytes.Buffer.Write is guaranteed not to return an error; its err result is always nil.
LGTM. The amount of handwritten serialisation makes me a little uncomfortable, but we can perhaps improve that with code generation later.
attrCopy := pcommon.NewMap()
attributes.CopyTo(attrCopy)
It would be great if we could avoid this copy. The geo.location merge might be a bit complicated, but the data stream fields should be reasonably straightforward to filter out by iterating through the attributes. A job for another day.
// Determine if this log record is an event, as they are mapped differently
// https://github.com/open-telemetry/semantic-conventions/blob/main/docs/general/events.md
var bodyType string
if _, hasEventNameAttribute := record.Attributes().Get("event.name"); hasEventNameAttribute || record.EventName() != "" {
We already do this check in the caller, and in a more efficient way, so should we make it a bool parameter?
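A minimal sketch of the suggested refactor. It uses hypothetical simplified types and functions (logRecord, isEvent, serializeLogRecord) instead of the pdata API. The caller runs the event check once, testing the cheap event name field before the attribute lookup, and passes the result down as a bool. The serializer then no longer repeats the lookup. The "event"/"log" output is a placeholder, not the exporter's real mapping.

```go
package main

import "fmt"

// logRecord is a simplified, hypothetical stand-in for a log record.
type logRecord struct {
	eventName  string
	attributes map[string]string
	body       string
}

// isEvent performs the event check once, in the caller: a non-empty event
// name, or an "event.name" attribute, marks the record as an event.
func isEvent(r logRecord) bool {
	if r.eventName != "" {
		return true
	}
	_, ok := r.attributes["event.name"]
	return ok
}

// serializeLogRecord receives the precomputed flag instead of re-deriving it
// from the record's attributes. The returned string is a placeholder for the
// real serialization.
func serializeLogRecord(r logRecord, isEvent bool) string {
	kind := "log"
	if isEvent {
		kind = "event"
	}
	return kind + ":" + r.body
}

func main() {
	records := []logRecord{
		{body: "started"},
		{eventName: "user.login", body: "ok"},
		{attributes: map[string]string{"event.name": "user.logout"}, body: "bye"},
	}
	for _, r := range records {
		fmt.Println(serializeLogRecord(r, isEvent(r)))
	}
	// Prints: log:started, event:ok, event:bye (one per line)
}
```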
Directly serializes pdata to JSON in OTel mode, without first having to create an intermediate objmodel.Document.