[elasticsearchexporter] Direct serialization without objmodel in OTel mode #37032

Status: Open. Wants to merge 12 commits into base: main.

felixbarny
Contributor

@felixbarny felixbarny commented Jan 6, 2025

Directly serializes pdata to JSON in OTel mode

@felixbarny
Contributor Author

Benchmark results:

TL;DR: throughput is almost 2x for metrics and over 2x for logs and traces. Allocated bytes/op drop by 67% to 81%, or about 78% by geomean.

                                      │   old.txt   │               new.txt                │
                                      │   sec/op    │    sec/op     vs base                │
Exporter/logs/otel/small_batch-10       79.16µ ± 1%   37.09µ ±  4%  -53.15% (p=0.000 n=10)
Exporter/logs/otel/medium_batch-10      757.0µ ± 2%   296.9µ ±  6%  -60.78% (p=0.000 n=10)
Exporter/logs/otel/large_batch-10       7.392m ± 1%   2.793m ±  1%  -62.22% (p=0.000 n=10)
Exporter/logs/otel/xlarge_batch-10      70.50m ± 1%   27.57m ±  0%  -60.89% (p=0.000 n=10)
Exporter/metrics/otel/small_batch-10    414.8µ ± 1%   211.0µ ±  1%  -49.13% (p=0.000 n=10)
Exporter/metrics/otel/medium_batch-10   3.960m ± 0%   2.027m ±  1%  -48.82% (p=0.000 n=10)
Exporter/metrics/otel/large_batch-10    39.97m ± 1%   21.37m ±  0%  -46.54% (p=0.000 n=10)
Exporter/metrics/otel/xlarge_batch-10   421.3m ± 1%   228.2m ± 12%  -45.85% (p=0.000 n=10)
Exporter/traces/otel/small_batch-10     79.64µ ± 0%   31.92µ ±  1%  -59.92% (p=0.000 n=10)
Exporter/traces/otel/medium_batch-10    765.5µ ± 1%   272.8µ ±  1%  -64.36% (p=0.000 n=10)
Exporter/traces/otel/large_batch-10     7.341m ± 1%   2.608m ±  1%  -64.48% (p=0.000 n=10)
Exporter/traces/otel/xlarge_batch-10    71.74m ± 1%   25.73m ±  2%  -64.13% (p=0.000 n=10)
geomean                                 4.171m        1.783m        -57.25%

                                      │   old.txt   │                new.txt                │
                                      │  events/s   │   events/s    vs base                 │
Exporter/logs/otel/small_batch-10       126.3k ± 1%   269.6k ±  4%  +113.43% (p=0.000 n=10)
Exporter/logs/otel/medium_batch-10      132.1k ± 2%   336.8k ±  6%  +154.97% (p=0.000 n=10)
Exporter/logs/otel/large_batch-10       135.3k ± 1%   358.1k ±  1%  +164.69% (p=0.000 n=10)
Exporter/logs/otel/xlarge_batch-10      141.8k ± 1%   362.7k ±  0%  +155.70% (p=0.000 n=10)
Exporter/metrics/otel/small_batch-10    168.8k ± 1%   331.8k ±  1%   +96.59% (p=0.000 n=10)
Exporter/metrics/otel/medium_batch-10   176.7k ± 0%   345.3k ±  1%   +95.39% (p=0.000 n=10)
Exporter/metrics/otel/large_batch-10    175.1k ± 1%   327.6k ±  0%   +87.05% (p=0.000 n=10)
Exporter/metrics/otel/xlarge_batch-10   166.1k ± 1%   306.8k ± 11%   +84.66% (p=0.000 n=10)
Exporter/traces/otel/small_batch-10     125.6k ± 0%   313.3k ±  1%  +149.52% (p=0.000 n=10)
Exporter/traces/otel/medium_batch-10    130.6k ± 1%   366.5k ±  1%  +180.58% (p=0.000 n=10)
Exporter/traces/otel/large_batch-10     136.2k ± 0%   383.5k ±  1%  +181.50% (p=0.000 n=10)
Exporter/traces/otel/xlarge_batch-10    139.4k ± 1%   388.6k ±  2%  +178.79% (p=0.000 n=10)
geomean                                 145.0k        339.3k        +133.93%

                                      │   old.txt    │               new.txt                │
                                      │     B/op     │     B/op      vs base                │
Exporter/logs/otel/small_batch-10       80.58Ki ± 0%   16.98Ki ± 1%  -78.93% (p=0.000 n=10)
Exporter/logs/otel/medium_batch-10      793.1Ki ± 0%   158.4Ki ± 0%  -80.02% (p=0.000 n=10)
Exporter/logs/otel/large_batch-10       7.727Mi ± 0%   1.527Mi ± 0%  -80.24% (p=0.000 n=10)
Exporter/logs/otel/xlarge_batch-10      77.16Mi ± 0%   15.20Mi ± 0%  -80.30% (p=0.000 n=10)
Exporter/metrics/otel/small_batch-10    403.9Ki ± 0%   112.1Ki ± 0%  -72.25% (p=0.000 n=10)
Exporter/metrics/otel/medium_batch-10   3.926Mi ± 0%   1.070Mi ± 0%  -72.75% (p=0.000 n=10)
Exporter/metrics/otel/large_batch-10    39.51Mi ± 0%   11.08Mi ± 0%  -71.97% (p=0.000 n=10)
Exporter/metrics/otel/xlarge_batch-10   390.3Mi ± 0%   130.3Mi ± 1%  -66.62% (p=0.000 n=10)
Exporter/traces/otel/small_batch-10     80.74Ki ± 0%   16.27Ki ± 0%  -79.85% (p=0.000 n=10)
Exporter/traces/otel/medium_batch-10    794.6Ki ± 0%   151.4Ki ± 0%  -80.95% (p=0.000 n=10)
Exporter/traces/otel/large_batch-10     7.739Mi ± 0%   1.462Mi ± 0%  -81.11% (p=0.000 n=10)
Exporter/traces/otel/xlarge_batch-10    77.34Mi ± 0%   14.56Mi ± 0%  -81.18% (p=0.000 n=10)
geomean                                 4.219Mi        967.0Ki       -77.62%

                                      │   old.txt   │               new.txt               │
                                      │  allocs/op  │  allocs/op   vs base                │
Exporter/logs/otel/small_batch-10        553.0 ± 0%    172.0 ± 1%  -68.90% (p=0.000 n=10)
Exporter/logs/otel/medium_batch-10      5.430k ± 0%   1.619k ± 0%  -70.18% (p=0.000 n=10)
Exporter/logs/otel/large_batch-10       54.19k ± 0%   16.08k ± 0%  -70.32% (p=0.000 n=10)
Exporter/logs/otel/xlarge_batch-10      541.7k ± 0%   160.7k ± 0%  -70.34% (p=0.000 n=10)
Exporter/metrics/otel/small_batch-10    4.301k ± 0%   1.918k ± 0%  -55.41% (p=0.000 n=10)
Exporter/metrics/otel/medium_batch-10   42.83k ± 0%   19.00k ± 0%  -55.65% (p=0.000 n=10)
Exporter/metrics/otel/large_batch-10    428.0k ± 0%   189.8k ± 0%  -55.65% (p=0.000 n=10)
Exporter/metrics/otel/xlarge_batch-10   4.278M ± 0%   1.943M ± 0%  -54.59% (p=0.000 n=10)
Exporter/traces/otel/small_batch-10      594.0 ± 0%    192.0 ± 0%  -67.68% (p=0.000 n=10)
Exporter/traces/otel/medium_batch-10    5.830k ± 0%   1.818k ± 0%  -68.82% (p=0.000 n=10)
Exporter/traces/otel/large_batch-10     58.19k ± 0%   18.08k ± 0%  -68.94% (p=0.000 n=10)
Exporter/traces/otel/xlarge_batch-10    581.7k ± 0%   180.6k ± 0%  -68.95% (p=0.000 n=10)
geomean                                 35.09k        12.21k       -65.19%
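
For readers checking the tables, the "vs base" columns are plain relative changes, and the sec/op and events/s columns describe the same measurement from two sides. A small sketch of the arithmetic, using the rounded values of the first logs row; the function names are made up for this example.

```go
package main

import "fmt"

// pctChange returns the relative change from before to after, in percent.
func pctChange(before, after float64) float64 {
	return (after - before) / before * 100
}

// throughputGain converts a change in time per operation into the
// corresponding change in operations per unit of time, in percent.
func throughputGain(secBefore, secAfter float64) float64 {
	return (secBefore/secAfter - 1) * 100
}

func main() {
	// Exporter/logs/otel/small_batch-10, sec/op: 79.16µ -> 37.09µ.
	fmt.Printf("sec/op:   %+.2f%%\n", pctChange(79.16, 37.09))
	fmt.Printf("events/s: %+.2f%%\n", throughputGain(79.16, 37.09))
}
```

With these inputs the two results round to -53.15% and +113.43%, matching the first row of the sec/op and events/s tables.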

scopeMetrics := scopeMetrics.At(j)
scope := scopeMetrics.Scope()
groupedDataPointsByIndex := make(map[string]map[uint32][]dataPoint)
Contributor Author

Note to reviewer: I made it so that documents from different scopes are never merged. This simplifies the serialization logic and also fixes a subtle bug in the current implementation, where we hash only the scope attributes but not the scope name. That can group potentially different scopes into the same document. As a consequence, I guess we should also add the scope name as a dimension in the mappings.
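
A minimal sketch of the kind of collision described here, using stand-in types and FNV-1a rather than the exporter's actual hashing code. The `scope` type and both hash functions are hypothetical.

```go
package main

import (
	"fmt"
	"hash"
	"hash/fnv"
	"sort"
)

// scope is a hypothetical stand-in for an instrumentation scope.
type scope struct {
	Name  string
	Attrs map[string]string
}

// writeAttrs feeds the attributes into h in sorted key order so the
// result does not depend on map iteration order.
func writeAttrs(h hash.Hash32, attrs map[string]string) {
	keys := make([]string, 0, len(attrs))
	for k := range attrs {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		h.Write([]byte(k))
		h.Write([]byte{0}) // separator between key and value
		h.Write([]byte(attrs[k]))
		h.Write([]byte{0}) // separator between entries
	}
}

// hashAttrsOnly ignores the scope name, so two different scopes with the
// same attributes get the same hash.
func hashAttrsOnly(s scope) uint32 {
	h := fnv.New32a()
	writeAttrs(h, s.Attrs)
	return h.Sum32()
}

// hashNameAndAttrs includes the scope name in the hash input.
func hashNameAndAttrs(s scope) uint32 {
	h := fnv.New32a()
	h.Write([]byte(s.Name))
	h.Write([]byte{0})
	writeAttrs(h, s.Attrs)
	return h.Sum32()
}

func main() {
	attrs := map[string]string{"version": "1"}
	a := scope{Name: "lib.a", Attrs: attrs}
	b := scope{Name: "lib.b", Attrs: attrs}
	fmt.Println(hashAttrsOnly(a) == hashAttrsOnly(b))       // same group key
	fmt.Println(hashNameAndAttrs(a) == hashNameAndAttrs(b)) // different group keys
}
```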

Contributor

I think by moving this here, rather than outside of the scopeMetrics loop, we're assuming that there will never be two identical scopes within a resource. Is that a safe assumption?

Contributor

I suppose it's no worse than the existing assumption that resourceMetrics is free of duplicate resources.

Contributor Author

What makes the assumption safe is that the only consequence of it being wrong is leaving some storage savings on the table. In other words, we should prioritize elastic/elasticsearch#99123, which turns out to be more of an issue than anticipated in various contexts.

Contributor

@axw axw Jan 10, 2025

Wouldn't duplicate resources/scopes lead to duplicate _tsid & doc rejections? Definitely agree on prioritising that issue though...

@felixbarny felixbarny marked this pull request as ready for review January 10, 2025 07:42
@felixbarny felixbarny requested a review from a team as a code owner January 10, 2025 07:42
@felixbarny felixbarny requested a review from songy23 January 10, 2025 07:42
bytes.Buffer.Write is guaranteed to not return an error
Contributor

@axw axw left a comment

LGTM. The amount of handwritten serialisation makes me a little uncomfortable, but we can perhaps improve that with code generation later.

Comment on lines +275 to +276
attrCopy := pcommon.NewMap()
attributes.CopyTo(attrCopy)
Contributor

It would be great if we could avoid this copy. The geo.location merge might be a bit complicated, but the data stream fields should be reasonably straightforward to filter out by iterating through. A job for another day.

// Determine if this log record is an event, as they are mapped differently
// https://github.com/open-telemetry/semantic-conventions/blob/main/docs/general/events.md
var bodyType string
if _, hasEventNameAttribute := record.Attributes().Get("event.name"); hasEventNameAttribute || record.EventName() != "" {
Contributor

We already do this check in the caller, and in a more efficient way, so should we make it a bool parameter?
