[elasticsearchexporter]: Add dynamic document id support for logs #37065

Open
wants to merge 10 commits into
base: main

Conversation

mauri870
Contributor

@mauri870 mauri870 commented Jan 7, 2025

Description

This PR adds a new config option, logs_dynamic_id. When enabled, the exporter reads the elasticsearch.document_id attribute from each log record and uses it as the document ID in Elasticsearch. This is only implemented for logs, but I can open follow-up PRs to support metrics and traces, akin to the *_dynamic_index options.

Fixes #36882
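
For readers unfamiliar with how a document ID reaches Elasticsearch, here is a minimal sketch of a Bulk API action line that carries an explicit `_id`. It is not the exporter's code: `bulkActionLine`, `bulkMeta`, and the index name `logs-generic-default` are hypothetical, and the choice of the `create` action is an assumption made for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// bulkMeta mirrors the metadata object of a Bulk API action line.
// _id is omitted when empty, so Elasticsearch generates an ID itself.
type bulkMeta struct {
	Index string `json:"_index"`
	ID    string `json:"_id,omitempty"`
}

// bulkActionLine is a hypothetical helper, not part of the exporter.
// It renders the action line that precedes a document in a bulk request.
func bulkActionLine(index, docID string) (string, error) {
	line, err := json.Marshal(map[string]bulkMeta{
		"create": {Index: index, ID: docID},
	})
	if err != nil {
		return "", err
	}
	return string(line), nil
}

func main() {
	withID, _ := bulkActionLine("logs-generic-default", "abc123")
	withoutID, _ := bulkActionLine("logs-generic-default", "")
	fmt.Println(withID)    // {"create":{"_index":"logs-generic-default","_id":"abc123"}}
	fmt.Println(withoutID) // {"create":{"_index":"logs-generic-default"}}
}
```

With an empty ID the `_id` key is dropped entirely, which matches the stated intent that the current behavior is retained when no document ID attribute is present.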

Testing

Added tests to verify that the document ID attribute can be read from the log record and that the _id is properly forwarded to Elasticsearch. Also asserted that when there is no doc id attribute the current behavior is retained.

Documentation

Updated the readme to mention the new logs_dynamic_id config option.

@mauri870 mauri870 requested a review from a team as a code owner January 7, 2025 13:45
@mauri870 mauri870 requested a review from djaglowski January 7, 2025 13:45
@mauri870 mauri870 changed the title [elasticsearchexporter]: Add support for for setting a document id for logs [elasticsearchexporter]: Add dynamic document id support for logs Jan 7, 2025
@mauri870 mauri870 marked this pull request as draft January 8, 2025 11:10
@mauri870 mauri870 marked this pull request as ready for review January 8, 2025 12:13
Contributor

@carsonip carsonip left a comment

looks good at a high level, thanks!

.chloggen/elasticsearchexporter_logs_dynamic_id.yaml Outdated Show resolved Hide resolved
exporter/elasticsearchexporter/exporter.go Outdated Show resolved Hide resolved
Contributor

@carsonip carsonip left a comment

Q: do you expect the elasticsearch.document_id to be indexed under attributes as well, or should it be ignored during serialization?

@mauri870
Contributor Author

mauri870 commented Jan 9, 2025

Q: do you expect the elasticsearch.document_id to be indexed under attributes as well, or should it be ignored during serialization?

Haven't thought of that, but I think it makes sense to remove the field, as it only has meaning for the exporter.

Also, elasticsearch.document_id is overly specific, I'm happy to hear suggestions on the naming.

@mauri870
Contributor Author

mauri870 commented Jan 9, 2025

I updated the code to remove the id field from the final document.

Regarding the attribute name, I wonder if we should stick to some kind of semantic convention such as https://opentelemetry.io/docs/specs/semconv/database/elasticsearch/. I scrolled through it but couldn't find an existing attribute for a document ID.

Comment on lines +463 to +471
func (e *elasticsearchExporter) extractDocumentIDAttribute(m pcommon.Map) string {
	if e.config.LogsDynamicID.Enabled {
		docID, ok := getFromAttributes(documentIDAttributeName, "", m)
		m.Remove(documentIDAttributeName)
		if docID != "" && ok {
			return docID
		}
	}
	return ""
Contributor

Suggested change
- func (e *elasticsearchExporter) extractDocumentIDAttribute(m pcommon.Map) string {
- 	if e.config.LogsDynamicID.Enabled {
- 		docID, ok := getFromAttributes(documentIDAttributeName, "", m)
- 		m.Remove(documentIDAttributeName)
- 		if docID != "" && ok {
- 			return docID
- 		}
- 	}
- 	return ""
+ func (e *elasticsearchExporter) extractDocumentIDAttribute(m pcommon.Map) (docID string) {
+ 	if e.config.LogsDynamicID.Enabled {
+ 		m.RemoveIf(func(k string, value pcommon.Value) bool {
+ 			if k == documentIDAttributeName {
+ 				docID = value.AsString()
+ 				return true
+ 			}
+ 			return false
+ 		})
+ 	}
+ 	return
+ }

nit: Get and Remove are both O(N) in pcommon.Map. Use RemoveIf to iterate over the map only once.
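
The single-pass idea in this nit can be sketched on a simplified slice-backed map. Everything here (`kv`, `attrMap`, `removeIf`, `extractDocumentID`) is a hypothetical stand-in written for illustration and is not the pcommon.Map implementation; it only assumes, as the comment states, that lookups and removals each scan the entries linearly.

```go
package main

import "fmt"

// kv and attrMap are a simplified stand-in for a slice-backed map.
// They are illustrative only and are not pcommon.Map.
type kv struct {
	key, value string
}

type attrMap struct {
	entries []kv
}

// removeIf deletes every entry for which f returns true.
// It walks the slice once and compacts it in place.
func (m *attrMap) removeIf(f func(k, v string) bool) {
	kept := m.entries[:0]
	for _, e := range m.entries {
		if !f(e.key, e.value) {
			kept = append(kept, e)
		}
	}
	m.entries = kept
}

// extractDocumentID returns the value stored under attrName and strips
// that entry from the map, using a single scan instead of get + remove.
func extractDocumentID(m *attrMap, attrName string) (docID string) {
	m.removeIf(func(k, v string) bool {
		if k == attrName {
			docID = v
			return true
		}
		return false
	})
	return docID
}

func main() {
	m := &attrMap{entries: []kv{
		{"http.method", "GET"},
		{"elasticsearch.document_id", "abc123"},
	}}
	id := extractDocumentID(m, "elasticsearch.document_id")
	fmt.Println(id, len(m.entries)) // abc123 1
}
```

A separate get followed by a remove would scan the entries twice; capturing the value inside the predicate folds both steps into one scan.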

@@ -61,6 +61,9 @@ type Config struct {
// fall back to pure TracesIndex, if 'elasticsearch.index.prefix' or 'elasticsearch.index.suffix' are not found in resource or attribute (prio: resource > attribute)
TracesDynamicIndex DynamicIndexSetting `mapstructure:"traces_dynamic_index"`

// LogsDynamicID is used to configure the document id for logs.
LogsDynamicID DynamicIndexSetting `mapstructure:"logs_dynamic_id"`
Contributor

Suggested change
- 	LogsDynamicID DynamicIndexSetting `mapstructure:"logs_dynamic_id"`
+ 	LogsDynamicID bool `mapstructure:"logs_dynamic_id"`

nit: This has nothing to do with Index, which makes the use of DynamicIndexSetting slightly odd.

Also, personally I find this one level of nesting not very ergonomic. Unless we plan to add features to dynamic ID in the future, e.g. logs_dynamic_id::attribute_name to make the attribute name configurable, I feel that adding a layer just to have a ::enabled bool is not very useful. You can tell I'm already not very happy with *_dynamic_index::enabled. But I'd be interested to know other codeowners' opinions on this.
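
To make the two config shapes under discussion concrete, here is a small side-by-side sketch. The type names `nestedSetting`, `configNested`, and `configFlat` are hypothetical; only the `logs_dynamic_id` and `enabled` keys come from the PR, and the YAML in the comments is an assumed rendering of each shape.

```go
package main

import "fmt"

// nestedSetting is a hypothetical one-field settings struct, similar in
// shape to the DynamicIndexSetting type reused by the PR.
type nestedSetting struct {
	Enabled bool `mapstructure:"enabled"`
}

// configNested corresponds to YAML like:
//
//	logs_dynamic_id:
//	  enabled: true
type configNested struct {
	LogsDynamicID nestedSetting `mapstructure:"logs_dynamic_id"`
}

// configFlat corresponds to YAML like:
//
//	logs_dynamic_id: true
type configFlat struct {
	LogsDynamicID bool `mapstructure:"logs_dynamic_id"`
}

func main() {
	nested := configNested{LogsDynamicID: nestedSetting{Enabled: true}}
	flat := configFlat{LogsDynamicID: true}
	fmt.Println(nested.LogsDynamicID.Enabled, flat.LogsDynamicID) // true true
}
```

The nested shape leaves room for sibling options such as a configurable attribute name, while the flat shape is shorter to write if no such option is ever added.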

@@ -61,6 +61,9 @@ type Config struct {
// fall back to pure TracesIndex, if 'elasticsearch.index.prefix' or 'elasticsearch.index.suffix' are not found in resource or attribute (prio: resource > attribute)
TracesDynamicIndex DynamicIndexSetting `mapstructure:"traces_dynamic_index"`

// LogsDynamicID is used to configure the document id for logs.
Contributor

Suggested change
- 	// LogsDynamicID is used to configure the document id for logs.
+ 	// LogsDynamicID configures whether log record attribute `elasticsearch.document_id` is set as the document ID in ES.

Comment on lines +148 to +149
- `logs_dynamic_id` (optional): Dynamically determines the document ID to be used in Elasticsearch based on a log record attribute.
  - `enabled` (default=false): Enable/Disable dynamic ID for log records. If `elasticsearch.document_id` exists and is != "" in the log record attributes, it will be used as the document ID. Otherwise, the document ID will be generated by Elasticsearch.
Contributor

As discussed below. It would also be good to mention that the attribute is stripped from the final document.

component: elasticsearchexporter

# A brief description of the change. Surround your text with quotes ("") if it needs to start with a backtick (`).
note: Support for dynamically setting the document ID of log records.
Contributor

Suggested change
- note: Support for dynamically setting the document ID of log records.
+ note: Add config `logs_dynamic_id` to dynamically set the document ID of log records using log record attribute `elasticsearch.document_id`

Development

Successfully merging this pull request may close these issues.

[exporter/elasticsearch] Ability to specify the document ID for logs
5 participants