Registry manifest and Schema diff #400

Open
wants to merge 47 commits into
base: main

Contributor

@lquerel lquerel commented Oct 3, 2024

Note: The scope of this PR has been reduced to focus only on the schema diff feature. GitHub issues have been created to track the features that have been postponed: #482, #483.

This PR implements the registry diff command; see the following example:

cargo run -- registry diff -r https://github.com/open-telemetry/semantic-conventions/archive/refs/tags/v1.27.0.zip[model] --baseline-registry https://github.com/open-telemetry/semantic-conventions/archive/refs/tags/v1.26.0.zip[model] --diff-format markdown

In this example, the diff is displayed in markdown format. The following formats are supported: json, yaml, markdown, ansi, ansi_stats.

A detailed description of the schema diff data model and the diffing process is visible here.

Notes:

  • The crate weaver_otel_schema is not essential for this PR; it was initially included as part of the preparations for the registry schema-update command. We have decided to implement this command in a future PR. However, for simplicity, I prefer to keep the preparation code in place instead of removing it. The same applies to all_changes in weaver_version.

List of modifications to apply to the semantic conventions repository after the release of the Weaver containing the current PR:

  • Add a registry-manifest.yaml file with the version of the next release.
  • Update all deprecated fields.

Closes: #186

@lquerel lquerel self-assigned this Oct 3, 2024
@lquerel lquerel added the enhancement New feature or request label Oct 3, 2024
@lquerel lquerel changed the title [WIP] Registry manifest and OTEL schema update [WIP] Registry manifest and Schema diff Nov 27, 2024
# Conflicts:
#	.clippy.toml
#	Cargo.toml
#	crates/weaver_semconv_gen/src/lib.rs
#	src/registry/search.rs
#	src/registry/stats.rs
#	src/registry/update_markdown.rs
Contributor Author

lquerel commented Dec 19, 2024

@lmolkova @jsuereth

Note 1: I have addressed most of the feedback. The main task remaining is the removal of change detection for elements other than attributes. That will be done soon.

Note 2: I will also write a document describing: the format of the schema diff, examples of what can be done with it, the current limitations, and ideas for future development.

I have a question regarding the format of the new deprecated field. In the current version of this PR, a deprecated field can take one of the following three forms:

Old approach (still supported for compatibility reasons):

deprecated: "deprecation message"

or

deprecated:
  action: renamed
  renamed_to: attribute_name

or

deprecated:
  action: deprecated

With this, we can handle simple attribute renaming scenarios, as well as merge scenarios (e.g., A and B are renamed to C; Weaver will detect this automatically). However, we currently have no way to represent a split (e.g., A is renamed to B and C). So with the current implementation, the semconv author will need to set deprecated to action: deprecated and provide a note at the object level to explain the split in textual form.
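The merge case described above can be sketched as follows. This is a minimal illustration, not Weaver's actual code: the `Deprecated` enum and the `detect_merges` function are hypothetical names, and the sketch assumes a merge is simply any `renamed_to` target shared by two or more deprecated attributes.

```rust
use std::collections::BTreeMap;

/// Hypothetical model of the structured `deprecated` field (not Weaver's type).
#[derive(Debug, Clone, PartialEq)]
enum Deprecated {
    /// `action: renamed` together with its `renamed_to` target.
    Renamed { renamed_to: String },
    /// `action: deprecated`, with no machine-readable replacement.
    Deprecated,
}

/// Groups renamed attributes by target and keeps only the targets that
/// two or more old names point to (the assumed definition of a merge).
fn detect_merges(attrs: &[(&str, Deprecated)]) -> BTreeMap<String, Vec<String>> {
    let mut by_target: BTreeMap<String, Vec<String>> = BTreeMap::new();
    for (name, deprecated) in attrs {
        if let Deprecated::Renamed { renamed_to } = deprecated {
            by_target
                .entry(renamed_to.clone())
                .or_default()
                .push((*name).to_string());
        }
    }
    // Plain one-to-one renames are dropped; only merges remain.
    by_target.retain(|_, sources| sources.len() > 1);
    by_target
}

fn main() {
    let attrs = vec![
        ("a", Deprecated::Renamed { renamed_to: "c".into() }),
        ("b", Deprecated::Renamed { renamed_to: "c".into() }),
        ("d", Deprecated::Renamed { renamed_to: "e".into() }),
        ("f", Deprecated::Deprecated),
    ];
    for (target, sources) in detect_merges(&attrs) {
        println!("{sources:?} merged into {target}");
    }
}
```

A split cannot be recovered this way, because each deprecated attribute carries at most one `renamed_to` value.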

We could make this explicit in the format of the deprecated field and in the diff output. This would allow for migration documentation that more accurately reflects the desired changes. However, it still wouldn’t enable automatic downgrades in the schema processor for the split scenario (at least without logic taking into account some additional context).
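One possible shape for an explicit split action is sketched below. Everything here is an assumption made for illustration: the `Split` variant, the `split_into` field, and both helper methods are hypothetical and not part of this PR. The sketch shows that a split can feed migration documentation while still offering no single target for an automatic mapping.

```rust
/// Hypothetical extension of the `deprecated` field with a split action.
#[derive(Debug, Clone, PartialEq)]
enum Deprecated {
    Renamed { renamed_to: String },
    /// Assumed new form: one attribute replaced by several.
    Split { split_into: Vec<String> },
    Deprecated,
}

impl Deprecated {
    /// Text a migration guide could show for the attribute called `name`.
    fn migration_note(&self, name: &str) -> String {
        match self {
            Deprecated::Renamed { renamed_to } => {
                format!("`{name}` was renamed to `{renamed_to}`.")
            }
            Deprecated::Split { split_into } => {
                let targets = split_into
                    .iter()
                    .map(|t| format!("`{t}`"))
                    .collect::<Vec<_>>()
                    .join(", ");
                format!("`{name}` was split into {targets}.")
            }
            Deprecated::Deprecated => format!("`{name}` is deprecated."),
        }
    }

    /// The single new name a processor could map to or from, if one exists.
    /// A split has several targets, so it yields `None`.
    fn automatic_rename_target(&self) -> Option<&str> {
        match self {
            Deprecated::Renamed { renamed_to } => Some(renamed_to.as_str()),
            _ => None,
        }
    }
}

fn main() {
    let split = Deprecated::Split {
        split_into: vec!["b".into(), "c".into()],
    };
    println!("{}", split.migration_note("a"));
    println!("{:?}", split.automatic_rename_target());
    println!("{}", Deprecated::Deprecated.migration_note("d"));
}
```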

Question: Adding such an advanced definition for the deprecated field isn’t particularly complicated, so I don’t mind including it. What do you think? Are there other types of deprecations you’d like to codify?

@lquerel lquerel marked this pull request as ready for review December 30, 2024 23:45
@lquerel lquerel requested a review from a team as a code owner December 30, 2024 23:45
@lquerel lquerel changed the title [WIP] Registry manifest and Schema diff Registry manifest and Schema diff Dec 30, 2024
@lquerel lquerel requested review from jsuereth and lmolkova December 30, 2024 23:48
semconv_version: v1.26.0
changes:
attributes:
- type: deprecated
Contributor

nit and personal taste, so definitely not blocking

having

- name: http.server_name
  type: deprecated

looks more readable to me, maybe because it's closer to how we write attribute definitions.

Contributor Author

Ok I will make this change in the documentation.

docs/usage.md
and the deprecation note is stored in the note attribute.
- `removed`: An item in the baseline registry was removed in the head
registry. The name of the removed item is stored in the name attribute.

Contributor

we'll eventually need a way to describe other (breaking) changes such as attribute type, metric unit or instrument change.

Maybe we can add an `other` category that would be used for change types that we don't formally support yet? This would allow someone to look at the diff and understand that there was some change and that some manual intervention may be necessary.
Otherwise, the lack of any mention could be perceived as 'there was no change, all looks good and compatible'.
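A catch-all category of this kind could look roughly like the sketch below. The `Change` enum, the `AttributeView` struct, and `diff_type` are hypothetical names chosen for illustration; the PR's actual change types may differ. The point is only that a detected but unmodeled change is surfaced rather than omitted.

```rust
/// Hypothetical change kinds for one entry of the schema diff.
#[derive(Debug, Clone, PartialEq)]
enum Change {
    Added,
    Deprecated { note: String },
    Removed,
    /// Catch-all: a change was detected but has no dedicated kind yet.
    Other { description: String },
}

/// Minimal view of an attribute used for the comparison (assumed shape).
struct AttributeView<'a> {
    name: &'a str,
    attr_type: &'a str,
}

/// Reports an attribute type change as `Other` instead of staying silent.
fn diff_type(baseline: &AttributeView, head: &AttributeView) -> Option<Change> {
    if baseline.attr_type == head.attr_type {
        return None;
    }
    Some(Change::Other {
        description: format!(
            "type of `{}` changed from `{}` to `{}`",
            head.name, baseline.attr_type, head.attr_type
        ),
    })
}

fn main() {
    let baseline = AttributeView { name: "example.attr", attr_type: "string" };
    let head = AttributeView { name: "example.attr", attr_type: "int" };
    if let Some(change) = diff_type(&baseline, &head) {
        println!("{change:?}");
    }
}
```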

Contributor Author

I will add a note and describe how this could be represented as a potential future extension.

Contributor

@jsuereth jsuereth left a comment

Still haven't finished some of the more mechanical parts, but I did review the diff algorithm and the core deserialization code.

I think we have an issue around attribute identity we may need to solve. PTAL at comments.

}

/// Returns the number of non-fatal errors, or 1 if the result is a fatal error, 0 otherwise.
pub fn error_count(&self) -> usize {
Contributor

nit: Should we use num_errors?
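For context, the counting rule stated in the doc comment can be sketched as below. The `Outcome` type and its variant names are assumptions made for this example; only the rule itself (0 on success, the number of non-fatal errors, or 1 for a fatal error) comes from the comment under review.

```rust
/// Hypothetical three-state result: success, success with non-fatal
/// errors, or a single fatal error.
enum Outcome<T, E> {
    Ok(T),
    OkWithNonFatalErrors(T, Vec<E>),
    FatalError(E),
}

impl<T, E> Outcome<T, E> {
    /// Number of non-fatal errors, 1 for a fatal error, 0 otherwise.
    fn error_count(&self) -> usize {
        match self {
            Outcome::Ok(_) => 0,
            Outcome::OkWithNonFatalErrors(_, errors) => errors.len(),
            Outcome::FatalError(_) => 1,
        }
    }
}

fn main() {
    let result: Outcome<&str, String> = Outcome::OkWithNonFatalErrors(
        "registry",
        vec!["warning 1".to_string(), "warning 2".to_string()],
    );
    println!("errors: {}", result.error_count());
}
```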

@@ -22,7 +22,7 @@ Brief: Span attributes used by non-OTLP exporters to represent OpenTelemetry Sco
- Examples: [
"io.opentelemetry.contrib.mongodb",
]
- Deprecated: use the `otel.scope.name` attribute.
- Deprecated:
Contributor

Should this test be updated?

@@ -39,7 +39,7 @@ Brief: {{ resource.brief }}
- Sampling relevant: {{ attribute.sampling_relevant }}
{%- endif %}
{%- if attribute.deprecated %}
- Deprecated: {{ attribute.deprecated }}
- Deprecated: {{ attribute.deprecated.note }}
Contributor

We probably need to put this in the changelog or have some kind of migration guide for this then.

/// Creates a new string attribute.
/// Note: This constructor is used for testing purposes.
#[cfg(test)]
pub(crate) fn string<S: AsRef<str>>(name: S, brief: S, note: S) -> Self {
Contributor

Nit: name, brief and note would have to be the same concrete type for this helper function.

You should either have three different type parameters for them or use something like impl Into<String>.

Contributor Author

Good catch. I like the proposal.
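The `impl Into<String>` variant proposed in this thread might look like the sketch below. The `Attribute` struct is a simplified stand-in with only three fields, not the real Weaver type. With a single type parameter `S`, passing a `&str` for one argument and a `String` for another would not compile; with one `impl Into<String>` per argument, each is converted independently.

```rust
/// Simplified stand-in for the attribute type (assumed fields only).
#[derive(Debug, PartialEq)]
struct Attribute {
    name: String,
    brief: String,
    note: String,
}

impl Attribute {
    /// Creates a string attribute; each argument may be a different type
    /// as long as it converts into `String`.
    fn string(
        name: impl Into<String>,
        brief: impl Into<String>,
        note: impl Into<String>,
    ) -> Self {
        Self {
            name: name.into(),
            brief: brief.into(),
            note: note.into(),
        }
    }
}

fn main() {
    // `&str` and `String` arguments are mixed in one call.
    let brief = String::from("An example brief");
    let attr = Attribute::string("example.attr", brief, "");
    println!("{attr:?}");
}
```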


/// Sets the deprecated field of the attribute.
/// Note: This method is used for testing purposes.
#[cfg(test)]
Contributor

I like these helper methods. May replicate the pattern elsewhere.


/// Get the attributes of the resolved telemetry schema.
#[must_use]
pub fn attribute_map(&self) -> HashMap<&str, &Attribute> {
Contributor

I think there are some faulty assumptions here.

  1. This assumes that all Attributes MUST show up in an attribute group. That is true for Semantic Conventions (due to custom policy), but not enforced by weaver.
  2. This assumes that all attribute groups are registries. This is not true for Semantic Conventions. We have some attribute groups that just "share" attributes for other non-attribute groups. These may be selected before the registry attributes.

Sadly, I think we likely need a more rigid identifier here than name. I haven't had a chance to look through the rest of the code for implications, but I don't think we can rely on this method to only give us unique attributes or attributes from the registry.
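The identity concern can be illustrated with a small sketch. The `Attribute` and `Group` structs below are simplified, hypothetical stand-ins, and keying by `(group id, name)` is just one possible stricter identifier, not a decision taken in this PR. When two groups carry an attribute with the same name, a map keyed by name alone keeps only one of them, and which one survives depends on iteration order.

```rust
use std::collections::HashMap;

/// Simplified stand-ins for the resolved schema types (assumed shapes).
struct Attribute {
    name: String,
    brief: String,
}

struct Group {
    id: String,
    attributes: Vec<Attribute>,
}

/// Keyed by name only: a later group silently overwrites an earlier one.
fn by_name(groups: &[Group]) -> HashMap<&str, &Attribute> {
    groups
        .iter()
        .flat_map(|g| g.attributes.iter())
        .map(|a| (a.name.as_str(), a))
        .collect()
}

/// Keyed by (group id, name): both definitions stay visible.
fn by_group_and_name(groups: &[Group]) -> HashMap<(&str, &str), &Attribute> {
    groups
        .iter()
        .flat_map(|g| {
            g.attributes
                .iter()
                .map(move |a| ((g.id.as_str(), a.name.as_str()), a))
        })
        .collect()
}

fn main() {
    let groups = vec![
        Group {
            id: "registry.example".into(),
            attributes: vec![Attribute {
                name: "example.attr".into(),
                brief: "registry definition".into(),
            }],
        },
        Group {
            id: "attributes.example.shared".into(),
            attributes: vec![Attribute {
                name: "example.attr".into(),
                brief: "shared reference".into(),
            }],
        },
    ];
    println!("keyed by name: {} entries", by_name(&groups).len());
    for ((group, name), attr) in by_group_and_name(&groups) {
        println!("{group} / {name}: {}", attr.brief);
    }
}
```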

///
/// Note: At the moment (2024-12-30), I don't know a better way to identify
/// the "registry" attributes other than by checking if the group ID starts
/// with "registry.".
Contributor

This is because "registry" is a semantic convention concept, not a weaver one. Should we elevate this to weaver itself?
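The prefix heuristic mentioned in the code comment amounts to something like the sketch below. The `Group` struct and both function names are hypothetical; the only detail taken from the PR is the check that the group ID starts with "registry.". Promoting the concept into Weaver would presumably replace this string convention with an explicit marker.

```rust
/// Simplified stand-in for a resolved group (assumed shape).
struct Group {
    id: String,
}

/// Convention-based check: a group belongs to the registry if its ID
/// starts with `registry.`.
fn is_registry_group(group: &Group) -> bool {
    group.id.starts_with("registry.")
}

/// Keeps only the groups that match the naming convention.
fn registry_groups(groups: &[Group]) -> Vec<&Group> {
    groups.iter().filter(|g| is_registry_group(g)).collect()
}

fn main() {
    let groups = vec![
        Group { id: "registry.example".into() },
        Group { id: "metric.example.duration".into() },
    ];
    for group in registry_groups(&groups) {
        println!("{}", group.id);
    }
}
```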


/// Get the groups of a specific type from the resolved telemetry schema.
#[must_use]
pub fn groups(&self, group_type: GroupType) -> HashMap<String, &Group> {
Contributor

nit: Any reason why this one uses String instead of &str like the other helpers?

semconv_specs: Vec<(String, SemConvSpec)>,
) -> SemConvRegistry {
) -> Result<SemConvRegistry, Error> {
static VERSION_REGEX: LazyLock<Regex> =
Contributor

Nit: We should use https://docs.rs/semver/latest/semver/ and a URL parser that can give us the last element of the path to send to the parser.
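The extraction step suggested here can be sketched with the standard library alone. This is an illustration under stated assumptions, not the PR's code: `version_from_url` is a hypothetical helper, the handling of the `[model]` suffix and the `.zip` extension is inferred from the example command in the PR description, and the manual `major.minor.patch` parsing stands in for what the semver crate would do in a real implementation.

```rust
/// Extracts a `(major, minor, patch)` triple from the last path element
/// of an archive URL such as `.../tags/v1.27.0.zip[model]`.
fn version_from_url(url: &str) -> Option<(u64, u64, u64)> {
    // Drop a trailing `[sub_folder]` suffix such as `[model]` (assumed syntax).
    let url = url.split('[').next()?;
    // Take the last path element, for example `v1.27.0.zip`.
    let last = url.trim_end_matches('/').rsplit('/').next()?;
    let stem = last.strip_suffix(".zip").unwrap_or(last);
    let stem = stem.strip_prefix('v').unwrap_or(stem);

    // Parse exactly three numeric components.
    let mut parts = stem.split('.').map(|p| p.parse::<u64>().ok());
    let version = (parts.next()??, parts.next()??, parts.next()??);
    // Reject extra components such as `1.2.3.4`.
    if parts.next().is_some() {
        return None;
    }
    Some(version)
}

fn main() {
    let url = "https://github.com/open-telemetry/semantic-conventions/archive/refs/tags/v1.27.0.zip[model]";
    println!("{:?}", version_from_url(url));
}
```

Unlike this sketch, the semver crate also handles pre-release and build metadata, which is one reason to prefer it over hand-written parsing or a regular expression.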

Labels
enhancement New feature or request
Projects
Status: Next Release
Development

Successfully merging this pull request may close these issues.

Automate OTEL Schema Generation and Update Process with Migration Guide Support
4 participants