[APR-281] Add support for blocking specific metrics ingested through DogStatsD #414
base: main
Conversation
Regression Detector (DogStatsD)

Regression Detector Results

Run ID: 53937327-1bb6-4058-87d2-fc0b0a3e8deb
Baseline: 7.61.0-rc.8

Optimization Goals: ✅ No significant changes detected
| perf | experiment | goal | Δ mean % | Δ mean % CI | trials | links |
| --- | --- | --- | --- | --- | --- | --- |
| ➖ | dsd_uds_100mb_3k_contexts_distributions_only | memory utilization | +0.32 | [+0.16, +0.49] | 1 | |
| ➖ | dsd_uds_500mb_3k_contexts | ingress throughput | +0.13 | [+0.11, +0.15] | 1 | |
| ➖ | dsd_uds_1mb_50k_contexts | ingress throughput | +0.00 | [-0.00, +0.00] | 1 | |
| ➖ | dsd_uds_1mb_50k_contexts_memlimit | ingress throughput | +0.00 | [-0.00, +0.00] | 1 | |
| ➖ | dsd_uds_512kb_3k_contexts | ingress throughput | +0.00 | [-0.01, +0.01] | 1 | |
| ➖ | dsd_uds_100mb_250k_contexts | ingress throughput | +0.00 | [-0.00, +0.00] | 1 | |
| ➖ | dsd_uds_1mb_3k_contexts | ingress throughput | +0.00 | [-0.00, +0.00] | 1 | |
| ➖ | dsd_uds_1mb_3k_contexts_dualship | ingress throughput | -0.00 | [-0.00, +0.00] | 1 | |
| ➖ | dsd_uds_40mb_12k_contexts_40_senders | ingress throughput | -0.00 | [-0.00, +0.00] | 1 | |
| ➖ | dsd_uds_10mb_3k_contexts | ingress throughput | -0.00 | [-0.03, +0.02] | 1 | |
| ➖ | dsd_uds_100mb_3k_contexts | ingress throughput | -0.01 | [-0.05, +0.04] | 1 | |
| ➖ | quality_gates_idle_rss | memory utilization | -0.02 | [-0.15, +0.11] | 1 | |
Bounds Checks: ❌ Failed
| perf | experiment | bounds_check_name | replicates_passed | links |
| --- | --- | --- | --- | --- |
| ❌ | quality_gates_idle_rss | memory_usage | 0/10 | |
Explanation
Confidence level: 90.00%
Effect size tolerance: |Δ mean %| ≥ 5.00%
Performance changes are noted in the perf column of each table:
- ✅ = significantly better comparison variant performance
- ❌ = significantly worse comparison variant performance
- ➖ = no significant change in performance
A regression test is an A/B test of target performance in a repeatable rig, where "performance" is measured as "comparison variant minus baseline variant" for an optimization goal (e.g., ingress throughput). Due to intrinsic variability in measuring that goal, we can only estimate its mean value for each experiment; we report uncertainty in that value as a 90.00% confidence interval denoted "Δ mean % CI".
For each experiment, we decide whether a change in performance is a "regression" -- a change worth investigating further -- if all of the following criteria are true:
- Its estimated |Δ mean %| ≥ 5.00%, indicating the change is big enough to merit a closer look.
- Its 90.00% confidence interval "Δ mean % CI" does not contain zero, indicating that if our statistical model is accurate, there is at least a 90.00% chance there is a difference in performance between baseline and comparison variants.
- Its configuration does not mark it "erratic".
Regression Detector (Saluki)

Regression Detector Results

Run ID: f2ca9577-ce66-45c9-b4fb-4b09a4a2a08b
Baseline: bf79211

❌ Experiments with missing or malformed data

This is a critical error. No usable optimization goal data was produced by the listed experiments. This may be a result of misconfiguration. Ping #single-machine-performance and we can help out.
Optimization Goals: ✅ No significant changes detected
| perf | experiment | goal | Δ mean % | Δ mean % CI | trials | links |
| --- | --- | --- | --- | --- | --- | --- |
| ➖ | dsd_uds_1mb_50k_contexts | ingress throughput | +0.01 | [-0.00, +0.02] | 1 | |
| ➖ | dsd_uds_10mb_3k_contexts | ingress throughput | +0.01 | [-0.02, +0.03] | 1 | |
| ➖ | dsd_uds_1mb_3k_contexts_dualship | ingress throughput | +0.00 | [-0.00, +0.00] | 1 | |
| ➖ | dsd_uds_512kb_3k_contexts | ingress throughput | -0.00 | [-0.01, +0.01] | 1 | |
| ➖ | dsd_uds_100mb_250k_contexts | ingress throughput | -0.00 | [-0.03, +0.03] | 1 | |
| ➖ | dsd_uds_40mb_12k_contexts_40_senders | ingress throughput | -0.00 | [-0.04, +0.04] | 1 | |
| ➖ | dsd_uds_1mb_3k_contexts | ingress throughput | -0.00 | [-0.01, +0.01] | 1 | |
| ➖ | dsd_uds_100mb_3k_contexts | ingress throughput | -0.00 | [-0.06, +0.05] | 1 | |
| ➖ | dsd_uds_100mb_3k_contexts_distributions_only | memory utilization | -0.45 | [-0.59, -0.32] | 1 | |
| ➖ | dsd_uds_500mb_3k_contexts | ingress throughput | -0.87 | [-0.96, -0.77] | 1 | |
| ➖ | quality_gates_idle_rss | memory utilization | -1.89 | [-1.94, -1.84] | 1 | |
Bounds Checks: ✅ Passed
| perf | experiment | bounds_check_name | replicates_passed | links |
| --- | --- | --- | --- | --- |
| ✅ | quality_gates_idle_rss | memory_usage | 10/10 | |
Explanation
Confidence level: 90.00%
Effect size tolerance: |Δ mean %| ≥ 5.00%
Performance changes are noted in the perf column of each table:
- ✅ = significantly better comparison variant performance
- ❌ = significantly worse comparison variant performance
- ➖ = no significant change in performance
A regression test is an A/B test of target performance in a repeatable rig, where "performance" is measured as "comparison variant minus baseline variant" for an optimization goal (e.g., ingress throughput). Due to intrinsic variability in measuring that goal, we can only estimate its mean value for each experiment; we report uncertainty in that value as a 90.00% confidence interval denoted "Δ mean % CI".
For each experiment, we decide whether a change in performance is a "regression" -- a change worth investigating further -- if all of the following criteria are true:
- Its estimated |Δ mean %| ≥ 5.00%, indicating the change is big enough to merit a closer look.
- Its 90.00% confidence interval "Δ mean % CI" does not contain zero, indicating that if our statistical model is accurate, there is at least a 90.00% chance there is a difference in performance between baseline and comparison variants.
- Its configuration does not mark it "erratic".
Regression Detector Links

Experiment Result Links
```rust
    Blocklist { data, match_prefix }
}

fn test(&self, name: String) -> bool {
```
We should call this `contains`, since `test` is insanely generic/abstract... and this should also just take `&str`.
Changed it, but it seems like `self.data.binary_search` takes in a `&String`, so I have to convert it for that.
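As an aside on that conversion, here is a minimal sketch, assuming `data` is a sorted `Vec<String>` as in the diff, of how `binary_search_by` can take a `&str` key directly; the helper name `find_name` is hypothetical and not part of this PR:

```rust
// Hedged sketch, not the PR's code: compare each entry as &str so the lookup
// does not need to build a temporary String key.
fn find_name(data: &[String], name: &str) -> Result<usize, usize> {
    data.binary_search_by(|entry| entry.as_str().cmp(name))
}
```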
```rust
let metric_name = metric.context().name().deref();
let mut new_metric_name = metric_name.to_string();

if !self.is_excluded(metric_name) {
    new_metric_name = format!("{}{}", self.metric_prefix, metric_name);
}

if self.blocklist.test(new_metric_name.clone()) {
    debug!("Metric {} excluded due to blocklist.", new_metric_name);
    return None;
}

// Update metric with new name.
let new_context = metric.context().with_name(new_metric_name);
let existing_context = metric.context_mut();
*existing_context = new_context;

Some(metric)
```
What we really want to avoid doing here is allocating or recreating anything if we didn't actually modify anything. We know that we're only going to modify the metric name if we have a prefix configured, so we should probably have that be our top-level conditional:

```rust
if self.metric_prefix.is_empty() {
    // just check if this metric is in the blocklist, return Some/None based on that
} else {
    // existing logic goes here
}
```

That gets us away from having to do gymnastics to generate `metric_name`/`new_metric_name` prior to passing it to `self.blocklist.test(..)`.
You might need to tweak `is_excluded` a little bit.
```rust
if let Ok(index) = i {
    if index > 0 && name.starts_with(&self.data[index - 1]) {
        return true;
    }
} else if i.is_err() {
    let index = i.unwrap_err();
    if index > 0 && name.starts_with(&self.data[index - 1]) {
        return true;
    }
}
```
Just use a `match` block to extract `index`, and then you can deduplicate this and keep it closer to the original:

```rust
if self.match_prefix {
    let index = i.map_or_else(|idx| idx, |idx| idx);
    if index > 0 && name.starts_with(&self.data[index - 1]) {
        return true;
    }
}
```

(The `map_or_else` bit is just a condensed form of `match` where we return the contained value for either side of `Result<T, E>`, since they're both `usize`.)
```rust
data.sort();

if match_prefix && !data.is_empty() {
    let mut i = 0;
```
I think a more important code comment to have here is that by doing this, we're only keeping unique prefixes: that's the crux of the filtering we're doing here and how it's then used in `test`.
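For illustration, here is a hedged sketch of what "only keeping unique prefixes" can look like after the sort; the in-place loop from the diff is replaced here by rebuilding into a new vector, and `keep_unique_prefixes` is a hypothetical helper rather than the PR's code:

```rust
// Hedged sketch: after sorting, drop any entry already covered by an earlier,
// shorter entry so only unique prefixes remain for the binary search.
fn keep_unique_prefixes(mut data: Vec<String>) -> Vec<String> {
    data.sort();

    let mut kept: Vec<String> = Vec::with_capacity(data.len());
    for entry in data {
        match kept.last() {
            // Sorted order guarantees that if any kept entry is a prefix of
            // `entry`, the most recently kept one is, so one check suffices.
            Some(prev) if entry.starts_with(prev.as_str()) => {}
            _ => kept.push(entry),
        }
    }
    kept
}
```

For example, `["my.metric", "my.metric.count", "other"]` collapses to `["my.metric", "other"]`, which is what lets the lookup check only the entry just before the insertion point.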
```rust
if self.metric_prefix.is_empty() {
    for s in &self.blocklist.data {
        if s == metric_name {
            return None;
        }
    }
} else {
    if self.is_excluded(metric_name) {
        return None;
    }

    // Enrich metric with prefix and check if the blocklist contains the new name.
    let new_metric_name = format!("{}{}", self.metric_prefix, metric_name);
    if self.blocklist.contains(&new_metric_name) {
        debug!("Metric {} excluded due to blocklist.", new_metric_name);
        return None;
    }

    // Update metric with new name.
    let new_context = metric.context().with_name(new_metric_name);
    let existing_context = metric.context_mut();
    *existing_context = new_context;
}
```
Sorry, now that things are a little cleaner, I can see where the Datadog Agent code was going. We only want to prefix the metric if `self.metric_prefix` is actually set, and if the original metric doesn't match any prefix configured in `self.metric_prefix_blacklist`.

As such, we want something like:

```rust
let new_metric_name = if self.is_excluded(...) {
    metric.context().name().clone()
} else {
    MetaString::from(format!(...))
}
```

That would end up generally being pretty efficient since the metric name should be inlined/interned, so we're not allocating in the case where the metric has an excluded prefix.
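Pulling the thread together, here is a simplified, hedged sketch of that name-resolution path; plain `String`s stand in for Saluki's `Context`/`MetaString` types, the `Blocklist` is the one sketched earlier in this conversation, and the struct and method names (`DogStatsDPrefixFilter`, `resolve_name`) are illustrative rather than the PR's final API:

```rust
// Hedged, simplified sketch; the real code operates on Metric/Context/MetaString.
struct DogStatsDPrefixFilter {
    metric_prefix: String,
    metric_prefix_blacklist: Vec<String>,
    blocklist: Blocklist, // the Blocklist sketched earlier in this conversation
}

impl DogStatsDPrefixFilter {
    /// Returns the name the metric should carry, or None if it is blocked.
    fn resolve_name(&self, metric_name: &str) -> Option<String> {
        // Only prefix when a prefix is configured and the original name does
        // not match an excluded prefix; otherwise keep the name untouched.
        let new_metric_name = if self.metric_prefix.is_empty() || self.is_excluded(metric_name) {
            metric_name.to_string()
        } else {
            format!("{}{}", self.metric_prefix, metric_name)
        };

        // Drop the metric entirely if the (possibly prefixed) name is blocked.
        if self.blocklist.contains(&new_metric_name) {
            return None;
        }

        Some(new_metric_name)
    }

    // Assumed behavior: a name is excluded from prefixing when it starts with
    // any entry in metric_prefix_blacklist.
    fn is_excluded(&self, name: &str) -> bool {
        self.metric_prefix_blacklist
            .iter()
            .any(|prefix| name.starts_with(prefix.as_str()))
    }
}
```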
```rust
    ]
}

impl NameFilterConfiguration {
```
We should probably call this like... `DogStatsDPrefixFilterConfiguration` (and `DogStatsDPrefixFilter` for the component itself) since `NameFilter` doesn't really explain what it's doing (certainly not that it might be prefixing the metrics, at least).
```rust
}

#[derive(Debug)]
struct Blocklist {
```
This should definitely have some unit tests to make sure we're doing the right thing.
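A hedged sketch of what such tests could look like, written against the `Blocklist` sketch above; the real type presumably sorts (and dedups) its data in a constructor, so building it by struct literal with pre-sorted data here is purely for illustration:

```rust
#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn exact_match_blocks_metric() {
        let blocklist = Blocklist {
            data: vec!["my.metric".to_string()],
            match_prefix: false,
        };
        assert!(blocklist.contains("my.metric"));
        assert!(!blocklist.contains("my.metric.count"));
        assert!(!blocklist.contains("other.metric"));
    }

    #[test]
    fn prefix_match_blocks_metric_when_enabled() {
        // Entries are pre-sorted since this sketch has no constructor to do it.
        let blocklist = Blocklist {
            data: vec!["my.metric".to_string(), "other".to_string()],
            match_prefix: true,
        };
        assert!(blocklist.contains("my.metric"));
        assert!(blocklist.contains("my.metric.count"));
        assert!(blocklist.contains("other.thing"));
        assert!(!blocklist.contains("another.metric"));
    }
}
```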
Context

The Datadog Agent supports the ability to block specific metrics ingested through DogStatsD via the `statsd_metric_namespace_blacklist`, `statsd_metric_blocklist`, and `statsd_metric_blocklist_match_prefix` configuration settings.

Solution

This PR introduces a new transform that sits between `dsd_agg` and `enrich`. The logic for blocking metrics is copied from the Datadog Agent code.