otelhttptrace: handle missing getconn hook without panic #5187
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@           Coverage Diff           @@
##            main   #5187     +/-   ##
=======================================
- Coverage   64.5%   64.4%   -0.1%
=======================================
  Files        200     200
  Lines      12558   12560     +2
=======================================
- Hits        8103    8101     -2
- Misses      4219    4221     +2
- Partials     236     238     +2
This change only handles missing span in
Also, I know this is hard to reproduce, and this package doesn't really have any tests at the moment, but would it be possible to write a test?
What place/variable do you have in mind?
These calls are made from stdlib and I don't know under what conditions this specific call order occurs.
I think the main usage of
We started seeing this as well after introducing the option
The original implementation was taking locks at the beginning of the function:
func (ct *clientTracer) end(hook string, err error, attrs ...kv.KeyValue) {
However, since the introduction of
It made me wonder: could this have caused a race condition?
@@ -226,6 +226,10 @@ func (ct *clientTracer) start(hook, spanName string, attrs ...attribute.KeyValue

func (ct *clientTracer) end(hook string, err error, attrs ...attribute.KeyValue) {
	if !ct.useSpans {
		// sometimes end may be called without previous start
		if ct.root == nil {
Could this be tested?
As @tonistiigi mentioned, they've gotten some reports, and there's very little consistency in when this happens. We've seen very similar problems, with clients very rarely blowing up with this same error.
I'm assuming this is going to be very hard to test because of this.
A unit test doesn't have to be a full reproduction. It can call end() with conditions that reproduce this case, just to ensure the code path doesn't fail.
I have created a new pull request, #5965, which adds a unit test for the end() func.
This will need a changelog entry.
We have many reports that end() gets called without the span being defined in start() and causes a panic. Signed-off-by: Tonis Tiigi <[email protected]>
d12420f to ded1cd2
Added changelog
Hi, just ran into moby/buildkit#4377. Any update on this PR? I understand only a unit test is missing?
Fork of #5187 updated with main branch and tests. This PR adds a nil dereference check for clientTracer.root in `end()` when span events are used instead of sub spans.
---------
Signed-off-by: Tonis Tiigi <[email protected]>
Co-authored-by: Tonis Tiigi <[email protected]>
Co-authored-by: Damien Mathieu <[email protected]>
We have many reports that end() gets called without the span being defined in start() and causes a panic.
Ref moby/buildkit#4377
Ref docker/buildx#2232
I have not fully debugged under what condition this happens in stdlib (I assume some keepalive pool case), but I don't see any other possible explanation for these panic cases.
Note that there are other httptrace hooks that also use .root without validation. I don't atm. have any proof that these could be called without GetConn() being called first as well, so I didn't add extra validation to these.