Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix and detect race condition in clienttrace.go #5700

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

gregorynisbet-google
Copy link

ct.root can be updated without holding the mutex on ct. Also, addEvent can be called on a nil ct.root.

Fix both of these issues by moving the mutex acquisition logic so that ct.root is only touched while the mutex is held.

@gregorynisbet-google gregorynisbet-google requested a review from a team June 1, 2024 01:52
Copy link

linux-foundation-easycla bot commented Jun 1, 2024

CLA Signed

The committers listed above are authorized under a signed CLA.

@dmathieu dmathieu added the blocked: CLA Waiting on CLA to be signed before progress can be made label Jun 3, 2024
if !ct.useSpans {
if err != nil {
attrs = append(attrs, attribute.String(hook+".error", err.Error()))
}
if ct.root == nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can the conditions of this be tested?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not very familiar with this codebase, I just tried to fix a race condition that I could repro locally.

What should I test instead of checking whether ct.root is nil?

In my latest revision, I moved the test for ct.root being nil to the top of the function and am successfully exiting early if it is nil.

I'm not sure that's right. I'm operating under the assumption that most functions that call events on ct.root cannot do useful work until ct.root is set and should just move on.

for i := 1; i < workers; i++ {
wg.Add(1)
go func() {
resp, err := client.Get("https://example.com")
Copy link
Member

@dmathieu dmathieu Jun 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This shouldn use httptest, not make an external network call.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This still won't work without an active network / if example.com is unavailable.

@gregorynisbet-google gregorynisbet-google force-pushed the 2024-05-31-fix-race-condition branch 2 times, most recently from 52345f4 to eb1dc63 Compare June 11, 2024 16:01
@gregorynisbet-google
Copy link
Author

Friendly ping.

ct.root can be updated without holding the mutex on ct. Also, addEvent
can be called on a nil ct.root.

Fix both of these issues by moving the mutex acquisition logic so that
ct.root is only touched while the mutex is held.
@gregorynisbet-google gregorynisbet-google force-pushed the 2024-05-31-fix-race-condition branch from 4994805 to 647b6f3 Compare June 20, 2024 21:04
@MrAlias MrAlias removed the blocked: CLA Waiting on CLA to be signed before progress can be made label Jun 21, 2024

// TestNewClientParallelismWithoutSubspans tests running many Gets on a client simultaneously,
// which would trigger a race condition if root were not protected by a mutex.
func TestNewClientParallelismWithoutSubspans(t *testing.T) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test passes when run against main:

$ go test -race -count=20 .
ok  	go.opentelemetry.io/contrib/instrumentation/net/http/httptrace/otelhttptrace	5.443s

It does not seem to be testing the fix being applied here.


var wg sync.WaitGroup

for i := 1; i < 10000; i++ {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a lot of goroutines being spawned here. What's the reasoning behind this? It is making this test quite resource intensive.

Comment on lines +220 to +222
if ct.root == nil {
return
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like this is what is breaking

func TestEndBeforeStartCreatesSpan(t *testing.T) {
sr := tracetest.NewSpanRecorder()
tp := trace.NewTracerProvider(trace.WithSpanProcessor(sr))
otel.SetTracerProvider(tp)
ct := otelhttptrace.NewClientTrace(context.Background())
ct.DNSDone(httptrace.DNSDoneInfo{})
ct.DNSStart(httptrace.DNSStartInfo{Host: "example.com"})
name := "http.dns"
spans := getSpansFromRecorder(sr, name)
require.Len(t, spans, 1)
}

https://github.com/open-telemetry/opentelemetry-go-contrib/actions/runs/9620444944/job/26539940748?pr=5700

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants