create propagate span event and publish it in to_digest #4193

Open: wants to merge 12 commits into base: master


ZStriker19 (Contributor) commented Dec 4, 2024

What does this PR do?
Fixes the circular import error caused by adding the tracer to the trace_operation object. Instead of holding a reference to the tracer, the trace_operation now publishes an event (events.propagate) that the tracer subscribes to.
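
The decoupling can be sketched as a small publish/subscribe setup. This is a minimal sketch, not dd-trace-rb's code: PropagateEvent, SketchEvents, SketchTraceOperation and SketchTracer are hypothetical names, and the sampling decision is a stand-in. The point it shows is the dependency direction: the trace operation only publishes, and the tracer subscribes and decides whether to sample, so the trace operation never needs to reference (or require) the tracer.

```ruby
# Hypothetical sketch of event-based decoupling; class names are illustrative,
# not dd-trace-rb's real classes.

# A minimal event: stores subscriber blocks and calls each one on publish.
class PropagateEvent
  def initialize
    @subscribers = []
  end

  def subscribe(&block)
    @subscribers << block
  end

  def publish(*args)
    @subscribers.each { |subscriber| subscriber.call(*args) }
  end
end

SketchEvents = Struct.new(:propagate)

# The trace operation never references the tracer; it only publishes.
class SketchTraceOperation
  attr_reader :events
  attr_accessor :sampling_priority

  def initialize
    @events = SketchEvents.new(PropagateEvent.new)
    @active_span = nil
    @sampling_priority = nil
  end

  def to_digest
    # Give subscribers (such as the tracer) a chance to sample first.
    events.propagate.publish(@active_span, self)
    { sampling_priority: @sampling_priority }
  end
end

# The tracer depends on the trace operation, not the other way around.
class SketchTracer
  def build_trace
    trace = SketchTraceOperation.new
    trace.events.propagate.subscribe do |_span, trace_op|
      sample_trace(trace_op) unless trace_op.sampling_priority
    end
    trace
  end

  def sample_trace(trace_op)
    trace_op.sampling_priority = 1 # stand-in for a real sampling decision
  end
end

trace = SketchTracer.new.build_trace
p trace.to_digest[:sampling_priority] # prints 1
```

A trace operation built without a tracer simply has no subscribers, so its to_digest builds a digest without sampling.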

Changes the order of sampling so that the trace sampler runs before the span sampler.
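
One reason the order can matter is sketched below. It assumes, for illustration only, that the span sampler consults the trace-level decision and picks out individual spans only from traces the trace sampler dropped. The KEEP/DROP constants, the error-based trace rule, the force_keep flag and the trace_sampler/span_sampler/sample methods are all hypothetical, not dd-trace-rb's API.

```ruby
# Hypothetical sketch; constants, rules and method names are illustrative
# assumptions, not dd-trace-rb's API.
KEEP = 1
DROP = 0

# Trace sampler: one decision for the whole trace (assumed rule: keep on error).
def trace_sampler(trace)
  trace[:priority] = trace[:spans].any? { |span| span[:error] } ? KEEP : DROP
end

# Span sampler (assumed behavior): only picks individual spans out of
# traces the trace sampler dropped, so it needs that decision first.
def span_sampler(trace)
  raise ArgumentError, "trace sampler has not run yet" if trace[:priority].nil?
  return [] if trace[:priority] == KEEP # the whole trace is kept anyway

  trace[:spans].select { |span| span[:force_keep] }
end

def sample(trace)
  trace_sampler(trace) # 1. trace-level decision
  span_sampler(trace)  # 2. span-level decision, which depends on step 1
end

trace = {
  priority: nil,
  spans: [
    { name: "rack.request", error: false, force_keep: false },
    { name: "db.query", error: false, force_keep: true }
  ]
}
p sample(trace).map { |span| span[:name] } # prints ["db.query"]
```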

To provide a method that creates a TraceDigest without sampling (used by the OpenTelemetry shim), we created to_digest_without_propagate. We also aliased to_digest, which does sample, as propagate!, which we will steer users toward going forward. In the future, the plan is to remove to_digest_without_propagate and remove sampling from to_digest, leaving just propagate!, which samples and returns a TraceDigest, and to_digest, which only returns a TraceDigest.
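
The current method layout can be sketched as follows. This is a minimal sketch under stated assumptions: DigestSketchTraceOperation is a hypothetical class, the constructor block stands in for the tracer's sampling decision, and the digest is a plain hash rather than a real TraceDigest. Only the three method names come from the description above.

```ruby
# Hypothetical sketch of the method layout described above; not dd-trace-rb's code.
class DigestSketchTraceOperation
  attr_reader :sample_calls, :sampling_priority

  # The block stands in for the tracer's sampling decision.
  def initialize(&sampler)
    @sampler = sampler
    @sampling_priority = nil
    @sample_calls = 0
  end

  # Current to_digest: sample if no decision exists yet, then build the digest.
  def to_digest
    unless @sampling_priority
      @sample_calls += 1
      @sampling_priority = @sampler.call
    end
    to_digest_without_propagate
  end

  # Builds the digest without sampling (the OpenTelemetry shim's use case).
  def to_digest_without_propagate
    { sampling_priority: @sampling_priority }
  end

  # The name users are steered toward; currently identical to to_digest.
  alias_method :propagate!, :to_digest
end

op = DigestSketchTraceOperation.new { 1 }
p op.to_digest_without_propagate[:sampling_priority] # prints nil
digest = op.propagate!
p digest[:sampling_priority]                         # prints 1
p op.to_digest[:sampling_priority]                   # prints 1 (no second sampling)
p op.sample_calls                                    # prints 1
```

Under the plan above, sampling would later move out of to_digest entirely, leaving propagate! as the only one of these methods that samples.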

Motivation:

Change log entry

Yes. Fix circular import in TraceOperation.

Additional Notes:

How to test the change?


github-actions bot commented Dec 4, 2024

Thank you for updating Change log entry section 👏

Visited at: 2024-12-05 12:04:27 UTC

datadog-datadog-prod-us1 bot (Contributor) commented Dec 4, 2024

Datadog Report

Branch report: zachg/fix_circ_import_for_lazy_sampling
Commit report: eba85d9
Test service: dd-trace-rb

✅ 0 Failed, 22114 Passed, 1475 Skipped, 5m 41.75s Total Time

pr-commenter bot commented Dec 4, 2024

Benchmarks

Benchmark execution time: 2024-12-16 18:57:27

Comparing candidate commit 5a86486 in PR branch zachg/fix_circ_import_for_lazy_sampling with baseline commit 6f2057b in branch master.

Found 0 performance improvements and 2 performance regressions! Performance is the same for 29 metrics, 2 unstable metrics.

scenario:profiler - sample timeline=false

  • 🟥 throughput [-0.566op/s; -0.551op/s] or [-8.603%; -8.376%]

scenario:tracing - trace.to_digest

  • 🟥 throughput [-19038.876op/s; -18317.048op/s] or [-11.529%; -11.092%]

@ZStriker19 ZStriker19 marked this pull request as ready for review December 4, 2024 22:48
@ZStriker19 ZStriker19 requested a review from a team as a code owner December 4, 2024 22:48
@ZStriker19 ZStriker19 requested a review from delner December 4, 2024 22:50
codecov-commenter commented Dec 4, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 97.75%. Comparing base (11b9ae1) to head (eba85d9).
Report is 1 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #4193      +/-   ##
==========================================
+ Coverage   97.74%   97.75%   +0.01%     
==========================================
  Files        1355     1355              
  Lines       82333    82362      +29     
  Branches     4226     4228       +2     
==========================================
+ Hits        80477    80515      +38     
+ Misses       1856     1847       -9     


delner (Contributor) left a comment


This is very much in the style of what we discussed; off to a great start.

There are some questions about the design (parameters, naming, idempotency, etc.) that I'd like to answer.

I'd also like to see many more tests around this new behavior that affirm good answers to all the questions above, e.g.:

  • Unit tests, including but not limited to...
    • Does the trace pass correct parameters to the propagate event block?
    • Does triggering the propagation event with different sampling priority values cause sample to run properly?
  • Feature tests, including but not limited to...
    • When HTTP headers are injected, log correlation runs, or a trace is manually continued across execution contexts with continue_from, does it trigger the propagation event properly?
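
The two unit-test questions can be sketched as plain-Ruby checks. dd-trace-rb's own suite uses RSpec; these are written without a test framework so they stay self-contained. RecordingEvent, TestableTraceOperation and attach_sampler are hypothetical stand-ins for the real trace operation and tracer subscriber. The priority values iterated over (-1, 0, 1, 2) are Datadog's reject/keep priorities; in Ruby every one of them, including 0, is truthy, so only nil should trigger sampling under an `unless sampling_priority` guard.

```ruby
# Hypothetical stand-ins; a real spec would target Datadog::Tracing::TraceOperation.
class RecordingEvent
  def initialize
    @subscribers = []
  end

  def subscribe(&block)
    @subscribers << block
  end

  def publish(*args)
    @subscribers.each { |subscriber| subscriber.call(*args) }
  end
end

class TestableTraceOperation
  attr_reader :propagate
  attr_accessor :active_span, :sampling_priority

  def initialize(active_span:, sampling_priority: nil)
    @propagate = RecordingEvent.new
    @active_span = active_span
    @sampling_priority = sampling_priority
  end

  def to_digest
    propagate.publish(@active_span, self)
    { span_id: @active_span && @active_span[:id], sampling_priority: @sampling_priority }
  end
end

# Wires a sampler as the PR describes: sample only if no priority is set yet.
def attach_sampler(trace_op, calls)
  trace_op.propagate.subscribe do |_span, op|
    next if op.sampling_priority
    calls << op
    op.sampling_priority = 1
  end
end

# Q1: does the trace pass the right arguments to the propagate subscribers?
span = { id: 42 }
trace_op = TestableTraceOperation.new(active_span: span)
received = nil
trace_op.propagate.subscribe { |s, op| received = [s, op] }
trace_op.to_digest
raise "wrong span passed" unless received[0].equal?(span)
raise "wrong trace passed" unless received[1].equal?(trace_op)

# Q2: does sampling run only when no priority has been set yet?
[nil, -1, 0, 1, 2].each do |priority|
  op = TestableTraceOperation.new(active_span: span, sampling_priority: priority)
  calls = []
  attach_sampler(op, calls)
  op.to_digest
  expected_calls = priority.nil? ? 1 : 0
  raise "priority #{priority.inspect}: sampled #{calls.size} times" unless calls.size == expected_calls
end
puts "ok"
```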

lib/datadog/tracing/trace_operation.rb (outdated)
lib/datadog/tracing/tracer.rb (outdated)
lib/datadog/tracing/tracer.rb (outdated)
@@ -311,7 +311,7 @@ def to_digest
   span_id = @active_span && @active_span.id
   span_id ||= @parent_span_id unless finished?
   # sample the trace_operation with the tracer
-  @tracer&.sample_trace(self) unless sampling_priority
+  events.propagate.publish(@active_span, self)
Contributor

It dawned on me that to_digest may not be used only for propagation; it may also be used for log correlation. My concern is that if we print a log message with correlation enabled, say at the beginning of a web request, the sampling decision will be made very early (in the worst case, based only on the root span).

We should check that this isn't the case.
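
The concern can be illustrated with a sketch that assumes a sampler whose decision depends on the spans seen so far. LazyTrace and its keep-on-error rule are hypothetical stand-ins, not dd-trace-rb's sampler. Because sampling only runs while the priority is unset, the first to_digest call (for example, one made for a log line at the start of a request) locks in the decision before later spans exist.

```ruby
# Hypothetical sketch; LazyTrace and its keep-on-error rule are illustrative only.
class LazyTrace
  attr_reader :spans, :sampling_priority

  def initialize
    @spans = []
    @sampling_priority = nil
  end

  def add_span(name, error: false)
    @spans << { name: name, error: error }
  end

  # Samples only while no priority is set. Note that 0 is truthy in Ruby,
  # so a "drop" decision (0) is also never revisited.
  def to_digest
    @sampling_priority ||= @spans.any? { |span| span[:error] } ? 1 : 0
    { sampling_priority: @sampling_priority }
  end
end

early = LazyTrace.new
early.add_span("rack.request")
early.to_digest                          # e.g. a correlated log line at request start
early.add_span("db.query", error: true)  # the interesting span arrives later
p early.to_digest[:sampling_priority]    # prints 0: decided before the error span existed

late = LazyTrace.new
late.add_span("rack.request")
late.add_span("db.query", error: true)
p late.to_digest[:sampling_priority]     # prints 1
```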

Contributor Author

So I couldn't find it being used in log correlation; however, we do use to_digest for OpenTelemetry here: https://github.com/DataDog/dd-trace-rb/blob/master/lib/datadog/opentelemetry/trace.rb#L20-L33. If I'm understanding the code correctly, that could be an issue, because it effectively runs to_digest every time we create an OpenTelemetry context. Let me know if I'm off here.

ZStriker19 (Contributor Author)

  • Does triggering the propagation event with different sampling priority values cause sample to run properly?

Not sure I understand what you mean by this one. The propagation event is triggered by running to_digest; that fires the subscriber in tracer.rb, which runs sampling only if a sampling priority is not already set.

@ZStriker19 ZStriker19 requested review from a team as code owners December 18, 2024 19:21
@github-actions github-actions bot added integrations Involves tracing integrations otel OpenTelemetry-related changes labels Dec 18, 2024
@ZStriker19 ZStriker19 force-pushed the zachg/fix_circ_import_for_lazy_sampling branch from 7986f9a to 7396654 December 18, 2024 19:24
@ZStriker19 ZStriker19 force-pushed the zachg/fix_circ_import_for_lazy_sampling branch from 962be02 to eba85d9 December 18, 2024 20:22
Labels
integrations Involves tracing integrations otel OpenTelemetry-related changes tracing

5 participants