Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

thanos-query-frontend: Enable Thanos Query Stats Propagation & cache response headers #10

Open
wants to merge 1 commit into
base: monzo-master-v0.35.0-rc-0.65
Choose a base branch
from

Conversation

milesbxf
Copy link

@milesbxf milesbxf commented Nov 15, 2024

Thanos Query gives us some really nice detailed internal stats - but thanos-query-frontend annoyingly blats them by trying to parse the response as a Prometheus response, which has different fields and a different structure.

In addition, thanos-query-frontend doesn't even pass the &stats=all parameter through if you try to request it.

This change fixes this by propagating the stats request parameter to the downstream, and decoding/encoding the Thanos Query stats structure properly.

As an extra, we also set a response header if we get a cache hit so that upstreams can use this.

@milesbxf milesbxf force-pushed the return-stats-response branch 2 times, most recently from a5a0575 to 11d09c7 Compare November 15, 2024 15:47
@milesbxf milesbxf changed the title Return stats response thanos-query-frontend: Enable Thanos Query Stats Propagation & cache response headers Nov 15, 2024
@milesbxf milesbxf force-pushed the return-stats-response branch 9 times, most recently from acee5c0 to de9c1b0 Compare November 18, 2024 09:12
Thanos Query gives us some really nice detailed internal stats - but thanos-query-frontend annoyingly blats them by trying to parse the response as a Prometheus response, which has different fields and a different structure.

In addition, thanos-query-frontend doesn't even pass the &stats=all parameter through if you try to request it.

This change fixes this by propagating the stats request parameter to the downstream, and decoding/encoding the Thanos Query stats structure properly.

As an extra, we also set a response header if we get a cache hit so that upstreams can use this.

Signed-off-by: milesbryant <[email protected]>
@milesbxf milesbxf force-pushed the return-stats-response branch from de9c1b0 to 64eb68e Compare November 18, 2024 09:18
@@ -145,8 +145,6 @@ func registerQueryFrontend(app *extkingpin.App) {

cmd.Flag("query-frontend.log-queries-longer-than", "Log queries that are slower than the specified duration. "+
"Set to 0 to disable. Set to < 0 to enable on all queries.").Default("0").DurationVar(&cfg.CortexHandlerConfig.LogQueriesLongerThan)
cmd.Flag("query-frontend.query-stats-enabled", "True to enable query statistics tracking. "+
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've removed this now that we return query stats directly in the response

@@ -84,12 +84,22 @@ message Matrix {
}

message PrometheusResponseStats {
PrometheusResponseSamplesStats samples = 1 [(gogoproto.jsontag) = "samples"];
}
message Timings {
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the difference between Thanos stats and Prometheus stats responses

@milesbxf milesbxf marked this pull request as ready for review November 18, 2024 09:19
for _, existing := range headers {
if existing.Name == newHeader.Name {
// if headers match, overwrite with the new header
existing = newHeader
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be *existing = *newHeader ?

Also, below we sort the responses by response time and then the first non-null explanation is taken rather than the last one. Is consistency between these important?

func (resp *PrometheusInstantQueryResponse) AddHeader(key, value string) {
resp.Headers = append(resp.Headers, &PrometheusResponseHeader{Name: key, Values: []string{value}})
}

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we'll end up with duplicate headers if the same header key is added twice - should this check for the header already existing and either replace it or append to its values?

@@ -217,9 +227,25 @@ func (prometheusCodec) MergeResponse(_ Request, responses ...Response) (Response
// we need to pass on all the headers for results cache gen numbers.
var resultsCacheGenNumberHeaderValues []string

var headers []*PrometheusResponseHeader
// merge headers
for _, res := range responses {
promResponses = append(promResponses, res.(*PrometheusResponse))
resultsCacheGenNumberHeaderValues = append(resultsCacheGenNumberHeaderValues, getHeaderValuesWithName(res, ResultsCacheGenNumberHeaderName)...)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I doubt this matters given what was happening before, but this could be out of sync with the actual headers we return in the response due to the merging below.

// Copy Prometheus headers into http response
for _, h := range a.Headers {
for _, v := range h.Values {
if strings.HasPrefix(h.Name, "X-") {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are the headers always upper case?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants