Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add monitorType JMX #472

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

alaturqua
Copy link
Contributor

Description

This pull request introduces a new monitorType for JMX-based monitoring in Trino Gateway. It includes documentation and configuration details for retrieving cluster information using specific JMX metrics.

Details

  • Component: JMX Monitoring in Trino Gateway

  • New Feature: Added support for monitorType: JMX to enable cluster monitoring via JMX metrics.

  • Metrics Used:

    • trino.execution:name=QueryManager
    • trino.metadata:name=DiscoveryNodeManager
  • Configuration Instructions:

    • Users can configure a username and password using backendState.
    • Credentials must be valid across all clusters with read rights on system_information.
  • Example Configuration:

    backendState:
      username: "user"
      password: "password"
    clusterStatsConfiguration:
      monitorType: JMX

Additional context and related issues

Release notes

( ) This is not user-visible or is docs only, and no release notes are required.
(x) Release notes are required. Please propose a release note for me.
( ) Release notes are required, with the following suggested text:

* 

@cla-bot cla-bot bot added the cla-signed label Sep 17, 2024
@alaturqua
Copy link
Contributor Author

I am not very experienced with Java and Testing in Java. Please feel free to add tests.

@alaturqua alaturqua requested a review from mosabua September 17, 2024 23:49
@alaturqua alaturqua force-pushed the ii-add-monitortype-jmx branch from 512ce06 to 82cf53d Compare September 17, 2024 23:59
@alaturqua alaturqua changed the title Add monitorType JMX via trino's /v1/jmx/mbean endpoint Add monitorType JMX Sep 17, 2024
@alaturqua alaturqua force-pushed the ii-add-monitortype-jmx branch from 82cf53d to 40cb273 Compare September 18, 2024 00:02
@ebyhr ebyhr requested a review from willmostly September 24, 2024 05:00
@alaturqua alaturqua force-pushed the ii-add-monitortype-jmx branch from 40cb273 to 5389a0d Compare September 25, 2024 07:27
@alaturqua alaturqua requested a review from ebyhr September 25, 2024 07:29
Copy link
Member

@ebyhr ebyhr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just skimmed.

@alaturqua
Copy link
Contributor Author

Thanks for the reviews. I was on vacation and didn't have time to implement the requested changes yet. I will take care of them in the next days.

I might need some support to activate the /v1/jmx/mbean endpoint in the test environment for testing.

@willmostly
Copy link
Contributor

@alaturqua please ping me when this is ready for another look

@alaturqua alaturqua force-pushed the ii-add-monitortype-jmx branch 5 times, most recently from fb4ac57 to b333076 Compare November 17, 2024 21:46
@alaturqua alaturqua closed this Nov 17, 2024
@alaturqua alaturqua force-pushed the ii-add-monitortype-jmx branch from b333076 to a11e5a4 Compare November 17, 2024 21:49
@alaturqua
Copy link
Contributor Author

I had to sync my fork to be able to change my PR. But it got automatically closed, when I did so.
I am reopening.

@alaturqua alaturqua reopened this Nov 17, 2024
@alaturqua alaturqua force-pushed the ii-add-monitortype-jmx branch from 6aebdfb to 995229a Compare November 17, 2024 22:23
@alaturqua alaturqua force-pushed the ii-add-monitortype-jmx branch 3 times, most recently from a79acde to 7671694 Compare November 20, 2024 09:07
@alaturqua alaturqua force-pushed the ii-add-monitortype-jmx branch 2 times, most recently from 4fb7de5 to 847a945 Compare November 20, 2024 18:10
@alaturqua alaturqua requested review from andythsu and ebyhr November 20, 2024 18:21
@alaturqua
Copy link
Contributor Author

I addressed all the points and made the changes. Tests are passing as well. Please review again.

@alaturqua
Copy link
Contributor Author

@willmostly added documentation as well.

@alaturqua alaturqua force-pushed the ii-add-monitortype-jmx branch 4 times, most recently from 145f0ca to 39585df Compare December 9, 2024 07:18
@alaturqua alaturqua force-pushed the ii-add-monitortype-jmx branch from 39585df to 0abf92c Compare December 11, 2024 09:04
This is using `v1/jmx/mbean` endpoint on trino clusters.
To enable this, [JMX monitoring](https://trino.io/docs/current/admin/jmx.html) must be activated on all Trino clusters.

Ensure that a username and password are configured by adding the `backendState` section to your configuration. The credentials must be consistent across all backend clusters and have `read` rights on the `system_information`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and have read rights on the system_information.
Is there anywhere in the Trino documentation we could link to that gives an example of how to set this up? Or we could provide a snippet of FBAC configuration

Comment on lines +132 to +148
Request preparedRequest = prepareGet()
.setUri(uriBuilderFrom(URI.create(jmxUrl))
.appendPath(JMX_PATH)
.appendPath(mbeanName)
.build()
).addHeader("X-Trino-User", username)
.build();

boolean isHttps = preparedRequest.getUri().getScheme().equalsIgnoreCase("https");

if (isHttps) {
HttpRequestFilter filter = new BasicAuthRequestFilter(username, password);
request = filter.filterRequest(preparedRequest);
}
else {
request = preparedRequest;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a method to create the basic auth header. I'm not sure if this is the intended use case for the HttpRequestFilter classes. Then you can just

        Request preparedRequest = prepareGet()
                .setUri(uriBuilderFrom(URI.create(jmxUrl))
                        .appendPath(JMX_PATH)
                        .appendPath(mbeanName)
                        .build())
               .addHeader("X-Trino-User", username);
        if (isHttps) {
           preparedRequest.addHeader(createBasicAuthHeader(username, password));
        }
        preparedRequest.build();

and use preparedRequest directly.

JmxStatProcessor processor, ClusterStats.Builder clusterStats)
{
queryJmx(backend, mbeanName)
.ifPresent(response -> processor.process(response, clusterStats));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

queryJmx may return Optional.empty() in the case of an error response. In this case ifPresent will not trigger, and the backend stats will not be updated, so the backend will still appear healthy. You can avoid this by doing something like

Suggested change
.ifPresent(response -> processor.process(response, clusterStats));
.ifPresentOrElse((response -> processor.process(response, clusterStats), () -> clusterStats.trinoStatus(TrinoStatus.UNHEALTHY));

implements ClusterStatsMonitor
{
private static final Logger log = Logger.get(ClusterStatsJmxMonitor.class);
private static final JsonResponseHandler<JsonNode> JMX_JSON_RESPONSE_HANDLER =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

create a pair of record classes for the JMX response and the attributes items, and use this instead of the generic JsonNode here and in processDiscoveryNodeManagerStats and processQueryManagerStats

@@ -61,6 +62,12 @@ void testJdbcMonitor()
testClusterStatsMonitor(backendStateConfiguration -> new ClusterStatsJdbcMonitor(backendStateConfiguration, new MonitorConfiguration()));
}

@Test
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add tests for the failure cases, 400 and 500 response codes with non json and invalid json response bodies

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

4 participants