---
title: "Updating resource groups without restarting Trino"
date: 2022-03-01T10:25:06-05:00
draft: false
---

[Resource groups](https://trino.io/docs/current/admin/resource-groups.html) are
an admission control feature in Trino. Typically, they are configured by storing
the resource group definitions in a JSON file and telling Trino where to read
this file from. Any updates to the JSON file are not reflected in Trino until
the cluster is restarted.

Recently, I was working on a cluster that had a requirement to be able to update
resource group definitions without restarting the cluster. Trino does have support
for a database-based resource group manager. This means Trino will load the
resource group definitions from a relational database instead of a JSON file. The
supported databases are MySQL, PostgreSQL, and Oracle (in versions prior to 369,
only MySQL is supported).

To configure Trino to use a database-based resource group manager, add an
`etc/resource-groups.properties` file. Here is an example of the file contents
when using a MySQL database for storing resource group definitions:

```
resource-groups.configuration-manager=db
resource-groups.config-db-url=jdbc:mysql://localhost:3306/resource_groups
resource-groups.config-db-user=trino
resource-groups.config-db-password=trino
```

The resource groups are configured through three tables:
`resource_groups_global_properties`, `resource_groups`, and `selectors`.
If any of the tables are not present when Trino starts, they will be created
automatically.

The rules in the `selectors` table are processed in descending order of the
values in the `priority` field, so the rule with the highest priority is
evaluated first.
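
To make the ordering concrete, here is a minimal sketch in Python. It is not Trino code: the rows are hypothetical, only three fields are modelled, and it assumes the first matching rule wins once the rules are sorted by `priority`.

```python
import re

# Hypothetical rows from the `selectors` table, reduced to three fields.
selectors = [
    {"priority": 1, "user_regex": ".*", "group": "global.adhoc"},
    {"priority": 10, "user_regex": "bob", "group": "admin"},
    {"priority": 5, "user_regex": "verifier", "group": "global.adhoc"},
]

def select_group(user, rows):
    """Return the group of the first matching rule, highest priority first."""
    for row in sorted(rows, key=lambda r: r["priority"], reverse=True):
        # The whole user name must match the regular expression.
        if re.fullmatch(row["user_regex"], user):
            return row["group"]
    return None  # no rule matched

print(select_group("bob", selectors))    # matches the priority-10 rule
print(select_group("alice", selectors))  # only the catch-all rule matches
```

Giving the catch-all rule the lowest priority keeps it from shadowing the more specific rules.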

The `resource_groups` table also contains an `environment` field which is
matched with the value contained in the `node.environment` property in
`node.properties`. This allows the resource group configuration for different
Trino clusters to be stored in the same database.
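
The effect of the `environment` field can be sketched with an in-memory SQLite database. The table below is a simplified stand-in with only three columns, not the schema Trino creates, and the rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified stand-in for the real table; only three columns are modelled.
conn.execute(
    "CREATE TABLE resource_groups (name TEXT, max_queued INTEGER, environment TEXT)"
)
conn.executemany(
    "INSERT INTO resource_groups VALUES (?, ?, ?)",
    [("global", 1000, "test"), ("admin", 100, "test"), ("global", 5000, "production")],
)

def groups_for(environment):
    """Return the (name, max_queued) rows visible to one environment."""
    cur = conn.execute(
        "SELECT name, max_queued FROM resource_groups "
        "WHERE environment = ? ORDER BY name",
        (environment,),
    )
    return cur.fetchall()

print(groups_for("test"))        # rows for a cluster with node.environment=test
print(groups_for("production"))  # a different cluster sees different limits
```

Two clusters can share a group name such as `global` and still have different limits, because each one only reads the rows for its own environment.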

The configuration is reloaded from the database every second, and the changes
are reflected automatically for incoming queries.

Once Trino is configured to use a database resource group manager, you will
see something like the following in the `server.log` file on startup:

```
2022-02-03T20:20:09.623-0500 INFO main io.trino.execution.resourcegroups.InternalResourceGroupManager -- Loading resource group configuration manager --
2022-02-03T20:20:09.818-0500 INFO main io.trino.plugin.resourcegroups.db.FlywayMigration Performing migrations...
2022-02-03T20:20:10.173-0500 INFO main org.flywaydb.core.internal.license.VersionPrinter Flyway Community Edition 7.15.0 by Redgate
2022-02-03T20:20:10.174-0500 INFO main org.flywaydb.core.internal.database.base.BaseDatabaseType Database: jdbc:mysql://localhost:3306/resource_groups (MySQL 8.0)
2022-02-03T20:20:10.263-0500 INFO main org.flywaydb.core.internal.command.DbValidate Successfully validated 4 migrations (execution time 00:00.045s)
2022-02-03T20:20:10.522-0500 INFO main org.flywaydb.core.internal.schemahistory.JdbcTableSchemaHistory Creating Schema History table `resource_groups`.`flyway_schema_history` ...
2022-02-03T20:20:10.669-0500 INFO main org.flywaydb.core.internal.command.DbMigrate Current version of schema `resource_groups`: << Empty Schema >>
2022-02-03T20:20:10.681-0500 INFO main org.flywaydb.core.internal.command.DbMigrate Migrating schema `resource_groups` to version "1 - add resource groups global properties"
2022-02-03T20:20:10.760-0500 INFO main org.flywaydb.core.internal.command.DbMigrate Migrating schema `resource_groups` to version "2 - add resource groups"
2022-02-03T20:20:10.847-0500 INFO main org.flywaydb.core.internal.command.DbMigrate Migrating schema `resource_groups` to version "3 - add selectors"
2022-02-03T20:20:10.931-0500 INFO main org.flywaydb.core.internal.command.DbMigrate Migrating schema `resource_groups` to version "4 - add exact match source selectors"
2022-02-03T20:20:11.093-0500 INFO main org.flywaydb.core.internal.command.DbMigrate Successfully applied 4 migrations to schema `resource_groups`, now at version v4 (execution time 00:00.435s)
2022-02-03T20:20:11.104-0500 INFO main io.trino.plugin.resourcegroups.db.FlywayMigration Performed 4 migrations
```

Notice the messages related to migrations. This means Trino created the
necessary tables in MySQL because they did not exist. Now the correct schema
exists but there are no resource groups configured.

If you try to run a query in Trino now, you will get an error:

```
trino> show catalogs;
Query 20220204_012217_00000_ajzng failed: No selectors are configured
trino>
```

We will use a tool I put together named [trino-db-resource-groups-cli](https://github.com/posulliv/trino-db-resource-groups-cli)
that can take a JSON file with resource groups defined and load them into a
database (see the README for instructions on how to install the tool). For
example, assume we have a JSON file with the following contents:

```
{
  "rootGroups": [
    {
      "name": "global",
      "softMemoryLimit": "95%",
      "hardConcurrencyLimit": 100,
      "maxQueued": 1000,
      "subGroups": [
        {
          "name": "adhoc",
          "softMemoryLimit": "50%",
          "hardConcurrencyLimit": 50,
          "maxQueued": 100,
          "hardCpuLimit": "10h",
          "subGroups": [
            {
              "name": "adhoc-${USER}",
              "softMemoryLimit": "30%",
              "hardConcurrencyLimit": 10,
              "maxQueued": 10
            }
          ]
        }
      ]
    },
    {
      "name": "admin",
      "softMemoryLimit": "100%",
      "hardConcurrencyLimit": 500,
      "maxQueued": 100
    }
  ],
  "selectors": [
    {
      "user": "bob",
      "group": "admin"
    },
    {
      "user": "verifier",
      "group": "global.adhoc"
    },
    {
      "source": "jdbc#(?<toolname>.*)",
      "clientTags": ["hipri", "urgent"],
      "group": "global.adhoc.adhoc-${USER}"
    },
    {
      "group": "global.adhoc.adhoc-${USER}"
    }
  ],
  "cpuQuotaPeriod": "1h"
}
```
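
The selectors refer to groups by their full dotted path (for example `global.adhoc`), which is built from the nesting of `subGroups`. As a rough illustration, the following Python sketch walks a trimmed copy of the JSON above and lists those paths. The helper name `qualified_names` is made up for this example and is not part of Trino or the CLI.

```python
import json

def qualified_names(groups, parent=""):
    """Yield the dotted path of every group in a rootGroups/subGroups tree."""
    for group in groups:
        name = f"{parent}.{group['name']}" if parent else group["name"]
        yield name
        # Recurse into nested groups, using this group's path as the prefix.
        yield from qualified_names(group.get("subGroups", []), name)

# A trimmed copy of the example file: limits and selectors are left out.
config = json.loads("""
{
  "rootGroups": [
    {"name": "global", "subGroups": [
      {"name": "adhoc", "subGroups": [{"name": "adhoc-${USER}"}]}
    ]},
    {"name": "admin"}
  ]
}
""")

print(list(qualified_names(config["rootGroups"])))
```

The `${USER}` placeholder is left untouched here, since it is only filled in when a query is assigned to a group.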

We run `trino-db-resource-groups-cli` to take this JSON and populate the database
tables for a specific environment. When running the tool, you will see output like:

```
$ trino-db-resource-groups-cli create_resource_groups --db-config=resource-groups.properties --resource-groups-json=example.json --environment=test
2022-02-03T20:23:34.925-0500 INFO main io.airlift.log.Logging Logging to stderr
2022-02-03T20:23:34.928-0500 INFO main Bootstrap Loading configuration
2022-02-03T20:23:35.161-0500 INFO main Bootstrap Initializing logging
2022-02-03T20:23:35.401-0500 INFO main Bootstrap PROPERTY DEFAULT RUNTIME DESCRIPTION
2022-02-03T20:23:35.401-0500 INFO main Bootstrap resource-groups.config-db-password [REDACTED] [REDACTED] Database password
2022-02-03T20:23:35.401-0500 INFO main Bootstrap resource-groups.config-db-url ---- jdbc:mysql://localhost:3306/resource_groups
2022-02-03T20:23:35.401-0500 INFO main Bootstrap resource-groups.config-db-user ---- trino Database user name
2022-02-03T20:23:35.401-0500 INFO main Bootstrap resource-groups.exact-match-selector-enabled false false
2022-02-03T20:23:35.401-0500 INFO main Bootstrap resource-groups.max-refresh-interval 1.00h 1.00h Time period for which the cluster will continue to accept queries after refresh failures cause configuration to become stale
2022-02-03T20:23:35.605-0500 INFO main io.airlift.bootstrap.LifeCycleManager Life cycle starting...
2022-02-03T20:23:35.605-0500 INFO main io.airlift.bootstrap.LifeCycleManager Life cycle started
2022-02-03T20:23:35.606-0500 INFO main io.trino.resourcegroups.db.CreateResourceGroupsCommand Environment to update resource groups for: test
2022-02-03T20:23:35.606-0500 INFO main io.trino.resourcegroups.db.CreateResourceGroupsCommand Input JSON file: example.json
2022-02-03T20:23:36.748-0500 INFO main io.trino.resourcegroups.db.CreateResourceGroupsCommand Resource groups created successfully
2022-02-03T20:23:36.749-0500 INFO main io.airlift.bootstrap.LifeCycleManager Life cycle stopping...
2022-02-03T20:23:36.749-0500 INFO main io.airlift.bootstrap.LifeCycleManager Life cycle stopped
$
```

The tool validates the resource groups defined in the JSON file before loading them
into the database tables. If the input JSON contains any invalid configuration,
the CLI reports an error and leaves the database tables unchanged.
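
As an illustration of the kind of check that can run before anything is written to the database, here is a hypothetical Python sketch. It is not the CLI's actual validation logic: it only verifies that every selector points at a group that is defined, and the function names are invented for this example.

```python
def group_names(groups, parent=""):
    """Collect the dotted path of every group in the tree into a set."""
    names = set()
    for group in groups:
        name = f"{parent}.{group['name']}" if parent else group["name"]
        names.add(name)
        names |= group_names(group.get("subGroups", []), name)
    return names

def validate(config):
    """Return a list of error strings; an empty list means the check passed."""
    known = group_names(config.get("rootGroups", []))
    errors = []
    for i, selector in enumerate(config.get("selectors", [])):
        group = selector.get("group")
        if group not in known:
            errors.append(f"selector {i} refers to unknown group {group!r}")
    return errors

good = {"rootGroups": [{"name": "admin"}],
        "selectors": [{"user": "bob", "group": "admin"}]}
bad = {"rootGroups": [{"name": "admin"}],
       "selectors": [{"group": "adhoc"}]}

print(validate(good))  # no errors, safe to load
print(validate(bad))   # one error, nothing should be written
```

Collecting all errors before touching the database is what makes it possible to reject a bad file without leaving the tables half-updated.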

Now if we look at the Trino `server.log` file, we see the resource groups have
been loaded from the database automatically:

```
2022-02-03T20:23:37.198-0500 INFO DbResourceGroupConfigurationManager io.trino.plugin.resourcegroups.db.DbResourceGroupConfigurationManager Resource group spec global changed to ResourceGroupSpec{name=global, softMemoryLimit=Optional.empty, maxQueued=1000, softConcurrencyLimit=Optional.empty, hardConcurrencyLimit=100, schedulingPolicy=Optional.empty, schedulingWeight=Optional.empty, jmxExport=Optional[false], softCpuLimit=Optional.empty, hardCpuLimit=Optional.empty}
2022-02-03T20:23:37.199-0500 INFO DbResourceGroupConfigurationManager io.trino.plugin.resourcegroups.db.DbResourceGroupConfigurationManager Resource group spec admin changed to ResourceGroupSpec{name=admin, softMemoryLimit=Optional.empty, maxQueued=100, softConcurrencyLimit=Optional.empty, hardConcurrencyLimit=500, schedulingPolicy=Optional.empty, schedulingWeight=Optional.empty, jmxExport=Optional[false], softCpuLimit=Optional.empty, hardCpuLimit=Optional.empty}
2022-02-03T20:23:37.199-0500 INFO DbResourceGroupConfigurationManager io.trino.plugin.resourcegroups.db.DbResourceGroupConfigurationManager Resource group spec global.adhoc.adhoc-${USER} changed to ResourceGroupSpec{name=adhoc-${USER}, softMemoryLimit=Optional.empty, maxQueued=10, softConcurrencyLimit=Optional.empty, hardConcurrencyLimit=10, schedulingPolicy=Optional.empty, schedulingWeight=Optional.empty, jmxExport=Optional[false], softCpuLimit=Optional.empty, hardCpuLimit=Optional.empty}
2022-02-03T20:23:37.199-0500 INFO DbResourceGroupConfigurationManager io.trino.plugin.resourcegroups.db.DbResourceGroupConfigurationManager Resource group spec global.adhoc changed to ResourceGroupSpec{name=adhoc, softMemoryLimit=Optional.empty, maxQueued=100, softConcurrencyLimit=Optional.empty, hardConcurrencyLimit=50, schedulingPolicy=Optional.empty, schedulingWeight=Optional.empty, jmxExport=Optional[false], softCpuLimit=Optional.empty, hardCpuLimit=Optional[10.00h]}
```

If we try to run a query now, we will see it is successful:

```
trino> show catalogs;
Catalog
--------------
blackhole
druid
Query 20220204_012425_00001_ajzng, FINISHED, 1 node
Splits: 11 total, 11 done (100.00%)
0.94 [0 rows, 0B] [0 rows/s, 0B/s]
trino>
```

We can verify in the web UI that the query was assigned to the correct resource group:

![Trino web UI](/img/trino_dynamic_resource_groups_post.png)

That's all I wanted to cover in this post. If you have any questions or run
into issues when trying this, you can find me on Trino's [Slack](https://trino.io/slack.html).