# Database Load Balancing **(PREMIUM ONLY)**

> [Introduced][ee-1283] in [GitLab Premium][eep] 9.0.

Distribute read-only queries among multiple database servers.

## Overview

Database load balancing improves the distribution of database workloads across
multiple computing resources. Load balancing aims to optimize resource use,
maximize throughput, minimize response time, and avoid overload of any single
resource. Using multiple components with load balancing instead of a single
component may increase reliability and availability through redundancy.
[_Wikipedia article_][wikipedia]

When database load balancing is enabled in GitLab, the load is balanced using
a simple round-robin algorithm, without any external dependencies such as Redis.

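
As a rough illustration of what round-robin selection means here, the sketch
below cycles through a fixed list of hosts. It is not GitLab's implementation;
the `RoundRobinHosts` class and its `next_host` method are hypothetical names
chosen for this example.

```ruby
# Hypothetical round-robin picker (illustrative names, not GitLab's code).
class RoundRobinHosts
  def initialize(hosts)
    raise ArgumentError, 'at least one host is required' if hosts.empty?

    @hosts = hosts.dup.freeze
    @index = 0
  end

  # Returns the current host and advances the cursor, wrapping at the end.
  def next_host
    host = @hosts[@index]
    @index = (@index + 1) % @hosts.length
    host
  end
end

picker = RoundRobinHosts.new(%w[db4 db5 db6])
picks = Array.new(4) { picker.next_host }
# picks => ["db4", "db5", "db6", "db4"]
```

Because the cursor only depends on local state, no shared store is needed to
spread reads across the hosts.
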
Load balancing is not enabled for Sidekiq as this would lead to consistency
problems, and Sidekiq mostly performs writes anyway.

In the following image, you can see the load is balanced rather evenly among
all the secondaries (`db4`, `db5`, `db6`). Because `SELECT` queries are not
sent to the primary (unless necessary), the primary (`db3`) hardly has any load.

![DB load balancing graph](img/db_load_balancing_postgres_stats.png)

## Requirements

Load balancing requires PostgreSQL 9.2 or newer.
[**MySQL is not supported**][db-req]. You also need at least one secondary in
[hot standby][hot-standby] mode.

Load balancing also requires that the configured hosts **always** point to the
primary, even after a database failover. Furthermore, the additional hosts to
balance load among must **always** point to secondary databases. This means that
you should put a load balancer in front of every database, and have GitLab connect
to those load balancers.

For example, say you have a primary (`db1.gitlab.com`) and two secondaries,
`db2.gitlab.com` and `db3.gitlab.com`. For this setup you will need three
load balancers, one for every host. For example:

- `primary.gitlab.com` forwards to `db1.gitlab.com`
- `secondary1.gitlab.com` forwards to `db2.gitlab.com`
- `secondary2.gitlab.com` forwards to `db3.gitlab.com`

Now let's say that a failover happens and `db2.gitlab.com` becomes the new
primary. This means forwarding should now happen as follows:

- `primary.gitlab.com` forwards to `db2.gitlab.com`
- `secondary1.gitlab.com` forwards to `db1.gitlab.com`
- `secondary2.gitlab.com` forwards to `db3.gitlab.com`

GitLab does not update this forwarding for you, so you will need to do so yourself.

Finally, load balancing requires that GitLab can connect to all hosts using the
same credentials and port as configured in the
[Enabling load balancing](#enabling-load-balancing) section. Using
different ports or credentials for different hosts is not supported.

## Use cases

- For GitLab instances with thousands of users and high traffic, you can use
  database load balancing to reduce the load on the primary database and
  increase responsiveness, resulting in faster page loads in GitLab.

## Enabling load balancing

For the environment in which you want to use load balancing, you'll need to add
the following. This will balance the load between `host1.example.com` and
`host2.example.com`.

**In Omnibus installations:**

1. Edit `/etc/gitlab/gitlab.rb` and add the following line:

   ```ruby
   gitlab_rails['db_load_balancing'] = { 'hosts' => ['host1.example.com', 'host2.example.com'] }
   ```

1. Save the file and [reconfigure GitLab][] for the changes to take effect.

---

**In installations from source:**

1. Edit `/home/git/gitlab/config/database.yml` and add or amend the following lines:

   ```yaml
   production:
     username: gitlab
     database: gitlab
     encoding: unicode
     load_balancing:
       hosts:
         - host1.example.com
         - host2.example.com
   ```

1. Save the file and [restart GitLab][] for the changes to take effect.

## Service Discovery

> [Introduced][ee-5883] in [GitLab Premium][eep] 11.0.

Service discovery allows GitLab to automatically retrieve a list of secondary
databases to use, instead of having to manually specify these in the
`database.yml` configuration file. Service discovery works by periodically
checking a DNS A record, using the IPs returned by this record as the addresses
for the secondaries. For service discovery to work, all you need is a DNS server
and an A record containing the IP addresses of your secondaries.

To use service discovery you need to change your `database.yml` configuration
file so it looks like the following:

```yaml
production:
  username: gitlab
  database: gitlab
  encoding: unicode
  load_balancing:
    discover:
      nameserver: localhost
      record: secondary.postgresql.service.consul
      record_type: A
      port: 8600
      interval: 60
      disconnect_timeout: 120
```

Here the `discover:` section specifies the configuration details to use for
service discovery.

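
To make the A record lookup concrete, here is a minimal sketch using Ruby's
standard `resolv` library. The `secondary_addresses` helper is a hypothetical
name for this example and is not how GitLab performs the lookup internally; it
only shows how an A record translates into a list of IP addresses.

```ruby
require 'resolv'

# Hypothetical helper: returns the IP addresses behind a DNS A record.
# `resolver` is any object that responds to #getresources the way
# Resolv::DNS does, which keeps the helper easy to exercise without a network.
def secondary_addresses(resolver, record)
  resolver
    .getresources(record, Resolv::DNS::Resource::IN::A)
    .map { |resource| resource.address.to_s }
end

# Against a real nameserver, with values mirroring the example configuration
# (the loopback address stands in for `localhost`):
#   resolver = Resolv::DNS.new(nameserver_port: [['127.0.0.1', 8600]])
#   secondary_addresses(resolver, 'secondary.postgresql.service.consul')
```
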
### Configuration

The following options can be set:

| Option               | Description                                                                                       | Default   |
|----------------------|---------------------------------------------------------------------------------------------------|-----------|
| `nameserver`         | The nameserver to use for looking up the DNS record.                                              | localhost |
| `record`             | The record to look up. This option is required for service discovery to work.                     |           |
| `record_type`        | Optional record type to look up. This can be either A or SRV (since GitLab 12.3).                 | A         |
| `port`               | The port of the nameserver.                                                                       | 8600      |
| `interval`           | The minimum time in seconds between checking the DNS record.                                      | 60        |
| `disconnect_timeout` | The time in seconds after which an old connection is closed, after the list of hosts was updated. | 120       |
| `use_tcp`            | Look up DNS resources using TCP instead of UDP.                                                   | false     |

If `record_type` is set to `SRV`, GitLab will continue to use a round-robin algorithm
and will ignore the `weight` and `priority` in the record. Since SRV records usually
return hostnames instead of IPs, GitLab will look for the IPs of returned hostnames
in the additional section of the SRV response. If no IP is found for a hostname, GitLab
will query the configured `nameserver` for an ANY record for each such hostname, looking
for A or AAAA records, and will drop the hostname from rotation if it can't resolve its IP.

The `interval` value specifies the _minimum_ time between checks. If the A
record has a TTL greater than this value, then service discovery will honor said
TTL. For example, if the TTL of the A record is 90 seconds, then service
discovery will wait at least 90 seconds before checking the A record again.

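
The relationship between `interval` and the record's TTL can be expressed as
taking the larger of the two values. The helper below is a hypothetical
one-liner written for this explanation, not a GitLab API:

```ruby
# Hypothetical helper: minimum number of seconds to wait before the next
# DNS check, given the configured interval and the TTL of the record.
def next_check_delay(interval:, ttl:)
  [interval, ttl].max
end

next_check_delay(interval: 60, ttl: 90) # => 90 (the TTL is honored)
next_check_delay(interval: 60, ttl: 30) # => 60 (the interval is the floor)
```
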
When the list of hosts is updated, it might take a while for the old connections
to be terminated. The `disconnect_timeout` setting can be used to enforce an
upper limit on the time it will take to terminate all old database connections.

Some nameservers (like [Consul][consul-udp]) can return a truncated list of hosts when
queried over UDP. To overcome this issue, you can use TCP for querying by setting
`use_tcp` to `true`.

### Forking

If you use an application server that forks, such as Unicorn, you _have to_
update your Unicorn configuration to start service discovery _after_ a fork.
Failure to do so will lead to service discovery only running in the parent
process. If you are using Unicorn, then you can add the following to your
Unicorn configuration file:

```ruby
after_fork do |server, worker|
  defined?(Gitlab::Database::LoadBalancing) &&
    Gitlab::Database::LoadBalancing.start_service_discovery
end
```

This will ensure that service discovery is started in both the parent and all
child processes.

## Balancing queries

Read-only `SELECT` queries will be balanced among all the secondary hosts.
Everything else (including transactions) will be executed on the primary.
Queries such as `SELECT ... FOR UPDATE` are also executed on the primary.

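
The routing rule above can be sketched as a small classifier. This is a
deliberately simplified illustration: the `target_for` function is hypothetical,
it relies on pattern matching rather than real SQL parsing, and it does not
reflect how GitLab actually inspects queries.

```ruby
# Simplified, hypothetical classifier for the routing rule described above.
# Anything that is not a plain SELECT, or that runs inside a transaction,
# goes to the primary. Matching is textual, so a SELECT that merely contains
# the words FOR UPDATE in a string literal is (safely) sent to the primary.
def target_for(sql, in_transaction: false)
  return :primary if in_transaction

  normalized = sql.strip
  read_only = normalized.match?(/\ASELECT\b/i) &&
              !normalized.match?(/\bFOR\s+UPDATE\b/i)
  read_only ? :secondary : :primary
end

target_for('SELECT * FROM users')            # => :secondary
target_for('SELECT * FROM users FOR UPDATE') # => :primary
target_for('UPDATE users SET admin = false') # => :primary
```
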
## Prepared statements

Prepared statements don't work well with load balancing and are disabled
automatically when load balancing is enabled. This should have no impact on
response timings.

## Primary sticking

After a write has been performed, GitLab will stick to using the primary for a
certain period of time, scoped to the user that performed the write. GitLab will
return to using secondaries when they have caught up, or after 30 seconds,
whichever comes first.

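
The sticking rule can be read as a predicate evaluated per user. The sketch
below is only a model of that rule: `stick_to_primary?`, its parameters, and the
`STICK_TIMEOUT` constant are hypothetical names, and how GitLab determines that
the secondaries have caught up is outside the scope of this example.

```ruby
STICK_TIMEOUT = 30 # seconds, taken from the description above

# Hypothetical predicate: should this user's reads still go to the primary?
# `last_write_at` and `now` are timestamps in seconds (nil if the user has
# not written); `secondaries_caught_up` is supplied by the caller.
def stick_to_primary?(last_write_at:, now:, secondaries_caught_up:)
  return false if last_write_at.nil?    # no recent write by this user
  return false if secondaries_caught_up # replicas already have the write

  (now - last_write_at) < STICK_TIMEOUT
end

stick_to_primary?(last_write_at: 100, now: 110, secondaries_caught_up: false) # => true
stick_to_primary?(last_write_at: 100, now: 110, secondaries_caught_up: true)  # => false
stick_to_primary?(last_write_at: 100, now: 131, secondaries_caught_up: false) # => false
```
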
## Failover handling

In the event of a failover or an unresponsive database, the load balancer will
try to use the next available host. If no secondaries are available, the
operation is performed on the primary instead.

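
A minimal model of this fallback, assuming each host carries an online flag, is
shown below. The `Host` struct and `host_for_read` function are hypothetical and
exist only to illustrate the "next available host, otherwise the primary" rule.

```ruby
# Hypothetical host record with an online/offline flag.
Host = Struct.new(:name, :online)

# Scans the secondaries starting at index `start`, wrapping around, and
# returns the first one that is online. Falls back to the primary when no
# secondary is online (or when there are no secondaries at all).
def host_for_read(secondaries, primary, start: 0)
  secondaries.rotate(start).find(&:online) || primary
end

primary = Host.new('db3', true)
secondaries = [Host.new('db4', false), Host.new('db5', true), Host.new('db6', true)]

host_for_read(secondaries, primary).name           # => "db5"
host_for_read(secondaries, primary, start: 2).name # => "db6"
```
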
In the event of a connection error being produced when writing data, the
operation will be retried up to 3 times using an exponential back-off.

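
Retrying with an exponential back-off generally looks like the sketch below.
The `with_retries` helper, the `DatabaseConnectionError` class, and the base
delay of 0.1 seconds are assumptions made for this example; the text above only
states the retry limit and that the back-off is exponential.

```ruby
# Hypothetical error class standing in for a database connection failure.
class DatabaseConnectionError < StandardError; end

# Runs the block, retrying up to `max_retries` times on connection errors.
# The delay doubles after every failed attempt (0.1s, 0.2s, 0.4s by default).
# `sleeper` is injectable so the waiting behavior can be observed in tests.
def with_retries(max_retries: 3, base_delay: 0.1, sleeper: ->(seconds) { sleep(seconds) })
  attempt = 0
  begin
    yield
  rescue DatabaseConnectionError
    raise if attempt >= max_retries # give up and surface the last error

    sleeper.call(base_delay * (2**attempt))
    attempt += 1
    retry
  end
end

# Usage: wrap the write that may hit a connection error.
#   with_retries { perform_write }   # `perform_write` is a placeholder
```
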
When using load balancing, you should be able to safely restart a database server
without it immediately leading to errors being presented to the users.

## Logging

The load balancer logs various events in
[`database_load_balancing.log`](logs.md#database_load_balancinglog-premium-only), such as:

- When a host is marked as offline
- When a host comes back online
- When all secondaries are offline
- When a read is retried on a different host due to a query conflict

The log is structured, with each entry being a JSON object containing at least:

- An `event` field useful for filtering.
- A human-readable `message` field.
- Some event-specific metadata. For example, `db_host`.
- Contextual information that is always logged. For example, `severity` and `time`.

For example:

```json
{"severity":"INFO","time":"2019-09-02T12:12:01.728Z","correlation_id":"abcdefg","event":"host_online","message":"Host came back online","db_host":"111.222.333.444","db_port":null,"tag":"rails.database_load_balancing","environment":"production","hostname":"web-example-1","fqdn":"gitlab.example.com","path":null,"params":null}
```

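
Because every entry is a JSON object with an `event` field, the log can be
filtered with a few lines of Ruby. The `entries_for_event` helper below is a
hypothetical example; the second sample line uses a made-up event name purely
to show that non-matching entries are excluded.

```ruby
require 'json'

# Returns the parsed entries whose `event` field equals `event`.
# Lines that are not valid JSON are skipped rather than raising.
def entries_for_event(lines, event)
  lines.each_with_object([]) do |line, matches|
    begin
      entry = JSON.parse(line)
    rescue JSON::ParserError
      next
    end
    matches << entry if entry.is_a?(Hash) && entry['event'] == event
  end
end

sample = [
  '{"severity":"INFO","event":"host_online","db_host":"10.0.0.5"}',
  '{"severity":"WARN","event":"example_other_event","db_host":"10.0.0.6"}', # made-up event name
  'not json'
]

online = entries_for_event(sample, 'host_online')
online.map { |entry| entry['db_host'] } # => ["10.0.0.5"]
```

The same helper works on a real log by passing it the lines of the file, for
example via `File.foreach`.
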
## Handling Stale Reads

> [Introduced][ee-3526] in [GitLab Premium][eep] 10.3.

To prevent reading from an outdated secondary, the load balancer checks whether
it is in sync with the primary. If the data is determined to be recent enough,
the secondary can be used; otherwise it is ignored. To reduce overhead, these
checks are only performed at certain intervals.

There are three configuration options that influence this behavior:

| Option                       | Description                                                                                                    | Default    |
|------------------------------|----------------------------------------------------------------------------------------------------------------|------------|
| `max_replication_difference` | The amount of data (in bytes) a secondary is allowed to lag behind when it hasn't replicated data for a while. | 8 MB       |
| `max_replication_lag_time`   | The maximum number of seconds a secondary is allowed to lag behind before we stop using it.                    | 60 seconds |
| `replica_check_interval`     | The minimum number of seconds we have to wait before checking the status of a secondary.                       | 60 seconds |

The defaults should be sufficient for most users. Should you want to change them
you can specify them in `config/database.yml` like so:

```yaml
production:
  username: gitlab
  database: gitlab
  encoding: unicode
  load_balancing:
    hosts:
      - host1.example.com
      - host2.example.com
    max_replication_difference: 16777216 # 16 MB
    max_replication_lag_time: 30
    replica_check_interval: 30
```

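
One plausible reading of how the two lag limits combine is sketched below. This
is an assumption made for illustration, not a statement of GitLab's exact
logic: the `replica_usable?` function is hypothetical, and it treats a secondary
as usable when it is within the time limit, or, failing that, within the byte
limit (which covers the case where nothing was replicated for a while).

```ruby
# Hypothetical decision function. How the two limits combine is an assumption.
def replica_usable?(lag_seconds:, lag_bytes:,
                    max_replication_lag_time: 60,
                    max_replication_difference: 8 * 1024 * 1024)
  return true if lag_seconds <= max_replication_lag_time

  # A secondary can report a large time lag simply because there has been
  # nothing to replicate; accept it if it is close to the primary in bytes.
  lag_bytes <= max_replication_difference
end

replica_usable?(lag_seconds: 5, lag_bytes: 50_000_000)   # => true
replica_usable?(lag_seconds: 300, lag_bytes: 1_024)      # => true
replica_usable?(lag_seconds: 300, lag_bytes: 50_000_000) # => false
```
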
[hot-standby]: https://www.postgresql.org/docs/9.6/hot-standby.html
[ee-1283]: https://gitlab.com/gitlab-org/gitlab/merge_requests/1283
[eep]: https://about.gitlab.com/pricing/
[reconfigure gitlab]: restart_gitlab.md#omnibus-gitlab-reconfigure "How to reconfigure Omnibus GitLab"
[restart gitlab]: restart_gitlab.md#installations-from-source "How to restart GitLab"
[wikipedia]: https://en.wikipedia.org/wiki/Load_balancing_(computing)
[db-req]: ../install/requirements.md#database
[ee-3526]: https://gitlab.com/gitlab-org/gitlab/merge_requests/3526
[ee-5883]: https://gitlab.com/gitlab-org/gitlab/merge_requests/5883
[consul-udp]: https://www.consul.io/docs/agent/dns.html#udp-based-dns-queries