PageRenderTime 26ms CodeModel.GetById 19ms RepoModel.GetById 0ms app.codeStats 0ms

/doc/development/sql.md

https://gitlab.com/artofhuman/gitlab-ce
Markdown | 287 lines | 216 code | 71 blank | 0 comment | 0 complexity | 2dc2c84be34c382695bf1ab0b2d27b79 MD5 | raw file
  1. # SQL Query Guidelines
  2. This document describes various guidelines to follow when writing SQL queries,
  3. either using ActiveRecord/Arel or raw SQL queries.
  4. ## Using LIKE Statements
  5. The most common way to search for data is using the `LIKE` statement. For
  6. example, to get all issues with a title starting with "WIP:" you'd write the
  7. following query:
  8. ```sql
  9. SELECT *
  10. FROM issues
  11. WHERE title LIKE 'WIP:%';
  12. ```
  13. On PostgreSQL the `LIKE` statement is case-sensitive. On MySQL this depends on
  14. the case-sensitivity of the collation, which is usually case-insensitive. To
  15. perform a case-insensitive `LIKE` on PostgreSQL you have to use `ILIKE` instead.
  16. This statement in turn isn't supported on MySQL.
  17. To work around this problem you should write `LIKE` queries using Arel instead
  18. of raw SQL fragments as Arel automatically uses `ILIKE` on PostgreSQL and `LIKE`
  19. on MySQL. This means that instead of this:
  20. ```ruby
  21. Issue.where('title LIKE ?', 'WIP:%')
  22. ```
  23. You'd write this instead:
  24. ```ruby
  25. Issue.where(Issue.arel_table[:title].matches('WIP:%'))
  26. ```
  27. Here `matches` generates the correct `LIKE` / `ILIKE` statement depending on the
  28. database being used.
  29. If you need to chain multiple `OR` conditions you can also do this using Arel:
  30. ```ruby
  31. table = Issue.arel_table
  32. Issue.where(table[:title].matches('WIP:%').or(table[:foo].matches('WIP:%')))
  33. ```
  34. For PostgreSQL this produces:
  35. ```sql
  36. SELECT *
  37. FROM issues
  38. WHERE (title ILIKE 'WIP:%' OR foo ILIKE 'WIP:%')
  39. ```
  40. In turn for MySQL this produces:
  41. ```sql
  42. SELECT *
  43. FROM issues
  44. WHERE (title LIKE 'WIP:%' OR foo LIKE 'WIP:%')
  45. ```
  46. ## LIKE & Indexes
  47. Neither PostgreSQL nor MySQL use any indexes when using `LIKE` / `ILIKE` with a
  48. wildcard at the start. For example, this will not use any indexes:
  49. ```sql
  50. SELECT *
  51. FROM issues
  52. WHERE title ILIKE '%WIP:%';
  53. ```
  54. Because the value for `ILIKE` starts with a wildcard the database is not able to
  55. use an index as it doesn't know where to start scanning the indexes.
  56. MySQL provides no known solution to this problem. Luckily PostgreSQL _does_
  57. provide a solution: trigram GIN indexes. These indexes can be created as
  58. follows:
  59. ```sql
  60. CREATE INDEX [CONCURRENTLY] index_name_here
  61. ON table_name
  62. USING GIN(column_name gin_trgm_ops);
  63. ```
  64. The key here is the `GIN(column_name gin_trgm_ops)` part. This creates a [GIN
  65. index][gin-index] with the operator class set to `gin_trgm_ops`. These indexes
  66. _can_ be used by `ILIKE` / `LIKE` and can lead to greatly improved performance.
  67. One downside of these indexes is that they can easily get quite large (depending
  68. on the amount of data indexed).
  69. To keep naming of these indexes consistent please use the following naming
  70. pattern:
  71. index_TABLE_on_COLUMN_trigram
  72. For example, a GIN/trigram index for `issues.title` would be called
  73. `index_issues_on_title_trigram`.
  74. Due to these indexes taking quite some time to be built they should be built
  75. concurrently. This can be done by using `CREATE INDEX CONCURRENTLY` instead of
  76. just `CREATE INDEX`. Concurrent indexes can _not_ be created inside a
  77. transaction. Transactions for migrations can be disabled using the following
  78. pattern:
  79. ```ruby
  80. class MigrationName < ActiveRecord::Migration[4.2]
  81. disable_ddl_transaction!
  82. end
  83. ```
  84. For example:
  85. ```ruby
  86. class AddUsersLowerUsernameEmailIndexes < ActiveRecord::Migration[4.2]
  87. disable_ddl_transaction!
  88. def up
  89. return unless Gitlab::Database.postgresql?
  90. execute 'CREATE INDEX CONCURRENTLY index_on_users_lower_username ON users (LOWER(username));'
  91. execute 'CREATE INDEX CONCURRENTLY index_on_users_lower_email ON users (LOWER(email));'
  92. end
  93. def down
  94. return unless Gitlab::Database.postgresql?
  95. remove_index :users, :index_on_users_lower_username
  96. remove_index :users, :index_on_users_lower_email
  97. end
  98. end
  99. ```
  100. ## Plucking IDs
  101. This can't be stressed enough: **never** use ActiveRecord's `pluck` to pluck a
  102. set of values into memory only to use them as an argument for another query. For
  103. example, this will make the database **very** sad:
  104. ```ruby
  105. projects = Project.all.pluck(:id)
  106. MergeRequest.where(source_project_id: projects)
  107. ```
  108. Instead you can just use sub-queries which perform far better:
  109. ```ruby
  110. MergeRequest.where(source_project_id: Project.all.select(:id))
  111. ```
  112. The _only_ time you should use `pluck` is when you actually need to operate on
  113. the values in Ruby itself (e.g. write them to a file). In almost all other cases
  114. you should ask yourself "Can I not just use a sub-query?".
  115. ## Use UNIONs
  116. UNIONs aren't very commonly used in most Rails applications but they're very
  117. powerful and useful. In most applications queries tend to use a lot of JOINs to
  118. get related data or data based on certain criteria, but JOIN performance can
  119. quickly deteriorate as the data involved grows.
  120. For example, if you want to get a list of projects where the name contains a
  121. value _or_ the name of the namespace contains a value most people would write
  122. the following query:
  123. ```sql
  124. SELECT *
  125. FROM projects
  126. JOIN namespaces ON namespaces.id = projects.namespace_id
  127. WHERE projects.name ILIKE '%gitlab%'
  128. OR namespaces.name ILIKE '%gitlab%';
  129. ```
  130. Using a large database this query can easily take around 800 milliseconds to
  131. run. Using a UNION we'd write the following instead:
  132. ```sql
  133. SELECT projects.*
  134. FROM projects
  135. WHERE projects.name ILIKE '%gitlab%'
  136. UNION
  137. SELECT projects.*
  138. FROM projects
  139. JOIN namespaces ON namespaces.id = projects.namespace_id
  140. WHERE namespaces.name ILIKE '%gitlab%';
  141. ```
  142. This query in turn only takes around 15 milliseconds to complete while returning
  143. the exact same records.
  144. This doesn't mean you should start using UNIONs everywhere, but it's something
  145. to keep in mind when using lots of JOINs in a query and filtering out records
  146. based on the joined data.
  147. GitLab comes with a `Gitlab::SQL::Union` class that can be used to build a UNION
  148. of multiple `ActiveRecord::Relation` objects. You can use this class as
  149. follows:
  150. ```ruby
  151. union = Gitlab::SQL::Union.new([projects, more_projects, ...])
  152. Project.from("(#{union.to_sql}) projects")
  153. ```
  154. ## Ordering by Creation Date
  155. When ordering records based on the time they were created you can simply order
  156. by the `id` column instead of ordering by `created_at`. Because IDs are always
  157. unique and incremented in the order that rows are created this will produce the
  158. exact same results. This also means there's no need to add an index on
  159. `created_at` to ensure consistent performance as `id` is already indexed by
  160. default.
  161. ## Use WHERE EXISTS instead of WHERE IN
  162. While `WHERE IN` and `WHERE EXISTS` can be used to produce the same data it is
  163. recommended to use `WHERE EXISTS` whenever possible. While in many cases
  164. PostgreSQL can optimise `WHERE IN` quite well there are also many cases where
  165. `WHERE EXISTS` will perform (much) better.
  166. In Rails you have to use this by creating SQL fragments:
  167. ```ruby
  168. Project.where('EXISTS (?)', User.select(1).where('projects.creator_id = users.id AND users.foo = X'))
  169. ```
  170. This would then produce a query along the lines of the following:
  171. ```sql
  172. SELECT *
  173. FROM projects
  174. WHERE EXISTS (
  175. SELECT 1
  176. FROM users
  177. WHERE projects.creator_id = users.id
  178. AND users.foo = X
  179. )
  180. ```
  181. [gin-index]: http://www.postgresql.org/docs/current/static/gin.html
  182. ## `.find_or_create_by` is not atomic
  183. The inherent pattern with methods like `.find_or_create_by` and
  184. `.first_or_create` and others is that they are not atomic. This means,
  185. it first runs a `SELECT`, and if there are no results an `INSERT` is
  186. performed. With concurrent processes in mind, there is a race condition
  187. which may lead to trying to insert two similar records. This may not be
  188. desired, or may cause one of the queries to fail due to a constraint
  189. violation, for example.
  190. Using transactions does not solve this problem.
  191. The following pattern should be used to avoid the problem:
  192. ```ruby
  193. Project.transaction do
  194. begin
  195. User.find_or_create_by(username: "foo")
  196. rescue ActiveRecord::RecordNotUnique
  197. retry
  198. end
  199. end
  200. ```
  201. If the above block is run inside a transaction and hits the race
  202. condition, the transaction is aborted and we cannot simply retry (any
  203. further queries inside the aborted transaction are going to fail). We
  204. can employ [nested transactions](http://api.rubyonrails.org/classes/ActiveRecord/Transactions/ClassMethods.html#module-ActiveRecord::Transactions::ClassMethods-label-Nested+transactions)
  205. here to only rollback the "inner transaction". Note that `requires_new: true` is required here.
  206. ```ruby
  207. Project.transaction do
  208. begin
  209. User.transaction(requires_new: true) do
  210. User.find_or_create_by(username: "foo")
  211. end
  212. rescue ActiveRecord::RecordNotUnique
  213. retry
  214. end
  215. end
  216. ```