---
layout: page
title: Exploring the Google AJAX Search API
---

# Exploring the Google AJAX Search API

## Authors

* [postmodern](mailto:postmodern at sophsec.com)
* [cd](mailto:cd at sophsec.com)

## Introduction

The [Google AJAX Search API](http://code.google.com/apis/ajaxsearch/)
is a JavaScript library that allows web developers to embed Google Search
dialogues into web pages. The API uses a publicly exposed RESTful
web service which returns the search results in
[JSON (JavaScript Object Notation)](http://www.json.org/)
format. This makes the API particularly easy to interface with from
non-JavaScript environments.

The API can perform search queries against a number
of Google services. These currently include
[Web Search](http://www.google.com/),
[Local Search](http://maps.google.com/),
[Video Search](http://video.google.com/),
[Blog Search](http://blogsearch.google.com/),
[News Search](http://news.google.com/),
[Book Search](http://books.google.com/) and
[Image Search](http://images.google.com/).

In this document we will focus only on the
[Web Search](http://code.google.com/apis/ajaxsearch/web.html)
functionality of the API.

## Dissecting the RESTful URLs

We can begin by dissecting the
[Web Search](http://code.google.com/apis/ajaxsearch/web.html)
functionality of the API. We will look specifically at the RESTful
URLs the API uses and how to construct our own. The example we will begin
dissecting is the
[Simple Search Box](http://www.google.com/uds/samples/apidocs/helloworld.html).

Immediately we can see that the Simple Search Box returns results from the
Local, Web, Video, Blog, News, Image and Book Search services. But where
are these results being requested from? We will use
[Tamper Data](https://addons.mozilla.org/en-US/firefox/addon/966)
to find which URLs are actually being requested. Tamper Data shows that the
following seven AJAX requests are made, each returning `text/javascript`:

* <http://www.google.com/uds/GlocalSearch?callback=google.search.LocalSearch.RawCompletion&context=0&lstkp=0&rsz=small&hl=en&gss=.com&sig=c8b58b9f22a4c2eca4342449dba29b6f&q=ruby&sll=40.71453,-74.00713&sspn=0.23791,0.30675&gll=40682767,-74038892,40746292,-73975367&llsep=500,500&key=notsupplied&v=1.0>
* <http://www.google.com/uds/GwebSearch?callback=google.search.WebSearch.RawCompletion&context=0&lstkp=0&rsz=small&hl=en&gss=.com&sig=c8b58b9f22a4c2eca4342449dba29b6f&q=ruby&key=notsupplied&v=1.0>
* <http://www.google.com/uds/GvideoSearch?callback=google.search.VideoSearch.RawCompletion&context=0&lstkp=0&rsz=small&hl=en&gss=.com&sig=c8b58b9f22a4c2eca4342449dba29b6f&q=ruby&key=notsupplied&v=1.0>
* <http://www.google.com/uds/GblogSearch?callback=google.search.BlogSearch.RawCompletion&context=0&lstkp=0&rsz=small&hl=en&gss=.com&sig=c8b58b9f22a4c2eca4342449dba29b6f&q=ruby&key=notsupplied&v=1.0>
* <http://www.google.com/uds/GnewsSearch?callback=google.search.NewsSearch.RawCompletion&context=0&lstkp=0&rsz=small&hl=en&gss=.com&sig=c8b58b9f22a4c2eca4342449dba29b6f&q=ruby&key=notsupplied&v=1.0>
* <http://www.google.com/uds/GimageSearch?callback=google.search.ImageSearch.RawCompletion&context=0&lstkp=0&rsz=small&hl=en&gss=.com&sig=c8b58b9f22a4c2eca4342449dba29b6f&q=ruby&key=notsupplied&v=1.0>
* <http://www.google.com/uds/GbookSearch?callback=google.search.BookSearch.RawCompletion&context=0&lstkp=0&rsz=small&hl=en&gss=.com&sig=c8b58b9f22a4c2eca4342449dba29b6f&q=ruby&key=notsupplied&v=1.0>

Judging from the paths of the URLs, the second URL is the one which returns
the Web Search results.

    http://www.google.com/uds/GwebSearch?callback=google.search.WebSearch.RawCompletion&context=0&lstkp=0&rsz=small&hl=en&gss=.com&sig=c8b58b9f22a4c2eca4342449dba29b6f&q=ruby&key=notsupplied&v=1.0

First, let's inspect the URL query parameters that are being passed to `http://www.google.com/uds/GwebSearch`:

* `v=1.0`: The requested version of the API.
* `lstkp=0`: The offset of the returned search results.
* `rsz=small`: The number of search results to return. A value of `small`
  returns four results per request, and a value of `large` returns eight
  results per request.
* `hl=en`: The language to return results in.
* `callback=google.search.WebSearch.RawCompletion`: The name of the
  JavaScript function that the returned JavaScript calls with the results.
* `q=ruby`: The search query.
* `sig=c8b58b9f22a4c2eca4342449dba29b6f`
* `gss=.com`
* `context=0`: Used in constructing the callback JavaScript; it is passed
  back as the first argument of the callback.
* `key=notsupplied`: An optional API key.
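
As a rough illustration of how these parameters fit together, the sketch below assembles a query string from a subset of them. The parameter values are copied from the captured request, the multi-word query is a made-up example, and `URI.encode_www_form` from Ruby's standard library is used here as one reasonable way to escape the values; nothing in the capture confirms which parameters the service strictly requires.

```ruby
require 'uri'

# A subset of the parameters observed in the captured GwebSearch request.
# The query value is a hypothetical example containing spaces.
params = {
  'v'   => '1.0',
  'rsz' => 'small',
  'hl'  => 'en',
  'q'   => 'ruby on rails',
  'key' => 'notsupplied'
}

# encode_www_form escapes every key and value, so spaces and
# characters such as '&' cannot corrupt the query string.
query = URI.encode_www_form(params)
url   = URI("http://www.google.com/uds/GwebSearch?#{query}")

puts url
```

Escaping matters as soon as the search query contains anything other than a single plain word.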

## Analyzing the JSON

Now that we have some understanding of the structure of the RESTful URL, we
can begin constructing our own URLs and analyzing the resulting JSON they
return. For this task we will write a small Ruby method and test it in
[irb (Interactive Ruby Shell)](http://en.wikipedia.org/wiki/Interactive_Ruby_Shell).

    require 'uri'
    require 'net/http'

    module SophSec
      # Builds the GwebSearch URL from the given options and returns the
      # raw response body as a String.
      def SophSec.get_ajax_search(options={})
        # Fill in defaults for any parameter the caller did not supply.
        options[:callback] ||= 'google.search.WebSearch.RawCompletion'
        options[:context] ||= 0
        options[:lstkp] ||= 0
        options[:rsz] ||= 'large'
        options[:hl] ||= 'en'
        options[:gss] ||= '.com'
        options[:start] ||= 0
        options[:sig] ||= '582c1116317355adf613a6a843f19ece'
        options[:key] ||= 'notsupplied'
        options[:v] ||= '1.0'

        # Escape every key and value so that queries containing spaces
        # or '&' still produce a valid URL.
        url = URI("http://www.google.com/uds/GwebSearch?" +
                  URI.encode_www_form(options))

        return Net::HTTP.get(url)
      end
    end

The `SophSec.get_ajax_search` method will build the RESTful URL and request
the results from the Web Search API.

    irb(main):024:0> SophSec.get_ajax_search(:q => 'ruby', :rsz => 'small')
    => "google.search.WebSearch.RawCompletion('0',{\"results\":[{\"GsearchResultClas
    s\":\"GwebSearch\",\"unescapedUrl\":\"http://www.ruby-lang.org/\",\"url\":\"http
    ://www.ruby-lang.org/\",\"visibleUrl\":\"www.ruby-lang.org\",\"cacheUrl\":\"http
    ://www.google.com/search?q\\u003dcache:U0idxbaGKSwJ:www.ruby-lang.org\",\"title\
    ":\"\\u003cb\\u003eRuby\\u003c/b\\u003e Programming Language\",\"titleNoFormatti
    ng\":\"Ruby Programming Language\",\"content\":\"A dynamic, interpreted, open so
    urce programming language with a focus on simplicity and productivity. Site in
    cludes news, downloads, documentation, \\u003cb\\u003e...\\u003c/b\\u003e\"},{\"
    GsearchResultClass\":\"GwebSearch\",\"unescapedUrl\":\"http://en.wikipedia.org/w
    iki/Ruby_programming_language\",\"url\":\"http://en.wikipedia.org/wiki/Ruby_prog
    ramming_language\",\"visibleUrl\":\"en.wikipedia.org\",\"cacheUrl\":\"http://www
    .google.com/search?q\\u003dcache:ctgTVVq1VwEJ:en.wikipedia.org\",\"title\":\"\\u
    003cb\\u003eRuby\\u003c/b\\u003e (programming language) - Wikipedia, the free en
    cyclopedia\",\"titleNoFormatting\":\"Ruby (programming language) - Wikipedia, th
    e free encyclopedia\",\"content\":\"Growing article, with links to many related
    topics. [Wikipedia]\"},{\"GsearchResultClass\":\"GwebSearch\",\"unescapedUrl\":\
    "http://en.wikipedia.org/wiki/Rubies\",\"url\":\"http://en.wikipedia.org/wiki/Ru
    bies\",\"visibleUrl\":\"en.wikipedia.org\",\"cacheUrl\":\"http://www.google.com/
    search?q\\u003dcache:gtP3_-Y-jd0J:en.wikipedia.org\",\"title\":\"\\u003cb\\u003e
    Ruby\\u003c/b\\u003e - Wikipedia, the free encyclopedia\",\"titleNoFormatting\":
    \"Ruby - Wikipedia, the free encyclopedia\",\"content\":\"\\u003cb\\u003eRuby\\u
    003c/b\\u003e is a pink to blood red gemstone, a variety of the mineral corundum
    (aluminium oxide). The common red color is caused mainly by the element chromi
    um. \\u003cb\\u003e...\\u003c/b\\u003e\"},{\"GsearchResultClass\":\"GwebSearch\"
    ,\"unescapedUrl\":\"http://www.rubyonrails.org/\",\"url\":\"http://www.rubyonrai
    ls.org/\",\"visibleUrl\":\"www.rubyonrails.org\",\"cacheUrl\":\"http://www.googl
    e.com/search?q\\u003dcache:kEJNFfIPffoJ:www.rubyonrails.org\",\"title\":\"\\u003
    cb\\u003eRuby\\u003c/b\\u003e on Rails\",\"titleNoFormatting\":\"Ruby on Rails\"
    ,\"content\":\"RoR home; full stack, Web application framework optimized for sus
    tainable programming productivity, allows writing sound code by favoring conve
    ntion over \\u003cb\\u003e...\\u003c/b\\u003e\"}],\"cursor\":{\"pages\":[{\"star
    t\":\"0\",\"label\":1},{\"start\":\"4\",\"label\":2},{\"start\":\"8\",\"label\":
    3},{\"start\":\"12\",\"label\":4}],\"estimatedResultCount\":\"10900000\",\"curre
    ntPageIndex\":0,\"moreResultsUrl\":\"http://www.google.com/search?oe\\u003dutf8\
    \u0026ie\\u003dutf8\\u0026source\\u003duds\\u0026start\\u003d0\\u0026hl\\u003den
    \\u0026q\\u003druby\"}}, 200, null, 205)"

Wow, that's a huge chunk of JavaScript. Clearly this is a JavaScript
callback which updates the Search dialogue with the JSON Hash containing the
results Array. To make this giant JavaScript String useful, we will have to
write another method to strip off the callback wrapper and parse the JSON
Hash.
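
    require 'json'

    module SophSec
      # Returns the Array of search results, or [] if none can be parsed.
      def SophSec.ajax_search(options={})
        # Strip the callback wrapper: keep the text from the first '{'
        # to the last '}'. Bail out if the response contains no Hash.
        json = SophSec.get_ajax_search(options)[/\{.*\}/m]
        return [] unless json

        hash = JSON.parse(json)
        if (hash.kind_of?(Hash) && hash['results'])
          return hash['results']
        end

        return []
      end
    end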
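
The unwrapping step can also be tried offline. The sketch below applies the same idea (take everything between the first `{` and the last `}`, then parse it) to a shortened, hand-written response; the `response` string is an illustrative stand-in modelled on the output above, not a real capture.

```ruby
require 'json'

# A trimmed, hand-written stand-in for the callback-wrapped response.
response = %q{google.search.WebSearch.RawCompletion('0',{"results":[{"url":"http://www.ruby-lang.org/","titleNoFormatting":"Ruby Programming Language"}],"cursor":{"currentPageIndex":0}}, 200, null, 205)}

# The greedy match runs from the first '{' to the last '}', which drops
# the callback name and the trailing status arguments.
json = response[/\{.*\}/m]
hash = json ? JSON.parse(json) : {}

results = hash.fetch('results', [])
puts results.first['url']
```

The greedy match relies on the JSON Hash being the only brace-delimited part of the response, which holds for the responses shown here.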

The `SophSec.ajax_search` method will extract the first JSON Hash it
recognizes from `SophSec.get_ajax_search` and parse it using Ruby's
[JSON.parse](http://json.rubyforge.org/doc/classes/JSON.html#M000084)
method. We'll test out `SophSec.ajax_search` in irb using the Ruby
[pp](http://www.ruby-doc.org/stdlib/libdoc/pp/rdoc/index.html)
method to pretty-print the Array of search results.

    irb(main):014:0> require 'pp'
    irb(main):015:0> pp SophSec.ajax_search(:q => 'ruby', :rsz => 'small')
    [{"GsearchResultClass"=>"GwebSearch",
    "title"=>"<b>Ruby</b> Programming Language",
    "url"=>"http://www.ruby-lang.org/",
    "cacheUrl"=>
    "http://www.google.com/search?q=cache:U0idxbaGKSwJ:www.ruby-lang.org",
    "content"=>
    "A dynamic, interpreted, open source programming language with a focus on simplicity \
    and productivity. Site includes news, downloads, documentation, <b>...</b>",
    "visibleUrl"=>"www.ruby-lang.org",
    "titleNoFormatting"=>"Ruby Programming Language",
    "unescapedUrl"=>"http://www.ruby-lang.org/"},
    {"GsearchResultClass"=>"GwebSearch",
    "title"=>
    "<b>Ruby</b> (programming language) - Wikipedia, the free encyclopedia",
    "url"=>"http://en.wikipedia.org/wiki/Ruby_programming_language",
    "cacheUrl"=>
    "http://www.google.com/search?q=cache:ctgTVVq1VwEJ:en.wikipedia.org",
    "content"=>"Growing article, with links to many related topics. [Wikipedia]",
    "visibleUrl"=>"en.wikipedia.org",
    "titleNoFormatting"=>
    "Ruby (programming language) - Wikipedia, the free encyclopedia",
    "unescapedUrl"=>"http://en.wikipedia.org/wiki/Ruby_programming_language"},
    {"GsearchResultClass"=>"GwebSearch",
    "title"=>"<b>Ruby</b> - Wikipedia, the free encyclopedia",
    "url"=>"http://en.wikipedia.org/wiki/Rubies",
    "cacheUrl"=>
    "http://www.google.com/search?q=cache:gtP3_-Y-jd0J:en.wikipedia.org",
    "content"=>
    "<b>Ruby</b> is a pink to blood red gemstone, a variety of the mineral \
    corundum (aluminium oxide). The common red color is caused mainly by the element \
    chromium. <b>...</b>",
    "visibleUrl"=>"en.wikipedia.org",
    "titleNoFormatting"=>"Ruby - Wikipedia, the free encyclopedia",
    "unescapedUrl"=>"http://en.wikipedia.org/wiki/Rubies"},
    {"GsearchResultClass"=>"GwebSearch",
    "title"=>"<b>Ruby</b> on Rails",
    "url"=>"http://www.rubyonrails.org/",
    "cacheUrl"=>
    "http://www.google.com/search?q=cache:kEJNFfIPffoJ:www.rubyonrails.org",
    "content"=>
    "RoR home; full stack, Web application framework optimized for sustainable \
    programming productivity, allows writing sound code by favoring convention over \
    <b>...</b>",
    "visibleUrl"=>"www.rubyonrails.org",
    "titleNoFormatting"=>"Ruby on Rails",
    "unescapedUrl"=>"http://www.rubyonrails.org/"}]
    => nil

Awesome, `SophSec.ajax_search` returns a native Array of search results,
ripe for data mining.
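
As a small, hypothetical example of what such mining might look like, the sketch below groups result URLs by the host shown in `visibleUrl`. The `results` Array is hard-coded with entries trimmed from the output above, so the snippet runs without a network request; in practice it would come from `SophSec.ajax_search`.

```ruby
# Entries trimmed from the parsed results shown above.
results = [
  { 'visibleUrl' => 'www.ruby-lang.org',
    'url'        => 'http://www.ruby-lang.org/' },
  { 'visibleUrl' => 'en.wikipedia.org',
    'url'        => 'http://en.wikipedia.org/wiki/Ruby_programming_language' },
  { 'visibleUrl' => 'en.wikipedia.org',
    'url'        => 'http://en.wikipedia.org/wiki/Rubies' },
  { 'visibleUrl' => 'www.rubyonrails.org',
    'url'        => 'http://www.rubyonrails.org/' }
]

# Collect the URLs returned for each host; unseen hosts start with [].
by_host = Hash.new { |hash, key| hash[key] = [] }
results.each { |result| by_host[result['visibleUrl']] << result['url'] }

by_host.each do |host, urls|
  puts "#{host}: #{urls.length} result(s)"
end
```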

## Conclusion

As one can see, it takes a surprisingly small amount of code to create a
non-JavaScript interface to the Google AJAX Search API. The full source
code of the `SophSec.get_ajax_search` and `SophSec.ajax_search`
methods can be found
[here](http://github.com/sophsec/shards/blob/master/ruby/ajax_search.rb).

An interesting side note: while testing the `SophSec.ajax_search` method, it
was discovered that the Search API does not filter out queries from
bots or other "interesting" search queries. This suggests that the
Google AJAX Search API could be leveraged for automated web application
vulnerability fingerprinting.