DuckDuckGo ZeroClickInfo FatHeads
=================================

About
-----

See https://github.com/duckduckgo/duckduckgo/wiki for a general overview of contributing to DuckDuckGo.

This repository is for contributing static, keyword-based content to 0-click, e.g. getting a Perl function reference when you search for "perl split".

Contributing
------------

This repository is organized by type of content, with one directory per project. Some of these projects are in use on the live system, and some are still in development.

Each directory contains a few standard files, each with a specific purpose:

* project/fetch.sh

  This shell script is called to fetch the data.

* project/parse.xx

  This is the script used to parse the data once it has been fetched. The .xx extension can be .pl, .py, or .js, depending on which language you use.

* project/parse.sh

  This shell script is called to run the parser.

* project/data.url

  Please upload data files somewhere outside the repository and store the URL to them here. The URL can point to a .zip if a whole directory is needed.

* project/meta.txt

  This file gives meta information about the data source. It should have this format:

```txt
# This is the name of the source as people would refer to it, e.g. Wikipedia or PerlDoc.
Name: jQuery API
# This is the base domain where the source pages are located.
Domain: api.jquery.com
# This is what gets put in quotes next to the source.
# It can be blank if the source has completely general info spanning many types of topics, like Facebook.
Type: jQuery
# Whether the source is from MediaWiki (1) or not (0).
MediaWiki: 1
# Keywords used to trigger (or prefer) the source over others.
Keywords: jQuery
```
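
As a minimal sketch (not part of the repository), a Perl parse script could read meta.txt into a hash like this. It assumes every non-comment line is a single `Key: value` pair and that lines starting with `#` are comments; `parse_meta` is a hypothetical helper name.

```perl
use strict;
use warnings;

# Hypothetical helper: parse meta.txt-style text into a hash reference.
# Assumes one "Key: value" pair per line; "#" lines and blank lines are skipped.
sub parse_meta {
    my ($text) = @_;
    my %meta;
    for my $line (split /\n/, $text) {
        next if $line =~ /^\s*#/ or $line !~ /\S/;   # comment or blank line
        if ($line =~ /^\s*([^:]+?)\s*:\s*(.*?)\s*$/) {
            $meta{$1} = $2;    # a value may be empty, e.g. "Type:"
        }
    }
    return \%meta;
}

my $example = <<'END';
# This is the name of the source as people would refer to it.
Name: jQuery API
Domain: api.jquery.com
Type: jQuery
MediaWiki: 1
Keywords: jQuery
END

my $meta = parse_meta($example);
print "$_ = $meta->{$_}\n" for sort keys %$meta;
```

Only the first colon separates key from value, so values that themselves contain colons (such as URLs) survive intact.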

Output Formats
--------------

Please name the output file project.tsv (tab-delimited), but do not store the data file(s) in the repository, as noted above.

The output format from parse.xx depends on the type of content. In every case it should be a tab-delimited file with one line per entry. Newline characters are usually unnecessary; if a field does need one, escape it with a backslash, i.e. write the two characters `\n`.
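
The escaping rule can be sketched in Perl as follows. `escape_field` and `tsv_line` are hypothetical helper names, and turning embedded tabs and stray carriage returns into spaces is our own assumption (the format only specifies newline escaping), added so that a stray tab cannot shift the columns.

```perl
use strict;
use warnings;

# Hypothetical helper: make one field value safe for a tab-delimited line.
sub escape_field {
    my ($value) = @_;
    $value = '' unless defined $value;
    $value =~ s/\r?\n/\\n/g;   # real newline -> literal backslash followed by "n"
    $value =~ s/[\t\r]/ /g;    # assumption: tabs and stray CRs become spaces
    return $value;
}

# Hypothetical helper: join escaped fields into one output line.
sub tsv_line {
    return join("\t", map { escape_field($_) } @_) . "\n";
}

print tsv_line("first line\nsecond line", "a\tb", undef);
```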

The general output fields are as follows. We use http://duckduckgo.com/Perl as the reference example when explaining them.

```perl
# @line holds one entry's fields, split on tabs (e.g. split /\t/, $row, -1).
# REQUIRED: full article title, e.g. Perl.
my $title = $line[0] || '';
# REQUIRED: A for article.
my $type = $line[1] || '';
# Only used for redirects; ask before using it.
my $redirect = $line[2] || '';
# Ignore.
my $otheruses = $line[3] || '';
# You can put the article in multiple categories; category pages are created automatically.
# E.g.: http://duckduckgo.com/c/Procedural_programming_languages
# You would write: Procedural programming languages\n
# Separate several categories with an escaped newline (a literal backslash followed by n).
my $categories = $line[4] || '';
# Ignore.
my $references = $line[5] || '';
# You can reference related topics here; they are turned into links in the 0-click box.
# In the Perl example, e.g. Perl Data Language,
# you would write: [[Perl Data Language]]
# If the link text should differ, write: [[Perl Data Language|PDL]]
my $see_also = $line[6] || '';
# Ignore.
my $further_reading = $line[7] || '';
# You can add external links, which are shown first when this article comes up.
# The canonical example is an official site, which looks like:
# [$url Official site]\n
# You can list several, separated by escaped newlines, though only a few will be used.
# You can also add text before and after a link, or put multiple links in one entry, like this:
# Before text [$url link text] after text [$url2 second link].\n
my $external_links = $line[8] || '';
# Ignore.
my $disambiguation = $line[9] || '';
# You can reference an external image, which we will download and reformat for display.
# You would write: [[Image:$url]]
my $images = $line[10] || '';
# This is the snippet text.
my $abstract = $line[11] || '';
# This is the full URL for the source.
# If all the URLs are relative to the main domain,
# this can be relative to that domain.
my $source_url = $line[12] || '';
# Put together, writing one line (to an already opened OUT filehandle) may look like this;
# empty fields are just consecutive tabs:
print OUT "$title\tA\t\t\t$categories\t\t$see_also\t\t$external_links\t\t$images\t$abstract\t$source_url\n";
```
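
Because most of the 13 columns are usually empty, it can help to build a line from named fields and check the column count before writing it. The sketch below is illustrative: `article_line` is a hypothetical helper, the field names mirror the variables above, and the sample values are made up for the Perl example. Note the `-1` limit in the check: without it, `split` drops trailing empty fields and the count would come out wrong.

```perl
use strict;
use warnings;

# Column order of the general format (13 fields), matching the list above.
my @FIELDS = qw(
    title type redirect otheruses categories references see_also
    further_reading external_links disambiguation images abstract source_url
);

# Hypothetical helper: build one line, leaving unsupplied fields empty.
sub article_line {
    my (%f) = @_;
    my @cols = map { defined $f{$_} ? $f{$_} : '' } @FIELDS;
    s/\r?\n/\\n/g for @cols;    # escape embedded newlines as a literal \n
    return join("\t", @cols) . "\n";
}

# Illustrative sample values only.
my $line = article_line(
    title          => 'Perl',
    type           => 'A',
    categories     => "Procedural programming languages\nScripting languages",
    see_also       => '[[Perl Data Language|PDL]]',
    external_links => '[http://www.perl.org/ Official site]',
    abstract       => 'Perl is a family of high-level, general-purpose programming languages.',
    source_url     => 'http://en.wikipedia.org/wiki/Perl',
);

# Sanity check: a limit of -1 keeps trailing empty fields.
my @check = split /\t/, substr($line, 0, -1), -1;
die 'expected 13 fields, got ' . scalar(@check) . "\n" unless @check == 13;

print $line;
```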

For programming references in particular, the fields are a bit different:

```perl
# REQUIRED: the name of the function.
my $page = $line[0] || '';
# Usually blank, except for something like JavaScript.
my $namespace = $line[1] || '';
# REQUIRED: the target URL for more information.
my $url = $line[2] || '';
# SOME COMBINATION OF THESE IS REQUIRED.
# Look at https://duckduckgo.com/?q=perl+split
# The part in grey is the $synopsis and the text below it is the $description.
my $description = $line[3] || '';
my $synopsis = $line[4] || '';
my $details = $line[5] || '';
# Usually blank.
my $type = $line[6] || '';
# Usually blank.
my $lang = $line[7] || '';
```
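
For instance, a parse.pl producing a programming-reference line for Perl's `split` might emit it as in the sketch below. The field values are illustrative, and joining the four synopsis forms with escaped newlines is our own choice rather than a documented requirement.

```perl
use strict;
use warnings;

# Illustrative values for Perl's split.
my %entry = (
    page        => 'split',
    namespace   => '',
    url         => 'https://perldoc.perl.org/functions/split',
    description => 'Splits the string EXPR into a list of strings and returns the list.',
    synopsis    => join("\n",
        'split /PATTERN/,EXPR,LIMIT', 'split /PATTERN/,EXPR',
        'split /PATTERN/', 'split'),
    details     => '',
    type        => '',
    lang        => '',
);

# Field order of the programming-reference format (8 fields).
my @order = qw(page namespace url description synopsis details type lang);

# Escape embedded newlines as a literal \n, then join with tabs.
my @cols = map { my $v = $entry{$_}; $v =~ s/\r?\n/\\n/g; $v } @order;
print join("\t", @cols), "\n";
```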

In the programming case, we have a parser that translates these fields into the general format by compressing many of them into the $abstract field in various ways; for example, the synopsis is put in a code block.