- DuckDuckGo ZeroClickInfo FatHeads
- =================================
- About
- -----
- See https://github.com/duckduckgo/duckduckgo/wiki for a general overview on contributing to DuckDuckGo.
- This repository is for contributing static, keyword-based content to 0-click, e.g. getting a Perl function reference when you search for perl split.
- Contributing
- ------------
- This repository is organized by type of content, each with its own directory. Some of those projects are in use on the live system, and some are still in development.
- Each directory contains the following files:
- * project/fetch.sh
- This shell script is called to fetch the data.
- * project/parse.xx
- This is the script used to parse the data once it has been fetched. The extension .xx is .pl, .py, or .js, depending on the language you use.
- * project/parse.sh
- This shell script is called to run the parser.
- * project/data.url
- Please upload data files somewhere outside the repository and store their URL here. The URL can point to a .zip if a whole directory is needed.
- * project/meta.txt
- This is a file that gives meta information about the data source. It should have this format:
- ```txt
- # This is the name of the source as people would refer to it, e.g. Wikipedia or PerlDoc
- Name: jQuery API
- # This is the base domain where the source pages are located.
- Domain: api.jquery.com
- # This is what gets put in quotes next to the source
- # It can be blank if it is a source with completely general info spanning many types of topics like Facebook.
- Type: jQuery
- # Whether the source is from MediaWiki (1) or not (0).
- MediaWiki: 1
- # Keywords used to trigger (or prefer) the source over others.
- Keywords: jQuery
- ```
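The format above is simple enough to read with a few lines of code. The following is a minimal sketch, assuming every non-comment line has the form `Key: value`; the `parse_meta` helper is hypothetical and is not part of this repository.

```python
def parse_meta(text):
    """Parse meta.txt content into a dict, skipping blank lines and # comments."""
    meta = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first colon only, so values may contain colons.
        key, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"expected 'Key: value', got {line!r}")
        meta[key.strip()] = value.strip()
    return meta


SAMPLE = """\
# This is the name of the source as people would refer to it.
Name: jQuery API
Domain: api.jquery.com
Type: jQuery
MediaWiki: 1
Keywords: jQuery
"""

meta = parse_meta(SAMPLE)
print(meta["Name"], meta["Domain"])
```

All values come back as strings, so a caller that needs the MediaWiki flag as a number would convert it explicitly.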
- Output Formats
- --------------
- Please name the output file project.tsv (tab-delimited), but do not store the data file(s) in the repository (as noted above).
- The output format from parse.xx depends on the type of content. In every case, it should be a tab-delimited file with one line per entry. Newline characters are rarely needed, but if a field must contain one, escape it with a backslash like \\n.
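As a rough illustration of the escaping rule, a parser written in Python could pass each value through a helper like the one below before joining fields with tabs. This is a sketch under two assumptions not stated above: that an escaped newline is the two characters backslash and `n`, and that a literal tab inside a value should be replaced with a space. The name `escape_field` is hypothetical.

```python
def escape_field(value):
    """Make a value safe for one cell of a tab-delimited, one-entry-per-line file."""
    # Assumption: a literal tab would be read as a field separator, so use a space.
    value = value.replace("\t", " ")
    # Normalize Windows and old Mac line endings before escaping.
    value = value.replace("\r\n", "\n").replace("\r", "\n")
    # Assumption: an escaped newline is written as backslash followed by "n".
    return value.replace("\n", "\\n")


print(escape_field("first line\nsecond line"))
```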
- The general output fields are as follows. Check out http://duckduckgo.com/Perl for reference, which we will refer to in explaining the fields.
- ```perl
- # REQUIRED: full article title, e.g. Perl.
- my $title = $line[0] || '';
- # REQUIRED: A for article.
- my $type = $line[1] || '';
- # Only for redirects -- ask.
- my $redirect = $line[2] || '';
- # Ignore.
- my $otheruses = $line[3] || '';
- # You can put the article in multiple categories, and category pages will be created automatically.
- # E.g.: http://duckduckgo.com/c/Procedural_programming_languages
- # You would do: Procedural programming languages\\n
- # You can have several categories, separated by an escaped newline.
- my $categories = $line[4] || '';
- # Ignore.
- my $references = $line[5] || '';
- # You can reference related topics here, which get turned into links in the 0-click box.
- # On the perl example, e.g. Perl Data Language
- # You would do: [[Perl Data Language]]
- # If the link name is different, you could do [[Perl Data Language|PDL]]
- my $see_also = $line[6] || '';
- # Ignore.
- my $further_reading = $line[7] || '';
- # You can add external links that get put first when this article comes out.
- # The canonical example is an official site, which looks like:
- # [$url Official site]\\n
- # You can have several, separated by an escaped newline though only a few will be used.
- # You can also have before and after text or put multiple links in one like this.
- # Before text [$url link text] after text [$url2 second link].\\n
- my $external_links = $line[8] || '';
- # Ignore.
- my $disambiguation = $line[9] || '';
- # You can reference an external image that we will download and reformat for display.
- # You would do: [[Image:$url]]
- my $images = $line[10] || '';
- # This is the snippet info.
- my $abstract = $line[11] || '';
- # This is the full URL for the source.
- # If all the URLs are relative to the main domain,
- # this can be relative to that domain.
- my $source_url = $line[12] || '';
- # In all, this may look like:
- print OUT "$page\tA\t\t\t$categories\t\t$internal_links\t\t$external_links\t\t$images\t$abstract\t$relative_url\n";
- ```
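For a parser written in Python rather than Perl, the same thirteen-column line could be produced as follows. This is only a sketch: `GENERAL_FIELDS` and `general_row` are hypothetical names, the field order is copied from the listing above, and the sample values are placeholders.

```python
# Column order taken from the field listing above (indexes 0 to 12).
GENERAL_FIELDS = [
    "title", "type", "redirect", "otheruses", "categories", "references",
    "see_also", "further_reading", "external_links", "disambiguation",
    "images", "abstract", "source_url",
]


def general_row(**values):
    """Return one tab-delimited output line; omitted fields are left empty."""
    unknown = sorted(set(values) - set(GENERAL_FIELDS))
    if unknown:
        raise KeyError(f"unknown field(s): {', '.join(unknown)}")
    if not values.get("title"):
        raise ValueError("title is required")
    values.setdefault("type", "A")  # A for article
    return "\t".join(values.get(name, "") for name in GENERAL_FIELDS) + "\n"


# Placeholder values for illustration only.
row = general_row(
    title="Perl",
    categories="Procedural programming languages\\n",
    see_also="[[Perl Data Language|PDL]]",
    external_links="[http://example.com/ Official site]\\n",
    abstract="Example abstract text.",
    source_url="Perl",
)
print(row, end="")
```

Keeping the column order in a single list means the ignored fields stay as empty cells automatically, so the column positions cannot drift.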
- For programming references in particular, the fields are a bit different:
- ```perl
- # REQUIRED: this is the name of the function.
- my $page = $line[0] || '';
- # Usually blank, except for languages like JavaScript.
- my $namespace = $line[1] || '';
- # REQUIRED: this is the target URL for more information.
- my $url = $line[2] || '';
- # SOME COMBO OF THESE IS REQUIRED.
- # Look at https://duckduckgo.com/?q=perl+split
- # The part in grey is the $synopsis and the stuff below is the $description
- my $description = $line[3] || '';
- my $synopsis = $line[4] || '';
- my $details = $line[5] || '';
- # usually blank
- my $type = $line[6] || '';
- # usually blank
- my $lang = $line[7] || '';
- ```
- In the programming case, a parser translates the fields above into the general format by folding several of them into the $abstract field; for example, the synopsis is put in a code block.
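A parse script for a programming reference could emit the eight fields listed above along these lines. This is a sketch, not the project's own code: `programming_row` is a hypothetical helper, "some combo is required" is read here as "at least one of description, synopsis, or details", and the URL and text are placeholders.

```python
def programming_row(page, url, namespace="", description="", synopsis="",
                    details="", type_="", lang=""):
    """Return one tab-delimited line in the programming-reference field order."""
    if not page or not url:
        raise ValueError("page and url are required")
    # Assumption: at least one of the three text fields must be filled in.
    if not (description or synopsis or details):
        raise ValueError("provide description, synopsis, or details")
    fields = [page, namespace, url, description, synopsis, details, type_, lang]
    return "\t".join(fields) + "\n"


# Placeholder values for illustration only.
row = programming_row(
    page="split",
    url="http://example.com/functions/split.html",
    synopsis="split /PATTERN/,EXPR,LIMIT",
    description="Splits a string into a list of strings.",
)
print(row, end="")
```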