- ---
- title: Quick Start - Classification Engine Template
- ---
- ## Overview
- An engine template is an almost-complete implementation of an engine.
- PredictionIO's Classification Engine Template
- has integrated **Apache Spark MLlib**'s Naive Bayes algorithm by default.
- The default use case of the Classification Engine Template is to predict the service
- plan (*plan*) a user will subscribe to, based on three of the user's properties: *attr0*,
- *attr1* and *attr2*.
- You can customize it easily to fit your specific use case and needs.
- We are going to show you how to create your own classification engine for
- production use based on this template.
- ## Usage
- ### Event Data Requirements
- By default, the template requires the following events to be collected:
- - user `$set` event, which sets the attributes of the user
- NOTE: You can customize the template to use other events.
- ### Input Query
- - individual attribute values (for version >= v0.3.1)
- WARNING: for versions earlier than v0.3.1, the query is an array of feature values
- ### Output PredictedResult
- - the predicted label
- ## 1. Install and Run PredictionIO
- <%= partial 'shared/quickstart/install' %>
- ## 2. Create a new Engine from an Engine Template
- <%= partial 'shared/quickstart/create_engine', locals: { engine_name: 'MyClassification', template_name: 'Classification Engine Template', template_repo: 'template-scala-parallel-classification' } %>
- ## 3. Generate an App ID and Access Key
- <%= partial 'shared/quickstart/create_app' %>
- ## 4. Collecting Data
- Next, let's collect some training data. By default, the Classification Engine Template reads 4 properties of a user record: attr0, attr1, attr2 and plan. This template requires `$set` user events.
- INFO: This template can easily be customized to use different attributes or a larger number of attributes.
- <%= partial 'shared/quickstart/collect_data' %>
- To set the properties "attr0", "attr1", "attr2" and "plan" for user "u0" at time `2014-11-02T09:39:45.618-08:00` (the current time is used if eventTime is not specified), send a `$set` event for the user. To send this event, run the following `curl` command:
- <div class="tabs">
- <div data-tab="REST API" data-lang="json">
- ```
- $ curl -i -X POST http://localhost:7070/events.json?accessKey=$ACCESS_KEY \
- -H "Content-Type: application/json" \
- -d '{
- "event" : "$set",
- "entityType" : "user",
- "entityId" : "u0",
- "properties" : {
- "attr0" : 0,
- "attr1" : 1,
- "attr2" : 0,
- "plan" : 1
- },
- "eventTime" : "2014-11-02T09:39:45.618-08:00"
- }'
- ```
- </div>
- <div data-tab="Python SDK" data-lang="python">
- ```python
- import predictionio
- client = predictionio.EventClient(
- access_key=<ACCESS KEY>,
- url=<URL OF EVENTSERVER>,
- threads=5,
- qsize=500
- )
- # Set the 4 properties for a user
- client.create_event(
- event="$set",
- entity_type="user",
- entity_id=<USER ID>,
- properties= {
- "attr0" : int(<VALUE OF ATTR0>),
- "attr1" : int(<VALUE OF ATTR1>),
- "attr2" : int(<VALUE OF ATTR2>),
- "plan" : int(<VALUE OF PLAN>)
- }
- )
- ```
- </div>
- <div data-tab="PHP SDK" data-lang="php">
- ```php
- <?php
- require_once("vendor/autoload.php");
- use predictionio\EventClient;
- $client = new EventClient(<ACCESS KEY>, <URL OF EVENTSERVER>);
- // Set the 4 properties for a user
- $client->createEvent(array(
- 'event' => '$set',
- 'entityType' => 'user',
- 'entityId' => <USER ID>,
- 'properties' => array(
- 'attr0' => <VALUE OF ATTR0>,
- 'attr1' => <VALUE OF ATTR1>,
- 'attr2' => <VALUE OF ATTR2>,
- 'plan' => <VALUE OF PLAN>
- )
- ));
- ?>
- ```
- </div>
- <div data-tab="Ruby SDK" data-lang="ruby">
- ```ruby
- # Create a client object.
- client = PredictionIO::EventClient.new(<ACCESS KEY>, <URL OF EVENTSERVER>)
- # Set the 4 properties for a user.
- client.create_event(
- '$set',
- 'user',
- <USER ID>, {
- 'properties' => {
- 'attr0' => <VALUE OF ATTR0 (integer)>,
- 'attr1' => <VALUE OF ATTR1 (integer)>,
- 'attr2' => <VALUE OF ATTR2 (integer)>,
- 'plan' => <VALUE OF PLAN (integer)>,
- }
- }
- )
- ```
- </div>
- <div data-tab="Java SDK" data-lang="java">
- ```java
- import com.google.common.collect.ImmutableMap;
- import org.apache.predictionio.Event;
- import org.apache.predictionio.EventClient;
- EventClient client = new EventClient(<ACCESS KEY>, <URL OF EVENTSERVER>);
- // set the 4 properties for a user
- Event event = new Event()
- .event("$set")
- .entityType("user")
- .entityId(<USER ID>)
- .properties(ImmutableMap.<String, Object>of(
- "attr0", <VALUE OF ATTR0>,
- "attr1", <VALUE OF ATTR1>,
- "attr2", <VALUE OF ATTR2>,
- "plan", <VALUE OF PLAN>
- ));
- client.createEvent(event);
- ```
- </div>
- </div>
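Hand-written JSON bodies are easy to get subtly wrong, for example by dropping a comma between two fields. As a minimal sketch that uses only the Python standard library, the same `$set` payload can be built from a dictionary and serialized with `json.dumps`. The helper name `build_set_event` is hypothetical and is not part of any PredictionIO SDK; the field names follow the REST example above.

```python
import json


def build_set_event(entity_id, properties, event_time=None):
    """Build the JSON body of a user `$set` event (hypothetical helper)."""
    event = {
        "event": "$set",
        "entityType": "user",
        "entityId": entity_id,
        "properties": dict(properties),
    }
    # eventTime is optional; the current time is used when it is omitted.
    if event_time is not None:
        event["eventTime"] = event_time
    return event


payload = build_set_event(
    "u0",
    {"attr0": 0, "attr1": 1, "attr2": 0, "plan": 1},
    event_time="2014-11-02T09:39:45.618-08:00",
)
# Serialized string, usable as the request body of the REST call.
body = json.dumps(payload)
print(body)
```

Because the serializer produces the string, the body is always valid JSON, whichever properties are included.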
- Note that you can also set the properties for the user with multiple `$set` events (they are aggregated during engine training).
- To set the properties "attr0", "attr1", "attr2" and "plan" for user "u1" at different times, you can send the following `$set` events for the user. To send these events, run the following `curl` commands:
- <div class="tabs">
- <div data-tab="REST API" data-lang="json">
- ```
- $ curl -i -X POST http://localhost:7070/events.json?accessKey=$ACCESS_KEY \
- -H "Content-Type: application/json" \
- -d '{
- "event" : "$set",
- "entityType" : "user",
- "entityId" : "u1",
- "properties" : {
- "attr0" : 0
- },
- "eventTime" : "2014-11-02T09:39:45.618-08:00"
- }'
- $ curl -i -X POST http://localhost:7070/events.json?accessKey=$ACCESS_KEY \
- -H "Content-Type: application/json" \
- -d '{
- "event" : "$set",
- "entityType" : "user",
- "entityId" : "u1",
- "properties" : {
- "attr1" : 1,
- "attr2": 0
- },
- "eventTime" : "2014-11-02T09:39:45.618-08:00"
- }'
- $ curl -i -X POST http://localhost:7070/events.json?accessKey=$ACCESS_KEY \
- -H "Content-Type: application/json" \
- -d '{
- "event" : "$set",
- "entityType" : "user",
- "entityId" : "u1",
- "properties" : {
- "plan" : 1
- },
- "eventTime" : "2014-11-02T09:39:45.618-08:00"
- }'
- ```
- </div>
- <div data-tab="Python SDK" data-lang="python">
- ```python
- # You may also set the properties one by one
- client.create_event(
- event="$set",
- entity_type="user",
- entity_id=<USER ID>,
- properties= {
- "attr0" : int(<VALUE OF ATTR0>)
- }
- )
- client.create_event(
- event="$set",
- entity_type="user",
- entity_id=<USER ID>,
- properties= {
- "attr1" : int(<VALUE OF ATTR1>),
- "attr2" : int(<VALUE OF ATTR2>)
- }
- )
- client.create_event(
- event="$set",
- entity_type="user",
- entity_id=<USER ID>,
- properties= {
- "plan" : int(<VALUE OF PLAN>)
- }
- )
- ```
- </div>
- <div data-tab="PHP SDK" data-lang="php">
- ```php
- <?php
- // You may also set the properties one by one
- $client->createEvent(array(
- 'event' => '$set',
- 'entityType' => 'user',
- 'entityId' => <USER ID>,
- 'properties' => array(
- 'attr0' => <VALUE OF ATTR0>
- )
- ));
- $client->createEvent(array(
- 'event' => '$set',
- 'entityType' => 'user',
- 'entityId' => <USER ID>,
- 'properties' => array(
- 'attr1' => <VALUE OF ATTR1>,
- 'attr2' => <VALUE OF ATTR2>
- )
- ));
- $client->createEvent(array(
- 'event' => '$set',
- 'entityType' => 'user',
- 'entityId' => <USER ID>,
- 'properties' => array(
- 'plan' => <VALUE OF PLAN>
- )
- ));
- ?>
- ```
- </div>
- <div data-tab="Ruby SDK" data-lang="ruby">
- ```ruby
- # You may also set the properties one by one.
- client.create_event(
- '$set',
- 'user',
- <USER ID>, {
- 'properties' => {
- 'attr0' => <VALUE OF ATTR0 (integer)>
- }
- }
- )
- client.create_event(
- '$set',
- 'user',
- <USER ID>, {
- 'properties' => {
- 'attr1' => <VALUE OF ATTR1 (integer)>,
- }
- }
- )
- # Etc...
- ```
- </div>
- <div data-tab="Java SDK" data-lang="java">
- ```java
- // you may also set the properties one by one
- client.createEvent(new Event()
- .event("$set")
- .entityType("user")
- .entityId(<USER ID>)
- .property("attr0", <VALUE OF ATTR0>));
- client.createEvent(new Event()
- .event("$set")
- .entityType("user")
- .entityId(<USER ID>)
- .property("attr1", <VALUE OF ATTR1>)
- .property("attr2", <VALUE OF ATTR2>));
- client.createEvent(new Event()
- .event("$set")
- .entityType("user")
- .entityId(<USER ID>)
- .property("plan", <VALUE OF PLAN>));
- ```
- </div>
- </div>
- The properties of the `user` can be set, unset, or deleted by the special events **$set**, **$unset** and **$delete**. Please refer to [Event API](/datacollection/eventapi/#note-about-properties) for more details on using these events.
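To make the aggregation of these special events concrete, here is a simplified Python model of how a sequence of `$set`, `$unset` and `$delete` events could be folded into one property map per entity. It is an illustrative sketch under stated assumptions, not PredictionIO's actual implementation: events are applied in `eventTime` order, a later `$set` overrides earlier values, `$unset` removes the named properties, and `$delete` removes the entity. The function name `aggregate_properties` and the sample timestamps are made up for this example.

```python
from datetime import datetime


def aggregate_properties(events):
    """Fold special events into {(entityType, entityId): properties}.

    Simplified model for illustration only.
    """
    entities = {}
    # Apply events in eventTime order; sorted() is stable, so events with
    # equal timestamps keep their input order.
    ordered = sorted(events, key=lambda e: datetime.fromisoformat(e["eventTime"]))
    for ev in ordered:
        key = (ev["entityType"], ev["entityId"])
        if ev["event"] == "$set":
            entities.setdefault(key, {}).update(ev.get("properties", {}))
        elif ev["event"] == "$unset":
            props = entities.get(key, {})
            for name in ev.get("properties", {}):
                props.pop(name, None)
        elif ev["event"] == "$delete":
            entities.pop(key, None)
    return entities


def set_event(entity_id, properties, event_time):
    """Small constructor for the sample events below (hypothetical)."""
    return {
        "event": "$set",
        "entityType": "user",
        "entityId": entity_id,
        "properties": properties,
        "eventTime": event_time,
    }


events = [
    set_event("u1", {"attr0": 0}, "2014-11-02T09:39:45.618-08:00"),
    set_event("u1", {"attr1": 1, "attr2": 0}, "2014-11-03T09:39:45.618-08:00"),
    set_event("u1", {"plan": 1}, "2014-11-04T09:39:45.618-08:00"),
]
merged = aggregate_properties(events)
print(merged[("user", "u1")])
```

In this model, three partial `$set` events for "u1" yield the same four properties as a single event that sets them all at once.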
- <%= partial 'shared/quickstart/query_eventserver' %>
- ### Import More Sample Data
- <%= partial 'shared/quickstart/import_sample_data' %>
- A Python import script `import_eventserver.py` is provided to import the data into
- the Event Server using the Python SDK. Please upgrade to the latest Python SDK.
- <%= partial 'shared/quickstart/install_python_sdk' %>
- Make sure you are under the `MyClassification` directory. Replace the value of the `--access_key` parameter with your **Access Key** and run:
- ```
- $ cd MyClassification
- $ python data/import_eventserver.py --access_key obbiTuSOiMzyFKsvjjkDnWk1vcaHjcjrv9oT3mtN3y6fOlpJoVH459O1bPmDzCdv
- ```
- You should see the following output:
- ```
- Importing data...
- 6 events are imported.
- ```
- Now the training data is stored as events inside the Event Store.
- <%= partial 'shared/quickstart/query_eventserver_short' %>
- ## 5. Deploy the Engine as a Service
- <%= partial 'shared/quickstart/deploy_enginejson', locals: { engine_name: 'MyClassification' } %>
- <%= partial 'shared/quickstart/deploy', locals: { engine_name: 'MyClassification' } %>
- ## 6. Use the Engine
- Now you can try to retrieve predicted results. For example, to predict the
- label (i.e. *plan* in this case) of a user with attr0=2, attr1=0 and attr2=0,
- you send this JSON `{ "attr0":2, "attr1":0, "attr2":0 }` to the deployed engine and it will
- return a JSON of the predicted plan. Simply send a query by making an HTTP
- request or through the `EngineClient` of an SDK.
- With the deployed engine running, open another terminal and run the following `curl` command or use an SDK to send the query:
- <div class="tabs">
- <div data-tab="REST API" data-lang="bash">
- ```bash
- $ curl -H "Content-Type: application/json" \
- -d '{ "attr0":2, "attr1":0, "attr2":0 }' http://localhost:8000/queries.json
- ```
- </div>
- <div data-tab="Python SDK" data-lang="python">
- ```python
- import predictionio
- engine_client = predictionio.EngineClient(url="http://localhost:8000")
- print(engine_client.send_query({"attr0":2, "attr1":0, "attr2":0}))
- ```
- </div>
- <div data-tab="PHP SDK" data-lang="php">
- ```php
- <?php
- require_once("vendor/autoload.php");
- use predictionio\EngineClient;
- $client = new EngineClient('http://localhost:8000');
- $response = $client->sendQuery(array('attr0'=> 2, 'attr1' => 0, 'attr2' => 0));
- print_r($response);
- ?>
- ```
- </div>
- <div data-tab="Ruby SDK" data-lang="ruby">
- ```ruby
- # Create client object.
- client = PredictionIO::EngineClient.new(<ENGINE DEPLOY URL>)
- # Query PredictionIO.
- response = client.send_query('attr0' => 2, 'attr1' => 0, 'attr2' => 0)
- puts response
- ```
- </div>
- <div data-tab="Java SDK" data-lang="java">
- ```java
- import com.google.common.collect.ImmutableMap;
- import com.google.gson.JsonObject;
- import org.apache.predictionio.EngineClient;
- EngineClient engineClient = new EngineClient(<ENGINE DEPLOY URL>);
- JsonObject response = engineClient.sendQuery(ImmutableMap.<String, Object>of(
- "attr0", 2,
- "attr1", 0,
- "attr2", 0
- ));
- ```
- </div>
- </div>
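If no SDK is installed, the same query can be sent with the Python standard library alone. The sketch below only builds the HTTP request, so it can be inspected without a running engine; the helper name `build_query_request` is hypothetical, and the default URL assumes the engine is deployed at `http://localhost:8000` as in the examples above.

```python
import json
import urllib.request


def build_query_request(attr0, attr1, attr2, engine_url="http://localhost:8000"):
    """Build a POST request for the engine's queries.json endpoint."""
    body = json.dumps({"attr0": attr0, "attr1": attr1, "attr2": attr2})
    return urllib.request.Request(
        engine_url + "/queries.json",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_query_request(2, 0, 0)
print(request.full_url)

# With the engine deployed, the request could be sent like this:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```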
- WARNING: The query format changed in version v0.3.1. If you are using Classification template version v0.3.0 or earlier, the query is an array of feature values instead: `{ "features": [2, 0, 0] }`.
- The following is a sample JSON response:
- ```
- {"label":0.0}
- ```
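The label in the response is a floating-point number, while the *plan* values in the training data are integers. A client may therefore want to convert the label back to an integer plan. The following is a minimal sketch that assumes the label always encodes a whole-number plan value; the function name `parse_predicted_plan` is hypothetical.

```python
import json


def parse_predicted_plan(response_text):
    """Extract the predicted plan from the engine's JSON response."""
    result = json.loads(response_text)
    # Assumption: the label is a float that encodes an integer plan value.
    return int(result["label"])


plan = parse_predicted_plan('{"label":0.0}')
print(plan)
```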
- Similarly, to predict the label (i.e. *plan* in this case) of a user with
- attr0=4, attr1=3 and attr2=8, you send this JSON `{ "attr0": 4, "attr1": 3, "attr2": 8 }` to
- the deployed engine and it will return a JSON of the predicted plan.
- WARNING: For classification template version v0.3.0 or earlier, the query JSON would be `{ "features": [4, 3, 8] }`.
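A client that has to talk to both old and new template versions can convert between the two query formats. The sketch below assumes the feature array is ordered as attr0, attr1, attr2, which matches the paired examples in this guide; the function names are hypothetical.

```python
def to_legacy_query(query):
    """Convert a v0.3.1+ query into the v0.3.0-or-earlier array format."""
    # Assumption: features are ordered attr0, attr1, attr2.
    return {"features": [query["attr0"], query["attr1"], query["attr2"]]}


def from_legacy_query(legacy):
    """Convert a v0.3.0-or-earlier query into the v0.3.1+ format."""
    attr0, attr1, attr2 = legacy["features"]
    return {"attr0": attr0, "attr1": attr1, "attr2": attr2}


new_query = {"attr0": 4, "attr1": 3, "attr2": 8}
legacy_query = to_legacy_query(new_query)
print(legacy_query)
```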
- *MyClassification* is now running.
- <%= partial 'shared/quickstart/production' %>
- Next, we are going to take a look at the engine
- architecture and explain how you can customize it completely.
- #### [Next: DASE Components Explained](/templates/classification/dase/)