PredictionIO: An Open-Source Recommender System

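PredictionIO is an open-source machine learning server written in Scala that makes it easy to build a recommendation engine behind a RESTful API. Its core is a scalable machine learning library built on Spark that provides a complete end-to-end pipeline, so you can build a recommender system from scratch with little effort.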

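PredictionIO consists of three components:

PredictionIO platform
Event Server: collects data from your application, either in real time or on a schedule.
Engine: trains the model and serves the results for querying through a RESTful API.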

Install

The project provides a quick one-line installer; you can also install it manually.

$ bash -c "$(curl -s https://install.prediction.io/install.sh)"
$ PATH=$PATH:/home/yourname/PredictionIO/bin; export PATH

Run the following command to check whether the installation succeeded. It reports the connection status of each component.

$ pio status

### Return:
[INFO] [Console$] Inspecting PredictionIO...
[INFO] [Console$] PredictionIO 0.9.6 is installed at ...
[INFO] [Console$] Inspecting Apache Spark...
[INFO] [Console$] Apache Spark is installed at ...
[INFO] [Console$] Apache Spark 1.6.0 detected ...
[INFO] [Console$] Inspecting storage backend connections...
[INFO] [Storage$] Verifying Meta Data Backend (Source: MYSQL)...
[INFO] [Storage$] Verifying Model Data Backend (Source: MYSQL)...
[INFO] [Storage$] Verifying Event Data Backend (Source: MYSQL)...
[INFO] [Storage$] Test writing to Event Store (App Id 0)...
[INFO] [Console$] (sleeping 5 seconds for all messages to show up...)
[INFO] [Console$] Your system is all ready to go.

Quick Start

Step 1. Run PredictionIO

Start PredictionIO first. The command depends on which storage backend you use.

# If you are using PostgreSQL or MySQL, run the following to start the PredictionIO Event Server
$ pio eventserver &

or

# If instead you are running HBase and Elasticsearch, run the following to start the PredictionIO Event Server, HBase, and Elasticsearch together
$ pio-start-all
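Before moving on, it can help to confirm that the Event Server is actually answering. The sketch below is one possible check in Python, not part of PredictionIO itself. It assumes the Event Server listens on its default port 7070 and answers a plain GET on the root URL with a small JSON status document; `is_alive` and `check_event_server` are hypothetical helper names.

```python
import json
import urllib.request

# Assumption: the Event Server runs locally on its default port 7070.
EVENT_SERVER_URL = "http://localhost:7070"


def is_alive(body: str) -> bool:
    """Return True if a response body looks like {"status": "alive"}."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return False
    return isinstance(payload, dict) and payload.get("status") == "alive"


def check_event_server(url: str = EVENT_SERVER_URL, timeout: float = 5.0) -> bool:
    """Fetch the server root and report whether it answers as alive."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return is_alive(resp.read().decode("utf-8"))
    except OSError:
        # Covers connection refused, timeouts, and HTTP errors.
        return False
```

Calling `check_event_server()` returns False rather than raising when nothing is listening, which makes it easy to use in a startup script that waits for the server.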

Step 2. Create a new Engine from an Engine Template

Choose a suitable engine from the Engine Templates gallery.

$ pio template get <template-repo-path> <your-app-directory>
$ cd MyRecommendation

You can pick an engine from the Engine Templates gallery or define your own. Here we use the Universal Recommender as the example.

Step 3. Generate an App ID and Access Key

Run the following command to create an app for the engine and obtain its access key.

$ pio app new MyRecommendation

### Return:
[INFO] [App$] Initialized Event Store for this app ID: 1.
[INFO] [App$] Created new app:
[INFO] [App$]       Name: MyRecommendation
[INFO] [App$]         ID: 1
[INFO] [App$] Access Key: ...

$ pio app list

### Return:
[INFO] [App$]         Name       |   ID | Access Key | Allowed Event(s)
[INFO] [App$]   MyRecommendation |    1 |   ...      | (all)
[INFO] [App$] Finished listing 1 app(s).

Step 4. Collecting Data

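Next, import the data. The most basic recommendation algorithm, Collaborative Filtering (CF), works with records made of three elements: user, action, item. Use data/import_eventserver.py to import data in this format into the database.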

$ curl <sample_data> --create-dirs -o data/<sample_data>
$ python data/import_eventserver.py --access_key <access-key>

data/sample_data.txt

...
0::2::3
0::3::1
3::9::4
6::9::1
...
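To make the user - action - item idea concrete, here is a minimal Python sketch that turns lines like the sample above into event dictionaries. It is not the actual import_eventserver.py script. It assumes the three columns are user::item::rating, that the action is named "rate", and that the field names follow the Event Server's JSON event format; `line_to_event` is a hypothetical helper.

```python
import json


def line_to_event(line: str) -> dict:
    """Convert one 'user::item::rating' line into an event dictionary."""
    user, item, rating = line.strip().split("::")
    return {
        "event": "rate",            # the action (assumed name)
        "entityType": "user",
        "entityId": user,           # who performed the action
        "targetEntityType": "item",
        "targetEntityId": item,     # what the action was applied to
        "properties": {"rating": float(rating)},
    }


# The four sample rows shown above.
sample = ["0::2::3", "0::3::1", "3::9::4", "6::9::1"]
events = [line_to_event(line) for line in sample]
print(json.dumps(events[0], indent=2))
```

Each dictionary could then be sent to the Event Server together with the access key from Step 3; the sketch stops at building the payload so that it runs without a server.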

Step 5. Deploy the Engine as a Service

Before deploying, set the basic parameters in engine.json, such as appName or the number of iterations the algorithm should run.

./engine.json

...
"datasource": {
  "params" : {
    "appName": "MyRecommendation"
  }
},
...

Make sure the appName parameter matches your app name.
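A mismatch between appName and the app created in Step 3 is easy to miss, so a small check can be worthwhile. The following Python sketch assumes the configuration is valid JSON with the datasource/params/appName layout shown above; `app_name_matches` is a hypothetical helper, not a PredictionIO command.

```python
import json


def app_name_matches(engine_json_text: str, expected_app_name: str) -> bool:
    """Check that datasource.params.appName equals the expected app name."""
    config = json.loads(engine_json_text)
    params = config.get("datasource", {}).get("params", {})
    return params.get("appName") == expected_app_name


# A stripped-down configuration containing only the fragment shown above.
example = """
{
  "datasource": {
    "params": {
      "appName": "MyRecommendation"
    }
  }
}
"""
print(app_name_matches(example, "MyRecommendation"))  # True
```

In practice you would read the text from your own engine.json file and pass the name reported by pio app list.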

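Deploying the engine as a web service takes three steps: pio build -> pio train -> pio deploy
Building compiles the engine and prepares the Spark environment. Training runs the algorithm to build the model. Deployment serves the result as a web service and exposes it through a RESTful API.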
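The three commands depend on each other, so a later step should only run if the previous one succeeded. The Python sketch below shows one way to chain them. It assumes pio is on your PATH; `run_command` and `run_pipeline` are hypothetical helpers, and the runner is injectable so the logic can be exercised without PredictionIO installed.

```python
import subprocess
from typing import Callable, Sequence

# The three stages, in the order they must run.
PIPELINE = (("pio", "build"), ("pio", "train"), ("pio", "deploy"))


def run_command(command: Sequence[str]) -> int:
    """Run one command and return its exit code."""
    return subprocess.run(list(command)).returncode


def run_pipeline(
    steps: Sequence[Sequence[str]] = PIPELINE,
    runner: Callable[[Sequence[str]], int] = run_command,
) -> list:
    """Run the steps in order and stop at the first failure.

    Returns the list of steps that completed successfully.
    """
    completed = []
    for step in steps:
        if runner(step) != 0:
            break  # do not train or deploy on top of a failed step
        completed.append(" ".join(step))
    return completed
```

Note that pio deploy keeps serving in the foreground, so with the default runner `run_pipeline()` only returns once the deployed server stops.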

Build and Train the Predictive Model

$ pio build

### Return:
[INFO] [Console$] Your engine is ready for training.


$ pio train

### Return:
[INFO] [CoreWorkflow$] Training completed successfully.

$ pio deploy

### Return:
[INFO] [HttpListener] Bound to /0.0.0.0:8000
[INFO] [MasterActor] Bind successful. Ready to serve.

Step 6. Use the Engine

Now query the engine. By default it listens on port 8000, and the query parameters are the user and the number of items to recommend.

$ curl -H "Content-Type: application/json" \
-d '{ "user": "1", "num": 4 }' http://localhost:8000/queries.json

### Return:
{
  "itemScores":[
    {"item":"22","score":4.072304374729956},
    {"item":"62","score":4.058482414005789},
    {"item":"75","score":4.046063009943821},
    {"item":"68","score":3.8153661512945325}
  ]
}
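The same query can be issued from application code. The Python sketch below uses only the standard library and assumes the deployed engine serves plain HTTP on localhost port 8000 and returns an itemScores list shaped like the response above; `build_query`, `top_items`, and `query_engine` are hypothetical helper names.

```python
import json
import urllib.request

# Assumption: the engine was deployed locally on the default port.
QUERY_URL = "http://localhost:8000/queries.json"


def build_query(user: str, num: int) -> bytes:
    """Encode the query body: the user and the number of items wanted."""
    return json.dumps({"user": user, "num": num}).encode("utf-8")


def top_items(response_text: str) -> list:
    """Return item IDs ordered from highest to lowest score."""
    scores = json.loads(response_text).get("itemScores", [])
    ranked = sorted(scores, key=lambda entry: entry["score"], reverse=True)
    return [entry["item"] for entry in ranked]


def query_engine(user: str, num: int, url: str = QUERY_URL) -> list:
    """POST the query to the deployed engine and return the ranked item IDs."""
    request = urllib.request.Request(
        url,
        data=build_query(user, num),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=10) as resp:
        return top_items(resp.read().decode("utf-8"))
```

With the engine running, `query_engine("1", 4)` corresponds to the curl call above; `top_items` can also be applied to a saved response without any server.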

License

This work was created by Chang, Wei-Yaun (v123582) and is
released under the Creative Commons Attribution-ShareAlike 3.0 Unported license.