Verifiable XGBoost

In this tutorial, you will learn how to use the Giza stack through an XGBoost model.

Installation

To follow this tutorial, first complete the installation steps below.

Handling Python versions with Pyenv

You should install Giza tools in a virtual environment. If you’re unfamiliar with Python virtual environments, take a look at this guide. A virtual environment makes it easier to manage different projects and avoid compatibility issues between dependencies.

Install Python 3.11 using pyenv:

pyenv install 3.11.0

Set Python 3.11 as the local Python version:

pyenv local 3.11.0

Create a virtual environment using Python 3.11:

pyenv virtualenv 3.11.0 my-env

Activate the virtual environment:

pyenv activate my-env

Now, your terminal session will use Python 3.11 for this project.

Install Giza

Install Giza SDK

Install the CLI, agents, and zkcook using giza-sdk from PyPI:

pip install giza-sdk

You'll find more options for installing Giza in the installation guide.

Install Dependencies

You must also install the following dependencies:

pip install xgboost numpy scikit-learn

Setup

From your terminal, create a Giza user through our CLI in order to access the Giza Platform:

giza users create

After creating your user, log into Giza:

giza users login

Optional: you can create an API key for your user so that you do not have to regenerate your access token every few hours.

giza users create-api-key

Create and Train an XGBoost Model

We'll start by creating a simple XGBoost model using its scikit-learn interface and training it on scikit-learn's diabetes dataset.

import xgboost as xgb
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split

data = load_diabetes()
X, y = data.data, data.target

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Model hyperparameters
n_estimators = 2  # Number of trees
max_depth = 6  # Maximum depth of each tree

xgb_reg = xgb.XGBRegressor(n_estimators=n_estimators, max_depth=max_depth)
xgb_reg.fit(X_train, y_train)
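
Before serializing the model, it can be useful to record a baseline error on the held-out split, so that later verifiable predictions can be compared against local ones. The helper below is a minimal sketch and not part of the Giza SDK: `rmse` is a hypothetical name, and it accepts any two equal-length sequences of numbers (plain lists or NumPy arrays).

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-square error between two equal-length sequences of numbers."""
    y_true = list(y_true)
    y_pred = list(y_pred)
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("inputs must not be empty")
    # Average the squared differences, then take the square root
    squared_errors = [(float(t) - float(p)) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Small self-contained check with made-up numbers
print(rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))
```

With the model trained above, `rmse(y_test, xgb_reg.predict(X_test))` gives the baseline on the test split.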

Save the model

Save the model in JSON format:

from giza.zkcook import serialize_model
serialize_model(xgb_reg, "xgb_diabetes.json")
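
To sanity-check the saved file before transpiling, you can inspect it with the standard `json` module. This sketch assumes the file follows XGBoost's native JSON model layout (the same layout the post-processing snippet later in this tutorial reads `base_score` from). `summarize_xgb_json` and `summarize_xgb_file` are hypothetical helpers, and the key paths are assumptions that may differ across XGBoost versions.

```python
import json


def summarize_xgb_json(model):
    """Return the tree count and base_score from a parsed XGBoost JSON model.

    Assumes the native XGBoost layout with a tree booster:
    learner -> gradient_booster -> model -> trees.
    """
    learner = model["learner"]
    trees = learner["gradient_booster"]["model"]["trees"]
    return {
        "num_trees": len(trees),
        # Raw value exactly as stored in the file (often a string)
        "base_score": learner["learner_model_param"]["base_score"],
    }


def summarize_xgb_file(path):
    """Load a model JSON file from disk and summarize it."""
    with open(path, "r") as f:
        return summarize_xgb_json(json.load(f))


# Tiny hand-written stand-in for a real model file (illustration only)
example = {
    "learner": {
        "learner_model_param": {"base_score": "5E-1"},
        "gradient_booster": {"model": {"trees": [{}, {}]}},
    }
}
print(summarize_xgb_json(example))
```

If the layout assumption holds, `summarize_xgb_file("xgb_diabetes.json")` would be expected to report two trees for the model trained above, matching `n_estimators = 2`.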

Transpile your model to Orion Cairo

For more detailed information on transpilation, please consult the Transpiler resource.

We will use the Giza CLI to transpile our saved model to Orion Cairo.

! giza transpile xgb_diabetes.json --output-path xgb_diabetes

>>>>
[giza][2024-05-10 17:14:48.565] No model id provided, checking if model exists ✅
[giza][2024-05-10 17:14:48.567] Model name is: xgb_diabetes
[giza][2024-05-10 17:14:49.081] Model already exists, using existing model ✅
[giza][2024-05-10 17:14:49.083] Model found with id -> 588! ✅
[giza][2024-05-10 17:14:49.777] Version Created with id -> 2! ✅
[giza][2024-05-10 17:14:49.780] Sending model for transpilation ✅
[giza][2024-05-10 17:15:00.670] Transpilation is fully compatible. Version compiled and Sierra is saved at Giza ✅
⠙ Transpiling Model...
[giza][2024-05-10 17:15:01.337] Downloading model ✅
[giza][2024-05-10 17:15:01.339] model saved at: xgb_diabetes
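
The model and version IDs printed in the log are needed for the deployment step. If you script the workflow, one option is to pull them out of the captured CLI output with regular expressions. This is only a sketch: `parse_transpile_ids` is a hypothetical helper, and it assumes the log wording shown above ("Model found with id -> N" and "Version Created with id -> N"), which may differ for newly created models or between CLI releases.

```python
import re

# Patterns based on the sample log above; the wording is an assumption
MODEL_ID_PATTERN = re.compile(r"Model found with id -> (\d+)")
VERSION_ID_PATTERN = re.compile(r"Version Created with id -> (\d+)")


def parse_transpile_ids(log_text):
    """Return (model_id, version_id) found in the log text, or None for a missing ID."""
    model_match = MODEL_ID_PATTERN.search(log_text)
    version_match = VERSION_ID_PATTERN.search(log_text)
    model_id = int(model_match.group(1)) if model_match else None
    version_id = int(version_match.group(1)) if version_match else None
    return model_id, version_id


sample_log = """
[giza][2024-05-10 17:14:49.083] Model found with id -> 588!
[giza][2024-05-10 17:14:49.777] Version Created with id -> 2!
"""
print(parse_transpile_ids(sample_log))
```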

Deploy an inference endpoint

For more detailed information on inference endpoints, please consult the Endpoint resource.

Now that our model is transpiled to Cairo, we can deploy an endpoint to run verifiable inferences. We will use the Giza CLI again to deploy the endpoint. Be sure to replace model-id and version-id with the IDs provided during transpilation.

! giza endpoints deploy --model-id 588 --version-id 2

>>>>
▰▰▰▰▰▰▰ Creating endpoint!
[giza][2024-05-10 17:15:21.628] Endpoint is successful ✅
[giza][2024-05-10 17:15:21.635] Endpoint created with id -> 190 ✅
[giza][2024-05-10 17:15:21.636] Endpoint created with endpoint URL: https://endpoint-raphael-doukhan-588-2-72c9b3b8-7i3yxzspbq-ew.a.run.app 🎉

Run a verifiable inference

To run a verifiable inference, you could call the endpoint URL obtained after deployment directly. However, this approach requires you to serialize the input for the Cairo program manually and to deserialize the output yourself. To make this process more user-friendly and keep you within a Python environment, we've introduced a Python SDK designed to facilitate the creation of ML workflows and the execution of verifiable predictions. When you initiate a prediction, the SDK automatically retrieves the endpoint URL you deployed earlier, converts your input into a Cairo-compatible format, executes the prediction, and then converts the output back into a NumPy object.

import xgboost as xgb
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split

from giza.agents.model import GizaModel


MODEL_ID = 588  # Update with your model ID
VERSION_ID = 2  # Update with your version ID

def prediction(input, model_id, version_id):
    model = GizaModel(id=model_id, version=version_id)

    (result, proof_id) = model.predict(
        input_feed={"input": input}, verifiable=True, model_category="XGB"
    )

    return result, proof_id


def execution():
    # The input data type should match the model's expected input
    input = X_test[1, :]

    (result, proof_id) = prediction(input, MODEL_ID, VERSION_ID)

    print(f"Predicted value for input {input.flatten()[0]} is {result}")

    return result, proof_id


if __name__ == "__main__":
    data = load_diabetes()
    X, y = data.data, data.target

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42
    )
    _, proof_id = execution()
    print(f"Proof ID: {proof_id}")

>>>>
🚀 Starting deserialization process...
✅ Deserialization completed! 🎉
(175.58781, '546f8817fa454db78982463868440e8c')

If your problem is a binary classification problem, you need to post-process the result returned by the predict method. The following snippet computes the probability of class 1 (the same probability returned by XGBClassifier.predict_proba()):

import json
import math

def logit(x):
    return math.log(x / (1 - x))

def post_process_binary_pred(model_json_path, result):
    """
    Returns the probability of the positive class given a result from GizaModel.predict()
    
    Parameters:
    model_json_path (str): Path to the trained model in JSON format.
    result (float): Result from GizaModel.predict().
    
    Returns:
    float: Probability of the positive class.
    """
    with open(model_json_path, 'r') as f:
        xg_json = json.load(f)
    
    base_score = float(xg_json['learner']['learner_model_param']['base_score'])
    
    if base_score != 0:
        result = result + logit(base_score)
    final_score = 1 / (1 + math.exp(-result))
    
    return final_score

# Usage example
model_path = 'PATH_TO_YOUR_MODEL.json'  # Path to your model JSON file
predict_result = 3.45  # Example result from GizaModel.predict()
probability = post_process_binary_pred(model_path, predict_result)
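
To see what the post-processing does, note that it computes sigmoid(result + logit(base_score)): the raw margin is shifted by the log-odds of `base_score` and then squashed into a probability. The standalone sketch below repeats that arithmetic without reading a file, so the effect of `base_score` is easy to check. `margin_to_probability` is a hypothetical name, and the input values are made up.

```python
import math


def logit(p):
    """Log-odds of a probability p in the open interval (0, 1)."""
    return math.log(p / (1 - p))


def margin_to_probability(result, base_score):
    """Mirror post_process_binary_pred, taking base_score directly instead of a file."""
    if base_score != 0:
        result = result + logit(base_score)
    return 1 / (1 + math.exp(-result))


# With base_score = 0.5 the log-odds term is zero, so this is a plain sigmoid
print(margin_to_probability(3.45, 0.5))
# A base_score below 0.5 shifts the margin down and lowers the probability
print(margin_to_probability(3.45, 0.2))
```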

Download the proof

For more detailed information on proving, please consult the Prove resource.

Initiating a verifiable inference sets off a proving job on our server, sparing you the complexities of installing and configuring the prover yourself. Upon completion, you can download your proof.

First, let's check the status of the proving job to ensure that it has been completed.

Remember to substitute endpoint-id and proof-id with the specific IDs assigned to you throughout this tutorial.

$ giza endpoints get-proof --endpoint-id 190 --proof-id "546f8817fa454db78982463868440e8c"

>>>>
[giza][2024-03-19 11:51:45.470] Getting proof from endpoint 190 ✅
{
  "id": 664,
  "job_id": 831,
  "metrics": {
    "proving_time": 15.083126
  },
  "created_date": "2024-03-19T10:41:11.120310"
}
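
The numeric `id` in this response (664 here) is the value passed to `giza verify` later in this tutorial. If you capture the JSON part of the output in a script, a sketch like the following extracts it along with the proving time. `summarize_proof_status` is a hypothetical helper, and the field names are taken from the sample response above rather than from a documented schema.

```python
import json


def summarize_proof_status(response_text):
    """Extract the numeric proof id and proving time from a get-proof JSON response."""
    status = json.loads(response_text)
    return {
        "proof_id": status["id"],
        # metrics may be absent while the job is still running (assumption)
        "proving_time": status.get("metrics", {}).get("proving_time"),
    }


sample_response = """
{
  "id": 664,
  "job_id": 831,
  "metrics": {
    "proving_time": 15.083126
  },
  "created_date": "2024-03-19T10:41:11.120310"
}
"""
print(summarize_proof_status(sample_response))
```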

Once the proof is ready, you can download it.

$ giza endpoints download-proof --endpoint-id 190 --proof-id "546f8817fa454db78982463868440e8c" --output-path zk_xgboost.proof

>>>>
[giza][2024-03-19 11:55:49.713] Getting proof from endpoint 190 ✅
[giza][2024-03-19 11:55:50.493] Proof downloaded to zk_xgboost.proof ✅

It is best to wrap the proof-id in double quotes (") when you use the alphanumeric ID.

Verify the proof

Finally, you can verify the proof.

$ giza verify --proof-id 664

>>>>
[giza][2024-05-21 10:08:59.315] Verifying proof...
[giza][2024-05-21 10:09:00.268] Verification result: True
[giza][2024-05-21 10:09:00.270] Verification time: 0.437505093
