FairOS vector database

Now store your own vector database with fairOS

View the Project on GitHub fairDataSociety/FaVe

FaVe (FairOS Vector store)


FaVe is a truly decentralised, open-source vector database built on top of FairOS with the Fair Data Principles in mind.

IMPORTANT: FaVe is under heavy development and in an early beta stage. Abnormal behaviour and data loss may occur. We do not recommend using the same account from multiple installations at the same time. Doing so might corrupt your data.

How do I install FaVe?

Prerequisites And Requirements

Docker

You can get Docker from here.

Bee

You will need a running Bee node with a valid postage stamp ID.

We recommend Swarm Desktop for setting up your Bee node. Here is a guide for it.

FDP account

You will need an FDP/Fairdrive account to use FaVe. You can create one here.

Running FaVe

From Source

Export the following environment variables in your terminal:

export VERBOSE=                       
export BEE_API=
export RPC_API=
export STAMP_ID=
export VECTORIZER_URL=
export USER=
export PASSWORD=
export POD=

Note:

VECTORIZER_URL is optional; you can leave it unset if you provide embeddings generated from other sources.

Then run the following command

go run cmd/fave-server/main.go --port 1234 --keep-alive 6000m --write-timeout 6000m --read-timeout 6000m

With Docker

docker run -d \
    -e VERBOSE=true \
    -e BEE_API=<BEE_API> \
    -e RPC_API=<RPC_ENDPOINT_FOR_ENS_AUTH> \
    -e STAMP_ID=<STAMP_ID> \
    -e USER=<FAIROS_USERNAME> \
    -e PASSWORD=<FAIROS_PASSWORD> \
    -e POD=<POD_FOR_STORING_DB> \
    -e VECTORIZER_URL=<API_ENDPOINT_FOR_VECTORIZER> \
    -p 1234:1234 \
    fairdatasociety/fave:latest --port 1234 --host 0.0.0.0 --keep-alive 6000m --write-timeout 6000m --read-timeout 6000m

Alternatively, you can build the Docker image yourself.

# build
docker build -t fds/fave .

# run
docker run -d \
    -e VERBOSE=true \
    -e BEE_API=<BEE_API> \
    -e RPC_API=<RPC_ENDPOINT_FOR_ENS_AUTH> \
    -e STAMP_ID=<STAMP_ID> \
    -e USER=<FAIROS_USERNAME> \
    -e PASSWORD=<FAIROS_PASSWORD> \
    -e POD=<POD_FOR_STORING_DB> \
    -e VECTORIZER_URL=<API_ENDPOINT_FOR_VECTORIZER> \
    -p 1234:1234 \
    fds/fave --port 1234 --host 0.0.0.0 --keep-alive 6000m --write-timeout 6000m --read-timeout 6000m

How does FaVe work?

FaVe currently supports only text vectorization.

FaVe first generates vector representations (embeddings) using the chosen vectorizer. It then determines the nearest neighbors of each document based on these embeddings. Finally, it uploads the content, followed by the nearest-neighbor information.

When searching for a term, FaVe computes the query's distance from a designated starting point and then looks for matches among the precomputed nearest neighbors.
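The search step can be pictured as a greedy walk over the precomputed neighbor lists. The Go sketch below is an assumption about the general technique rather than FaVe's implementation: starting from a fixed entry node, it repeatedly moves to whichever neighbor is closer to the query and stops when no neighbor improves the distance. For brevity it uses squared Euclidean distance and a tiny hand-built graph; greedySearch and sqDist are hypothetical names.

```go
package main

import "fmt"

// sqDist is the squared Euclidean distance (an assumed metric for this sketch).
func sqDist(a, b []float64) float64 {
	var s float64
	for i := range a {
		d := a[i] - b[i]
		s += d * d
	}
	return s
}

// greedySearch starts at entry and keeps moving to the closest neighbor of the
// current node while that neighbor is nearer to the query. It returns the node
// where no neighbor improves the distance (a local minimum).
func greedySearch(vecs [][]float64, neighbors [][]int, entry int, query []float64) int {
	current := entry
	best := sqDist(vecs[current], query)
	for {
		next := current
		for _, n := range neighbors[current] {
			if d := sqDist(vecs[n], query); d < best {
				best, next = d, n
			}
		}
		if next == current {
			return current
		}
		current = next
	}
}

func main() {
	// Points on a line, each linked to its immediate neighbors.
	vecs := [][]float64{{0}, {1}, {2}, {3}, {4}}
	neighbors := [][]int{{1}, {0, 2}, {1, 3}, {2, 4}, {3}}
	fmt.Println(greedySearch(vecs, neighbors, 0, []float64{3.2}))
}
```

A single greedy walk can stop at a local minimum; graph indexes commonly keep a list of several candidates during the walk to reduce that risk.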

How to put data in FaVe?

FaVe uses fairOS internally: fairOS is embedded directly in FaVe rather than accessed through its REST APIs. FaVe itself exposes a set of REST APIs for its own operations.

Before we go any further we need these concepts cleared up:

How do we perform data upload?

FaVe provides a set of REST APIs for creating collections, adding documents, and retrieving nearest documents.

Before uploading data as documents, some preprocessing is required.

Here are the steps:

  1. Start the vectorizer service.
  2. Launch FaVe with your fairOS credentials, the Bee endpoint, and a postage batch (stamp ID).
  3. Prepare the data for uploading.
  4. Create a collection.
  5. Upload the documents into FaVe.

The documents must be prepared in a specific format before they are uploaded via the REST API:

{
  "name": "collection1",
  "propertiesToIndex": ["property1"],
  "documents": [
    {
      "id": "721dfcef-5b95-4eeb-99fc-841784a397df",
      "properties": {
        "property1": "foo1",
        "property2": "bar1"
      }
    },
    {
      "id": "721dfcef-5b95-4eeb-99fc-841784a397de",
      "properties": {
        "property1": "foo2",
        "property2": "bar2"
      }
    }
  ]
}

This is an example request body for adding documents.

The name field is the name of the collection. The propertiesToIndex field is an array of the properties to index (vectorize) in the vector database; in this example, only property1 is indexed.

The documents array contains the documents to upload. Each document has a unique id and a properties object, which holds the document's features as key-value pairs. All documents should have the same properties.

Once we have the data in the correct format, we can upload it to FaVe.

How to search in FaVe?

FaVe provides a REST API for retrieving the nearest documents from a collection, given a query and a maximum distance.

The response contains the nearest documents along with their properties and their distances from the query.