Working with Content Admin Endpoints


note

Overusing these endpoints can clog your system.

Keep the number of documents per run reasonable and proceed in small, steady steps.

Ingestion from other sources might become delayed if these endpoints are overused.

These are maintenance endpoints for occasional use.


Some content might not be reflected correctly in our application. You can run corrective operations on such content.


Each of these operations uses a two-phase protocol:

  1. Mark the content.

  2. Execute the operations on the content.
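
The two phases can be sketched in Python as follows. This is a minimal illustration, not the product's client: `send` is a hypothetical callable that posts a GraphQL document with variables to the Ingestion endpoint and returns the parsed JSON response. The sketch assumes the mark mutation accepts a `$where` variable, as the re-embedding request in this guide does, and relies on the GraphQL convention that a failed request carries an `errors` entry in its response.

```python
def run_two_phase(send, mark_mutation, execute_mutation, where):
    """Run phase 1 (mark) and, only if it succeeded, phase 2 (execute).

    `send(query, variables)` is a hypothetical helper that returns the
    parsed GraphQL response as a dict.
    """
    marked = send(mark_mutation, {"where": where})
    if marked.get("errors"):
        # Marking failed: skip phase 2 so nothing runs on a stale selection.
        return marked, None
    executed = send(execute_mutation, None)
    return marked, executed
```

Stopping after a failed marking step keeps the second phase from processing content that was marked in an earlier, unrelated run.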

info

You will need admin access rights for the following queries.


Execute the queries against the Ingestion GraphQL endpoint.

For this you need an access token, which you can obtain as follows:

info

Get an Auth Token using this guideline: How to get a Token for the GraphQL APIs
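
With a token in hand, a request to the Ingestion GraphQL endpoint could be assembled as shown here. This is a minimal sketch using only the Python standard library. The URL pattern and headers mirror the curl examples in this guide; the function names `build_request` and `send_request` are illustrative and not part of any SDK.

```python
import json
import urllib.request

# Endpoint pattern taken from the curl examples in this guide.
ENDPOINT_TEMPLATE = "https://gateway.{tenant}.unique.app/ingestion/graphql"


def build_request(tenant, token, query, variables=None):
    """Build (but do not send) a POST request for the Ingestion GraphQL endpoint."""
    payload = {"query": query}
    if variables is not None:
        payload["variables"] = variables
    return urllib.request.Request(
        ENDPOINT_TEMPLATE.format(tenant=tenant),
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


def send_request(request, timeout=30):
    """Send the request and return the decoded JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Keeping request construction separate from sending makes it possible to inspect the exact payload before anything reaches the ingestion service.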

Checking the content

graphql
query Content {
    content(where: { key: { contains: "MediGroup" } }) {
        metadata
        key
        id
    }
}

Re-Indexing vectors

Synchronizes the vectors from Postgres with the VectorDB.

This can only be done for files in the state FINISHED.

graphql
mutation MarkAllForReindexing {
    markAllForReindexing(where: { key: { contains: "MediGroup" } })
}

graphql
mutation ReIndexVectorDB {
    reIndexVectorDB(waitAfterRounds: 40, waitInMs: 250)
}

Re-Embed the text of chunks

In some cases, for example after changing the embedding model, new embeddings must be created from the text of the content's chunks. This can be done with the following two API requests:

Mark the content as RE_EMBEDDING

bash
curl --location 'https://gateway.<tenantName>.unique.app/ingestion/graphql' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer <token>' \
--data '{"query":"mutation MarkForReembedding($where: ContentWhereInput) {\n  markForReembedding(where: $where)\n}","variables":{"where":{}}}'

Start the Re-Embedding

bash
curl --location 'https://gateway.<tenantName>.unique.app/ingestion/graphql' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer <token>' \
--data '{"query":"mutation ReembedFiles {\n  reembedFiles\n}"}'
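
Hand-escaping the JSON body for curl is error-prone, so the payload can also be generated. The sketch below builds the `--data` body for the marking request with a narrower filter than the `{}` used in the curl example. It rests on two assumptions: an empty `where` presumably selects all content, and `key: { contains: ... }` is accepted by `ContentWhereInput` in the same way as in the other filters in this guide. The helper name and the `allow_all` flag are hypothetical.

```python
import json

# Mutation text copied from the marking request in this guide.
MARK_FOR_REEMBEDDING = (
    "mutation MarkForReembedding($where: ContentWhereInput) {\n"
    "  markForReembedding(where: $where)\n"
    "}"
)


def reembedding_mark_payload(where, allow_all=False):
    """Return the JSON request body for the marking step as a string."""
    if not where and not allow_all:
        # Assumption: an empty filter would mark every document.
        raise ValueError("empty filter; pass allow_all=True to mark everything")
    return json.dumps({"query": MARK_FOR_REEMBEDDING, "variables": {"where": where}})
```

Calling `reembedding_mark_payload({"key": {"contains": "MediGroup"}})` yields a string that can be passed to curl via `--data`.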

Rebuilding Metadata

This re-adds the default metadata to the vectors while keeping the current metadata fields intact. It is needed once you use metadata filtering, for example for search. Eventually, all data needs to be migrated this way.

This can only be done for files in the state FINISHED.

graphql
mutation MarkAllForRebuidingMetadata {
    markAllForRebuidingMetadata(where: { key: { contains: "MediGroup" } })
}

graphql
mutation RebuildMetadata {
    rebuildMetadata(waitAfterRounds: 40, waitInMs: 250)
}

Check Integrity

This compares the vectors of the content in the Postgres database with those in the VectorDB. If they are out of sync, it triggers re-indexing to bring the vectors back into sync, repairing the state.

This can only be done for files in the state FINISHED.

graphql
mutation MarkAllForVectorDataIntegrityCheck {
    markAllForVectorDataIntegrityCheck(where: { key: { contains: "MediGroup" } })
}

graphql
mutation CheckVectorDataIntegrity {
    checkVectorDataIntegrity(waitAfterRounds: 40, waitInMs: 250)
}

Re-Ingest

Sometimes the current ingestion of a file has an issue, for example chunks that are too long because tables were not split cleanly. In that case you can re-ingest the whole file.

Limitations
You can only re-ingest files that are stored in our blob storage; this is handled automatically, though.
This can only be done for files in the state FINISHED or FAILED.

This uses the same producer-worker principle as normal ingestion.

info

Calling this mutation from scripts or automations will clog your ingestion queue and delay the ingestion of all other documents.

graphql
mutation MarkAllForReingestion {
    markAllForReingestion(where: { key: { contains: "MediGroup" } })
}

graphql
mutation ReingestFiles {
    reingestFiles(waitAfterRounds: 40, waitInMs: 250)
}

Force a certain state for content

Content may end up in the wrong state for various reasons. In that case it can be forced into a specific state.

note

Forcing content into states skips transitions!

graphql
mutation ForceIngestionState {
    forceIngestionState(
        ingestionState: FAILED,
        where: { id: { equals: "scope_1231231323" } }
    )
}
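
Because a forced state skips transitions, it can help to generate the mutation text for exactly one content ID instead of editing it by hand. The following is a rough sketch under stated assumptions: the helper name is hypothetical, and since the full list of ingestion states is not given in this guide, only the shape of the enum name is checked, not whether the API accepts it.

```python
import json
import re

# GraphQL enum values are bare names; only their shape is validated here.
_ENUM_NAME = re.compile(r"[A-Z][A-Z0-9_]*")


def force_state_mutation(content_id, state):
    """Build the ForceIngestionState mutation text for one content ID."""
    if not content_id:
        raise ValueError("content_id must not be empty")
    if not _ENUM_NAME.fullmatch(state):
        raise ValueError(f"not a valid enum name: {state!r}")
    # For plain ASCII IDs, json.dumps emits a double-quoted, escaped literal
    # that is also a valid GraphQL string.
    id_literal = json.dumps(content_id)
    return (
        "mutation ForceIngestionState {\n"
        f"    forceIngestionState(ingestionState: {state}, "
        f"where: {{ id: {{ equals: {id_literal} }} }})\n"
        "}"
    )
```

Requiring a non-empty ID keeps the filter from accidentally widening to more content than intended.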