Version: v1.7.0

Document Reprocessing

Some AlphaSense features use AI models that are periodically updated.

To apply model updates to documents that are already in the system, those documents need to be fed through the corresponding subsystem again so that the storage and/or search index is updated.

We are working on making these operations as automatic as possible, but currently they require some manual actions.

Reprocessing Manager

reprocessing-manager is a service that coordinates reprocessing requests in the cluster by calling processing services to process documents.

Currently, reprocessing requests are executed by port-forwarding to the reprocessing-manager pod and submitting the request with a command-line tool such as curl.

The exact request depends on the use case and is provided by AlphaSense support.

Prerequisite

  • kubectl command-line tool, version v1.25.16 or later
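
The version requirement above can be checked before starting. The following is a minimal sketch, not an official tool: the helper names `version_at_least` and `kubectl_client_version` are hypothetical, and the sketch assumes that `kubectl version --client -o json` reports the client version in a `gitVersion` field.

```shell
# version_at_least HAVE WANT
# Succeeds when version HAVE is greater than or equal to WANT.
# Both arguments look like v1.25.16; a leading "v" is optional.
version_at_least() {
  printf '%s %s\n' "${1#v}" "${2#v}" | awk '{
    split($1, h, "."); split($2, w, ".")
    # Compare major, minor, and patch numerically, left to right.
    for (i = 1; i <= 3; i++) {
      if (h[i] + 0 > w[i] + 0) exit 0
      if (h[i] + 0 < w[i] + 0) exit 1
    }
    exit 0
  }'
}

# kubectl_client_version
# Prints the client version reported by kubectl (text-based extraction).
kubectl_client_version() {
  kubectl version --client -o json |
    sed -n 's/.*"gitVersion": *"\([^"]*\)".*/\1/p' | head -n 1
}

# Example usage:
#   if version_at_least "$(kubectl_client_version)" v1.25.16; then
#     echo "kubectl version is sufficient"
#   else
#     echo "kubectl v1.25.16 or later is required" >&2
#   fi
```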

Executing requests

Port-forward to the reprocessing-manager service:

kubectl port-forward -n platform service/reprocessing-manager 8080:80

Execute the request:

curl --location 'http://localhost:8080/internal/v2/submit' \
--header 'Content-Type: application/json' \
--data '{
"solrFilterQueries": [
"AcceptanceDate:[NOW-1DAYS TO NOW]"
],
"stages": ["reprocessing-postinghandler"]
}'

The call returns a task ID (a number) that can be used to track the status of the process.
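
Because the request body is hand-written JSON inside a shell quote, small mistakes such as a trailing comma or a missing closing quote are easy to make. One way to reduce that risk is to generate the body from its two variable parts. The sketch below is illustrative only: `build_payload` and `submit_request` are hypothetical helper names, the endpoint and field names are taken from the example above, and the filter query is assumed not to contain double quotes or backslashes (no JSON escaping is performed).

```shell
# build_payload FILTER_QUERY STAGE
# Prints a request body with one Solr filter query and one stage.
# Assumption: neither argument contains double quotes or backslashes.
build_payload() {
  printf '{"solrFilterQueries": ["%s"], "stages": ["%s"]}\n' "$1" "$2"
}

# submit_request FILTER_QUERY STAGE
# Sends the request through the port-forward on localhost:8080.
# --fail makes curl return a non-zero exit code on HTTP errors.
submit_request() {
  curl --silent --show-error --fail \
    --location 'http://localhost:8080/internal/v2/submit' \
    --header 'Content-Type: application/json' \
    --data "$(build_payload "$1" "$2")"
}

# Example usage (assuming the response body is just the numeric task ID):
#   task_id="$(submit_request 'AcceptanceDate:[NOW-1DAYS TO NOW]' 'reprocessing-postinghandler')"
#   echo "Submitted task ${task_id}"
```

The actual filter queries and stages for a given use case still come from AlphaSense support; the helper only removes the manual JSON editing.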

Tracking status

A separate endpoint reports the status of a task. For example, for task ID 1234:

curl -X 'GET' \
'http://localhost:8080/internal/status/1234' \
-H 'accept: */*'

It returns counts of documents by status:

{
"id": 1234,
"requestType": "REPROCESSING",
"status": "IN_PROGRESS",
"countByStatus": {
"QUEUED": 0,
"FINISHED": 0,
"IN_PROGRESS": 9
}
}
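
For long-running tasks it can be convenient to poll the status endpoint instead of calling it by hand. The following is a rough sketch under stated assumptions: `extract_status` and `wait_for_task` are hypothetical helpers, the port-forward is assumed to be active on localhost:8080, and any task status other than IN_PROGRESS is treated as a reason to stop waiting (the full set of possible status values is not listed here). The status is extracted with sed for portability; a JSON-aware tool such as jq is more robust where available.

```shell
# extract_status RESPONSE
# Prints the value of the top-level "status" field of a status response.
extract_status() {
  printf '%s\n' "$1" |
    sed -n 's/.*"status": *"\([^"]*\)".*/\1/p' | head -n 1
}

# wait_for_task TASK_ID [INTERVAL_SECONDS]
# Polls the status endpoint until the task is no longer IN_PROGRESS.
# Returns non-zero if the status request itself fails.
wait_for_task() {
  task_id="$1"
  interval="${2:-30}"
  while :; do
    response="$(curl --silent --show-error --fail \
      "http://localhost:8080/internal/status/${task_id}")" || return 1
    status="$(extract_status "$response")"
    printf '%s task %s: %s\n' "$(date '+%H:%M:%S')" "$task_id" "$status"
    # Stop as soon as the task reports anything other than IN_PROGRESS.
    [ "$status" = "IN_PROGRESS" ] || break
    sleep "$interval"
  done
}

# Example usage: check task 1234 every 60 seconds.
#   wait_for_task 1234 60
```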