Document Reprocessing
Some AlphaSense features use AI models that are periodically updated.
To apply a model update to documents already in the system, those documents must be fed through the corresponding subsystem again so that the storage and/or search index is updated.
We are working on making these operations as automatic as possible, but they currently require some manual steps.
Reprocessing Manager
reprocessing-manager is a service that coordinates reprocessing requests in the cluster by calling processing services to process documents.
Currently, reprocessing requests must be executed by port-forwarding to the reprocessing-manager pod and submitting the request with a command-line tool such as curl.
The exact request depends on the use case and is provided by AlphaSense support.
Prerequisite
- kubectl command-line tool, version v1.25.16 or later
Executing requests
Port-forward to the reprocessing-manager pod:
kubectl port-forward -n platform service/reprocessing-manager 8080:80
Execute the request:
curl --location 'http://localhost:8080/internal/v2/submit' \
--header 'Content-Type: application/json' \
--data '{
  "solrFilterQueries": [
    "AcceptanceDate:[NOW-1DAYS TO NOW]"
  ],
  "stages": ["reprocessing-postinghandler"]
}'
The call returns a task id (a number) that can be used to track the status of the process.
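When the submission is scripted, generating the request body in one place helps avoid hand-editing mistakes in the JSON, and the returned id can be captured for later status checks. The following is a minimal sketch, not a supported tool: the function names build_payload, extract_task_id, and submit_reprocessing are hypothetical, the sketch handles a single filter query and a single stage, and it assumes the task id is the first run of digits in the response body. The filter query and stage to use still come from AlphaSense support.

```shell
#!/usr/bin/env bash
# Sketch only: these function names are hypothetical and not part of
# reprocessing-manager itself.

BASE_URL="${BASE_URL:-http://localhost:8080}"  # local end of the port-forward

# Print a request body for one Solr filter query ($1) and one stage ($2).
# Backslashes and double quotes in the arguments are escaped for JSON.
build_payload() {
  local fq stage
  fq="$(printf '%s' "$1" | sed 's/\\/\\\\/g; s/"/\\"/g')"
  stage="$(printf '%s' "$2" | sed 's/\\/\\\\/g; s/"/\\"/g')"
  printf '{"solrFilterQueries": ["%s"], "stages": ["%s"]}\n' "$fq" "$stage"
}

# Read a response body on stdin and print the first run of digits in it.
# Assumption: the task id is the first number in the response.
extract_task_id() {
  grep -o '[0-9][0-9]*' | head -n 1
}

# Submit the request and print the task id.
submit_reprocessing() {
  build_payload "$1" "$2" |
    curl -fsS --location "${BASE_URL}/internal/v2/submit" \
      --header 'Content-Type: application/json' \
      --data @- |
    extract_task_id
}
```

With the port-forward running, a call such as submit_reprocessing 'AcceptanceDate:[NOW-1DAYS TO NOW]' 'reprocessing-postinghandler' would print the task id, which can be stored in a shell variable for the status check.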
Tracking status
A separate endpoint reports the status of a task. For example, for task id 1234:
curl -X 'GET' \
'http://localhost:8080/internal/status/1234' \
-H 'accept: */*'
It returns counts of documents by status:
{
  "id": 1234,
  "requestType": "REPROCESSING",
  "status": "IN_PROGRESS",
  "countByStatus": {
    "QUEUED": 0,
    "FINISHED": 0,
    "IN_PROGRESS": 9
  }
}
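For long-running tasks it can be convenient to poll the status endpoint instead of re-running curl by hand. The sketch below is one possible approach under stated assumptions: the function names extract_status and wait_for_task are hypothetical, the response is assumed to have the shape shown above, and any top-level status other than IN_PROGRESS is treated as "stop polling", because the full set of status values is not documented here.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical, and the set of task statuses
# other than IN_PROGRESS is an assumption.

BASE_URL="${BASE_URL:-http://localhost:8080}"  # local end of the port-forward

# Read a status response on stdin and print its top-level "status" value.
# Uses tr and sed so that no JSON tool needs to be installed.
extract_status() {
  tr -d '\n' |
    sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([A-Z_]*\)".*/\1/p'
}

# Poll task $1 every $2 seconds (default 30) until its status is no longer
# IN_PROGRESS. Returns 1 if the status endpoint cannot be reached.
wait_for_task() {
  local task_id="$1" interval="${2:-30}" response status
  while :; do
    response="$(curl -fsS "${BASE_URL}/internal/status/${task_id}")" || return 1
    status="$(printf '%s' "$response" | extract_status)"
    echo "task ${task_id}: ${status:-unknown}"
    if [ "$status" != "IN_PROGRESS" ]; then
      return 0
    fi
    sleep "$interval"
  done
}
```

For example, wait_for_task 1234 60 would print one line per minute until the task leaves the IN_PROGRESS state. The port-forward must stay open for the whole time the loop runs.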