Document Ingestion Troubleshooting
Overview
AlphaSense offers two ways for customers to ingest documents into our platform:
- UI Upload: Customers can upload documents using the File upload feature in the user interface or through third-party integrations such as Microsoft OneNote.
- Ingestion API: Customers can programmatically push content to AlphaSense using the Ingestion API.
This document outlines common failure scenarios encountered during document ingestion and provides troubleshooting steps for users. It is recommended that users follow these troubleshooting steps before reaching out to AlphaSense support.
To ensure the stability of our platform and meet Service Level Agreement (SLA) commitments, AlphaSense has established certain limits:
- User Document Initial Ingestion: The maximum volume allowed per day is 1TB.
- User Document Incremental Ingestion: The maximum volume allowed per day is 100GB.
Please note that these limits may vary based on the terms of your contract. Refer to your contract terms for precise numbers.
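As a rough pre-flight check, the size of a batch can be compared against the daily limit before it is pushed. The sketch below is illustrative only: it assumes the 100GB incremental limit stated above, assumes 1 GB = 10^9 bytes, and uses hypothetical helper names (batch_size_bytes, within_daily_limit) that are not part of any AlphaSense tooling. Substitute the limit from your contract.

```shell
#!/usr/bin/env bash
# Assumption: 100 GB incremental limit, with 1 GB = 10^9 bytes.
# Replace this value with the limit stated in your contract.
DAILY_LIMIT_BYTES=$((100 * 1000 * 1000 * 1000))

# Print the total size in bytes of all regular files under directory $1.
batch_size_bytes() {
  find "$1" -type f -exec wc -c {} \; | awk '{ sum += $1 } END { printf "%.0f\n", sum }'
}

# Succeed (exit status 0) if the byte count $1 does not exceed the daily limit.
within_daily_limit() {
  [ "$1" -le "$DAILY_LIMIT_BYTES" ]
}

# Example usage (the directory path is a placeholder):
#   size=$(batch_size_bytes /path/to/batch)
#   within_daily_limit "$size" || echo "Batch of $size bytes exceeds the daily limit"
```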
Failure Scenarios
Slow Document Ingestion Rate
Description:
When documents are uploaded to AlphaSense, they undergo processing in our data ingestion pipeline before being indexed and presented on the user interface. Throughout this process, the following issues may arise:
- Document metadata may temporarily be missing.
- A low document ingestion rate may be experienced.
Triage:
AlphaSense utilizes internal queues to prioritize document processing within the pipeline. Monitoring the contents of these queues and the processing speed can help identify where slowdowns are occurring.
We are working on exposing queue size and processing speed through Grafana. This capability is not yet available.
Troubleshoot:
To increase document ingestion speed, users can scale up the following services using the kubectl command:
- applications/dp-userdocprocessing
- platform-search/solr-queue-indexer-usercontent
In the near future, we aim to streamline this process for a better user experience by offering options through our development portal or a dedicated helper script.
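Until such tooling exists, the scaling step might look like the sketch below. It rests on assumptions this document does not confirm: that the prefix before the slash (for example, applications) is the Kubernetes namespace, that each service runs as a Deployment, and that the replica count is chosen by the operator. The scale_service helper is hypothetical. Verify the resource kind and namespace in your cluster before running it.

```shell
#!/usr/bin/env bash
# Assumptions (unconfirmed): "<namespace>/<name>" naming, Deployment resources.
# Set KUBECTL=echo to preview the command without touching the cluster.
KUBECTL="${KUBECTL:-kubectl}"

# Scale the service given as "<namespace>/<name>" ($1) to $2 replicas.
scale_service() {
  local target="$1" replicas="$2"
  local namespace="${target%%/*}" name="${target#*/}"
  "$KUBECTL" scale "deployment/${name}" --namespace "${namespace}" --replicas="${replicas}"
}

# Example usage (a replica count of 4 is purely illustrative):
#   scale_service applications/dp-userdocprocessing 4
#   scale_service platform-search/solr-queue-indexer-usercontent 4
```

Running with KUBECTL=echo first is a cheap way to confirm the generated command before it is applied.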
Document Tags Shared via Ingestion API Not Displaying
Description:
After a user uploads a document via the ingestion API with custom tags, they may find that upon indexing, these tags are not visible when searching or opening the document.
Triage:
Open a known private document by its docid and verify that the custom tags do not appear.
Troubleshoot:
Follow the instructions for running /bulk/modify/metadata to trigger document re-ingestion.
Error Encountered When Deleting Document
Description:
Errors occur when a user deletes a document via the UI or the Ingestion API.
Triage:
Error events are recorded in the dp-eventsdataservice log.
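To narrow that log down to deletion failures, a small filter such as the following may help. It is a sketch that assumes the relevant lines contain both the word "error" and some form of "delete"; the actual log format is not described here, so adjust the patterns as needed. The filter_delete_errors name is hypothetical.

```shell
#!/usr/bin/env bash
# Keep only log lines (read from stdin) that mention both an error and a delete.
# Assumption: the service logs these words in plain text, in any letter case.
filter_delete_errors() {
  grep -i 'error' | grep -i 'delet'
}

# Example usage (namespace and label as in the kubectl example in this document):
#   kubectl logs -n data-platform -l 'app=dp-eventsdataservice' --tail=1000 | filter_delete_errors
```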
Troubleshoot:
If the error indicates a user-permission related issue, please verify the user's permissions.
Uploaded Documents with Default Sharing Preference Not Visible to All Group Members
Description:
After a user finishes uploading documents via the UI or Ingestion API, other users in the same group are unable to see the documents.
This issue primarily affects documents uploaded through the OneNote integration, which are not shared automatically because of a known issue (see Known Issues).
Triage:
Steps to reproduce:
- Impersonate the user
- Upload a document via UI
- Verify that the upload is successful and document is visible
- Impersonate another user
- Verify that the uploaded document is not visible
Troubleshoot:
- Check the dp-eventsdataservice logs for user-permission-related errors. Example kubectl command:
kubectl logs -n data-platform -l 'app=dp-eventsdataservice' --tail=1000 | grep -i "error"
Example Loki query: {app="dp-eventsdataservice"}
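The Loki selector above returns every log line for the service. A LogQL line filter can restrict it to error lines, mirroring the grep in the kubectl example. The sketch below builds such a query. The loki_error_query helper is hypothetical, and the logcli invocation assumes logcli is installed and LOKI_ADDR points at your Loki instance.

```shell
#!/usr/bin/env bash
# Build a LogQL query for app $1 that keeps lines matching pattern $2
# (default "error") case-insensitively.
# Assumption: the pattern is a simple word without quotes or backslashes.
loki_error_query() {
  local app="$1" pattern="${2:-error}"
  printf '{app="%s"} |~ "(?i)%s"\n' "$app" "$pattern"
}

# Example usage (time window and limit are illustrative):
#   logcli query --since=1h --limit=1000 "$(loki_error_query dp-eventsdataservice)"
```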
Known Issues
Document Permissions Not Shared When Uploaded Using OneNote Integration
Due to limitations, document permissions from OneNote are not automatically synced after upload. Users are required to manually share documents.