Document Ingestion Troubleshooting
Overview
AlphaSense offers two ways for customers to ingest documents into our platform:
- UI Upload: Customers can upload documents using the File upload feature in the user interface or through third-party integrations such as Microsoft OneNote.
- Ingestion API: Customers can programmatically push content to AlphaSense using the Ingestion API.
This document outlines common failure scenarios encountered during document ingestion and provides troubleshooting steps for users. We recommend following these steps before contacting AlphaSense support.
To ensure the stability of our platform and meet Service Level Agreement (SLA) commitments, AlphaSense has established certain limits:
- User Document Initial Ingestion: The maximum volume allowed per day is 1 TB.
- User Document Incremental Ingestion: The maximum volume allowed per day is 100 GB.
Please note that these limits may vary based on the terms of your contract. Refer to your contract terms for precise numbers.
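As an illustration of how a client might guard against exceeding these limits before pushing a batch, here is a minimal Python sketch. It is not part of any AlphaSense tooling. The function name, the mode keys, and the use of decimal units (1 TB = 10^12 bytes) are assumptions made for this example, and the figures should be replaced with the numbers in your contract.

```python
# Daily volume limits as stated above; contract terms may override them.
# Assumption: decimal units (1 TB = 10**12 bytes). The document does not
# say whether decimal or binary units apply.
DAILY_LIMIT_BYTES = {
    "initial": 10**12,            # 1 TB per day
    "incremental": 100 * 10**9,   # 100 GB per day
}


def remaining_daily_volume(mode: str, bytes_already_sent: int, batch_bytes: int) -> int:
    """Return the bytes left for today after sending the batch.

    A negative result means the batch would exceed the daily limit.
    """
    limit = DAILY_LIMIT_BYTES[mode]
    return limit - bytes_already_sent - batch_bytes


if __name__ == "__main__":
    # Hypothetical numbers: 80 GB already sent today, 30 GB batch pending.
    left = remaining_daily_volume("incremental", 80 * 10**9, 30 * 10**9)
    print("within limit" if left >= 0 else f"over limit by {-left} bytes")
```

A check like this would run on the client side before calling the Ingestion API, so that an oversized batch can be split across days rather than submitted in one go.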
Failure Scenarios
Slow Document Ingestion Rate
Description:
When documents are uploaded to AlphaSense, they undergo processing in our data ingestion pipeline before being indexed and presented on the user interface. Throughout this process, the following issues may arise:
- Document metadata may temporarily be missing.
- A low document ingestion rate may be experienced.
Triage:
AlphaSense utilizes internal queues to prioritize document processing within the pipeline. Monitoring the contents of these queues and the processing speed can help identify where slowdowns are occurring.
We are working on providing the capability to observe queue size and processing speed via Grafana. This feature is not currently available but is planned for a future release.
Troubleshoot:
To increase document ingestion speed, users can scale up the following services using the kubectl command:
- applications/dp-userdocprocessing
- platform-search/solr-queue-indexer-usercontent
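The scaling step above can be sketched as a small Python wrapper around `kubectl scale`. This is a hedged example, not an official helper. It assumes that each entry in the list has the form namespace/name and that both services run as Kubernetes Deployments; neither is confirmed by this document. The replica count is a placeholder, so check the actual resource kind and a suitable replica count for your cluster before executing anything.

```python
import shlex
import subprocess

# Targets copied from the list above.
# Assumption: each entry is "<namespace>/<workload name>".
TARGETS = [
    "applications/dp-userdocprocessing",
    "platform-search/solr-queue-indexer-usercontent",
]


def build_scale_command(target: str, replicas: int, kind: str = "deployment") -> list[str]:
    """Build the kubectl argv for scaling one workload.

    Assumption: the workload is a Deployment unless another kind is passed.
    """
    namespace, name = target.split("/", 1)
    return [
        "kubectl", "scale", f"{kind}/{name}",
        f"--replicas={replicas}",
        "--namespace", namespace,
    ]


def scale_all(replicas: int, execute: bool = False) -> list[str]:
    """Print each command; run it only when execute=True."""
    printed = []
    for target in TARGETS:
        cmd = build_scale_command(target, replicas)
        line = shlex.join(cmd)
        print(line)
        printed.append(line)
        if execute:
            subprocess.run(cmd, check=True)
    return printed


if __name__ == "__main__":
    # Dry run with a hypothetical replica count of 3.
    scale_all(replicas=3)
```

By default the script only prints the commands, which allows them to be reviewed first. Passing `execute=True` runs them against whatever cluster the current kubectl context points to.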
In the near future, we aim to streamline this process for a better user experience by offering options through our development portal or a dedicated helper script.