Version: v1.7.0

Document Ingestion Troubleshooting

Overview

AlphaSense offers two ways for customers to ingest documents to our platform:

  • UI Upload: Customers can upload documents using the File upload feature in the user interface or through third-party integrations such as Microsoft OneNote.
  • Ingestion API: Customers can programmatically push content to AlphaSense using the Ingestion API.

This document outlines common failure scenarios encountered during document ingestion and provides troubleshooting steps for each. Users should work through these steps before contacting AlphaSense support.

To ensure the stability of our platform and meet Service Level Agreement (SLA) commitments, AlphaSense has established certain limits:

  • User Document Initial Ingestion: The maximum volume allowed per day is 1TB.
  • User Document Incremental Ingestion: The maximum volume allowed per day is 100GB.
Note:

These limits may vary based on the terms of your contract. Refer to your contract for the exact figures.

Failure Scenarios

Slow Document Ingestion Rate

Description:

When documents are uploaded to AlphaSense, they undergo processing in our data ingestion pipeline before being indexed and presented on the user interface. Throughout this process, the following issues may arise:

  • Document metadata may temporarily be missing.
  • A low document ingestion rate may be experienced.

Triage:

AlphaSense utilizes internal queues to prioritize document processing within the pipeline. Monitoring the contents of these queues and the processing speed can help identify where slowdowns are occurring.

Note:

We are working on making the queue size and processing speed observable via Grafana. This capability is not yet available.

Troubleshoot:

To increase document ingestion speed, users can scale up the following services using the kubectl command:

  • applications/dp-userdocprocessing
  • platform-search/solr-queue-indexer-usercontent
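
As a rough illustration of the scaling step, the sketch below assumes that each entry above is written as namespace/name and that both services run as Kubernetes Deployments; neither assumption is confirmed by this document, so check the resource kind first (for example with kubectl get deployments -n applications). The REPLICAS and DRY_RUN variables and the scale_service helper are hypothetical names introduced for this example. By default the script only prints the commands it would run.

```shell
#!/bin/sh
# Sketch only: assumes "namespace/name" entries that refer to Deployments.
set -eu

REPLICAS="${REPLICAS:-3}"   # hypothetical target replica count
DRY_RUN="${DRY_RUN:-1}"     # 1 prints the commands; 0 runs them

scale_service() {
  # $1 = "namespace/name", $2 = replica count
  namespace="${1%%/*}"      # text before the first "/"
  name="${1##*/}"           # text after the last "/"
  if [ "$DRY_RUN" = "1" ]; then
    echo "kubectl scale deployment $name -n $namespace --replicas=$2"
  else
    kubectl scale deployment "$name" -n "$namespace" --replicas="$2"
  fi
}

for target in \
  applications/dp-userdocprocessing \
  platform-search/solr-queue-indexer-usercontent
do
  scale_service "$target" "$REPLICAS"
done
```

Running the script with DRY_RUN=0 and REPLICAS set to the desired count applies the change. An appropriate replica count depends on cluster capacity and is not specified here.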

In the near future, we aim to streamline this process for a better user experience by offering options through our development portal or a dedicated helper script.

Document Tags Shared via Ingestion API Not Displaying

Description:

After a user uploads a document with custom tags via the Ingestion API, the tags may not be visible when searching for or opening the document once it has been indexed.

Triage:

Open a private document using its known docid and verify that the custom tags do not appear.

Troubleshoot:

Follow the instructions for running /bulk/modify/metadata to trigger document re-ingestion.

Error Encountered When Deleting Document

Description:

An error occurs when deleting a document via the UI or the Ingestion API.

Triage:

Error events are recorded in the dp-eventsdataservice log.

Troubleshoot:

If the error indicates a user-permission related issue, please verify the user's permissions.
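
One possible way to narrow the dp-eventsdataservice log down to delete-related errors is sketched below. It reuses the data-platform namespace and app label from the kubectl example elsewhere in this document. The filter_delete_errors helper is a hypothetical name, and the assumption that relevant lines contain both "error" and "delet" is a guess about the log format, not something this document confirms.

```shell
#!/bin/sh
set -eu

# Keep only log lines that mention both an error and a delete operation.
# Assumption: such lines contain "error" and "delet" (delete/deleting/deletion),
# matched case-insensitively; the real log format may differ.
filter_delete_errors() {
  grep -i "error" | grep -i "delet" || true   # no match is not treated as a failure
}

# Against a live cluster, pipe the service log through the filter:
#   kubectl logs -n data-platform -l 'app=dp-eventsdataservice' --tail=1000 | filter_delete_errors
```

If the filter returns nothing, dropping the second grep shows all recent errors, which may help when the delete failure is logged under different wording.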

Uploaded Documents with Default Sharing Preference Not Visible to All Group Members

Description:

After a user finishes uploading documents via the UI or Ingestion API, other users in the same group are unable to see the documents.

Note:

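This scenario applies primarily to documents uploaded by means other than the OneNote integration. Documents uploaded through OneNote are affected by a separate known issue, described under Known Issues.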

Triage:

Steps to reproduce:

  • Impersonate the user.
  • Upload a document via the UI.
  • Verify that the upload succeeds and the document is visible.
  • Impersonate another user in the same group.
  • Verify that the uploaded document is not visible.

Troubleshoot:

  • Check the dp-eventsdataservice logs for errors related to user permissions. Example kubectl command:
kubectl logs -n data-platform -l 'app=dp-eventsdataservice' --tail=1000 | grep -i "error"

Example Loki query: {app="dp-eventsdataservice"}
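
The Loki query above returns every log line for the service. A narrower query can be built by appending a LogQL line filter, as in the sketch below. The build_logql helper is a hypothetical name, and the "error|permission" pattern is only an illustrative guess at useful keywords. The commented logcli invocation assumes Grafana's logcli is installed and LOKI_ADDR points at your Loki instance, which this document does not confirm.

```shell
#!/bin/sh
set -eu

# Build a LogQL query that keeps only lines matching a case-insensitive pattern.
# $1 = value of the "app" label, $2 = regular expression (must not contain double quotes)
build_logql() {
  printf '{app="%s"} |~ "(?i)%s"\n' "$1" "$2"
}

QUERY="$(build_logql dp-eventsdataservice 'error|permission')"
echo "$QUERY"

# With logcli available, the query could be run as:
#   logcli query --since=1h --limit=1000 "$QUERY"
```

The printed query can also be pasted directly into a Loki query editor.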

Known Issues

Document Permissions Not Shared When Uploaded Using OneNote Integration

Due to current limitations, document permissions from OneNote are not automatically synced after upload. Users must share these documents manually.