Version: v1.7.0

Backup and Restore

Your AlphaSense Enterprise Intelligence Private Cloud runs on cloud-managed services and native Kubernetes. Set the backup interval for each component according to your organization's Recovery Point Objective (RPO), which is the maximum amount of data loss, measured in time, that your organization can tolerate.
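With periodic backups, the worst-case data loss is roughly the longest gap between consecutive backups. The following is a minimal Python sketch of checking a backup history against an RPO. It assumes you can list the completion timestamps of your backups. The function names `worst_case_data_loss` and `meets_rpo` are hypothetical and are not part of AlphaSense.

```python
from datetime import datetime, timedelta
from typing import Sequence


def worst_case_data_loss(backup_times: Sequence[datetime], now: datetime) -> timedelta:
    """Largest span of writes a single failure could have lost in this history.

    A failure just before the next backup loses everything written since the
    previous one, so the exposure is the largest gap between consecutive
    backups, including the gap from the latest backup until `now`.
    """
    if not backup_times:
        raise ValueError("no backups recorded")
    points = sorted(backup_times) + [now]
    return max(later - earlier for earlier, later in zip(points, points[1:]))


def meets_rpo(backup_times: Sequence[datetime], now: datetime, rpo: timedelta) -> bool:
    """True if no failure in the observed history would have lost more than `rpo`."""
    return worst_case_data_loss(backup_times, now) <= rpo


# Example: backups at 00:00, 06:00, 12:00, and 20:00, checked at 22:00.
day = datetime(2024, 1, 1)
history = [day + timedelta(hours=h) for h in (0, 6, 12, 20)]
now = day + timedelta(hours=22)
print(worst_case_data_loss(history, now))           # 8:00:00
print(meets_rpo(history, now, timedelta(hours=6)))  # False
```

In this example, the single 8-hour gap between 12:00 and 20:00 breaks a 6-hour RPO, even though most backups ran every 6 hours. This sketch does not account for how long a restore takes. That is governed by your Recovery Time Objective (RTO).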

What Needs to Be Backed Up

MySQL Cluster

The MySQL cluster holds your organization's user information (usernames, email addresses), orchestration events (document IDs, alert subscription IDs, and so on), AlphaSense-owned company ticker IDs, and other application data.

We recommend using a cloud-managed MySQL database service (AWS Aurora, Azure Database for MySQL, or GCP Cloud SQL). Follow your cloud provider's best practices to configure your backup strategy.
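On AWS Aurora, for example, Aurora takes continuous automated backups within its configured retention period. You can also take manual cluster snapshots, which Aurora keeps until you delete them. The following is a minimal Python sketch that starts a manual snapshot through the boto3 RDS client's `create_db_cluster_snapshot` call. The helper names and the cluster name `alphasense-mysql` are hypothetical. The client is passed in so that you can substitute a stub for testing.

```python
from datetime import datetime, timezone
from typing import Optional


def snapshot_identifier(cluster_id: str, when: datetime) -> str:
    # Cluster snapshot identifiers may contain only letters, digits, and hyphens.
    return f"{cluster_id}-manual-{when:%Y%m%d-%H%M%S}"


def snapshot_aurora_cluster(rds_client, cluster_id: str, when: Optional[datetime] = None) -> str:
    """Start a manual Aurora cluster snapshot and return its identifier.

    `rds_client` is expected to be a boto3 RDS client, e.g. boto3.client("rds").
    The call returns once AWS accepts the request; the snapshot itself is
    created asynchronously.
    """
    when = when or datetime.now(timezone.utc)
    snapshot_id = snapshot_identifier(cluster_id, when)
    rds_client.create_db_cluster_snapshot(
        DBClusterIdentifier=cluster_id,
        DBClusterSnapshotIdentifier=snapshot_id,
    )
    return snapshot_id


# Usage (requires AWS credentials; "alphasense-mysql" is a hypothetical cluster name):
#   import boto3
#   snapshot_aurora_cluster(boto3.client("rds"), "alphasense-mysql")
```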

If you self-host a MySQL cluster, follow the MySQL Backup and Recovery documentation.
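For a self-hosted cluster, one common approach is to take a periodic logical backup with `mysqldump`. The following is a minimal Python sketch that builds the command line. It uses `--single-transaction` so that InnoDB tables are dumped from a consistent snapshot without blocking writes. The host, user, and output directory are hypothetical. The sketch assumes the password comes from a MySQL option file such as `~/.my.cnf` rather than from the command line.

```python
from datetime import datetime
from typing import List


def mysqldump_command(host: str, user: str, out_dir: str, when: datetime) -> List[str]:
    """Build an argv list for a logical backup of every database on `host`.

    The password is deliberately not passed here; mysqldump can read it from
    a MySQL option file such as ~/.my.cnf.
    """
    return [
        "mysqldump",
        f"--host={host}",
        f"--user={user}",
        "--single-transaction",  # consistent InnoDB snapshot without blocking writes
        "--routines",            # include stored procedures and functions
        "--triggers",            # include triggers (on by default; explicit for clarity)
        "--events",              # include Event Scheduler events
        "--all-databases",
        f"--result-file={out_dir.rstrip('/')}/mysql-{when:%Y%m%d-%H%M%S}.sql",
    ]


# Usage (hypothetical host, user, and directory):
#   import subprocess
#   subprocess.run(mysqldump_command("mysql.internal", "backup", "/backups", datetime.now()), check=True)
```

Building an argument list instead of a shell string avoids quoting problems when you pass it to `subprocess.run`.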

DynamoDB and ScyllaDB

DynamoDB (AWS) and ScyllaDB (GCP, Azure) hold metadata for ingested documents, provide fast storage for AI features (sentiment, themes, and so on), and act as a distributed lock for document processing events.

DynamoDB is a fully managed AWS service. AlphaSense manages the tables and their schemas automatically with Crossplane. For additional backup configuration, follow the AWS documentation on backup and restore for DynamoDB.
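DynamoDB offers two backup mechanisms. Point-in-time recovery (PITR) keeps continuous backups that you can restore to any second within the recovery window. On-demand backups are full copies that are retained until you delete them. The following is a minimal Python sketch that enables both through the boto3 DynamoDB client calls `update_continuous_backups` and `create_backup`. The table names are hypothetical. Because Crossplane reconciles the resources it manages back to their declared state, the sketch assumes you have first confirmed that Crossplane does not manage the PITR setting for these tables.

```python
from datetime import datetime, timezone
from typing import Iterable, List, Optional


def protect_tables(dynamodb_client, table_names: Iterable[str],
                   when: Optional[datetime] = None) -> List[str]:
    """Enable PITR and take an on-demand backup of each table; return backup ARNs.

    `dynamodb_client` is expected to be a boto3 client, e.g. boto3.client("dynamodb").
    """
    when = when or datetime.now(timezone.utc)
    backup_arns = []
    for name in table_names:
        # Continuous backups, restorable to any point within the recovery window.
        dynamodb_client.update_continuous_backups(
            TableName=name,
            PointInTimeRecoverySpecification={"PointInTimeRecoveryEnabled": True},
        )
        # Full copy of the table, kept until you delete it.
        response = dynamodb_client.create_backup(
            TableName=name,
            BackupName=f"{name}-{when:%Y%m%d-%H%M%S}",
        )
        backup_arns.append(response["BackupDetails"]["BackupArn"])
    return backup_arns


# Usage (requires AWS credentials; table names are hypothetical):
#   import boto3
#   protect_tables(boto3.client("dynamodb"), ["alphasense-doc-metadata"])
```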

On GCP and Azure, AlphaSense uses ScyllaDB instead of DynamoDB and automatically backs it up to Google Cloud Storage or Azure Blob Storage, respectively. Because the ScyllaDB cluster is self-hosted and runs as Kubernetes StatefulSet pods with persistent volumes, backing up the Kubernetes persistent volumes (described in the next section) provides an additional layer of backup for it.

Kubernetes Persistent Volumes

Kubernetes persistent volumes hold the AlphaSense Solr cluster indexes, Elasticsearch data (ticker and keyword suggestions), MongoDB data (activity events), and Redis persistence logs. Solr, Elasticsearch, and MongoDB are critical database clusters. Back up the cloud block storage volumes underlying these persistent volumes at an interval that meets your organization's RPO.
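Before you can snapshot the underlying block storage, you need to map each persistent volume to its cloud disk. The following is a minimal Python sketch that reads the JSON output of `kubectl get pv -o json`. It keeps the volumes bound to claims in the namespaces you name and returns their disk identifiers. For CSI volumes, the sketch uses the CSI volume handle, which for the AWS EBS, GCP Persistent Disk, and Azure Disk CSI drivers identifies the underlying disk. It also handles the legacy in-tree volume fields. The namespace `alphasense` is a hypothetical example.

```python
from typing import Dict, Iterable, Optional


def cloud_volume_id(pv: dict) -> Optional[str]:
    """Return the cloud disk identifier behind a PersistentVolume, if recognizable."""
    spec = pv.get("spec", {})
    if "csi" in spec:
        return spec["csi"].get("volumeHandle")
    # Legacy in-tree volume plugins.
    if "awsElasticBlockStore" in spec:
        return spec["awsElasticBlockStore"].get("volumeID")
    if "gcePersistentDisk" in spec:
        return spec["gcePersistentDisk"].get("pdName")
    if "azureDisk" in spec:
        return spec["azureDisk"].get("diskURI")
    return None


def volumes_to_snapshot(pv_list: dict, namespaces: Iterable[str]) -> Dict[str, str]:
    """Map "namespace/claim" to cloud disk ID for PVs bound in the given namespaces."""
    wanted = set(namespaces)
    result = {}
    for pv in pv_list.get("items", []):
        claim = pv.get("spec", {}).get("claimRef") or {}
        if claim.get("namespace") not in wanted:
            continue
        volume_id = cloud_volume_id(pv)
        if volume_id:
            result[f"{claim['namespace']}/{claim.get('name')}"] = volume_id
    return result


# Usage:
#   import json, subprocess
#   pvs = json.loads(subprocess.run(["kubectl", "get", "pv", "-o", "json"],
#                                   capture_output=True, check=True).stdout)
#   print(volumes_to_snapshot(pvs, ["alphasense"]))
```

You can then pass the resulting IDs to your cloud provider's snapshot tooling on a schedule that matches your RPO.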

Alternatively, you can use Velero, an open-source tool that backs up and restores Kubernetes persistent volumes on AWS, GCP, and Azure.
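With Velero, a recurring backup is defined by a cron expression passed to `velero schedule create`. The following is a minimal Python sketch that turns a backup interval in hours into such a command. Choose an interval no larger than your RPO. The schedule name and namespace are hypothetical. The sketch accepts only divisors of 24 so that runs stay evenly spaced across midnight. It also assumes Velero is already installed with your cloud provider's plugin.

```python
from typing import Iterable, List


def velero_schedule_command(name: str, namespaces: Iterable[str], interval_hours: int,
                            ttl_hours: int = 720) -> List[str]:
    """Build a `velero schedule create` command that runs every `interval_hours`."""
    if interval_hours < 1 or 24 % interval_hours != 0:
        raise ValueError("interval_hours must be a divisor of 24")
    hour_field = "0" if interval_hours == 24 else f"*/{interval_hours}"
    cron = f"0 {hour_field} * * *"  # minute 0 of every matching hour
    return [
        "velero", "schedule", "create", name,
        f"--schedule={cron}",
        "--include-namespaces=" + ",".join(namespaces),
        f"--ttl={ttl_hours}h0m0s",  # how long Velero retains each backup
    ]


# Usage (hypothetical names): a 6-hour RPO, backups kept for 30 days.
#   import subprocess
#   subprocess.run(velero_schedule_command("alphasense-pv", ["alphasense"], 6), check=True)
```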

Object Storage

Object storage buckets and containers hold original and processed content (such as email alerts), documents you have ingested, monitoring metrics and logs, and buffer space for large payloads in messaging-layer communication. Cloud-managed object storage (AWS, GCP, Azure) provides high availability and durability by default. If your organization has auditing or compliance requirements, you can use your cloud provider's built-in tools to move data to cold storage.
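On AWS, for example, the built-in tool for this is an S3 lifecycle rule that transitions objects to an archival storage class. The following is a minimal Python sketch that builds such a rule and applies it with the boto3 S3 client's `put_bucket_lifecycle_configuration` call. The bucket name, prefix, and 90-day threshold are hypothetical. Note that this call replaces the bucket's entire existing lifecycle configuration, so merge in any rules you already have before applying it.

```python
def cold_storage_lifecycle(prefix: str, days: int, storage_class: str = "GLACIER") -> dict:
    """Lifecycle configuration that moves objects under `prefix` to cold storage."""
    return {
        "Rules": [
            {
                "ID": f"to-{storage_class.lower()}-after-{days}-days",
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [{"Days": days, "StorageClass": storage_class}],
            }
        ]
    }


def apply_lifecycle(s3_client, bucket: str, config: dict) -> None:
    """Apply `config` to `bucket`, replacing any existing lifecycle rules."""
    s3_client.put_bucket_lifecycle_configuration(Bucket=bucket, LifecycleConfiguration=config)


# Usage (requires AWS credentials; bucket and prefix are hypothetical):
#   import boto3
#   apply_lifecycle(boto3.client("s3"), "alphasense-audit", cold_storage_lifecycle("logs/", 90))
```

GCP Cloud Storage and Azure Blob Storage offer comparable lifecycle management policies.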

Messaging Layer

The messaging layer (topics, queues) holds in-flight messaging events for application communication.

AlphaSense has a built-in retry mechanism and a dead-letter queue configuration, so you do not need to back up this data.

In case of an outage, you can re-ingest documents or use the built-in AlphaSense reprocessing-manager feature to reprocess and re-index them.