Backup and Restore
Your AlphaSense Enterprise Intelligence Private Cloud runs on top of cloud-managed services and native Kubernetes. Set the backup interval for each component according to your organization's Recovery Point Objective (RPO), the maximum amount of data loss, measured in time, that you can tolerate.
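With periodic backups, a failure can lose everything written since the last completed backup, so the gap between completed backups should never exceed your RPO. The following is a minimal sketch that audits a backup history against an RPO. It assumes you can export backup completion timestamps from your provider's backup history; `rpo_violations` is a hypothetical helper, not part of AlphaSense.

```python
from datetime import datetime, timedelta
from typing import List, Tuple


def rpo_violations(backup_times: List[datetime], rpo: timedelta,
                   now: datetime) -> List[Tuple[datetime, datetime]]:
    """Return (start, end) windows longer than `rpo` without a completed backup."""
    if not backup_times:
        raise ValueError("no completed backups recorded")
    # Appending `now` also flags the case where the next backup is overdue.
    points = sorted(backup_times) + [now]
    violations = []
    for earlier, later in zip(points, points[1:]):
        # A failure just before `later` would lose everything written since `earlier`.
        if later - earlier > rpo:
            violations.append((earlier, later))
    return violations


# Backups at 00:00, 04:00 and 12:00, checked at 14:00 against a 6-hour RPO.
t0 = datetime(2024, 1, 1)
history = [t0, t0 + timedelta(hours=4), t0 + timedelta(hours=12)]
print(rpo_violations(history, timedelta(hours=6), now=t0 + timedelta(hours=14)))
```

This flags the eight-hour gap between the 04:00 and 12:00 backups. Remember that a backup only counts once it has completed, so the effective interval includes the time the backup itself takes.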
What Needs to Be Backed Up
MySQL Cluster
The MySQL cluster holds your organization's user information (usernames and email addresses), orchestration events (document IDs, alert subscription IDs, and similar), and AlphaSense-owned company ticker IDs, among other data.
We recommend using a cloud-managed MySQL database service (Amazon Aurora on AWS, Azure Database for MySQL, or Cloud SQL on GCP). Please follow your cloud provider's best practices to configure your backup strategy:
- AWS backup and restore Aurora DB cluster
- GCP Cloud SQL backup
- Azure Database for MySQL backup and restore
If you self-host a MySQL cluster, please follow MySQL backup and recovery.
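Aurora takes automated backups within its configured retention period, but you may also want an on-demand snapshot, for example before an upgrade. The sketch below assumes AWS, the boto3 SDK, and credentials allowed to call `CreateDBClusterSnapshot`; the cluster name and the helper functions are hypothetical. Building the request separately from the API call keeps the naming logic testable without AWS access.

```python
from datetime import datetime, timezone
from typing import Optional


def aurora_snapshot_request(cluster_id: str, now: datetime) -> dict:
    """Build keyword arguments for the RDS CreateDBClusterSnapshot API call."""
    # A UTC timestamp suffix keeps manual snapshot identifiers unique and sortable.
    return {
        "DBClusterSnapshotIdentifier": f"{cluster_id}-manual-{now:%Y%m%d-%H%M%S}",
        "DBClusterIdentifier": cluster_id,
    }


def take_aurora_snapshot(rds_client, cluster_id: str,
                         now: Optional[datetime] = None) -> str:
    """Start a manual cluster snapshot and wait until it is available.

    `rds_client` is expected to be a boto3 RDS client, e.g. boto3.client("rds").
    """
    request = aurora_snapshot_request(cluster_id, now or datetime.now(timezone.utc))
    rds_client.create_db_cluster_snapshot(**request)
    # Block until the snapshot can be used for a restore.
    rds_client.get_waiter("db_cluster_snapshot_available").wait(
        DBClusterSnapshotIdentifier=request["DBClusterSnapshotIdentifier"]
    )
    return request["DBClusterSnapshotIdentifier"]
```

For example, `take_aurora_snapshot(boto3.client("rds"), "my-aurora-cluster")`. Unlike automated backups, manual cluster snapshots are kept until you delete them, so plan a cleanup policy.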
DynamoDB and ScyllaDB
DynamoDB (AWS) and ScyllaDB (GCP, Azure) hold metadata for ingested documents, provide fast storage for AI features (sentiment, themes, and so on), and act as a distributed lock for document processing events.
DynamoDB is a fully managed AWS service, and AlphaSense manages its tables and their schemas automatically with Crossplane. For additional backup configuration, please follow Backup and restore for DynamoDB.
On GCP and Azure, AlphaSense uses ScyllaDB instead of DynamoDB and automatically backs up ScyllaDB to Google Cloud Storage and Azure Blob Storage, respectively. Because the ScyllaDB cluster is self-hosted, running as Kubernetes StatefulSet pods with persistent volumes, backing up those volumes (see Kubernetes Persistent Volumes below) provides an additional backup of the ScyllaDB cluster.
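As one example of additional configuration, the sketch below creates on-demand DynamoDB backups for every table whose name starts with a given prefix. It assumes boto3 and credentials for `ListTables` and `CreateBackup`; the prefix and helper names are hypothetical, so check the actual table names in your deployment. On-demand backups do not modify the table definitions that Crossplane manages.

```python
from datetime import datetime, timezone
from typing import Iterable, List, Optional


def select_tables(table_names: Iterable[str], prefix: str) -> List[str]:
    """Keep only the tables whose names start with `prefix`."""
    return sorted(name for name in table_names if name.startswith(prefix))


def list_all_tables(dynamodb_client) -> List[str]:
    """Collect every table name, following ListTables pagination."""
    names: List[str] = []
    kwargs = {}
    while True:
        page = dynamodb_client.list_tables(**kwargs)
        names.extend(page["TableNames"])
        last = page.get("LastEvaluatedTableName")
        if not last:
            return names
        # Resume the listing after the last table name returned.
        kwargs = {"ExclusiveStartTableName": last}


def backup_tables(dynamodb_client, prefix: str,
                  now: Optional[datetime] = None) -> List[str]:
    """Create an on-demand backup of each matching table; return the backup ARNs."""
    stamp = (now or datetime.now(timezone.utc)).strftime("%Y%m%d-%H%M%S")
    arns = []
    for table in select_tables(list_all_tables(dynamodb_client), prefix):
        resp = dynamodb_client.create_backup(TableName=table,
                                             BackupName=f"{table}-{stamp}")
        arns.append(resp["BackupDetails"]["BackupArn"])
    return arns
```

For example, `backup_tables(boto3.client("dynamodb"), "alphasense-")`. Scheduling this script, or using AWS Backup plans instead, lets you match the backup frequency to your RPO.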
Kubernetes Persistent Volumes
Kubernetes persistent volumes hold the AlphaSense Solr cluster indexes, Elasticsearch data (ticker and keyword suggestions), MongoDB data (activity events), and Redis persistence logs. Solr, Elasticsearch, and MongoDB are critical database clusters that store their data in these native Kubernetes persistent volumes, so back up the underlying cloud block storage volumes according to your organization's RPO policy.
- AWS EBS backup and restore
- GKE backup and restore persistent storage
- Azure backup Azure Kubernetes Service
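The block-storage CSI drivers on all three providers support Kubernetes VolumeSnapshots, which give a provider-neutral way to snapshot these volumes. The sketch below assumes the `snapshot.storage.k8s.io/v1` CRDs are installed, a VolumeSnapshotClass for your provider's CSI driver exists, and the official `kubernetes` Python client is available; the namespace, snapshot class, and helper names are hypothetical.

```python
from typing import Any, Dict


def volume_snapshot_manifest(pvc_name: str, namespace: str,
                             snapshot_class: str, suffix: str) -> Dict[str, Any]:
    """Build a snapshot.storage.k8s.io/v1 VolumeSnapshot for one PVC."""
    return {
        "apiVersion": "snapshot.storage.k8s.io/v1",
        "kind": "VolumeSnapshot",
        "metadata": {"name": f"{pvc_name}-{suffix}", "namespace": namespace},
        "spec": {
            "volumeSnapshotClassName": snapshot_class,
            "source": {"persistentVolumeClaimName": pvc_name},
        },
    }


def snapshot_namespace_pvcs(namespace: str, snapshot_class: str, suffix: str) -> None:
    """Create a VolumeSnapshot for every PVC in `namespace` (needs cluster access)."""
    from kubernetes import client, config  # pip install kubernetes

    config.load_kube_config()  # uses your current kubeconfig context
    core = client.CoreV1Api()
    custom = client.CustomObjectsApi()
    for pvc in core.list_namespaced_persistent_volume_claim(namespace).items:
        custom.create_namespaced_custom_object(
            group="snapshot.storage.k8s.io", version="v1", namespace=namespace,
            plural="volumesnapshots",
            body=volume_snapshot_manifest(pvc.metadata.name, namespace,
                                          snapshot_class, suffix),
        )
```

Snapshots of volumes attached to running databases are crash-consistent rather than application-consistent, and volumes snapshotted one after another are not captured at the same instant, so test restores of Solr, Elasticsearch, and MongoDB from these snapshots.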
Alternatively, the open-source tool Velero can back up and restore persistent volumes on AWS, GCP, and Azure.
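Velero runs recurring backups from a `Schedule` resource whose cron expression can be derived from your RPO. The sketch below builds such a resource, assuming Velero is installed in its default `velero` namespace with volume snapshots configured for your cloud; the schedule name, namespaces, and retention are hypothetical.

```python
from typing import Any, Dict, List


def velero_schedule(name: str, namespaces: List[str], interval_hours: int,
                    retention_hours: int) -> Dict[str, Any]:
    """Build a velero.io/v1 Schedule that backs up `namespaces` every N hours."""
    if not 1 <= interval_hours <= 24:
        raise ValueError("interval_hours must be between 1 and 24")
    # Minute 0 of every Nth hour (daily at midnight when N is 24). If N does not
    # divide 24, the run after midnight comes sooner, so gaps never exceed N hours.
    cron = "0 0 * * *" if interval_hours == 24 else f"0 */{interval_hours} * * *"
    return {
        "apiVersion": "velero.io/v1",
        "kind": "Schedule",
        "metadata": {"name": name, "namespace": "velero"},
        "spec": {
            "schedule": cron,
            "template": {
                "includedNamespaces": namespaces,
                "snapshotVolumes": True,
                "ttl": f"{retention_hours}h0m0s",  # how long each backup is kept
            },
        },
    }
```

You can apply the result with the Kubernetes client's `CustomObjectsApi.create_namespaced_custom_object` (group `velero.io`, plural `schedules`), or create the equivalent schedule with `velero schedule create` and its `--schedule`, `--include-namespaces`, and `--ttl` flags.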
Object Storage
Object storage buckets and containers hold original and processed content (such as email alerts), documents you have ingested, monitoring metrics and logs, and buffer space for large payloads exchanged through the messaging layer. Cloud-managed object storage on AWS, GCP, and Azure provides very high data availability and durability by default. If your organization must retain data for compliance or auditing, you can use your cloud provider's built-in lifecycle tools to move data to cold storage.
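On AWS, moving data to cold storage is typically done with an S3 lifecycle rule (GCP and Azure offer equivalent lifecycle management policies). The sketch below builds a rule that transitions objects under a prefix to the `GLACIER` storage class after a number of days; the prefix, day count, and helper name are hypothetical.

```python
from typing import Any, Dict


def cold_storage_lifecycle(days_until_archive: int, prefix: str = "",
                           storage_class: str = "GLACIER") -> Dict[str, Any]:
    """Build an S3 lifecycle configuration that archives objects after N days."""
    return {
        "Rules": [
            {
                "ID": f"archive-after-{days_until_archive}-days",
                "Status": "Enabled",
                "Filter": {"Prefix": prefix},  # empty prefix matches the whole bucket
                "Transitions": [
                    {"Days": days_until_archive, "StorageClass": storage_class}
                ],
            }
        ]
    }
```

Apply it with `boto3.client("s3").put_bucket_lifecycle_configuration(Bucket=..., LifecycleConfiguration=cold_storage_lifecycle(90, prefix="..."))`. Note that this call replaces any existing lifecycle configuration on the bucket, and objects in `GLACIER` must be restored before they can be read, so limit the rule to data kept only for auditing rather than content AlphaSense reads during normal operation.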
Messaging Layer
The messaging layer (topics, queues) holds in-flight messaging events for application communication.
AlphaSense has a built-in retry mechanism and dead letter queue configuration, so customers do not need to back up this data.
In case of an outage, customers can re-ingest documents or use AlphaSense's built-in reprocessing-manager feature to reprocess and re-index them.