Exit Strategy: Transitioning from Specify Cloud to a Self-Hosted Specify 7 Environment
This outline describes the process for moving Specify databases from an SCC-managed cloud service (usually Amazon Web Services) to servers at your own institution or to private-cloud infrastructure, such as a country-wide research network or a shared university network. Coordination with the Specify Collections Consortium (SCC) technical team is strongly recommended for an efficient migration of Specify databases and digital assets to a new system.
> [!info]
> If you are looking for information on how to configure a self-hosted Specify 7 installation, see our guide on Self-Hosting Specify 7.
1. Plan Your Self-Hosted Environment
- Infrastructure Requirements
  - A Linux host (or VM) with Docker Engine & Docker Compose installed
  - Sufficient CPU, RAM, and disk for:
    - MariaDB (your database size + growth)
    - File storage for attachments (S3 data size)
    - Specify 7 application, report runner, asset server, and web server
  - Public- or private-DNS entries for your web-facing Nginx proxy (e.g. `specify.example.org`)
- Networking & Security
  - Firewall rules allowing HTTP(S) and database ports
  - SSL/TLS certificate for your Specify domain (Let’s Encrypt or your CA)
  - Secure SSH/VPN access for administrative tasks
- Credentials & Access
  - A dedicated “IT user” database account for migrations (with `ALTER`, `CREATE`, `INSERT` privileges)
  - An administrative “master” user for everyday Specify use
  - An asset-server key
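As an illustration, creating the migration “IT user” on MariaDB might look like the following sketch. The account name, host pattern, password, and database name are placeholders, and your deployment may need additional privileges (e.g. `SELECT`, `DROP`, `INDEX`) depending on the migrations being run:

```sql
-- Hypothetical migration account for a database named `specify`
CREATE USER 'it_user'@'%' IDENTIFIED BY 'change-me';
GRANT ALTER, CREATE, INSERT ON specify.* TO 'it_user'@'%';
FLUSH PRIVILEGES;
```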
2. Coordinate Data Export
First steps include assembling all the data from the hosted instance: the database, digital assets, and environment configuration. The SCC backs up databases and assets according to individual member agreements and will make these available to members either on request or on an agreed-upon schedule, deposited to a location accessible to both parties.
Retrieve Your Database, Assets, and Environment Configuration
If requested during database configuration, the main point of contact for a Specify Cloud database can retrieve the database backups, asset backups, and deployment environment configuration directly from an S3 bucket to which they have full access.
If you have not requested that backups and assets be available to you directly, please perform the following:
- Request Your Database Dump
  Email membership@specifysoftware.org with:
  - Your Specify Cloud account name
  - Preferred database dump format (plain SQL, compressed)
  - Desired date for export
  - Reason for request

  If you mention that your reason for requesting the dump is to migrate to a self-hosted setup, we will also provide the deployment environment configuration files to make the transition seamless.
- Request Your S3 Asset Data
  We’ll provide:
  - An S3 bucket name and region
  - Temporary AWS credentials (read-only IAM user)
  - A manifest of top-level asset folders
3. Receive & Store the Exported Data
- Download the SQL Dump
  Store it in your deployment directory, e.g. `migrations/seed-database/dump.sql`.
- Sync or Download Attachments
  On your host or a jump box:

  ```sh
  # Install AWS CLI if needed
  pip install awscli

  # Configure the provided credentials
  aws configure set aws_access_key_id YOUR_KEY_ID
  aws configure set aws_secret_access_key YOUR_SECRET
  aws configure set default.region us-east-1  # example

  # Sync assets to a local directory
  aws s3 sync s3://your-export-bucket/path/to/attachments/ ./migrations/attachments/
  ```
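After the sync completes, it is prudent to record a checksum manifest of everything downloaded so the restore in step 5 can be verified later. A minimal, self-contained sketch follows; it writes to a throwaway demo directory so it runs anywhere, but in a real migration point `ATT_DIR` at `./migrations/attachments`:

```sh
# Build a sorted SHA-256 manifest of every synced file.
# ATT_DIR is a demo directory here; use ./migrations/attachments in practice.
ATT_DIR=$(mktemp -d)
printf 'specimen-image-bytes' > "$ATT_DIR/sp00001.jpg"
printf 'label-scan-bytes'     > "$ATT_DIR/sp00002.jpg"
find "$ATT_DIR" -type f -exec sha256sum {} \; | sort -k 2 > manifest.sha256
wc -l manifest.sha256
```

Re-running `sha256sum -c manifest.sha256` on the target server after the volume copy confirms nothing was lost or corrupted in transit.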
4. Prepare Your Docker Compose Deployment
- Clone the Specify 7 Docker Examples
  Choose the scenario that matches your needs (All-in-One, Just Specify 7, etc.) and copy or fork the directory (docker-compositions).
- Place Exported Data

  ```
  your-deploy-dir/
  ├─ seed-database/
  │  └─ dump.sql
  ├─ attachments/        ← your synced asset files
  ├─ docker-compose.yml
  └─ .env
  ```
- Configure `.env` or `docker-compose.yml`
  - `MYSQL_ROOT_PASSWORD`, `MYSQL_USER`, `MYSQL_PASSWORD`
  - `SEED_DB_FILE=dump.sql`
  - `SPECIFY_ASSET_DIR=/volumes/attachments` (or your mount point)
  - `ASSET_SERVER_KEY=<<key from SCC>>`
  - `ASSET_SERVER_URL=http://asset-server/web_asset_store.xml`
  - Domain names, ports, and any report-runner settings
- Ensure Docker Volumes

  ```yaml
  volumes:
    attachments:
    static-files:
    webpack-output:
  ```
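Pulling those variables together, a starting `.env` might look like the following; every value shown is a placeholder to replace with your own settings:

```
MYSQL_ROOT_PASSWORD=change-me
MYSQL_USER=specify
MYSQL_PASSWORD=change-me-too
SEED_DB_FILE=dump.sql
SPECIFY_ASSET_DIR=/volumes/attachments
ASSET_SERVER_KEY=<<key from SCC>>
ASSET_SERVER_URL=http://asset-server/web_asset_store.xml
```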
5. Restore the Database & Assets
- Start Database-Only Container

  ```sh
  # Temporarily comment out other services in docker-compose.yml
  docker-compose up -d mariadb
  ```

- Load the Dump

  ```sh
  docker exec -i $(docker-compose ps -q mariadb) \
    mysql -u root -p"${MYSQL_ROOT_PASSWORD}" < seed-database/dump.sql
  ```

- Verify Data

  ```sh
  docker exec -it $(docker-compose ps -q mariadb) \
    mysql -u root -p"${MYSQL_ROOT_PASSWORD}" -e "SHOW DATABASES; USE specify; SHOW TABLES;"
  ```

- Provision Attachments Volume

  ```sh
  docker volume create attachments
  docker run --rm \
    -v attachments:/data \
    -v "$(pwd)/attachments":/backup \
    alpine \
    sh -c "cp -a /backup/. /data/"
  ```
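The `cp -a /backup/. /data/` idiom used above preserves timestamps, permissions, and hidden files. The same copy-then-verify pattern can be sanity-checked without Docker; this self-contained sketch uses temporary directories in place of the container volumes:

```sh
# Copy an attachment tree with cp -a, then confirm the file counts match.
SRC=$(mktemp -d); DST=$(mktemp -d)
printf 'image-bytes' > "$SRC/a.att"
mkdir "$SRC/originals"
printf 'scan-bytes' > "$SRC/originals/b.att"
cp -a "$SRC/." "$DST/"
echo "source files: $(find "$SRC" -type f | wc -l)"
echo "copied files: $(find "$DST" -type f | wc -l)"
```

In the real migration, run an equivalent count inside the `attachments` volume (e.g. via a short-lived alpine container) and compare it against your local directory before starting the full stack.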
6. Launch the Full Specify Stack
- Un-comment All Services in `docker-compose.yml` (specify7, webpack, asset-server, report-runner, nginx).
- Pull & Tag the Right Images (use `v7` for automatic updates, or `v7.x.x` for pinning).
- Start Everything

  ```sh
  docker-compose up -d
  ```

- Run Migrations with the IT User
  (Django will auto-run migrations on boot, but you can run them explicitly:)

  ```sh
  docker-compose run --rm specify7 \
    ./ve/bin/python manage.py migrate --noinput
  ```
7. Validate & Transition
- Access the UI at your Specify domain.
- Log in using the master user credentials.
- Spot-check Data:
  - Collections, specimens, metadata, attachments
  - Generate a test report
  - Upload a new attachment and view its thumbnail
- Enable HTTPS on Nginx and update your DNS or firewall rules.
- We will decommission the legacy cloud instance once you give the OK.
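For the HTTPS step, a minimal Nginx configuration might look like the sketch below. The domain, certificate paths (Let’s Encrypt defaults shown), and the upstream service name `specify7:8000` are all assumptions to adapt to your compose file:

```nginx
server {
    listen 80;
    server_name specify.example.org;
    return 301 https://$host$request_uri;   # force HTTPS
}

server {
    listen 443 ssl;
    server_name specify.example.org;

    ssl_certificate     /etc/letsencrypt/live/specify.example.org/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/specify.example.org/privkey.pem;

    location / {
        proxy_pass http://specify7:8000;    # upstream name from docker-compose.yml
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-Proto $scheme;
    }
}
```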
8. Post-Migration Best Practices
- Backups: Automate nightly database dumps and S3-style archiving of attachments.
- Monitoring: Track container health, disk usage, and logs via `docker-compose logs` or a centralized system (Prometheus, ELK).
- Security: Rotate your database and AWS credentials periodically.
- Updates: Use the `v7` tag for minor upgrades; pin to `v7.x.x` if you need strict control.
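To automate the backup recommendation above, a few cron entries are often enough. The schedule, paths, and retention window below are placeholders, and `/opt/specify/backup-db.sh` is a hypothetical wrapper around `mysqldump --single-transaction ... | gzip`:

```
# /etc/cron.d/specify-backup (all paths are placeholders)
# Nightly compressed database dump at 02:30
30 2 * * * root /opt/specify/backup-db.sh
# Weekly attachment archive, Sundays at 03:00
0 3 * * 0 root rsync -a /volumes/attachments/ /backups/attachments/
# Prune dumps older than 30 days, daily at 04:00
0 4 * * * root find /backups -name 'specify-*.sql.gz' -mtime +30 -delete
```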
Recommended Training for IT/Database Administrators of Local Specify Installations
To ensure your team can confidently manage and maintain a self-hosted Specify 7 environment, the Specify Collections Consortium (SCC) can provide a tailored training program covering the following areas:
A. Deployment Fundamentals
- Docker & Docker Compose basics
  - Installing the Docker Engine & Compose
  - Understanding images, containers, volumes, and networks
- SCC’s Specify 7 Docker architecture
  - Service roles (specify7, mariadb, asset-server, report-runner, nginx)
  - Environment variable management (`.env`, secrets, and overrides)
B. Database Administration
- MariaDB user management and permissions
- Restoring and migrating databases
- Performing schema migrations using the IT user
- Backup strategies and automated dumps
- Monitoring database health, performance tuning, and indexing best practices
C. Asset Server & Static Files
- Configuring and securing the asset-server service
- Synchronizing large object stores (from S3 to local volumes)
- Capacity planning for file storage
D. Web Server & Networking
- Nginx reverse proxy setup
- SSL/TLS certificate issuance (Let’s Encrypt) and renewal automation
- Virtual hosting for multiple databases
- Firewall rules and port management
E. Application Maintenance
- Log management and centralized logging options (ELK, Prometheus + Grafana)
- Upgrading Specify 7: tag strategy (`v7`, `v7.x`, `v7.x.x`), pull & restart workflow
- Debug vs. production mode toggles (`SP7_DEBUG`, `DEBUG_MODE`)
- Routine health checks and status endpoints
F. Troubleshooting & Support
- Common failure modes (container start failures, migration errors, volume permission issues)
- Using `docker-compose logs`, container exit codes, and system logs
- SCC support channels and escalation procedures
- Performing rollbacks safely using backups and pinned image tags
G. Hands-On Workshops & Documentation
- Live Workshops (2–4 hours): Guided setup in a sandbox environment
- Recorded Webinars: Step-by-step demos of deploy, backup, upgrade, and restore
- Detailed Runbooks: Written procedures customized to your environment
- Q&A Office Hours: Weekly drop-in sessions for real-time troubleshooting
Additional Feedback Questions and Answers
“Regarding place of storage of data, Specify is asked, in their final offer, to please ensure that there is a clear description of where CMS-natur’s data would be stored at present. Describe also the possibility for storage of CMS-nature’s data elsewhere (Europe or Norway), and describe what is needed to receive the data.”
The nearest Amazon Web Services (AWS) data center is located in Stockholm, with several additional centers available in Europe and the U.K. We have the capability to manage Specify data from any AWS center and to move databases among centers at any time. We already host several Specify databases in the AWS Paris data center; for administrative convenience, we would initially plan to use the Paris data center for CMS-natur data and its backups. We retain the capability to move CMS-natur databases to Stockholm or to any other AWS data center at any time, for whatever reason.
Intuitively, it makes sense to locate the databases as close to the users as possible, unless there are governmental restrictions on the location of national data. We have found that the location of Specify databases within the AWS ecosystem makes only a minor, if not imperceptible, difference in performance of the Specify application. The amount of server memory is a much stronger determinant of Specify 7 server and application responsiveness, particularly for batch operations, like large data uploads, or a large batch editing session involving thousands or tens of thousands of records.
“In both cases it would be reassuring if Specify in their final offer includes a description of an exercise where one actually performs a rehearsal where the worst-case scenario is anticipated. Obviously, this will have a cost, and the cost element should be included. At present CMS-natur does not expect this rehearsal ever to be performed, but the existence of it proves a likely remedy for worst case scenario, and hence security for data.
It is clarified that the scope of the exit-strategy is not to move data away from Specify, but to move the instance of the solution to a local server, along with the data. Necessary training should be included. The strategy should include price and uncertainties.”
Rehearsal of Exit Database Migration Process
The technical strategy for migrating a Specify database from the cloud to a local server is largely covered by the steps described above. What is missing from a full rehearsal agenda is a description of the technical human-resource expertise and time needed. If the SCC performs the rehearsal exit, we have the expertise to execute all of the steps mentioned above, but we will require help from CMS-natur staff to set up the local computing environment and to administer and manage a local server. We can perform all of those actions remotely on your servers if we have permission, but if the servers are access-restricted, local technical staff will need to perform the initial server account configuration and setup steps described above.
A great advantage of using Specify is that it is designed to operate with open-source or free commercial products that are widespread, robust, and easily available. Linux, Docker, and our open-source licensing model for Specify 7 all prevent lock-in, and they lower risk appreciably in contrast to proprietary, commercially licensed (not open-source) code on proprietary hardware platforms. Our strong research-collections community engagement and governance model ensures that member-institution stakeholders are an integral part of our vision and priorities.
The cost to perform such a test would be small. Most of the time required to test an exit from AWS to a local CMS-natur server would be spent on the communication and coordination involved in setting up the local environment and installing Docker and Specify 7. Although it is hard to estimate the time precisely, with good engagement from local technical IT staff we should be able to coordinate and complete an exit test for a single database within a day, for which we estimate a cost of $1,000 USD.
Partly duplicating the migration steps described above, here is a summary of the steps that a rehearsal exit, from a cloud-hosted Specify database to one installed on a local (institutional) server, would include.
1. Environment Setup
- Create the Rehearsal Exit Environment: Set up a separate, isolated environment that mirrors your intended production self-hosted environment. This should include a Linux host/VM with Docker, necessary resources (CPU, RAM, storage), and networking configurations.
- Deploy Docker Containers: Use the Docker Compose files.
2. Data Export and Preparation
- Identify Exit Data: Coordinate with the SCC team to get a representative sample of your production databases.
- Receive S3 Asset Data: Get the test S3 bucket information and temporary AWS credentials to access the test asset data.
- Download the Database: Download the SQL dump and digital assets to the test environment.
3. Migration Execution
- Restore Exit Database: Load the SQL dump into the new MariaDB instance.
- Provision Digital Assets Volume: Create the attachments volume and copy the test attachments.
- Launch Specify Stack: Start all the services (Specify 7, webpack, asset-server, report-runner, Nginx) in your test environment using Docker Compose.
4. Validation and Testing
- Access Exit-Test UI: Access the Specify 7 UI in your test environment using the test domain or IP address.
- Log in and Spot-check Data: Log in and verify that the test data migrated correctly. Perform various checks by comparing data completeness between the original and the exit-test database.
- Parameter Validation: Verify that security configurations and all other customizations and parameterizations are correctly applied in the exit-test database.