I do not have advice to offer, but instead am looking for it. From people who have already created a Specify database for your collection(s), what is some advice you wish you’d known? What are the most important questions to ask or issues to hammer out before beginning?
From the research I’ve done I’ve managed to identify two key issues:
Organization of the database
If you want to publish, mapping data fields to required or recommended Darwin Core fields for exporting
Are there other issues to consider? Do you have advice within those issues? Are there differences that one might not expect when setting up a database for different scientific fields? Tell the newbies what you’ve learned with your hard-won experience.
I wish I could go back in time (not that I am unhappy with how things turned out, but I have learned a ton over the past three years). Below are my own thoughts. Grant has also written a great set of articles recently on many topics, including the Data Conversion Process.
Architecture and setup choices
From the get-go, decide with all stakeholders whether you are going to aim to standardize across units as part of the transition, or use the transition to give units greater flexibility. This will inform your choices around whether you want multiple databases, disciplines, divisions etc. Disciplines are especially important because this is the level at which the internal schema tool is shared.
Give careful consideration to having multiple divisions. In most instances, I would argue you want one division.
While you can preload the taxon tree, you don’t have to. Instead, the taxon tree can be populated with only the taxa that the collection already has when the data is imported. This can be decided at the discipline level. Benefit: a cleaner taxon tree that doesn’t contain unnecessary nodes. Drawback: the collection will have to enter new nodes later, increasing the risk of spelling mistakes. Consider how likely it is that the collection will take in new taxa that it has not seen before. The same applies to the Geography tree, though most decide to stick with the pre-imported tree.
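If you go the "populate from the data" route, it can help to preview which nodes the import would create before uploading anything. Here is a rough sketch of that idea in plain Python. The column names in RANKS and the sample rows are made up for illustration, and this is not how Specify builds the tree internally; it only lists the unique paths found in your own spreadsheet so you can eyeball them for spelling variants.

```python
# Hypothetical rank columns; adjust to match your own import sheet.
RANKS = ("Family", "Genus", "Species")


def taxon_paths(rows):
    """Return the sorted set of unique taxon paths found in the rows.

    Each path is a tuple such as ("Felidae", "Panthera", "leo").
    A row stops contributing at its first empty rank.
    """
    paths = set()
    for row in rows:
        path = []
        for rank in RANKS:
            name = (row.get(rank) or "").strip()
            if not name:
                break
            path.append(name)
            paths.add(tuple(path))
    return sorted(paths)


if __name__ == "__main__":
    sample = [
        {"Family": "Felidae", "Genus": "Panthera", "Species": "leo"},
        {"Family": "Felidae", "Genus": "Panthera", "Species": "tigris"},
        {"Family": "Canidae", "Genus": "", "Species": ""},
    ]
    for path in taxon_paths(sample):
        print(" > ".join(path))
```

The length of the returned list is roughly the number of nodes the import would add, which is a handy number to have when weighing this against a preloaded tree.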
Data Cleaning, Standardization and Import
I recently stumbled upon pointblank. I would have used it for the back-and-forth data cleaning with collections had it been available back then. It should support most of the business rules that Specify would apply through the workbench, with the exception of disambiguation.
Split your workbench uploads into chunks; 3000-5000 rows per chunk seems optimal.
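For the chunking, something along these lines would do, assuming the 3000-5000 figure refers to rows and your data is in CSV. The function names and the default of 4000 are my own choices for illustration, not anything Specify provides. Each chunk repeats the header row so it can be uploaded on its own.

```python
import csv
import io
from itertools import islice


def chunk_rows(rows, chunk_size=4000):
    """Yield successive lists of at most chunk_size rows."""
    it = iter(rows)
    while True:
        chunk = list(islice(it, chunk_size))
        if not chunk:
            return
        yield chunk


def split_csv(text, chunk_size=4000):
    """Split CSV text into a list of CSV strings, each with the header."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    parts = []
    for chunk in chunk_rows(reader, chunk_size):
        buf = io.StringIO()
        writer = csv.writer(buf, lineterminator="\n")
        writer.writerow(header)
        writer.writerows(chunk)
        parts.append(buf.getvalue())
    return parts
```

To use it on a file, read the file's text, call split_csv, and write each returned string to its own numbered file.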
Standardize the schema across all collections in a discipline wherever possible.
To your point about DwC: you don’t have to, but it is nice to use the Darwin Core equivalents wherever possible. However, fields don’t need to go by their DwC names. Many managers will disagree that recordedBy makes sense as the name for the collector. Furthermore, Specify will split that into firstName and lastName anyway, so it can get very confusing trying to teach Specify terminology and DwC at the same time. You can leave the DwC mapping until after you have data in the system; it is always easy to go from a Specify field to DwC terms later when mapping for publication. Note that there are also Darwin Core Data Packages on the horizon!
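To show why the mapping can wait, here is a small sketch of going from Specify-style field names to DwC terms after the fact. It assumes a flat exported record as a Python dict; the mapping table and the "remarks" column are hypothetical examples, and this is not Specify's own export mapping tool. The DwC terms used (catalogNumber, occurrenceRemarks, recordedBy) are real terms.

```python
# Hypothetical mapping from Specify-style column names to Darwin Core terms.
SPECIFY_TO_DWC = {
    "catalogNumber": "catalogNumber",
    "remarks": "occurrenceRemarks",
}


def to_dwc(record):
    """Rename mapped fields and join firstName/lastName into recordedBy.

    Empty or missing values are left out of the result.
    """
    out = {
        dwc: record[src]
        for src, dwc in SPECIFY_TO_DWC.items()
        if record.get(src)
    }
    name_parts = (record.get("firstName"), record.get("lastName"))
    name = " ".join(p for p in name_parts if p)
    if name:
        out["recordedBy"] = name
    return out
```

The point is that users can keep working with whatever captions make sense to them, and the translation to DwC lives in one place that you only touch at publication time.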
Forms
Don’t use the label= argument in form definitions for regular fields. This creates the opportunity for a field to have one caption in the schema, and another when viewed on the form. There should be a single source of truth (the schema definition) and it is better to use the schema because changes are applied across all collections in the discipline, and fields are easier to track/hide etc.
When first rolling out, it sometimes makes sense to disable the button beside query combo boxes for a few weeks until users grasp the concept of relational databases. Otherwise, they will edit a taxon thinking that they are just editing information for a single record. Erroneous additions are easier to fix later through merging.
I could go on but will leave it at that for now. I’m always happy to chat if you have any specific questions! Others will also have some great advice, I am sure, and I’ll note that the Specify team has some consulting pricing options.