Encoding issues in Workbench

Isn’t Workbench supposed to take UTF-8 as is? I’ve uploaded the following UTF-8 encoded file:

specimenAA (2).tsv (4.0 KB)

The file itself shows special characters as they are:

image

However, as soon as it arrives in Workbench it’s turned into this:

image

What to do?

Hi @fedoras,

After uploading this data set using Specify 7.8.11, there does not appear to be an encoding issue.

You can select the character encoding and delimiter when creating a new data set from an imported file. Are you on an earlier version of Specify 7 when this is happening?

Thanks. We’re currently stuck at v7.7.5, because we can’t upgrade until we migrate from RHEL7 to RHEL8 servers, which is dragging out.

We have updated to the latest version of Specify 7, but the dropdown list for specifying the encoding still does not appear. What is going on?

image

Hi @fedoras,

I am on v7.9.5 here. I uploaded the TSV file you attached at the beginning of this thread and can see the encoding options. I tried a few other CSV files as well and am seeing the option for those too.

Can you share the CSV file you are uploading with me? You can send it via email or private message if the contents are sensitive.

Thank you!

Hi!

This is an example file right here:

NHMADtest 20240213.csv (4.5 KB)

Also, I don’t get encoding options with my original file either. I am on 7.9.5 too!

image

I was going to comment that I could replicate @fedoras’s issue, as I remember doing an upload yesterday and specifically looking for this option. Today, however, the encoding option has appeared. As I doubt a hotfix was applied in the 4 hours since this was posted, the only other difference is that today I am on a Mac and yesterday I was using a Windows machine. I could perform further testing on Monday to gather more information.

Thank you, Mark! Yes, indeed one other end user reported that she gets the dropdown list when working from her Mac. There’s something going on here. Maybe the dropdown list disappears when the format is autodetected? Even so, I would prefer it to always be there.

Confirming my comment from Friday: now that I am back on Windows, the encoding dropdown is not present. It seems we have some evidence that the operating system may be the culprit.

Hi everyone!

I have done a little investigating into this issue and have uncovered some interesting results and potential workarounds.

I was able to recreate the issue on Windows using Google Chrome Version 125.0.6422.142 (64-bit).
As expected, I was not able to recreate the issue on macOS using Google Chrome Version 125.0.6422.142 (arm64).

For delimiter-separated files, Specify 7 allows specifying the character encoding and delimiter. For Excel-based files, Specify 7 does not allow customizing these characteristics.
Thus the problem is that Specify is (incorrectly) inferring that the file is Excel-based.

Specify uses the browser’s built-in File API to get the file’s MIME type and determine whether the file is delimiter-separated or Excel-based.
There seems to be a difference in the browser’s File API between the ARM builds and the standard 64-bit builds: the ARM builds can correctly infer the MIME type, while the standard 64-bit builds cannot.
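
For anyone who wants to check what their own browser reports for a file, here is a minimal sketch. `describeUpload` and `FileLike` are hypothetical names used for illustration and are not part of Specify; the only assumption is that the object passed in has the standard `name` and `type` properties of a File API `File`, where `type` is an empty string when the browser could not determine a MIME type.

```typescript
// Hypothetical helper, not part of Specify: summarizes what the browser's
// File API reports for a chosen file. Only `name` and `type` are read, so a
// real File object from an <input type="file"> can be passed in directly.
interface FileLike {
  readonly name: string;
  readonly type: string; // MIME type as reported by the browser; '' if unknown
}

function describeUpload(file: FileLike): string {
  // Take everything after the last dot as the extension (empty if none).
  const dot = file.name.lastIndexOf('.');
  const extension = dot > 0 ? file.name.slice(dot + 1).toLowerCase() : '';
  const mimeType = file.type === '' ? '(none reported)' : file.type;
  return `name=${file.name} extension=${extension || '(none)'} mimeType=${mimeType}`;
}
```

In a browser console, passing the selected file from a file input (for example `input.files[0]`) to `describeUpload` and logging the result should show whether the MIME type differs between machines for the same file.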

For all of the files provided in this thread, I was able to work around this issue by ensuring the file has a .csv extension. If this workaround is not sufficient, what browser and browser version are you using?

If .csv files (and their contents) are still not being interpreted as delimiter-separated, then the browser’s File API is likely reporting a MIME type that Specify does not recognize.
Assuming this is the case, you can still work around this and force Specify to recognize the file as delimiter-separated by changing the file extension to .psv.

Hi @jason_m,

I am not able to solve this by using the .csv extension with Firefox 127.0 (64-bit) on Windows 10.

I am able to solve it by switching to a .psv file extension, but this is slightly impractical. I have found that using the .txt extension also causes the encoding picker to appear, and I would prefer to use that instead unless there is a specific reason why .psv was chosen.

Hi @markp!
I’m glad that the .txt file extension worked for your use case!

There was a reason why .psv was mentioned explicitly. When determining whether the data set is delimiter-separated or Excel-based, if Specify cannot determine the file’s MIME type, it checks whether the file has a .psv extension, and if so, it always assumes the file is delimiter-separated. Otherwise, if the file does not have a .psv extension and the browser/Specify cannot determine the MIME type, the file is treated as Excel-based (which is what ultimately caused this issue).
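
The decision described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual Specify code: `inferFileKind`, `FileKind`, and the list of recognized MIME types are all hypothetical and chosen only to make the fallback order concrete.

```typescript
// Sketch only: the real Specify code may differ. The MIME type list is an
// assumption for illustration, not taken from the Specify code base.
type FileKind = 'delimited' | 'excel';

const ASSUMED_DELIMITED_MIME_TYPES: ReadonlySet<string> = new Set([
  'text/csv',
  'text/tab-separated-values',
  'text/plain',
]);

function inferFileKind(name: string, mimeType: string): FileKind {
  // 1. A MIME type recognized as delimited text wins.
  if (ASSUMED_DELIMITED_MIME_TYPES.has(mimeType)) return 'delimited';
  // 2. Otherwise, a .psv extension always forces delimiter-separated.
  if (name.toLowerCase().endsWith('.psv')) return 'delimited';
  // 3. Everything else falls back to Excel-based, which is the case where
  //    the encoding and delimiter pickers are not offered.
  return 'excel';
}
```

Under these assumptions, a .csv file with an empty or unrecognized MIME type falls through to step 3, while the same contents saved as .psv are caught by step 2 regardless of what the browser reports.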

If anyone is interested, I have provided the lines in the code base that handle the aforementioned logic below.

The .txt extension worked in your case because the browser and Specify recognized the MIME type of the file. This might not be the case for all browsers.
