Hi,
we are importing some of our legacy datasets, and we discovered a new type of error: Uncaught RangeError: Maximum call stack size exceeded.
It’s a pretty large dataset with >150,000 rows and >100 columns.
We are wondering whether there is a way to expand the limit. If yes, how should we proceed; if not, how should we divide and conquer: fewer columns and/or fewer rows? For the sake of fewer rounds of validation, we would like to keep the dataset as big as possible.
My understanding is that there isn’t a way to increase the call stack size of a particular browser, but the maximum size does differ by browser. You could try a different browser and see if the validation fits in the stack. The issue is that too many function calls are being pushed onto the stack (through nesting or recursion) without the earlier calls returning and being popped off.
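As a minimal illustration of the failure mode (in Python rather than the browser’s JavaScript, but the mechanism is the same: frames pile up until the limit is hit):

    import sys

    def walk(node, depth=0):
        # Each nested call adds a frame to the call stack; with no
        # reachable base case, the stack eventually overflows.
        return walk(node, depth + 1)

    # Python raises RecursionError, its equivalent of the browser's
    # "RangeError: Maximum call stack size exceeded":
    try:
        walk(None)
    except RecursionError:
        print(f"overflowed after ~{sys.getrecursionlimit()} frames")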
We have done chunks of 25,000 before for a collection of 187,000 rows, split via the script that was written for general migration cleanup. Something like this:
import pandas as pd

def export_chunks(df: pd.DataFrame) -> None:
    chunk_size = 25000
    # Slice the frame into consecutive blocks of chunk_size rows
    list_df = [df[i : i + chunk_size] for i in range(0, df.shape[0], chunk_size)]
    for n, chunk in enumerate(list_df):
        chunk.to_csv(
            f"processed/chunk{n + 1}.tsv",  # Python is zero indexed, so + 1
            sep="\t",
            encoding="utf-8",
            index=False,
        )
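Usage would be something like the following (the input path and separator are just placeholders; adjust them to your own export):

    # Hypothetical source file from the legacy export
    df = pd.read_csv("exports/legacy_dataset.tsv", sep="\t")
    export_chunks(df)  # writes processed/chunk1.tsv, chunk2.tsv, ...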
However, after going through the experience of 25k-row validations, I am now of the opinion that smaller chunks are actually better. First, if you have a large validation, you need to wait longer to receive feedback. Second, disambiguation errors within the same workbench dataset can be more difficult to solve than disambiguation against records already in the system. It is definitely a dance: super small chunks can feel inefficient, but I’m not convinced bigger chunks are always better.
I am curious just how many rows it takes to cause the overflow on various browsers (my hunch is that columns don’t matter); will see if I can run some tests this afternoon.
EDIT: Walking back that hunch. The workbench has no problem chewing through 100k rows if it’s just one table. Tree fields do seem to add a bunch of computation time, but I have not yet been able to get it to overflow in Firefox v128.7.0esr.
I’m beginning to think that this may be caused by something other than purely the number of rows or columns, unless the problem only manifests itself at truly large dimensions. Have been running a workbench this afternoon, 250,000 rows by 10 columns, and no errors. @ZsPapp, would you be able to provide more detail in terms of when the error occurs? Does the validation get to a certain number before crashing?
Also, I am impressed by your patience! Just loading and waiting for the validation on this test set takes hours; I can’t imagine how long 150,000 by 100 would take on an actual database.
Trying to think if there could be any recursion introduced via the mapping, similar to how you can make a recursive query.
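Purely hypothetical, but the shape of that problem would look something like this: a tree mapping where (through bad data or a mis-mapped parent column) a node points back to one of its own descendants, so a naive resolver never bottoms out:

    # Hypothetical parent links with a cycle: C's parent is A, but A's
    # parent was mis-mapped back to C.
    parents = {"A": "C", "B": "A", "C": "A"}

    def rank_depth(taxon):
        # Naive recursive resolver: recurses until it finds a root (None),
        # which never happens here, so the call stack overflows.
        parent = parents.get(taxon)
        return 0 if parent is None else 1 + rank_depth(parent)

    rank_depth("B")  # RecursionError: maximum recursion depth exceeded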
I tried two different browsers (Chrome and Edge), both of which produced the same error.
I’m considering splitting the dataset into smaller chunks; however, out of curiosity, I’m still interested in investigating this issue.
My message may have been misleading: the problem did not occur during validation. It happened a few seconds after I clicked the ‘Import File’ button, so I didn’t even reach the validation phase, not even the column mapping.
I’m happy to send the Crash report, if it helps.
And yes, validation is a game of patience - also, it gives plenty of opportunities for multitasking.
If you can provide more details about the dataset or the specific configurations used, it would help us better understand the root cause. If you share the exported dataset along with a copy of your upload plan, we can attempt to recreate this issue in several browsers and see if we observe the same behavior.
@markp shares our perspective on large data sets as well. We typically recommend splitting uploads into reasonably sized chunks whenever possible. This approach allows for faster validation, quicker uploads, and easier disambiguation, all of which are significant advantages.
Chrome and Edge both use the same underlying platform, known as Chromium. Since this serves as the foundation for both browsers, they share core features and functionality, so it makes sense that they behave similarly. You may want to try using Firefox for your upload, as it may have a different limit.