Maximum call stack size exceeded

Hi,
we are importing some of our legacy datasets, and we discovered a new type of error: Uncaught RangeError: Maximum call stack size exceeded.

It’s a pretty large dataset with >150,000 rows and >100 columns.

We are wondering whether there is a way to expand the limit. If so, how should we proceed? If not, how should we divide and conquer: fewer columns and/or fewer rows? For the sake of fewer rounds of validation, we would like to keep the dataset as big as possible.

Used version: 7.9.6.2

Thanks!

My understanding is that there isn’t a way to increase the call stack size of a particular browser, but the maximum size does differ by browser. You could try a different browser and see whether the validation fits in the stack. The error means that too many function calls are being pushed onto the stack (through nesting or recursion) before the original function returns and its frame is popped off.
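
The browser error itself is thrown by JavaScript, but the mechanism can be sketched in Python (the language of the chunking script below), purely as an illustration of frames piling up without ever being popped:

import sys

def nest(depth: int) -> int:
    # Each call pushes another frame onto the call stack; none of them
    # can return until the deepest call does, so the stack only grows.
    return nest(depth + 1)

try:
    nest(0)
except RecursionError:
    # Python's analogue of the browser's RangeError: the interpreter
    # refuses to push any more frames.
    print(f"stack limit hit near the recursion limit ({sys.getrecursionlimit()})")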

We have done chunks of 25,000 before for a collection of 187,000, split via the script that was written for general migration cleaning. Something like this:

import pandas as pd

def export_chunks(df: pd.DataFrame) -> None:
    chunk_size = 25000
    # Slice the DataFrame into consecutive blocks of chunk_size rows
    list_df = [df[i : i + chunk_size] for i in range(0, df.shape[0], chunk_size)]
    for n, chunk in enumerate(list_df):
        chunk.to_csv(
            f"processed/chunk{n + 1}.tsv",  # Python is zero indexed, so + 1
            sep="\t",
            encoding="utf-8",
            index=False,
        )
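
For example (the file name here is just a placeholder for wherever the full export lives, assumed to be tab-separated):

import pandas as pd

full_export = pd.read_csv("export.tsv", sep="\t", encoding="utf-8")  # placeholder path
export_chunks(full_export)  # writes processed/chunk1.tsv, processed/chunk2.tsv, ...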

However, after going through the experience of 25k validations, I am now of the opinion that smaller chunks are actually better. First, with a large validation you have to wait longer to receive feedback. Second, disambiguation errors within the same workbench dataset can be more difficult to resolve than disambiguation against records already in the system. It is definitely a dance: super small chunks can feel inefficient, but I’m not convinced bigger chunks are always better.

I am curious just how many rows it takes to cause the overflow in various browsers (my hunch is that columns don’t matter); I will see if I can run some tests this afternoon.

EDIT: Walking back that hunch. The workbench has no problem chewing through 100k rows if it’s just one table. Tree fields do seem to add a lot of computation time, but I have not yet been able to get it to overflow in Firefox v128.7.0esr.


I’m beginning to think that this may be caused by something other than purely the number of rows or columns, unless the problem only manifests itself at truly large dimensions. I have been running a workbench dataset this afternoon, 250,000 rows by 10 columns, with no errors. @ZsPapp, would you be able to provide more detail on when the error occurs? Does the validation get to a certain number before crashing?

Also, I am impressed by your patience! Just loading and waiting for the validation on this test set takes hours; I can’t imagine how long 150,000 by 100 would take on an actual database.

I’m trying to think whether any recursion could be introduced via the mapping, similar to how you can make a recursive query.


Thank you @markp for looking into this!

I tried two different browsers (Chrome and Edge), both of which produced the same error.

I’m considering splitting the dataset into smaller chunks; however, out of curiosity, I’m still interested in investigating this issue.
My earlier message may have been misleading: the problem did not occur during validation, it happened a few seconds after I clicked the ‘Import File’ button. So I didn’t even reach the validation phase, not even column mapping.
I’m happy to send the crash report if it helps.

And yes, validation is a game of patience - also, it gives plenty of opportunities for multitasking. :slight_smile:

@markp – Thank you so much for your help and for digging into this before we could!

Hi @ZsPapp,

If you can provide more details about the dataset or the specific configurations used, it would help us better understand the root cause. If you share the exported dataset along with a copy of your upload plan, we can attempt to recreate this issue in several browsers and see if we observe the same behavior.

@markp shares our perspective on large data sets as well. We typically recommend splitting uploads into reasonably sized chunks whenever possible. This approach allows for faster validation, quicker uploads, and easier disambiguation, all of which are significant advantages.

Chrome and Edge both use the same underlying platform, Chromium. Since it serves as the foundation for both browsers, they share core features and functionality, so it makes sense that they behave similarly. You may want to try Firefox for your upload, as it may have a different limit.

Thank you!

Thank you for the additional information regarding when the error occurs.

Here is what I got on 7.9.2

Firefox 128.7.0esr

Dimensions        Import Outcome
250,000 by 100    :white_check_mark:

Note: although you can successfully import 250k, your browser will likely slow to a crawl when previewing it in the workbench.

Chromium 134.0.6998.35-1~deb12u1

Dimensions        Import Outcome
100,000 by 100    :white_check_mark:
150,000 by 100    :white_check_mark:
250,000 by 100    :x: Uncaught RangeError: Invalid array length (within the parsing operation)

So, as a crude estimate, the limit lies somewhere in the 150k to 250k row range on Chromium-based browsers.
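
If anyone wants to narrow that range down, here is a rough sketch for generating synthetic test files of increasing row counts (pandas assumed; the dummy values, column count, and file names are placeholders), which can then be imported one at a time until the error appears:

import pandas as pd

def make_test_file(rows: int, cols: int = 100, path: str = "test.tsv") -> None:
    # Build a synthetic table of the requested dimensions with dummy string values.
    data = {f"col{c}": [f"r{r}c{c}" for r in range(rows)] for c in range(cols)}
    pd.DataFrame(data).to_csv(path, sep="\t", encoding="utf-8", index=False)

# Bisect the 150k-250k range: import each file in the workbench and note where it fails.
for rows in (150_000, 175_000, 200_000, 225_000, 250_000):
    make_test_file(rows, path=f"test_{rows}.tsv")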
