Profiling JavaScript Apps

Processing bulk load on the front-end

In our front-end application we have a configuration section in which users add items to the blacklists and whitelists used for traffic shaping. The simplest example is the application ID list, shown below.

Because adding items one at a time was tedious, we added a bulk loading feature. The user pastes CSV-style input into a text field, and the app parses it and transforms each line into a configuration item. The second screenshot shows the bulk loader after it has processed these items.
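To make the bulk loader concrete, here is a minimal sketch of turning the pasted text into configuration items. It assumes one item per line with comma-separated fields; the function name `parseBulkInput` and the field names `key` and `label` are hypothetical, not the application's actual format.

```javascript
// Hypothetical sketch: parse CSV-style text (one item per line) into
// configuration items. The field names `key` and `label` are assumptions.
function parseBulkInput(text) {
  return text
    .split(/\r?\n/)                     // one configuration item per line
    .map((line) => line.trim())
    .filter((line) => line.length > 0)  // ignore blank lines
    .map((line) => {
      const [key, label = ""] = line.split(",").map((field) => field.trim());
      return { key, label };
    });
}

const items = parseBulkInput("app-1, Mail\napp-2, Chat\n\napp-3");
// items.length === 3; the blank line is skipped and items[2] has label ""
console.log(items);
```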

The app was tested with up to a thousand entries. After it was deployed to production, multiple customers used the bulk loader for several months with response times of less than five seconds.

Later, a new customer tried to use the same feature with approximately 30,000 items. The application froze until the web browser reported it as not responding, and the bulk load never completed. This led us to question whether the use case was valid. Alternative solutions such as file upload were discussed, but we decided to first determine where the issue was, before considering new features.

Isolating the problem with Chrome developer tools

Chrome comes with a powerful set of developer tools for inspecting different aspects of web apps. One of them is the Network tab, which shows the requests being made, their responses, and their response times.

For our particular problem, the Network tab helped us establish that the issue was not the initial input normalization. The application parses the input and submits the parsed content to the server for processing, and the server responded in less than two seconds. The application was hanging while processing the server's response. The screenshot below shows the Network tab for our application after a few interactions; the server-side normalization request appears as 'app'.

As the image below shows, the app received the response from the backend in less than two seconds. A code review did not reveal anything obvious that could cause the issue. When bulk loading fewer than 1,000 items, client-side processing was instantaneous. When bulk loading 5,000 items, the app froze for a few seconds but completed normally.

Using Chrome's CPU Profiling to identify the source of the problem

Another useful tool that comes with Chrome is the profiler, which shows where the app spends CPU time or memory. Chrome's developer documentation covers the profiler in detail, so we won't repeat it here; more information on using the Chrome CPU Profiler can be found at https://developer.chrome.com/devtools/docs/cpu-profiling.

With this tool at hand the exercise looks simple: start the profiler, use the feature, and read off where the problem is, right?

The catch is that profiling is highly CPU-demanding. Trying to load 27,000 items hung the browser so completely that the developer tools crashed. The profiler also crashed with 5,000 items, a number that worked when profiling was disabled. With 500 or 1,000 items, the bulk loader completed quickly (even with profiling), but the results did not point to any area taking noticeably more time than the others. Finally, with 2,000 items, the profiled run took a few seconds and completed.
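When the profiler itself is too heavy for the input sizes that matter, a lighter complement is to time the suspected step directly at a few sizes and watch how the time grows. The sketch below is not from our code base: `timeAtSizes`, `makeItems`, and `distinctKeys` are hypothetical names, and it relies on `performance.now()`, which is available in browsers and in recent Node.js versions.

```javascript
// Hypothetical harness: time one processing step at several input sizes to
// see how its cost grows, without the overhead of the CPU profiler.
// `step` and `makeInput` are placeholders for the real code and data.
function timeAtSizes(step, makeInput, sizes) {
  return sizes.map((size) => {
    const input = makeInput(size);   // build the input outside the timed region
    const start = performance.now(); // high-resolution timestamp in milliseconds
    step(input);
    return { size, ms: performance.now() - start };
  });
}

// Example: a stand-in step that collects the distinct keys of the items.
const makeItems = (n) => Array.from({ length: n }, (_, i) => ({ key: `item-${i}` }));
const distinctKeys = (list) => new Set(list.map((item) => item.key));

for (const { size, ms } of timeAtSizes(distinctKeys, makeItems, [1000, 2000, 4000])) {
  console.log(`${size} items: ${ms.toFixed(1)} ms`);
}
```

If doubling the size roughly quadruples the time, the step is likely quadratic. Single runs are noisy, so repeating each size a few times gives a more reliable picture.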

Analyzing the profiling results to solve the mystery

The following is what the profiler result looked like:

Summing the percentages in the Self column shows that functions from `ramda.js` take ~73% of CPU time. We make heavy use of Ramda, a functional programming library for JavaScript (aliased in our code as `R`), so seeing many of its functions in the profile did not immediately surprise us.

By expanding the triangle beside each function name to inspect the call stack, we could track what was calling these functions. All of them were ultimately called from `R.uniq`: a single function was causing the bottleneck. The remaining processing time was mostly spent in the application code itself and in the React rendering pipeline.

One of the steps in processing the server's response is removing items that duplicate preexisting ones. For that we used `R.uniq` after concatenating the received list with the existing item list:

```javascript
const updatedList = R.uniq(R.concat(receivedList, existingList));
```

The issue was that this version of `R.uniq` compared items pairwise with `R.equals` (a deep equality check), so the number of comparisons grows quadratically, not linearly, with the size of the list.
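To see why this growth hurts, the sketch below counts the comparisons made by a de-duplication that checks each item against every item already kept. It mirrors the pairwise pattern described above but is an illustration, not Ramda's actual source, and `naiveUniqWithCount` is a hypothetical name. For n distinct items it performs n(n - 1)/2 comparisons: 499,500 for 1,000 items but 449,985,000 for 30,000, and with a deep equality check each comparison is itself non-trivial.

```javascript
// Illustrative quadratic de-duplication (not Ramda's source): each item is
// compared with every item kept so far, and the comparisons are counted.
function naiveUniqWithCount(list, equals) {
  const kept = [];
  let comparisons = 0;
  for (const item of list) {
    let seen = false;
    for (const k of kept) {
      comparisons++;
      if (equals(k, item)) {
        seen = true;
        break; // stop at the first match
      }
    }
    if (!seen) kept.push(item);
  }
  return { kept, comparisons };
}

const sameKey = (a, b) => a.key === b.key;
for (const n of [1000, 2000, 4000]) {
  const list = Array.from({ length: n }, (_, i) => ({ key: i }));
  const { comparisons } = naiveUniqWithCount(list, sameKey);
  console.log(`${n} distinct items -> ${comparisons} comparisons`);
}
// Doubling n roughly quadruples the count: n * (n - 1) / 2 for distinct items.
```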

Optimizing the function causing the performance issue

Since most of the processing time was spent on a single function, the solution was finding an alternative implementation for that particular function:

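```javascript
// Returns the items of `ar` whose `prop` value has not been seen before,
// keeping the first occurrence of each key and preserving order.
export function unique_by(ar, prop) {
  var f = Object.create(null), // keys already seen; no prototype, so keys like "toString" are safe
    i = 0,
    l = ar.length,
    r = [];
  while (i < l) {
    // keep the item only if its key has not been seen yet
    !f[ar[i][prop]] && r.push(ar[i]);
    f[ar[i++][prop]] = 1; // mark the key as seen, then advance
  }
  return r;
}
```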

Instead of comparing each new item with every item already kept, this function records each item's key in a lookup object (used as a map) and skips any item whose key has already been recorded, so each item costs a single lookup instead of a scan of the list. It also relies on the fact that the objects being checked have a well-known format with a property that is unique: rather than comparing whole objects, it uses only that property as the key.

```javascript
const updatedList = unique_by(R.concat(receivedList, existingList), 'key');
```

Note that this optimized `unique_by` doesn't follow some functional programming principles: it relies heavily on data mutation and a while loop. However, it is still referentially transparent: the result depends only on the input, and calling it has no side effects.
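For comparison, the same first-occurrence-wins idea can be written with a `Set` and a key function. This is an alternative sketch rather than the version described above, and `uniqueByKey` is a hypothetical name. A `Set` also sidesteps two quirks of using a plain object as the lookup: keys are not coerced to strings, so `1` and `"1"` stay distinct, and there are no inherited keys such as `toString`.

```javascript
// Alternative sketch: keep the first item for each key, using a Set for
// the "already seen" lookup. `getKey` extracts the unique property.
function uniqueByKey(list, getKey) {
  const seen = new Set();
  const result = [];
  for (const item of list) {
    const key = getKey(item);
    if (!seen.has(key)) {
      seen.add(key); // first occurrence wins
      result.push(item);
    }
  }
  return result;
}

const received = [{ key: "a", v: 2 }, { key: "c", v: 3 }];
const existing = [{ key: "a", v: 1 }, { key: "b", v: 1 }];
const merged = uniqueByKey([...received, ...existing], (item) => item.key);
// merged keeps received "a" (v: 2), then "c", then existing "b": keys a, c, b
```

Because the first occurrence wins, putting the received list first, as in `R.concat(receivedList, existingList)`, lets a received item take precedence over an existing item with the same key.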

By the time we wrote this blog post, Ramda had already released a new version with a better `R.uniq` implementation. We decided to keep our own function, as it is optimized for our specific use case.

After this change, processing the response was no longer the slowest part of bulk loading. The app could now process an input of 70K items in under 8 seconds on the same machine that had been crashing with ~10K items.

What we learned

Modern browsers come with powerful tools for measuring and debugging front-end applications, and Chrome's CPU Profiler in particular is useful for pinning down processing bottlenecks. When dealing with performance issues, we need to measure what is happening in the app before we think about changing it.

Another lesson from this experience is that when designing capabilities that handle multiple items, like the bulk loader, we need to think about the limitations of the system, i.e. how many items are supported. This led us to change our design process to include defining the system's upper boundaries when adding new features or creating new applications.

Next time you are designing a feature, think about the boundaries of the system. For example, if a feature is expected to handle 10,000 items, what happens when usage exceeds that limit?
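One way to make such an upper boundary explicit is to check it before doing any work and report a clear message instead of freezing the UI. In the sketch below the limit value and the names `MAX_BULK_ITEMS` and `checkBulkLimit` are hypothetical; the real limit should come from measuring the system.

```javascript
// Hypothetical upper bound for one bulk load; the value is an assumption.
const MAX_BULK_ITEMS = 10000;

// Returns an error message when the input exceeds the supported size,
// or null when it is within bounds.
function checkBulkLimit(lines, max = MAX_BULK_ITEMS) {
  const count = lines.filter((line) => line.trim() !== "").length; // ignore blank lines
  return count > max
    ? `Bulk load supports at most ${max} items; received ${count}. Please split the input.`
    : null;
}

const input = Array.from({ length: 12000 }, (_, i) => `app-${i}`);
const error = checkBulkLimit(input);
if (error) console.warn(error); // surface the limit instead of hanging the UI
```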