A Quick Burp Suite Sequencer Lab

Gavin Watson

Co-Founder

Gavin is an experienced Security Engineer adept at leading teams through complex testing to mitigate security threats.

Introduction to Sequencer

Burp Suite Pro is arguably one of the most popular Web Application Testing tools available, and one that I myself have used for many years. It provides a variety of powerful automated and manual tools to interrogate applications and identify vulnerabilities. Yet there are certain tools within Burp Suite that most testers seldom use. One such tool is called ‘Sequencer’, a powerful automated solution for finding weaknesses in the ‘randomness’ of token values. One ‘practical’ application is to find anomalies in the apparently random generation of session token values, and use this information to predict valid values and ultimately hijack an authenticated user’s session.

One reason that Sequencer is rarely used by testers is that developers rarely build a ‘custom’ session solution likely to have flaws. The session management solutions offered by most languages and frameworks are well established and consequently robust, and are unlikely to have an exploitable weakness in their randomness. Additionally, the number of possible session tokens in a typical implementation is astronomical, so even if a weakness were identified, the chance of exploiting it to predict a valid token is usually small enough to render an attack infeasible.

As a learning exercise, I decided to build a session token solution so tragically flawed that Sequencer could easily identify significant weaknesses. I was keen to understand how best to interpret the results in order to build a targeted attack. Hopefully, fellow testers that have themselves rarely dived into the enigmatic settings and results of Sequencer may find this blog post useful, or at least mildly interesting.

What a Good Session Token Looks Like

To provide a basis for comparison with our broken session solution, we need to first establish what a reasonable token would look like. To provide this example, a simple PHP page is created that generates a fresh ‘PHPSESSID’ session token on each request.
<?php
session_start();
session_regenerate_id();
?>
When requesting the page four times, the following session token values are received:
Set-Cookie: PHPSESSID=a348b22db2009fbb240e30bc3e3c9dfe; path=/;
Set-Cookie: PHPSESSID=3046f1c07a72be1739baf1c6f6fa7ea9; path=/;
Set-Cookie: PHPSESSID=ff0c1e142c8c3f7b9781aabbcfb95218; path=/;
Set-Cookie: PHPSESSID=412f4fbf3932ee3a2fd3c6b0902fc618; path=/;
The use of session regeneration here is important, as Sequencer will need to gather large quantities of different token values. In a real-world scenario, the tester would need to identify a request that results in a response that sets a new token value. This could simply be a request that does not contain a cookie value (resulting in one being set by the application), or it could be a request for the logout page, which could invalidate the current session and set a new unauthenticated token.

We can send one of these requests to Burp Sequencer by right-clicking the request in the ‘Proxy’ > ‘HTTP history’ tab and selecting ‘Send to Sequencer’.

Sequencer will automatically search the response for a valid cookie value and populate the ‘Live capture’ tab configuration. Alternatively, a ‘Custom location’ can be defined, which supports regular expression matching.

The tab also includes the usual options for throttling requests as found in the ‘Scanner’ and ‘Intruder’ options.

The ‘Manual load’ tab offers the ability to import a list of tokens that have already been captured by some other means, such as by another tool or script.
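As a rough illustration of gathering tokens by some other means, the Python sketch below pulls token values out of raw `Set-Cookie` header lines so they can be saved one per line. The helper name `extract_tokens` and the one-token-per-line layout are assumptions made for this example; check the ‘Manual load’ tab for the format it actually expects.

```python
import re


def extract_tokens(header_lines, cookie_name):
    """Return the values of `cookie_name` found in Set-Cookie header lines."""
    pattern = re.compile(
        r"^Set-Cookie:\s*" + re.escape(cookie_name) + r"=([^;\s]+)",
        re.IGNORECASE,
    )
    tokens = []
    for line in header_lines:
        match = pattern.match(line.strip())
        if match:
            tokens.append(match.group(1))
    return tokens


# The four PHPSESSID responses shown earlier in this post.
headers = [
    "Set-Cookie: PHPSESSID=a348b22db2009fbb240e30bc3e3c9dfe; path=/;",
    "Set-Cookie: PHPSESSID=3046f1c07a72be1739baf1c6f6fa7ea9; path=/;",
    "Set-Cookie: PHPSESSID=ff0c1e142c8c3f7b9781aabbcfb95218; path=/;",
    "Set-Cookie: PHPSESSID=412f4fbf3932ee3a2fd3c6b0902fc618; path=/;",
]

tokens = extract_tokens(headers, "PHPSESSID")
print("\n".join(tokens))  # one token per line, ready to save to a text file
```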

The ‘Analysis options’ tab allows the user to configure options such as padding and base64 decoding tokens before they are subjected to analysis, as well as specifying what types of specific analysis methods should be used. For the purposes of this lab, these settings all remain at their default values.

Once ‘Start live capture’ is clicked, Burp will repeat the request (potentially thousands of times) and gather the tokens from the responses. The ‘Auto analyze’ tick box can be clicked to see results as soon as they are available as the tokens are being gathered. If you suspect that the tool is not successfully gathering different token values, you can click ‘Copy tokens’ and paste the contents into a text editor to confirm the values (if any) that Burp has collected.

The results are presented in three tabs, ‘Summary’, ‘Character-level analysis’, and ‘Bit-level analysis’. A fourth tab ‘Analysis Options’ presents the same options as those available before launching the live capture.

The ‘Summary’ Tab

This first tab has four sections covering the results from a high-level perspective, and can be used to quickly determine whether there are issues worth investigating. In our example, the ‘Overall result’ of analysis of 20,000+ tokens suggests that the solution is robust (with the randomness estimated to be ‘excellent’).

The ‘significance level’ is a measure of what strength of evidence is required to reject the hypothesis that the tokens are being generated in a truly random way. The lower this level is, the stronger the evidence needs to be. Typically this value is between 1% and 5%, though Burp Suite’s Sequencer uses 0.002% to 0.03% as part of ‘FIPS tests for randomness’.

The ‘effective entropy’ is a complicated term, but in its simplest sense can be thought of as the range of possible values that analysis has confirmed as being sufficiently random. So if each character position in the token has a large set of possible values (a large character set) and analysis of the values determines that they are being chosen randomly (based on the significance level), then the token can be considered to have a high level of effective entropy. In this example, the effective entropy is 115 bits.
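To put the 115-bit figure in context, it helps to compare it with the theoretical ceiling for a token of this shape. The sketch below is a simple upper-bound calculation, not Sequencer’s own method: it assumes every character is chosen independently and uniformly from the character set. The helper name `max_entropy_bits` is hypothetical.

```python
import math


def max_entropy_bits(length, charset_size):
    """Upper bound on entropy, assuming uniform and independent characters."""
    return length * math.log2(charset_size)


# A PHPSESSID here is 32 hexadecimal characters (16 possible values each).
print(max_entropy_bits(32, 16))  # 128.0
```

A 32-character hexadecimal token therefore tops out at 128 bits, so an estimate of 115 bits sits close to the ceiling.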

The summary includes a chart showing the effective entropy at key significance levels. Finally, details regarding the data’s reliability and sample size are displayed.

The ‘Character-Level Analysis’ Tab

The ‘Summary’ sub-tab is the first to begin breaking down the results, in this instance character by character. The chart shows the ‘confidence’ in the randomness of the data at each character position. In our example, we’re using the PHPSESSID, which is a 32-character hexadecimal value, and the chart shows the values to be sufficiently random at each position.

The ‘Count’ sub-tab displays the distribution of characters at each position in the token value. So if there are positive or negative biases towards certain characters, or if parts are always the same value, then we should expect to see the anomaly highlighted clearly in this chart.

In our example we can see a good uniform distribution of characters, a trait that is expected if the values are being chosen at random.
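The idea behind the ‘Count’ chart can be sketched in a few lines of Python. This is only an illustration of tallying characters per position, not a reproduction of Sequencer’s statistics; `position_counts` is a hypothetical helper, and the four tokens from earlier are far too small a sample to draw conclusions from.

```python
from collections import Counter


def position_counts(tokens):
    """Tally how often each character appears at each position.

    Assumes every token has the same length as the first one.
    """
    counts = [Counter() for _ in range(len(tokens[0]))]
    for token in tokens:
        for position, char in enumerate(token):
            counts[position][char] += 1
    return counts


tokens = [
    "a348b22db2009fbb240e30bc3e3c9dfe",
    "3046f1c07a72be1739baf1c6f6fa7ea9",
    "ff0c1e142c8c3f7b9781aabbcfb95218",
    "412f4fbf3932ee3a2fd3c6b0902fc618",
]

counts = position_counts(tokens)
for position, counter in enumerate(counts[:3]):
    print(position, dict(counter))
```

With a capture of thousands of tokens, a roughly even spread across all sixteen hexadecimal characters at every position is what a uniform distribution would look like.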

The ‘Transitions’ sub-tab includes a chart showing the confidence that any one character in each position is followed by any other possible character from the set. So if the character ‘b’ was followed by ‘5’ more often than would be expected from a random selection, the anomaly would be highlighted here.
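A minimal sketch of counting transitions follows. It assumes that a ‘transition’ means the character at a given position in one token being followed by the character at the same position in the next token captured, which is how I read the test; `transition_counts` is a hypothetical helper and does not reproduce Sequencer’s confidence calculation.

```python
from collections import Counter


def transition_counts(tokens, position):
    """Count (current, next) character pairs at one position across successive tokens."""
    pairs = Counter()
    for current, following in zip(tokens, tokens[1:]):
        pairs[(current[position], following[position])] += 1
    return pairs


# The four PHPSESSID values from earlier, in the order they were received.
tokens = [
    "a348b22db2009fbb240e30bc3e3c9dfe",
    "3046f1c07a72be1739baf1c6f6fa7ea9",
    "ff0c1e142c8c3f7b9781aabbcfb95218",
    "412f4fbf3932ee3a2fd3c6b0902fc618",
]

pairs = transition_counts(tokens, 0)
print(pairs)
```

Over a large capture, any pair that turns up far more often than the others at a given position would be a candidate anomaly.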

The final ‘Character set’ sub-tab includes a chart showing the character set observed at each position in the token. So if the first character of the token was only ever observed as ‘1’, ‘2’, or ‘3’, and never as any of the other characters observed at the other positions, then the anomaly would be highlighted.

The ‘Bit-level analysis’ tab includes a series of sub-tabs that cover similar statistics to those described above, only at the ‘bit’ rather than ‘character’ level. Charts displaying the results in terms of FIPS tests such as the ‘FIPS monobit test’, ‘FIPS poker test’, ‘FIPS runs test’, and ‘FIPS long runs test’, as well as others, are included in their own sub-tabs. However, even a high-level explanation of these tabs is well beyond the scope of this blog post.

What a Bad Session Token Looks Like

Now that we have established a basis for comparison, we can create and examine a very bad session token solution. I’m not going to explain in detail exactly how these tokens are generated, as the purpose here is to examine the tokens with Sequencer, and let the tool identify the flaws.

As before, the application issues a fresh token on each request, but this time using our broken solution rather than the robust PHP implementation. The following responses are received:
Set-Cookie: BADSESSID=d3b82492267c9277; path=/;
Set-Cookie: BADSESSID=002b94b096f7f0a2; path=/;
Set-Cookie: BADSESSID=9fa0a42b06d4f2a7; path=/;
Set-Cookie: BADSESSID=3fb8940906b4f244; path=/;
The ‘BADSESSID’ token is a 16-character hexadecimal value, and should therefore provide an enormous number of possible keys. Looking at the four tokens generated above, the keen-eyed will notice that the sixth and tenth characters (positions 5 and 9 when counting from zero, as the rest of this post does) are the same in each instance. The probability of this occurring in a good system is very slim indeed, so we would be justified in our suspicion that something is very wrong.
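The same observation can be made programmatically. The sketch below, using a hypothetical helper called `fixed_positions`, reports every position where all of the sample tokens share the same character. Positions are counted from zero, so the sixth and tenth characters appear as positions 5 and 9.

```python
def fixed_positions(tokens):
    """Map each zero-based position to its character when every token agrees there."""
    fixed = {}
    for position, chars in enumerate(zip(*tokens)):
        if len(set(chars)) == 1:
            fixed[position] = chars[0]
    return fixed


# The four BADSESSID values shown above.
tokens = [
    "d3b82492267c9277",
    "002b94b096f7f0a2",
    "9fa0a42b06d4f2a7",
    "3fb8940906b4f244",
]

print(fixed_positions(tokens))  # {5: '4', 9: '6'}
```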

As before, we will send one of the above requests to Sequencer and examine the results to see how they compare.

After 20,000+ tokens are gathered and analysed, the summary doesn’t look great for our token, with the ‘quality of randomness’ estimated to be ‘extremely poor’. With a ‘significance level’ of 1% the effective entropy is estimated to be 13 bits, or 2¹³. That is certainly not a great result, considering that a hexadecimal token of this length should be achieving upwards of 45 bits of effective entropy.

So at this stage, we know the token generation is extremely poor, but we don’t yet know exactly how.

Looking at the ‘Summary’ sub-tab of the ‘Character-level analysis’ tab, we can begin to see the first clues as to what is happening. The confidence in the randomness of the characters at positions 5, 9, 10, 11, 12, 13, 14, and 15 is essentially zero, meaning that these characters are either fixed or massively biased.

When looking at the results in the ‘Count’ sub-tab, the chart is almost identical to that shown in the ‘Summary’ sub-tab. However, below the chart is a list of specific ‘Anomalies’ and these give us insight into the biases that the chart is suggesting.

In total, there are 72 count based anomalies, though these are referencing positions 10, 11, 12, 13, 14 and 15 only. There are several positive biases:

character 4 is too common at position 10 (count: 2828…
character 7 is too common at position 10 (count: 4545…
character c is too common at position 10 (count: 5732…
character 4 is too common at position 11 (count: 2939…
character 7 is too common at position 11 (count: 4606…
character c is too common at position 11 (count: 5729…
character 2 is too common at position 12 (count: 5714…
character f is too common at position 12 (count: 9848…
character 2 is too common at position 13 (count: 5398…
character f is too common at position 13 (count: 10137…
character 4 is too common at position 14 (count: 2861…
character 7 is too common at position 14 (count: 4609…
character c is too common at position 14 (count: 5754…
character 4 is too common at position 15 (count: 2901…
character 7 is too common at position 15 (count: 4608…
character c is too common at position 15 (count: 5685…

Based on the above output we can conclude that positions 10, 11, 14 and 15 in our predicted tokens should be the characters ‘4’, ‘7’ or ‘c’ to stand the best chance of hitting a valid token. Similarly, positions 12 and 13 should ideally be either a ‘2’ or ‘f’ character.

When viewing the ‘Transitions’ sub-tab we can see more anomalies presented in the chart. In total, Sequencer identifies 537 ‘transition’ based anomalies.

We already know the reason for positions 5 and 9 being highlighted: these are the two characters that are always ‘4’ and ‘6’ respectively. As neither will ever transition to another character, we’d expect to see this anomaly in the chart. Positions 12 and 13 have also been highlighted as an issue, but we know from the tokens that these characters do change. A result like this suggests that only a small character set is used at these positions.

This brings us to the fourth sub-tab ‘Character set’.

The chart above shows that the character set at certain positions is severely limited. We already know about positions 5 and 9, but the results show that positions 2, 3, 4, 6, 7, 8, 12 and 13 are also limited.

The chart above shows the effective entropy at each position. As positions 5 and 9 have just 1 character, the result of zero bits of entropy is hardly surprising. The other results are expected for the positions we know to have a limited set of characters.
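A rough way to reproduce the shape of these two charts is to collect the set of characters seen at each position and take the base-2 logarithm of its size. This is only an upper bound that ignores bias within the set, so it is not Sequencer’s own estimate; the helper names `charset_by_position` and `max_bits_by_position` are hypothetical.

```python
import math


def charset_by_position(tokens):
    """Set of characters observed at each zero-based position."""
    return [set(chars) for chars in zip(*tokens)]


def max_bits_by_position(tokens):
    """Upper bound on entropy per position: log2 of the observed set size."""
    return [math.log2(len(charset)) for charset in charset_by_position(tokens)]


# The four BADSESSID values from earlier; a real run would use the full capture.
tokens = [
    "d3b82492267c9277",
    "002b94b096f7f0a2",
    "9fa0a42b06d4f2a7",
    "3fb8940906b4f244",
]

bits = max_bits_by_position(tokens)
print(bits[5], bits[9])  # 0.0 0.0
```

With only four tokens the other positions say very little, but over the full capture a position limited to six characters would come out at roughly 2.58 bits rather than the 4 bits a full hexadecimal character can offer.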

When sorting the 20,000 tokens into 1 position per column and sorting to show only unique entries, we see the following character sets, matching the above chart exactly. We can also highlight the positive biases identified by Sequencer (shown in bold).

Ultimately, our hypothetical aim is to use the anomalies to successfully predict a valid live token in an automated attack on an application. So based on the above results, we would build an automated attack with the following structure.

Position 0-1: Randomly choose a character from the full set of ‘0-9’ and ‘a-f’
Position 2-4: Randomly choose a character from the partial set of ‘0’, ‘2’, ‘8’, ‘9’, ‘a’, ‘b’
Position 5: Select character ‘4’ only
Position 6-8: Randomly choose a character from the partial set of ‘0’, ‘2’, ‘8’, ‘9’, ‘a’, ‘b’
Position 9: Select character ‘6’ only
Position 10-11: Randomly choose a character from the partial set of ‘4’, ‘7’, ‘c’
Position 12-13: Randomly choose a character from the partial set of ‘2’, ‘f’
Position 14-15: Randomly choose a character from the partial set of ‘4’, ‘7’, ‘c’
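The structure can be sketched in Python as one allowed character set per position, which makes it easy both to count the candidate keys and to generate them. The names `STRUCTURE`, `keyspace_size` and `random_candidate` are hypothetical, and the sets for positions 12-13 (‘2’, ‘f’) and 14-15 (‘4’, ‘7’, ‘c’) follow the biases Sequencer reported earlier.

```python
import math
import random

HEX = "0123456789abcdef"
PARTIAL = "0289ab"

# One entry per token position (0-15): the characters allowed there.
STRUCTURE = (
    [HEX] * 2          # positions 0-1: full hexadecimal set
    + [PARTIAL] * 3    # positions 2-4
    + ["4"]            # position 5: always '4'
    + [PARTIAL] * 3    # positions 6-8
    + ["6"]            # position 9: always '6'
    + ["47c"] * 2      # positions 10-11
    + ["2f"] * 2       # positions 12-13
    + ["47c"] * 2      # positions 14-15
)


def keyspace_size(structure):
    """Number of distinct tokens the structure can produce."""
    return math.prod(len(chars) for chars in structure)


def random_candidate(structure, rng=random):
    """Build one candidate token by picking a character for each position."""
    return "".join(rng.choice(chars) for chars in structure)


print(keyspace_size(STRUCTURE))  # 3869835264
print(random_candidate(STRUCTURE))
```

That is a few billion candidates, compared with 16¹⁶ (roughly 1.8 × 10¹⁹) for an unconstrained 16-character hexadecimal token.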

So how many of the tokens analysed actually fell within these boundaries? Using the following regular expression we can extract all the predictable tokens.

grep -E '^[0-9a-f]{2}[0289ab]{3}4[0289ab]{3}6[47c]{2}[2f]{2}[47c]{2}' tokens.txt

Doing so reveals that 2,015 of the 20,000 tokens (roughly 10%) are highly predictable, which is a very significant quantity.
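For anyone who prefers to do the filtering in a script, the following Python sketch applies the same pattern with the standard `re` module. The helper `predictable_tokens` is hypothetical, and the second sample token is made up purely to show a match.

```python
import re

# Same structure as the grep pattern; fullmatch() requires the whole token to fit.
PREDICTABLE = re.compile(
    r"[0-9a-f]{2}[0289ab]{3}4[0289ab]{3}6[47c]{2}[2f]{2}[47c]{2}"
)


def predictable_tokens(tokens):
    """Return the tokens that fall entirely within the predictable structure."""
    return [token for token in tokens if PREDICTABLE.fullmatch(token)]


sample = [
    "d3b82492267c9277",  # from the responses above; '9' at position 12 is outside the set
    "d3b82492267cf2c4",  # hypothetical token that fits the structure
]

matches = predictable_tokens(sample)
print(f"{len(matches)} of {len(sample)} tokens match")
```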

To perform such an attack, Burp’s Intruder with ‘custom iterator’ could potentially be used, or a custom script could be written to generate the tokens.

Is Such an Attack Likely to Succeed in a Real-World Scenario?

The rather obvious answer is: it depends. Even with these huge flaws in the randomness of the token, and generating only keys that fall within the structure described above, we still have just under four billion possible keys to work through (16² × 6³ × 6³ × 3² × 2² × 3² ≈ 3.9 billion). However, there are many other variables to consider. For example, if the application has no timeout configured on sessions, then the number of valid sessions could be considerable. When targeting massive global applications with millions of users, the chance of one of those keys being valid becomes realistic. Additionally, as new users authenticate with the application, new sessions will be established, so an attacker continually attempting highly predictable session tokens may eventually have success. However, this is all based on this highly unlikely flawed implementation. In a real-world scenario, chances are the session tokens will be generated by an established solution.
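To get a feel for how these variables interact, the back-of-the-envelope sketch below estimates the chance that at least one guess lands on a live session. Every number in it is hypothetical, the helper `hit_probability` is made up for this example, and it assumes guesses are drawn independently and that valid tokens are spread evenly across the predictable keyspace.

```python
def hit_probability(valid_tokens, keyspace, guesses):
    """Chance that at least one of `guesses` independent guesses is valid."""
    per_guess = valid_tokens / keyspace
    return 1 - (1 - per_guess) ** guesses


# Hypothetical scenario: 1,000 live sessions, of which about 10% (100) fall
# inside a predictable keyspace of roughly 4 billion keys, and one million guesses.
probability = hit_probability(valid_tokens=100, keyspace=4_000_000_000, guesses=1_000_000)
print(f"{probability:.2%}")
```

Under those made-up figures the attacker has about a 2.5% chance of at least one hit, and the figure climbs quickly as the number of live sessions or guesses grows.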

If testers are unlikely to come across exploitable tokens, why bother with Sequencer? It comes down to the level of risk a business is willing to accept. For high-security applications such as those associated with banking, or for gambling applications that rely on truly random results, any possible flaw in the randomness of tokens may be a cause for concern. Sequencer’s power is in its ability to identify the tiniest of flaws in a token implementation, and through a series of widely accepted tests, provide certain assurances for those ultimately responsible for an application’s security.

So if you’re testing a high-security application, or believe a custom session solution may have been written, try using Sequencer to interrogate the randomness.
