Suraj Kapoor


Director of Product @WayUp, formerly @Lerer Hippeau. Technologist, Optimistic Contrarian.


Building a URL link shortener in Python

I recently built a link shortener in Python. The crux of the shortener is a function that condenses a long URL into something much shorter:

http://bit.ly/1j469oO  

How is it possible to condense a 100+ length character string into a unique 6 length string? How's it possible to potentially generate a unique shortened link for the billions upon billions on the web? There are many ways but this is the strategy I used for my shortener.

A good start is to assign a number to every URL entered into our shortener, so the first URL is http://link/1, second is http://link2, etc... However, this approach will only work for the first 999,999 URLs before our URL length is 7 characters. Although, we've shortened the URL somewhat, we are not fully solving our problem. Converting the URL into a number is however a helpful step. We need to think about how to convert that number into a unique combination of alphanumeric digits used in URLs.

Let's take a 62 length string of lower and upper case alphanumeric characters.

"abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"

Now let's map the entry number of urls against it. Somethign like the following:

http://link/1 > http://link/a
http://link/2
> http://link/b
http://link/3 > http://link/c
http://link/63
> http://link/aa
http://link/64 > http://link/ab

Once we get to the 62nd link (http://link/9), we can start generating links such as http://link/aa, http://link/ab, http://link/ac and so on. Now we have the ability to generate a huge number of URLs. Let's put this strategy into context of the web.

Let's say there are one trillion pages on the web. I think it's much lower but one day, we may get there. How many combinations of different length strings are available to us using the above strategy? We are looking at simple maths that involve powers of 62.

1 length path (e.g. http://link/a) 62 ** 1 62
2 length path(e.g. http://link/ab)
62 ** 2 3844
3 length path(e.g. http://link/abc)
62 ** 3 238,328
4 length path(e.g. http://link/abcd)
62 ** 4 14,776,336
5 length path(e.g. http://link/abcde)
62 ** 5 916,132,832
6 length path(e.g. http://link/abcdef)
62 ** 6 == 5.68e +10 (or 56 billion)

Now, our link shortener has the ability to take up to 56 billion unique URLs and remain at only a 6 character length. Pretty cool! All we need is a simple function that will just generate a unique combination of letters, and here is what I used:

alpha = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"

def numb_to_string(num):  
    output = []
    while num:
        rem = num % len(alpha)
        num = num / len(alpha)
        output.append(alpha[rem])
    return str(''.join(output))

The function is not disimilar to a hashing function. The entire code for my shortener can be found here https://github.com/surajkapoor/Link-Shortener.

There are issues with the above, namely that URL generation can become predicatble, but it's still pretty cool to be able to built something so useful in a few lines of code!