Yet Another URL Shortener

Design Principles

Context

According to Wikipedia, URL shortening is a technique in which a URL may be made substantially shorter and still direct to the required page. URL shorteners are ubiquitous, but they have some drawbacks:

  • They are run by big companies that can collect usage data and thus impair privacy.
  • Public URL shortening services usually bind their users with an EULA.
  • There is no guarantee that short URLs from a public service will be kept active. In fact, they can disappear at any time if the backing company goes bankrupt.

Last but not least, all URL shorteners, whether public services or self-hosted ones, complicate the task of the web archeologists who will try to reconstruct the history of the Web from the available evidence in ten, fifty or a hundred years.

All these URL shorteners keep their mapping table (short URL -> long URL) private, which means the user (the one accessing the short URL, not the one creating it) is completely dependent on the URL shortening service.

Project goals

This project strives to provide a URL shortening service that:

  • is public and transparent: the mapping table (short URL -> long URL) is public and versioned in a Git repository. Everyone can fork it, keep it in a safe place or update it with their own URLs.
  • is self-hostable and respects user privacy: instead of a few big instances of the URL shortening service, we would like to encourage many smaller instances, each handling significantly less traffic and therefore offering less temptation to derive income from usage data.

Technical Design

A Git repository to store the mapping table

The mapping table is stored in a Git repository as a file or a collection of files. The file(s) contain the mapping table in a format that is easy for a human to write, easy for a machine to parse and "merge-friendly".

The user wanting to add a short URL to the mapping table could:

  • Fork the repository containing the mapping table
  • Add their mappings to the table
  • Create a branch with their modifications
  • Commit their changes
  • Push them to their own fork
  • Submit a Pull Request to ask for inclusion in the main mapping table

The Pull Request could be subject to approval, review, etc.

Auto-generated vs custom code

Typical URL shorteners can generate a random short code or take a custom code from the user.

In order to be as stateless as possible, the generated short code cannot be random but needs to be generated deterministically, for instance as a hash of the input URL.

Hashing algorithm

URL shorteners such as bit.ly use a combination of lowercase letters, uppercase letters and digits (62 symbols) to generate a random code. The code is seven characters long.

This translates to about 41 bits of entropy:

$ echo 'l(62^7)/l(2)' |bc -l
41.67937417270812646207

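To implement a similar but fully deterministic mechanism, the target URL is hashed with the SHA-256 algorithm and the first six bytes of the digest are encoded in Base64. Six bytes give 48 bits, which Base64 encodes as exactly eight characters with no padding.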

The short code for https://framasoft.org/ is computed as such:

$ echo -n https://framasoft.org/ | openssl dgst -sha256 -binary |head -c6 |openssl base64 -e
t0P0JMya
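
The same computation can be sketched in Python, which may be handy for checking a code without the openssl tool chain. This is a minimal sketch, not the project's implementation: the function name short_code is hypothetical, and it assumes the URL is hashed as UTF-8 bytes with the standard Base64 alphabet, as in the command above. Note that the standard alphabet includes '+' and '/', so some generated codes may contain those characters.

```python
import base64
import hashlib


def short_code(url: str) -> str:
    """Return a deterministic short code for the given URL (hypothetical helper)."""
    # Hash the URL exactly as written, without any normalization.
    digest = hashlib.sha256(url.encode("utf-8")).digest()
    # Keep the first six bytes (48 bits); standard Base64 turns them into
    # exactly eight characters, so no '=' padding is produced.
    return base64.b64encode(digest[:6]).decode("ascii")


print(short_code("https://framasoft.org/"))
```

Because the code depends only on the URL, two instances serving the same mapping table derive the same short code without sharing any state.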

File Format of the mapping table

The mapping table uses YAML as its file format, since it fits all the requirements: it is easy for a human to write, easy for a machine to parse and merge-friendly.

A sample mapping table looks like:

---
base_url: https://short.code/
mapping:
- url: https://framasoft.org/
- url: https://www.gnu.org/
  short-code: gnu-home

This sample table defines two entries:

  • https://short.code/t0P0JMya that maps to https://framasoft.org/
  • https://short.code/gnu-home that maps to https://www.gnu.org/
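
How the sample table resolves to these two entries can be sketched as follows. This is an illustration under stated assumptions: the table is given as the Python dictionary a YAML parser would typically produce (no YAML library is used here), the names build_lookup and short_code are hypothetical, and rejecting a short code that is defined twice with different targets is an assumption, not something the design prescribes.

```python
import base64
import hashlib

# Parsed form of the sample table, as a YAML parser would typically return it.
SAMPLE_TABLE = {
    "base_url": "https://short.code/",
    "mapping": [
        {"url": "https://framasoft.org/"},
        {"url": "https://www.gnu.org/", "short-code": "gnu-home"},
    ],
}


def short_code(url):
    """First six bytes of the SHA-256 digest of the URL, Base64-encoded."""
    digest = hashlib.sha256(url.encode("utf-8")).digest()
    return base64.b64encode(digest[:6]).decode("ascii")


def build_lookup(table):
    """Map each short code to its long URL."""
    lookup = {}
    for entry in table["mapping"]:
        # A custom code wins; otherwise the code is derived from the URL.
        code = entry.get("short-code") or short_code(entry["url"])
        # Assumption: conflicting definitions are treated as an error.
        if code in lookup and lookup[code] != entry["url"]:
            raise ValueError(f"short code {code!r} is defined twice")
        lookup[code] = entry["url"]
    return lookup


for code, url in build_lookup(SAMPLE_TABLE).items():
    print(SAMPLE_TABLE["base_url"] + code, "->", url)
```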

Deployment

The app is packaged and deployed as a container. In this case, we need to take into account that the filesystem might be read-only. Updates of the mapping table come with a new deployment of an updated container image.

In order to achieve rolling updates without service interruption, a health probe needs to be implemented.
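
One possible shape for such a probe is sketched below with Python's standard http.server module. Everything specific here is an assumption made for illustration: the /healthz path, the ready flag, and the choice to answer 503 until the mapping table has been loaded. An orchestrator would typically wait for a 200 response before sending traffic to a new instance.

```python
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

PROBE_PATH = "/healthz"  # hypothetical path, not defined by the project


def probe_response(path, ready):
    """Return (status, body) for a GET request on the given path."""
    if path != PROBE_PATH:
        return 404, b"not found\n"
    if ready:
        return 200, b"ok\n"
    # Not ready yet: the mapping table has not been loaded.
    return 503, b"mapping table not loaded\n"


class ProbeHandler(BaseHTTPRequestHandler):
    # The application sets this to True once the mapping table is loaded.
    ready = False

    def do_GET(self):
        status, body = probe_response(self.path, type(self).ready)
        self.send_response(status)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep probe requests out of the logs


def make_server(port):
    """Create (but do not start) an HTTP server exposing the probe."""
    return ThreadingHTTPServer(("", port), ProbeHandler)
```

Calling make_server(8080).serve_forever() would expose the probe on port 8080 (an arbitrary choice); in a real service the probe would share the server that answers the short URLs.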

When the app is deployed outside of a container, an update of the mapping table can be triggered by a git pull (from a crontab, for instance). The app needs to monitor the file containing the mapping table and hot-reload it once modifications are detected.
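
A simple way to detect such modifications is to poll the file's metadata, sketched below. The class name FileWatcher and the polling approach are assumptions for illustration; an implementation could just as well rely on the operating system's file notification facilities. The inode is part of the comparison because a git pull may replace the file rather than rewrite it in place.

```python
import os


class FileWatcher:
    """Detect modifications of a file by polling its metadata (hypothetical helper)."""

    def __init__(self, path):
        self.path = path
        self._signature = self._stat()

    def _stat(self):
        info = os.stat(self.path)
        # Modification time, size and inode together identify a file version.
        return (info.st_mtime_ns, info.st_size, info.st_ino)

    def changed(self):
        """Return True once after each detected modification."""
        current = self._stat()
        if current != self._signature:
            self._signature = current
            return True
        return False
```

The app would call changed() at a regular interval, for example every few seconds, and reload the mapping table whenever it returns True.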

Coding principles

This app follows the Twelve-Factor App methodology.

Minimum Viable Product

The MVP of this project has the following features:

  • reads only one mapping table
  • serves the requests for only one domain
  • supports auto-generated and custom codes
  • is packaged as a container