4.5 KiB
Design Principles
Context
URL shortening is a technique in which a URL may be made substantially shorter and still direct to the required page, tells us Wikipedia. URL Shorteners are ubiquitous but nevertheless have some drawbacks.
- They are run by big companies that can collect usage data and thus impair privacy.
- Public URL shortening services usually bind their users with an EULA.
- There is no guarantee that short URLs from a public service will be kept active. In fact, they can disappear at anytime if the backing company goes bankrupt.
And last but not least, all URL shorteners, be it a public service or a self-hosted one will complicate the tasks of the web archeologist that will try to re-construct the history of the Web from the available evidences in ten, fifty or hundred years.
All those URL shortener keep their mapping table (short URL -> long URL) private and this means the user (the one accessing the short URL, not the one creating it) is completely dependent from the URL shortening service.
Project goals
This project strives to provide a URL shortening service, that:
- is public and transparent: the mapping table (short URL -> long URL) is public and versioned in a GIT repository. Everyone can fork it, keep it in a safe place or update it with his own URLs.
- is self-hostable and respects the user privacy: instead of a couple of big instances of the URL shortening service, we would like to encourage smaller instances that would drive significantly less traffic and less temptation to drive income from usage data.
Technical Design
A GIT Repository to store the mapping table
The mapping table is stored in a GIT repository as a file or collection of files. The file(s) contains the mapping table in a format that is easy to write for a human, easily parseable by the machine and "merge-friendly".
The user wanting to add a short URL to the mapping table could:
- Fork the repository containing the mapping table
- Add his mappings to the table
- Create a branch with his modifications
- Commit his changes
- Push them to his own fork
- Submit a Pull Request to ask for inclusion in the main mapping table
The Pull Request could be subject to approval, review, etc.
Auto-generated vs custom code
Usual URL Shorteners can generate a random short code or take a custom code from the user.
In order to be as stateless as possible, the generated short code cannot be random but needs to be generated deterministically. It can be a hash from the input URL for instance.
Hashing algorithm
URL Shorteners such as bit.ly use a combination of lower case, upper case letters plus digits to generate a random code. The code is seven characters long.
This translates to about 41 bits of entropy:
$ echo 'l(62^7)/l(2)' |bc -l
41.67937417270812646207
To implement a similar mechanism but fully deterministic, the SHA256 algorithm is used to hash the target URL and the first six bytes are encoded in base64.
The short code for https://framasoft.org/ is computed as such:
$ echo -n https://framasoft.org/ | openssl dgst -sha256 -binary |head -c6 |openssl base64 -e
t0P0JMya
File Format of the mapping table
The mapping table uses YAML as file format it fits all the requirements.
A sample mapping table looks like:
---
base_url: https://short.code/
mapping:
- url: https://framasoft.org/
- url: https://www.gnu.org/
short-code: gnu-home
This sample table defines two entries:
https://short.code/t0P0JMyathat maps tohttps://framasoft.org/https://short.code/gnu-homethat maps tohttps://www.gnu.org/
Deployment
The app is packaged and deployed as a container. In this case, we need to take into account that the filesystem might be read-only. Updates of the mapping table comes with a new deployment of an updated image of the container.
In order to achieve rolling updates without service interruption, a health probe needs to be implemented.
When the app is deployed outside of a container, an update of the mapping table
can be triggered by a git pull (from a crontab for instance). The app needs
to monitor the file containing the mapping table and hot reload the file once
modifications are detected.
Coding principles
This app follows the 12 factors.
Minimal Viable Product
The MVP of this project has the following features:
- reads only one mapping table
- serves the requests for only one domain
- supports auto-generated and custom codes
- packaged as a container